Diffstat (limited to 'network/youtube-dl/youtube-dl.1')
-rw-r--r-- | network/youtube-dl/youtube-dl.1 | 274
1 file changed, 244 insertions, 30 deletions
diff --git a/network/youtube-dl/youtube-dl.1 b/network/youtube-dl/youtube-dl.1
index 321792e8840d..087f4164dc29 100644
--- a/network/youtube-dl/youtube-dl.1
+++ b/network/youtube-dl/youtube-dl.1
@@ -133,8 +133,8 @@ Make all connections via IPv6 (experimental)
 .RS
 .RE
 .TP
-.B \-\-cn\-verification\-proxy \f[I]URL\f[]
-Use this proxy to verify the IP address for some Chinese sites.
+.B \-\-geo\-verification\-proxy \f[I]URL\f[]
+Use this proxy to verify the IP address for some geo\-restricted sites.
 The default proxy specified by \-\-proxy (or none, if the options is
 not present) is used for the actual downloading.
 (experimental)
@@ -859,6 +859,8 @@ On Linux and OS X, the system wide configuration file is located at
 On Windows, the user wide configuration file locations are
 \f[C]%APPDATA%\\youtube\-dl\\config.txt\f[] or
 \f[C]C:\\Users\\<user\ name>\\youtube\-dl.conf\f[].
+Note that by default the configuration file may not exist, so you may
+need to create it yourself.
 .PP
 For example, with the following configuration file youtube\-dl will
 always extract the audio, not copy the mtime, use a proxy and save all
@@ -1757,11 +1759,26 @@ legally, you can follow this quick list (assuming your service is called
 .IP " 1." 4
 Fork this repository (https://github.com/rg3/youtube-dl/fork)
 .IP " 2." 4
-Check out the source code with
-\f[C]git\ clone\ git\@github.com:YOUR_GITHUB_USERNAME/youtube\-dl.git\f[]
+Check out the source code with:
+.RS 4
+.IP
+.nf
+\f[C]
+git\ clone\ git\@github.com:YOUR_GITHUB_USERNAME/youtube\-dl.git
+\f[]
+.fi
+.RE
 .IP " 3." 4
 Start a new git branch with
-\f[C]cd\ youtube\-dl;\ git\ checkout\ \-b\ yourextractor\f[]
+.RS 4
+.IP
+.nf
+\f[C]
+cd\ youtube\-dl
+git\ checkout\ \-b\ yourextractor
+\f[]
+.fi
+.RE
 .IP " 4." 4
 Start with this simple template and save it to
 \f[C]youtube_dl/extractor/yourextractor.py\f[]:
@@ -1831,32 +1848,12 @@ extractor should and may return (https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L74-L252).
 Add tests and code for as many as you want.
 .IP " 8." 4
-Keep in mind that the only mandatory fields in info dict for successful
-extraction process are \f[C]id\f[], \f[C]title\f[] and either
-\f[C]url\f[] or \f[C]formats\f[], i.e.
-these are the critical data the extraction does not make any sense
-without.
-This means that any
-field (https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L148-L252)
-apart from aforementioned mandatory ones should be treated \f[B]as
-optional\f[] and extraction should be \f[B]tolerate\f[] to situations
-when sources for these fields can potentially be unavailable (even if
-they always available at the moment) and \f[B]future\-proof\f[] in order
-not to break the extraction of general purpose mandatory fields.
-For example, if you have some intermediate dict \f[C]meta\f[] that is a
-source of metadata and it has a key \f[C]summary\f[] that you want to
-extract and put into resulting info dict as \f[C]description\f[], you
-should be ready that this key may be missing from the \f[C]meta\f[]
-dict, i.e.
-you should extract it as \f[C]meta.get(\[aq]summary\[aq])\f[] and not
-\f[C]meta[\[aq]summary\[aq]]\f[].
-Similarly, you should pass \f[C]fatal=False\f[] when extracting data
-from a webpage with \f[C]_search_regex/_html_search_regex\f[].
-.IP " 9." 4
-Check the code with flake8 (https://pypi.python.org/pypi/flake8).
+Make sure your code follows youtube\-dl coding
+conventions (#youtube-dl-coding-conventions) and check the code with
+flake8 (https://pypi.python.org/pypi/flake8).
 Also make sure your code works under all
 Python (http://www.python.org/) versions claimed supported by
 youtube\-dl, namely 2.6, 2.7, and 3.2+.
-.IP "10." 4
+.IP " 9." 4
 When the tests pass, add (http://git-scm.com/docs/git-add) the new
 files and commit (http://git-scm.com/docs/git-commit) them and
 push (http://git-scm.com/docs/git-push) the result, like this:
@@ -1871,12 +1868,229 @@ $\ git\ push\ origin\ yourextractor
 \f[]
 .fi
 .RE
-.IP "11." 4
+.IP "10." 4
 Finally, create a pull
 request (https://help.github.com/articles/creating-a-pull-request).
 We\[aq]ll then review and merge it.
 .PP
 In any case, thank you very much for your contributions!
+.SS youtube\-dl coding conventions
+.PP
+This section introduces guidelines for writing idiomatic, robust and
+future\-proof extractor code.
+.PP
+Extractors are very fragile by nature since they depend on the layout of
+the source data provided by a 3rd party media hoster that is out of your
+control, and this layout tends to change.
+As an extractor implementer your task is not only to write code that
+will extract media links and metadata correctly but also to minimize
+code dependency on the source\[aq]s layout changes and even to make the
+code foresee potential future changes and be ready for that.
+This is important because it will allow the extractor not to break on
+minor layout changes, thus keeping old youtube\-dl versions working.
+Even though this breakage issue is easily fixed by releasing a new
+version of youtube\-dl with the fix incorporated, all the previous
+versions become broken in all repositories and distros\[aq] packages
+that may not be so prompt in fetching the update from us.
+Needless to say, some may never receive an update at all, as is possible
+for non\-rolling\-release distros.
+.SS Mandatory and optional metafields
+.PP
+For extraction to work youtube\-dl relies on the metadata your extractor
+extracts and provides to youtube\-dl, expressed as an information
+dictionary (https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L75-L257)
+or simply \f[I]info dict\f[].
+Only the following meta fields in the \f[I]info dict\f[] are considered
+mandatory for a successful extraction process by youtube\-dl:
+.IP \[bu] 2
+\f[C]id\f[] (media identifier)
+.IP \[bu] 2
+\f[C]title\f[] (media title)
+.IP \[bu] 2
+\f[C]url\f[] (media download URL) or \f[C]formats\f[]
+.PP
+In fact only the last option is technically mandatory (i.e.
+if you can\[aq]t figure out the download location of the media the
+extraction does not make any sense).
+But by convention youtube\-dl also treats \f[C]id\f[] and \f[C]title\f[]
+as mandatory.
+Thus the aforementioned metafields are the critical data without which
+the extraction does not make any sense; if any of them fails to be
+extracted, the extractor is considered completely broken.
+.PP
+Any
+field (https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L149-L257)
+apart from the aforementioned ones is considered \f[B]optional\f[].
+That means that extraction should be \f[B]tolerant\f[] to situations
+when sources for these fields can potentially be unavailable (even if
+they are always available at the moment) and \f[B]future\-proof\f[] in
+order not to break the extraction of general purpose mandatory fields.
+.SS Example
+.PP
+Say you have some source dictionary \f[C]meta\f[] that you\[aq]ve
+fetched as JSON with an HTTP request and it has a key \f[C]summary\f[]:
+.IP
+.nf
+\f[C]
+meta\ =\ self._download_json(url,\ video_id)
+\f[]
+.fi
+.PP
+Assume at this point \f[C]meta\f[]\[aq]s layout is:
+.IP
+.nf
+\f[C]
+{
+\ \ \ \ ...
+\ \ \ \ "summary":\ "some\ fancy\ summary\ text",
+\ \ \ \ ...
+}
+\f[]
+.fi
+.PP
+Assume you want to extract \f[C]summary\f[] and put it into the
+resulting info dict as \f[C]description\f[].
+Since \f[C]description\f[] is an optional metafield you should be
+prepared for this key to be missing from the \f[C]meta\f[] dict, so you
+should extract it like:
+.IP
+.nf
+\f[C]
+description\ =\ meta.get(\[aq]summary\[aq])\ \ #\ correct
+\f[]
+.fi
+.PP
+and not like:
+.IP
+.nf
+\f[C]
+description\ =\ meta[\[aq]summary\[aq]]\ \ #\ incorrect
+\f[]
+.fi
+.PP
+The latter will break the extraction process with a \f[C]KeyError\f[] if
+\f[C]summary\f[] disappears from \f[C]meta\f[] at some later time, but
+with the former approach extraction will just go ahead with
+\f[C]description\f[] set to \f[C]None\f[], which is perfectly fine
+(remember, \f[C]None\f[] is equivalent to the absence of data).
+.PP
+Similarly, you should pass \f[C]fatal=False\f[] when extracting optional
+data from a webpage with \f[C]_search_regex\f[],
+\f[C]_html_search_regex\f[] or similar methods, for instance:
+.IP
+.nf
+\f[C]
+description\ =\ self._search_regex(
+\ \ \ \ r\[aq]<span[^>]+id="title"[^>]*>([^<]+)<\[aq],
+\ \ \ \ webpage,\ \[aq]description\[aq],\ fatal=False)
+\f[]
+.fi
+.PP
+With \f[C]fatal\f[] set to \f[C]False\f[], if \f[C]_search_regex\f[]
+fails to extract \f[C]description\f[] it will emit a warning and
+continue extraction.
+.PP
+You can also pass \f[C]default=<some\ fallback\ value>\f[], for example:
+.IP
+.nf
+\f[C]
+description\ =\ self._search_regex(
+\ \ \ \ r\[aq]<span[^>]+id="title"[^>]*>([^<]+)<\[aq],
+\ \ \ \ webpage,\ \[aq]description\[aq],\ default=None)
+\f[]
+.fi
+.PP
+On failure this code will silently continue the extraction with
+\f[C]description\f[] set to \f[C]None\f[].
+That is useful for metafields that may or may not be
+present.
+.SS Provide fallbacks
+.PP
+When extracting metadata, try to provide several fallback scenarios.
+For example, if \f[C]title\f[] is present in several places/sources, try
+extracting it from at least some of them.
+This would make it more future\-proof in case some of the sources become
+unavailable.
+.SS Example
+.PP
+Say \f[C]meta\f[] from the previous example has a \f[C]title\f[] and you
+are about to extract it.
+Since \f[C]title\f[] is a mandatory metafield you should end up with
+something like:
+.IP
+.nf
+\f[C]
+title\ =\ meta[\[aq]title\[aq]]
+\f[]
+.fi
+.PP
+If \f[C]title\f[] disappears from \f[C]meta\f[] in the future due to
+some changes on the hoster\[aq]s side, the extraction would fail since
+\f[C]title\f[] is mandatory.
+That\[aq]s expected.
+.PP
+Assume that you have some other source you can extract \f[C]title\f[]
+from, for example the \f[C]og:title\f[] HTML meta of the \f[C]webpage\f[].
+In this case you can provide a fallback scenario:
+.IP
+.nf
+\f[C]
+title\ =\ meta.get(\[aq]title\[aq])\ or\ self._og_search_title(webpage)
+\f[]
+.fi
+.PP
+This code will try to extract from \f[C]meta\f[] first and if that fails
+it will try extracting \f[C]og:title\f[] from the \f[C]webpage\f[].
+.SS Make regular expressions flexible
+.PP
+When using regular expressions, keep them fuzzy and flexible.
+.SS Example
+.PP
+Say you need to extract \f[C]title\f[] from the following HTML code:
+.IP
+.nf
+\f[C]
+<span\ style="position:\ absolute;\ left:\ 910px;\ width:\ 90px;\ float:\ right;\ z\-index:\ 9999;"\ class="title">some\ fancy\ title</span>
+\f[]
+.fi
+.PP
+The code for that task should look similar to:
+.IP
+.nf
+\f[C]
+title\ =\ self._search_regex(
+\ \ \ \ r\[aq]<span[^>]+class="title"[^>]*>([^<]+)\[aq],\ webpage,\ \[aq]title\[aq])
+\f[]
+.fi
+.PP
+Or even better:
+.IP
+.nf
+\f[C]
+title\ =\ self._search_regex(
+\ \ \ \ r\[aq]<span[^>]+class=(["\\\[aq]])title\\1[^>]*>(?P<title>[^<]+)\[aq],
+\ \ \ \ webpage,\ \[aq]title\[aq],\ group=\[aq]title\[aq])
+\f[]
+.fi
+.PP
+Note how you tolerate potential changes in the \f[C]style\f[]
+attribute\[aq]s value or a switch from double quotes to single quotes
+for the \f[C]class\f[] attribute.
+.PP
+The code definitely should not look like:
+.IP
+.nf
+\f[C]
+title\ =\ self._search_regex(
+\ \ \ \ r\[aq]<span\ style="position:\ absolute;\ left:\ 910px;\ width:\ 90px;\ float:\ right;\ z\-index:\ 9999;"\ class="title">(.*?)</span>\[aq],
+\ \ \ \ webpage,\ \[aq]title\[aq])
+\f[]
+.fi
+.SS Use safe conversion functions
+.PP
+Wrap all extracted numeric data into safe functions from \f[C]utils\f[]:
+\f[C]int_or_none\f[], \f[C]float_or_none\f[].
+Use them for string to number conversions as well.
 .SH EMBEDDING YOUTUBE\-DL
 .PP
 youtube\-dl makes the best effort to be a good command\-line program,
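As a rough illustration of the conventions the new man page section describes (this sketch is not part of the patch itself), the following Python snippet shows how optional metafields and the safe conversion helpers from youtube_dl.utils might be combined when building an info dict. The meta dictionary and its keys are invented for the example; a real extractor would obtain them from something like self._download_json(url, video_id).

    # Illustration only: combining the conventions described above.
    # The contents of `meta` are made up for this example.
    from youtube_dl.utils import int_or_none, float_or_none

    meta = {
        'id': '42',
        'title': 'some fancy title',
        'duration': '128',  # numeric data often arrives as strings
    }

    info = {
        'id': meta['id'],                    # mandatory: a KeyError here should break extraction
        'title': meta['title'],              # mandatory as well
        'description': meta.get('summary'),  # optional: tolerate a missing key, yielding None
        'duration': int_or_none(meta.get('duration')),         # safe string-to-int conversion
        'average_rating': float_or_none(meta.get('rating')),   # missing value becomes None, not a crash
    }

    print(info)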