Age | Commit message (Collapse) | Author |
|
|
|
|
|
|
|
* Do not accept '>' between the property and content attributes.
* Recognize the properties if the content attribute is before the property attribute using two regexes (fixes the extraction of the description for SlideshareIE).
|
|
|
|
On tvp.pl some webpages contain OpenGraph
metadata and some don't.
If og property is not found, _og_search_description
fails with
WARNING: unable to extract OpenGraph description; please report this issue on http://yt-dl.org/bug
Traceback (most recent call last):
File "/usr/home/saper/bin/youtube-dl", line 18, in <module>
youtube_dl.main()
File "/usr/home/saper/sw/youtube-dl/youtube_dl/__init__.py", line 766, in main
_real_main(argv)
File "/usr/home/saper/sw/youtube-dl/youtube_dl/__init__.py", line 719, in _real_main
retcode = ydl.download(all_urls)
File "/usr/home/saper/sw/youtube-dl/youtube_dl/YoutubeDL.py", line 715, in download
videos = self.extract_info(url)
File "/usr/home/saper/sw/youtube-dl/youtube_dl/YoutubeDL.py", line 348, in extract_info
ie_result = ie.extract(url)
File "/usr/home/saper/sw/youtube-dl/youtube_dl/extractor/common.py", line 125, in extract
return self._real_extract(url)
File "/usr/home/saper/sw/youtube-dl/youtube_dl/extractor/tvp.py", line 56, in _real_extract
info['description'] = self._og_search_description(webpage)
File "/usr/home/saper/sw/youtube-dl/youtube_dl/extractor/common.py", line 331, in _og_search_description
return self._og_search_property('description', html, fatal=False, **kargs)
File "/usr/home/saper/sw/youtube-dl/youtube_dl/extractor/common.py", line 325, in _og_search_property
return unescapeHTML(escaped)
File "/usr/home/saper/sw/youtube-dl/youtube_dl/utils.py", line 494, in unescapeHTML
assert type(s) == type(u'')
AssertionError
The patch allows me to use:
try:
info['description'] = self._og_search_description(webpage)
info['thumbnail'] = self._og_search_thumbnail(webpage)
except RegexNotFoundError:
pass
|
|
The url for the video page, it must allow to reproduce the result.
It's automatically set by YoutubeDL if it's missing.
|
|
|
|
|
|
|
|
instagram.com domain
|
|
The old process can still be used for those videos.
Added RegexNotFoundError, which is raised by _search_regex if it can't extract the info.
|
|
Following the YoutubeIE format. The 'format_note' gives additional info about the format, for example '3D' or 'DASH video'.
|
|
|
|
With these changes, users can now restrict what videos are downloaded by the intented audience, by specifying their age with --age-limit YEARS .
Add rudimentary support in youtube, pornotube, and youporn.
|
|
|
|
|
|
Like in "<meta charset='utf-8'/>"
|
|
|
|
|
|
|
|
Subtitles rework
|
|
This speeds up TestAllURLsMatching.test_no_duplicates by about 8000% at the cost of minimal memory overhead.
|
|
The errors while getting the subtitles are reported as warnings, if no subtitles are found return and empty dict.
|
|
These are attribute values, so we don't need the more complex and whitespace-destroying cleanHTML - we just need to unescape quotes, that's it.
|
|
|
|
|
|
|
|
|
|
Calling it was more complex then actually including the type in the video info
|
|
When a IE is added to the list, it's also added to a dictionary. When a IE is requested it first looks in the dictionary and if there's no instance it will create a new one.
That way _real_initialize is only called once for each IE, saving time if it needs to login for example.
|
|
|
|
|
|
InfoExtractor to get the login info
|
|
|
|
|
|
|
|
|
|
|