path: root/youtube_dl/extractor/common.py
Age | Commit message | Author
11 days | [Misc] Correct [_]IE_DESC/NAME in a few IEs | dirkf
    * thx seproDev, yt-dlp/yt-dlp/pull/12694/commits/ae69e3c
    * also add documenting comment in `InfoExtractor`
2024-12-16 | [InfoExtractor] Use kwarg maxsplit for re.split | dirkf
    * May become kw-only in future Pythons
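A minimal sketch of the idea behind the commit above, not the actual code in common.py: passing `maxsplit` by keyword keeps the call valid even if the positional form stops being accepted. The helper name `split_once` is hypothetical.

```python
import re


def split_once(pattern, text):
    """Split text at the first match of pattern only (hypothetical helper)."""
    # maxsplit is passed by keyword rather than as the third positional
    # argument, so the call does not depend on the positional slot.
    return re.split(pattern, text, maxsplit=1)


parts = split_once(r'\s*,\s*', 'a , b, c')
```

Here `parts` holds the text before the first separator and the unsplit remainder.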
2024-06-20 | [jsinterp] Add Debugger from yt-dlp | dirkf
    * https://github.com/yt-dlp/yt-dlp/commit/8f53dc4
    * thx pukkandan
2024-05-30 | [InfoExtractor] Misc yt-dlp back-ports, etc | dirkf
    * add _yes_playlist() method
    * avoid crash using _NETRC_MACHINE
    * use _search_json() in _search_nextjs_data()
    * _search_nextjs_data() default is JSON, not text
    * test for above
2024-03-08 | [InfoExtractor] Rework and improve JWPlayer extraction | dirkf
    * use traverse_obj() and _search_json()
    * support playlist `.load({**video1},{**video2}, ...)`
    * support transform_source=... for _extract_jwplayer_data()
2024-03-08 | [InfoExtractor] Add `_search_json()` | dirkf
    * uses the error diagnostic to truncate the JSON string
    * may be confused by non-C-Pythons
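A rough Python 3 sketch of the technique the two notes above describe, assuming CPython's `json` module, which reports trailing content as an "Extra data" error carrying the offset where the first JSON value ended. The helper name `parse_leading_json` is hypothetical, and this is not the `_search_json()` implementation itself.

```python
import json


def parse_leading_json(text):
    """Parse the JSON value at the start of text, ignoring what follows it.

    Hypothetical helper: relies on the decoder's error diagnostic to find
    where the JSON value ends.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError as e:
        # Only the 'Extra data' diagnostic tells us where the value ended;
        # any other error means the text does not start with valid JSON.
        if e.msg != 'Extra data':
            raise
        # e.pos is the offset of the first character after the JSON value.
        return json.loads(text[:e.pos])


data = parse_leading_json('{"a": [1, 2]}; var x = 1;')
```

Because the approach depends on the wording and offset of the decoder's error, an interpreter whose `json` module reports errors differently could defeat it, which matches the caveat in the commit message.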
2024-02-02 | [InfoExtractor] Correctly resolve BaseURL in DASH manifest | dirkf
    Specs:
    * ISO/IEC 23009-1:2012 section 5.6
    * RFC 3986 section 5.
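As an illustration of the reference resolution those specs call for, here is a small Python 3 sketch that resolves a chain of nested BaseURL values against the manifest URL with `urllib.parse.urljoin`, which implements RFC 3986 section 5 resolution. The function name `resolve_base_url` and the example URLs are assumptions for illustration; the commit's own code is not shown in this log.

```python
from urllib.parse import urljoin


def resolve_base_url(mpd_url, base_urls):
    """Resolve nested BaseURL values, outermost first (hypothetical helper).

    Each BaseURL is treated as a reference relative to the base established
    so far, starting from the URL of the manifest itself.
    """
    base = mpd_url
    for ref in base_urls:
        # An absolute ref replaces the base; a relative ref is merged with it.
        base = urljoin(base, ref)
    return base


resolved = resolve_base_url(
    'https://example.com/a/manifest.mpd', ['media/', 'video/1/'])
```

With these inputs, `resolved` is `https://example.com/a/media/video/1/`; an absolute BaseURL at any level would replace the base built up before it.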
2024-02-02 | [InfoExtractor] Support byte range for DASH | dirkf
    * adapted from https://github.com/ytdl-org/youtube-dl/pull/30279
    * thx former GH user kikuyan
2024-02-02 | [InfoExtractor] Support DASH subtitle extraction (yt-dlp back-port) | dirkf
2024-01-22 | [InfoExtractor] Support some warning and `._downloader` shortcut methods from yt-dlp | dirkf
2023-07-25 | [compat] Use `compat_open()` | dirkf
2023-07-19 | [InfoExtractor] Add `_match_valid_url()` class method and refactor | dirkf
    * API compatible with yt-dlp
    * also support Sequence of patterns in _VALID_URL
    * one place to compile _VALID_URL
    * TODO: remove existing extractor shims
2023-07-19 | [InfoExtractor] Add search methods for Next/Nuxt.js from yt-dlp | dirkf
    * add _search_nextjs_data(), from https://github.com/yt-dlp/yt-dlp/pull/1386 thanks selfisekai
    * add _search_nuxt_data(), from https://github.com/yt-dlp/yt-dlp/pull/1921, thanks Lesmiscore, pukkandan
    * add tests for the above
    * also fix HTML5 type recognition and tests, from https://github.com/yt-dlp/yt-dlp/commit/222a230871fe4fe63f35c49590379c9a77116819, thanks Lesmiscore
    * update extractors in PR using above, fix tests.
2023-07-19 | [InfoExtractor] Support groups in `_search_regex()`, etc | dirkf
2023-02-14 | [InfoExtractor] Handle unquoted values in OpenGraph searches | dirkf
2022-11-11 | [common:jwplayer] Improve jwplayer extraction and parsing (#31000) | dirkf
    * don't crash parser if jwplayer_data is invalid (empty, or no formats)
    * use `label` in `sources[n]` as `format_id`
    * relax `jwplayer().setup(...)` RE (also rework PR #27274 enhancement)
    * detect more manifest formats in _parse_jwplayer_formats() (from PR #29596)
    * improve metadata extraction (from PR #25433)
    * remember URLs in a set
    * use parse_resolution() in format
    * extract filesize in format (from yt-dlp)
    Co-authored-by: kikuyan <kikuyan@users.noreply.github.com>
    Co-authored-by: martin54 <martin54@users.noreply.github.com>
2022-10-11 | [Common:JWPlayer] Fix x1000 scaling error | dirkf
    See https://github.com/yt-dlp/yt-dlp/issues/5106#issuecomment-1264625161
2021-04-06 | [compat] Use more conventional name for compat SimpleCookie | Sergey M․
2021-04-04 | [compat] add compat_SimpleCookie | Remita Amine
2021-04-04 | [extractor/common] keep support for non standard JSON-LD VideoObject author values | Remita Amine
2021-04-04 | [extractor/common] fix JSON-LD VideoObject author extraction | Remita Amine
2021-04-03 | [extractor/common] fix _get_cookies method for python 2(#20673, #23256, #20326, closes #28640) | Remita Amine
2021-03-10 | Introduce release_timestamp meta field (refs #28386) | Sergey M․
2021-02-01 | [youtube] Rewrite Extractor | Remita Amine
    - improve format sorting
    - remove unused code(swf parsing, ...)
    - fix series metadata extraction
    - fix trailer video extraction
    - improve error reporting
    - extract video location
2020-12-19 | [common] remove unwanted query params from unsigned akamai manifest URLs | Remita Amine
2020-12-13 | [extractor/common] Improve JSON-LD interaction statistic extraction (refs #23306) | Sergey M․
2020-12-13 | [extractor/common] Document duration meta field for playlists | Sergey M․
2020-12-09 | [extractor/common] Fix inline HTML5 media tags processing and add test (closes #27345) | Sergey M․
2020-12-07 | [extractor/common] Add support for dl8-* media tags (closes #27283) | Sergey M․
2020-12-07 | [extractor/common] Eliminate media tag name regex duplication | Sergey M․
2020-12-07 | [extractor/common] Fix media type extraction for HTML5 media tags in start/end form | Sergey M․
2020-12-03 | [extractor/commons] improve Akamai HTTP formats extraction | Remita Amine
2020-12-02 | [extractor/common] improve Akamai HTTP format extraction | Remita Amine
    - Allow m3u8 manifest without an additional audio format
    - Fix extraction for qualities starting with a number
    Solution provided by @nixxo based on: https://stackoverflow.com/a/5984688
2020-11-22 | [extractor/common] add generic support for akamai http format extraction | Remita Amine
2020-11-21 | Fix typos (#27084) | Josh Soref
    * spelling: authorization
    * spelling: brightcove
    * spelling: creation
    * spelling: exceeded
    * spelling: exception
    * spelling: extension
    * spelling: extracting
    * spelling: extraction
    * spelling: frontline
    * spelling: improve
    * spelling: length
    * spelling: listsubtitles
    * spelling: multimedia
    * spelling: obfuscated
    * spelling: partitioning
    * spelling: playlist
    * spelling: playlists
    * spelling: restriction
    * spelling: services
    * spelling: split
    * spelling: srmediathek
    * spelling: support
    * spelling: thumbnail
    * spelling: verification
    * spelling: whitespaces
    Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
2020-11-18 | [extractor/common] Output error for invalid URLs in _is_valid_url (refs #21400, refs #24151, refs #25617, refs #25618, refs #25586, refs #26068, refs #27072) | Sergey M․
2020-09-19 | [extractor/common] Relax interaction count extraction in _json_ld | Sergey M․
2020-09-19 | [extractor/common] Extract author as uploader for VideoObject in _json_ld | Sergey M․
2020-09-18 | [extractor/common] Handle ssl.CertificateError in _request_webpage (closes #26601) | Sergey M․
    ssl.CertificateError is raised on some python versions <= 3.7.x
2020-05-05 | [extractor/common] Use compat_cookiejar_Cookie for _set_cookie (closes #23256, closes #24776) | Sergey M․
    To always ensure cookie name and value are bytestrings on python 2.
2020-05-02 | [extractor/common] Extract multiple JSON-LD entries | Sergey M․
2020-04-07 | [extractor/common] Skip malformed ISM manifest XMLs while extracting ISM formats (#24667) | Sergey M․
2020-02-29 | Remove no longer needed compat_str around geturl | Sergey M․
2020-02-29 | [extractor/common] Convert ISM manifest to unicode before processing on python 2 (#24152) | Sergey M․
2019-11-26 | [dailymotion] improve extraction | Remita Amine
    - extract http formats included in m3u8 manifest
    - fix user extraction(closes #3553)(closes #21415)
    - add support for User Authentication(closes #11491)
    - fix password protected videos extraction(closes #23176)
    - respect age limit option and family filter cookie value(closes #18437)
    - handle video url playlist query param
    - report allowed countries for geo-restricted videos
2019-11-16 | [extractor/common] Add data, headers and query to all major extract methods preserving standard order for potential future use | Sergey M․
2019-11-09 | [extractor/common] clean jwplayer description HTML tags | Remita Amine
2019-11-06 | [common] initialize headers param with empty dict | Remita Amine
2019-11-05 | [common] fix typo | Remita Amine
2019-11-05 | [common] pass headers to _extract_(m3u8|mpd)_formats methods | Remita Amine