416 commits

Author SHA1 Message Date
Philipp Hagemeister f1a9d64eea [extractor/common] Modernize 2014-08-28 01:04:43 +02:00
Philipp Hagemeister da9ec3b932 [musicvault] Add extractor (Fixes #3593) 2014-08-27 01:44:47 +02:00
Philipp Hagemeister 704df56da7 [sportdeutschland] add new extractor 2014-08-26 12:51:13 +02:00
Philipp Hagemeister b252735910 [extractor/common] Generate better f4m format IDs 2014-08-25 13:03:08 +02:00
Philipp Hagemeister 9480d1a566 Merge remote-tracking branch 'riking/twofactor' 2014-08-24 07:14:23 +02:00
Philipp Hagemeister d769be6c96 [grooveshark,http] Make HTTP POST downloads work 2014-08-24 01:31:35 +02:00
Philipp Hagemeister a36819731b [escapist] Add support for og:video:url (Fixes #3557) 2014-08-21 13:05:24 +02:00
riking 165250ff5e Remove debug prints 2014-08-16 14:49:30 -07:00
riking 83317f6938 [youtube] Add two-factor account signin (TOTP only)
Additional work is required to prompt the user for the SMS or phone call codes, as there is no framework currently to prompt the user during an extraction operation.

Fixes #3533
2014-08-16 14:48:17 -07:00
Jaime Marquínez Ferrándiz f036a6328e [extractor/common] _extract_f4m_formats: Use more specific messages when downloading the manifest 2014-07-28 15:42:19 +02:00
Jaime Marquínez Ferrándiz 31bb8d3f51 [bloomberg] Extract the available formats (closes #2776)
It uses a helper method in the InfoExtractor class.
The downloader will pick the requested formats using the bitrate in the info dict.
2014-07-28 15:32:38 +02:00
Philipp Hagemeister c3415d1bac [extractor/common] PEP8 2014-07-25 10:43:03 +02:00
Philipp Hagemeister b090af5922 [vube] Fix comment count 2014-07-23 01:27:25 +02:00
Philipp Hagemeister 1a30deca50 [teachertube] Fix title and playlist recognition 2014-07-21 12:47:01 +02:00
Philipp Hagemeister 9732d77ed2 [snotr] PEP8 and minor fixes (#3296) 2014-07-21 12:02:44 +02:00
Philipp Hagemeister 40c696e5c6 [screencast] Add support for more video types (#3236) 2014-07-11 15:39:24 +02:00
Philipp Hagemeister 4094b6e36d [vodlocker] PEP8, generalization, and simplification (#3223) 2014-07-11 10:57:40 +02:00
Jaime Marquínez Ferrándiz 78338f71ca [livestream:original] Add support for folder urls (closes #2631)
The webpage only contains shortened links for the videos. Since the server
doesn't support HEAD requests, we use a specific extractor for them.
2014-06-26 16:34:36 +02:00
Philipp Hagemeister d551980823 [spiegeltv] Simplify and PEP8 2014-06-07 15:35:13 +02:00
Philipp Hagemeister ad3bc6acd5 Document and test categories (#2923) 2014-05-15 12:41:42 +02:00
Philipp Hagemeister 5afa7f8bee [extractor/common] --write-pages: Correct file name if video_id is None 2014-05-15 12:39:33 +02:00
Philipp Hagemeister 57c7411f46 [mixcloud] Shed API dependency (#2904) 2014-05-13 09:42:38 +02:00
Philipp Hagemeister c1bce22f23 [extractor/common] Protect against long video IDs and URLs 2014-05-12 21:58:23 +02:00
Philipp Hagemeister 2099125333 [soundcloud/generic] Add support for playlists 2014-05-05 03:15:17 +02:00
Philipp Hagemeister 28746fbd59 [bilibili] Add preliminary support (#2174)
The URL http://www.bilibili.tv/video/av636603/index_2.html does not work yet.
2014-04-21 13:46:41 +02:00
Anisse Astier ec0fafbb19 [extractor/common] fallback on utf-8 when charset is not found
fixes #2721
2014-04-07 23:10:16 +02:00
Philipp Hagemeister b6cfde99b7 Only mention websense URL once 2014-04-03 08:12:53 +02:00
Philipp Hagemeister 2410c43d83 Detect Websense censorship (Fixes #2670) 2014-04-03 06:09:38 +02:00
Philipp Hagemeister 38d63d846e [extractor/common] Clarify preference key in formats 2014-03-23 17:41:43 +01:00
Philipp Hagemeister 955c451456 Rename upload_timestamp to timestamp 2014-03-13 18:45:14 +01:00
Philipp Hagemeister 9d2ecdbc71 [vevo] Centralize timestamp handling 2014-03-13 15:30:25 +01:00
Philipp Hagemeister 5a25f39653 Correct extractor documentation 2014-03-10 13:09:55 +01:00
Philipp Hagemeister 9f62eaf4ef [canal13cl] Add test and improve extraction (#2498) 2014-03-03 12:53:11 +01:00
Philipp Hagemeister 0afef30b23 Add display_id field 2014-03-03 12:06:28 +01:00
Philipp Hagemeister 81c2f20b53 [youtube] Correct invalid JSON (Fixes #2353) 2014-02-09 17:56:10 +01:00
dst c1206423c4 Fix extraction of og content in single quotes 2014-01-31 03:57:33 +07:00
Jaime Marquínez Ferrándiz 0c708f11cb [bloomberg] Fix ooyala url extraction
Added a helper method to InfoExtractor for searching the ‘twitter:player’ meta property.
Now the OoyalaIE also recognizes the ‘ec’ parameter in the url as the embed code.
2014-01-29 18:03:32 +01:00
Philipp Hagemeister 7e8caf30c0 Throw an error if no video formats are found 2014-01-27 07:31:54 +01:00
Philipp Hagemeister db1f388878 [huffpost] Add support 2014-01-27 05:47:38 +01:00
Jaime Marquínez Ferrándiz 944d65c762 [extractor/common] Encode the url when calculating the md5 with --write-pages option
This doesn’t cause any problem in python 2.*, but on python 3 the `md5` function only accepts bytes.
2014-01-25 15:32:56 +01:00
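
The Python 3 behaviour this commit message describes can be sketched as follows. This is a minimal illustration, not the project's code; the helper name `url_digest` is hypothetical.

```python
import hashlib


def url_digest(url):
    """Return the hex MD5 digest of a URL given as text (hypothetical helper)."""
    # hashlib.md5 only accepts bytes on Python 3, so encode the text first.
    return hashlib.md5(url.encode('utf-8')).hexdigest()


try:
    hashlib.md5('http://example.com/')  # text, not bytes
except TypeError as err:
    print('Python 3 rejects text input:', err)

print(url_digest('http://example.com/'))
```

Encoding explicitly works on both Python 2 and Python 3, which is presumably why the change was safe to make unconditionally.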
Philipp Hagemeister 1394ce65b4 [youtube] Add new formats (Fixes #2221) 2014-01-23 23:54:06 +01:00
Philipp Hagemeister 50317b111d Merge branch 'youtube-dash-manifest'
Conflicts:
	youtube_dl/extractor/youtube.py
2014-01-22 19:58:31 +01:00
Philipp Hagemeister 9d4288b2d4 [extractor/common] Clarify when and when not we generate the filename 2014-01-21 01:41:13 +01:00
Philipp Hagemeister b60016e831 Deal with implicitly UTF-16 decoded webpages
These webpages don't specify an encoding and rely on the BOM
2014-01-21 01:39:40 +01:00
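
One way to handle pages that rely on a byte order mark is sketched below. It assumes the raw page is available as bytes; `decode_page` and its `default` parameter are hypothetical names, not the project's API.

```python
import codecs


def decode_page(content, default='utf-8'):
    """Decode raw page bytes, honouring a UTF-16 byte order mark if present."""
    if content.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The 'utf-16' codec reads the BOM to pick the byte order and strips it.
        return content.decode('utf-16')
    # No BOM: fall back to the default encoding, replacing undecodable bytes.
    return content.decode(default, 'replace')


print(decode_page('hello'.encode('utf-16')))
print(decode_page(b'plain ascii'))
```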
Philipp Hagemeister dd27fd1739 [youtube] Download DASH manifest
If given, download and parse the DASH manifest file, in order to get ultra-HQ formats.
Fixes #2166
2014-01-19 05:47:20 +01:00
Philipp Hagemeister 3ec05685f7 [extractor/common] Limit --write-pages filename to 200 chars
This avoids problems with very long URLs.
2014-01-17 14:47:47 +01:00
Philipp Hagemeister 9933b57430 [pornhub] Use centralized sorting 2014-01-07 10:25:34 +01:00
Philipp Hagemeister 3d3538e422 [khanacademy] Add support (Fixes #2066) 2014-01-07 09:35:34 +01:00
Philipp Hagemeister 5d73273f6f [orf] Use new extraction method (Fixes #2057) 2014-01-06 17:15:27 +01:00
Philipp Hagemeister 9887c9b2d6 [jpopsuki] Simplify 2014-01-03 12:51:37 +01:00
Philipp Hagemeister 08d13955dd [wistia] Prefer original video format above all others
We could also set up a formula which would weigh filesize/bitrate and vcodec/acodec (say, 1GB h264 < 3 GB MPEG2 < 2 GB h264), but that would get really messy real soon.
2014-01-01 20:23:49 +01:00
Philipp Hagemeister 5d4f3985be Document that format_id field should be present 2013-12-26 21:19:00 +01:00
Philipp Hagemeister 7217e148fb [yahoo] Use centralized sorting, and add tbr field 2013-12-25 15:18:40 +01:00
Philipp Hagemeister c7deaa4c74 [zdf] Use centralized sorting 2013-12-24 23:32:04 +01:00
Philipp Hagemeister e6812ac99d [spiegel] Use centralized sorting 2013-12-24 12:40:23 +01:00
Philipp Hagemeister 4bcc7bd1f2 Add temporary _sort_formats helper function 2013-12-24 12:31:42 +01:00
Philipp Hagemeister f49d89ee04 Add a resolution field and improve general --list-formats output 2013-12-24 11:56:02 +01:00
Philipp Hagemeister f45f96f8f8 [myvideo] Use RTMP instead of RTMPT (Fixes #2032) 2013-12-23 15:57:43 +01:00
Philipp Hagemeister 1538eff6d8 [bliptv] Remove support for direct downloads
This is now handled by the generic IE
2013-12-23 15:49:21 +01:00
Philipp Hagemeister aa94a6d315 [aparat] Add support (Fixes #2012) 2013-12-20 17:05:39 +01:00
Jaime Marquínez Ferrándiz c0d0b01f0e [generic] Detect ooyala videos (fixes #2013) 2013-12-19 20:32:12 +01:00
Philipp Hagemeister 46374a56b2 [youtube] Do not warn for videos with allow_rating=0
This fixes #1982
Test video: http://www.youtube.com/watch?v=gi2uH3YxohU
2013-12-17 02:49:56 +01:00
Itay Brandes 87a28127d2 _search_regex's "isatty" call fails with Py2exe's
_search_regex calls the sys.stderr.isatty() function for unix systems.

Py2exe uses a custom Stderr() stream which doesn't have an `isatty()`
function, leading to a crash.

This is easily fixed by checking that it's a unix system first.
2013-12-16 21:50:26 +01:00
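
The ordering fix described in this commit can be sketched as below. It is an illustration only; `supports_color` and its `os_name` parameter are hypothetical and exist here so the check can be exercised on any platform.

```python
import os
import sys


def supports_color(stream=None, os_name=os.name):
    """Return True if ANSI colors can be used on the given stream."""
    stream = sys.stderr if stream is None else stream
    # Test the platform first: `and` short-circuits, so isatty() is never
    # called on Windows, where a replacement stream may not provide it.
    return os_name != 'nt' and stream.isatty()


class NoIsatty(object):
    """Stand-in for a custom stderr object that lacks isatty()."""


print(supports_color(NoIsatty(), os_name='nt'))
```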
Philipp Hagemeister d67b0b1596 Reorder info_dict documentation 2013-12-16 14:13:40 +01:00
Philipp Hagemeister c0ba0f4859 Document duration field 2013-12-16 04:09:43 +01:00
Philipp Hagemeister e2b38da931 [mtv] Fixup incorrectly encoded XML documents 2013-12-10 12:45:22 +01:00
Philipp Hagemeister 7cc3570e53 Add fatal=False parameter to _download_* functions.
This allows us to simplify the calls in the youtube extractor even further.
2013-12-09 01:49:03 +01:00
Philipp Hagemeister 19e3dfc9f8 [9gag] Like/dislike count (#1895) 2013-12-05 18:29:07 +01:00
Philipp Hagemeister aaebed13a8 [smotri] Simplify 2013-12-02 17:08:17 +01:00
Philipp Hagemeister 2a275ab007 [zdf] Use _download_xml 2013-11-28 05:47:50 +01:00
Philipp Hagemeister 79d09f47c2 Merge branch 'opener-to-ydl' 2013-11-25 03:30:37 +01:00
Philipp Hagemeister c059bdd432 Remove quality_name field and improve zdf extractor 2013-11-25 03:28:55 +01:00
Philipp Hagemeister 02dbf93f0e [zdf/common] Use API in ZDF extractor.
This also comes with a lot of extra format fields
Fixes #1518
2013-11-25 03:13:22 +01:00
Philipp Hagemeister e03db0a077 Merge branch 'master' into opener-to-ydl 2013-11-24 15:18:44 +01:00
Jaime Marquínez Ferrándiz 267ed0c5d3 [collegehumor] Encode the xml before calling xml.etree.ElementTree.fromstring (fixes #1822)
Uses a new helper method in InfoExtractor: _download_xml
2013-11-24 14:59:19 +01:00
Philipp Hagemeister 7012b23c94 Match --download-archive during playlist processing (Fixes #1745) 2013-11-22 22:46:46 +01:00
Philipp Hagemeister dca0872056 Move the opener to the YoutubeDL object.
This is the first step towards being able to just import youtube_dl and start using it.
Apart from removing global state, this would fix problems like #1805.
2013-11-22 19:57:52 +01:00
Philipp Hagemeister 5904088811 Add support for tou.tv (Fixes #1792) 2013-11-20 06:13:19 +01:00
Philipp Hagemeister 91c7271aab Add automatic generation of format note based on bitrate and codecs 2013-11-16 01:08:43 +01:00
Jaime Marquínez Ferrándiz 78fb87b283 Don't accept '>' inside the content attribute in OpenGraph regexes 2013-11-15 12:54:13 +01:00
Jaime Marquínez Ferrándiz ab2d524780 Improve the OpenGraph regex
* Do not accept '>' between the property and content attributes.
* Recognize the properties if the content attribute is before the property attribute using two regexes (fixes the extraction of the description for SlideshareIE).
2013-11-15 12:24:54 +01:00
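
The two-regex approach from this commit message might look roughly like the sketch below. The function names and exact patterns are assumptions for illustration, not a copy of the project's code.

```python
import re


def og_regexes(prop):
    """Build two patterns for <meta> tags: property first, then content first."""
    # [^>] keeps every part of the match inside a single tag.
    content_re = r'content=(?:"([^>"]+?)"|\'([^>\']+?)\')'
    property_re = r'property=[\'"]og:%s[\'"]' % re.escape(prop)
    template = r'<meta[^>]+?%s[^>]+?%s'
    return [template % (property_re, content_re),
            template % (content_re, property_re)]


def og_search(prop, html):
    """Return the content of the og:<prop> meta tag, or None if absent."""
    for pattern in og_regexes(prop):
        match = re.search(pattern, html)
        if match:
            # Exactly one of the two quote-style groups matched.
            return next(g for g in match.groups() if g is not None)
    return None


print(og_search('title', '<meta property="og:title" content="Hello">'))
print(og_search('description', "<meta content='World' property='og:description'/>"))
```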
Philipp Hagemeister eb0a839866 [common] Simplify og_search_property 2013-11-12 10:36:23 +01:00
Marcin Cieślak a8eeb0597b Fix AssertionError when og property not found
On tvp.pl some webpages contain OpenGraph
metadata and some don't.

If og property is not found, _og_search_description
fails with

WARNING: unable to extract OpenGraph description; please report this issue on http://yt-dl.org/bug
Traceback (most recent call last):
  File "/usr/home/saper/bin/youtube-dl", line 18, in <module>
    youtube_dl.main()
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/__init__.py", line 766, in main
    _real_main(argv)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/__init__.py", line 719, in _real_main
    retcode = ydl.download(all_urls)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/YoutubeDL.py", line 715, in download
    videos = self.extract_info(url)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/YoutubeDL.py", line 348, in extract_info
    ie_result = ie.extract(url)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/extractor/common.py", line 125, in extract
    return self._real_extract(url)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/extractor/tvp.py", line 56, in _real_extract
    info['description'] = self._og_search_description(webpage)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/extractor/common.py", line 331, in _og_search_description
    return self._og_search_property('description', html, fatal=False, **kargs)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/extractor/common.py", line 325, in _og_search_property
    return unescapeHTML(escaped)
  File "/usr/home/saper/sw/youtube-dl/youtube_dl/utils.py", line 494, in unescapeHTML
    assert type(s) == type(u'')
AssertionError

The patch allows me to use:

  try:
    info['description'] = self._og_search_description(webpage)
    info['thumbnail'] = self._og_search_thumbnail(webpage)
  except RegexNotFoundError:
    pass
2013-11-05 23:19:29 +01:00
Jaime Marquínez Ferrándiz 9103bbc5cd Add the 'webpage_url' field to info_dict
The URL of the video page; it must allow reproducing the result.
It's automatically set by YoutubeDL if it's missing.
2013-11-03 12:11:13 +01:00
Philipp Hagemeister b5d0d817bc Remove superfluous space 2013-10-30 01:09:44 +01:00
Philipp Hagemeister ebc14f251c Merge remote-tracking branch 'origin/master' 2013-10-28 10:44:13 +01:00
Philipp Hagemeister d41e6efc85 New debug option --write-pages 2013-10-28 10:44:02 +01:00
Filippo Valsorda 8ffa13e03e [Instagram] get the non-https link, as they are serving an Akamai cert from an instagram.com domain 2013-10-28 02:34:29 -04:00
Jaime Marquínez Ferrándiz 55b3e45bba [vimeo] Fix pro videos and player.vimeo.com urls
The old process can still be used for those videos.
Added RegexNotFoundError, which is raised by _search_regex if it can't extract the info.
2013-10-23 14:38:03 +02:00
Jaime Marquínez Ferrándiz 8c51aa6506 The 'format' field now defaults to '{format_id} - {width}x{height}{format_note}'
Following the YoutubeIE format. The 'format_note' gives additional info about the format, for example '3D' or 'DASH video'.
2013-10-21 14:42:06 +02:00
Philipp Hagemeister 416a5efce7 fix typos 2013-10-18 00:49:45 +02:00
Philipp Hagemeister 8dbe9899a9 Allow users to specify an age limit (fixes #1545)
With these changes, users can now restrict what videos are downloaded by the intended audience, by specifying their age with --age-limit YEARS.
Add rudimentary support in youtube, pornotube, and youporn.
2013-10-06 06:08:56 +02:00
Philipp Hagemeister 2f5865cc6d Clarify that url and ext are optional when formats is given (#980) 2013-10-04 11:09:43 +02:00
Philipp Hagemeister deefc05b88 Document formats (for #980) 2013-10-04 10:40:42 +02:00
Jaime Marquínez Ferrándiz 0d75ae2ce3 Fix detection of the webpage charset if it's declared using ' instead of "
Like in "<meta charset='utf-8'/>"
2013-08-29 11:35:15 +02:00
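
A charset lookup that accepts either quote style, with a UTF-8 fallback when nothing is declared, could be sketched as follows. The function name, the pattern and the `default` parameter are assumptions for illustration.

```python
import re


def detect_charset(head_bytes, default='utf-8'):
    """Find the charset declared in a <meta> tag, accepting ' or " quotes."""
    # [\'"]? makes the opening quote optional and lets it be either kind.
    match = re.search(br'<meta[^>]+charset=[\'"]?([A-Za-z0-9_-]+)', head_bytes)
    if match:
        return match.group(1).decode('ascii')
    # No declaration found: fall back to the default.
    return default


print(detect_charset(b"<meta charset='utf-8'/>"))
print(detect_charset(b'<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">'))
```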
Philipp Hagemeister f143d86ad2 [sohu] Handle encoding, and fix tests 2013-08-28 14:00:05 +02:00
Philipp Hagemeister 6d69d03bac Merge remote-tracking branch 'origin/reuse_ies' 2013-08-28 13:05:21 +02:00
Philipp Hagemeister 2eabb80254 [addanime] improve 2013-08-28 04:25:38 +02:00
Jaime Marquínez Ferrándiz 9e9c164052 Merge pull request #937 from jaimeMF/subtitles_rework
Subtitles rework
2013-08-23 02:40:25 -07:00
Philipp Hagemeister 79cb25776f Cache suitable regular expressions
This speeds up TestAllURLsMatching.test_no_duplicates by about 8000% at the cost of minimal memory overhead.
2013-08-21 04:06:48 +02:00
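
The caching idea can be sketched as below: compile the URL pattern once per class and reuse it on later calls. The class names, the `_VALID_URL_RE` attribute and the example patterns are hypothetical.

```python
import re


class ExampleIE(object):
    # Hypothetical URL pattern for illustration.
    _VALID_URL = r'https?://example\.com/watch/(?P<id>\d+)'

    @classmethod
    def suitable(cls, url):
        """Return True if this class can handle the URL."""
        # Look in cls.__dict__ rather than using getattr so that a subclass
        # compiles its own pattern instead of reusing its parent's.
        if '_VALID_URL_RE' not in cls.__dict__:
            cls._VALID_URL_RE = re.compile(cls._VALID_URL)
        return cls._VALID_URL_RE.match(url) is not None


class OtherIE(ExampleIE):
    _VALID_URL = r'https?://example\.org/.+'


print(ExampleIE.suitable('http://example.com/watch/42'))
print(OtherIE.suitable('http://example.com/watch/42'))
```

The memory cost is one compiled pattern object per class, which matches the "minimal memory overhead" the commit message mentions.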
Jaime Marquínez Ferrándiz 5d51a883c2 Use a dictionary for storing the subtitles
Errors while getting the subtitles are reported as warnings; if no subtitles are found, return an empty dict.
2013-07-20 12:52:25 +02:00
Philipp Hagemeister f38de77f6e Use unescapeHTML for OpenGraph properties
These are attribute values, so we don't need the more complex and whitespace-destroying cleanHTML - we just need to unescape quotes, that's it.
2013-07-17 10:38:23 +02:00
Philipp Hagemeister b9d3e1635f Strip hash info from URL when making requests (Fixes #1038) 2013-07-13 22:52:12 +02:00
Philipp Hagemeister 3c4e6d8337 Improve OpenGraph property matching 2013-07-13 20:39:47 +02:00
Jaime Marquínez Ferrándiz 44dbe89035 Use re.DOTALL by default when searching OpenGraph properties 2013-07-13 11:29:08 +02:00
Jaime Marquínez Ferrándiz 46720279c2 InfoExtractor: add some helper methods to extract OpenGraph info 2013-07-12 22:12:04 +02:00
Philipp Hagemeister 690e872c51 Remove video_result helper method
Calling it was more complex than actually including the type in the video info
2013-07-11 12:12:30 +02:00
Jaime Marquínez Ferrándiz 56c7366547 YoutubeIE: reuse instances of InfoExtractors (closes #998)
When an IE is added to the list, it's also added to a dictionary. When an IE is requested, the dictionary is checked first, and a new instance is created only if none exists.

That way _real_initialize is only called once for each IE, saving time if it needs to login for example.
2013-07-08 15:14:27 +02:00
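
The instance reuse described in this commit can be sketched with a small registry. All names here (`ExtractorRegistry`, `DemoIE`, `init_calls`) are hypothetical and only illustrate the dictionary lookup; they are not the project's API.

```python
class DemoIE(object):
    """Stand-in extractor that counts how often it is initialized."""
    init_calls = 0

    def __init__(self):
        # Imagine an expensive step here, such as logging in.
        DemoIE.init_calls += 1


class ExtractorRegistry(object):
    def __init__(self):
        self._instances = {}  # maps class name -> shared instance

    def get(self, ie_class):
        """Return the cached instance, creating it on first request."""
        key = ie_class.__name__
        instance = self._instances.get(key)
        if instance is None:
            instance = ie_class()
            self._instances[key] = instance
        return instance


registry = ExtractorRegistry()
first = registry.get(DemoIE)
second = registry.get(DemoIE)
print(first is second, DemoIE.init_calls)
```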
Philipp Hagemeister d93e4dcbb7 Merge branch 'master' of github.com:rg3/youtube-dl 2013-07-08 01:15:19 +02:00
Philipp Hagemeister 73e79f2a1b [3sat] Add support (Fixes #1001) 2013-07-08 01:13:55 +02:00
Jaime Marquínez Ferrándiz fc79158de2 VimeoIE: authentication support (closes #885) and add a method in the base InfoExtractor to get the login info 2013-07-07 23:24:34 +02:00
Philipp Hagemeister 0f81866329 Add --list-extractor-descriptions (human-readable list of IEs) 2013-07-01 18:52:19 +02:00
Philipp Hagemeister f3d294617f Document view_count (Closes #963) 2013-06-29 16:32:28 +02:00
Filippo Valsorda 98bcd2834a improve generic and encrypted signature error messages 2013-06-25 16:47:16 +02:00
Philipp Hagemeister 3c25b9abae Remove useless headers 2013-06-23 20:35:50 +02:00
Philipp Hagemeister d6983cb460 Fix generic class move (add all files) 2013-06-23 19:57:38 +02:00