Compare commits

...

560 Commits

Author SHA1 Message Date
retouching
09eda16882
fix(dl): delete old file after repackage (#114)
* fix(dl): delete old file after repackage

* fix(dl): using original_path instead of self.path in repackage method
2024-06-03 16:57:26 +01:00
rlaphoenix
a95d32de9e chore: Add config to gitignore 2024-05-17 02:29:46 +01:00
rlaphoenix
221cd145c4 refactor(dl): Make Widevine CDM config optional
With this change you no longer have to define/configure a CDM to load. This is something that isn't necessary for a lot of services.

Note: It's also now less hand-holdy in terms of correct config formatting/values. I.e., if you define a CDM by profile for a service slightly incorrectly, say with a typo in the service or profile name, it will no longer warn you.
2024-05-17 01:52:45 +01:00
rlaphoenix
0310646cb2 fix(Subtitle): Skip merging segmented WebVTT if only 1 segment 2024-05-17 01:42:44 +01:00
rlaphoenix
3426fc145f fix(HLS): Decrypt AES-encrypted segments separately
We cannot merge all the encrypted AES-128-CBC (ClearKey) segments and then decrypt them in one go because each segment should be padded to a 16-byte boundary in CBC mode.

Since it uses PKCS#5 or #7 style padding (can't remember which), the merged file has a 15 in 16 chance of failing the boundary check. And in the 1 in 16 odds that it does pass the boundary check, it will not decrypt properly, as each segment's padding will be treated as actual data rather than padding.
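As an illustration of decrypting each segment separately, here is a minimal sketch using pycryptodome; key/IV handling is simplified and the real HLS code may derive a per-segment IV:

```python
from Crypto.Cipher import AES
from Crypto.Util.Padding import unpad

def decrypt_segments(segments: list[bytes], key: bytes, iv: bytes) -> bytes:
    # each segment carries its own PKCS#7-style padding up to a 16-byte boundary,
    # so it must be decrypted and unpadded on its own before merging
    merged = b""
    for segment in segments:
        cipher = AES.new(key, AES.MODE_CBC, iv)
        merged += unpad(cipher.decrypt(segment), AES.block_size)
    return merged
```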
2024-05-17 01:15:37 +01:00
rlaphoenix
e57d755837 fix(clearkey): Do not pad data before decryption
This is seemingly unnecessary and simply incorrect at least for two sources (VGTV and TRUTV).

Without this change it is not possible to correctly merge all segments without at least some problem in the resulting file.
2024-05-17 01:00:11 +01:00
rlaphoenix
03f3fec5cc refactor(dl): Only log errors/warnings from mkvmerge, list after message 2024-05-16 18:12:57 +01:00
rlaphoenix
2acee30e54 fix(utilities): Prevent finding the same box index over and over
Since it removed the data before the found box's index (-4), every loop would only find the same box at the same index again, but this time the box index would be 4 since all prior data was removed in the previous loop. Since the index -= 4 code only runs if the index is > 4, it never ran on the second loop, and since the data now did not include the box length, Box.parse failed with an IOError.

This corrects looping through boxes and correctly obtains and parses each box.
2024-05-15 17:54:21 +01:00
rlaphoenix
2e697d93fc fix(dl): Log output from mkvmerge on failure 2024-05-15 14:00:38 +01:00
rlaphoenix
f08402d795 refactor: Warn when falling back to requests as aria2c doesn't support Range 2024-05-11 22:59:31 +01:00
rlaphoenix
5ef95e942a fix(DASH): Use SegmentTemplate endNumber if available 2024-05-11 22:15:05 +01:00
rlaphoenix
dde55fd708 fix(DASH): Correct SegmentTemplate range stop value
Since range(start, stop) is start-inclusive but stop-exclusive, and a SegmentTemplate's DASH startNumber is typically 1 or not specified (defaulting to 1), it effectively worked by coincidence.

However, if startNumber was anything other than 1, then we would have a problem.
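For illustration (variable names are hypothetical, not the actual code), the stop value has to account for startNumber to keep the full segment count:

```python
start_number = 5    # SegmentTemplate@startNumber
segment_count = 10  # number of segments derived from the manifest

# range() is stop-exclusive, so the stop must be offset by start_number
segment_numbers = range(start_number, start_number + segment_count)
print(list(segment_numbers))  # 5 through 14 -- exactly 10 segment numbers
```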
2024-05-11 22:13:28 +01:00
rlaphoenix
345cc5aba6
Merge pull request #110 from adbbbb/master
Adding Arm64 OSX Shaka support
2024-05-11 20:13:30 +01:00
rlaphoenix
145e7a6c17 docs(contributors): Add adbbbb to Contributor list 2024-05-11 20:13:01 +01:00
Adam
5706bb1417 fix(binaries): Search for Arm64 builds of Shaka-Packager 2024-05-11 20:11:29 +01:00
rlaphoenix
85246ab419
Merge pull request #109 from pandamoon21/master
Fix uppercase letters in the fonts extension - Font attachment
2024-05-11 17:46:04 +01:00
rlaphoenix
71a3a4e2c4 docs(contributors): Add pandamoon21 to Contributor list 2024-05-11 17:45:10 +01:00
pandamoon21
06d414975c fix(Attachment): Check mime-type case-insensitively 2024-05-11 17:43:32 +01:00
rlaphoenix
f419e04fad refactor(Track): Ensure data property is a defaultdict with dict factory
This is so both internal code and service code can save data to sub-keys without the parent keys needing to exist.

A doc-string is now set on the data property, denoting some keys as reserved as well as their typing and meaning.

This also fixes a bug introduced in v3.3.3 where it would fail to download tracks without the "hls" key in the data property. This can happen when manually making Audio tracks using the HLS descriptor without putting in any of the hls data that the HLS class sets in to_tracks().
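A minimal illustration of why a defaultdict with a dict factory helps here (the sub-key name is illustrative, not one of the reserved keys):

```python
from collections import defaultdict

data = defaultdict(dict)
# the "hls" parent key does not need to exist before writing a sub-key:
data["hls"]["playlist_url"] = "https://example.com/media.m3u8"

plain = {}
# a plain dict would raise KeyError on the same access pattern:
# plain["hls"]["playlist_url"] = "..."
```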
2024-05-09 15:15:22 +01:00
rlaphoenix
50d6f3a64d docs(changelog): Add v3.3.3 Changes 2024-05-07 07:10:20 +01:00
rlaphoenix
259434b59d docs(version): Bump to v3.3.3 2024-05-07 07:10:02 +01:00
rlaphoenix
7df8be46da build(poetry): Update dependencies
We can remove explicit dependency on language-data and marisa-trie because langcodes v3.3.0 now depends on language-data 1.2.0 and language-data 1.2.0 now depends on marisa-trie 1.1.0.
2024-05-07 07:06:22 +01:00
rlaphoenix
7aa797a4cc
Merge pull request #67 from Shivelight/feature/fix-webvtt-timestamp
Correct timestamps when merging fragmented WebVTT
2024-05-07 06:54:42 +01:00
Shivelight
0ba45decc6 fix(Subtitle): Correct timestamps when merging fragmented WebVTT
This applies the X-TIMESTAMP-MAP data to timestamps as it reads through a concatenated (merged) WebVTT file to correct timestamps on segmented WebVTT streams. It then removes the X-TIMESTAMP-MAP header.

The timescale and segment duration information is saved in the Subtitle's data dictionary under the hls/dash key: timescale (dash-only) and segment_durations. Note that this information will only be available post-download.

This is done regardless of whether you are converting to another subtitle format, since the downloader automatically and forcefully concatenates the segmented subtitle data. We do not support the use of segmented Subtitles for downloading or otherwise, nor do we plan to.
2024-05-06 18:18:23 +01:00
rlaphoenix
af95ba062a refactor(env): Shorten paths on Windows with env vars 2024-04-24 05:56:05 +01:00
rlaphoenix
3bfd96d53c fix(dl): Automatically convert TTML Subs to WebVTT for MKV support 2024-04-24 05:35:24 +01:00
rlaphoenix
f23100077e refactor(dl): Improve readability of download worker errors
Now it will no longer print the full traceback for errors caused by a missing binary file. Other errors still include it and are now explicitly labelled as unexpected. CalledProcessError handling is now merged with all non-environment-related errors and explicitly mentions that a binary call failed.
2024-04-24 05:28:10 +01:00
rlaphoenix
fd64e6acf4 refactor(utilities): Remove get_binary_path, use binaries.find instead
The function now located at core/binaries should only be used by services to find a specific binary not listed in there already, or if the name of the binary needed by your service differs.
2024-04-24 05:10:34 +01:00
rlaphoenix
677fd9c56a feat(binaries): Move all binary definitions to core/binaries file
This simplifies and centralizes all definitions of where these binaries can be found into a singular reference, making it easier to modify, edit, and improve.
2024-04-24 05:07:25 +01:00
rlaphoenix
9768de8bf2 feat(env): List possible config path locations when not found 2024-04-19 19:28:15 +01:00
rlaphoenix
959b62222e fix(env): List all directories as table in info 2024-04-19 19:27:33 +01:00
rlaphoenix
c101136d55 refactor(Config): Move possible config paths out of func to constant 2024-04-19 19:23:56 +01:00
rlaphoenix
4f1dfd7dd1 refactor(curl-impersonate): Update the default browser to chrome124 2024-04-18 09:50:17 +01:00
rlaphoenix
c859465af2 refactor(curl-impersonate): Remove manual fix for curl proxy SSL
The new version of curl-cffi includes the proper fix for applying ca-bundles to proxy connections making this manual fix no longer required.
2024-04-18 09:49:35 +01:00
rlaphoenix
d1ae361afc docs(changelog): Add v3.3.2 Changes 2024-04-16 06:07:00 +01:00
rlaphoenix
a62dcff9ad docs(version): Bump to v3.3.2 2024-04-16 06:06:44 +01:00
rlaphoenix
920ce8375b build(poetry): Update dependencies 2024-04-16 06:06:05 +01:00
rlaphoenix
3abb869d80
Merge pull request #100 from retouching/patch-1
Check if width and height is digit if it's an str
2024-04-16 05:36:59 +01:00
rlaphoenix
cbcb7e31b0 docs(contributors): Add retouching to Contributor list 2024-04-16 05:35:57 +01:00
retouching
4335806ca2 fix(Video): Allow specifying width/height as str, cast to int
We simply check the type near the top of the constructor, and later in the code it casts to int and handles failures there too (e.g., if the str is not a number, it will be handled).
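A rough sketch of the idea (an illustrative helper, not the actual Video constructor code):

```python
def normalize_dimension(value: int | str) -> int:
    # accept either an int or a numeric str; int("1080") -> 1080,
    # while int("abc") raises ValueError and is handled further down the line
    if isinstance(value, str):
        value = int(value)
    return value

width = normalize_dimension("1920")
height = normalize_dimension(1080)
```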
2024-04-16 05:33:37 +01:00
rlaphoenix
a850a35f3e fix(Basic): Return None not Exception if no proxy configured 2024-04-16 05:27:17 +01:00
rlaphoenix
09e80feee5 fix(cfg): Use loaded config path instead of hardcoded default 2024-04-14 03:44:30 +01:00
rlaphoenix
f521ced3fe refactor(env): Use -- to indicate no config found/loaded 2024-04-14 03:42:41 +01:00
rlaphoenix
b4e28050ab fix(env): List used config path, otherwise the default path 2024-04-14 03:35:17 +01:00
rlaphoenix
646c35fc1b fix(Subtitle): Optionalise constructor args, add doc-string & checks
Some HLS playlists can have extremely limited information so to accommodate this we need to make the Subtitle track support having almost no information. This isn't ideal but it's really the only solution.
2024-04-14 03:26:35 +01:00
rlaphoenix
7fa0ff1fc0 refactor(Subtitle): Do not print "?"/"Unknown" values in str() 2024-04-14 03:25:22 +01:00
rlaphoenix
5c7c080a34 fix(HLS): Ensure playlist.stream_info.resolution exists before use 2024-04-14 03:15:11 +01:00
rlaphoenix
1db8944b09 fix(HLS): Ensure playlist.stream_info.codecs exists before use 2024-04-14 03:14:45 +01:00
rlaphoenix
43585a76cb fix(Audio): Optionalise constructor args, add doc-string & checks
Some HLS playlists can have extremely limited information so to accommodate this we need to make the Audio track support having almost no information. This isn't ideal but it's really the only solution.
2024-04-14 03:13:46 +01:00
rlaphoenix
8ca91efbc5 refactor(Audio): Do not print "?"/"Unknown" values in str() 2024-04-14 03:12:50 +01:00
rlaphoenix
57b042fa4b refactor(Audio): List lang after codec for consistency with other Tracks 2024-04-14 03:08:40 +01:00
rlaphoenix
642ad393b6 style: Move __...__ methods after constructors 2024-04-14 03:07:23 +01:00
rlaphoenix
23485bc820 refactor(Video): Return None if no m3u RANGE, not SDR 2024-04-14 02:40:16 +01:00
rlaphoenix
15d73be532 fix(Video): Optionalise constructor args, add doc-string & checks
Some HLS playlists can have extremely limited information so to accommodate this we need to make the Video track support having almost no information. This isn't ideal but it's really the only solution.
2024-04-14 02:36:55 +01:00
rlaphoenix
9ddd9ad474 refactor(Video): Do not print "?"/"Unknown" values in str() 2024-04-14 02:32:34 +01:00
rlaphoenix
dae83b0bd5 fix(Video): Ensure track is supported in change_color_range() 2024-04-14 02:31:31 +01:00
rlaphoenix
20da213066 docs(changelog): Add v3.3.1 Changes 2024-04-05 12:31:59 +01:00
rlaphoenix
36222972ee docs(version): Bump to v3.3.1 2024-04-05 12:30:43 +01:00
rlaphoenix
6a25b09301 build(poetry): Update dependencies 2024-04-05 12:30:09 +01:00
rlaphoenix
b7ea94de29
Merge pull request #95 from knowhere01/master
Events not working as expected
2024-04-05 12:25:33 +01:00
rlaphoenix
e92f8ed067 docs(contributors): Add knowhere01 to Contributor list 2024-04-05 12:24:25 +01:00
knowhere01
5a4c1bd6a2 fix(Events): Dereference subscription store from ephemeral store 2024-04-05 12:23:27 +01:00
rlaphoenix
994ab152a4 fix(requests): Fix multithreaded downloads
For some reason, moving the download speed calculation code from the requests() function to the download() function makes downloads actually multi-threaded instead of sequential.
2024-04-05 00:37:44 +01:00
rlaphoenix
5d1b54b8fa fix(Chapter): Cast values to int prior to formatting
If left as a float, it formats as e.g., `7.0` instead of `07`. This leads to the timestamp format being completely off.
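A quick illustration of the formatting difference:

```python
minutes, seconds = 7.0, 3.0  # values arriving as floats
print(f"{minutes:02}:{seconds:02}")            # "7.0:3.0" -- zero-padding breaks on floats
print(f"{int(minutes):02}:{int(seconds):02}")  # "07:03"
```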
2024-04-03 23:22:30 +01:00
rlaphoenix
10285c3819 feat(dl): Add *new* --workers to set download threads/workers
The previously named --workers which is now --downloads specified how many tracks to download, not how many threads/workers are used per-download.

It defaults to nothing, in which case each downloader uses its own default. All current downloaders, though, default to `min(32, (os.cpu_count() or 1) + 4)`, which is also the general default for `ThreadPoolExecutor`.

This also brings a side effect of changing DASH and HLS's forced max_workers of 16 to now a more appropriate default but more importantly actually configurable. You can set a default in your config under `dl.workers`.
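A minimal sketch of how such a default can be applied (an illustrative function, not devine's downloader code):

```python
import os
from concurrent.futures import ThreadPoolExecutor

def download_all(urls: list[str], workers: int | None = None) -> None:
    # if --workers / dl.workers is unset, fall back to the same formula
    # ThreadPoolExecutor uses by default
    max_workers = workers or min(32, (os.cpu_count() or 1) + 4)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(print, urls))  # placeholder work per URL
```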
2024-04-03 00:58:47 +01:00
rlaphoenix
0cf20f84a9 refactor(dl): Change --workers to --downloads 2024-04-02 23:34:45 +01:00
rlaphoenix
fb5580882b docs(changelog): Add v3.3.0 Changes 2024-04-02 21:59:43 +01:00
rlaphoenix
6d18402807 docs(version): Bump to v3.3.0 2024-04-02 21:53:07 +01:00
rlaphoenix
1db2230892 build(poetry): Update dependencies 2024-04-02 21:52:22 +01:00
rlaphoenix
c3d50cf12c ci(pre-commit): Update hooks 2024-04-02 21:51:31 +01:00
rlaphoenix
5a12cb33e2 refactor(Track): Move from OnXyz callables to Event observer
Fixes #85
2024-04-02 18:01:03 +01:00
rlaphoenix
226b609ff5 feat(Events): Add new global Event Observer API 2024-04-02 13:00:38 +01:00
rlaphoenix
c194bb5b3a fix(Basic): Fix variable typo regression 2024-04-02 11:06:34 +01:00
rlaphoenix
3b3345964a fix(WVD): Add exists/empty checks to WVD folder dumps 2024-04-01 18:36:51 +01:00
rlaphoenix
f99fad8e15 refactor(WVD): Separate logs in loop for visual clarity 2024-04-01 18:35:56 +01:00
rlaphoenix
f683be01d4 fix(WVD): Move log with path before Device load
This is so if the Device.load() call fails, we can know which file of the many files failed.
2024-04-01 18:35:00 +01:00
rlaphoenix
9f4c4584da fix(WVD): Move log out of loop to save performance 2024-04-01 18:34:03 +01:00
rlaphoenix
117a1188cd fix(WVD): Fix empty path to WVDs folder check
It seems a change in click made it so this is now just an empty tuple rather than a one-item tuple with Path("") resolved.
2024-04-01 18:33:02 +01:00
rlaphoenix
a053423d23 refactor(WVD): Print error if path to parse doesn't exist 2024-04-01 18:29:56 +01:00
rlaphoenix
3659c81d6a fix(WVD): Ensure WVDs dir exists before moving WVD file
Fixes #83
2024-04-01 18:28:21 +01:00
rlaphoenix
491a0b3a5a feat(Basic): Allow proxy selection by index (one-indexed) 2024-04-01 17:50:08 +01:00
rlaphoenix
b36befb296 feat(Basic): Allow single string URIs for countries 2024-04-01 17:47:51 +01:00
rlaphoenix
03b8945273 docs(config): Explain brief usage of proxy_provider data 2024-04-01 17:46:58 +01:00
rlaphoenix
6121cc0896 refactor(Basic): Improve proxy format checks 2024-04-01 17:38:04 +01:00
rlaphoenix
bd8309e1d7 fix(Basic): Make query case-insensitive
Fixes #88
2024-04-01 17:32:52 +01:00
rlaphoenix
f25d2419cf fix(curl-impersonate): Set Cert-Authority Bundle for HTTPS Proxies
For some reason curl-impersonate (curl_cffi project) does not set the certificate-authority bundle for proxies, which, to be fair, is for some reason separated into two curl options.

Making this change, as well as removing the https->http scheme enforcement on proxies, fixes HTTPS proxies on the curl-impersonate downloaders. I also simplified the separate http and https proxy definitions to the `all` definition, which was not originally supported but does seem to be supported as of v0.6.2.

I tested this on NordVPN proxies which are explicitly HTTPS-only and it does work.
2024-04-01 16:54:21 +01:00
rlaphoenix
45ccc129ce feat(dl): Try find SSAv4 fonts in System OS fonts folder
Currently only Windows is supported. Feel free to make a pull request to add Linux or macOS support.
2024-03-27 06:01:57 +00:00
rlaphoenix
eeab8a4f39 feat(dl): Automatically attach fonts used within SSAv4 subs
The fonts must be within the /devine/fonts folder. This folder location can be changed in the config. If a font is missing it will warn the user and continue.

Closes #82
2024-03-27 06:01:57 +00:00
rlaphoenix
057e4efb56 feat: Add support for MKV Attachments via Attachment class
You add these new Attachment objects to the Tracks object just like you would with Video, Audio, and Subtitle objects.
2024-03-27 06:01:56 +00:00
rlaphoenix
a51e1b4f3c docs(changelog): Add v3.2.0 Changes 2024-03-25 03:54:14 +00:00
rlaphoenix
7715a3e844 docs(version): Bump to v3.2.0 2024-03-25 03:53:42 +00:00
rlaphoenix
16faa7dadf build(poetry): Update dependencies 2024-03-25 03:52:38 +00:00
rlaphoenix
d9873dac25 fix(HLS): Delete video/audio segments after FFmpeg merge 2024-03-24 22:28:15 +00:00
rlaphoenix
774fec6d77 fix(HLS): Delete subtitle segments as they are merged 2024-03-24 22:27:32 +00:00
rlaphoenix
e7294c95d1 fix(requests): Block until connection freed if too many connections 2024-03-13 17:15:13 +00:00
rlaphoenix
36b070f729 fix(requests): Manually compute default max_workers or pool size is None 2024-03-13 17:12:06 +00:00
rlaphoenix
458ad70fae fix(Video): Delete original file after using remove_eia_cc() 2024-03-12 11:08:15 +00:00
rlaphoenix
9fce56cc66 fix(Video): Delete original file after using change_color_range() 2024-03-12 11:07:40 +00:00
rlaphoenix
1bff87bd70 fix(requests): Set HTTP pool connections/maxsize to max workers
This allows requests to open and save/cache up to *max_workers* TCP connections. In most situations it will still only save and re-use one TCP connection since it always tries to re-use the connection if one is available.

However, in situations where downloads come from more than 10 host/port combinations (the default pool connections/maxsize), this will improve download speeds.
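A minimal sketch of how this is typically set up with requests (values illustrative):

```python
import requests
from requests.adapters import HTTPAdapter

max_workers = 16  # e.g. the computed worker count

session = requests.Session()
adapter = HTTPAdapter(pool_connections=max_workers, pool_maxsize=max_workers)
session.mount("https://", adapter)
session.mount("http://", adapter)
```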
2024-03-12 01:06:42 +00:00
rlaphoenix
5376e4c042 refactor(Service): Go back to the default pool_maxsize in Session
The pool_maxsize value here isn't actually doing much. It should have also been applied to pool_connections. What we realistically needed was just pool_block to prevent opening too many connections (causing a warning). The default pool_connections=10 and pool_maxsize=10 is fine. The downloader doesn't currently use this value.
2024-03-12 00:59:30 +00:00
rlaphoenix
c77d521a42 refactor(Track): Default the track name to it's lang's script/territory
This allows you to override the whole track name instead of just prefixing before the script/territory. If you want no track name at all, you can set the track name to an empty string.

The script "Zzzz" (placeholder?) and territory "ZZ" (placeholder?) are not used. The script/territory values are only used if available and if necessary. I.e., fr-CA will use "Canada" but fr-FR will NOT use "France", it will be blank.
2024-03-10 15:19:39 +00:00
rlaphoenix
f0b589c8a5 refactor(Track): Remove TERRITORY_MAP constant, trim SAR China manually
e.g., Hong Kong SAR China, Macao SAR China
2024-03-10 15:13:01 +00:00
rlaphoenix
4f79550301 fix(Track): Fix order of operation mistake in get_track_name 2024-03-09 19:56:41 +00:00
rlaphoenix
73d9bc4f94 fix(HLS): Remove save dir even if final merge wasn't needed 2024-03-09 19:44:40 +00:00
rlaphoenix
35501bdb9c fix(DASH): Fix merge regression from recent commit
An else tree was used in 4d6c72ba30 when it shouldn't have been.

Fixes #81
2024-03-09 17:52:50 +00:00
rlaphoenix
1d5d4fd347 fix(dl): Use click.command() instead of click.group() 2024-03-09 01:40:21 +00:00
rlaphoenix
4d6c72ba30 fix(DASH/HLS): Don't merge folders, skip final merge if only 1 segment 2024-03-09 01:37:55 +00:00
rlaphoenix
77e663ebee feat(search): New Search command, Service method, SearchResult Class 2024-03-08 21:32:55 +00:00
rlaphoenix
10a01b0b47 fix(Track): Compute Track ID from the this variable, not self 2024-03-08 19:22:33 +00:00
rlaphoenix
4c395edc53 fix(dl): Add single mux job if there's no video tracks
Fixes regression from v3.1.0 with --audio-only, --subs-only and --chapters-only.
2024-03-08 19:06:21 +00:00
rlaphoenix
eeccdc37cf fix(MultipleChoice): Simplify super() call and value types
It was using the wrong instance, leaving the convert() method to seemingly default to str() for the returned chosen value types (or something, I don't really see why this works).
2024-03-08 17:09:20 +00:00
rlaphoenix
423ff289db feat(Track): Allow Track to choose downloader to use
The downloader property must be a Callable of the same signature as the aria2c, curl_impersonate, and requests downloader functions. You can pass it these functions by importing, or a custom function of a matching signature.

Note: It will still override the chosen downloader and use a fallback one in the case where the aria2c downloader is chosen but the download requires the HTTP Range header.

Closes #70
2024-03-08 16:48:44 +00:00
rlaphoenix
ba801739fe fix(aria2c): Support aria2(c) 1.37.0 by handling upstream regression
From aria2c's changelog (2007-09-02):

```
Now *.aria2 contorol file is first saved to *.aria2__temp and if it is successful, then renamed to *.aria2.
This prevents *.aria2 file from being truncated or corrupted when file system becomes out of space.
```

It seems something went wrong in 1.37.0, resulting in these files sometimes not being renamed back to `.aria2` and then being left there for good. The fix for devine would be to simply detect `.aria2__temp` files and delete them once all segments finish downloading. My only worry here is the root cause for why it has failed to rename. Did the download actually complete without error? According to aria2c's RPC, no errors occurred. There's no way to add support for Aria2(c) 1.37.0 without this sort of change, as the files do seem to download correctly regardless of the control file not being renamed; it is then simply deleted.

Fixes #71
2024-03-08 16:15:50 +00:00
rlaphoenix
79506dda75 chore(HLS): Remove commented-out code from get_supported_key()
This is code I forgot to remove while testing the HLS rework which released in v3.0.0.
2024-03-08 15:48:39 +00:00
rlaphoenix
ccac55897c refactor(ClearKey): Only use User-Agent if none set in from_m3u_key 2024-03-08 15:45:52 +00:00
rlaphoenix
e0aa0e37d3 feat(ClearKey): Pass session not proxy str in from_m3u_key method
This reduces the number of connections being made by quite a bit for playlists that constantly change keys, or have new key data for every single segment (e.g., Pluto sometimes).

It also allows you to pass headers and cookies, while still also being able to supply a proxy.
2024-03-08 15:44:41 +00:00
rlaphoenix
c974a41b6d fix(dl): Include chapters when muxing
This is a regression from the newer mux-job code that was brought in alongside the multiple `-r/--range` mux jobs feature in v3.1.0.

Fixes #79
2024-03-08 15:30:36 +00:00
rlaphoenix
2bbe033efb fix(Tracks): Improve constructor typing, add Chapter(s) to typing 2024-03-08 15:20:40 +00:00
rlaphoenix
5950a4d4fa docs(changelog): Add v3.1.0 Changes 2024-03-05 17:11:47 +00:00
rlaphoenix
8d44920120 docs(version): Bump to v3.1.0 2024-03-05 17:09:34 +00:00
rlaphoenix
f8871c1ef0 docs(changelog): Add git-cliff configuration
Conventional Commit scopes don't seem entirely compatible with Keep a Changelog's sections/headers, so I have abandoned the Keep a Changelog sections/headers for custom ones that more accurately represent the commit's scope.
2024-03-05 17:08:26 +00:00
rlaphoenix
f7f974529b build: Explicitly use marisa-trie==1.1.0 for Python 3.12 wheels
The current version of langcodes (v3.3.0) is quite old and doesn't have explicit support for Python 3.11+ yet. It does work on Python 3.12, but one of its dependencies, marisa-trie==0.7.8, does not have wheels for Python 3.12.

By explicitly using the pre-release version of one of langcodes' dependencies, language-data, which is what depends on marisa-trie, we can upgrade to marisa-trie==1.1.0 which does have a wheel for Python 3.12.
2024-03-05 16:31:25 +00:00
rlaphoenix
0201c41feb feat(dl): Support multiple -r/--range and mux ranges separately
Multiple -r/--range values can be used with multiple -q/--quality values.

Closes #63
2024-03-04 13:11:43 +00:00
rlaphoenix
6e8efc3f63 fix(HLS): Use filtered out segment key info
Also simplifies calculation of wanted segment range when decrypting. Instead of storing the starting segment index number with the encryption_data variable, we just grab the first segment that isn't already merged.

Fixes #77
2024-03-04 12:51:00 +00:00
rlaphoenix
499fc67ea0 feat(cli): Implement MultipleChoice click param based on Choice param
This can be used in place of click.Choice() when you want to choose multiple values. Values must be separated by the `,` character. This does mean the `,` character cannot appear in the choice values themselves.
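A minimal sketch of the idea (not the exact implementation; the -r/--range option is just an example):

```python
import click

class MultipleChoice(click.Choice):
    """A Choice that accepts several values separated by ','."""

    def convert(self, value, param, ctx):
        chosen = []
        for part in str(value).split(","):
            # each part is validated against the allowed choices by click.Choice itself
            chosen.append(super().convert(part.strip(), param, ctx))
        return chosen

@click.command()
@click.option("-r", "--range", "range_", type=MultipleChoice(["SDR", "HDR10", "DV"]), default="SDR")
def cli(range_: list[str]) -> None:
    click.echo(range_)
```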
2024-03-04 11:06:56 +00:00
rlaphoenix
b7b88f66ce feat(dl): Change --vcodec default to None, use any codec 2024-03-04 10:41:07 +00:00
rlaphoenix
1adc551926 refactor(dl): Remove unused get_profiles() method 2024-03-04 09:31:15 +00:00
rlaphoenix
77976c7e74 feat(Subtitle): Convert from fTTML->TTML & fVTT->WebVTT post-download 2024-03-02 15:37:12 +00:00
rlaphoenix
cae47017dc refactor: Move dl command's download_track() to Track.download() 2024-03-02 15:08:22 +00:00
rlaphoenix
f510095bcf feat(dl): Skip video lang filter if --v-lang unused & only 1 video lang
This hopefully improves user experience for anyone using Devine mainly for content outside the English language. For example, if you do -l it and only English video tracks are available, then there's really no need to filter by language and fail.

However, it still attempts filtering if you explicitly used --v-lang. If the user expected all episodes to be French by using `--v-lang fr`, and the service had one random episode in English, then the user would very likely want to be informed to verify and decide how they want to deal with it if it really was English.
2024-03-02 12:54:17 +00:00
rlaphoenix
a7c2210f0b fix(version): The __version__ variable forgot to be updated 2024-03-02 06:10:01 +00:00
rlaphoenix
76dc54fc13 fix(dl): Have --sub-format default to None to keep original sub format 2024-03-01 05:18:46 +00:00
rlaphoenix
c516f54a07 refactor(DASH): Change how Video FPS is gotten to remove FutureWarning log 2024-03-01 05:15:47 +00:00
rlaphoenix
289808b80c refactor(DASH): Move data values from track url to track data property 2024-03-01 05:08:59 +00:00
rlaphoenix
90c544966a refactor(Track): Rename extra to data, enforce type as dict
Setting data as a dictionary allows more places of code (including DASH, HLS, Services, etc.) to get/set what they want by key instead of by index (list/tuple). Tuples or lists were typically used in services because DASH and HLS stored needed data as a tuple, and services did not want to interrupt or remove that data, even though doing so would have been fine.
2024-03-01 04:29:45 +00:00
rlaphoenix
a6a5699577 refactor(Track): Move delete and move methods near start of Class 2024-03-01 04:15:46 +00:00
rlaphoenix
866de402fb refactor(Track): Return new path on move(), raise exceptions on errors 2024-03-01 04:14:44 +00:00
rlaphoenix
3ceabd0c74 feat(Track): Add a name property to use for the Track Name 2024-03-01 04:11:53 +00:00
rlaphoenix
2a6fb96c3d fix(Track): Don't use fallback values "Zzzz"/"ZZ" for track name 2024-03-01 04:11:53 +00:00
rlaphoenix
c14b37a696 fix(Track): Don't modify lang when getting name 2024-03-01 04:11:53 +00:00
rlaphoenix
5b7c72d270 refactor(Track): Move the path class instance variable with the rest 2024-03-01 04:11:52 +00:00
rlaphoenix
3358c4d203 refactor(Track): Remove unnecessary bool casting 2024-03-01 04:11:52 +00:00
rlaphoenix
6e9f977642 docs(Track): Remove unnecessary comments 2024-03-01 04:11:52 +00:00
rlaphoenix
bd90bd6dca feat(Track): Make ID optional, Automatically compute one if not provided 2024-03-01 04:11:52 +00:00
rlaphoenix
fa9db335d6 refactor(Track): Rename Descriptor's M3U & MPD to HLS & DASH 2024-03-01 04:11:52 +00:00
rlaphoenix
ec5bd39c1b refactor(Track): Remove unused DRM enum 2024-03-01 04:11:52 +00:00
rlaphoenix
ba693e214b refactor(Track): Remove swap() method and it's uses
Re-using the same track path and file name for a different output file is not ideal, as the file's contents are different and the target file name specifies what processing was done on it, which is useful during debugging when browsing the temp directory.
2024-03-01 03:04:07 +00:00
rlaphoenix
470e051100 refactor(Track): Add type checks, improve typing 2024-03-01 02:43:43 +00:00
rlaphoenix
944cfb0273 ci(pre-commit): Add a conventional-commit hook 2024-03-01 02:17:41 +00:00
rlaphoenix
27b3693cc1 docs(changelog): Add v3.0.0 changes 2024-03-01 00:03:09 +00:00
rlaphoenix
9aeab18dc3 Bump to v3.0.0 2024-03-01 00:01:33 +00:00
rlaphoenix
a5fb5d33f1 Update default curl-impersonate browser to chrome120 2024-02-29 23:58:14 +00:00
rlaphoenix
a55f4f6ac7 Update dependencies 2024-02-29 23:57:57 +00:00
rlaphoenix
1039de021b Update the copyright year and project description 2024-02-29 23:25:23 +00:00
rlaphoenix
be0ed0b0fb Simplify Tracks.__add__ method, support Chapter(s) & Track objects 2024-02-29 23:19:05 +00:00
rlaphoenix
97efb59e5f Only decode text direction entities in Sub files (cont.)
Already did this for HLS, but somehow forgot to for DASH and direct URLs.
2024-02-29 22:06:57 +00:00
rlaphoenix
4073cefc74 Remove Subtitle.remove_multi_lang_srt_header()
The root cause of the error which required calling this function was identified and fixed in this release.
2024-02-29 22:06:02 +00:00
Arias800
75641bc8ee
Add default shaka-packager build name (#74)
If the user builds Shaka-Packager manually, the default name will be "packager".
Adding it to the list will ensure that Devine detects the app in this situation.
2024-02-27 22:48:54 +00:00
rlaphoenix
0c20160ddc Implement __add__ to Tracks class 2024-02-20 22:06:39 +00:00
rlaphoenix
eef397f2e8 HLS: Don't include map data if discontinuity/end of playlist was decrypted
The decrypt() call just before it would have included the map data for us, as it was needed to decrypt. Therefore, it would not need to be added again when merge_discontinuity() is called. In some cases re-adding the map data can cause playback or final merge failure.
2024-02-20 20:12:09 +00:00
rlaphoenix
b829ea5c5e DASH: Detect SDH subtitles via AudioPurposeCS:2007=2 2024-02-20 19:29:21 +00:00
rlaphoenix
7f898cf2df HLS: Fix map data exists check when merging segments
`map_data` may evaluate as truthy while `map_data[1]` itself could be None, resulting in `None` being written to the stream.
2024-02-20 02:14:58 +00:00
rlaphoenix
2635d06d58 Set stop event & mark track failed if new HLS DRM fails to license 2024-02-20 01:46:47 +00:00
rlaphoenix
8de3a95c6b Flush file buffers when merging DASH or HLS segments 2024-02-20 01:35:58 +00:00
rlaphoenix
1259a26b14 Create and use new utility to get file extension from URLs/Paths
Fixes #73
2024-02-19 18:14:50 +00:00
rlaphoenix
c826a702ab DASH: Fix URL concatenation in some edge cases
In some of the urljoin()'s it would end with `/None`, e.g., `http://.../some_base_value/None`, when it should just join with the base value only.
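An illustration of the failure mode and the guard (hypothetical names):

```python
from urllib.parse import urljoin

base_url = "http://cdn.example.com/some_base_value/"
media = None  # e.g. an optional attribute the manifest did not provide

bad = urljoin(base_url, str(media))             # ".../some_base_value/None"
good = urljoin(base_url, media) if media else base_url
```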
2024-02-19 17:45:40 +00:00
rlaphoenix
1b76e8ee28 Aria2c: Fix shutdown condition edge condition when URLs > 1000
`stopped_downloads` is capped to just 1000 objects even though I asked for 999999 downloads, so if aria2c is downloading more than 1000 URLs the count of stopped downloads will never match the count of download URLs and never stop.
2024-02-17 23:33:52 +00:00
rlaphoenix
d65d29efa3 Remove unnecessary LANGUAGE_MUX_MAP
This language tag/code mapping table is no longer needed as of MKVToolNix v67, which has been the minimum supported version for some time now already.
2024-02-17 23:19:07 +00:00
rlaphoenix
81dca063fa Consolidate typing of Requests/MozillaCookieJar typing to CookieJar 2024-02-16 21:02:06 +00:00
rlaphoenix
9e0515609f HLS: Ignore possible folders when doing naive final merge 2024-02-16 18:41:05 +00:00
rlaphoenix
323577a5fd HLS: Update first segment of EXT-X-KEY state data on discontinuity 2024-02-16 18:21:21 +00:00
rlaphoenix
e26e55caf3 HLS: Don't reset EXT-X-KEY state data on discontinuity 2024-02-16 16:50:12 +00:00
rlaphoenix
506ba0f615 HLS: Only merge relevant segments on discontinuity 2024-02-16 16:49:42 +00:00
rlaphoenix
2388c85894 HLS: Ensure all segments to decrypt in range exist 2024-02-16 16:49:13 +00:00
rlaphoenix
7587243aa2 HLS: Don't decrypt on key change if there were no prior segments 2024-02-16 16:48:38 +00:00
rlaphoenix
6a37fe9d1b HLS: Don't merge on discontinuity, if it's the first segment
How the m3u8 parser handles/groups #EXT-X tags into segment objects means the #EXT-X-DISCONTINUITY (`discontinuity` property) is tied to whatever segment is below its line. Therefore, there's never a scenario where we need to merge+decrypt at the first ever segment of the for loop, as there are no segments before it.

This can happen from just slightly off-spec playlists (can't blame it) but also from the OnSegmentFilter filtering out all segments before the first EXT-X-DISCONTINUITY. Common to happen when filtering out bumpers/intros.
2024-02-16 00:15:36 +00:00
rlaphoenix
eac5ed5b61 Aria2c: Fix completed progress information
For some reason aria2c has like 700 internal "download" structs per actual URL it was downloading, probably something to do with multiple connections/split, don't know don't care, as this way works just fine.
2024-02-15 23:54:10 +00:00
rlaphoenix
a8a89aab9c Aria2c: Fallback to an empty list if stopped_downloads is None
This was fine during most testing in the `for` loop below it, but there's also a len() a bit below that.
2024-02-15 23:45:44 +00:00
rlaphoenix
837015b4ea HLS: Fix incorrect last segment i when decrypting first segment 2024-02-15 23:44:00 +00:00
rlaphoenix
1f11ed258b DASH: Update progress bar when merging segments 2024-02-15 20:06:42 +00:00
rlaphoenix
4e12b867f1 Aria2c: Improve download progress and error handling 2024-02-15 19:19:37 +00:00
rlaphoenix
e8b07bf03a DASH: Don't set Range Header if no bytes range value
This caused an HTTP 501 Not Implemented on some CDNs.
2024-02-15 19:10:52 +00:00
rlaphoenix
630a9906ce Rework the Aria2c Downloader
- Downloads are now multithreaded directly in the downloader.
- Now reuses connections instead of having to close and reopen connections for every single download.
- Progress updates are now yielded back to the caller instead of drilling down a progress callable.
- Instead of parsing download progress information in a very hacky way from the stdout stream, use aria2's RPC interface.
- Added a new utility get_free_port which is needed to choose aria2's RPC port as I do not want to use the default port in case the user is already using this port for another tool or reason. Also, to try to mitigate port-scanning attacks that target aria2 RPC ports (a sketch of such a utility follows after this list).
- The config entry `aria2c.max_concurrent_downloads` is now actually used by aria2c when downloading.
- The `--max-concurrent-downloads` option and config value now defaults to `min(32,(cpu_count+4))` (usually around 16 for above average systems) instead of 5.
- Automated pproxy proxy rerouter is made via subprocess instead of trying to re-do what the pproxy entry point does for us, less code, less trouble, and was ultimately easier to implement.
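A minimal sketch of how a get_free_port utility is commonly implemented (an assumption, not necessarily devine's exact code):

```python
import socket

def get_free_port() -> int:
    # ask the OS for an ephemeral port by binding to port 0, then release it
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]
```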
2024-02-15 17:26:39 +00:00
rlaphoenix
2b7fc929f6 Rework the HLS downloader, add support for new downloaders
- It now downloads all segment files multi-threaded first before any decryption or merging operations (excluding init data, which will be downloaded in sequence/order after all the segments are downloaded)
- Once all segments are downloaded it then starts to go through and do any merging/decryption/init data stuff/e.t.c afterwards.
- Segments are no longer decrypted one by one. If segments use the same EXT-X-KEY data, then they will be merged together and then decrypted. This should see a noticeable speed increase for Widevine DRM.
2024-02-15 17:26:39 +00:00
rlaphoenix
e5a330df7e Add support for the new Downloaders to direct URLs 2024-02-15 17:26:39 +00:00
rlaphoenix
a1ed083b74 Add support for the new Downloaders to DASH 2024-02-15 17:26:39 +00:00
rlaphoenix
0e96d18af6 Rework the Requests and Curl-Impersonate Downloaders
- Downloads are now multithreaded directly in the downloader.
- Requests and Curl-Impersonate use one singular Session for all downloads, keeping connections alive and cached so it doesn't have to close and reopen connections for every single download.
- Progress updates are now yielded back to the caller instead of drilling down a progress callable.
2024-02-15 17:26:39 +00:00
rlaphoenix
709901176e Use CRC32 instead of MD5 for Track IDs in DASH/HLS 2024-02-15 10:56:51 +00:00
rlaphoenix
bd185126b6 HLS: Skip merging continuity if all segments were skipped
If all segments of a continuity are skipped, i.e. by OnSegmentFilter, then this code fails as the folder wouldn't exist.
2024-02-13 17:03:42 +00:00
rlaphoenix
cd194e3192 Add new Track Event, OnSegmentDownloaded
Like OnDownloaded but called every time a DASH or HLS segment is downloaded. The path to the downloaded segment file is passed to the callable.
2024-02-10 18:10:09 +00:00
rlaphoenix
87779f4e7d Move Track OnDownloaded event before decryption 2024-02-10 18:05:35 +00:00
rlaphoenix
a98d1d98ac Add a new Subtitle Track Event, OnConverted
This runs after a Subtitle has been converted to another format, and only if it was converted. It is passed the new subtitle format codec value.
2024-02-10 18:05:35 +00:00
rlaphoenix
c18fe5706b Pass DRM and Segment objects to Track OnDecrypted event 2024-02-10 17:48:26 +00:00
rlaphoenix
439e376b38 No longer pass the track through track events
If you are setting a callable onto a track event, then you have access to the track variable, so just include/use that in your lambda/callable.
2024-02-10 17:47:12 +00:00
rlaphoenix
7be24a130d Give some documentation on Track events 2024-02-10 17:19:48 +00:00
rlaphoenix
8bf6e4d87e Improve typing of Chapters constructor 2024-02-10 12:47:14 +00:00
rlaphoenix
92e00ed667 Fix OGM Chapter Regex patterns in Chapters class 2024-02-10 12:42:17 +00:00
rlaphoenix
66edf577f9 Allow Chapter Timestamp to be float, fix typing 2024-02-10 12:35:02 +00:00
rlaphoenix
a544b1e867 Merge HLS segments first by discontinuity then via FFmpeg
HLS playlists where each segment is in an mp4 container seem to corrupt when the EXT-X-MAP is changed out, unless you first merge segments by discontinuity and then merge the merges via FFmpeg (which demuxes all the merged segment continuities and then concatenates them together, probably giving it new init data too).
2024-02-09 08:33:17 +00:00
rlaphoenix
167b45475e Only decode text direction entities in Sub files
Previously, all entities were decoded in Subtitle files because of a problem with SubtitleEdit and its /ReverseRtlStartEnd option not being entity-aware.

It actually ends up reversing the `;` of `&rlm;`, instead of the actual value of `&rlm;`. Therefore, I decoded all entities before SubtitleEdit could have processed the Subtitle, but this has caused problems with more advanced formats like TTML and WebVTT as `&lt;` would decode to `<` causing syntax errors, among other problematic characters.

According to the TTML and WebVTT specs, HTML entity encoding is allowed, and that makes sense or you wouldn't be able to use `<` etc. Any failure of players to show the decoded character would be a player problem and out of scope for Devine.
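A rough sketch of decoding only the text-direction entities (illustrative, not the exact devine code):

```python
# leave syntax-relevant entities such as &lt; and &amp; untouched
TEXT_DIRECTION_ENTITIES = {
    "&lrm;": "\u200e",  # LEFT-TO-RIGHT MARK
    "&rlm;": "\u200f",  # RIGHT-TO-LEFT MARK
}

def decode_direction_entities(text: str) -> str:
    for entity, char in TEXT_DIRECTION_ENTITIES.items():
        text = text.replace(entity, char)
    return text
```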
2024-02-05 12:37:21 +00:00
rlaphoenix
568cb616df Use /ConvertColorsToDialog when converting subs to SRT format
This is because SubtitleEdit keeps color-related information when converting to SRT from WebVTT, TTML, and such formats. Why? Not 100% sure. Maybe some players support colors, but generally if you are using SubRip, it's because you either only want basic text subs, or your player doesn't support these "fancy" ooh-la-la colors.

This is a better solution than just stripping out the information. As the option name suggests, it isn't just removing the color information but rather using it to detect different speakers, then appropriately "dialogify" the captions when needed, i.e., start each speaker's sentence with `- ` and separate them with a new line.

The dash-style dialog formatting is quite vital to know if a caption is all spoken by one speaker versus multiple. Not particularly necessary for non-SDH captioning, but would be wanted for SDH subtitles.
2024-02-05 12:10:33 +00:00
rlaphoenix
3b62b50e25 Add support for SegmentBase and BaseURL-only DASH Manifests 2024-02-05 10:22:40 +00:00
rlaphoenix
c06ea4cea8 Rework Chapter System, add Chapters class
Overall this commit is to just make working with Chapters a lot less manual and convoluted. The current system has you specify information that can easily be automated, like Chapter order and numbers, which is one of the main changes in this commit.

Note: This is a Breaking change and requires updates to your Service code. The `get_chapters()` method must be updated. For more information see the updated doc-string for `Service.get_chapters()`.

- Added new Chapters class which automatically sorts Chapters by timestamp.
- Chapter class has been significantly reworked to be much more generic. Most operations have been moved to the new Chapters class.
- Chapter objects can no longer specify a Chapter number. The number is now set automatically based on its sorted order in the Chapters object.
- Chapter objects can now provide a timestamp in more formats. Timestamps are now verified more efficiently.
- A Chapter object's ID is now a CRC32 hash of its timestamp and name instead of just basically its number.
- The Chapters object also has an ID, which is a CRC32 hash of all of the Chapter IDs it holds. This ID can be used for things like temp paths.
- `Service.get_chapters()` must now return a Chapters object. The Chapters object may be empty, and must hold only Chapter objects (see the sketch after this list).
- Using `Chapter {N}` or `Act {N}` Chapters and so on is no longer permitted. You should instead leave the name blank if there's no descriptive name to use for it.
- If you or a user wants `Chapter {N}` names, then they can use the config option `chapter_fallback_name` set to `"Chapter {i:02}"`. See the config documentation for more info.
- Do not add a `00:00:00.000` Chapter, at all. This is automatically added for you if there's at least 1 Chapter with a timestamp after `00:00:00.000`.
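A minimal sketch of a new-style get_chapters() based on the points above (the import path and constructor arguments are assumptions, not verified API):

```python
from devine.core.tracks import Chapter, Chapters  # assumed import path

class MyService:  # hypothetical service
    def get_chapters(self) -> Chapters:
        return Chapters([
            Chapter(timestamp="00:04:12.500", name="Opening"),
            Chapter(timestamp=1030.25),  # float seconds; name intentionally left blank
        ])
```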
2024-02-05 01:42:43 +00:00
rlaphoenix
2affb62ad0 Fix SegmentList source/media join with Base URL in DASH download_track() 2024-02-03 05:26:52 +00:00
rlaphoenix
30abe26321 Improve caching of keys to vaults log 2024-01-29 17:02:30 +00:00
rlaphoenix
3dbe0caa52 Fix Cookie update at the end of dl command 2024-01-29 16:28:40 +00:00
rlaphoenix
837061cf91 Rework Profile/Authentication System
- Removed `devine auth` command and sub-commands due to lack of support, risk of data, and general quirks of it.
- Removed `profiles` config data; you must now specify which profile you wish to use each time with -p/--profile. If you use a specific profile a lot more than others, you should make it the default. See below.
- Added a `default` key to each service mapping in `credentials` that will be used if -p/--profile is not specified.
- Each service mapping in `credentials` is no longer forced to use profiles. You can now simply specify `Service: username:password` if you only use one credential.
- Auth-less Services now simply have to specify no credential and have no cookie file.
- There is no longer an error for not having a cookie and/or credential for the chosen profile, as a profile no longer has to be chosen.
- Cookies are now checked for in 3 different locations in the following order:
1. `/Cookies/{Service Name}.txt`
2. `/Cookies/{Service Name}/{profile}.txt`
3. `/Cookies/{Service Name}/default.txt`
This means you now have more options on organization and layout of Cookie files, similarly to the new Credentials config.
Note: `/Cookies/{Service Name}/.txt` also works as an alternative to `default.txt`. The benefit of this is `.txt` will always be at the top of your folder.
2024-01-29 06:34:22 +00:00
rlaphoenix
1c6e91b6f9 Rename --group to --tag 2024-01-29 03:54:17 +00:00
rlaphoenix
e9dc53735c Fix BaseURLs starting with ../ in DASH download_track() 2024-01-29 03:26:15 +00:00
rlaphoenix
e967c7c8d1 Add custom RESTful Vault API Interface 2024-01-24 20:09:59 +00:00
rlaphoenix
c08c45fc16 Prioritize loading configs next to devine over other locations 2024-01-24 18:44:01 +00:00
rlaphoenix
3b788c221a Look for a config file in 2 more locations
This is to aid using Devine in a portable folder by trying to load configs next to Devine's code.
2024-01-24 18:41:24 +00:00
rlaphoenix
21687e6649 No longer create an empty config in the user configs folder 2024-01-24 18:39:36 +00:00
rlaphoenix
de7122a179 Add basic control file to Requests and Curl-Impersonate downloaders 2024-01-23 10:06:42 +00:00
rlaphoenix
c53330046c Improve Dependencies list in README 2024-01-23 09:57:04 +00:00
rlaphoenix
6450d4d447 Change default downloader from aria2c to requests
This is to reduce the amount of required dependencies by not strictly requiring aria2c out of the box. You can always change the downloader back to aria2c in the config.
2024-01-23 09:56:25 +00:00
rlaphoenix
5e858e1259 Delete file on failure in Requests and Curl-Impersonate downloaders 2024-01-23 09:46:24 +00:00
rlaphoenix
ba93c78b99 Add missing while loop to Curl-Impersonate downloader 2024-01-23 09:45:31 +00:00
rlaphoenix
172ab64017 Add missing while loop to Requests downloader 2024-01-21 18:47:19 +00:00
rlaphoenix
2056e056a4 Unescape HTML Entities in Subtitles after Downloading
This fixes some Subtitles having e.g., `&amp;` instead of just `&`, but especially for special entities like `&rlm;` which enables Right-to-Left mode on Hebrew and Arabic Subtitles.
2024-01-18 16:25:39 +00:00
rlaphoenix
26d067915f Fix output directory and filename for single-URL aria2c downloads 2024-01-17 04:49:37 +00:00
rlaphoenix
746c55d188 Fix progress total on single-URL requests downloads
Previously, it would show the download as fully complete after the first 1024-byte chunk was downloaded, as the Progress Bar total value was set to the number of URLs. This is because it assumed there would be multiple URLs to download at once, and would advance the progress bar each time one of the downloads completed instead.

This changes it so that if there's only one URL to download, then it calculates the total number of chunks to download, which corrects the progress bar advances.
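An illustrative sketch of deriving the progress-bar total for a single-URL download (not the exact downloader code):

```python
import math
import requests

CHUNK_SIZE = 1024  # matches the chunk size mentioned above

def chunk_total(url: str) -> int | None:
    with requests.get(url, stream=True) as res:
        size = int(res.headers.get("Content-Length", 0))
        # one progress advance per chunk; None if the size is unknown
        return math.ceil(size / CHUNK_SIZE) if size else None
```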
2024-01-14 01:24:51 +00:00
rlaphoenix
0493d28914 Manually specify the output format with Shaka-Packager
It normally auto-detects the format from the file extension. The supported formats are "MP4" and "WEBM". The input files to shaka-packager are currently always ".mp4", so this isn't particularly an issue.

However, I want to add this just as a precaution in case it isn't. This isn't an issue if the input file is another format, like WEBM, as this only controls the output format (the format devine wants), not the input format.
2024-01-12 01:17:18 +00:00
rlaphoenix
0116c278af Absorb original file and path in Decrypt, Repack, & Range Operations
To possibly support download resuming in the future, the file names for the decrypt, repack, and change range functions were simplified and once output has finished it then deletes the original input file and re-uses the original input file path.

The file names were changed to just append `_repack`, `_decrypted`, `_full_range` etc. to the filename rather than using a duplex extension (`.repack.mp4`, `.decrypted.mp4`, `.range0.mp4`).

This is all so that code to check if the file was already downloaded can be simpler. Instead of having to check if 4x different possible file names for a completed download existed, it checks one.
2024-01-12 01:11:47 +00:00
rlaphoenix
ee56bc87c2 Use new Subtitle.convert() in dl command for --sub-format 2024-01-12 00:51:06 +00:00
rlaphoenix
e76bc7201d Add convert() method to Subtitle class 2024-01-12 00:50:27 +00:00
rlaphoenix
f4d8bc8dd0 Add support for parsing SubRip (SRT) in Subtitle.parse() 2024-01-12 00:37:22 +00:00
rlaphoenix
14ebe4ee1b Ensure input is UTF-8 when parsing TTML and WebVTT Subtitles
This fixes some conversion errors when working with non-Latin languages like Russian (Cyrillic) and Arabic.
2024-01-12 00:36:43 +00:00
rlaphoenix
96f1cbb260 Remove empty caption lists post-parsing in Subtitle.parse()
This issue is common with Now TV, where it for some reason parses into "two" languages, "en" and "eng". This results in one empty caption list and one non-empty caption list. The empty caption list tends to be first.

This issue causes a multitude of snowballing problems further down the codebase; for example, when converting to SRT it results in a "MULTI-LANGUAGE SRT" header, which most programs (like mkvmerge) do not recognize, causing a mux failure.
2024-01-12 00:30:52 +00:00
rlaphoenix
9683c34337 Improve readability of Subtitle.parse() method 2024-01-12 00:27:19 +00:00
rlaphoenix
c6c2e9ca51 Add Curl-Impersonate Downloader via curl_cffi project
The browser to imitate can be set in the config:

For example,
```yaml
curl_impersonate:
    browser: chrome110
```

It will default to using chrome110 if no value is set in the config.

A list of available browsers can be found here: https://github.com/yifeikong/curl_cffi#sessions
2024-01-11 22:29:49 +00:00
rlaphoenix
a9de9748ec Remove saldl from downloaders config docs 2024-01-09 22:35:45 +00:00
rlaphoenix
e8e3d4a90f Remove 5-attempt loop from DASH and HLS Downloads
These are unnecessary now as all downloaders have retry functionality built-in.
2024-01-09 13:00:39 +00:00
rlaphoenix
cc4900a2ed Remove uses of the downloader's silent arg in DASH and HLS
This was originally done to suppress *all* aria2c logs except on the last attempt, so that if all attempts failed, aria2c would be allowed to log the error.

However, that's bad practice as aria2c may produce errors or warnings on say the 3rd attempt, and the 3rd attempt may have otherwise succeeded, with warnings or errors. It also generally shouldn't be necessary.
2024-01-09 12:54:27 +00:00
rlaphoenix
009a880371 Silence at the log_buffer not the stdout in aria2c
This is so we can still obtain progress data while calling aria2c silently
2024-01-09 12:52:14 +00:00
rlaphoenix
9f04676b5c Get Cookie Header for each URL in aria2c 2024-01-09 12:41:15 +00:00
rlaphoenix
552a0f13a5 Add retry attempts to Requests downloader 2024-01-09 12:09:21 +00:00
rlaphoenix
fa3cee11b7 Move Download Cancel/Skip Events to constants 2024-01-09 11:55:05 +00:00
rlaphoenix
ce457df151 Change wording from Download Stopped to Download Cancelled 2024-01-09 11:38:58 +00:00
rlaphoenix
d566aa2547 Show Licensing and Licensed Messages via Rich 2024-01-09 11:34:14 +00:00
rlaphoenix
09edb696ba Change to safer default values for -j, -x, and -s in aria2c
The original values would cause blocks by some Services. Therefore, it is better to default to safer values. The new values match the defaults used by aria2c as listed in their docs.
2024-01-09 10:22:28 +00:00
rlaphoenix
a7bbac7bcc Get -j, -x, and -s from aria2c config, default to 16 2024-01-09 10:18:52 +00:00
rlaphoenix
dbfefc1d97 Pretty up and improve readability of aria2c arguments 2024-01-09 10:05:03 +00:00
rlaphoenix
316f8f0530 Set Referer & User-Agent via dedicated options instead Header in aria2c 2024-01-09 09:57:31 +00:00
rlaphoenix
347c31d717 No longer retrieve timestamp of downloads in aria2c
For downloads by devine, there's generally no reason to retrieve this information when it will be decrypted, repacked, remuxed, and so on anyway. Requesting the timestamp will just mean more requests being made, perhaps slowing down the download.
2024-01-09 09:56:15 +00:00
rlaphoenix
e54d4b4f41 Move unsupported proxy check to start of aria2c function 2024-01-09 09:55:12 +00:00
rlaphoenix
484338cf50 Remove unnecessary --min-split-size from aria2c downloader
This was added by another team member a long time ago, seemingly for the purposes of preventing a split on DASH/HLS segment files, as they would be already quite small.

However, just because they are small it isn't exactly a problem to have it split, and it would only split if the segment file size fits the default split size of 20M at least twice. I.e., if the segment is 45M, it will split twice. If the segment is 25M, it actually won't split at all. You may think 25M will split by 20M into two downloads, but actually the split size must explicitly fit for it to split. So for 2 downloads it will need to be 40MB in size, then 60, then 80, and so on.

A 40M or bigger segment file does in my opinion deserve to be split as it may genuinely reap speed benefits.
2024-01-09 09:52:22 +00:00
rlaphoenix
a3ab971132 Fix infinite loop in Track.get_init_segment
If the server returns a Content-Length header with a value of 0, then the code just after it would end up looping over streamed response chunks of zero length, which would go on forever.
2024-01-09 02:45:10 +00:00
rlaphoenix
58cb00b18b Implement --no-proxy to disable all use of proxies and proxy providers
This prevents a service from setting a proxy if geofenced, and also discards any manually provided proxy from `--proxy`.
2024-01-09 02:40:49 +00:00
rlaphoenix
f28a6dc28a Fix usage of __all__ 2024-01-09 02:31:22 +00:00
rlaphoenix
2291f90f64 Re-map Video Transfer value 5 to 6
This is seen in some manifests/services for whatever reason. I can't find documentation for this value anywhere. It seems unused in official specifications as of right now. However, it seems in some services/places it is unofficially used as a PAL-version of BT-601 transfer, which makes sense.

Devine's code (and other services) wouldn't care about the difference here, so currently it is just implemented as a remap from 5 to 6. In the future it may be changed and actually defined as two separate BT_601 Transfer enum entries.
2024-01-08 23:56:45 +00:00
rlaphoenix
d690ca4d13 Skip audio track filtering if there's no audio tracks
This also bypasses the warning log about the audio likely being part of an invariant playlist which, while it may be true, is too specific a warning when there could be multiple other reasons.
2023-12-29 21:19:53 +00:00
rlaphoenix
c0d940b17b Remove Track.needs_proxy
Ok, so there's a few reasons this was done.

1) Design-wise it isn't valid to have --proxy (or via config/otherwise) set a proxy, then unpredictably have it bypassed or disabled. If I specify `--proxy 127.0.0.1:8080`, I would expect it to use that proxy for all communication indefinitely, not switch in and out depending on the track or service.

2) With reason 1, it's also a security problem. The only reason I implemented it in the first place was so I could download faster on my home connection. This means I would authenticate and call APIs under a proxy, then suddenly download manifests and segments e.t.c under my home connection. A competent service could see that as an indicator of bad play and flag you.

3) Maintaining this setup across the codebase is extremely annoying, especially because of how proxies are set up/used by Requests in the Session. There's no way to tell a request session to temporarily disable the proxy and turn it back on later without having to get the proxy from the session (in an annoying way), store it, then remove it, make the calls, and then, assuming you're still in the same function, add it back. If you're not in the same function, well, time for some spaghetti code.

---

tldr; -1 ux/design/expectations with CLI, -1 security aspect, -1 code maintenance, but only +1 for potentially increased download speeds in certain scenarios.
2023-12-29 20:25:57 +00:00
rlaphoenix
3c1c408ccd Remove forced removal of Multi-Language SRT header
Services needing this done should apply it themselves, e.g. OnMultiplex. A convenience function to do it is available now as `Subtitle.remove_multi_lang_srt_header()`, so you can do e.g., `subtitle.OnMultiplex = remove_multi_lang_srt_header` and it will pass through this function just before muxing.
2023-12-29 16:39:45 +00:00
rlaphoenix
53de34da51 Add remove_multi_lang_srt_header() method to Subtitle class 2023-12-29 16:39:45 +00:00
rlaphoenix
e7e18a4204 Use same output subtitle format as input codec to SubtitleEdit calls 2023-12-29 16:39:45 +00:00
rlaphoenix
7cc7227f8c Specify utf8 with SubtitleEdit when stripping hearing impaired 2023-12-29 16:02:10 +00:00
rlaphoenix
d94d6042b7 Fix Chapter Encoding on Windows when muxing with mkvmerge
On Windows it seems to default to some encoding other than UTF-8 (possibly UTF-16 or CP-1252), and since the chapter file is saved as UTF-8, it breaks characters outside the typical range, like ø, æ, and others.
2023-12-03 15:04:58 +00:00
rlaphoenix
308ddbd394 Improve private forking instructions in README 2023-12-03 00:17:04 +00:00
rlaphoenix
7cec16d8ab Validate track languages in HLS.to_tracks 2023-12-02 22:40:41 +00:00
rlaphoenix
86635f9b7f Add Support for Python 3.12, update dependencies 2023-12-02 21:17:41 +00:00
rlaphoenix
8cd6dfb65a Implement --sub-format in dl to set output subtitle format
The default is still SubRip SRT, but you can now change the output format to almost any of the available Codec options. There is no option to leave the subtitle format as-is yet. I.e., if there's a SRT and WebVTT subtitle, leave them both as-is.

Like always, you can configure a default in your config file, e.g.,

```yaml
dl:
  sub_format: vtt
```

Note though that SSA, SSAv4, fTTML, and fVTT are not yet supported. There are no plans to support fTTML or fVTT.
2023-12-02 17:56:40 +00:00
rlaphoenix
e87de50940 Exclude fragmented Sub Codecs from DASH UTF-8 checks
Chardet was detecting a mixture of mostly cp1252 and MacRoman encoding, where it should just be left as-is when parsing. The actual text within it may want to go through `try_ensure_utf8` when parsed, but not the entire box.
2023-12-02 17:44:47 +00:00
rlaphoenix
0be62541ba Handle chardet returning None as encoding 2023-12-02 15:10:00 +00:00
Shivelight
c31ee338dc
Add option for automatic subtitle character encoding normalization (#68)
* Add option for automatic subtitle character encoding normalization

The rationale behind this function is that some services use ISO-8859-1
(latin1) or Windows-1252 (CP-1252) instead of UTF-8 encoding, whether
intentionally or accidentally. Some services even stream subtitles with
malformed/mixed encoding (each segment has a different encoding).

* Remove Subtitle parameter `auto_fix_encoding`

Just always attempt to fix encoding. If the subtitle is neither UTF-8 nor CP-1252, then it should realistically error out instead of producing garbage Subtitle data anyway.

* Move Subtitle encoding fixing code out of if drm tree

* Use chardet as a last ditch effort fixing Subs, or return original data

* Move Subtitle.fix_encoding method to utilities as try_ensure_utf8

* Add Shivelight as a contributor

---------

Co-authored-by: rlaphoenix <rlaphoenix@pm.me>
2023-12-02 11:00:55 +00:00
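The helper that came out of this roughly follows a UTF-8 -> CP-1252 -> chardet fallback chain. A minimal sketch of the idea (not necessarily devine's exact implementation):

```python
import chardet

def try_ensure_utf8(data: bytes) -> bytes:
    """Return the data re-encoded as UTF-8 if possible, otherwise the original bytes."""
    try:
        data.decode("utf8")
        return data  # already valid UTF-8
    except UnicodeDecodeError:
        pass
    try:
        # Many services mislabel CP-1252/latin1 subtitles as UTF-8
        return data.decode("cp1252").encode("utf8")
    except UnicodeDecodeError:
        pass
    # Last-ditch effort: let chardet guess, or give the original data back
    encoding = chardet.detect(data)["encoding"]
    if encoding:
        return data.decode(encoding).encode("utf8")
    return data
```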
rlaphoenix
4b8cfabaac Fix all Ruff and isort linter errors 2023-12-02 09:57:13 +00:00
rlaphoenix
959590a6bb Overhaul tooling, linting, editor configs, and README 2023-12-02 09:57:13 +00:00
rlaphoenix
c159672181 Update Video.Range.from_cicp with changes in H.Sup19 (04/21)
Note: There are some breaking changes here. If you manually worked with the Enum names here, then some of them have changed to better reflect the code points' usage.

Generally speaking it should not affect service code.
2023-09-04 00:48:50 +01:00
rlaphoenix
aff40df7d1 Raise CalledProcessError if Shaka logs an error
This seems to be necessary as Shaka-packager seems to always return exit code 0, even on errors.
2023-07-15 18:13:24 +01:00
rlaphoenix
f3cfaa3ab3 Fix DASH FPS error when SegmentBase is not found 2023-07-15 18:08:01 +01:00
rlaphoenix
883c9ae063 docs: Add Discord badge to README 2023-07-09 14:41:11 +01:00
rlaphoenix
a31cb6aa2f deps: Update all dependencies 2023-07-07 18:20:49 +01:00
rlaphoenix
bfceb15f14 docs: Remove portable installation steps and info
I'm not happy with the approach used here to make portable installations of Devine, therefore for now I will remove the information relating to portable installations.
2023-07-04 03:03:07 +01:00
rlaphoenix
9aafa3d8df Add missing cookies param on aria2c function recursion 2023-06-01 00:40:13 +01:00
rlaphoenix
a01766c60b Remove the saldl downloader 2023-05-31 23:04:48 +01:00
rlaphoenix
d369e6134c Add function to fix Start/End Chars on Subtitles 2023-05-30 20:22:40 +01:00
rlaphoenix
6cfbaa7db1 Pass cookies to the aria2c and requests downloaders
For aria2c I've simplified the operation by offloading most of the work for creating a cookie header by just re-doing what Python-requests does. This results in the exact same cookies Python-requests would have used in a requests.get() call or such. It supports multiple same-name cookies under different domains/paths, based on the URI of the mock request.
2023-05-29 22:23:39 +01:00
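A small sketch of that approach (URL and cookie values are made up): prepare a mock request for the target URI and let requests itself compute the Cookie header, so domain/path scoping behaves exactly as it would in a requests.get() call.

```python
import requests
from requests.cookies import get_cookie_header

session = requests.Session()
session.cookies.set("session_id", "abc123", domain="example.com", path="/")

url = "https://example.com/video/manifest.mpd"
mock_request = requests.Request("GET", url).prepare()
cookie_header = get_cookie_header(session.cookies, mock_request)
print(cookie_header)  # session_id=abc123
# The header string can then be handed to aria2c, e.g. via --header "Cookie: ..."
```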
rlaphoenix
1ff4858ca7 Fix mistake in Web Address for FFmpeg in README 2023-05-28 19:46:55 +01:00
rlaphoenix
fd52073605 Skip merging of HLS segments if --skip-dl is used
Partially fixes #61
2023-05-27 20:20:07 +01:00
rlaphoenix
89f5e04348 Bump requests from 2.28.2 to 2.31.0 2023-05-27 20:15:51 +01:00
rlaphoenix
57af8d98c9 Add --video-only flag to dl command 2023-05-26 11:16:12 +01:00
rlaphoenix
215730663b Allow --audio/subs/chapters-only to be used simultaneously
E.g., if you only wanted the subs and chapters, this would now be possible with `--subs-only --chapters-only`.
2023-05-26 11:15:38 +01:00
rlaphoenix
6a9598021d Re-raise errors when loading WVD files so it's more understandable
It also looks for the "expected 2 but parsed 1" error which is likely an error while parsing the WVD version field. If this happens, it will inform the user to use `pywidevine migrate`.
2023-05-25 04:45:49 +01:00
rlaphoenix
a24633fe61 List available Services on error
This is mainly to lessen confusion on service name typos or new users getting used to the CLI.

It also changed the Exceptions on the methods of Service from ClickException to a KeyError since they are intended to be used on the core codebase outside of the context of Click.
2023-05-25 04:37:17 +01:00
rlaphoenix
df2f9b85ae Use urljoin instead of an if check and + op in HLS
This used to be used even before devine was public, but it was constantly changed back and forth between an urljoin(), another form of urljoin (something custom or something I can't remember), and an if check + addition.

However, I can confirm that a simple if check will not work as the Base URI might not even be in the same relative root. The if checks have also been inconsistent with some checking if it starts with http(s)://, and some checking if it does not have the base URI at the start of the string.

This if check method does not work as well as an urljoin() has the potential to. It also fixes some services as some HLS playlists would have the m3u8 URL on a completely different root, subdomain, or even domain, causing it to completely break when trying to download segments.
2023-05-21 00:06:30 +01:00
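To illustrate why urljoin() is the safer choice, here are a few cases (made-up URLs) that a simple prefix check cannot handle correctly:

```python
from urllib.parse import urljoin

playlist_url = "https://cdn-a.example.com/hls/master/stream.m3u8"

# Relative segment next to the playlist
print(urljoin(playlist_url, "seg-001.ts"))
# -> https://cdn-a.example.com/hls/master/seg-001.ts

# Root-relative segment (different relative root on the same host)
print(urljoin(playlist_url, "/other/path/seg-001.ts"))
# -> https://cdn-a.example.com/other/path/seg-001.ts

# Absolute segment on a completely different domain is left untouched
print(urljoin(playlist_url, "https://cdn-b.example.net/seg-001.ts"))
# -> https://cdn-b.example.net/seg-001.ts
```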
rlaphoenix
301c026ca9 Remove Smart/Fancy Left/Right Quotation Marks from Filename Sanitizer 2023-05-20 22:10:55 +01:00
rlaphoenix
8df04de1ea Remove file size check from Requests downloader
We cannot actually do this check. The Content-Length value will be the size after being further encoded or compressed. While we can find out what it was compressed with via the Content-Encoding header, we cannot match the downloaded length with the Content-Length header as requests will automatically decompress/decode according to the Content-Encoding header.
2023-05-19 22:11:05 +01:00
rlaphoenix
8ada6165e3 Set stop event & mark track failed if DASH DRM fails to license 2023-05-19 19:07:35 +01:00
rlaphoenix
6e844409ae Set stop event & mark track failed if HLS Session DRM fails to license 2023-05-19 19:07:06 +01:00
rlaphoenix
c9ecab444f Use range offset when calculating HLS init map byte ranges 2023-05-19 18:38:33 +01:00
rlaphoenix
3e0b7ef200 Fix regression where Range header is accidentally kept and re-used 2023-05-19 00:35:46 +01:00
rlaphoenix
8e7a63f0b9 Fix the file move in wvd add when the WVDs folder does not exist
On new installs, or where the `WVDs` folder has not been made yet, shutil.move() assumes it's a file path and moves the `.wvd` file to the WVDs folder path, as a file. If the folder existed but was empty, this error wouldn't have occurred.
2023-05-19 00:35:46 +01:00
rlaphoenix
55a86ac6c9 Fix filesize.decimal call in requests downloader size exception 2023-05-17 03:32:08 +01:00
rlaphoenix
dd64212ad2 Move download_segment() from DASH/HLS download_track() to Class
Various overall small readability improvements have also been made.
2023-05-17 03:20:01 +01:00
rlaphoenix
03c012f88e Move the Downloaded msg after Decrypt mgs in DASH/URL downloads 2023-05-17 02:09:16 +01:00
rlaphoenix
6cdde3efb0 Override the downloader more efficiently in DASH/HLS when Range is used 2023-05-17 01:33:06 +01:00
rlaphoenix
6d4be8620c Only write segment data if the tfhd fix was necessary in DASH 2023-05-17 01:22:59 +01:00
rlaphoenix
681d69d5e5 Mark DASH and URL tracks as Decrypting when using shaka
DASH and normal URL downloads now both decrypt one large single or merged file after all downloads are finished. This leaves a bit of a "pause" between progress bar movement which looks a bit odd. So mark the track as being in a Decrypting state.
2023-05-16 22:01:07 +01:00
rlaphoenix
a45c784569 Replace download speeds with "Downloaded" text when finished 2023-05-16 21:59:03 +01:00
rlaphoenix
2a8307b98d Decrypt DASH downloads after merging all segments
Since DASH doesn't have the ability to change keys dynamically per-track (Representation), there's no need for the DASH downloader to decrypt segments as they are downloaded (like HLS).

This halves the amount of processes needing to be opened as well as the I/O usage. It may result in noticeably lower CPU usage. Since the IOPS is lowered, you may even see an increase in download speed if downloading to something like a meh HDD.

This also fixes decryption in some weird edge-cases where decrypting each segment individually resulted in timestamp anomalies causing shaka to fail.
2023-05-16 21:55:53 +01:00
rlaphoenix
bdc1203514 Only verify download size in requests downloader if possible
Some servers may not respond with the Content-Length header, even if it's from segmented media, e.g., a subtitle URL. The requests downloader required the header to be present as it downloads each URL, which is not always possible.

Now it tries to get it if possible, and verifies the download size with the Content-Length value if it could be obtained.
2023-05-16 20:49:43 +01:00
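A minimal sketch of the optional verification (filenames and chunk size are arbitrary); note that, as the later commit removing this check explains, the comparison becomes unreliable once Content-Encoding compression is involved, since requests transparently decompresses the body:

```python
import requests

session = requests.Session()
response = session.get("https://example.com/subtitles.vtt", stream=True)
response.raise_for_status()

downloaded = 0
with open("subtitles.vtt", "wb") as f:
    for chunk in response.iter_content(chunk_size=64 * 1024):
        downloaded += f.write(chunk)

# Content-Length is not guaranteed (e.g., chunked responses), so only verify when present
content_length = response.headers.get("Content-Length")
if content_length is not None and downloaded != int(content_length):
    raise ValueError(f"expected {content_length} bytes but downloaded {downloaded}")
```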
rlaphoenix
2a4e9505f1 Remove unnecessary HEAD calls in requests downloader
HEAD requests were made to sum a total file size of the download operation. However, the downloader may be used on URLs where the content is not segmented media. Therefore, the server may not support or respond with the Content-Length header, which causes the requests downloader to crash before it even gets a chance to begin downloading.

Even still, this total size value isn't really necessary, and would cause possibly 100s of HEAD requests (in quick succession of each other) on segmented sources. It would also add up-front delay before it actually starts to download.
2023-05-16 20:47:26 +01:00
rlaphoenix
e7dc138c0f Improve readability and documentation of DASH's to_tracks function 2023-05-15 16:19:53 +01:00
rlaphoenix
e079febe79 Ensure output directory exists in requests downloader 2023-05-15 13:33:59 +01:00
rlaphoenix
95802d1e64 Fix regression with downloader mapper on aria2c and saldl
The setup I had for using asyncio.run with functools.partial didn't actually pan out. A full pass-through lambda is required.

I've also moved the mapped downloader variable to the root of the downloaders package.
2023-05-12 12:19:34 +01:00
rlaphoenix
be403bbff4 Implement a Python-requests-based downloader 2023-05-12 07:02:39 +01:00
rlaphoenix
cb82febb7c Add ability to choose downloader via config 2023-05-12 06:42:33 +01:00
rlaphoenix
b92708ef45 Alter behaviour of --skip-dl to allow DRM licensing
Most people used --skip-dl just to license the DRM pre-v1.3.0. Which makes sense, --skip-dl is otherwise a pointless feature. I've fixed it so that --skip-dl works like before, allowing license calls, while still supporting the new per-segment features post-v1.3.0.

Fixes #37
2023-05-11 22:17:41 +01:00
rlaphoenix
3ec317e9d6 Pass manifest to DASH downloader instead of re-requesting it
Fixes #51
2023-05-11 20:46:37 +01:00
rlaphoenix
5ca2f256d5 Fix URL used on final chance to get Track KID on DASH downloads
segments[0] is the first tuple, of two values. The URL and an optional byte range. So this accidentally passed the tuple rather than the URL within the tuple.

Fixes #54
2023-05-09 13:04:20 +01:00
rlaphoenix
1668647e4d Change ConstError for ValidationError when ignoring tenc errors
The change to pymp4 may result in it complaining when validating parameters of the tenc box data, i.e., if the version byte is not 0 or 1, etc. It shouldn't raise a ConstError on tenc boxes anymore as the definition of the tenc box has been much improved in pymp4 v1.4.0.
2023-05-08 17:41:55 +01:00
rlaphoenix
bf82065400 Move back to the official pymp4 dependency
All the latest changes in my fork are now available on the official pymp4 repository, and a new PyPI release v1.4.0 is available with the fixes to use.
2023-05-07 18:30:42 +01:00
rlaphoenix
3ae0fb3454 Update Actions in GitHub Actions CI/CD workflows 2023-05-03 03:05:11 +01:00
rlaphoenix
1c5099440b Add FLAC to the Audio Codecs enum and relevant methods 2023-05-01 18:49:25 +01:00
rlaphoenix
e3941e4640 Remove now unnecessary explicit version checks on tenc boxes
pymp4 was updated to automatically do this during parsing and building of tenc boxes. Therefore it would instead fail parsing if the version is not 0 or 1.
2023-04-23 22:39:14 +01:00
rlaphoenix
2b07399f5a Update dependency on my fork of pymp4
It may look like I'm downgrading, but I'm not.

1.5.0 was an incorrect version bump based on TrueDread's fork. v1.4.0 is the next release after, even though it's a lower version. v1.4.0 should be used instead.
2023-04-23 22:36:41 +01:00
rlaphoenix
b5263491ab Update Changelog for v2.2.0 2023-04-23 18:38:57 +01:00
rlaphoenix
bd40c38d23 Bump to v2.2.0 2023-04-23 18:38:47 +01:00
rlaphoenix
630832e434 Ignore failed parsing of tenc boxes
Some services accidentally (? I presume) mix up the `tenc` box's data with that of an `avc1` box or similar. This causes total failure and crashing. However, in these scenarios there's usually a 2nd box further down the stream that is not an error and will parse correctly. So just skip these errors and continue.
2023-04-23 17:38:12 +01:00
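A rough sketch of that skip-and-continue behaviour using pymp4/construct (not devine's exact code; the error classes and offset handling are simplified):

```python
import io

from construct import ConstructError
from pymp4.parser import Box

def find_tenc_boxes(data: bytes):
    """Yield every tenc box that parses cleanly, skipping corrupt/mislabelled ones."""
    offset = data.find(b"tenc")
    while offset != -1:
        stream = io.BytesIO(data)
        stream.seek(max(0, offset - 4))  # the 4-byte box size precedes the box type
        try:
            yield Box.parse_stream(stream)
        except (ConstructError, IOError):
            pass  # e.g., avc1 data sitting under a tenc type; ignore and keep scanning
        offset = data.find(b"tenc", offset + 1)
```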
rlaphoenix
86322159b6 Fix multiplexing of downloads without a video track
E.g., --subs-only and --audio-only
2023-04-17 14:09:36 +01:00
rlaphoenix
96aa7c1e0a Fix segmented vtt merging code
This got 'broken' after moving to my fork of pymp4 because my fork has commits by TrueDread that add support for the vttc, payl, and sttg boxes, therefore they no longer contain `data` fields but rather specifically parsed fields. I also no longer need to parse the data stream of vttc boxes, as they are already parsed as `children`.
2023-04-04 20:15:18 +01:00
rlaphoenix
62965f8e21 Skip tenc boxes that are not version 0 or 1
It seems tenc box versions should only be 0 or 1, yet sometimes they can be above that. It seems some services accidentally use the `tenc` atom code over another. E.g., Netflix sometimes using `tenc` instead of `avc1`.
2023-03-28 21:27:43 +01:00
rlaphoenix
eb39c8eba6 Ensure QualityList returns resolutions in descending order 2023-03-28 20:46:13 +01:00
rlaphoenix
b301fb4390 Use tenc boxes within uuid boxes as another fallback for get_key_id 2023-03-28 20:25:30 +01:00
rlaphoenix
5b7fadbc55 Use -nobom with CCExtractor 2023-03-27 19:25:21 +01:00
rlaphoenix
527cd4cca1 Fix regression where only last mux would be moved to dl folder 2023-03-27 18:51:32 +01:00
rlaphoenix
8c14b73bc1 Replace abandoned pymp4 with my fork of pymp4
My fork contains various fixes by beardypig and truedread that do not have a release on PyPI yet.
2023-03-27 18:41:28 +01:00
rlaphoenix
0a128e1f70 Fix regression where no-video dls fail, improve multiplex progress 2023-03-26 23:13:11 +01:00
rlaphoenix
8f5bbeb8e3 Split the download finished log into dl/title finish logs 2023-03-26 22:42:16 +01:00
rlaphoenix
0b2e3e2255 Remove the muxed download path log
It's not very appealing, nor is it info the user realistically needs for every download. They could check `devine env info` to see where downloads go if they are unsure.
2023-03-26 22:41:23 +01:00
rlaphoenix
2a8e86b057 Delete video tracks as they are muxed
This reduces the total cumulative temp download folder size.
2023-03-26 22:40:05 +01:00
rlaphoenix
0c6d0986e4 Remove now unused and superseded with_resolution method 2023-03-26 20:21:31 +01:00
rlaphoenix
63eeeca910 Fix regression where all videos are selected if --quality isn't used 2023-03-26 20:18:15 +00:00
Hollander_1908
d894e5bbe0
Was not able to use the initialization from a DASH segment_list (#47)
* Was not able to use the initialization from a DASH segment_list

* Check if initialization in DASH has attribute range
2023-03-26 20:01:17 +01:00
rlaphoenix
33a9c307f3 Add ability to download multiple resolutions per title
Closes #26
2023-03-26 19:28:46 +01:00
rlaphoenix
71cf2b4016 Fix rare issue where DASH/HLS dl speed divides by 0 2023-03-26 14:30:12 +01:00
rlaphoenix
1c73e8d7fa Fix CCExtractor failing in edge cases by repacking first 2023-03-26 12:04:16 +01:00
rlaphoenix
bf3219b4e8 Ignore empty files when decrypting with Widevine
This happens because the input file has no actual useful data, likely just a bunch of headers and nothing of use. Therefore shaka skips and returns OK, yet didn't make any file at the decrypted path.

This fixes a crash when it tries to move the non-existent decrypted file to the input file location, causing the rest of the download to halt.
2023-03-17 21:09:09 +00:00
rlaphoenix
f4a9d6c0b1 Replace negative size values in TTML text with 0
Negative size values are not allowed by the spec basically anywhere in the document. Some services seem to accidentally specify a negative value, which puts pycaption on the fritz.
2023-03-17 19:28:55 +00:00
rlaphoenix
41018d4574 Don't absorb error messages on Caption Syntax Errors 2023-03-17 18:56:53 +00:00
rlaphoenix
2a4dfb3e93 Improve anti-duplicate checks in Widevine tree logs
While it already has anti-duplicate checks, these checks did not take into account the `*` indicator, the `from x vault` text, etc. This rework of the duplicate check ignores all messages.
2023-03-17 01:52:47 +00:00
rlaphoenix
df4dd73271 Bump to v2.1.0 2023-03-16 20:51:45 +00:00
rlaphoenix
6e888a095e Silence SubtitleEdit when stripping SDH 2023-03-16 20:49:23 +00:00
rlaphoenix
c778a890cf Fix calls for Audio & Subtitle's OnMultiplex event 2023-03-15 05:50:12 +00:00
rlaphoenix
0ac1955db6 Ignore aria2's "If errors see the log file" logs 2023-03-15 03:13:41 +00:00
rlaphoenix
d3cfc722dc
Merge pull request #43 from Hollander-1908/patch-1
DASH: improved forced subtitle recognition
2023-03-14 12:39:21 +00:00
rlaphoenix
cd2831fe82 Add Hollander-1908 to Contributors list 2023-03-14 12:39:05 +00:00
Hollander_1908
5eedbe1f59
DASH: improved forced subtitle recognition
Some manifests use the value `forced_subtitle` instead of the regular `forced-subtitle`.
This way both are recognized.
2023-03-14 12:56:34 +01:00
rlaphoenix
36c530ccc6 Add support for JS-style 13-char timestamps to Cacher 2023-03-13 22:48:49 +00:00
rlaphoenix
ddf1c519e0 Try get track language from representation ID on DASH playlists 2023-03-13 01:09:52 +00:00
rlaphoenix
7ca58c96ab Shorten variable aria_log_buffer to log_buffer 2023-03-12 00:12:48 +00:00
rlaphoenix
90818f201d Include the mkvmerge return code on error 2023-03-12 00:10:54 +00:00
rlaphoenix
d8acdda044 Silence DASH and HLS logs unless it's the last attempt 2023-03-12 00:09:02 +00:00
rlaphoenix
055bc927f5 Add a 5-attempt retry system to DASH & HLS downloads 2023-03-11 19:28:02 +00:00
rlaphoenix
111dac9264 Fix association of preceding HLS EXT-X-KEYs with m3u8 fork
This will improve efficiency and accuracy of getting appropriate DRM systems when downloading segments.

This can dramatically improve download speed from less than 50 kb/s to full speed if the HLS playlist used a lot of AES-128 EXT-X-KEYs. E.g., a unique key for each segment.

This was caused because the HLS.get_drm function took EVERY EXT-X-KEY, checked for supported systems, loaded them, and returned the supported objects. This meant it could load possibly 100s of AES-128 ClearKey objects (likely requiring URL downloads for the key URI) causing a huge delay before downloading each segment.
2023-03-09 21:46:48 +00:00
rlaphoenix
7bb215d496 Prevent licensing Widevine DRM a second time
If you use --cdm-only, you will end up licensing multiple times if the PSSH has more than one Key ID. While this is checked for, with the KID check now being more lenient, it will end up continuing and licensing again.

However, even with the original check code this would have been pointless. If the first license did not return a content key for a KID, then the next license call with the exact same parameters wouldn't have either.
2023-03-08 22:48:36 +00:00
rlaphoenix
abf6c71688 Specify HLS Track Key IDs to prepare_drm
This also moves the init data code before drm related code, just so it has the init data ready to retrieve the Key ID from.
2023-03-08 22:45:41 +00:00
rlaphoenix
da7acb0417 Specify URL Track Key IDs to prepare_drm 2023-03-08 22:42:25 +00:00
rlaphoenix
a549cc6afb Specify DASH Track Key IDs to prepare_drm 2023-03-08 22:41:58 +00:00
rlaphoenix
923cb71f81 Only raise error if the Track's KID was not found when licensing
For example, if a service has the same PSSH or license call for 720p and 1080p video tracks, but it doesn't return a KID for the 1080p track, then the previous code would return an error, even though it has enough content key data to continue.

With this change it now only raises the error if the track's exact KID was not licensed. This adds support to prepare_drm for specifying the track's KID.
2023-03-08 22:41:13 +00:00
rlaphoenix
73bd17ec94 Implement new get_key_id() method to Track 2023-03-08 22:36:21 +00:00
rlaphoenix
853a021ac0 Fix regression in new get_init_segment, change fallback_size to maximum_size
When the Content-Size is successfully determined, the (now renamed) maximum_size is still used as a cap, as it could otherwise be MBs or GBs worth of data.
2023-03-08 22:27:38 +00:00
rlaphoenix
573dd8cd49 Don't immediately license DASH DRM until used
This is unnecessary as the DASH track may get converted into a URL track, which will also prepare the DRM.
2023-03-08 21:42:05 +00:00
rlaphoenix
8337162991 Prepare DRM on URL tracks if they already have DRM
Previously it would only prepare the DRM if it had to find DRM from the init data. Now it also prepares DRM that was already pre-provided as DRM objects.
2023-03-08 21:35:14 +00:00
rlaphoenix
d73256f1b3 Fix storing of DRM to be before preparation on URL tracks 2023-03-08 21:31:44 +00:00
rlaphoenix
32c118ab57 Rewrite Track's get_init_segment method, now more dynamic
- DASH and HLS tracks must now explicitly provide the URL to download as the init segment. This is because the original code assumed the first segment, or first init segment, was the one that the caller wants, which may not be the case (e.g., an ad is the first init segment).

It now supports explicitly providing a byte-range to download, as well as modifying the fallback content size of 20KB.

It now also checks if the server supports the HTTP Range header and uses it over the hacky request-streaming method. It also checks for the file size and uses that over the fallback size as long as it's not bigger than 100 KB.

Overall it's now more dynamic to specific use-cases, and more efficient in various ways.
2023-03-08 21:08:50 +00:00
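A simplified sketch of the new behaviour (parameter names are illustrative, not the exact signature): probe the URL, prefer a Range request when the server supports it, and otherwise stream until enough data has been read.

```python
import requests

def get_init_segment(url: str, maximum_size: int = 20_000, session: requests.Session = None) -> bytes:
    session = session or requests.Session()

    probe = session.head(url)
    size = int(probe.headers.get("Content-Length") or 0)
    supports_range = probe.headers.get("Accept-Ranges", "none").lower() != "none"

    # Use the real file size if known, capped at 100 KB, otherwise the fallback size
    wanted = min(size, 100_000) if size else maximum_size

    if supports_range:
        res = session.get(url, headers={"Range": f"bytes=0-{wanted - 1}"})
        res.raise_for_status()
        return res.content

    # Fallback: stream the response and stop once enough data has been read
    data = b""
    with session.get(url, stream=True) as res:
        for chunk in res.iter_content(chunk_size=1024):
            data += chunk
            if len(data) >= wanted:
                break
    return data
```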
rlaphoenix
4f1d94dd7b Remove list unpack from Widevine's from_track for HLS tracks
This isn't actually necessary. Likely hasn't been necessary since either v1.0.0, or v1.2.0.
2023-03-08 20:43:25 +00:00
rlaphoenix
cbd796463d Ignore "aria2 will resume download" logs
These only happen if we intentionally cancel the process, or it failed. However, this is something that is generally obvious given the args, and when cancelling, a wall of these logs would appear.
2023-03-08 13:46:55 +00:00
rlaphoenix
fa84ef53e7 Don't space the * that denotes KIDs within PSSH
This prevents it from being 1 character over the size limit and wrapping to the next line.
2023-03-08 13:43:11 +00:00
rlaphoenix
b3fdafcf06 Simplify Base URL joining and calculation on DASH
This also fixes some DASH manifests where it uses multiple BaseURL definitions that must be joined together.
2023-03-07 12:36:00 +00:00
rlaphoenix
cddfdf6336 Update Changelog for v2.0.1 2023-03-07 10:28:18 +00:00
rlaphoenix
eaf7752dde Bump to v2.0.1 2023-03-07 10:28:07 +00:00
rlaphoenix
d175ffaf15 Add support for byte-range on HLS init maps 2023-03-04 12:21:28 +00:00
rlaphoenix
1b1412d498 Fix byte range calculation on HLS downloads
It was off by one. The final calculation for the right-side range needed to be converted from one-index to zero-index.
2023-03-04 12:18:19 +00:00
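In other words, the inclusive end of the HTTP Range header has to be offset + length - 1, not offset + length. For a byte range such as HLS's "<length>@<offset>":

```python
# e.g., EXT-X-BYTERANGE:4194304@8388608
length, offset = 4194304, 8388608
range_header = f"bytes={offset}-{offset + length - 1}"
print(range_header)  # bytes=8388608-12582911 (inclusive, zero-indexed end)
```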
rlaphoenix
318832e6b2 Store DRM in the track.drm property in HLS and DASH 2023-03-04 11:49:53 +00:00
rlaphoenix
f8166f098c Apply threading lock to HLS DRM preparation
Without this, if two threads started at the same time there was a very good chance they would run the code and license twice, which is unnecessary.
2023-03-04 11:41:10 +00:00
rlaphoenix
0bceb772c2 Handle exceptions in user's Service license funcs 2023-03-04 11:23:58 +00:00
rlaphoenix
d9471f886f Raise exceptions in prepare_drm instead of using sys.exit(1) 2023-03-04 11:22:51 +00:00
rlaphoenix
4b330c0478 Implement CEKNotFound and EmptyLicense exceptions to Widevine 2023-03-04 11:18:28 +00:00
rlaphoenix
c3a22431f0 Fix possible soft-lock in HLS if Queue is left empty after error 2023-03-04 11:11:20 +00:00
rlaphoenix
7df6aa42b4 Ignore Insufficient bits warning from shaka 2023-03-04 02:31:23 +00:00
rlaphoenix
9fff14af30 Fix regression that broke pproxy 2023-03-03 08:53:28 +00:00
rlaphoenix
19ca567019 Only use captured aria2c output if available 2023-03-03 07:54:31 +00:00
rlaphoenix
d964dde4d5 Don't pre-allocate file-space for segmented downloads 2023-03-03 07:52:13 +00:00
rlaphoenix
a3efadf00b Fix aria2c's segmented check for DASH/HLS 2023-03-03 07:52:13 +00:00
rlaphoenix
714e9af99a Don't print traceback of subprocess errors on download failures
Since we now have pretty logs for them, the exception (which would be a CalledProcessError) would be generally pointless. However, the return code may be useful so that is kept.
2023-03-03 07:52:13 +00:00
rlaphoenix
9d6adec707 Fix printing of shaka-packager logs 2023-03-03 07:52:13 +00:00
rlaphoenix
9e23ee13bb Remove silent args in aria2c calls for HLS/DASH 2023-03-03 07:52:13 +00:00
rlaphoenix
432a1122c5 Fix printing of aria2c logs when capturing progress 2023-03-03 07:52:13 +00:00
rlaphoenix
b2bcaf97a2 Remove double newline after ASCII banner 2023-03-02 16:24:54 +00:00
rlaphoenix
cb1a7988f4 Fix centering of project url in ASCII banner 2023-03-02 16:24:01 +00:00
rlaphoenix
3456e24846 State full 'Episode' text if there's no episode name 2023-03-02 16:19:46 +00:00
rlaphoenix
f8a8309628 Fix verbose episode listings if there's no episode name 2023-03-02 16:17:20 +00:00
rlaphoenix
fc3e49baf6 Update Changelog for v2.0.0 2023-03-01 22:06:08 +00:00
rlaphoenix
46cb1ba0fa Bump to v2.0.0 2023-03-01 22:05:52 +00:00
rlaphoenix
0b1f327a6c Update config documentation on the basic proxy provider 2023-03-01 21:26:52 +00:00
rlaphoenix
d75996f6e4 Add title download time elapsed to finish log 2023-03-01 16:06:55 +00:00
rlaphoenix
7ee5e71075 Move download time elapsed code to utilities 2023-03-01 16:06:20 +00:00
rlaphoenix
7b7be47f7d Clean up residual files on download stops and fails 2023-03-01 11:19:32 +00:00
rlaphoenix
9f48aab80c Shutdown HLS & DASH dl pool, pass exceptions to dl
This results in a noticeably faster speed cancelling segmented track downloads on CTRL+C and errors. It also reduces code duplication as the dl code will now handle the exception and cleanup for them.

This also simplifies the STOPPING/STOPPED and FAILING/FAILED status messages by quite a bit.
2023-03-01 11:08:52 +00:00
rlaphoenix
fbe78308eb Handle download worker exceptions outside thread loop
This is so that I can start to log information after the track listing. It's also not necessary to have the try catch within the loop, when both methods will have exited the loop.
2023-03-01 11:08:52 +00:00
rlaphoenix
624bb6fe75 Only calculate DASH/HLS dl speed if dl sizes are available 2023-03-01 11:08:52 +00:00
rlaphoenix
3a98c93f03 Support CTRL+C on URL downloads, use FAILED/STOPPED messages 2023-03-01 11:08:52 +00:00
rlaphoenix
6a65617179 Reduce the download stop check to one if check in dl 2023-03-01 08:55:40 +00:00
rlaphoenix
840db6e689 Move segment merging from dl to DASH/HLS classes 2023-03-01 08:54:35 +00:00
rlaphoenix
d07fedbbe1 Move Widevine DRM prep for URL downloads before download 2023-03-01 08:44:50 +00:00
rlaphoenix
fb49210b5a Remove explicit dependency on colorama
It's not used by devine itself, but it's still a sub-dependency. It has been removed as an explicit dependency in case the sub-dependencies ever remove it as well.
2023-03-01 08:35:42 +00:00
rlaphoenix
a841dbe2ab Remove dependency on tqdm 2023-03-01 08:33:40 +00:00
rlaphoenix
f4ad7a2e6c Mark track as stopping when skipping segments 2023-02-28 18:14:03 +00:00
rlaphoenix
b482f86bb3 Skip post-download operations if dl stop event is set 2023-02-28 18:05:04 +00:00
rlaphoenix
383e7d9647 Add full support for CTRL+C on HLS and DASH 2023-02-28 18:05:04 +00:00
rlaphoenix
8365d798a4 Pass shaka-packager & aria CTRL+C to caller as KeyboardInterrupt()s 2023-02-28 18:05:04 +00:00
rlaphoenix
ad1990cc42 Print a download cancelled message on CTRL+C 2023-02-28 18:05:04 +00:00
rlaphoenix
53c005f727 Remove unnecessary dl stop event set on CTRL+C 2023-02-28 18:05:04 +00:00
rlaphoenix
9cfda3bb9c Don't shutdown pool or the for loop will lock
Since I'm using `futures.as_completed()`, it will never ever for loop over all tracks and segments and will forever be stuck in the primary thread of the operation. I.e., main thread for the download track threads, or the track thread for the download segment threads.

I've also removed all future cancelled checks as they will never be cancelled before they get the chance to run, because no future cancel calls are made anymore.
2023-02-28 18:05:03 +00:00
rlaphoenix
51fb7920c9 Mark track as skipped if it never got a chance to start downloading 2023-02-28 16:39:33 +00:00
rlaphoenix
acead803bd Remove unnecessary sleep calls at start of download threads 2023-02-28 16:38:10 +00:00
rlaphoenix
b6d3c8368a Update CONFIG docs to use proxy_providers
Also removes an old unused `proxies` config option.
2023-02-28 16:32:13 +00:00
rlaphoenix
961747b74c Set the default aria2c --file-allocation to prealloc
Falloc is faster, but supports fewer systems/environments, and usually requires admin perms on Windows.
2023-02-28 16:23:34 +00:00
rlaphoenix
ce53a1b636 Don't run aria2c under asyncio, further improve progress updates
I've removed asyncio usage as it's generally unnecessary. If you want to run aria2c under a thread, run it under a thread. In the case for devine, this would take another thread, and would be another thread layer deep. Pointless. Would affect speed.

With this change I've been able to improve the aria2c progress capture code quite a bit.
2023-02-28 08:21:55 +00:00
rlaphoenix
d427ec8472 Fix yet another startup crash when loading the config 2023-02-28 06:30:12 +00:00
rlaphoenix
3cfc679294 Center ASCII banner without using U+2800 2023-02-28 06:03:42 +00:00
rlaphoenix
dc55f6ffeb Calculate DASH and HLS download speed in an alternate way
Also fixes getting download sizes for Subtitle tracks
2023-02-28 06:03:27 +00:00
rlaphoenix
f4122f1ae6 Add mapping for Netflix profiles starting with h264
e.g., the new -qc profiles.
2023-02-28 06:02:36 +00:00
rlaphoenix
e5e3f4687d Fix printing of aria2c errors if progress is used
Also improves the general process of ingesting aria2c progress information.
2023-02-27 16:52:00 +00:00
rlaphoenix
18449c4777 Ignore ERROR prints from aria2c progress checks 2023-02-27 00:00:55 +00:00
rlaphoenix
7560ee96c9 Replace console.log calls with padded console.prints
While console.log currently removes the need for `Padding(..., (0, 5))`, as it is overridden to do it automatically, in terms of purpose the `console.print` function is more logical.

I hope to find a way to automate the padding within console.print in the future, but for now this will work.
2023-02-26 23:35:40 +00:00
rlaphoenix
eb3f268d64 Revert back to using logging over console.log where possible
While the commit I made to do this change initially seemed to help, it was actually pointless, and the issues I had were caused by other problems.

For consistency it seems best to stick with the logging module with the RichHandler applied. Using just console.log means being unable to control the log level and which level of logs appear.
2023-02-26 21:20:43 +00:00
rlaphoenix
401d0481df Implement verbose arg on tree method of Movies and Album 2023-02-26 19:04:54 +00:00
rlaphoenix
375ccd7638
Merge pull request #31 from Arias800/patch-2
Adds more functionality to the wvd file
2023-02-26 18:49:17 +00:00
Arias800
d028957e9c Implement add and delete sub-commands to wvd
This pull request adds the features that are detailed in this issue:
https://github.com/devine-dl/devine/issues/2

Also changes some debug logs to info logs, as the information would generally be wanted. Also changes some logging logs to console.logs.
2023-02-26 18:46:09 +00:00
rlaphoenix
6419c27e0a Remove spacer from Spinner of progress bars 2023-02-26 15:57:10 +00:00
rlaphoenix
a8c1612eb5 Remove unused TextColumn from Track progress bar 2023-02-26 15:56:04 +00:00
rlaphoenix
73b68fe7fe Use OnMultiplex for auto SDH-strip subtitle 2023-02-25 22:00:44 +00:00
rlaphoenix
fe320e177d Add OnMultiplex event, for pre-multiplex 2023-02-25 21:58:32 +00:00
rlaphoenix
2635538205 Move repack to post-download, use rich status 2023-02-25 21:25:24 +00:00
rlaphoenix
7d1af8bd8c Move sub conversion to post-download, use rich status 2023-02-25 21:20:50 +00:00
rlaphoenix
8b405b6e02 Skip CC extraction if the binary isn't found 2023-02-25 21:06:11 +00:00
rlaphoenix
a5c6052292 Move CC extraction to be post-download, use rich status 2023-02-25 21:04:04 +00:00
rlaphoenix
b535715166 Print download thread exceptions with rich traceback 2023-02-25 17:47:59 +00:00
rlaphoenix
70106d32ce Log DRM license info under track downloads 2023-02-25 17:21:13 +00:00
rlaphoenix
178bd01069 Add a rich horizontal rule print on Service construction 2023-02-25 14:04:13 +00:00
rlaphoenix
e9b3b3a588 Use rich status when checking for proxy geofence 2023-02-25 14:03:41 +00:00
rlaphoenix
cd0c419142 Print available tracks in a Panel 2023-02-25 14:01:46 +00:00
rlaphoenix
4a5aebbca7 Add a total elapsed timer to the final log of dl command 2023-02-25 14:01:25 +00:00
rlaphoenix
09d6c4e1c3 Add a log when download finishes, listing file path 2023-02-25 14:00:39 +00:00
rlaphoenix
a5da7c8fbd Add rich progress bar for the mux operation
Update dl.py
2023-02-25 13:58:58 +00:00
rlaphoenix
58673590df Remove unnecessary if check for --skip-dl
This would always be a true statement as this part of the code is already in an if else statement checking skip_dl.
2023-02-25 13:55:44 +00:00
rlaphoenix
92895426b3 Replace tqdm progress bars with rich progress bars 2023-02-25 13:45:17 +00:00
rlaphoenix
cc69423374 Remove logs stating available/selected tracks
These are now unnecessary to distinguish as only one of the two will appear in the log depending on what args are used.
2023-02-25 13:19:45 +00:00
rlaphoenix
96f408ca49 Only log available tracks if --list is used
Update dl.py
2023-02-25 13:12:46 +00:00
rlaphoenix
984582d19d Replace Tracks.print() calls with new rich tree 2023-02-25 13:10:49 +00:00
rlaphoenix
48e35fb4c4 Add rich tree generator to tracks wrapper 2023-02-25 13:05:56 +00:00
rlaphoenix
97b3dbeed2 Re-implement title listing with rich tree
Note that the full title list will no longer be printed unless --list-titles is used.
2023-02-25 13:02:00 +00:00
rlaphoenix
f7c4d72108 Add rich tree generator to sorted title wrappers 2023-02-25 13:01:09 +00:00
rlaphoenix
92774dcfe6 Simplify str representation of Sorted title containers 2023-02-25 12:54:37 +00:00
rlaphoenix
389fa6e979 Use console.rule between sections of dl's new logs
These are handy to separate the logs to be per-title, and per section of initialization.
2023-02-25 12:52:51 +00:00
rlaphoenix
b2bbc808c4 Use console.status across dl command 2023-02-25 12:22:32 +00:00
rlaphoenix
62d91a3e77 Replace log.info calls with console.log calls
I've moved log.info calls to console.log calls to reduce conflicts of logs at the same time as console refreshes, but also to reduce unnecessary log level text being printed to the console.

We don't need to know if a log is an `info` level log, but I've kept log.error's and such as we would want to know if a log is an error log and such.
2023-02-25 12:11:17 +00:00
rlaphoenix
3e1a067724 Override the default traceback with rich traceback 2023-02-25 11:53:44 +00:00
rlaphoenix
b8f3118775 Replace version logs with new ASCII banner log 2023-02-25 11:51:03 +00:00
rlaphoenix
f1864ad63c Replace logging handler with ComfyRichHandler
This changes all logs via logging to be printed to the rich console instead. Specifically, it uses my custom ComfyRichHandler, which pads the logs horizontally.

In this change the time is no longer printed. The log level is still printed but it's now the full log name in bold.
2023-02-25 11:47:39 +00:00
rlaphoenix
8d626822cb Remove coloredlogs and now unused log constants 2023-02-25 11:46:03 +00:00
rlaphoenix
77c16f557c Replace logging FileHandler with console.save_text 2023-02-25 11:38:02 +00:00
rlaphoenix
34a2a8e4e6 Create custom Comfy versions of Console, LogRender, and RichHandler
These add a bit of margin around all console prints/logs. It makes the logs feel a bit more comfortable rather than being crammed edge-to-edge in the terminal.

The methods used here are by no means good, but they work quite well. If you can find a better way to do it, please make a pull request.
2023-02-25 11:33:42 +00:00
rlaphoenix
6eac499ae0 Add new config option to set the terminal bg color
This is a hacky way to do it, but it works surprisingly well, considering there's no true way to modify a terminal's full color scheme.
2023-02-25 11:29:46 +00:00
rlaphoenix
01e419d52c Create custom-themed 80-width rich console 2023-02-25 11:27:29 +00:00
rlaphoenix
39ff347f58 Add new dependency, rich 13.3.1 2023-02-25 11:22:54 +00:00
rlaphoenix
7ab39377db Update Changelog for v1.4.0 2023-02-25 23:05:52 +00:00
rlaphoenix
0f0000bdd0 Bump to v1.4.0 2023-02-25 23:05:14 +00:00
rlaphoenix
7958ad5e7c Create blank config if it did not exist 2023-02-25 23:01:11 +00:00
rlaphoenix
570e0aba00 Fix printing of download worker exceptions 2023-02-25 14:46:43 +00:00
rlaphoenix
a8694cb049 Disable Insecure Request Warnings
These seem to occur when using HTTP+S proxies when connecting to an HTTPS URL, e.g., NordVPN proxies to https://google.com. While not ideal, we can't solve this situation and the warning logs are quite annoying.
2023-02-25 11:56:23 +00:00
rlaphoenix
eebe76b6f6 Fix segment download merging on Linux machines
It seems on Windows the pathlib.iterdir() function is always in order. However, on Linux or at least some machines this is not the case. This change fixes the order.

If you think you were affected, check your previous downloads that used DASH or HLS segmentation and make sure they don't randomly change scenes out of order.
2023-02-24 23:10:51 +00:00
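The fix boils down to sorting the segment files explicitly instead of trusting iterdir() order. A sketch, assuming segments are written as 0.mp4, 1.mp4, 2.mp4, and so on:

```python
from pathlib import Path

def merge_segments(segments_dir: Path, out_path: Path) -> None:
    # iterdir() order is filesystem-dependent, so sort numerically by segment index
    segment_files = sorted(segments_dir.iterdir(), key=lambda p: int(p.stem))
    with open(out_path, "wb") as out:
        for segment in segment_files:
            out.write(segment.read_bytes())
```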
rlaphoenix
c6976a7112 Move --log from the dl cmd to the root cmd 2023-02-24 19:54:05 +00:00
rlaphoenix
4f1cff681c Increase connection pool limit to accommodate Byte-Range dls 2023-02-24 19:46:26 +00:00
rlaphoenix
ad82eab712 Pad/depad data when decrypting with ClearKey DRM 2023-02-23 18:19:51 +00:00
rlaphoenix
45c9ba5198 Add support for data URIs from HLS playlists in ClearKey 2023-02-23 18:19:51 +00:00
rlaphoenix
221cd1c283 Fix JOC check on HLS playlists with no audio channel info 2023-02-23 18:19:51 +00:00
rlaphoenix
1f86775ac9 Add support for segment downloads with byte-ranges
Adds support for HLS's EXT-X-BYTERANGE and DASH's SegmentBase.
2023-02-23 18:19:51 +00:00
rlaphoenix
55da41c74f Remove byte_range param on aria2c downloader
Turns out, even if you manually set the Range header AND the server has full support, it does not work. It will act like it works, but it seems internally aria2c gets confused on what bytes it requested, what it returned, and it will either just download the full file, or the range requested (but still complain, and freeze!).

Yikes.
2023-02-23 16:47:14 +00:00
rlaphoenix
725480adf0 Update Changelog for v1.3.1 2023-02-23 09:25:46 +00:00
rlaphoenix
da1beae28f Bump to v1.3.1 2023-02-23 09:25:34 +00:00
rlaphoenix
739c17bf4c Initialize config when adding auth if non-existent 2023-02-23 09:09:09 +00:00
rlaphoenix
9932fa3f7a Fix mistake when logging where the cookie file was moved 2023-02-23 09:07:21 +00:00
rlaphoenix
3b5a199f31 Fix regression of missing title & track when licensing with Widevine
For some reason I had the partial removed and replaced with the direct function, meaning services using `title` and `track` during the license and service cert functions would crash.
2023-02-23 08:56:37 +00:00
rlaphoenix
5624232e5e
Merge pull request #33 from varyg1001/master
always define track.path if track.descriptor is URL
2023-02-23 08:45:06 +00:00
varyg
fc19358de4 Fix regression of missing track.path on URL downloads 2023-02-23 08:44:50 +00:00
rlaphoenix
d8ae543d63 Update Changelog for v1.3.0 2023-02-22 04:39:55 +00:00
rlaphoenix
2aea4549ce Bump to v1.3.0 2023-02-22 04:39:42 +00:00
rlaphoenix
2ad3f04a5e Use the session DRM as the initial DRM for HLS
The variant-playlist may have DRM information that won't be in the invariant (stream) playlist. We need to begin with this DRM data if available.
2023-02-22 04:30:24 +00:00
rlaphoenix
c1f716cb6c Drop support for Python 3.8 2023-02-22 03:43:22 +00:00
rlaphoenix
f21aa5aac5 Silence aria2c when downloading segmented streams
It conflicts with the TQDM progress bar, and it's also not really useful information.
2023-02-22 03:06:53 +00:00
rlaphoenix
4406e3bbab Add ability to silence aria2c's output 2023-02-22 03:06:17 +00:00
rlaphoenix
0913b0dda6 Add ability to set extra arguments with aria2c 2023-02-22 03:01:54 +00:00
rlaphoenix
1443dfaecc Replace Threading Pool system with Thread Storage in Vaults
This fixes the usage of vaults across different threads. It now makes a truly unique connection for each thread. The previous code did this as well, but put back the connection from x thread, to re-use in y thread. Now each thread simply creates and reuses its own connection.

Once the thread is closed, the data is now also garbage collected. This now reduces the risk of filling up memory over time.
2023-02-22 01:31:03 +00:00
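A sketch of the thread-storage idea (the vault class, table, and column names here are hypothetical, not devine's schema): each thread lazily opens its own sqlite connection inside a threading.local(), so a connection is never handed between threads.

```python
import sqlite3
import threading

class SQLiteVault:
    def __init__(self, path: str):
        self.path = path
        self._local = threading.local()  # per-thread storage

    @property
    def conn(self) -> sqlite3.Connection:
        # Created the first time each thread uses the vault; cleaned up with the thread
        if not hasattr(self._local, "conn"):
            self._local.conn = sqlite3.connect(self.path)
        return self._local.conn

    def get_key(self, kid: str):
        row = self.conn.execute("SELECT key FROM keys WHERE kid = ?", (kid,)).fetchone()
        return row[0] if row else None
```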
rlaphoenix
4e875f5ffc Multi-thread the new DASH download system, improve redundancy
Just like the commit for HLS multi-threading, this mimics the -j=16 system of aria2c, but manually via a ThreadPoolExecutor. Benefits of this is we still keep support for the new system, and we now get a useful progress bar via TQDM on segmented downloads, unlike aria2c which essentially fills the terminal with jumbled download progress stubs.
2023-02-22 01:06:24 +00:00
rlaphoenix
9e6f5b25f3 Multi-thread the new HLS download system
This mimics the -j=16 system of aria2c, but manually via a ThreadPoolExecutor. Benefits of this is we still keep support for the new system, and we now get a useful progress bar via TQDM on segmented downloads, unlike aria2c which essentially fills the terminal with jumbled download progress stubs.
2023-02-21 16:16:12 +00:00
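The pattern is roughly the following (simplified; the URL list, file naming, and error handling are illustrative): a 16-worker ThreadPoolExecutor standing in for aria2c's -j 16, with TQDM ticking as each segment future completes.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

import requests
from tqdm import tqdm

def download_segment(session: requests.Session, url: str, path: str) -> str:
    res = session.get(url)
    res.raise_for_status()
    with open(path, "wb") as f:
        f.write(res.content)
    return path

def download_segments(urls: list, out_dir: str) -> None:
    session = requests.Session()
    with ThreadPoolExecutor(max_workers=16) as pool:  # roughly aria2c's -j 16
        futures = [
            pool.submit(download_segment, session, url, f"{out_dir}/{i}.mp4")
            for i, url in enumerate(urls)
        ]
        # as_completed lets the bar tick as soon as any segment finishes
        for future in tqdm(as_completed(futures), total=len(futures)):
            future.result()  # re-raises any download exception
```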
rlaphoenix
314079c75f Pass save path to DRM decrypt functions directly
This is required in segmented scenarios when multi-threaded, where the same `track.path` would be get and set from multiple threads, possibly at the same time. It's also just better logically to do it this way.
2023-02-21 16:09:35 +00:00
rlaphoenix
8268825ba8 Use Widevine.from_init_data when downloading DASH
This is required as some DASH manifests do not explicitly list the PSSH with the Widevine ContentProtection, only listing that it is supported.
2023-02-21 08:02:32 +00:00
rlaphoenix
8c312440a3 Add func to get Widevine PSSH and KID from Init Data
This is just a more direct alternative to Widevine.from_track where you may have the init data but not the track itself.
2023-02-21 08:00:42 +00:00
rlaphoenix
50193856c2 Add missing tqdm progress bar to DASH downloads 2023-02-21 06:35:40 +00:00
rlaphoenix
61270d3af4 Fix expired check in some cases in Cacher class
It seems if a service sets the expiration time in seconds or by a timestamp, then this resolves as expired up to 14 hrs too early, or 14 hours too late, depending on what your timezone is relative to UTC+00:00.

However, I haven't fully confirmed if this is the right fix to make. As in, I'm not sure if changing datetime.utcnow() to datetime.now() should be done, or the other way around. However, I've had multiple people tell me changing it this way worked for them so I'm just going to roll with it.
2023-02-21 06:13:02 +00:00
rlaphoenix
42aaa03941 Completely rewrite downloading system
The new system now downloads and decrypts segments individually instead of downloading all segments, merging them, and then decrypting. Overall the download system now acts more like a normal player.

This fixes #23 as the new HLS download system detects changes in keys and init segments as segments are downloaded. DASH still only supports one period, and one period only, but hopefully I can change that in the future.

Downloading code is now also moved from the Track classes to the manifest classes. Download progress is now also actually helpful for segmented downloads (all HLS, and most DASH streams). It uses TQDM to show a progress bar based on how many segments it needs to download, and how fast it downloads them.

There's only one downside currently. Downloading of segmented videos no longer has the benefit of aria2c's -j parameter, where it can download n URLs concurrently. Aria2c is still used, but only -x and -s are going to make a difference.

In the future I will make HLS and DASH download in a multi-threaded way, sort of a manual version of -j.
2023-02-21 06:00:39 +00:00
rlaphoenix
c925cb8af9 Remove AtomicSQL, use Connection Pools in Vaults
This allows the use of vaults in any thread, while keeping database-file level thread safety. AtomicSQL doesn't actually do anything that useful.
2023-02-21 05:38:39 +00:00
rlaphoenix
707469d252 Add ability to pass a proxy to ClearKey in HLS.get_drm 2023-02-21 01:37:39 +00:00
rlaphoenix
5197961d91 Add doc-string to HLS.get_drm 2023-02-21 01:36:01 +00:00
rlaphoenix
09989f8b94 Raise error if no supported drm was found in HLS.get_drm 2023-02-21 01:31:52 +00:00
rlaphoenix
c3f2d0d9cc Support HLS's EXT-X-KEY NONE
If EXT-X-KEY of Method=NONE is encountered, it assumes no DRM should be used, even if other supported DRM may have already been or is going to be iterated.
2023-02-21 01:24:04 +00:00
rlaphoenix
7025b5cef3 Only convert AES-128 EXT-X-KEYs to ClearKey objects
There are other AES methods, like AES-CBC and such, that are not currently supported by the ClearKey DRM class.
2023-02-21 01:21:28 +00:00
rlaphoenix
88a5bc8ad5
Merge pull request #30 from varyg1001/master
Check if cookies exists
2023-02-14 22:38:24 +00:00
varyg
f412b0e37f
Check if cookies exists 2023-02-14 21:56:55 +01:00
rlaphoenix
989c24788b Update Changelog for v1.2.0 2023-02-13 20:12:49 +00:00
rlaphoenix
be87d897e3 Bump to v1.2.0 2023-02-13 20:04:47 +00:00
rlaphoenix
6d3caffa32 Backport Path.is_relative_to to Python 3.8.x
Further Fix to #28.
2023-02-13 19:07:24 +00:00
rlaphoenix
d926b4fe9a Fix annotation type-hints in some files on Python 3.x
Fixes #28.
2023-02-13 19:06:25 +00:00
rlaphoenix
af5e6acbb8 Add byte-range option to the aria2c downloader 2023-02-13 18:43:36 +00:00
rlaphoenix
0c02b1513a
Merge pull request #27 from varyg1001/master
Fix cookie import across drives in `auth add`
2023-02-13 18:36:07 +00:00
rlaphoenix
f06ca768c3 Add varyg1001 to the Contributors list 2023-02-13 18:33:51 +00:00
varyg
bb9e85d777 Use shutil.move to move data instead of Path.rename
Path.rename() cannot move data to different drives. It can only rename the path reference on the same file system. However, shutil.move() will move the data while also changing its name.
2023-02-13 18:33:02 +00:00
rlaphoenix
b25e6c5ce5
Merge pull request #25 from Arias800/patch-1
Detect redirection
2023-02-12 17:55:22 +00:00
rlaphoenix
6a3559bc0f Add Arias800 to the Contributor list in the README 2023-02-12 17:54:02 +00:00
Arias800
7169eaa885
Detect redirection
If the manifest is hidden behind a redirect, the url is not updated and the segments are created with the old url.
I found a French service where this situation occurred.

After this change, it was possible to correctly parse the mpd.
2023-02-12 11:01:16 +01:00
rlaphoenix
4b5a2c703b Fix subtitle conversion error where WEBVTT header is kept
This happened because the WEBVTT headers from each segment were appended to each other without enough newline separation, so pycaption thought they were actual captions to be kept.
2023-02-11 22:17:43 +00:00
rlaphoenix
47448aac3c Print traceback when download worker fails 2023-02-11 22:16:54 +00:00
rlaphoenix
f4363ae57e Fix HLS conversion of JOC Audio Tracks channels 2023-02-11 17:12:22 +00:00
rlaphoenix
067130990c Add missing exit when --abitrate returns no tracks 2023-02-10 21:11:28 +00:00
rlaphoenix
d612599e27 Add Audio Channels Selection option to dl command CLI
Sub-channel layouts like 5.1, 7.1, will match with e.g., 6.0 and 8.0 respectively.

Closes #9.
2023-02-10 21:10:09 +00:00
rlaphoenix
f9afd87474 Parse audio channels to float, rework parse function
The function now works more effectively, supports parsing `Nch` strings, and explicitly tests for float/int input.
2023-02-10 20:58:33 +00:00
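A sketch of what such a parse function can look like (the exact accepted inputs in devine may differ), normalising ints, floats, and "Nch" strings to a float channel count:

```python
def parse_channels(channels) -> float:
    if isinstance(channels, (int, float)):
        return float(channels)
    if isinstance(channels, str):
        value = channels.lower().rstrip("ch")  # "6ch" -> "6", "2CH" -> "2"
        if value.replace(".", "", 1).isdigit():
            return float(value)
    raise ValueError(f"Unrecognised channels value: {channels!r}")

print(parse_channels("6ch"))  # 6.0
print(parse_channels("5.1"))  # 5.1
print(parse_channels(2))      # 2.0
```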
rlaphoenix
0334640e93 Remove unnecessary audio channel 'Nch' differential
This was implemented to know the difference between a service explicitly stating 5.1 channel audio and a service explicitly stating 6.0 channel audio that is likely a 5.1 channel layout.

However, since technically both are 6 channels, this is unnecessary and just causes complications.
2023-02-10 20:46:26 +00:00
rlaphoenix
f34bdb8627 If audio channels weren't specified, don't insinuate 2ch 2023-02-10 20:34:46 +00:00
rlaphoenix
84bf9fde92 Add Bitrate Selection options to dl command CLI
It defaults to highest available. Closes #8.
2023-02-10 20:19:54 +00:00
rlaphoenix
4bee08c431 Fix FutureWarning when getting FPS for DASH videos 2023-02-10 20:05:31 +00:00
rlaphoenix
a8f3975f7e Remove assert statements from base Service class
The 2nd assert statement is completely unnecessary, and the first one can be resolved by using it in an if instead.
2023-02-10 19:49:19 +00:00
rlaphoenix
dbe52cf273 Remove list comp on parameter of any() 2023-02-10 19:44:03 +00:00
rlaphoenix
9a757708ac Make the get_session func of the Service class static 2023-02-10 19:41:58 +00:00
rlaphoenix
09e29cb2e1 Define exports with __ALL__ in sub-packages 2023-02-10 19:37:03 +00:00
rlaphoenix
77d892a2a9 Add DeepSource Active Issues Badge to Readme 2023-02-10 19:31:35 +00:00
rlaphoenix
fae44614f5 Define all imports within the tracks package 2023-02-10 19:27:12 +00:00
rlaphoenix
268509fb49 Remove unused f-strings with no placeholders 2023-02-10 19:26:46 +00:00
rlaphoenix
d982e37ee5 Re-order all imports across project with isort 2023-02-10 19:25:25 +00:00
rlaphoenix
faabfb550c Replace use of with_stem + with_suffix to with_name 2023-02-10 19:13:31 +00:00
rlaphoenix
0a5f359217 Fix FutureWarning when getting segment URLs from SegmentTemplate 2023-02-08 12:57:09 +00:00
rlaphoenix
23153f0078 Only use SegmentBase's timescale as FPS if exists
Also optimised it to not require FPS.parse, as the Video class will run it under FPS.parse anyway.
2023-02-08 12:29:57 +00:00
rlaphoenix
3f22c969c3 Update Changelog for v1.1.0 2023-02-07 21:49:42 +00:00
rlaphoenix
71ded306b6 Bump to v1.1.0 2023-02-07 21:49:32 +00:00
rlaphoenix
44f0ab4793 Create CHANGELOG.md 2023-02-07 21:49:07 +00:00
rlaphoenix
18e2d8617e Create credentials dict prior to assignment in auth add
Fixes #17.
2023-02-07 21:32:21 +00:00
rlaphoenix
c5d6ba09f2 Check if Cookie file for profile already exists during auth add
Fixes #16.
2023-02-07 21:25:26 +00:00
rlaphoenix
6619c29fb5 Create parent dirs when adding cookies with auth add
Fixes #21.
2023-02-07 21:21:15 +00:00
rlaphoenix
0446c44a42 Only iterate over cookie dir if exists in auth list
Fixes #18.
2023-02-07 21:08:45 +00:00
rlaphoenix
794de8b516 Efficiently store data in auth list, sort the data 2023-02-07 21:07:58 +00:00
rlaphoenix
dd441bcd85 Fix crash when loading key vaults in dl command
Fixes #15.
2023-02-07 20:47:40 +00:00
rlaphoenix
1fa3ba61c8 Fix crash when using cfg on empty/new configs
Fixes #19.
2023-02-07 20:44:34 +00:00
rlaphoenix
f7683173f8 Fix startup crash if config is blank
Fixes #20.
2023-02-07 20:32:40 +00:00
rlaphoenix
76671495b4 Fix startup crash if config does not yet exist
Fixes #14.
2023-02-07 20:20:40 +00:00
rlaphoenix
5301ac2924 Add util to test decoding of the video/audio streams with FFmpeg 2023-02-07 00:58:53 +00:00
rlaphoenix
00f85f7206 Add util to change video range flag losslessly
This is useful for some services. Sometimes there's a random stream with the wrong video range.
2023-02-06 23:49:28 +00:00
rlaphoenix
8c66e57175 Change the icon next to the project name 2023-02-06 04:00:14 +00:00
rlaphoenix
8b9247bfc5 Fix the organization name across the project
Sadly `devine` was not available.
2023-02-06 02:55:22 +00:00
76 changed files with 9558 additions and 3828 deletions


@ -1,9 +1,5 @@
version = 1
exclude_patterns = [
"**_pb2.py" # protobuf files
]
[[analyzers]]
name = "python"
enabled = true

.editorconfig (new file, 15 lines)

@ -0,0 +1,15 @@
root = true
[*]
end_of_line = lf
charset = utf-8
insert_final_newline = true
indent_style = space
indent_size = 4
trim_trailing_whitespace = true
[*.{feature,json,md,yaml,yml,toml}]
indent_size = 2
[*.md]
trim_trailing_whitespace = false


@ -1,3 +0,0 @@
[flake8]
exclude = .venv,build,dist,*_pb2.py,*.pyi
max-line-length = 120

.gitattributes (new file, 1 line)

@ -0,0 +1 @@
* text=auto eol=lf


@ -1,4 +1,9 @@
name: cd
permissions:
contents: "write"
id-token: "write"
packages: "write"
pull-requests: "read"
on:
push:
@ -10,32 +15,32 @@ jobs:
name: Tagged Release
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.10.x'
- name: Install Poetry
uses: abatilo/actions-poetry@v2.2.0
with:
poetry-version: '1.3.2'
- name: Install dependencies
run: poetry install
- name: Build wheel
run: poetry build -f wheel
- name: Upload wheel
uses: actions/upload-artifact@v2.2.4
with:
name: Python Wheel
path: "dist/*.whl"
- name: Deploy release
uses: marvinpinto/action-automatic-releases@latest
with:
prerelease: false
repo_token: "${{ secrets.GITHUB_TOKEN }}"
files: |
dist/*.whl
- name: Publish to PyPI
env:
POETRY_PYPI_TOKEN_PYPI: ${{ secrets.PYPI_TOKEN }}
run: poetry publish
- uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: "3.11"
- name: Install Poetry
uses: abatilo/actions-poetry@v2
with:
poetry-version: 1.6.1
- name: Install project
run: poetry install --only main
- name: Build project
run: poetry build
- name: Upload wheel
uses: actions/upload-artifact@v3
with:
name: Python Wheel
path: "dist/*.whl"
- name: Deploy release
uses: marvinpinto/action-automatic-releases@latest
with:
prerelease: false
repo_token: "${{ secrets.GITHUB_TOKEN }}"
files: |
dist/*.whl
- name: Publish to PyPI
env:
POETRY_PYPI_TOKEN_PYPI: ${{ secrets.PYPI_TOKEN }}
run: poetry publish


@ -7,32 +7,38 @@ on:
branches: [ master ]
jobs:
lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: "3.11"
- name: Install poetry
uses: abatilo/actions-poetry@v2
with:
poetry-version: 1.6.1
- name: Install project
run: poetry install --all-extras
- name: Run pre-commit which does various checks
run: poetry run pre-commit run --all-files --show-diff-on-failure
build:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ['3.8', '3.9', '3.10', '3.11']
python-version: ["3.9", "3.10", "3.11"]
steps:
- uses: actions/checkout@v3
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
- name: Install flake8
run: python -m pip install flake8
- name: Lint with flake8
run: |
# stop the build if there are Python syntax errors or undefined names
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
# exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
- name: Install poetry
uses: abatilo/actions-poetry@v2.2.0
with:
poetry-version: 1.3.2
- name: Install project
run: poetry install --no-dev
- name: Build project
run: poetry build
- uses: actions/checkout@v4
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
- name: Install poetry
uses: abatilo/actions-poetry@v2
with:
poetry-version: 1.6.1
- name: Install project
run: poetry install --all-extras --only main
- name: Build project
run: poetry build

71
.gitignore vendored
View File

@@ -1,4 +1,6 @@
# devine
devine.yaml
devine.yml
*.mkv
*.mp4
*.exe
@@ -9,6 +11,8 @@
*.pem
*.bin
*.db
*.ttf
*.otf
device_cert
device_client_id_blob
device_private_key
@@ -36,6 +40,7 @@ parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
@@ -54,14 +59,17 @@ pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo
@@ -71,6 +79,7 @@ coverage.xml
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
@@ -83,16 +92,49 @@ instance/
docs/_build/
# PyBuilder
.pybuilder/
target/
# Jupyter Notebook
.ipynb_checkpoints
# pyenv
.python-version
# IPython
profile_default/
ipython_config.py
# celery beat schedule file
# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock
# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
@@ -113,13 +155,26 @@ venv.bak/
# Rope project settings
.ropeproject
# JetBrains project settings
.idea
# mkdocs documentation
/site
# mypy
.mypy_cache/
.directory
.idea/dataSources.local.xml
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/

View File

@@ -2,17 +2,22 @@
# See https://pre-commit.com/hooks.html for more hooks
repos:
- repo: https://github.com/compilerla/conventional-pre-commit
rev: v3.2.0
hooks:
- id: conventional-pre-commit
stages: [commit-msg]
- repo: https://github.com/mtkennerly/pre-commit-hooks
rev: v0.4.0
hooks:
- id: poetry-ruff-check
- repo: https://github.com/pycqa/isort
rev: 5.12.0
rev: 5.13.2
hooks:
- id: isort
- repo: https://github.com/pycqa/flake8
rev: 6.0.0
hooks:
- id: flake8
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.4.0
rev: v4.5.0
hooks:
- id: end-of-file-fixer
- id: trailing-whitespace
args: [--markdown-linebreak-ext=md]
- id: end-of-file-fixer

13
.vscode/extensions.json vendored Normal file
View File

@@ -0,0 +1,13 @@
{
"recommendations": [
"EditorConfig.EditorConfig",
"streetsidesoftware.code-spell-checker",
"ms-python.python",
"ms-python.vscode-pylance",
"charliermarsh.ruff",
"ms-python.isort",
"ms-python.mypy-type-checker",
"redhat.vscode-yaml",
"tamasfe.even-better-toml"
]
}

833
CHANGELOG.md Normal file
View File

@@ -0,0 +1,833 @@
# Changelog
All notable changes to this project will be documented in this file.
This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
Versions [3.0.0] and older use a format based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
but versions thereafter use a custom changelog format using [git-cliff](https://git-cliff.org).
## [3.3.3] - 2024-05-07
### Bug Fixes
- *dl*: Automatically convert TTML Subs to WebVTT for MKV support
- *Subtitle*: Correct timestamps when merging fragmented WebVTT
### Changes
- *env*: List all directories as table in info
- *env*: List possible config path locations when not found
- *binaries*: Move all binary definitions to core/binaries file
- *curl-impersonate*: Remove manual fix for curl proxy SSL
- *curl-impersonate*: Update the default browser to chrome124
- *Config*: Move possible config paths out of func to constant
- *utilities*: Remove get_binary_path, use binaries.find instead
- *dl*: Improve readability of download worker errors
- *env*: Shorten paths on Windows with env vars
## [3.3.2] - 2024-04-16
### Bug Fixes
- *Video*: Ensure track is supported in change_color_range()
- *Video*: Optionalise constructor args, add doc-string & checks
- *Audio*: Optionalise constructor args, add doc-string & checks
- *Subtitle*: Optionalise constructor args, add doc-string & checks
- *HLS*: Ensure playlist.stream_info.codecs exists before use
- *HLS*: Ensure playlist.stream_info.resolution exists before use
- *env*: List used config path, otherwise the default path
- *cfg*: Use loaded config path instead of hardcoded default
- *Basic*: Return None not Exception if no proxy configured
### Changes
- *Video*: Do not print "?"/"Unknown" values in str()
- *Audio*: Do not print "?"/"Unknown" values in str()
- *Subtitle*: Do not print "?"/"Unknown" values in str()
- *Audio*: List lang after codec for consistency with other Tracks
- *Video*: Return None if no m3u RANGE, not SDR
- *env*: Use -- to indicate no config found/loaded
### New Contributors
- [retouching](https://github.com/retouching)
## [3.3.1] - 2024-04-05
### Features
- *dl*: Add *new* --workers to set download threads/workers
### Bug Fixes
- *Chapter*: Cast values to int prior to formatting
- *requests*: Fix multithreaded downloads
- *Events*: Dereference subscription store from ephemeral store
### Changes
- *dl*: Change --workers to --downloads
### New Contributors
- [knowhere01](https://github.com/knowhere01)
## [3.3.0] - 2024-04-02
### Features
- Add support for MKV Attachments via Attachment class
- *dl*: Automatically attach fonts used within SSAv4 subs
- *dl*: Try find SSAv4 fonts in System OS fonts folder
- *Basic*: Allow single string URIs for countries
- *Basic*: Allow proxy selection by index (one-indexed)
- *Events*: Add new global Event Observer API
### Bug Fixes
- *curl-impersonate*: Set Cert-Authority Bundle for HTTPS Proxies
- *Basic*: Make query case-insensitive
- *WVD*: Ensure WVDs dir exists before moving WVD file
- *WVD*: Fix empty path to WVDs folder check
- *WVD*: Move log out of loop to save performance
- *WVD*: Move log with path before Device load
- *WVD*: Add exists/empty checks to WVD folder dumps
- *Basic*: Fix variable typo regression
### Changes
- *Basic*: Improve proxy format checks
- *WVD*: Print error if path to parse doesn't exist
- *WVD*: Separate logs in loop for visual clarity
- *Track*: Move from OnXyz callables to Event observer
## [3.2.0] - 2024-03-25
### Features
- *ClearKey*: Pass session not proxy str in from_m3u_key method
- *Track*: Allow Track to choose downloader to use
- *search*: New Search command, Service method, SearchResult Class
### Bug Fixes
- *dl*: Include chapters when muxing
- *aria2c*: Support aria2(c) 1.37.0 by handling upstream regression
- *MultipleChoice*: Simplify super() call and value types
- *dl*: Add single mux job if there's no video tracks
- *Track*: Compute Track ID from the `this` variable, not `self`
- *DASH/HLS*: Don't merge folders, skip final merge if only 1 segment
- *dl*: Use click.command() instead of click.group()
- *HLS*: Remove save dir even if final merge wasn't needed
- *Track*: Fix order of operation mistake in get_track_name
- *requests*: Set HTTP pool connections/maxsize to max workers
- *Video*: Delete original file after using change_color_range()
- *Video*: Delete original file after using remove_eia_cc()
- *requests*: Manually compute default max_workers or pool size is None
- *requests*: Block until connection freed if too many connections
- *HLS*: Delete subtitle segments as they are merged
- *HLS*: Delete video/audio segments after FFmpeg merge
### Changes
- *ClearKey*: Only use User-Agent if none set in from_m3u_key
- *Track*: Remove TERRITORY_MAP constant, trim SAR China manually
- *Track*: Default the track name to its lang's script/territory
- *Service*: Go back to the default pool_maxsize in Session
## [3.1.0] - 2024-03-05
### Features
- *cli*: Implement MultipleChoice click param based on Choice param
- *dl*: Skip video lang filter if --v-lang unused & only 1 video lang
- *dl*: Change --vcodec default to None, use any codec
- *dl*: Support multiple -r/--range and mux ranges separately
- *Subtitle*: Convert from fTTML->TTML & fVTT->WebVTT post-download
- *Track*: Make ID optional, Automatically compute one if not provided
- *Track*: Add a name property to use for the Track Name
### Bug Fixes
- *dl*: Have --sub-format default to None to keep original sub format
- *HLS*: Use filtered out segment key info
- *Track*: Don't modify lang when getting name
- *Track*: Don't use fallback values "Zzzz"/"ZZ" for track name
- *version*: The `__version__` variable forgot to be updated
### Changes
- Move dl command's download_track() to Track.download()
- *dl*: Remove unused `get_profiles()` method
- *DASH*: Move data values from track url to track data property
- *DASH*: Change how Video FPS is gotten to remove FutureWarning log
- *Track*: Add type checks, improve typing
- *Track*: Remove swap() method and its uses
- *Track*: Remove unused DRM enum
- *Track*: Rename Descriptor's M3U & MPD to HLS & DASH
- *Track*: Remove unnecessary bool casting
- *Track*: Move the path class instance variable with the rest
- *Track*: Return new path on move(), raise exceptions on errors
- *Track*: Move delete and move methods near start of Class
- *Track*: Rename extra to data, enforce type as dict
### Builds
- Explicitly use marisa-trie==1.1.0 for Python 3.12 wheels
## [3.0.0] - 2024-03-01
### Added
- Support for Python 3.12.
- Audio track's Codec Enum now has [FLAC](https://en.wikipedia.org/wiki/FLAC) defined.
- The Downloader to use can now be set in the config under the [downloader key](CONFIG.md#downloader-str).
- New Multi-Threaded Downloader, `requests`, that makes HTTP(S) calls using [Python-requests](https://requests.readthedocs.io).
- New Multi-Threaded Downloader, `curl_impersonate`, that makes HTTP(S) calls using [Curl-Impersonate](https://github.com/yifeikong/curl-impersonate) via [Curl_CFFI](https://github.com/yifeikong/curl_cffi).
- HLS manifests specifying a Byte range value without starting offsets are now supported.
- HLS segments that use `EXT-X-DISCONTINUITY` are now supported.
- DASH manifests with SegmentBase or only BaseURL are now supported.
- Subtitle tracks from DASH manifests now automatically marked as SDH if `urn:tva:metadata:cs:AudioPurposeCS:2007 = 2`.
- The `--audio-only/--subs-only/--chapters-only` flags can now be used simultaneously. For example, `--subs-only`
with `--chapters-only` will get just Subtitles and Chapters.
- Added `--video-only` flag, which can also still be simultaneously used with the only "only" flags. Using all four
of these flags will have the same effect as not using any of them.
- Added `--no-proxy` flag, disabling all uses of proxies, even if `--proxy` is set.
- Added `--sub-format` option, which sets the wanted output subtitle format, defaulting to SubRip (SRT).
- Added `Subtitle.reverse_rtl()` method to use SubtitleEdit's `/ReverseRtlStartEnd` functionality.
- Added `Subtitle.convert()` method to convert the loaded Subtitle to another format. Note that you cannot convert to
fTTML or fVTT, but you can convert from them. SubtitleEdit will be used in precedence over pycaption if available.
Converting to SubStationAlphav4 requires SubtitleEdit, but you may want to manually alter the Canvas resolution after
the download.
- Added support for SubRip (SRT) format subtitles in `Subtitle.parse()` via pycaption.
- Added `API` Vault Client aiming for a RESTful-like API.
- Added `Chapters` Class to hold the new reworked `Chapter` objects, automatically handling stuff like order of the
Chapters, Chapter numbers, loading from a chapter file or string, and saving to a chapter file or string.
- Added new `chapter_fallback_name` config option allowing you to set a Chapter Name Template used when muxing Chapters
into an MKV Container with MKVMerge. Do note, it defaults to no Chapter Fallback Name at all, but MKVMerge will force
`Chapter {i:02}` at least for me on Windows with the program language set to English. You may want to instead use
`Chapter {j:02}` which will do `Chapter 01, Intro, Chapter 02` instead of `Chapter 01, Intro, Chapter 03` (an Intro
is not a Chapter of story, but it is the 2nd Chapter marker, so it's up to you how you want to interpret it).
- Added new `Track.OnSegmentDownloaded` Event, called any time one of the Track's segments were downloaded.
- Added new `Subtitle.OnConverted` Event, called any time that Subtitle is converted.
- Implemented `__add__` method to `Tracks` class, allowing you to add to the first Tracks object. For example, making
it handy to merge HLS video tracks with DASH tracks, `tracks = dash_tracks + hls_tracks.videos`, or for iterating:
`for track in dash.videos + hls.videos: ...`.
- Added new utility `get_free_port()` to get a free local port to use, though it may be taken by the time it's used.
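A minimal sketch of how a `get_free_port()` helper like the one above can work, assuming it simply asks the OS for an ephemeral port by binding to port 0 (the body here is illustrative, not Devine's exact code):

```python
import socket


def get_free_port() -> int:
    """Ask the OS for a currently free TCP port.

    Note: another process may grab the port before it is actually used.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))  # port 0 tells the OS to pick any free port
        return sock.getsockname()[1]
```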
### Changed
- Moved from my forked release of pymp4 (`rlaphoenix-pymp4`) back to the original `pymp4` release as it is
now up-to-date with some of my needed fixes.
- The DASH manifest is now stored in the Track `url` property to be reused by `DASH.download_track()`.
- Encrypted DASH streams are now downloaded in full and then decrypted, instead of downloading and decrypting
each individual segment. Unlike HLS, DASH cannot dynamically switch out the DRM/Protection information.
This brings both CPU and Disk IOPS improvements, as well as fixing rare weird decryption anomalies like broken
or odd timestamps, decryption failures, or broken a/v continuity.
- When a track is being decrypted, it now displays "Decrypting" and afterward "Decrypted" in place of the download
speed.
- When a track finishes downloading, it now displays "Downloaded" in place of the download speed.
- When licensing is needed and fails, the track will display "FAILED" in place of the download speed. The track
download will cancel and all other track downloads will be skipped/cancelled; downloading will end.
- The fancy smart quotes (`“` and `”`) are now stripped from filenames.
- All available services are now listed if you provide an invalid service tag/alias.
- If a WVD file fails to load and looks to be in the older unsupported v1 format, then instructions on migrating to
v2 will be displayed.
- If Shaka-Packager prints an error (i.e., `:ERROR:` log message) it will now raise a `subprocess.CalledProcessError`
exception, even if the process return code is 0.
- The Video classes' Primaries, Transfer, and Matrix classes had changes to their enum names to better represent their
values and uses. See the changed names in the [commit](https://github.com/devine-dl/devine/commit/c159672181ee3bd07b06612f256fa8590d61795c).
- SubRip (SRT) Subtitles no longer have the `MULTI-LANGUAGE SRT` header forcefully removed. The root cause of the error
was identified and fixed in this release.
- Since `Range.Transfer.SDR_BT_601_625 = 5` has been removed, `Range.from_cicp()` now internally remaps CICP transfer
values of `5` to `6` (which is now `Range.Transfer.BT_601 = 6`).
- Referer and User-Agent Header values passed to the aria2(c) downloader are now set via the dedicated `--referer` and
`--user-agent` options respectively, instead of `--header`.
- The aria2(c) `-j`, `-x`, and `-s` option values can now be set by the config under the `aria2c` key in the options'
full names.
- The aria2(c) `-x` and `-s` option values now use aria2(c)'s own default values for them instead of `16`. The `-j`
option value defaults to ThreadPoolExecutor's algorithm of `min(32,(cpu_count+4))`.
- The download progress bar now states `LICENSING` on the speed text when licensing DRM, and `LICENSED` once finished.
- The download progress bar now states `CANCELLING`/`CANCELLED` on the speed text when cancelling downloads. This is to
make it more clear that it didn't just stop, but stopped as it was cancelled.
- The download cancel/skip events were moved to `constants.py` so they can be used across the codebase more easily without
argument drilling. `DL_POOL_STOP` was renamed to `DOWNLOAD_CANCELLED` and `DL_POOL_SKIP` to `DOWNLOAD_LICENCE_ONLY`.
- The Cookie header is now calculated for each URL passed to the aria2(c) downloader based on the URL. Instead of
passing every single cookie, which could have two cookies with the same name aimed for different host names, we now
pass only cookies intended for the URL.
- The aria2(c) process no longer prints output to the terminal directly. Devine now only prints contents of the
captured log messages to the terminal. This allows filtering out of errors and warnings that aren't a problem.
- DASH and HLS no longer download segments silencing errors on all but the last retry as the downloader rework makes
this unnecessary. The errors will only be printed on the final retry regardless.
- `Track.repackage()` now saves as `{name}_repack.{ext}` instead of `{name}.repack.{ext}`.
- `Video.change_color_range()` now saves as `{name}_{limited|full}_range.{ext}` instead of `{name}.range{0|1}.{ext}`.
- `Widevine.decrypt()` now saves as `{name}_decrypted.{ext}` instead of `{name}.decrypted.{ext}`.
- Files starting with the save path's name and using the save path's extension, but not the save path, are no longer
deleted on download finish/stop/failure.
- The output container format is now explicitly specified as `MP4` when calling `shaka-packager`.
- The default downloader is now `requests` instead of `aria2c` to reduce required external dependencies.
- Reworked the `Chapter` class to only hold a timestamp and name value with an ID automatically generated as a CRC32 of
the Chapter representation.
- The `--group` option has been renamed to `--tag`.
- The config file is now read from three more locations in the following order:
1) The Devine Namespace Folder (e.g., `%appdata%/Python/Python311/site-packages/devine/devine.yaml`).
2) The Parent Folder to the Devine Namespace Folder (e.g., `%appdata%/Python/Python311/site-packages/devine.yaml`).
3) The AppDirs User Config Folder (e.g., `%localappdata%/devine/devine.yaml`).
Location 2 allows having a config at the root of a portable folder.
- An empty config file is no longer created when no config file is found.
- You can now set a default cookie file for a Service, [see README](README.md#cookies--credentials).
- You can now set a default credential for a Service, [see config](CONFIG.md#credentials-dictstr-strlistdict).
- Services are now auth-less by default and the error for not having at least a cookie or credential is removed.
Cookies/Credentials will only be loaded if a default one for the service is available, or if you use `-p/--profile`
and the profile exists.
- Subtitles when converting to SubRip (SRT) via SubtitleEdit will now use the `/ConvertColorsToDialog` option.
- HLS segments are now merged by discontinuity instead of all at once. The merged discontinuities are then finally
merged to one file using `ffmpeg`. Doing the final merge by byte concatenation did not work for some playlists.
- The Track is no longer passed through Event Callables. If you are able to set a function on an Event Callable, then
you should have access to the track reference to call it directly if needed.
- The Track.OnDecrypted event callable is now passed the DRM and Segment objects used to Decrypt. The segment object is
only passed from HLS downloads.
- The Track.OnDownloaded event callable is now called BEFORE decryption, right after downloading, not after decryption.
- All generated Track ID values across the codebase have moved from md5 to crc32 values as code processors complain
  about md5's use surrounding security, and its length is too large for our use case anyway.
- HLS segments are now downloaded multi-threaded first and then processed in sequence thereafter.
- HLS segments are no longer decrypted one-by-one, requiring a lot of shaka-packager processes to run and close.
They are now merged and decrypted in groups based on their EXT-X-KEY, before being merged per discontinuity.
- The DASH and HLS downloaders now pass multiple URLs to the downloader instead of one-by-one, heavily increasing speed
and reliability as connections are kept alive and re-used.
- Downloaders now yield back progress information in the same convention used by `rich`'s `Progress.update()` method.
DASH and HLS now pass the yielded information to their progress callable instead of passing the progress callable to
the downloader (see the sketch after this list).
- The aria2(c) downloader now uses the aria2(c) JSON-RPC interface to query for download progress updates instead of
parsing the stdout data in an extremely hacky way.
- The aria2(c) downloader now re-routes non-HTTP proxies via `pproxy` by a subprocess instead of the now-removed
`start_pproxy` utility. This way has proven to be easier, more reliable, and prevents pproxy from messing with rich's
terminal output in strange ways.
- All downloader functions have an altered but ultimately similar signature. `uri` became `urls`, `out` (path) was removed;
we now calculate the save path by passing an `output_dir` and `filename`. The `silent`, `segmented`, and `progress`
parameters were completely removed.
- All downloader `urls` can now be a string or a dictionary containing extra URL-specific options to use like
URL-specific headers. It can also be a list of the two types of URLs to download multi-threaded.
- All downloader `filenames` can be a static string, or a filename string template with a few variables to use. The
template system used is f-string, e.g., `"file_{i:03}{ext}"` (ext starts with `.` if there's an extension).
- DASH now updates the progress bar when merging segments.
- The `Widevine.decrypt()` method now also searches for shaka-packager as just `packager` as it is the default build
name. (#74)
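As a rough illustration of the yielded-progress convention described in the list above, here is a hypothetical downloader generator and consumer loop; the generator, its field names, and the segment count are assumptions for illustration and not Devine's actual API, but each yielded dict maps directly onto `rich`'s `Progress.update()` keyword arguments:

```python
from rich.progress import Progress


def fake_downloader(total_segments: int):
    """Hypothetical downloader that yields rich-style progress updates."""
    yield {"total": total_segments}
    for i in range(total_segments):
        # ... the real downloader would fetch segment i here ...
        yield {"advance": 1, "description": f"segment {i + 1}/{total_segments}"}


with Progress() as progress:
    task_id = progress.add_task("Downloading", total=None)
    for update in fake_downloader(24):
        # Forward the yielded keyword arguments straight to Progress.update().
        progress.update(task_id, **update)
```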
### Removed
- The `devine auth` command and sub-commands due to lack of support, risk of data, and general quirks with it.
- Removed `profiles` config, you must now specify which profile you wish to use each time with `-p/--profile`. If you
use a specific profile a lot more than others, you should make it the default.
- The `saldl` downloader has been removed as their binary distribution is whack and development has seemed to stall.
It was only used as an alternative to what was at the time the only downloader, aria2(c), as it did not support any
form of Byte Range, but `saldl` did, which was crucial for resuming extremely large downloads or complex playlists.
However, now we have the requests downloader which does support the Range header.
- The `Track.needs_proxy` property was removed for a few design architectural reasons.
1) Design-wise it isn't valid to have --proxy (or via config/otherwise) set a proxy, then unpredictably have it
bypassed or disabled. If I specify `--proxy 127.0.0.1:8080`, I would expect it to use that proxy for all
communication indefinitely, not switch in and out depending on the track or service.
2) With reason 1, it's also a security problem. The only reason I implemented it in the first place was so I could
download faster on my home connection. This means I would authenticate and call APIs under a proxy, then suddenly
download manifests and segments e.t.c under my home connection. A competent service could see that as an indicator
of bad play and flag you.
3) Maintaining this setup across the codebase is extremely annoying, especially because of how proxies are set up/used
by Requests in the Session. There's no way to tell a request session to temporarily disable the proxy and turn it
back on later, without having to get the proxy from the session (in an annoying way), store it, then remove it,
make the calls, and then, assuming you're still in the same function, add it back. If you're not in the same
function, well, time for some spaghetti code.
- The `Range.Transfer.SDR_BT_601_625 = 5` key and value has been removed as I cannot find any official source to verify
it as the correct use. However, usually a `transfer` value of `5` would be PAL SD material so it better matches `6`,
which is (now named) `Range.Transfer.BT_601 = 6`. If you have something specifying transfer=5, just remap it to 6.
- The warning log `There's no ... Audio Tracks, likely part of an invariant playlist, continuing...` message has been
removed. So long as your playlist is expecting no audio tracks, or the audio is part of the video transport, then
this wouldn't be a problem whatsoever. Therefore, having it log this annoying warning all the time is pointless.
- The `--min-split-size` argument to the aria2(c) downloader as it was only used to disable splitting on
segmented downloads, but the newer downloader system wouldn't really need or want this to be done. If aria2 has
decided based on its other settings to have split a segment file, then it likely would benefit from doing so.
- The `--remote-time` argument from the aria2(c) downloader as it may need to do a GET and a HEAD request to
get the remote time information, slowing the download down. We don't need this information anyway as it will likely
be repacked with `ffmpeg` or multiplexed with `mkvmerge`, discarding/losing that information.
- DASH and HLS's 5-attempt retry loop as the downloaders will retry for us.
- The `start_pproxy` utility has been removed as all uses of it now call `pproxy` via subprocess instead.
- The `LANGUAGE_MUX_MAP` constant and its usage has been removed as it is no longer necessary as of MKVToolNix v54.
### Fixed
- Uses of `__ALL__` with Class objects have been corrected to `__all__` with string objects, following PEP8.
- Fixed value of URL passed to `Track.get_key_id()` as it was a tuple rather than the URL string.
- The `--skip-dl` flag now works again after breaking in v[1.3.0].
- Move WVD file to correct location on new installations in the `wvd add` command.
- Cookie data is now passed to downloaders and use URLs based on the URI it will be used for, just like a browser.
- Failure to get FPS in DASH when SegmentBase isn't used.
- An error message is now returned if a WVD file fails to load instead of raising an exception.
- Track language information within M3U playlists is now validated with langcodes before use. Some manifests use the
property for arbitrary data that their apps/players use for their own purposes.
- Attempt to fix non-UTF-8 and mixed-encoding Subtitle downloads by automatically converting to UTF-8. (#43)
Decoding is attempted in the following order: UTF-8, CP-1252, then finally chardet detection. If it's neither UTF-8
nor CP-1252 and chardet could not detect the encoding, then it is left as-is. Conversion is done per-segment if the
Subtitle is segmented, unless it's the fVTT or fTTML formats, which are binary (see the sketch after this list).
- Chapter Character Encoding is now explicitly set to UTF-8 when muxing to an MKV container as Windows seems to default
to latin1 or something, breaking Chapter names with any sort of special character within.
- Subtitle passed through SubtitleEdit now explicitly use UTF-8 character encoding as it usually defaulted to UTF-8
with Byte Order Marks (aka UTF-8-SIG/UTF-8-BOM).
- Subtitles passed through SubtitleEdit now use the same output format as the subtitle being processed instead of SRT.
- Fixed rare infinite loop when the Server hosting the init/header data/segment file responds with a `Content-Length`
header with a value of `0` or smaller.
- Removed empty caption lists/languages when parsing Subtitles with `Subtitle.parse()`. This stopped conversions to SRT
containing the `MULTI-LANGUAGE SRT` header when there was multiple caption lists, even though only one of them
actually contained captions.
- Text-based Subtitle formats now try to automatically convert to UTF-8 when run through `Subtitle.parse()`.
- Text-based Subtitle formats now have `&lrm;` and `&rlm;` HTML entities unescaped post-download as some rendering
libraries seem to not decode them for us. SubtitleEdit also has problems with `/ReverseRtlStartEnd` unless it's
already decoded.
- Fixed two concatenation errors surrounding DASH's BaseURL, sourceURL, and media values that start with or use `../`.
- Fixed the number values in the `Newly added to x/y Vaults` log, which now states `Cached n Key(s) to x/y Vaults`.
- File write handler now flushes after appending a new segment to the final save path or checkpoint file, reducing
memory usage by quite a bit in some scenarios.
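A hedged sketch of the decode order described above for the non-UTF-8 subtitle fix (UTF-8, then CP-1252, then chardet detection, otherwise the bytes are left as-is); the helper name is illustrative:

```python
import chardet


def try_reencode_utf8(data: bytes) -> bytes:
    """Best-effort re-encode of subtitle bytes to UTF-8, per the order above."""
    for encoding in ("utf-8", "cp1252"):
        try:
            return data.decode(encoding).encode("utf-8")
        except UnicodeDecodeError:
            continue
    detected = chardet.detect(data).get("encoding")
    if detected:
        try:
            return data.decode(detected).encode("utf-8")
        except (UnicodeDecodeError, LookupError):
            pass
    return data  # leave as-is if no encoding could be determined
```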
### New Contributors
- [Shivelight](https://github.com/Shivelight)
## [2.2.0] - 2023-04-23
### Breaking Changes
Since `-q/--quality` has been reworked to support specifying multiple qualities, the type of this value is
no longer `None|int`. It is now `list[int]` and the list may be empty. It is no longer ever a `None` value.
Please make sure any Service code that uses `quality` via `ctx.parent.params` reflects this change. You may
need to go from an `if quality: ...` to `for res in quality: ...`, or such. You may still use `if quality`
to check if it has 1 or more resolutions specified, but make sure that the code within that if tree supports
more than 1 value in the `quality` variable, which is now a list. Note that the list will always be in
descending order regardless of how the user specified them.
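For example, Service code handling the new list-typed value might look roughly like this (purely illustrative, not taken from any real Service):

```python
def select_resolutions(quality: list[int]) -> None:
    # `quality` is now always a list[int] in descending order, possibly empty,
    # and never None, e.g. [1080, 720].
    if not quality:
        print("no specific resolution requested, using defaults")
        return
    for res in quality:
        print(f"selecting tracks at {res}p")


select_resolutions([1080, 720])
select_resolutions([])
```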
### Added
- Added the ability to specify and download multiple resolutions with `-q/--quality`. E.g., `-q 1080p,720p`.
- Added support for DASH manifests that use SegmentList with range values on the Initialization definition (#47).
- Added a check for `uuid` mp4 boxes containing `tenc` box data when getting the Track's Key ID to improve
chances of finding a Key ID.
### Changed
- The download path is no longer printed after each download. The simple reason is it felt unnecessary.
It filled up a fair amount of vertical space for information you should already know.
- The logs after a download finishes have been split into two logs. One after the actual downloading process
and the other after the multiplexing process. The downloading process has its own timer as well, so you can
see how long the download itself took.
- I've switched (for now) from using the official pymp4 to my fork. At the time this change was made the
original bearypig pymp4 repo was stagnant and the PyPI releases were old. I forked it, added some fixes
by TrueDread and released my own update to PyPI, so it's no longer outdated. This was needed for some
mp4 box parsing fixes. Since then the original repo is no longer stagnant, and a new release was made on
PyPI. However, my repo still has some of TrueDread's fixes that are not yet on the original repository nor
on PyPI.
### Removed
- Removed the `with_resolution` method in the Tracks class. It has been replaced with `by_resolutions`. The
new replacement method supports getting all or n amount of tracks by resolution instead of the original
always getting all tracks by resolution.
- Removed the `select_per_language` method in the Tracks class. It has been replaced with `by_language`. The
new replacement method supports getting all or n amount of tracks by language instead of the original, which was only
able to get one track by language. It now defaults to getting all tracks by language.
### Fixed
- Prevented some duplicate Widevine tree logs under specific edge-cases.
- The Subtitle parse method no longer absorbs the syntax error message.
- Replaced all negative size values with 0 on TTML subtitles as a negative value would cause syntax errors.
- Fixed crash during decryption when shaka-packager skips decryption of a segment as it had no actual data and
was just headers.
- Fixed CCExtractor crash in some scenarios by repacking the video stream prior to extraction.
- Fixed rare crash when calculating download speed of DASH and HLS downloads where a segment immediately finished
after the previous segment. This seemed to only happen on the very last segment in rare situations.
- Fixed some failures parsing `tenc` mp4 boxes when obtaining the track's Key ID by using my own fork of pymp4
with up-to-date code and further fixes.
- Fixed crashes when parsing some `tenc` mp4 boxes by simply skipping `tenc` boxes that fail to parse. This happens
because some services seem to mix up the data of the `tenc` box with that of another type of box.
- Fixed using invalid `tenc` boxes by skipping ones with a version number greater than 1.
## [2.1.0] - 2023-03-16
### Added
- The Track get_init_segment method has been re-written to be more controllable. A specific Byte-range, URL, and
maximum size can now be specified. A manually specified URL will override the Track's current URL. The Byte-range
will override the fallback value of `0-20000` (where 20000 is the default `maximum_size`). It now also checks if the
server supports Byte-range, or it will otherwise stream the response. It also tries to get the file size length and
uses that instead of `maximum_size` unless it's bigger than `maximum_size`.
- Added new `get_key_id` method to Track to probe the track for a track-specific Encryption Key ID. This is similar to
Widevine's `from_track` method but ignores all `pssh` boxes and manifest information as the information within those
could be for a wider range of tracks or not for that track at all.
- Added a 5-attempt retry system to DASH and HLS downloads. URL downloads only use aria2(c)'s built-in retry system,
which has the same number of tries and the same delay between attempts. Any errors emitted when downloading segments will
not be printed to console unless they occurred on the last attempt.
- Added a fallback way to obtain language information by taking it from the representation ID value, which may have the
language code within it. E.g., `audio_en=128000` would be an English audio track at 128kb/s. We now take the `en`
from that ID where possible (see the sketch after this list).
- Added support for 13-char JS-style timestamp values to the Cacher system.
- Improved Forced Subtitle recognition by checking for both `forced-subtitle` and `forced_subtitle` (#43).
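One possible shape of the representation-ID language fallback mentioned above (e.g. pulling `en` out of `audio_en=128000`); the regex and function name are assumptions for illustration only:

```python
import re
from typing import Optional


def lang_from_representation_id(rep_id: str) -> Optional[str]:
    """Best-effort language code extraction from a DASH Representation ID."""
    match = re.search(r"_([a-z]{2,3})(?:[-_][A-Za-z]{2,4})?=", rep_id)
    return match.group(1) if match else None


print(lang_from_representation_id("audio_en=128000"))  # -> "en"
```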
### Changed
- The `*` symbol is no longer spaced after the Widevine `KID:KEY` when denoting that it is for this specific PSSH.
This reduces wasted vertical space.
- The "aria2 will resume download if the transfer is restarted" logs that occur when aria2(c) handles the CTRL+C break,
and "If there are any errors, then see the log file" logs are now ignored and no longer logged to the console.
- DASH tracks will no longer prepare and license DRM unless it's just about to download. This is to reduce unnecessary
preparation of DRM if the track had been converted to a URL download.
- For a fix listed below, we now use a fork of https://github.com/globocom/m3u8 that fixes a glaring problem with the
EXT-X-KEY parsing system. See <https://github.com/globocom/m3u8/pull/313>.
- The return code when mkvmerge returns an error is now logged with the error message.
- SubtitleEdit has been silenced when using it for SDH stripping.
### Fixed
- Fixed URL joining and Base URL calculations on DASH manifests that use multiple Base URL values.
- URL downloads will now store the chosen DRM before preparing and licensing with the DRM.
- URL downloads will now prepare and license with the DRM if the Track has pre-existing DRM information. Previously it
would only prepare and license DRM if it did not pre-emptively have DRM information before downloading.
- The `*` symbol that indicates that the KID:KEY is for the track being downloaded now uses the new `get_key_id` method
of the track for a more accurate reading.
- License check now ensures a KEY was returned for the Track's own KID, rather than for all KIDs of the Track's PSSH. This prevents
an issue where the PSSH may have Key IDs for a 720p and 1080p track, yet only a KEY for the 720p track was returned.
It would have then raised an error and stopped the download, even though you are downloading the 720p track and not
the 1080p track, therefore the error was irrelevant.
- Unnecessary duplicate license calls are now prevented in some scenarios where `--cdm-only` is used.
- Fixed accuracy and speed of preparing and licensing DRM on HLS manifests where multiple EXT-X-KEY definitions appear
in the manifest throughout the file. Using <https://github.com/globocom/m3u8/pull/313> we can now accurately get a
list of EXT-X-KEYs mapped to each segment. This is a game changer for HLS manifests that use unique keys for every
single (or most) segments, as it would otherwise have needed to initialize (and possibly do network requests) for
100s of EXT-X-KEY entries per segment. This caused downloads of HLS manifests that used a unique key per segment
to slow to a crawl, and still not even decrypt correctly as it wouldn't be able to map the correct initialized
key to the correct segment.
- Fixed a regression that incorrectly implemented the OnMultiplex event for Audio and Subtitle tracks causing them to
never trigger. It would instead accidentally trigger the last Video track's OnMultiplex event instead of the
Audio or Subtitle track's event.
- The above fix also fixed automatic SDH subtitle stripping. Any automatically created SDH->non-SDH subtitle from
prior downloads would not have actually had SDH captions stripped; it would instead be a duplicate subtitle.
### New Contributors
- [Hollander-1908](https://github.com/Hollander-1908)
## [2.0.1] - 2023-03-07
### Added
- Re-added logging support for shaka-packager on errors and warnings. Do note that INFO logs and the 'Insufficient bits
in bitstream for given AVC profile' warning logs are ignored and never printed.
- Added new exceptions to the Widevine DRM class, `CEKNotFound` and `EmptyLicense`.
- Added support for Byte-ranges on HLS init maps.
### Changed
- Now lists the full 'Episode #' text when listing episode titles without an episode name.
- Subprocess exceptions from a download worker no longer print a traceback. It now only logs the return code. This is
because all subprocess errors during a download are now logged, therefore the full traceback is no longer necessary.
- Aria2(c) no longer pre-allocates file space if segmented. This is to reduce generally unnecessary upfront I/O usage.
- The Widevine DRM class's `get_content_keys` method now raises the new `CEKNotFound` and `EmptyLicense` exceptions not
`ValueError` exceptions.
- The prepare_drm code now raises exceptions where needed instead of `sys.exit(1)`. Callers do not need to make any
changes. The exception should continue to go up the call stack and get handled by the `dl` command.
### Fixed
- Fixed regression that broke support for pproxy. Do note that while pproxy has wheels for Python 3.11+, it seems to
be broken. I recommend using Python 3.10 or older for now. See <https://github.com/qwj/python-proxy/issues/161>.
- Fixed regression and now store the chosen DRM object back to the track.drm field. Please note that using the track
DRM field in Service code is not recommended, but for some services it's simply required.
- Fixed regression since v1.4.0 where the byte-range calculation was actually slightly off by one on the right side of the range.
This was a one-indexed vs. zero-indexed problem (see the sketch after this list). Please note that this could have affected the integrity of HLS
downloads if they used EXT-X-BYTERANGE.
- Fixed possible soft-lock in HLS if the Queue for previous segment key and init data gets stuck in an empty state over
an exception in a download thread. E.g., if a thread takes the previous segment key, throws an exception, and did not
get the chance to give it back for the next thread.
- The prepare_drm function now handles unexpected exceptions raised in the Service's license method. These exceptions would
otherwise have been absorbed and the download would have soft-locked.
- Prevented a double-licensing call race-condition on HLS tracks by using a threading lock when preparing DRM
information. This is not required in DASH, as it prepares DRM on the main thread, once, not per-segment.
- Fixed printing of aria2(c) logs when redirecting progress information to rich progress bars.
- Explicitly mark DASH and HLS aria2(c) downloads as segmented.
- Fixed listing of episode titles without an episode name.
- Fixed centering of the project URL in the ASCII banner.
- Removed the accidental double-newline after the ASCII banner.
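For context on that off-by-one: HTTP `Range` values are inclusive on both ends, so a run of `length` bytes starting at zero-indexed `offset` ends at `offset + length - 1`, as in this small sketch:

```python
def http_range_header(offset: int, length: int) -> str:
    """Build an inclusive HTTP Range header for `length` bytes starting at `offset`."""
    # bytes=start-end, where start and end are both zero-indexed and inclusive,
    # so the end position is offset + length - 1 (not offset + length).
    return f"bytes={offset}-{offset + length - 1}"


assert http_range_header(0, 16) == "bytes=0-15"
```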
## [2.0.0] - 2023-03-01
This release brings a huge change to the fundamentals of Devine's logging, UI, and UX.
### Added
- Add new dependency [rich](https://github.com/Textualize/rich) for advanced color and logging capabilities.
- Set rich console output color scheme to the [Catppuccin Mocha](https://github.com/catppuccin/palette) theme.
- Add full download cancellation support by using CTRL+C. Track downloads will now be marked as STOPPED if you press
CTRL+C to stop the download, or FAILED if any unexpected exception occurs during a download. The track will be marked
as SKIPPED if the download stopped or failed before it got a chance to begin. It will print a download cancelled
message if downloading was stopped, or a download error message if downloading failed. It will print the first
download error traceback with rich before stopping.
- Downloads will now automatically cancel if any track or segment download fails.
- Implement sub-commands `add` and `delete` to the `wvd` command for adding and deleting WVD (Widevine Device) files to
and from the configured WVDs directory (#31).
- Add new config option to disable the forced background color. You may want to disable the purple background if your
terminal isn't able to apply it correctly, or you prefer to use your own terminal's background color.
- Create `ComfyConsole`, `ComfyLogRenderer`, and `ComfyRichHandler`. These are hacky classes to implement padding to
the left and right of all rich console output. This gives devine a comfortable and freeing look-and-feel.
- An ASCII banner is now displayed at the start of software execution with the version number.
- Add rich status output to various parts of the download process. It's also used when checking GEOFENCE within the
base Service class. I encourage you to follow similar procedures where possible in Service code. This will result in
cleaner log output, and overall less logs being made when finished.
- Add three rich horizontal rules to separate logs during the download process: the Service used, the Title received
from `get_titles()`, and the Title being downloaded. This helps identify which logs are part of which process.
- Add new `tree` methods to `Series`, `Movies`, and `Album` classes to list items within the objects with Rich Tree.
This allows for more rich console output when displaying E.g., Seasons and Episodes within a Series, or Songs within
an Album.
- Add new `tree` method to the `Tracks` class to list the tracks received from `get_tracks()` with Rich Tree. Similar
to the change just above, this allows for more rich console output. It has replaced the `Tracks.print()` method.
- Add a rich progress bar to the track multiplexing operation.
- Add a log when a download finishes, how long it took, and where the final muxed file was moved to.
- Add a new track event, `OnMultiplex`. This event is run prior to multiplexing the finalized track data together. Use
this to run code once a track has finished downloading and all the post-download operations.
- Add support for mapping Netflix profiles beginning with `h264` to AVC. E.g., the new -QC profiles.
- Download progress bars now display the download speed. It displays in decimal (^1024) size. E.g., MB/s.
- If a download stops or fails, any residual file that may have been downloaded in an incomplete OR complete state will
now be deleted. Download continuation is not yet supported, and this will help to reduce leftover stale files.
### Changed
- The logging base config now has `ComfyRichHandler` as its log handler for automatic rich console output when using
the logging system.
- The standard `traceback` module has been overridden with `rich.traceback` for styled traceback output.
- Only the rich console output is now saved when using `--log`.
- All `tqdm` progress bars have been replaced with rich progress bars. The rich progress bars are now displayed under
each track tree.
- The titles are now only listed if `--list-titles` is used. Otherwise, only a brief explanation of what it received
from `get_titles()` will be returned. E.g., for Series it will list how many seasons and episodes were received.
- Similarly, all available tracks are now only listed if `--list` is used. This is to reduce unnecessary prints, and to
separate confusion between listings of available tracks, and listings of tracks that are going to be downloaded.
- Listing all available tracks with `--list` no longer continues execution. It now stops after the first list. If you
want to list available tracks for a specific title, use `-w` in combination with `--list`.
- The available tracks are now printed in a rich panel with a header denoting the tracks as such.
- The `Series`, `Movies`, and `Album` classes now have a much more simplified string representation. They now simply
state the overarching content within them. E.g., Series says the title and year of the TV Show.
- The final log when all titles are processed is now a rich log and states how long the entire process took.
- Widevine DRM license information is now printed below the tracks as a rich tree.
- The CCExtractor process, Subtitle Conversion process, and FFmpeg Repacking process were all moved out of the track
download function (and therefore the thread) to be done on the main thread after downloading. This improves download
speed as the threads can close and be freed quicker for the next track to begin.
- The CCExtractor process is now optional and will be skipped if the binary could not be found. An error is still
logged in the cases where it would have run.
- The execution point of the `OnDownloaded` event has been moved to directly run after the stream has been downloaded.
It used to run after all the post-download operations finished like CCExtractor, FFmpeg Repacking, and Subtitle
Conversion.
- The automatic SDH-stripped subtitle track now uses the new `OnMultiplex` event instead of `OnDownloaded`. This is to
account for the previous change as it requires the subtitle to be first converted to SubRip to support SDH-stripping.
- Logs during downloads now appear before the downloading track list. This way it isn't constantly interrupting view of
the progress.
- Now running aria2(c) with normal subprocess instead of through asyncio. This removes the creation of yet another
thread which is unnecessary as these calls would have already been under a non-main thread.
- Moved Widevine DRM licensing calls before the download process for normal URL track downloads.
- Segment Merging code for DASH and HLS downloads have been moved from the `dl` class to the HLS and DASH class.
### Removed
- Remove explicit dependency on `coloredlogs` and `colorama` as they are no longer used by devine itself.
- Remove dependency `tqdm` as it was replaced with rich progress bars.
- Remove now-unused logging constants like the custom log formats.
- Remove `Tracks.print()` function as it was replaced with the new `Tracks.tree()` function.
- Remove unnecessary sleep calls at the start of threads. This was believed to help with the download stop event check
but that was not the case. It instead added an artificial delay with downloads.
### Fixed
- Fix another crash when using devine without a config file. It now creates the directory of the config file before
making a new config file.
- Set the default aria2(c) file-allocation to `prealloc` like stated in the config documentation. It uses `prealloc` as
the default, as `falloc` is generally unsupported in most scenarios, so it's not a good default.
- Correct the config documentation in regard to `proxies` now being called `proxy_providers`, and `basic` actually
being a `dict` of lists, and not a `dict` of strings.
## [1.4.0] - 2023-02-25
### Added
- Add support for byte-ranged HLS and DASH segments, i.e., HLS EXT-X-BYTERANGE and DASH SegmentBase. Byte-ranged
segments will be downloaded using python-requests as aria2(c) does not support byte ranges.
- Added support for data URI scheme in ClearKey DRM, including support for the base64 extension.
### Changed
- Increase the urllib3 connection pool max size from the default 10 to 16 * 2. This is to accommodate up to 16
byte-ranged segment downloads while still giving enough room for a few other connections.
- The urllib3 connection pool now blocks and waits if it's full. This removes the Connection Pool Limit warnings when
downloading more than one byte-ranged segmented track at a time (see the sketch after this list).
- Moved `--log` from the `dl` command to the entry command to allow logging of more than just the download command.
With this change, the logs now include the initial root logs, including the version number.
- Disable the urllib3 InsecureRequestWarnings as these seem to occur when using HTTP+S proxies when connecting to an
HTTPS URL. While not ideal, we can't solve this problem, and the warning logs are quite annoying.
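A sketch of how such pool sizing and blocking can be configured with requests/urllib3; the values mirror the ones described above, while the session setup itself is illustrative rather than Devine's exact code:

```python
import requests
from requests.adapters import HTTPAdapter

session = requests.Session()
# pool_maxsize=16 * 2 keeps up to 32 connections per host pool, and
# pool_block=True makes callers wait for a free connection instead of
# spawning extra ones and logging Connection Pool warnings.
adapter = HTTPAdapter(pool_connections=16 * 2, pool_maxsize=16 * 2, pool_block=True)
session.mount("https://", adapter)
session.mount("http://", adapter)
```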
### Removed
- Remove the `byte_range` parameter from the aria2(c) downloader that was added in v1.3.0 as it turns out it doesn't
actually work. Theoretically it should, but it seems aria2(c) doesn't honor the Range header correctly and fails.
### Fixed
- Fix the JOC check on HLS playlists to check if audio channels are defined first.
- Fix decryption of AES-encrypted segments that are not pre-padded to AES-CBC boundary size (16 bytes).
- Fix the order of segment merging on Linux machines. On Windows, the `pathlib.iterdir()` function is always in order.
However, on Linux, or at least some machines, this was not the case (see the sketch after this list).
- Fix printing of the traceback when a download worker raises an unexpected exception.
- Fix initial creation of the config file if none was created yet.
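The merge-order fix boils down to never relying on directory iteration order; a minimal sketch, assuming zero-padded segment filenames so lexicographic order matches numeric order:

```python
from pathlib import Path


def segments_in_order(folder: Path) -> list[Path]:
    # pathlib.Path.iterdir() gives no ordering guarantee, so sort explicitly
    # before concatenating segment files.
    return sorted(folder.iterdir())
```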
## [1.3.1] - 2023-02-23
### Fixed
- Fixed a regression where the `track.path` was only updated for `Descriptor.URL` downloads if it had DRM. This caused
downloads of subtitles or DRM-free tracks using the `URL` descriptor to be broken (#33).
- Fixed a regression where `title` and `track` were not passed to the Service's functions for getting Widevine Service
Certificates and Widevine Licenses.
- Corrected the Cookie Path that was logged when adding cookies with `devine auth add`.
- The Config data is now defaulted to an empty dictionary when completely empty or non-existent. This fixes a crash if
you try to use `devine auth add` without a config file.
## [1.3.0] - 2023-02-22
### Deprecated
- Support for Python 3.8 has been dropped. Support for Windows 7 ended in January 2020.
- Although Python 3.8 is the last version with support for Windows 7, the decision was made to drop support because
the number of affected users would be low.
- You may be interested in <https://github.com/adang1345/PythonWin7>, which has newer installers with patched support.
### Added
- Segmented HLS and DASH downloads now provide useful progress information using TQDM. Previously, aria2c would print
progress information, but it was not very useful for segmented downloads due to how the information was presented.
- Segmented HLS and DASH downloads are now manually multi-threaded in a similar way to aria2c's `--j=16`.
- A class-function was added to the Widevine DRM class to obtain PSSH and KID information from init data by looking for
PSSH and TENC boxes. This is an alternative to the from_track class-function when you only have the init data and not
a track object.
- Aria2c now has the ability to silence progress output and provide extra arguments.
### Changed
- The downloading system for HLS and DASH has been completely reworked. It no longer downloads segments, merges them,
and then decrypts. Instead, it now downloads and decrypts each individual segment. It dynamically switches DRM and
Init Data per-segment where needed, fully supporting multiple EXT-X-KEY, EXT-X-MAP, and EXT-X-DISCONTINUITY tags in
HLS. You can now download DRM-encrypted and DRM-free segments from within the same manifest, as well as manifests
with unique DRM per-segment. None of this was possible with the old method of downloading.
- If a HLS manifest or segment uses an EXT-X-KEY with the method of NONE, it is assumed that the manifest or segment is
DRM-free. This behavior applies even if the manifest or segment has other EXT-X-KEY methods specified, as that would
be a mistake in the manifest.
- HLS now uses the proxy when loading AES-128 DRM as ClearKey objects, which is required for some services. It will
only be used if `Track.needs_proxy` is True.
- The Widevine and ClearKey DRM classes' decrypt functions no longer ask for a track. Instead, they ask for an input
file path to which it will decrypt. It will automatically delete the input file and put the decrypted data in its
place.
### Removed
- The AtomicSQL utility was removed because it did not actually assist in making the SQL connections thread-safe. It
helped, but in an almost backwards and over-thought approach.
### Fixed
- The Cacher expiration check now uses your local datetime timestamp over the UTC timestamp, which seems to have fixed
early or late expiration if you are not at exactly UTC+00:00.
- The cookies file path is now checked to exist if supplied with the `--cookies` argument (#30).
- An error is now logged, and execution will end if none of the DRM for a HLS manifest or segment is supported.
- HLS now only loads AES-128 EXT-X-KEY methods as ClearKey DRM because it currently only supports AES-128.
- AtomicSQL was replaced with connection factory systems using thread-safe storage for SQL connections. All Vault SQL
calls are now fully thread-safe.
## [1.2.0] - 2023-02-13
### Deprecation Warning
- This release marks the end of support for Python 3.8.x.
- Although version 1.0.0 was intended to support Python 3.8.x, PyCharm failed to warn about a specific type annotation
incompatibility. As a result, I was not aware that the support was not properly implemented.
- This release adds full support for Python 3.8.x, but it will be the only release with such support.
### Added
- The `dl` command CLI now includes Bitrate Selection options: `-vb/--vbitrate` and `-ab/--abitrate`.
- The `dl` command CLI now includes an Audio Channels Selection option: `-c/--channels`.
- If a download worker fails abruptly, a full traceback will now be printed.
- The aria2c downloader has a new parameter for downloading a specific byte range.
### Changed
- The usage of `Path.with_stem` with `Path.with_suffix` has been simplified to `Path.with_name`.
- When printing audio track information, the assumption that the audio is `2.0ch` has been removed.
- If audio channels were previously set as an integer value, they are no longer transformed as e.g., `6ch` and now
follow the normal behavior of being defined as a float value, e.g., `6.0`.
- Audio channels are now explicitly parsed as float values, therefore parsing of values such as `16/JOC` (HLS) is no
longer supported. The HLS manifest parser now assumes the track to be `5.1ch` if the channels value is set to
`.../JOC`.
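A hedged sketch of the channel-parsing behaviour described above (plain values parsed as floats, `.../JOC` values assumed to be a 5.1 bed); the helper name is illustrative:

```python
def parse_channels(value: str) -> float:
    """Parse an HLS CHANNELS value such as "2", "6", or "16/JOC" into a float count."""
    if value.endswith("/JOC"):
        # Dolby Digital Plus with Joint Object Coding; assume a 5.1 channel bed.
        return 5.1
    return float(value)


assert parse_channels("16/JOC") == 5.1
assert parse_channels("6") == 6.0
```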
### Fixed
- Support for Python `>=3.8.6,<3.9.0` has been fixed.
- The final fallback FPS value is now only obtained from the SegmentBase's timescale value if it exists.
- The FutureWarning that occurred when getting Segment URLs from SegmentTemplate DASH manifests has been removed.
- The HLS manifest parser now correctly sets the audio track's `joc` parameter.
- Some Segmented WEBVTT streams may have included the WEBVTT header data when converting to SubRip SRT. This issue has
been fixed by separating the header from any previous caption before conversion.
- The DASH manifest parser now uses the final redirected URL as the manifest URI (#25).
- File move operations from or to different drives (e.g., importing a cookie from another drive in `auth add`) (#27).
### New Contributors
- [Arias800](https://github.com/Arias800)
- [varyg1001](https://github.com/varyg1001)
## [1.1.0] - 2023-02-07
### Added
- Added utility to change the video range flag between full(pc) and limited(tv).
- Added utility to test decoding of video and audio streams using FFmpeg.
- Added CHANGELOG.md
### Changed
- The services and profiles listed by `auth list` are now sorted alphabetically.
- An explicit error is now logged when adding a Cookie to a Service under a duplicate name.
### Fixed
- Corrected the organization name across the project from `devine` to `devine-dl` as `devine` was taken.
- Fixed startup crash if the config was not yet created or was blank.
- Fixed crash when using the `cfg` command to set a config option on new empty config files.
- Fixed crash when loading key vaults during the `dl` command.
- Fixed crash when using the `auth list` command when you do not have a `Cookies` data directory.
- Fixed crash when adding a Cookie using `auth add` to a Service that has no directory yet.
- Fixed crash when adding a Credential using `auth add` when it's the first ever credential, or first for the Service.
## [1.0.0] - 2023-02-06
Initial public release under the name Devine.
[3.3.3]: https://github.com/devine-dl/devine/releases/tag/v3.3.3
[3.3.2]: https://github.com/devine-dl/devine/releases/tag/v3.3.2
[3.3.1]: https://github.com/devine-dl/devine/releases/tag/v3.3.1
[3.3.0]: https://github.com/devine-dl/devine/releases/tag/v3.3.0
[3.2.0]: https://github.com/devine-dl/devine/releases/tag/v3.2.0
[3.1.0]: https://github.com/devine-dl/devine/releases/tag/v3.1.0
[3.0.0]: https://github.com/devine-dl/devine/releases/tag/v3.0.0
[2.2.0]: https://github.com/devine-dl/devine/releases/tag/v2.2.0
[2.1.0]: https://github.com/devine-dl/devine/releases/tag/v2.1.0
[2.0.1]: https://github.com/devine-dl/devine/releases/tag/v2.0.1
[2.0.0]: https://github.com/devine-dl/devine/releases/tag/v2.0.0
[1.4.0]: https://github.com/devine-dl/devine/releases/tag/v1.4.0
[1.3.1]: https://github.com/devine-dl/devine/releases/tag/v1.3.1
[1.3.0]: https://github.com/devine-dl/devine/releases/tag/v1.3.0
[1.2.0]: https://github.com/devine-dl/devine/releases/tag/v1.2.0
[1.1.0]: https://github.com/devine-dl/devine/releases/tag/v1.1.0
[1.0.0]: https://github.com/devine-dl/devine/releases/tag/v1.0.0

175
CONFIG.md
View File

@@ -10,6 +10,13 @@ which does not keep comments.
## aria2c (dict)
- `max_concurrent_downloads`
Maximum number of parallel downloads. Default: `min(32,(cpu_count+4))`
Note: Overrides the `max_workers` parameter of the aria2(c) downloader function.
- `max_connection_per_server`
Maximum number of connections to one server for each download. Default: `1`
- `split`
Split a file into N chunks and download each chunk on its own connection. Default: `5`
- `file_allocation`
Specify file allocation method. Default: `"prealloc"`
@@ -59,27 +66,52 @@ DSNP:
default: chromecdm_903_l3
```
## credentials (dict)
## chapter_fallback_name (str)
Specify login credentials to use for each Service by Profile as Key (case-sensitive).
The Chapter Name to use when exporting a Chapter without a Name.
The default is no fallback name at all and no Chapter name will be set.
The value should be `email:password` or `username:password` (with some exceptions).
The first section does not have to be an email or username. It may also be a Phone number.
The fallback name can use the following variables in f-string style:
- `{i}`: The Chapter number starting at 1.
E.g., `"Chapter {i}"`: "Chapter 1", "Intro", "Chapter 3".
- `{j}`: A number starting at 1 that increments any time a Chapter has no title.
E.g., `"Chapter {j}"`: "Chapter 1", "Intro", "Chapter 2".
These are formatted with f-strings, directives are supported.
For example, `"Chapter {i:02}"` will result in `"Chapter 01"`.
## credentials (dict[str, str|list|dict])
Specify login credentials to use for each Service, and optionally per-profile.
For example,
```yaml
AMZN:
ALL4: jane@gmail.com:LoremIpsum100 # directly
AMZN: # or per-profile, optionally with a default
default: jane@example.tld:LoremIpsum99 # <-- used by default if -p/--profile is not used
james: james@gmail.com:TheFriend97
jane: jane@example.tld:LoremIpsum99
john: john@example.tld:LoremIpsum98
NF:
NF: # the `default` key is not necessary, but no credential will be used by default
john: john@gmail.com:TheGuyWhoPaysForTheNetflix69420
```
Credentials must be specified per-profile. You cannot specify a fallback or default credential.
The value should be in string form, i.e. `john@gmail.com:password123` or `john:password123`.
Any arbitrary values can be used on the left (username/password/phone) and right (password/secret).
You can also specify these in list form, i.e., `["john@gmail.com", ":PasswordWithAColon"]`.
If you specify multiple credentials with keys like the `AMZN` and `NF` example above, then you should
use a `default` key or no credential will be loaded automatically unless you use `-p/--profile`. You
do not have to use a `default` key at all.
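For example, list form can make a password containing a `:` unambiguous (illustrative values):
```yaml
AMZN:
  default: ["john@gmail.com", ":PasswordWithAColon"]  # password begins with a ':'
  james: james@gmail.com:TheFriend97                  # plain string form still works
```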
Please be aware that this information is sensitive and should be kept safe. Do not share your config.
## curl_impersonate (dict)
- `browser` - The Browser to impersonate as. A list of available Browsers and Versions are listed here:
<https://github.com/yifeikong/curl_cffi#sessions>
## directories (dict)
Override the default directories used across devine.
@@ -90,6 +122,7 @@ The following directories are available and may be overridden,
- `commands` - CLI Command Classes.
- `services` - Service Classes.
- `vaults` - Vault Classes.
- `fonts` - Font files (ttf or otf).
- `downloads` - Downloads.
- `temp` - Temporary files or conversions during download.
- `cache` - Expiring data like Authorization tokens, or other misc data.
@@ -120,7 +153,14 @@ For example to set the default primary language to download to German,
lang: de
```
or to set `--bitrate=CVBR` for the AMZN service,
to set how many tracks to download concurrently to 4 and download threads to 16,
```yaml
downloads: 4
workers: 16
```
to set `--bitrate=CVBR` for the AMZN service,
```yaml
lang: de
@@ -128,6 +168,26 @@ AMZN:
bitrate: CVBR
```
or to change the output subtitle format from the default (original format) to WebVTT,
```yaml
sub_format: vtt
```
## downloader (str)
Choose what software to use to download data throughout Devine where needed.
Options:
- `requests` (default) - https://github.com/psf/requests
- `aria2c` - https://github.com/aria2/aria2
- `curl_impersonate` - https://github.com/yifeikong/curl-impersonate (via https://github.com/yifeikong/curl_cffi)
Note that aria2c can reach the highest speeds as it utilizes threading and more connections than the other
downloaders. However, aria2c can also be one of the more unstable downloaders. It will work one day, then
not another day. It also does not support HTTP(S) proxies while the other downloaders do.
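For example, to switch from the default downloader to aria2c,
```yaml
downloader: aria2c
```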
## headers (dict)
Case-Insensitive dictionary of headers that all Services begin their Request Session state with.
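For example (the header names and values here are purely illustrative),
```yaml
Accept-Language: en-US,en;q=0.8
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:118.0) Gecko/20100101 Firefox/118.0
```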
@@ -155,12 +215,28 @@ provide the same Key ID and CEK for both Video and Audio, as well as for multipl
You can have as many Key Vaults as you would like. It's nice to share Key Vaults or use a unified Vault on
Teams as sharing CEKs immediately can help reduce License calls drastically.
Two types of Vaults are in the Core codebase, SQLite and MySQL Vaults. Both directly connect to an SQLite or MySQL
Server. It has to connect directly to the Host/IP. It cannot be in front of a PHP API or such. Beware that some Hosts
do not let you access the MySQL server outside their intranet (aka Don't port forward or use permissive network
interfaces).
Three types of Vaults are in the Core codebase, API, SQLite and MySQL. API makes HTTP requests to a RESTful API,
whereas SQLite and MySQL directly connect to an SQLite or MySQL Database.
### Connecting to a MySQL Vault
Note: SQLite and MySQL vaults have to connect directly to the Host/IP. It cannot be in front of a PHP API or such.
Beware that some Hosting Providers do not let you access the MySQL server from outside their intranet, i.e., it may
not be accessible outside their hosting platform.
### Using an API Vault
API vaults use a specific HTTP request format, therefore API or HTTP Key Vault APIs from other projects or services may
not work in Devine. The API format can be seen in the [API Vault Code](devine/vaults/API.py).
```yaml
- type: API
name: "John#0001's Vault" # arbitrary vault name
uri: "https://key-vault.example.com" # api base uri (can also be an IP or IP:Port)
# uri: "127.0.0.1:80/key-vault"
# uri: "https://api.example.com/key-vault"
token: "random secret key" # authorization token
```
### Using a MySQL Vault
MySQL vaults can be either MySQL or MariaDB servers. I recommend MariaDB.
A MySQL Vault can be on a local or remote network, but I recommend SQLite for local Vaults.
@@ -186,7 +262,7 @@ make tables yourself.
- You may give trusted users CREATE permission so devine can create tables if needed.
- Other users should only be given SELECT and INSERT permissions.
### Connecting to an SQLite Vault
### Using an SQLite Vault
SQLite Vaults are usually only used for locally stored vaults. This vault may be stored on a mounted Cloud storage
drive, but I recommend using SQLite exclusively as an offline-only vault. Effectively this is your backup vault in
@@ -211,7 +287,33 @@ together.
- `set_title`
Set the container title to `Show SXXEXX Episode Name` or `Movie (Year)`. Default: `true`
## nordvpn (dict)
## proxy_providers (dict)
Enable external proxy provider services. These proxies will be used automatically where needed as defined by the
Service's GEOFENCE class property, but can also be explicitly used with `--proxy`. You can specify which provider
to use by prefixing it with the provider key name, e.g., `--proxy basic:de` or `--proxy nordvpn:de`. Some providers
support specific query formats for selecting a country/server.
### basic (dict[str, str|list])
Define a mapping of country to proxy to use where required.
The keys are region Alpha 2 Country Codes. Alpha 2 Country Codes are `[a-z]{2}` codes, e.g., `us`, `gb`, and `jp`.
Don't get this mixed up with language codes like `en` vs. `gb`, or `ja` vs. `jp`.
Do note that each key's value can be a list of strings, or a string. For example,
```yaml
us:
- "http://john%40email.tld:password123@proxy-us.domain.tld:8080"
- "http://jane%40email.tld:password456@proxy-us.domain2.tld:8080"
de: "https://127.0.0.1:8080"
```
Note that if multiple proxies are defined for a region, then by default one will be randomly chosen.
You can choose a specific one by specifying its number, e.g., `--proxy basic:us2` will choose the
second proxy of the US list.
### nordvpn (dict)
Set your NordVPN Service credentials with `username` and `password` keys to automate the use of NordVPN as a Proxy
system where required.
@@ -236,47 +338,6 @@ You can even set a specific server number this way, e.g., `--proxy=gb2366`.
Note that `gb` is used instead of `uk` to be more consistent across regional systems.
## profiles (dict)
Pre-define Profiles to use Per-Service.
For example,
```yaml
AMZN: jane
DSNP: john
```
You can also specify a fallback value to pre-define if a match was not made.
This can be done using `default` key. This can help reduce redundancy in your specifications.
```yaml
AMZN: jane
DSNP: john
default: james
```
If a Service doesn't require a profile (as it does not require Credentials or Authorization of any kind), you can
disable the profile checks by specifying `false` as the profile for the Service.
```yaml
ALL4: false
CTV: false
```
## proxies (dict)
Define a list of proxies to use where required.
The keys are region Alpha 2 Country Codes. Alpha 2 Country Codes are `a-z{2}` codes, e.g., `us`, `gb`, and `jp`.
Don't get mixed up between language codes like `en` vs. `gb`, or `ja` vs. `jp`.
For example,
```yaml
us: "http://john%40email.tld:password123@proxy-us.domain.tld:8080"
de: "http://127.0.0.1:8888"
```
## remote_cdm (list\[dict])
Use [pywidevine] Serve-compliant Remote CDMs in devine as if it was a local widevine device file.

49
CONTRIBUTING.md Normal file
View File

@@ -0,0 +1,49 @@
# Development
This project is managed using [Poetry](https://python-poetry.org), a fantastic Python packaging and dependency manager.
Install the latest version of Poetry before continuing. Development currently requires Python 3.9+.
## Set up
Starting from Zero? Not sure where to begin? Here are the steps for setting up this Python project using Poetry. Note that
Poetry installation instructions should be followed from the Poetry Docs: https://python-poetry.org/docs/#installation
1. While optional, it's recommended to configure Poetry to install Virtual environments within project folders:
```shell
poetry config virtualenvs.in-project true
```
This makes it easier for Visual Studio Code, as well as other IDEs and systems, to detect the Virtual Environment.
I've also had issues with Poetry creating duplicate Virtual environments in the default folder for an unknown
reason which quickly filled up my System storage.
2. Clone the Repository:
```shell
git clone https://github.com/devine-dl/devine
cd devine
```
3. Install the Project with Poetry:
```shell
poetry install
```
This creates a Virtual environment and then installs all project dependencies and executables into the Virtual
environment. Your System Python environment is not affected at all.
4. Now activate the Virtual environment:
```shell
poetry shell
```
Note:
- You can alternatively just prefix `poetry run` to any command you wish to run under the Virtual environment.
- I recommend entering the Virtual environment; all further instructions assume you did.
- JetBrains PyCharm has integrated support for Poetry and automatically enters Poetry Virtual environments, assuming
the Python Interpreter on the bottom right is set up correctly.
- For more information, see: https://python-poetry.org/docs/basic-usage/#using-your-virtual-environment
5. Install Pre-commit tooling to ensure safe and quality commits:
```shell
pre-commit install
```
## Building Source and Wheel distributions
```shell
poetry build
```
You can optionally specify `-f` to build `sdist` or `wheel` only.
Built files can be found in the `/dist` directory.

View File

@@ -671,4 +671,4 @@ into proprietary programs. If your program is a subroutine library, you
may consider it more useful to permit linking proprietary applications with
the library. If this is what you want to do, use the GNU Lesser General
Public License instead of this License. But first, please read
<https://www.gnu.org/licenses/why-not-lgpl.html>.
<https://www.gnu.org/licenses/why-not-lgpl.html>.

396
README.md
View File

@@ -1,181 +1,286 @@
<p align="center">
<img src="https://rawcdn.githack.com/rlaphoenix/pywidevine/077a3aa6bec14777c06cbdcb47041eee9791c06e/docs/images/widevine_icon_24.png">
<a href="https://github.com/devine/devine">Devine</a>
<img src="https://user-images.githubusercontent.com/17136956/216880837-478f3ec7-6af6-4cca-8eef-5c98ff02104c.png">
<a href="https://github.com/devine-dl/devine">Devine</a>
<br/>
<sup><em>Open-Source Movie, TV, and Music Downloading Solution</em></sup>
<sup><em>Modular Movie, TV, and Music Archival Software</em></sup>
<br/>
<a href="https://discord.gg/34K2MGDrBN">
<img src="https://img.shields.io/discord/841055398240059422?label=&logo=discord&logoColor=ffffff&color=7289DA&labelColor=7289DA" alt="Discord">
</a>
</p>
<p align="center">
<a href="https://github.com/devine/devine/actions/workflows/ci.yml">
<img src="https://github.com/devine/devine/actions/workflows/ci.yml/badge.svg" alt="Build status">
<a href="https://github.com/devine-dl/devine/actions/workflows/ci.yml">
<img src="https://github.com/devine-dl/devine/actions/workflows/ci.yml/badge.svg" alt="Build status">
</a>
<a href="https://python.org">
<img src="https://img.shields.io/badge/python-3.8.6%2B-informational" alt="Python version">
<img src="https://img.shields.io/badge/python-3.9.0%2B-informational" alt="Python version">
</a>
<a href="https://deepsource.io/gh/devine-dl/devine/?ref=repository-badge">
<img src="https://deepsource.io/gh/devine-dl/devine.svg/?label=active+issues&token=1ADCbjJ3FPiGT_s0Y0rlugGU" alt="DeepSource">
</a>
<br/>
<a href="https://github.com/astral-sh/ruff">
<img src="https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json" alt="Linter: Ruff">
</a>
<a href="https://python-poetry.org">
<img src="https://img.shields.io/endpoint?url=https://python-poetry.org/badge/v0.json" alt="Dependency management: Poetry">
</a>
</p>
## Features
- 🎥 Supports Movies, TV shows, and Music
- 🧩 Easy installation via PIP/PyPI
- 👥 Multi-profile authentication per-service with credentials or cookies
- 🚀 Seamless Installation via [pip](#installation)
- 🎥 Movie, Episode, and Song Service Frameworks
- 🛠️ Built-in [DASH] and [HLS] Parsers
- 🔒 Widevine DRM integration via [pywidevine](https://github.com/devine-dl/pywidevine)
- 💾 Local & Remote DRM Key-vaults
- 🌍 Local & Remote Widevine CDMs
- 👥 Multi-profile Authentication per-service with Credentials and/or Cookies
- 🤖 Automatic P2P filename structure with Group Tag
- 🛠️ Flexible Service framework system
- 📦 Portable Installations
- 🗃️ Local and Remote SQL-based Key Vault database
- ⚙️ YAML for Configuration
- 🌍 Local and Remote Widevine CDMs
- ❤️ Fully Open-Source! Pull Requests Welcome
[DASH]: <devine/core/manifests/dash.py>
[HLS]: <devine/core/manifests/hls.py>
## Installation
```shell
$ pip install devine
```
> __Note__ If you see warnings about a path not being in your PATH environment variable, add it, or `devine` won't run.
> [!NOTE]
> If pip gives you a warning about a path not being in your PATH environment variable then promptly add that path then
> close all open command prompt/terminal windows, or `devine` won't work as it will not be found.
Voilà 🎉! You now have the `devine` package installed and a `devine` executable is now available.
Check it out with `devine --help`!
Voilà 🎉 — You now have the `devine` package installed!
A command-line interface is now available, try `devine --help`.
### Dependencies
The following is a list of programs that need to be installed manually. I recommend installing these with [winget],
[chocolatey] or such where possible as it automatically adds them to your `PATH` environment variable and will be
easier to update in the future.
The following is a list of programs that need to be installed by you manually.
- [aria2(c)] for downloading streams and large manifests.
- [CCExtractor] for extracting Closed Caption data like EIA-608 from video streams and converting as SRT.
- [FFmpeg] (and ffprobe) for repacking/remuxing streams on specific services, and evaluating stream data.
- [MKVToolNix] v54+ for muxing individual streams to an `.mkv` file.
- [shaka-packager] for decrypting CENC-CTR and CENC-CBCS video and audio streams.
- (optional) [aria2(c)] to use as a [downloader](CONFIG.md#downloader-str).
For portable downloads, make sure you put them in your current working directory, in the installation directory,
or put the directory path in your `PATH` environment variable. If you do not do this then their binaries will not
be found.
> [!TIP]
> You should install these from a Package Repository if you can; including winget/chocolatey on Windows. They will
> automatically add the binary's path to your `PATH` environment variable and will be easier to update in the future.
> [!IMPORTANT]
> Most of these dependencies are portable utilities and therefore do not use installers. If you do not install them
> from a package repository like winget/choco/pacman then make sure you put them in your current working directory, in
> Devine's installation directory, or the binary's path into your `PATH` environment variable. If you do not do this
> then Devine will not be able to find the binaries.
[winget]: <https://winget.run>
[chocolatey]: <https://chocolatey.org>
[aria2(c)]: <https://aria2.github.io>
[CCExtractor]: <https://github.com/CCExtractor/ccextractor>
[FFmpeg]: <https://fmpeg.org>
[FFmpeg]: <https://ffmpeg.org>
[MKVToolNix]: <https://mkvtoolnix.download/downloads.html>
[shaka-packager]: <https://github.com/google/shaka-packager/releases/latest>
### Portable installation
## Usage
1. Download a Python Embeddable Package of a supported Python version (the `.zip` download).
(make sure it's either x64/x86 and not ARM unless you're on an ARM device).
2. Extract the `.zip` and rename the folder, if you wish.
3. Open Terminal and `cd` to the extracted folder.
4. Run the following on Windows:
```
(Invoke-WebRequest -Uri https://gist.githubusercontent.com/rlaphoenix/5ef250e61ceeb123c6696c05ad4dee8b/raw -UseBasicParsing).Content | .\python -
```
or the following on Linux/macOS:
```
curl -sSL https://gist.githubusercontent.com/rlaphoenix/5ef250e61ceeb123c6696c05ad4dee8b/raw | ./python -
```
5. Run `.\python -m pip install devine`
First, take a look at `devine --help` for a full help document, listing all commands available and giving you more
information on what can be done with Devine.
You can now call `devine` by,
Here's a checklist on what I recommend getting started with, in no particular order,
- running `./python -m devine --help`, or,
- running `./Scripts/devine.exe --help`, or,
- symlinking the `/Scripts/devine.exe` binary to the root of the folder, for `./devine --help`, or,
- zipping the entire folder to `devine.zip`, for `python devine.zip --help`.
- [ ] Add [Services](#services), these will be used in `devine dl`.
- [ ] Add [Profiles](#profiles-cookies--credentials), these are your cookies and credentials.
- [ ] Add [Widevine Provisions](#widevine-provisions), also known as CDMs, these are used for DRM-protected content.
- [ ] Set your Group Tag, the text at the end of the final filename, e.g., `devine cfg tag NOGRP` for `...-NOGRP`.
- [ ] Set Up a Local Key Vault, take a look at the [Key Vaults Config](CONFIG.md#keyvaults-listdict).
The last method of calling devine, by archiving to a zip file, is incredibly useful for sharing and portability!
I urge you to give it a try!
And here's some more advanced things you could take a look at,
### Services
- [ ] Setting default Headers that the Request Session uses.
- [ ] Setting default Profiles and CDM Provisions to use for services.
- [ ] NordVPN and Hola Proxy Providers for automatic proxies.
- [ ] Hosting and/or Using Remote Key Vaults.
- [ ] Serving and/or Using Remote CDM Provisions.
Devine does not come with any infringing Service code. You must develop your own Service code and place it in
the `/devine/services` directory. There are different ways to add services depending on your installation type.
In some cases you may use multiple of these methods to have separate copies.
Documentation on the config is available in the [CONFIG.md](CONFIG.md) file, it has a lot of handy settings.
If you start to get sick of putting something in your CLI call, then I recommend taking a look at it!
Please refrain from making or using Service code unless you have full rights to do so. I also recommend ensuring that
you keep the Service code private and secure, i.e. a private repository or keeping it offline.
## Services
No matter which method you use, make sure that you install any further dependencies needed by the services. There's
currently no way to have these dependencies automatically install apart from within the Fork method.
Unlike similar projects such as [youtube-dl], Devine does not currently come with any Services. You must develop your
own Services and only use Devine with Services you have the legal right to use.
> __Warning__ Please be careful with who you trust and what you run. The users you collaborate with on Service
> [!NOTE]
> If you made a Service for Devine that does not use Widevine or any other DRM systems, feel free to make a Pull Request
> and make your service available to others. Any Service on [youtube-dl] (or [yt-dlp]) would be able to be added to the
> Devine repository as they both use the [Unlicense license] therefore direct reading and porting of their code would be
> legal.
[youtube-dl]: <https://github.com/ytdl-org/youtube-dl>
[yt-dlp]: <https://github.com/yt-dlp/yt-dlp>
[Unlicense license]: <https://choosealicense.com/licenses/unlicense>
### Creating a Service
> [!WARNING]
> Only create or use Service Code with Services you have full legal right to do so.
A Service consists of a folder with an `__init__.py` file. The file must contain a class of the same name as the folder.
The class must inherit the [Service] class and implement all the abstracted methods. It must finally implement a new
method named `cli` where you define CLI arguments.
1. Make a new folder within `/devine/services`. The folder name you choose will be what's known as the [Service Tag].
This "tag" is used in the final output filename of downloaded files, for various code-checks, lookup keys in
key-vault databases, and more.
2. Within the new folder create an `__init__.py` file and write a class inheriting the [Service] class. It must be named
the exact same as the folder. It is case-sensitive.
3. Implement all the methods of the Service class you are inheriting that are marked as abstract.
4. Define CLI arguments by implementing a `cli` method. This method must be static (i.e. `@staticmethod`). For example
to implement the bare minimum to receive a Title ID of sorts:
```python
@staticmethod
@click.command(name="YT", short_help="https://youtube.com", help=__doc__)
@click.argument("title", type=str)
@click.pass_context
def cli(ctx, **kwargs):
return YT(ctx, **kwargs)
```
You must implement this `cli` method, even if you do not want or need any CLI arguments. It is required for the core
CLI functionality to be able to find and call the class.
5. Accept the CLI arguments by overriding the constructor (the `__init__()` method):
```python
def __init__(self, ctx, title):
self.title = title
super().__init__(ctx) # important
# ... the title is now available across all methods by calling self.title
```
> [!NOTE]
> - All methods of your class inherited from `Service` marked as abstract (`@abstractmethod`) MUST be implemented by
> your class.
> - When overriding any method (e.g., `__init__()` method) you MUST super call it, e.g., `super().__init__()` at the
> top of the override. This does not apply to any abstract methods, as they are unimplemented.
> - If preparing your Requests Session with global headers or such, then you should override the `get_session` method,
> then modify `self.session`. Do not manually make `self.session` from scratch.
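For instance, a hedged sketch of such a `get_session` override (the base behaviour and exact signature here are assumptions):
```python
import requests

# a method on your Service class; sketch only, not the actual base implementation
def get_session(self) -> requests.Session:
    session = super().get_session()  # keep whatever the base Service sets up
    session.headers.update({"Accept-Language": "en-US,en;q=0.8"})  # example global header
    return session
```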
> [!TIP]
> 1. To make web requests use the `self.session` class instance variable, e.g. `self.session.get(url)`.
> 2. If you make a `config.yaml` file next to your `__init__.py`, you can access it with `self.config`.
> 3. You can include any arbitrary file within your Service folder for use by your Service. For example TLS certificate
> files, or other python files with helper functions and classes.
[Service]: <devine/core/service.py>
[Service Tag]: <#service-tags>
### Service Tags
Service tags generally follow these rules:
- Tag must be between 2-4 characters long, consisting of just `[A-Z0-9i]{2,4}`.
- Lower-case `i` is only used for select services. Specifically BBC iPlayer and iTunes.
- If the Service's commercial name has a `+` or `Plus`, the last character should be a `P`.
E.g., `ATVP` for `Apple TV+`, `DSCP` for `Discovery+`, `DSNP` for `Disney+`, and `PMTP` for `Paramount+`.
These rules are not exhaustive and should only be used as a guide. You don't strictly have to follow these rules, but
I recommend doing so for consistency.
### Sharing Services
Sending and receiving zipped Service folders is quite cumbersome. Let's explore alternative routes to collaborating on
Service Code.
> [!WARNING]
> Please be careful with who you trust and what you run. The users you collaborate with on Service
> code could update it with malicious code that you would run via devine on the next call.
#### via Copy & Paste
#### Forking
If you have service code already and wish to just install and use it locally, then simply putting it into the Services
directory of your local pip installation will do the job. However, this method is the worst in terms of collaboration.
If you are collaborating with a team on multiple services then forking the project is the best way to go.
1. Get the installation directory by running the following in terminal,
`python -c 'import os,devine.__main__ as a;print(os.path.dirname(a.__file__))'`
2. Head to the installation directory and create a `services` folder if one is not yet created.
3. Within that `services` folder you may install or create service code.
1. Create a new Private GitHub Repository without README, .gitignore, or LICENSE files.
Note: Do NOT use the GitHub Fork button, or you will not be able to make the repository private.
2. `git clone <your repo url here>` and then `cd` into it.
3. `git remote add upstream https://github.com/devine-dl/devine`
4. `git remote set-url --push upstream DISABLE`
5. `git fetch upstream`
6. `git pull upstream master`
7. (optionally) Hard reset to the latest stable version by tag. E.g., `git reset --hard v1.0.0`.
> __Warning__ Uninstalling Python or Devine may result in the Services you installed being deleted. Make sure you back
> up the services before uninstalling.
Now commit your Services or other changes to your forked repository.
Once committed all your other team members can easily pull changes as well as push new changes.
#### via a Forked Repository
When a new update comes out you can easily rebase your fork to that commit to update.
If you are collaborating with a team on multiple services then forking the project is the best way to go. I recommend
forking the project then hard resetting to the latest stable update by tag. Once a new stable update comes out you can
easily rebase your fork to that commit to update.
1. `git fetch upstream`
2. `git rebase upstream/master`
However, please make sure you look at changes between each version before rebasing and resolve any breaking changes and
deprecations when rebasing to a new version.
1. Fork the project with `git` or GitHub [(fork)](https://github.com/devine/devine/fork).
2. Head inside the root `devine` directory and create a `services` directory.
3. Within that `services` folder you may install or create service code.
If you are new to `git` then take a look at [GitHub Desktop](https://desktop.github.com).
You may now commit changes or additions within that services folder to your forked repository.
Once committed all your other team members can easily sync and contribute changes.
> [!TIP]
> A huge benefit of this method is that you can also sync dependencies needed by your own Services!
> Just use `poetry` to add or modify dependencies appropriately and commit the changed `poetry.lock`.
> However, if the core project also has dependency changes your `poetry.lock` changes will conflict and you
> will need to learn how to do conflict resolution/rebasing. It is worth it though!
> __Note__ You may add Service-specific Python dependencies using `poetry` that can install alongside the project.
> Just do note that this will complicate rebasing when even the `poetry.lock` gets updates in the upstream project.
#### Symlinking
#### via Cloud storage (symlink)
This is a great option for those who wish to do something like the forking method, but without the need of constantly
rebasing their fork to the latest version. Overall less knowledge on git would be required, but each user would need
to do a bit of symlinking compared to the fork method.
This is a great option for those who wish to do something like the forking method, but may not care what changes
happened or when and just want changes synced across a team.
This also opens up the ways you can host or collaborate on Service code. As long as you can receive a directory that
updates with just the services within it, then you're good to go. Options could include an FTP server, Shared Google
Drive, a non-fork repository with just services, and more.
1. Follow the steps in the [Copy & Paste method](#via-copy--paste) to create the `services` folder.
2. Use any Cloud Source that gives you a pseudo-directory to access the Service files. E.g., rclone or google drive fs.
3. Symlink the services directory from your Cloud Source to the new services folder you made.
(you may need to delete it first)
1. Use any Cloud Source that gives you a pseudo-directory to access the Service files like a normal drive. E.g., rclone,
Google Drive Desktop (aka File Stream), Air Drive, CloudPool, etc.
2. Create a `services` directory somewhere in it and have all your services within it.
3. [Symlink](https://en.wikipedia.org/wiki/Symbolic_link) the `services` directory to the `/devine` folder. You should
end up with `/devine/services` folder containing services, not `/devine/services/services`.
Of course, you have to make sure the original folder keeps receiving and downloading/streaming those changes, or that
you keep git pulling those changes. You must also make sure that the version of devine you have locally is supported by
the Services code.
You have to make sure the original folder keeps receiving and downloading/streaming those changes. You must also make
sure that the version of devine you have locally is supported by the Service code.
> __Note__ If you're using a cloud source that downloads the file once it gets opened, you don't have to worry as those
> will automatically download. Python importing the files triggers the download to begin. However, it may cause a delay
> on startup.
> [!NOTE]
> If you're using a cloud source that downloads the file once it gets opened, you don't have to worry as those will
> automatically download. Python importing the files triggers the download to begin. However, it may cause a delay on
> startup.
### Profiles (Cookies & Credentials)
## Cookies & Credentials
Just like a streaming service, devine associates both a cookie and/or credential as a Profile. You can associate up to
one cookie and one credential per-profile, depending on which (or both) are needed by the Service. This system allows
you to configure multiple accounts per-service and choose which to use at any time.
Devine can authenticate with Services using Cookies and/or Credentials. Credentials are stored in the config, and
Cookies are stored in the data directory which can be found by running `devine env info`.
Credentials are stored in the config, and Cookies are stored in the data directory. You can find the location of these
by running `devine env info`. However, you can manage profiles with `devine auth --help`. E.g. to add a new John
profile to Netflix with a Cookie and Credential, take a look at the following CLI call,
`devine auth add John NF --cookie "C:\Users\John\Downloads\netflix.com.txt" --credential "john@gmail.com:pass123"`
To add a Credential to a Service, take a look at the [Credentials Config](CONFIG.md#credentials-dictstr-strlistdict)
for information on setting up one or more credentials per-service. You can add one or more Credential per-service and
use `-p/--profile` to choose which Credential to use.
You can also delete a credential with `devine auth delete`. E.g., to delete the cookie for John that we just added, run
`devine auth delete John --cookie`. Take a look at `devine auth delete --help` for more information.
To add a Cookie to a Service, use a Cookie file extension to make a `cookies.txt` file and move it into the Cookies
directory. You must rename the `cookies.txt` file to that of the Service tag (case-sensitive), e.g., `NF.txt`. You can
also place it in a Service Cookie folder, e.g., `/Cookies/NF/default.txt` or `/Cookies/NF/.txt`.
> __Note__ Profile names are case-sensitive and unique per-service. They also have no arbitrary character or length
> limit, but for convenience I don't recommend using any special characters as your terminal may get confused.
You can add multiple Cookies to the `/Cookies/NF/` folder with their own unique name and then use `-p/--profile` to
choose which one to use. E.g., `/Cookies/NF/sam.txt` and then use it with `--profile sam`. If you make a Service Cookie
folder without a `.txt` or `default.txt`, but with another file, then no Cookies will be loaded unless you use
`-p/--profile` like shown. This allows you to opt in to authentication at whim.
#### Cookie file format and Extensions
> [!TIP]
> - If your Service does not require Authentication, then do not define any Credential or Cookie for that Service.
> - You can use both Cookies and Credentials at the same time, so long as your Service takes and uses both.
> - If you are using profiles, then make sure you use the same name on the Credential name and Cookie file name when
> using `-p/--profile`.
> [!WARNING]
> Profile names are case-sensitive and unique per-service. They have no arbitrary character or length limit, but for
> convenience's sake I don't recommend using any special characters, as your terminal may get confused.
### Cookie file format and Extensions
Cookies must be in the standard Netscape cookies file format.
Recommended Cookie exporter extensions:
@@ -192,7 +297,7 @@ Any other extension that exports to the standard Netscape format should theoreti
> versions floating around (usually just older versions of the extension), but since there are safe alternatives I'd
> just avoid it altogether. Source: https://reddit.com/r/youtubedl/comments/10ar7o7
### Widevine Provisions
## Widevine Provisions
A Widevine Provision is needed for acquiring licenses containing decryption keys for DRM-protected content.
They are not needed if you will be using devine on DRM-free services. Please do not ask for any Widevine Device Files,
@@ -208,50 +313,9 @@ From here you can then set which WVD to use for each specific service. It's best
provision where possible.
An alternative would be using a pywidevine Serve-compliant CDM API. Of course, you would need to know someone who is
serving one, and they would need to give you access. Take a look at the [remote_cdm](CONFIG.md#remotecdm--listdict--)
serving one, and they would need to give you access. Take a look at the [remote_cdm](CONFIG.md#remotecdm-listdict)
config option for setup information. For further information on it see the pywidevine repository.
## Usage
First, take a look at `devine --help` for a full help document, listing all commands available and giving you more
information on what can be done with Devine.
Here's a checklist on what I recommend getting started with, in no particular order,
- [ ] Add [Services](#services), these will be used in `devine dl`.
- [ ] Add [Profiles](#profiles--cookies--credentials-), these are your cookies and credentials.
- [ ] Add [Widevine Provisions](#widevine-provisions), also known as CDMs, these are used for DRM-protected content.
- [ ] Set your Group Tag, the text at the end of the final filename, e.g., `devine cfg tag NOGRP` for ...-NOGRP.
- [ ] Set Up a Local Key Vault, take a look at the [Key Vaults Config](CONFIG.md#keyvaults--listdict--).
And here's some more advanced things you could take a look at,
- [ ] Setting default Headers that the Request Session uses.
- [ ] Setting default Profiles and CDM Provisions to use for services.
- [ ] NordVPN and Hola Proxy Providers for automatic proxies.
- [ ] Hosting and/or Using Remote Key Vaults.
- [ ] Serving and/or Using Remote CDM Provisions.
Documentation on the config is available in the [CONFIG.md](CONFIG.md) file, it has a lot of handy settings.
If you start to get sick of putting something in your CLI call, then I recommend taking a look at it!
## Development
The following steps are instructions on downloading, preparing, and running the code under a [Poetry] environment.
You can skip steps 3-5 with a simple `pip install .` call instead, but you miss out on a wide array of benefits.
1. `git clone https://github.com/devine/devine`
2. `cd devine`
3. (optional) `poetry config virtualenvs.in-project true`
4. `poetry install`
5. `poetry run devine --help`
As seen in Step 5, running the `devine` executable is somewhat different to a normal PIP installation.
See [Poetry's Docs] on various ways of making calls under the virtual-environment.
[Poetry]: <https://python-poetry.org>
[Poetry's Docs]: <https://python-poetry.org/docs/basic-usage/#using-your-virtual-environment>
## End User License Agreement
Devine and its community pages should be treated with the same kindness as other projects.
@@ -266,29 +330,27 @@ Please refrain from spam or asking for questions that infringe upon a Service's
back immediately.
5. Be kind to one another and do not single anyone out.
## Disclaimer
1. This project requires a valid Google-provisioned Private/Public Keypair and a Device-specific Client Identification
blob; neither of which are included with this project.
2. Public testing provisions are available and provided by Google to use for testing projects such as this one.
3. License Servers have the ability to block requests from any provision, and are likely already blocking test provisions
on production endpoints. They therefore have the ability to block the usage of Devine by themselves.
4. This project does not condone piracy or any action against the terms of the Service or DRM system.
5. All efforts in this project have been the result of Reverse-Engineering and Publicly available research.
## Credit
- Widevine Icon © Google.
- The awesome community for their shared research and insight into the Widevine Protocol and Key Derivation.
## Contributors
<a href="https://github.com/rlaphoenix"><img src="https://images.weserv.nl/?url=avatars.githubusercontent.com/u/17136956?v=4&h=25&w=25&fit=cover&mask=circle&maxage=7d" alt=""/></a>
<a href="https://github.com/mnmll"><img src="https://images.weserv.nl/?url=avatars.githubusercontent.com/u/22942379?v=4&h=25&w=25&fit=cover&mask=circle&maxage=7d" alt=""/></a>
<a href="https://github.com/shirt-dev"><img src="https://images.weserv.nl/?url=avatars.githubusercontent.com/u/2660574?v=4&h=25&w=25&fit=cover&mask=circle&maxage=7d" alt=""/></a>
<a href="https://github.com/nyuszika7h"><img src="https://images.weserv.nl/?url=avatars.githubusercontent.com/u/482367?v=4&h=25&w=25&fit=cover&mask=circle&maxage=7d" alt=""/></a>
<a href="https://github.com/bccornfo"><img src="https://images.weserv.nl/?url=avatars.githubusercontent.com/u/98013276?v=4&h=25&w=25&fit=cover&mask=circle&maxage=7d" alt=""/></a>
<a href="https://github.com/rlaphoenix"><img src="https://images.weserv.nl/?url=avatars.githubusercontent.com/u/17136956?v=4&h=25&w=25&fit=cover&mask=circle&maxage=7d" alt="rlaphoenix"/></a>
<a href="https://github.com/mnmll"><img src="https://images.weserv.nl/?url=avatars.githubusercontent.com/u/22942379?v=4&h=25&w=25&fit=cover&mask=circle&maxage=7d" alt="mnmll"/></a>
<a href="https://github.com/shirt-dev"><img src="https://images.weserv.nl/?url=avatars.githubusercontent.com/u/2660574?v=4&h=25&w=25&fit=cover&mask=circle&maxage=7d" alt="shirt-dev"/></a>
<a href="https://github.com/nyuszika7h"><img src="https://images.weserv.nl/?url=avatars.githubusercontent.com/u/482367?v=4&h=25&w=25&fit=cover&mask=circle&maxage=7d" alt="nyuszika7h"/></a>
<a href="https://github.com/bccornfo"><img src="https://images.weserv.nl/?url=avatars.githubusercontent.com/u/98013276?v=4&h=25&w=25&fit=cover&mask=circle&maxage=7d" alt="bccornfo"/></a>
<a href="https://github.com/Arias800"><img src="https://images.weserv.nl/?url=avatars.githubusercontent.com/u/24809312?v=4&h=25&w=25&fit=cover&mask=circle&maxage=7d" alt="Arias800"/></a>
<a href="https://github.com/varyg1001"><img src="https://images.weserv.nl/?url=avatars.githubusercontent.com/u/88599103?v=4&h=25&w=25&fit=cover&mask=circle&maxage=7d" alt="varyg1001"/></a>
<a href="https://github.com/Hollander-1908"><img src="https://images.weserv.nl/?url=avatars.githubusercontent.com/u/93162595?v=4&h=25&w=25&fit=cover&mask=circle&maxage=7d" alt="Hollander-1908"/></a>
<a href="https://github.com/Shivelight"><img src="https://images.weserv.nl/?url=avatars.githubusercontent.com/u/20620780?v=4&h=25&w=25&fit=cover&mask=circle&maxage=7d" alt="Shivelight"/></a>
<a href="https://github.com/knowhere01"><img src="https://images.weserv.nl/?url=avatars.githubusercontent.com/u/113712042?v=4&h=25&w=25&fit=cover&mask=circle&maxage=7d" alt="knowhere01"/></a>
<a href="https://github.com/retouching"><img src="https://images.weserv.nl/?url=avatars.githubusercontent.com/u/33735357?v=4&h=25&w=25&fit=cover&mask=circle&maxage=7d" alt="retouching"/></a>
<a href="https://github.com/pandamoon21"><img src="https://images.weserv.nl/?url=avatars.githubusercontent.com/u/33972938?v=4&h=25&w=25&fit=cover&mask=circle&maxage=7d" alt="pandamoon21"/></a>
<a href="https://github.com/adbbbb"><img src="https://images.weserv.nl/?url=avatars.githubusercontent.com/u/56319336?v=4&h=25&w=25&fit=cover&mask=circle&maxage=7d" alt="adbbbb"/></a>
## License
## Licensing
© 2019-2023 rlaphoenix — [GNU General Public License, Version 3.0](LICENSE)
This software is licensed under the terms of [GNU General Public License, Version 3.0](LICENSE).
You can find a copy of the license in the LICENSE file in the root folder.
* * *
© rlaphoenix 2019-2024

71
cliff.toml Normal file
View File

@@ -0,0 +1,71 @@
# git-cliff ~ default configuration file
# https://git-cliff.org/docs/configuration
[changelog]
header = """
# Changelog\n
All notable changes to this project will be documented in this file.
This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
Versions [3.0.0] and older use a format based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
but versions thereafter use a custom changelog format using [git-cliff](https://git-cliff.org).\n
"""
body = """
{% if version -%}
## [{{ version | trim_start_matches(pat="v") }}] - {{ timestamp | date(format="%Y-%m-%d") }}
{% else -%}
## [Unreleased]
{% endif -%}
{% for group, commits in commits | group_by(attribute="group") %}
### {{ group | striptags | trim | upper_first }}
{% for commit in commits %}
- {% if commit.scope %}*{{ commit.scope }}*: {% endif %}\
{% if commit.breaking %}[**breaking**] {% endif %}\
{{ commit.message | upper_first }}\
{% endfor %}
{% endfor %}\n
"""
footer = """
{% for release in releases -%}
{% if release.version -%}
{% if release.previous.version -%}
[{{ release.version | trim_start_matches(pat="v") }}]: \
https://github.com/{{ remote.github.owner }}/{{ remote.github.repo }}\
/compare/{{ release.previous.version }}..{{ release.version }}
{% endif -%}
{% else -%}
[unreleased]: https://github.com/{{ remote.github.owner }}/{{ remote.github.repo }}\
/compare/{{ release.previous.version }}..HEAD
{% endif -%}
{% endfor %}
"""
trim = true
postprocessors = [
# { pattern = '<REPO>', replace = "https://github.com/orhun/git-cliff" }, # replace repository URL
]
[git]
conventional_commits = true
filter_unconventional = true
split_commits = false
commit_preprocessors = []
commit_parsers = [
{ message = "^feat", group = "<!-- 0 -->Features" },
{ message = "^fix|revert", group = "<!-- 1 -->Bug Fixes" },
{ message = "^docs", group = "<!-- 2 -->Documentation" },
{ message = "^style", skip = true },
{ message = "^refactor", group = "<!-- 3 -->Changes" },
{ message = "^perf", group = "<!-- 4 -->Performance Improvements" },
{ message = "^test", skip = true },
{ message = "^build", group = "<!-- 5 -->Builds" },
{ message = "^ci", skip = true },
{ message = "^chore", skip = true },
]
protect_breaking_commits = false
filter_commits = false
# tag_pattern = "v[0-9].*"
# skip_tags = ""
# ignore_tags = ""
topo_order = false
sort_commits = "oldest"

View File

@@ -1,252 +0,0 @@
import logging
import tkinter.filedialog
from pathlib import Path
from typing import Optional
import click
from ruamel.yaml import YAML
from devine.core.config import Config, config
from devine.core.constants import context_settings
from devine.core.credential import Credential
@click.group(
short_help="Manage cookies and credentials for profiles of services.",
context_settings=context_settings)
@click.pass_context
def auth(ctx: click.Context) -> None:
"""Manage cookies and credentials for profiles of services."""
ctx.obj = logging.getLogger("auth")
@auth.command(
name="list",
short_help="List profiles and their state for a service or all services.",
context_settings=context_settings)
@click.argument("service", type=str, required=False)
@click.pass_context
def list_(ctx: click.Context, service: Optional[str] = None) -> None:
"""
List profiles and their state for a service or all services.
\b
Profile and Service names are case-insensitive.
"""
log = ctx.obj
service_f = service
profiles: dict[str, dict[str, list]] = {}
for cookie_dir in config.directories.cookies.iterdir():
service = cookie_dir.name
profiles[service] = {}
for cookie in cookie_dir.glob("*.txt"):
if cookie.stem not in profiles[service]:
profiles[service][cookie.stem] = ["Cookie"]
for service, credentials in config.credentials.items():
if service not in profiles:
profiles[service] = {}
for profile, credential in credentials.items():
if profile not in profiles[service]:
profiles[service][profile] = []
profiles[service][profile].append("Credential")
for service, profiles in profiles.items():
if service_f and service != service_f.upper():
continue
log.info(service)
for profile, authorizations in profiles.items():
log.info(f' "{profile}": {", ".join(authorizations)}')
@auth.command(
short_help="View profile cookies and credentials for a service.",
context_settings=context_settings)
@click.argument("profile", type=str)
@click.argument("service", type=str)
@click.pass_context
def view(ctx: click.Context, profile: str, service: str) -> None:
"""
View profile cookies and credentials for a service.
\b
Profile and Service names are case-sensitive.
"""
log = ctx.obj
service_f = service
profile_f = profile
found = False
for cookie_dir in config.directories.cookies.iterdir():
if cookie_dir.name == service_f:
for cookie in cookie_dir.glob("*.txt"):
if cookie.stem == profile_f:
log.info(f"Cookie: {cookie}")
log.debug(cookie.read_text(encoding="utf8").strip())
found = True
break
for service, credentials in config.credentials.items():
if service == service_f:
for profile, credential in credentials.items():
if profile == profile_f:
log.info(f"Credential: {':'.join(list(credential))}")
found = True
break
if not found:
raise click.ClickException(
f"Could not find Profile '{profile_f}' for Service '{service_f}'."
f"\nThe profile and service values are case-sensitive."
)
@auth.command(
short_help="Check what profile is used by services.",
context_settings=context_settings)
@click.argument("service", type=str, required=False)
@click.pass_context
def status(ctx: click.Context, service: Optional[str] = None) -> None:
"""
Check what profile is used by services.
\b
Service names are case-sensitive.
"""
log = ctx.obj
found_profile = False
for service_, profile in config.profiles.items():
if not service or service_.upper() == service.upper():
log.info(f"{service_}: {profile or '--'}")
found_profile = True
if not found_profile:
log.info(f"No profile has been explicitly set for {service}")
default = config.profiles.get("default", "not set")
log.info(f"The default profile is {default}")
@auth.command(
short_help="Delete a profile and all of its authorization from a service.",
context_settings=context_settings)
@click.argument("profile", type=str)
@click.argument("service", type=str)
@click.option("--cookie", is_flag=True, default=False, help="Only delete the cookie.")
@click.option("--credential", is_flag=True, default=False, help="Only delete the credential.")
@click.pass_context
def delete(ctx: click.Context, profile: str, service: str, cookie: bool, credential: bool):
"""
Delete a profile and all of its authorization from a service.
\b
By default this does remove both Cookies and Credentials.
You may remove only one of them with --cookie or --credential.
\b
Profile and Service names are case-sensitive.
Comments may be removed from config!
"""
log = ctx.obj
service_f = service
profile_f = profile
found = False
if not credential:
for cookie_dir in config.directories.cookies.iterdir():
if cookie_dir.name == service_f:
for cookie_ in cookie_dir.glob("*.txt"):
if cookie_.stem == profile_f:
cookie_.unlink()
log.info(f"Deleted Cookie: {cookie_}")
found = True
break
if not cookie:
for key, credentials in config.credentials.items():
if key == service_f:
for profile, credential_ in credentials.items():
if profile == profile_f:
config_path = Config._Directories.user_configs / Config._Filenames.root_config
yaml, data = YAML(), None
yaml.default_flow_style = False
data = yaml.load(config_path)
del data["credentials"][key][profile_f]
yaml.dump(data, config_path)
log.info(f"Deleted Credential: {credential_}")
found = True
break
if not found:
raise click.ClickException(
f"Could not find Profile '{profile_f}' for Service '{service_f}'."
f"\nThe profile and service values are case-sensitive."
)
@auth.command(
short_help="Add a Credential and/or Cookies to an existing or new profile for a service.",
context_settings=context_settings)
@click.argument("profile", type=str)
@click.argument("service", type=str)
@click.option("--cookie", type=str, default=None, help="Direct path to Cookies to add.")
@click.option("--credential", type=str, default=None, help="Direct Credential string to add.")
@click.pass_context
def add(ctx: click.Context, profile: str, service: str, cookie: Optional[str] = None, credential: Optional[str] = None):
"""
Add a Credential and/or Cookies to an existing or new profile for a service.
\b
Cancel the Open File dialogue when presented if you do not wish to provide
cookies. The Credential should be in `Username:Password` form. The username
may be an email. If you do not wish to add a Credential, just hit enter.
\b
Profile and Service names are case-sensitive!
Comments may be removed from config!
"""
log = ctx.obj
service = service.upper()
profile = profile.lower()
if cookie:
cookie = Path(cookie)
else:
print("Opening File Dialogue, select a Cookie file to import.")
cookie = tkinter.filedialog.askopenfilename(
title="Select a Cookie file (Cancel to skip)",
filetypes=[("Cookies", "*.txt"), ("All files", "*.*")]
)
if cookie:
cookie = Path(cookie)
else:
log.info("Skipped adding a Cookie...")
if credential:
try:
credential = Credential.loads(credential)
except ValueError as e:
raise click.ClickException(str(e))
else:
credential = input("Credential: ")
if credential:
try:
credential = Credential.loads(credential)
except ValueError as e:
raise click.ClickException(str(e))
else:
log.info("Skipped adding a Credential...")
if cookie:
cookie = cookie.rename((config.directories.cookies / service / profile).with_suffix(".txt"))
log.info(f"Moved Cookie file to: {cookie}")
if credential:
config_path = Config._Directories.user_configs / Config._Filenames.root_config
yaml, data = YAML(), None
yaml.default_flow_style = False
data = yaml.load(config_path)
data["credentials"][service][profile] = credential.dumps()
yaml.dump(data, config_path)
log.info(f"Added Credential: {credential}")

View File

@@ -5,7 +5,7 @@ import sys
import click
from ruamel.yaml import YAML
from devine.core.config import config
from devine.core.config import config, get_config_path
from devine.core.constants import context_settings
@@ -36,15 +36,19 @@ def cfg(ctx: click.Context, key: str, value: str, unset: bool, list_: bool) -> N
log = logging.getLogger("cfg")
config_path = config.directories.user_configs / config.filenames.root_config
yaml, data = YAML(), None
yaml.default_flow_style = False
if config_path.is_file():
config_path = get_config_path() or config.directories.user_configs / config.filenames.root_config
if config_path.exists():
data = yaml.load(config_path)
if not data:
log.warning(f"{config_path} has no configuration data, yet")
log.warning("No config file was found or it has no data, yet")
# yaml.load() returns `None` if the input data is blank instead of a usable object
# force a usable object by making one and removing the only item within it
data = yaml.load("""__TEMP__: null""")
del data["__TEMP__"]
if list_:
yaml.dump(data, sys.stdout)

File diff suppressed because it is too large

View File

@@ -1,10 +1,17 @@
import logging
import os
import shutil
import sys
from pathlib import Path
from typing import Optional
import click
from rich.padding import Padding
from rich.table import Table
from rich.tree import Tree
from devine.core.config import config
from devine.core.config import POSSIBLE_CONFIG_PATHS, config, config_path
from devine.core.console import console
from devine.core.constants import context_settings
from devine.core.services import Services
@ -18,13 +25,42 @@ def env() -> None:
def info() -> None:
"""Displays information about the current environment."""
log = logging.getLogger("env")
log.info(f"[Root Config] : {config.directories.user_configs / config.filenames.root_config}")
log.info(f"[Cookies] : {config.directories.cookies}")
log.info(f"[WVDs] : {config.directories.wvds}")
log.info(f"[Cache] : {config.directories.cache}")
log.info(f"[Logs] : {config.directories.logs}")
log.info(f"[Temp Files] : {config.directories.temp}")
log.info(f"[Downloads] : {config.directories.downloads}")
if config_path:
log.info(f"Config loaded from {config_path}")
else:
tree = Tree("No config file found, you can use any of the following locations:")
for i, path in enumerate(POSSIBLE_CONFIG_PATHS, start=1):
tree.add(f"[repr.number]{i}.[/] [text2]{path.resolve()}[/]")
console.print(Padding(
tree,
(0, 5)
))
table = Table(title="Directories", expand=True)
table.add_column("Name", no_wrap=True)
table.add_column("Path")
path_vars = {
x: Path(os.getenv(x))
for x in ("TEMP", "APPDATA", "LOCALAPPDATA", "USERPROFILE")
if sys.platform == "win32" and os.getenv(x)
}
for name in sorted(dir(config.directories)):
if name.startswith("__") or name == "app_dirs":
continue
path = getattr(config.directories, name).resolve()
for var, var_path in path_vars.items():
if path.is_relative_to(var_path):
path = rf"%{var}%\{path.relative_to(var_path)}"
break
table.add_row(name.title(), str(path))
console.print(Padding(
table,
(1, 5)
))
@env.group(name="clear", short_help="Clear an environment directory.", context_settings=context_settings)

View File

@ -1,5 +1,3 @@
from __future__ import annotations
import logging
import re
from pathlib import Path
@ -7,10 +5,10 @@ from typing import Optional
import click
from devine.core.vault import Vault
from devine.core.config import config
from devine.core.constants import context_settings
from devine.core.services import Services
from devine.core.vault import Vault
from devine.core.vaults import Vaults
@ -89,7 +87,7 @@ def copy(to_vault: str, from_vaults: list[str], service: Optional[str] = None) -
log.info(f"Adding {total_count} Content Keys to {to_vault} for {service_}")
try:
added = to_vault.add_keys(service_, content_keys, commit=True)
added = to_vault.add_keys(service_, content_keys)
except PermissionError:
log.warning(f" - No permission to create table ({service_}) in {to_vault}, skipping...")
continue
@ -171,7 +169,7 @@ def add(file: Path, service: str, vaults: list[str]) -> None:
for vault in vaults_:
log.info(f"Adding {total_count} Content Keys to {vault}")
added_count = vault.add_keys(service, kid_keys, commit=True)
added_count = vault.add_keys(service, kid_keys)
existed_count = total_count - added_count
log.info(f"{vault}: {added_count} newly added, {existed_count} already existed (skipped)")

166
devine/commands/search.py Normal file
View File

@ -0,0 +1,166 @@
from __future__ import annotations
import logging
import re
import sys
from typing import Any, Optional
import click
import yaml
from rich.padding import Padding
from rich.rule import Rule
from rich.tree import Tree
from devine.commands.dl import dl
from devine.core import binaries
from devine.core.config import config
from devine.core.console import console
from devine.core.constants import context_settings
from devine.core.proxies import Basic, Hola, NordVPN
from devine.core.service import Service
from devine.core.services import Services
from devine.core.utils.click_types import ContextData
from devine.core.utils.collections import merge_dict
@click.command(
short_help="Search for titles from a Service.",
cls=Services,
context_settings=dict(
**context_settings,
token_normalize_func=Services.get_tag
))
@click.option("-p", "--profile", type=str, default=None,
help="Profile to use for Credentials and Cookies (if available).")
@click.option("--proxy", type=str, default=None,
help="Proxy URI to use. If a 2-letter country is provided, it will try get a proxy from the config.")
@click.option("--no-proxy", is_flag=True, default=False,
help="Force disable all proxy use.")
@click.pass_context
def search(
ctx: click.Context,
no_proxy: bool,
profile: Optional[str] = None,
proxy: Optional[str] = None
):
if not ctx.invoked_subcommand:
raise ValueError("A subcommand to invoke was not specified, the main code cannot continue.")
log = logging.getLogger("search")
service = Services.get_tag(ctx.invoked_subcommand)
profile = profile
if profile:
log.info(f"Using profile: '{profile}'")
with console.status("Loading Service Config...", spinner="dots"):
service_config_path = Services.get_path(service) / config.filenames.config
if service_config_path.exists():
service_config = yaml.safe_load(service_config_path.read_text(encoding="utf8"))
log.info("Service Config loaded")
else:
service_config = {}
merge_dict(config.services.get(service), service_config)
proxy_providers = []
if no_proxy:
ctx.params["proxy"] = None
else:
with console.status("Loading Proxy Providers...", spinner="dots"):
if config.proxy_providers.get("basic"):
proxy_providers.append(Basic(**config.proxy_providers["basic"]))
if config.proxy_providers.get("nordvpn"):
proxy_providers.append(NordVPN(**config.proxy_providers["nordvpn"]))
if binaries.HolaProxy:
proxy_providers.append(Hola())
for proxy_provider in proxy_providers:
log.info(f"Loaded {proxy_provider.__class__.__name__}: {proxy_provider}")
if proxy:
requested_provider = None
if re.match(r"^[a-z]+:.+$", proxy, re.IGNORECASE):
# requesting proxy from a specific proxy provider
requested_provider, proxy = proxy.split(":", maxsplit=1)
if re.match(r"^[a-z]{2}(?:\d+)?$", proxy, re.IGNORECASE):
proxy = proxy.lower()
with console.status(f"Getting a Proxy to {proxy}...", spinner="dots"):
if requested_provider:
proxy_provider = next((
x
for x in proxy_providers
if x.__class__.__name__.lower() == requested_provider
), None)
if not proxy_provider:
log.error(f"The proxy provider '{requested_provider}' was not recognised.")
sys.exit(1)
proxy_uri = proxy_provider.get_proxy(proxy)
if not proxy_uri:
log.error(f"The proxy provider {requested_provider} had no proxy for {proxy}")
sys.exit(1)
proxy = ctx.params["proxy"] = proxy_uri
log.info(f"Using {proxy_provider.__class__.__name__} Proxy: {proxy}")
else:
for proxy_provider in proxy_providers:
proxy_uri = proxy_provider.get_proxy(proxy)
if proxy_uri:
proxy = ctx.params["proxy"] = proxy_uri
log.info(f"Using {proxy_provider.__class__.__name__} Proxy: {proxy}")
break
else:
log.info(f"Using explicit Proxy: {proxy}")
ctx.obj = ContextData(
config=service_config,
cdm=None,
proxy_providers=proxy_providers,
profile=profile
)
@search.result_callback()
def result(service: Service, profile: Optional[str] = None, **_: Any) -> None:
log = logging.getLogger("search")
service_tag = service.__class__.__name__
with console.status("Authenticating with Service...", spinner="dots"):
cookies = dl.get_cookie_jar(service_tag, profile)
credential = dl.get_credentials(service_tag, profile)
service.authenticate(cookies, credential)
if cookies or credential:
log.info("Authenticated with Service")
search_results = Tree("Search Results", hide_root=True)
with console.status("Searching...", spinner="dots"):
for result in service.search():
result_text = f"[bold text]{result.title}[/]"
if result.url:
result_text = f"[link={result.url}]{result_text}[/link]"
if result.label:
result_text += f" [pink]{result.label}[/]"
if result.description:
result_text += f"\n[text2]{result.description}[/]"
result_text += f"\n[bright_black]id: {result.id}[/]"
search_results.add(result_text + "\n")
# update cookies
cookie_file = dl.get_cookie_path(service_tag, profile)
if cookie_file:
dl.save_cookies(cookie_file, service.session.cookies)
console.print(Padding(
Rule(f"[rule.text]{len(search_results.children)} Search Results"),
(1, 2)
))
if search_results.children:
console.print(Padding(
search_results,
(0, 5)
))
else:
console.print(Padding(
"[bold text]No matches[/]\n[bright_black]Please check spelling and search again....[/]",
(0, 5)
))
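The --proxy handling above accepts three shapes of value: a bare 2-letter country code (e.g. `us`), a provider-prefixed request (e.g. `nordvpn:us`), or an explicit proxy URI used as-is. A minimal sketch of the same resolution, reusing the regular expressions from the command (values are illustrative only):
import re
value = "nordvpn:us"  # e.g. the --proxy argument
provider = None
if re.match(r"^[a-z]+:.+$", value, re.IGNORECASE):
    # a specific proxy provider was requested, e.g. NordVPN
    provider, value = value.split(":", maxsplit=1)
if re.match(r"^[a-z]{2}(?:\d+)?$", value, re.IGNORECASE):
    print(f"ask {provider or 'any loaded provider'} for a proxy in: {value.lower()}")
else:
    print(f"use explicit proxy URI: {value}")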

View File

@ -2,9 +2,9 @@ import subprocess
import click
from devine.core import binaries
from devine.core.config import config
from devine.core.constants import context_settings
from devine.core.utilities import get_binary_path
@click.command(
@ -29,11 +29,10 @@ def serve(host: str, port: int, caddy: bool) -> None:
from pywidevine import serve
if caddy:
executable = get_binary_path("caddy")
if not executable:
if not binaries.Caddy:
raise click.ClickException("Caddy executable \"caddy\" not found but is required for --caddy.")
caddy_p = subprocess.Popen([
executable,
binaries.Caddy,
"run",
"--config", str(config.directories.user_configs / "Caddyfile")
])

View File

@ -4,8 +4,8 @@ from pathlib import Path
import click
from pymediainfo import MediaInfo
from devine.core import binaries
from devine.core.constants import context_settings
from devine.core.utilities import get_binary_path
@click.group(short_help="Various helper scripts and programs.", context_settings=context_settings)
@ -38,8 +38,7 @@ def crop(path: Path, aspect: str, letter: bool, offset: int, preview: bool) -> N
as it may go from being 2px away from a perfect crop, to 20px over-cropping
again due to sub-sampled chroma.
"""
executable = get_binary_path("ffmpeg")
if not executable:
if not binaries.FFMPEG:
raise click.ClickException("FFmpeg executable \"ffmpeg\" not found but is required.")
if path.is_dir():
@ -74,19 +73,20 @@ def crop(path: Path, aspect: str, letter: bool, offset: int, preview: bool) -> N
if preview:
out_path = ["-f", "mpegts", "-"] # pipe
else:
out_path = [str(video_path.with_stem(".".join(filter(bool, [
out_path = [str(video_path.with_name(".".join(filter(bool, [
video_path.stem,
video_track.language,
"crop",
str(offset or "")
]))).with_suffix({
# ffmpeg's MKV muxer does not yet support HDR
"HEVC": ".h265",
"AVC": ".h264"
}.get(video_track.commercial_name, ".mp4")))]
str(offset or ""),
{
# ffmpeg's MKV muxer does not yet support HDR
"HEVC": "h265",
"AVC": "h264"
}.get(video_track.commercial_name, ".mp4")
]))))]
ffmpeg_call = subprocess.Popen([
executable, "-y",
binaries.FFMPEG, "-y",
"-i", str(video_path),
"-map", "0:v:0",
"-c", "copy",
@ -94,7 +94,7 @@ def crop(path: Path, aspect: str, letter: bool, offset: int, preview: bool) -> N
] + out_path, stdout=subprocess.PIPE)
try:
if preview:
previewer = get_binary_path("mpv", "ffplay")
previewer = binaries.MPV or binaries.FFPlay
if not previewer:
raise click.ClickException("MPV/FFplay executables weren't found but are required for previewing.")
subprocess.Popen((previewer, "-"), stdin=ffmpeg_call.stdout)
@ -102,3 +102,123 @@ def crop(path: Path, aspect: str, letter: bool, offset: int, preview: bool) -> N
if ffmpeg_call.stdout:
ffmpeg_call.stdout.close()
ffmpeg_call.wait()
@util.command(name="range")
@click.argument("path", type=Path)
@click.option("--full/--limited", is_flag=True,
help="Full: 0..255, Limited: 16..235 (16..240 YUV luma)")
@click.option("-p", "--preview", is_flag=True, default=False,
help="Instantly preview the newly-set video range in MPV (or ffplay if mpv is unavailable).")
def range_(path: Path, full: bool, preview: bool) -> None:
"""
Losslessly set the Video Range flag to full or limited at the bit-stream level.
You may provide a path to a file, or a folder of mkv and/or mp4 files.
If you ever notice blacks not being quite black, and whites not being quite white,
then your video may have the range set to the wrong value. Flip its range to the
opposite value and see if that fixes it.
"""
if not binaries.FFMPEG:
raise click.ClickException("FFmpeg executable \"ffmpeg\" not found but is required.")
if path.is_dir():
paths = list(path.glob("*.mkv")) + list(path.glob("*.mp4"))
else:
paths = [path]
for video_path in paths:
try:
video_track = next(iter(MediaInfo.parse(video_path).video_tracks or []))
except StopIteration:
raise click.ClickException("There's no video tracks in the provided file.")
metadata_key = {
"HEVC": "hevc_metadata",
"AVC": "h264_metadata"
}.get(video_track.commercial_name)
if not metadata_key:
raise click.ClickException(f"{video_track.commercial_name} Codec not supported.")
if preview:
out_path = ["-f", "mpegts", "-"] # pipe
else:
out_path = [str(video_path.with_name(".".join(filter(bool, [
video_path.stem,
video_track.language,
"range",
["limited", "full"][full],
{
# ffmpeg's MKV muxer does not yet support HDR
"HEVC": "h265",
"AVC": "h264"
}.get(video_track.commercial_name, ".mp4")
]))))]
ffmpeg_call = subprocess.Popen([
binaries.FFMPEG, "-y",
"-i", str(video_path),
"-map", "0:v:0",
"-c", "copy",
"-bsf:v", f"{metadata_key}=video_full_range_flag={int(full)}"
] + out_path, stdout=subprocess.PIPE)
try:
if preview:
previewer = binaries.MPV or binaries.FFPlay
if not previewer:
raise click.ClickException("MPV/FFplay executables weren't found but are required for previewing.")
subprocess.Popen((previewer, "-"), stdin=ffmpeg_call.stdout)
finally:
if ffmpeg_call.stdout:
ffmpeg_call.stdout.close()
ffmpeg_call.wait()
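For reference, the Popen call above amounts to a lossless bit-stream pass, roughly `ffmpeg -i input.mkv -map 0:v:0 -c copy -bsf:v h264_metadata=video_full_range_flag=1 out.h264` for an AVC source (hevc_metadata for HEVC), with the flag set to 0 for limited range.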
@util.command()
@click.argument("path", type=Path)
@click.option("-m", "--map", "map_", type=str, default="0",
help="Test specific streams by setting FFmpeg's -map parameter.")
def test(path: Path, map_: str) -> None:
"""
Decode an entire video and check for any corruptions or errors using FFmpeg.
You may provide a path to a file, or a folder of mkv and/or mp4 files.
Tests all streams within the file by default. Subtitles cannot be tested.
You may choose specific streams using the -m/--map parameter. E.g.,
'0:v:0' to test the first video stream, or '0:a' to test all audio streams.
"""
if not binaries.FFMPEG:
raise click.ClickException("FFmpeg executable \"ffmpeg\" not found but is required.")
if path.is_dir():
paths = list(path.glob("*.mkv")) + list(path.glob("*.mp4"))
else:
paths = [path]
for video_path in paths:
print("Starting...")
p = subprocess.Popen([
binaries.FFMPEG, "-hide_banner",
"-benchmark",
"-i", str(video_path),
"-map", map_,
"-sn",
"-f", "null",
"-"
], stderr=subprocess.PIPE, universal_newlines=True)
reached_output = False
errors = 0
for line in p.stderr:
line = line.strip()
if "speed=" in line:
reached_output = True
if not reached_output:
continue
if line.startswith("["): # error of some kind
errors += 1
stream, error = line.split("] ", maxsplit=1)
stream = stream.split(" @ ")[0]
line = f"{stream} ERROR: {error}"
print(line)
p.stderr.close()
print(f"Finished with {errors} Errors, Cleaning up...")
p.terminate()
p.wait()
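In other words, each file is run through a null decode, roughly equivalent to `ffmpeg -hide_banner -benchmark -i input.mkv -map 0 -sn -f null -`; any stderr line after the progress output that starts with `[` (a stream-tagged decoder message) is counted and reported as an error.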

View File

@ -1,17 +1,18 @@
from __future__ import annotations
import logging
import shutil
from pathlib import Path
from typing import Optional
import click
import yaml
from google.protobuf.json_format import MessageToDict
from pywidevine.device import Device
from pywidevine.device import Device, DeviceTypes
from pywidevine.license_protocol_pb2 import FileHashes
from rich.prompt import Prompt
from unidecode import UnidecodeError, unidecode
from devine.core.config import config
from devine.core.console import console
from devine.core.constants import context_settings
@ -22,6 +23,51 @@ def wvd() -> None:
"""Manage configuration and creation of WVD (Widevine Device) files."""
@wvd.command()
@click.argument("paths", type=Path, nargs=-1)
def add(paths: list[Path]) -> None:
"""Add one or more WVD (Widevine Device) files to the WVDs Directory."""
log = logging.getLogger("wvd")
for path in paths:
dst_path = config.directories.wvds / path.name
if not path.exists():
log.error(f"The WVD path '{path}' does not exist...")
elif dst_path.exists():
log.error(f"WVD named '{path.stem}' already exists...")
else:
# TODO: Check for and log errors
_ = Device.load(path) # test if WVD is valid
dst_path.parent.mkdir(parents=True, exist_ok=True)
shutil.move(path, dst_path)
log.info(f"Added {path.stem}")
@wvd.command()
@click.argument("names", type=str, nargs=-1)
def delete(names: list[str]) -> None:
"""Delete one or more WVD (Widevine Device) files from the WVDs Directory."""
log = logging.getLogger("wvd")
for name in names:
path = (config.directories.wvds / name).with_suffix(".wvd")
if not path.exists():
log.error(f"No WVD file exists by the name '{name}'...")
continue
answer = Prompt.ask(
f"[red]Deleting '{name}'[/], are you sure you want to continue?",
choices=["y", "n"],
default="n",
console=console
)
if answer == "n":
log.info("Aborting...")
continue
Path.unlink(path)
log.info(f"Deleted {name}")
@wvd.command()
@click.argument("path", type=Path)
def parse(path: Path) -> None:
@ -38,6 +84,10 @@ def parse(path: Path) -> None:
log = logging.getLogger("wvd")
if not path.exists():
console.log(f"[bright_blue]{path.absolute()}[/] does not exist...")
return
device = Device.load(path)
log.info(f"System ID: {device.system_id}")
@ -70,9 +120,23 @@ def dump(wvd_paths: list[Path], out_dir: Path) -> None:
If the path is relative, with no file extension, it will dump the WVD in the WVDs
directory.
"""
if wvd_paths == (Path(""),):
wvd_paths = list(config.directories.wvds.iterdir())
for wvd_path, out_path in zip(wvd_paths, (out_dir / x.stem for x in wvd_paths)):
log = logging.getLogger("wvd")
if wvd_paths == ():
if not config.directories.wvds.exists():
console.log(f"[bright_blue]{config.directories.wvds.absolute()}[/] does not exist...")
wvd_paths = list(
x
for x in config.directories.wvds.iterdir()
if x.is_file() and x.suffix.lower() == ".wvd"
)
if not wvd_paths:
console.log(f"[bright_blue]{config.directories.wvds.absolute()}[/] is empty...")
for i, (wvd_path, out_path) in enumerate(zip(wvd_paths, (out_dir / x.stem for x in wvd_paths))):
if i > 0:
log.info("")
try:
named = not wvd_path.suffix and wvd_path.relative_to(Path(""))
except ValueError:
@ -81,10 +145,9 @@ def dump(wvd_paths: list[Path], out_dir: Path) -> None:
wvd_path = config.directories.wvds / f"{wvd_path.stem}.wvd"
out_path.mkdir(parents=True, exist_ok=True)
log.info(f"Dumping: {wvd_path}")
device = Device.load(wvd_path)
log = logging.getLogger("wvd")
log.info(f"Dumping: {wvd_path}")
log.info(f"L{device.security_level} {device.system_id} {device.type.name}")
log.info(f"Saving to: {out_path}")
@ -137,7 +200,7 @@ def dump(wvd_paths: list[Path], out_dir: Path) -> None:
@click.argument("private_key", type=Path)
@click.argument("client_id", type=Path)
@click.argument("file_hashes", type=Path, required=False)
@click.option("-t", "--type", "type_", type=click.Choice([x.name for x in Device.Types], case_sensitive=False),
@click.option("-t", "--type", "type_", type=click.Choice([x.name for x in DeviceTypes], case_sensitive=False),
default="Android", help="Device Type")
@click.option("-l", "--level", type=click.IntRange(1, 3), default=1, help="Device Security Level")
@click.option("-o", "--output", type=Path, default=None, help="Output Directory")
@ -178,7 +241,7 @@ def new(
raise click.UsageError("file_hashes: Not a path to a file, or it doesn't exist.", ctx)
device = Device(
type_=Device.Types[type_.upper()],
type_=DeviceTypes[type_.upper()],
security_level=level,
flags=None,
private_key=private_key.read_bytes(),
@ -195,18 +258,19 @@ def new(
log.info(f"Created binary WVD file, {out_path.name}")
log.info(f" + Saved to: {out_path.absolute()}")
log.info(f" + System ID: {device.system_id}")
log.info(f" + Security Level: {device.security_level}")
log.info(f" + Type: {device.type}")
log.info(f" + Flags: {device.flags}")
log.info(f" + Private Key: {bool(device.private_key)}")
log.info(f" + Client ID: {bool(device.client_id)}")
log.info(f" + VMP: {bool(device.client_id.vmp_data)}")
log.debug("Client ID:")
log.debug(device.client_id)
log.info(f"System ID: {device.system_id}")
log.info(f"Security Level: {device.security_level}")
log.info(f"Type: {device.type}")
log.info(f"Flags: {device.flags}")
log.info(f"Private Key: {bool(device.private_key)}")
log.info(f"Client ID: {bool(device.client_id)}")
log.info(f"VMP: {bool(device.client_id.vmp_data)}")
log.debug("VMP:")
log.info("Client ID:")
log.info(device.client_id)
log.info("VMP:")
if device.client_id.vmp_data:
file_hashes = FileHashes()
file_hashes.ParseFromString(device.client_id.vmp_data)

View File

@ -1 +1 @@
__version__ = "1.0.0"
__version__ = "3.3.3"

View File

@ -1,29 +1,89 @@
import atexit
import logging
from datetime import datetime
from pathlib import Path
import click
import coloredlogs
import urllib3
from rich import traceback
from rich.console import Group
from rich.padding import Padding
from rich.text import Text
from urllib3.exceptions import InsecureRequestWarning
from devine.core import __version__
from devine.core.commands import Commands
from devine.core.constants import context_settings, LOG_FORMAT
from devine.core.config import config
from devine.core.console import ComfyRichHandler, console
from devine.core.constants import context_settings
from devine.core.utilities import rotate_log_file
LOGGING_PATH = None
@click.command(cls=Commands, invoke_without_command=True, context_settings=context_settings)
@click.option("-v", "--version", is_flag=True, default=False, help="Print version information.")
@click.option("-d", "--debug", is_flag=True, default=False, help="Enable DEBUG level logs.")
def main(version: bool, debug: bool) -> None:
"""Devine—Open-Source Movie, TV, and Music Downloading Solution."""
logging.basicConfig(level=logging.DEBUG if debug else logging.INFO)
log = logging.getLogger()
coloredlogs.install(level=log.level, fmt=LOG_FORMAT, style="{")
@click.option("--log", "log_path", type=Path, default=config.directories.logs / config.filenames.log,
help="Log path (or filename). Path can contain the following f-string args: {name} {time}.")
def main(version: bool, debug: bool, log_path: Path) -> None:
"""Devine—Modular Movie, TV, and Music Archival Software."""
logging.basicConfig(
level=logging.DEBUG if debug else logging.INFO,
format="%(message)s",
handlers=[ComfyRichHandler(
show_time=False,
show_path=debug,
console=console,
rich_tracebacks=True,
tracebacks_suppress=[click],
log_renderer=console._log_render # noqa
)]
)
if log_path:
global LOGGING_PATH
console.record = True
new_log_path = rotate_log_file(log_path)
LOGGING_PATH = new_log_path
urllib3.disable_warnings(InsecureRequestWarning)
traceback.install(
console=console,
width=80,
suppress=[click]
)
console.print(
Padding(
Group(
Text(
r" / __ \/ ____/ | / / _/ | / / ____/" + "\n"
r" / / / / __/ | | / // // |/ / __/ " + "\n"
r" / /_/ / /___ | |/ // // /| / /___ " + "\n"
r"/_____/_____/ |___/___/_/ |_/_____/ ",
style="ascii.art"
),
f"v[repr.number]{__version__}[/] Copyright © 2019-{datetime.now().year} rlaphoenix",
" [bright_blue]https://github.com/devine-dl/devine[/]"
),
(1, 21, 1, 20),
expand=True
),
justify="left"
)
log.info(f"Devine version {__version__} Copyright (c) 2019-{datetime.now().year} rlaphoenix")
log.info("Convenient Widevine-DRM Downloader and Decrypter.")
log.info("https://github.com/devine/devine")
if version:
return
@atexit.register
def save_log():
if console.record and LOGGING_PATH:
# TODO: Currently semi-bust. Everything that refreshes gets duplicated.
console.save_text(LOGGING_PATH)
if __name__ == "__main__":
main()

46
devine/core/binaries.py Normal file
View File

@ -0,0 +1,46 @@
import shutil
import sys
from pathlib import Path
from typing import Optional
__shaka_platform = {
"win32": "win",
"darwin": "osx"
}.get(sys.platform, sys.platform)
def find(*names: str) -> Optional[Path]:
"""Find the path of the first found binary name."""
for name in names:
path = shutil.which(name)
if path:
return Path(path)
return None
FFMPEG = find("ffmpeg")
FFProbe = find("ffprobe")
FFPlay = find("ffplay")
SubtitleEdit = find("SubtitleEdit")
ShakaPackager = find(
"shaka-packager",
"packager",
f"packager-{__shaka_platform}",
f"packager-{__shaka_platform}-arm64",
f"packager-{__shaka_platform}-x64"
)
Aria2 = find("aria2c", "aria2")
CCExtractor = find(
"ccextractor",
"ccextractorwin",
"ccextractorwinfull"
)
HolaProxy = find("hola-proxy")
MPV = find("mpv")
Caddy = find("caddy")
__all__ = (
"FFMPEG", "FFProbe", "FFPlay", "SubtitleEdit", "ShakaPackager",
"Aria2", "CCExtractor", "HolaProxy", "MPV", "Caddy", "find"
)
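A quick sketch of how these module-level lookups are consumed elsewhere in the codebase (the mediainfo lookup is purely illustrative):
from devine.core import binaries
if not binaries.FFMPEG:
    raise EnvironmentError("FFmpeg executable not found but is required.")
# ad-hoc lookups reuse the same helper; the first name found on PATH wins
mediainfo = binaries.find("mediainfo", "MediaInfo")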

View File

@ -4,14 +4,13 @@ import zlib
from datetime import datetime, timedelta
from os import stat_result
from pathlib import Path
from typing import Optional, Any, Union
from typing import Any, Optional, Union
import jsonpickle
import jwt
from devine.core.config import config
EXP_T = Union[datetime, str, int, float]
@ -47,7 +46,7 @@ class Cacher:
@property
def expired(self) -> bool:
return self.expiration and self.expiration < datetime.utcnow()
return self.expiration and self.expiration < datetime.now()
def get(self, key: str, version: int = 1) -> Cacher:
"""
@ -151,6 +150,8 @@ class Cacher:
except ValueError:
timestamp = float(timestamp)
try:
if len(str(int(timestamp))) == 13: # JS-style timestamp
timestamp /= 1000
timestamp = datetime.fromtimestamp(timestamp)
except ValueError:
raise ValueError(f"Unrecognized Timestamp value {timestamp!r}")

View File

@ -1,5 +1,3 @@
from __future__ import annotations
from typing import Optional
import click
@ -42,4 +40,4 @@ class Commands(click.MultiCommand):
# Hide direct access to commands from quick import form, they shouldn't be accessed directly
__ALL__ = (Commands,)
__all__ = ("Commands",)

View File

@ -2,7 +2,7 @@ from __future__ import annotations
import tempfile
from pathlib import Path
from typing import Any
from typing import Any, Optional
import yaml
from appdirs import AppDirs
@ -17,6 +17,7 @@ class Config:
commands = namespace_dir / "commands"
services = namespace_dir / "services"
vaults = namespace_dir / "vaults"
fonts = namespace_dir / "fonts"
user_configs = Path(app_dirs.user_config_dir)
data = Path(app_dirs.user_data_dir)
downloads = Path.home() / "Downloads" / "devine"
@ -39,6 +40,8 @@ class Config:
self.dl: dict = kwargs.get("dl") or {}
self.aria2c: dict = kwargs.get("aria2c") or {}
self.cdm: dict = kwargs.get("cdm") or {}
self.chapter_fallback_name: str = kwargs.get("chapter_fallback_name") or ""
self.curl_impersonate: dict = kwargs.get("curl_impersonate") or {}
self.remote_cdm: list[dict] = kwargs.get("remote_cdm") or []
self.credentials: dict = kwargs.get("credentials") or {}
@ -49,19 +52,20 @@ class Config:
continue
setattr(self.directories, name, Path(path).expanduser())
self.downloader = kwargs.get("downloader") or "requests"
self.filenames = self._Filenames()
for name, filename in (kwargs.get("filenames") or {}).items():
setattr(self.filenames, name, filename)
self.headers: dict = kwargs.get("headers") or {}
self.key_vaults: list[dict[str, Any]] = kwargs.get("key_vaults")
self.key_vaults: list[dict[str, Any]] = kwargs.get("key_vaults", [])
self.muxing: dict = kwargs.get("muxing") or {}
self.nordvpn: dict = kwargs.get("nordvpn") or {}
self.profiles: dict = kwargs.get("profiles") or {}
self.proxies: dict = kwargs.get("proxies") or {}
self.proxy_providers: dict = kwargs.get("proxy_providers") or {}
self.serve: dict = kwargs.get("serve") or {}
self.services: dict = kwargs.get("services") or {}
self.set_terminal_bg: bool = kwargs.get("set_terminal_bg", True)
self.tag: str = kwargs.get("tag") or ""
@classmethod
@ -70,10 +74,36 @@ class Config:
raise FileNotFoundError(f"Config file path ({path}) was not found")
if not path.is_file():
raise FileNotFoundError(f"Config file path ({path}) is not to a file.")
return cls(**yaml.safe_load(path.read_text(encoding="utf8")))
return cls(**yaml.safe_load(path.read_text(encoding="utf8")) or {})
# noinspection PyProtectedMember
config = Config.from_yaml(Config._Directories.user_configs / Config._Filenames.root_config)
POSSIBLE_CONFIG_PATHS = (
# The Devine Namespace Folder (e.g., %appdata%/Python/Python311/site-packages/devine)
Config._Directories.namespace_dir / Config._Filenames.root_config,
# The Parent Folder to the Devine Namespace Folder (e.g., %appdata%/Python/Python311/site-packages)
Config._Directories.namespace_dir.parent / Config._Filenames.root_config,
# The AppDirs User Config Folder (e.g., %localappdata%/devine)
Config._Directories.user_configs / Config._Filenames.root_config
)
__ALL__ = (config,)
def get_config_path() -> Optional[Path]:
"""
Get Path to Config from any one of the possible locations.
Returns None if no config file could be found.
"""
for path in POSSIBLE_CONFIG_PATHS:
if path.exists():
return path
return None
config_path = get_config_path()
if config_path:
config = Config.from_yaml(config_path)
else:
config = Config()
__all__ = ("config",)

365
devine/core/console.py Normal file
View File

@ -0,0 +1,365 @@
import logging
from datetime import datetime
from types import ModuleType
from typing import IO, Callable, Iterable, List, Literal, Mapping, Optional, Union
from rich._log_render import FormatTimeCallable, LogRender
from rich.console import Console, ConsoleRenderable, HighlighterType, RenderableType
from rich.emoji import EmojiVariant
from rich.highlighter import Highlighter, ReprHighlighter
from rich.live import Live
from rich.logging import RichHandler
from rich.padding import Padding, PaddingDimensions
from rich.status import Status
from rich.style import StyleType
from rich.table import Table
from rich.text import Text, TextType
from rich.theme import Theme
from devine.core.config import config
class ComfyLogRenderer(LogRender):
def __call__(
self,
console: "Console",
renderables: Iterable["ConsoleRenderable"],
log_time: Optional[datetime] = None,
time_format: Optional[Union[str, FormatTimeCallable]] = None,
level: TextType = "",
path: Optional[str] = None,
line_no: Optional[int] = None,
link_path: Optional[str] = None,
) -> "Table":
from rich.containers import Renderables
output = Table.grid(padding=(0, 5), pad_edge=True)
output.expand = True
if self.show_time:
output.add_column(style="log.time")
if self.show_level:
output.add_column(style="log.level", width=self.level_width)
output.add_column(ratio=1, style="log.message", overflow="fold")
if self.show_path and path:
output.add_column(style="log.path")
row: List["RenderableType"] = []
if self.show_time:
log_time = log_time or console.get_datetime()
time_format = time_format or self.time_format
if callable(time_format):
log_time_display = time_format(log_time)
else:
log_time_display = Text(log_time.strftime(time_format))
if log_time_display == self._last_time and self.omit_repeated_times:
row.append(Text(" " * len(log_time_display)))
else:
row.append(log_time_display)
self._last_time = log_time_display
if self.show_level:
row.append(level)
row.append(Renderables(renderables))
if self.show_path and path:
path_text = Text()
path_text.append(
path, style=f"link file://{link_path}" if link_path else ""
)
if line_no:
path_text.append(":")
path_text.append(
f"{line_no}",
style=f"link file://{link_path}#{line_no}" if link_path else "",
)
row.append(path_text)
output.add_row(*row)
return output
class ComfyRichHandler(RichHandler):
def __init__(
self,
level: Union[int, str] = logging.NOTSET,
console: Optional[Console] = None,
*,
show_time: bool = True,
omit_repeated_times: bool = True,
show_level: bool = True,
show_path: bool = True,
enable_link_path: bool = True,
highlighter: Optional[Highlighter] = None,
markup: bool = False,
rich_tracebacks: bool = False,
tracebacks_width: Optional[int] = None,
tracebacks_extra_lines: int = 3,
tracebacks_theme: Optional[str] = None,
tracebacks_word_wrap: bool = True,
tracebacks_show_locals: bool = False,
tracebacks_suppress: Iterable[Union[str, ModuleType]] = (),
locals_max_length: int = 10,
locals_max_string: int = 80,
log_time_format: Union[str, FormatTimeCallable] = "[%x %X]",
keywords: Optional[List[str]] = None,
log_renderer: Optional[LogRender] = None
) -> None:
super().__init__(
level=level,
console=console,
show_time=show_time,
omit_repeated_times=omit_repeated_times,
show_level=show_level,
show_path=show_path,
enable_link_path=enable_link_path,
highlighter=highlighter,
markup=markup,
rich_tracebacks=rich_tracebacks,
tracebacks_width=tracebacks_width,
tracebacks_extra_lines=tracebacks_extra_lines,
tracebacks_theme=tracebacks_theme,
tracebacks_word_wrap=tracebacks_word_wrap,
tracebacks_show_locals=tracebacks_show_locals,
tracebacks_suppress=tracebacks_suppress,
locals_max_length=locals_max_length,
locals_max_string=locals_max_string,
log_time_format=log_time_format,
keywords=keywords,
)
if log_renderer:
self._log_render = log_renderer
class ComfyConsole(Console):
"""A comfy high level console interface.
Args:
color_system (str, optional): The color system supported by your terminal,
either ``"standard"``, ``"256"`` or ``"truecolor"``. Leave as ``"auto"`` to autodetect.
force_terminal (Optional[bool], optional): Enable/disable terminal control codes, or None to auto-detect
terminal. Defaults to None.
force_jupyter (Optional[bool], optional): Enable/disable Jupyter rendering, or None to auto-detect Jupyter.
Defaults to None.
force_interactive (Optional[bool], optional): Enable/disable interactive mode, or None to auto-detect.
Defaults to None.
soft_wrap (Optional[bool], optional): Set soft wrap default on print method. Defaults to False.
theme (Theme, optional): An optional style theme object, or ``None`` for default theme.
stderr (bool, optional): Use stderr rather than stdout if ``file`` is not specified. Defaults to False.
file (IO, optional): A file object where the console should write to. Defaults to stdout.
quiet (bool, Optional): Boolean to suppress all output. Defaults to False.
width (int, optional): The width of the terminal. Leave as default to auto-detect width.
height (int, optional): The height of the terminal. Leave as default to auto-detect height.
style (StyleType, optional): Style to apply to all output, or None for no style. Defaults to None.
no_color (Optional[bool], optional): Enabled no color mode, or None to auto-detect. Defaults to None.
tab_size (int, optional): Number of spaces used to replace a tab character. Defaults to 8.
record (bool, optional): Boolean to enable recording of terminal output,
required to call :meth:`export_html`, :meth:`export_svg`, and :meth:`export_text`. Defaults to False.
markup (bool, optional): Boolean to enable :ref:`console_markup`. Defaults to True.
emoji (bool, optional): Enable emoji code. Defaults to True.
emoji_variant (str, optional): Optional emoji variant, either "text" or "emoji". Defaults to None.
highlight (bool, optional): Enable automatic highlighting. Defaults to True.
log_time (bool, optional): Boolean to enable logging of time by :meth:`log` methods. Defaults to True.
log_path (bool, optional): Boolean to enable the logging of the caller by :meth:`log`. Defaults to True.
log_time_format (Union[str, TimeFormatterCallable], optional): If ``log_time`` is enabled, either string for
strftime or callable that formats the time. Defaults to "[%X] ".
highlighter (HighlighterType, optional): Default highlighter.
legacy_windows (bool, optional): Enable legacy Windows mode, or ``None`` to auto-detect. Defaults to ``None``.
safe_box (bool, optional): Restrict box options that don't render on legacy Windows.
get_datetime (Callable[[], datetime], optional): Callable that gets the current time as a datetime.datetime
object (used by Console.log), or None for datetime.now.
get_time (Callable[[], time], optional): Callable that gets the current time in seconds, default uses
time.monotonic.
"""
def __init__(
self,
*,
color_system: Optional[
Literal["auto", "standard", "256", "truecolor", "windows"]
] = "auto",
force_terminal: Optional[bool] = None,
force_jupyter: Optional[bool] = None,
force_interactive: Optional[bool] = None,
soft_wrap: bool = False,
theme: Optional[Theme] = None,
stderr: bool = False,
file: Optional[IO[str]] = None,
quiet: bool = False,
width: Optional[int] = None,
height: Optional[int] = None,
style: Optional[StyleType] = None,
no_color: Optional[bool] = None,
tab_size: int = 8,
record: bool = False,
markup: bool = True,
emoji: bool = True,
emoji_variant: Optional[EmojiVariant] = None,
highlight: bool = True,
log_time: bool = True,
log_path: bool = True,
log_time_format: Union[str, FormatTimeCallable] = "[%X]",
highlighter: Optional["HighlighterType"] = ReprHighlighter(),
legacy_windows: Optional[bool] = None,
safe_box: bool = True,
get_datetime: Optional[Callable[[], datetime]] = None,
get_time: Optional[Callable[[], float]] = None,
_environ: Optional[Mapping[str, str]] = None,
log_renderer: Optional[LogRender] = None
):
super().__init__(
color_system=color_system,
force_terminal=force_terminal,
force_jupyter=force_jupyter,
force_interactive=force_interactive,
soft_wrap=soft_wrap,
theme=theme,
stderr=stderr,
file=file,
quiet=quiet,
width=width,
height=height,
style=style,
no_color=no_color,
tab_size=tab_size,
record=record,
markup=markup,
emoji=emoji,
emoji_variant=emoji_variant,
highlight=highlight,
log_time=log_time,
log_path=log_path,
log_time_format=log_time_format,
highlighter=highlighter,
legacy_windows=legacy_windows,
safe_box=safe_box,
get_datetime=get_datetime,
get_time=get_time,
_environ=_environ,
)
if log_renderer:
self._log_render = log_renderer
def status(
self,
status: RenderableType,
*,
spinner: str = "dots",
spinner_style: str = "status.spinner",
speed: float = 1.0,
refresh_per_second: float = 12.5,
pad: PaddingDimensions = (0, 5)
) -> Union[Live, Status]:
"""Display a comfy status and spinner.
Args:
status (RenderableType): A status renderable (str or Text typically).
spinner (str, optional): Name of spinner animation (see python -m rich.spinner). Defaults to "dots".
spinner_style (StyleType, optional): Style of spinner. Defaults to "status.spinner".
speed (float, optional): Speed factor for spinner animation. Defaults to 1.0.
refresh_per_second (float, optional): Number of refreshes per second. Defaults to 12.5.
pad (Union[int, Tuple[int]]): Padding for top, right, bottom, and left borders.
May be specified with 1, 2, or 4 integers (CSS style).
Returns:
Status: A Status object that may be used as a context manager.
"""
status_renderable = super().status(
status=status,
spinner=spinner,
spinner_style=spinner_style,
speed=speed,
refresh_per_second=refresh_per_second
)
if pad:
top, right, bottom, left = Padding.unpack(pad)
renderable_width = len(status_renderable.status)
spinner_width = len(status_renderable.renderable.text)
status_width = spinner_width + renderable_width
available_width = self.width - status_width
if available_width > right:
# fill up the available width with padding to apply bg color
right = available_width - right
padding = Padding(
status_renderable,
(top, right, bottom, left)
)
return Live(
padding,
console=self,
transient=True
)
return status_renderable
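The padded status is what the commands above use as a context manager; a minimal usage sketch:
from devine.core.console import console
with console.status("Loading Service Config...", spinner="dots"):
    ...  # long-running work; the spinner row is padded out to the console width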
catppuccin_mocha = {
# Colors based on "CatppuccinMocha" from Gogh themes
"bg": "rgb(30,30,46)",
"text": "rgb(205,214,244)",
"text2": "rgb(162,169,193)", # slightly darker
"black": "rgb(69,71,90)",
"bright_black": "rgb(88,91,112)",
"red": "rgb(243,139,168)",
"green": "rgb(166,227,161)",
"yellow": "rgb(249,226,175)",
"blue": "rgb(137,180,250)",
"pink": "rgb(245,194,231)",
"cyan": "rgb(148,226,213)",
"gray": "rgb(166,173,200)",
"bright_gray": "rgb(186,194,222)",
"dark_gray": "rgb(54,54,84)"
}
primary_scheme = catppuccin_mocha
primary_scheme["none"] = primary_scheme["text"]
primary_scheme["grey23"] = primary_scheme["black"]
primary_scheme["magenta"] = primary_scheme["pink"]
primary_scheme["bright_red"] = primary_scheme["red"]
primary_scheme["bright_green"] = primary_scheme["green"]
primary_scheme["bright_yellow"] = primary_scheme["yellow"]
primary_scheme["bright_blue"] = primary_scheme["blue"]
primary_scheme["bright_magenta"] = primary_scheme["pink"]
primary_scheme["bright_cyan"] = primary_scheme["cyan"]
if config.set_terminal_bg:
primary_scheme["none"] += f" on {primary_scheme['bg']}"
custom_colors = {
"ascii.art": primary_scheme["pink"]
}
if config.set_terminal_bg:
custom_colors["ascii.art"] += f" on {primary_scheme['bg']}"
console = ComfyConsole(
log_time=False,
log_path=False,
width=80,
theme=Theme({
"bar.back": primary_scheme["dark_gray"],
"bar.complete": primary_scheme["pink"],
"bar.finished": primary_scheme["green"],
"bar.pulse": primary_scheme["bright_black"],
"black": primary_scheme["black"],
"inspect.async_def": f"italic {primary_scheme['cyan']}",
"progress.data.speed": "dark_orange",
"repr.number": f"bold not italic {primary_scheme['cyan']}",
"repr.number_complex": f"bold not italic {primary_scheme['cyan']}",
"rule.line": primary_scheme["dark_gray"],
"rule.text": primary_scheme["pink"],
"tree.line": primary_scheme["dark_gray"],
"status.spinner": primary_scheme["pink"],
"progress.spinner": primary_scheme["pink"],
**primary_scheme,
**custom_colors
}),
log_renderer=ComfyLogRenderer(
show_time=False,
show_path=False
)
)
__all__ = ("ComfyLogRenderer", "ComfyRichHandler", "ComfyConsole", "console")

View File

@ -1,26 +1,10 @@
import logging
from threading import Event
from typing import TypeVar, Union
DOWNLOAD_CANCELLED = Event()
DOWNLOAD_LICENCE_ONLY = Event()
LOG_FORMAT = "{asctime} [{levelname[0]}] {name} : {message}" # must be '{}' style
LOG_DATE_FORMAT = "%Y-%m-%d %H:%M:%S"
LOG_FORMATTER = logging.Formatter(LOG_FORMAT, LOG_DATE_FORMAT, "{")
DRM_SORT_MAP = ["ClearKey", "Widevine"]
LANGUAGE_MUX_MAP = {
# List of language tags that cannot be used by mkvmerge and need replacements.
# Try get the replacement to be as specific locale-wise as possible.
# A bcp47 as the replacement is recommended.
"cmn": "zh",
"cmn-Hant": "zh-Hant",
"cmn-Hans": "zh-Hans",
"none": "und",
"yue": "zh-yue",
"yue-Hant": "zh-yue-Hant",
"yue-Hans": "zh-yue-Hans"
}
TERRITORY_MAP = {
"Hong Kong SAR China": "Hong Kong"
}
LANGUAGE_MAX_DISTANCE = 5 # this is max to be considered "same", e.g., en, en-US, en-AU
VIDEO_CODEC_MAP = {
"AVC": "H.264",

View File

@ -1,2 +1,5 @@
from .aria2c import aria2c
from .saldl import saldl
from .curl_impersonate import curl_impersonate
from .requests import requests
__all__ = ("aria2c", "curl_impersonate", "requests")

View File

@ -1,88 +1,354 @@
import asyncio
import os
import subprocess
import textwrap
import time
from functools import partial
from http.cookiejar import CookieJar
from pathlib import Path
from typing import Union, Optional
from typing import Any, Callable, Generator, MutableMapping, Optional, Union
from urllib.parse import urlparse
import requests
from Crypto.Random import get_random_bytes
from requests import Session
from requests.cookies import cookiejar_from_dict, get_cookie_header
from rich import filesize
from rich.text import Text
from devine.core import binaries
from devine.core.config import config
from devine.core.utilities import get_binary_path, start_pproxy
from devine.core.console import console
from devine.core.constants import DOWNLOAD_CANCELLED
from devine.core.utilities import get_extension, get_free_port
async def aria2c(
uri: Union[str, list[str]],
out: Path,
headers: Optional[dict] = None,
proxy: Optional[str] = None
) -> int:
"""
Download files using Aria2(c).
https://aria2.github.io
def rpc(caller: Callable, secret: str, method: str, params: Optional[list[Any]] = None) -> Any:
"""Make a call to Aria2's JSON-RPC API."""
try:
rpc_res = caller(
json={
"jsonrpc": "2.0",
"id": get_random_bytes(16).hex(),
"method": method,
"params": [f"token:{secret}", *(params or [])]
}
).json()
if rpc_res.get("code"):
# wrap to console width - padding - '[Aria2c]: '
error_pretty = "\n ".join(textwrap.wrap(
f"RPC Error: {rpc_res['message']} ({rpc_res['code']})".strip(),
width=console.width - 20,
initial_indent=""
))
console.log(Text.from_ansi("\n[Aria2c]: " + error_pretty))
return rpc_res["result"]
except requests.exceptions.ConnectionError:
# absorb, process likely ended as it was calling RPC
return
If multiple URLs are provided they will be downloaded in the provided order
to the output directory. They will not be merged together.
"""
segmented = False
if isinstance(uri, list) and len(uri) == 1:
uri = uri[0]
if isinstance(uri, list):
segmented = True
uri = "\n".join([
f"{url}\n"
f"\tdir={out}\n"
f"\tout={i:08}.mp4"
for i, url in enumerate(uri)
])
if out.is_file():
raise ValueError("Provided multiple segments to download, expecting directory path")
elif "\t" not in uri:
uri = f"{uri}\n" \
f"\tdir={out.parent}\n" \
f"\tout={out.name}"
executable = get_binary_path("aria2c", "aria2")
if not executable:
def download(
urls: Union[str, list[str], dict[str, Any], list[dict[str, Any]]],
output_dir: Path,
filename: str,
headers: Optional[MutableMapping[str, Union[str, bytes]]] = None,
cookies: Optional[Union[MutableMapping[str, str], CookieJar]] = None,
proxy: Optional[str] = None,
max_workers: Optional[int] = None
) -> Generator[dict[str, Any], None, None]:
if not urls:
raise ValueError("urls must be provided and not empty")
elif not isinstance(urls, (str, dict, list)):
raise TypeError(f"Expected urls to be {str} or {dict} or a list of one of them, not {type(urls)}")
if not output_dir:
raise ValueError("output_dir must be provided")
elif not isinstance(output_dir, Path):
raise TypeError(f"Expected output_dir to be {Path}, not {type(output_dir)}")
if not filename:
raise ValueError("filename must be provided")
elif not isinstance(filename, str):
raise TypeError(f"Expected filename to be {str}, not {type(filename)}")
if not isinstance(headers, (MutableMapping, type(None))):
raise TypeError(f"Expected headers to be {MutableMapping}, not {type(headers)}")
if not isinstance(cookies, (MutableMapping, CookieJar, type(None))):
raise TypeError(f"Expected cookies to be {MutableMapping} or {CookieJar}, not {type(cookies)}")
if not isinstance(proxy, (str, type(None))):
raise TypeError(f"Expected proxy to be {str}, not {type(proxy)}")
if not max_workers:
max_workers = min(32, (os.cpu_count() or 1) + 4)
elif not isinstance(max_workers, int):
raise TypeError(f"Expected max_workers to be {int}, not {type(max_workers)}")
if not isinstance(urls, list):
urls = [urls]
if not binaries.Aria2:
raise EnvironmentError("Aria2c executable not found...")
if proxy and not proxy.lower().startswith("http://"):
raise ValueError("Only HTTP proxies are supported by aria2(c)")
if cookies and not isinstance(cookies, CookieJar):
cookies = cookiejar_from_dict(cookies)
url_files = []
for i, url in enumerate(urls):
if isinstance(url, str):
url_data = {
"url": url
}
else:
url_data: dict[str, Any] = url
url_filename = filename.format(
i=i,
ext=get_extension(url_data["url"])
)
url_text = url_data["url"]
url_text += f"\n\tdir={output_dir}"
url_text += f"\n\tout={url_filename}"
if cookies:
mock_request = requests.Request(url=url_data["url"])
cookie_header = get_cookie_header(cookies, mock_request)
if cookie_header:
url_text += f"\n\theader=Cookie: {cookie_header}"
for key, value in url_data.items():
if key == "url":
continue
if key == "headers":
for header_name, header_value in value.items():
url_text += f"\n\theader={header_name}: {header_value}"
else:
url_text += f"\n\t{key}={value}"
url_files.append(url_text)
url_file = "\n".join(url_files)
rpc_port = get_free_port()
rpc_secret = get_random_bytes(16).hex()
rpc_uri = f"http://127.0.0.1:{rpc_port}/jsonrpc"
rpc_session = Session()
max_concurrent_downloads = int(config.aria2c.get("max_concurrent_downloads", max_workers))
max_connection_per_server = int(config.aria2c.get("max_connection_per_server", 1))
split = int(config.aria2c.get("split", 5))
file_allocation = config.aria2c.get("file_allocation", "prealloc")
if len(urls) > 1:
split = 1
file_allocation = "none"
arguments = [
"-c", # Continue downloading a partially downloaded file
"--remote-time", # Retrieve timestamp of the remote file from the and apply if available
"-x", "16", # The maximum number of connections to one server for each download
"-j", "16", # The maximum number of parallel downloads for every static (HTTP/FTP) URL
"-s", ("1" if segmented else "16"), # Download a file using N connections
"--min-split-size", ("1024M" if segmented else "20M"), # effectively disable split if segmented
# [Basic Options]
"--input-file", "-",
"--all-proxy", proxy or "",
"--continue=true",
# [Connection Options]
f"--max-concurrent-downloads={max_concurrent_downloads}",
f"--max-connection-per-server={max_connection_per_server}",
f"--split={split}", # each split uses their own connection
"--max-file-not-found=5", # counted towards --max-tries
"--max-tries=5",
"--retry-wait=2",
# [Advanced Options]
"--allow-overwrite=true",
"--auto-file-renaming=false",
"--retry-wait", "2", # Set the seconds to wait between retries.
"--max-tries", "5",
"--max-file-not-found", "5",
"--summary-interval", "0",
"--file-allocation", config.aria2c.get("file_allocation", "falloc"),
"--console-log-level", "warn",
"--download-result", "hide",
"-i", "-"
"--console-log-level=warn",
"--download-result=default",
f"--file-allocation={file_allocation}",
"--summary-interval=0",
# [RPC Options]
"--enable-rpc=true",
f"--rpc-listen-port={rpc_port}",
f"--rpc-secret={rpc_secret}"
]
for header, value in (headers or {}).items():
if header.lower() == "cookie":
raise ValueError("You cannot set Cookies as a header manually, please use the `cookies` param.")
if header.lower() == "accept-encoding":
# we cannot set an allowed encoding, or it will return compressed
# and the code is not set up to uncompress the data
continue
if header.lower() == "referer":
arguments.extend(["--referer", value])
continue
if header.lower() == "user-agent":
arguments.extend(["--user-agent", value])
continue
arguments.extend(["--header", f"{header}: {value}"])
if proxy and proxy.lower().split(":")[0] != "http":
# HTTPS proxies not supported by Aria2c.
# Proxy the proxy via pproxy to access it as an HTTP proxy.
async with start_pproxy(proxy) as pproxy_:
return await aria2c(uri, out, headers, pproxy_)
yield dict(total=len(urls))
if proxy:
arguments += ["--all-proxy", proxy]
try:
p = subprocess.Popen(
[
binaries.Aria2,
*arguments
],
stdin=subprocess.PIPE,
stdout=subprocess.DEVNULL
)
p = await asyncio.create_subprocess_exec(executable, *arguments, stdin=subprocess.PIPE)
await p.communicate(uri.encode())
if p.returncode != 0:
raise subprocess.CalledProcessError(p.returncode, arguments)
p.stdin.write(url_file.encode())
p.stdin.close()
return p.returncode
while p.poll() is None:
global_stats: dict[str, Any] = rpc(
caller=partial(rpc_session.post, url=rpc_uri),
secret=rpc_secret,
method="aria2.getGlobalStat"
) or {}
number_stopped = int(global_stats.get("numStoppedTotal", 0))
download_speed = int(global_stats.get("downloadSpeed", -1))
if number_stopped:
yield dict(completed=number_stopped)
if download_speed != -1:
yield dict(downloaded=f"{filesize.decimal(download_speed)}/s")
stopped_downloads: list[dict[str, Any]] = rpc(
caller=partial(rpc_session.post, url=rpc_uri),
secret=rpc_secret,
method="aria2.tellStopped",
params=[0, 999999]
) or []
for dl in stopped_downloads:
if dl["status"] == "error":
used_uri = next(
uri["uri"]
for file in dl["files"]
if file["selected"] == "true"
for uri in file["uris"]
if uri["status"] == "used"
)
error = f"Download Error (#{dl['gid']}): {dl['errorMessage']} ({dl['errorCode']}), {used_uri}"
error_pretty = "\n ".join(textwrap.wrap(
error,
width=console.width - 20,
initial_indent=""
))
console.log(Text.from_ansi("\n[Aria2c]: " + error_pretty))
raise ValueError(error)
if number_stopped == len(urls):
rpc(
caller=partial(rpc_session.post, url=rpc_uri),
secret=rpc_secret,
method="aria2.shutdown"
)
break
time.sleep(1)
p.wait()
if p.returncode != 0:
raise subprocess.CalledProcessError(p.returncode, arguments)
except ConnectionResetError:
# interrupted while passing URI to download
raise KeyboardInterrupt()
except subprocess.CalledProcessError as e:
if e.returncode in (7, 0xC000013A):
# 7 is when Aria2(c) handled the CTRL+C
# 0xC000013A is when it never got the chance to
raise KeyboardInterrupt()
raise
except KeyboardInterrupt:
DOWNLOAD_CANCELLED.set() # skip pending track downloads
yield dict(downloaded="[yellow]CANCELLED")
raise
except Exception:
DOWNLOAD_CANCELLED.set() # skip pending track downloads
yield dict(downloaded="[red]FAILED")
raise
finally:
rpc(
caller=partial(rpc_session.post, url=rpc_uri),
secret=rpc_secret,
method="aria2.shutdown"
)
__ALL__ = (aria2c,)
def aria2c(
urls: Union[str, list[str], dict[str, Any], list[dict[str, Any]]],
output_dir: Path,
filename: str,
headers: Optional[MutableMapping[str, Union[str, bytes]]] = None,
cookies: Optional[Union[MutableMapping[str, str], CookieJar]] = None,
proxy: Optional[str] = None,
max_workers: Optional[int] = None
) -> Generator[dict[str, Any], None, None]:
"""
Download files using Aria2(c).
https://aria2.github.io
Yields the following download status updates while chunks are downloading:
- {total: 100} (100% download total)
- {completed: 1} (1% download progress out of 100%)
- {downloaded: "10.1 MB/s"} (currently downloading at a rate of 10.1 MB/s)
The data is in the same format accepted by rich's progress.update() function.
Parameters:
urls: Web URL(s) to file(s) to download. You can use a dictionary with the key
"url" for the URI, and other keys for extra arguments to use per-URL.
output_dir: The folder to save the file into. If the save path's directory does
not exist then it will be made automatically.
filename: The filename or filename template to use for each file. The variables
you can use are `i` for the URL index and `ext` for the URL extension.
headers: A mapping of HTTP Header Key/Values to use for all downloads.
cookies: A mapping of Cookie Key/Values or a Cookie Jar to use for all downloads.
proxy: An optional proxy URI to route connections through for all downloads.
max_workers: The maximum amount of threads to use for downloads. Defaults to
min(32,(cpu_count+4)). Used for the --max-concurrent-downloads option.
"""
if proxy and not proxy.lower().startswith("http://"):
# Only HTTP proxies are supported by aria2(c)
proxy = urlparse(proxy)
port = get_free_port()
username, password = get_random_bytes(8).hex(), get_random_bytes(8).hex()
local_proxy = f"http://{username}:{password}@localhost:{port}"
scheme = {
"https": "http+ssl",
"socks5h": "socks"
}.get(proxy.scheme, proxy.scheme)
remote_server = f"{scheme}://{proxy.hostname}"
if proxy.port:
remote_server += f":{proxy.port}"
if proxy.username or proxy.password:
remote_server += "#"
if proxy.username:
remote_server += proxy.username
if proxy.password:
remote_server += f":{proxy.password}"
p = subprocess.Popen(
[
"pproxy",
"-l", f"http://:{port}#{username}:{password}",
"-r", remote_server
],
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL
)
try:
yield from download(urls, output_dir, filename, headers, cookies, local_proxy, max_workers)
finally:
p.kill()
p.wait()
return
yield from download(urls, output_dir, filename, headers, cookies, proxy, max_workers)
__all__ = ("aria2c",)

View File

@ -0,0 +1,283 @@
import math
import time
from concurrent import futures
from concurrent.futures.thread import ThreadPoolExecutor
from http.cookiejar import CookieJar
from pathlib import Path
from typing import Any, Generator, MutableMapping, Optional, Union
from curl_cffi.requests import Session
from rich import filesize
from devine.core.config import config
from devine.core.constants import DOWNLOAD_CANCELLED
from devine.core.utilities import get_extension
MAX_ATTEMPTS = 5
RETRY_WAIT = 2
CHUNK_SIZE = 1024
PROGRESS_WINDOW = 5
BROWSER = config.curl_impersonate.get("browser", "chrome124")
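The session passed into download() below is expected to be a curl_cffi impersonating session; a minimal sketch of constructing one with the configured browser target (header, cookie, and proxy wiring is left to the wrapper further down):
from curl_cffi.requests import Session
session = Session(impersonate=BROWSER)  # e.g. "chrome124", from curl_impersonate.browser in the config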
def download(
url: str,
save_path: Path,
session: Session,
**kwargs: Any
) -> Generator[dict[str, Any], None, None]:
"""
Download files using Curl Impersonate.
https://github.com/lwthiker/curl-impersonate
Yields the following download status updates while chunks are downloading:
- {total: 123} (there are 123 chunks to download)
- {total: None} (there are an unknown number of chunks to download)
- {advance: 1} (one chunk was downloaded)
- {downloaded: "10.1 MB/s"} (currently downloading at a rate of 10.1 MB/s)
- {file_downloaded: Path(...), written: 1024} (download finished, has the save path and size)
The data is in the same format accepted by rich's progress.update() function. The
`downloaded` key is custom and is not natively accepted by all rich progress bars.
Parameters:
url: Web URL of a file to download.
save_path: The path to save the file to. If the save path's directory does not
exist then it will be made automatically.
session: The Requests or Curl-Impersonate Session to make HTTP requests with.
Useful to set Header, Cookie, and Proxy data. Connections are saved and
re-used with the session so long as the server keeps the connection alive.
kwargs: Any extra keyword arguments to pass to the session.get() call. Use this
for one-time request changes like a header, cookie, or proxy. For example,
to request Byte-ranges use e.g., `headers={"Range": "bytes=0-128"}`.
"""
save_dir = save_path.parent
control_file = save_path.with_name(f"{save_path.name}.!dev")
save_dir.mkdir(parents=True, exist_ok=True)
if control_file.exists():
# consider the file corrupt if the control file exists
save_path.unlink(missing_ok=True)
control_file.unlink()
elif save_path.exists():
# if it exists, and no control file, then it should be safe
yield dict(
file_downloaded=save_path,
written=save_path.stat().st_size
)
# TODO: Design a control file format so we know how much of the file is missing
control_file.write_bytes(b"")
attempts = 1
try:
while True:
written = 0
download_sizes = []
last_speed_refresh = time.time()
try:
stream = session.get(url, stream=True, **kwargs)
stream.raise_for_status()
try:
content_length = int(stream.headers.get("Content-Length", "0"))
except ValueError:
content_length = 0
if content_length > 0:
yield dict(total=math.ceil(content_length / CHUNK_SIZE))
else:
# we have no data to calculate total chunks
yield dict(total=None) # indeterminate mode
with open(save_path, "wb") as f:
for chunk in stream.iter_content(chunk_size=CHUNK_SIZE):
download_size = len(chunk)
f.write(chunk)
written += download_size
yield dict(advance=1)
now = time.time()
time_since = now - last_speed_refresh
download_sizes.append(download_size)
if time_since > PROGRESS_WINDOW or download_size < CHUNK_SIZE:
data_size = sum(download_sizes)
download_speed = math.ceil(data_size / (time_since or 1))
yield dict(downloaded=f"{filesize.decimal(download_speed)}/s")
last_speed_refresh = now
download_sizes.clear()
yield dict(
file_downloaded=save_path,
written=written
)
break
except Exception as e:
save_path.unlink(missing_ok=True)
if DOWNLOAD_CANCELLED.is_set() or attempts == MAX_ATTEMPTS:
raise e
time.sleep(RETRY_WAIT)
attempts += 1
finally:
control_file.unlink()
def curl_impersonate(
urls: Union[str, list[str], dict[str, Any], list[dict[str, Any]]],
output_dir: Path,
filename: str,
headers: Optional[MutableMapping[str, Union[str, bytes]]] = None,
cookies: Optional[Union[MutableMapping[str, str], CookieJar]] = None,
proxy: Optional[str] = None,
max_workers: Optional[int] = None
) -> Generator[dict[str, Any], None, None]:
"""
Download files using Curl Impersonate.
https://github.com/lwthiker/curl-impersonate
Yields the following download status updates while chunks are downloading:
- {total: 123} (there are 123 chunks to download)
- {total: None} (there are an unknown number of chunks to download)
- {advance: 1} (one chunk was downloaded)
- {downloaded: "10.1 MB/s"} (currently downloading at a rate of 10.1 MB/s)
- {file_downloaded: Path(...), written: 1024} (download finished, has the save path and size)
The data is in the same format accepted by rich's progress.update() function.
However, the `downloaded`, `file_downloaded` and `written` keys are custom and not
natively accepted by rich progress bars.
Parameters:
urls: Web URL(s) to file(s) to download. You can use a dictionary with the key
"url" for the URI, and other keys for extra arguments to use per-URL.
output_dir: The folder to save the file into. If the save path's directory does
not exist then it will be made automatically.
filename: The filename or filename template to use for each file. The variables
you can use are `i` for the URL index and `ext` for the URL extension.
headers: A mapping of HTTP Header Key/Values to use for all downloads.
cookies: A mapping of Cookie Key/Values or a Cookie Jar to use for all downloads.
proxy: An optional proxy URI to route connections through for all downloads.
max_workers: The maximum amount of threads to use for downloads. Defaults to
min(32,(cpu_count+4)).
"""
if not urls:
raise ValueError("urls must be provided and not empty")
elif not isinstance(urls, (str, dict, list)):
raise TypeError(f"Expected urls to be {str} or {dict} or a list of one of them, not {type(urls)}")
if not output_dir:
raise ValueError("output_dir must be provided")
elif not isinstance(output_dir, Path):
raise TypeError(f"Expected output_dir to be {Path}, not {type(output_dir)}")
if not filename:
raise ValueError("filename must be provided")
elif not isinstance(filename, str):
raise TypeError(f"Expected filename to be {str}, not {type(filename)}")
if not isinstance(headers, (MutableMapping, type(None))):
raise TypeError(f"Expected headers to be {MutableMapping}, not {type(headers)}")
if not isinstance(cookies, (MutableMapping, CookieJar, type(None))):
raise TypeError(f"Expected cookies to be {MutableMapping} or {CookieJar}, not {type(cookies)}")
if not isinstance(proxy, (str, type(None))):
raise TypeError(f"Expected proxy to be {str}, not {type(proxy)}")
if not isinstance(max_workers, (int, type(None))):
raise TypeError(f"Expected max_workers to be {int}, not {type(max_workers)}")
if not isinstance(urls, list):
urls = [urls]
urls = [
dict(
save_path=save_path,
**url
) if isinstance(url, dict) else dict(
url=url,
save_path=save_path
)
for i, url in enumerate(urls)
for save_path in [output_dir / filename.format(
i=i,
ext=get_extension(url["url"] if isinstance(url, dict) else url)
)]
]
session = Session(impersonate=BROWSER)
if headers:
headers = {
k: v
for k, v in headers.items()
if k.lower() != "accept-encoding"
}
session.headers.update(headers)
if cookies:
session.cookies.update(cookies)
if proxy:
session.proxies.update({"all": proxy})
yield dict(total=len(urls))
download_sizes = []
last_speed_refresh = time.time()
with ThreadPoolExecutor(max_workers=max_workers) as pool:
for i, future in enumerate(futures.as_completed((
pool.submit(
download,
session=session,
**url
)
for url in urls
))):
file_path, download_size = None, None
try:
for status_update in future.result():
if status_update.get("file_downloaded") and status_update.get("written"):
file_path = status_update["file_downloaded"]
download_size = status_update["written"]
elif len(urls) == 1:
# these are per-chunk updates, only useful if it's one big file
yield status_update
except KeyboardInterrupt:
DOWNLOAD_CANCELLED.set() # skip pending track downloads
yield dict(downloaded="[yellow]CANCELLING")
pool.shutdown(wait=True, cancel_futures=True)
yield dict(downloaded="[yellow]CANCELLED")
# tell dl that it was cancelled
# the pool is already shut down, so exiting loop is fine
raise
except Exception:
DOWNLOAD_CANCELLED.set() # skip pending track downloads
yield dict(downloaded="[red]FAILING")
pool.shutdown(wait=True, cancel_futures=True)
yield dict(downloaded="[red]FAILED")
# tell dl that it failed
# the pool is already shut down, so exiting loop is fine
raise
else:
yield dict(file_downloaded=file_path)
yield dict(advance=1)
now = time.time()
time_since = now - last_speed_refresh
if download_size: # no size == skipped dl
download_sizes.append(download_size)
if download_sizes and (time_since > PROGRESS_WINDOW or i == len(urls)):
data_size = sum(download_sizes)
download_speed = math.ceil(data_size / (time_since or 1))
yield dict(downloaded=f"{filesize.decimal(download_speed)}/s")
last_speed_refresh = now
download_sizes.clear()
__all__ = ("curl_impersonate",)

View File

@ -0,0 +1,292 @@
import math
import os
import time
from concurrent.futures import as_completed
from concurrent.futures.thread import ThreadPoolExecutor
from http.cookiejar import CookieJar
from pathlib import Path
from typing import Any, Generator, MutableMapping, Optional, Union
from requests import Session
from requests.adapters import HTTPAdapter
from rich import filesize
from devine.core.constants import DOWNLOAD_CANCELLED
from devine.core.utilities import get_extension
MAX_ATTEMPTS = 5
RETRY_WAIT = 2
CHUNK_SIZE = 1024
PROGRESS_WINDOW = 5
DOWNLOAD_SIZES = []
LAST_SPEED_REFRESH = time.time()
def download(
url: str,
save_path: Path,
session: Optional[Session] = None,
segmented: bool = False,
**kwargs: Any
) -> Generator[dict[str, Any], None, None]:
"""
Download a file using Python Requests.
https://requests.readthedocs.io
Yields the following download status updates while chunks are downloading:
- {total: 123} (there are 123 chunks to download)
- {total: None} (there are an unknown number of chunks to download)
- {advance: 1} (one chunk was downloaded)
- {downloaded: "10.1 MB/s"} (currently downloading at a rate of 10.1 MB/s)
- {file_downloaded: Path(...), written: 1024} (download finished, has the save path and size)
The data is in the same format accepted by rich's progress.update() function. The
`downloaded` key is custom and is not natively accepted by all rich progress bars.
Parameters:
url: Web URL of a file to download.
save_path: The path to save the file to. If the save path's directory does not
exist then it will be made automatically.
session: The Requests Session to make HTTP requests with. Useful to set Header,
Cookie, and Proxy data. Connections are saved and re-used with the session
so long as the server keeps the connection alive.
segmented: If downloads are segments or parts of one bigger file.
kwargs: Any extra keyword arguments to pass to the session.get() call. Use this
for one-time request changes like a header, cookie, or proxy. For example,
to request Byte-ranges use e.g., `headers={"Range": "bytes=0-128"}`.
"""
global LAST_SPEED_REFRESH
session = session or Session()
save_dir = save_path.parent
control_file = save_path.with_name(f"{save_path.name}.!dev")
save_dir.mkdir(parents=True, exist_ok=True)
if control_file.exists():
# consider the file corrupt if the control file exists
save_path.unlink(missing_ok=True)
control_file.unlink()
elif save_path.exists():
# if it exists, and no control file, then it should be safe
yield dict(
file_downloaded=save_path,
written=save_path.stat().st_size
)
# TODO: This should return, potential recovery bug
# TODO: Design a control file format so we know how much of the file is missing
control_file.write_bytes(b"")
attempts = 1
try:
while True:
written = 0
# these are for single-url speed calcs only
download_sizes = []
last_speed_refresh = time.time()
try:
stream = session.get(url, stream=True, **kwargs)
stream.raise_for_status()
if not segmented:
try:
content_length = int(stream.headers.get("Content-Length", "0"))
except ValueError:
content_length = 0
if content_length > 0:
yield dict(total=math.ceil(content_length / CHUNK_SIZE))
else:
# we have no data to calculate total chunks
yield dict(total=None) # indeterminate mode
with open(save_path, "wb") as f:
for chunk in stream.iter_content(chunk_size=CHUNK_SIZE):
download_size = len(chunk)
f.write(chunk)
written += download_size
if not segmented:
yield dict(advance=1)
now = time.time()
time_since = now - last_speed_refresh
download_sizes.append(download_size)
if time_since > PROGRESS_WINDOW or download_size < CHUNK_SIZE:
data_size = sum(download_sizes)
download_speed = math.ceil(data_size / (time_since or 1))
yield dict(downloaded=f"{filesize.decimal(download_speed)}/s")
last_speed_refresh = now
download_sizes.clear()
yield dict(file_downloaded=save_path, written=written)
if segmented:
yield dict(advance=1)
now = time.time()
time_since = now - LAST_SPEED_REFRESH
if written: # no size == skipped dl
DOWNLOAD_SIZES.append(written)
if DOWNLOAD_SIZES and time_since > PROGRESS_WINDOW:
data_size = sum(DOWNLOAD_SIZES)
download_speed = math.ceil(data_size / (time_since or 1))
yield dict(downloaded=f"{filesize.decimal(download_speed)}/s")
LAST_SPEED_REFRESH = now
DOWNLOAD_SIZES.clear()
break
except Exception as e:
save_path.unlink(missing_ok=True)
if DOWNLOAD_CANCELLED.is_set() or attempts == MAX_ATTEMPTS:
raise e
time.sleep(RETRY_WAIT)
attempts += 1
finally:
control_file.unlink()
def requests(
urls: Union[str, list[str], dict[str, Any], list[dict[str, Any]]],
output_dir: Path,
filename: str,
headers: Optional[MutableMapping[str, Union[str, bytes]]] = None,
cookies: Optional[Union[MutableMapping[str, str], CookieJar]] = None,
proxy: Optional[str] = None,
max_workers: Optional[int] = None
) -> Generator[dict[str, Any], None, None]:
"""
Download a file using Python Requests.
https://requests.readthedocs.io
Yields the following download status updates while chunks are downloading:
- {total: 123} (there are 123 chunks to download)
- {total: None} (there are an unknown number of chunks to download)
- {advance: 1} (one chunk was downloaded)
- {downloaded: "10.1 MB/s"} (currently downloading at a rate of 10.1 MB/s)
- {file_downloaded: Path(...), written: 1024} (download finished, has the save path and size)
The data is in the same format accepted by rich's progress.update() function.
However, the `downloaded`, `file_downloaded` and `written` keys are custom and not
natively accepted by rich progress bars.
Parameters:
urls: Web URL(s) to file(s) to download. You can use a dictionary with the key
"url" for the URI, and other keys for extra arguments to use per-URL.
output_dir: The folder to save the file into. If the save path's directory does
not exist then it will be made automatically.
filename: The filename or filename template to use for each file. The variables
you can use are `i` for the URL index and `ext` for the URL extension.
headers: A mapping of HTTP Header Key/Values to use for all downloads.
cookies: A mapping of Cookie Key/Values or a Cookie Jar to use for all downloads.
proxy: An optional proxy URI to route connections through for all downloads.
max_workers: The maximum amount of threads to use for downloads. Defaults to
min(32,(cpu_count+4)).
"""
if not urls:
raise ValueError("urls must be provided and not empty")
elif not isinstance(urls, (str, dict, list)):
raise TypeError(f"Expected urls to be {str} or {dict} or a list of one of them, not {type(urls)}")
if not output_dir:
raise ValueError("output_dir must be provided")
elif not isinstance(output_dir, Path):
raise TypeError(f"Expected output_dir to be {Path}, not {type(output_dir)}")
if not filename:
raise ValueError("filename must be provided")
elif not isinstance(filename, str):
raise TypeError(f"Expected filename to be {str}, not {type(filename)}")
if not isinstance(headers, (MutableMapping, type(None))):
raise TypeError(f"Expected headers to be {MutableMapping}, not {type(headers)}")
if not isinstance(cookies, (MutableMapping, CookieJar, type(None))):
raise TypeError(f"Expected cookies to be {MutableMapping} or {CookieJar}, not {type(cookies)}")
if not isinstance(proxy, (str, type(None))):
raise TypeError(f"Expected proxy to be {str}, not {type(proxy)}")
if not isinstance(max_workers, (int, type(None))):
raise TypeError(f"Expected max_workers to be {int}, not {type(max_workers)}")
if not isinstance(urls, list):
urls = [urls]
if not max_workers:
max_workers = min(32, (os.cpu_count() or 1) + 4)
urls = [
dict(
save_path=save_path,
**url
) if isinstance(url, dict) else dict(
url=url,
save_path=save_path
)
for i, url in enumerate(urls)
for save_path in [output_dir / filename.format(
i=i,
ext=get_extension(url["url"] if isinstance(url, dict) else url)
)]
]
session = Session()
session.mount("https://", HTTPAdapter(
pool_connections=max_workers,
pool_maxsize=max_workers,
pool_block=True
))
session.mount("http://", session.adapters["https://"])
if headers:
headers = {
k: v
for k, v in headers.items()
if k.lower() != "accept-encoding"
}
session.headers.update(headers)
if cookies:
session.cookies.update(cookies)
if proxy:
session.proxies.update({"all": proxy})
yield dict(total=len(urls))
try:
with ThreadPoolExecutor(max_workers=max_workers) as pool:
for future in as_completed(
pool.submit(
download,
session=session,
segmented=True,
**url
)
for url in urls
):
try:
yield from future.result()
except KeyboardInterrupt:
DOWNLOAD_CANCELLED.set() # skip pending track downloads
yield dict(downloaded="[yellow]CANCELLING")
pool.shutdown(wait=True, cancel_futures=True)
yield dict(downloaded="[yellow]CANCELLED")
# tell dl that it was cancelled
# the pool is already shut down, so exiting loop is fine
raise
except Exception:
DOWNLOAD_CANCELLED.set() # skip pending track downloads
yield dict(downloaded="[red]FAILING")
pool.shutdown(wait=True, cancel_futures=True)
yield dict(downloaded="[red]FAILED")
# tell dl that it failed
# the pool is already shut down, so exiting loop is fine
raise
finally:
DOWNLOAD_SIZES.clear()
__all__ = ("requests",)

View File

@ -1,51 +0,0 @@
import subprocess
from pathlib import Path
from typing import Union, Optional
from devine.core.utilities import get_binary_path
async def saldl(
uri: Union[str, list[str]],
out: Union[Path, str],
headers: Optional[dict] = None,
proxy: Optional[str] = None
) -> int:
out = Path(out)
if headers:
headers.update({k: v for k, v in headers.items() if k.lower() != "accept-encoding"})
executable = get_binary_path("saldl", "saldl-win64", "saldl-win32")
if not executable:
raise EnvironmentError("Saldl executable not found...")
arguments = [
executable,
# "--no-status",
"--skip-TLS-verification",
"--resume",
"--merge-in-order",
"-c8",
"--auto-size", "1",
"-D", str(out.parent),
"-o", out.name
]
if headers:
arguments.extend([
"--custom-headers",
"\r\n".join([f"{k}: {v}" for k, v in headers.items()])
])
if proxy:
arguments.extend(["--proxy", proxy])
if isinstance(uri, list):
raise ValueError("Saldl code does not yet support multiple uri (e.g. segmented) downloads.")
arguments.append(uri)
return subprocess.check_call(arguments)
__ALL__ = (saldl,)

View File

@ -4,3 +4,6 @@ from devine.core.drm.clearkey import ClearKey
from devine.core.drm.widevine import Widevine
DRM_T = Union[ClearKey, Widevine]
__all__ = ("ClearKey", "Widevine", "DRM_T")

View File

@ -1,13 +1,15 @@
from __future__ import annotations
import base64
import shutil
from pathlib import Path
from typing import Optional, Union
from urllib.parse import urljoin
import requests
from Cryptodome.Cipher import AES
from Cryptodome.Util.Padding import unpad
from m3u8.model import Key
from devine.core.constants import TrackT
from requests import Session
class ClearKey:
@ -34,49 +36,77 @@ class ClearKey:
self.key: bytes = key
self.iv: bytes = iv
def decrypt(self, track: TrackT) -> None:
def decrypt(self, path: Path) -> None:
"""Decrypt a Track with AES Clear Key DRM."""
if not track.path or not track.path.exists():
raise ValueError("Tried to decrypt a track that has not yet been downloaded.")
if not path or not path.exists():
raise ValueError("Tried to decrypt a file that does not exist.")
decrypted = AES. \
new(self.key, AES.MODE_CBC, self.iv). \
decrypt(track.path.read_bytes())
decrypt(path.read_bytes())
decrypted_path = track.path.with_suffix(f".decrypted{track.path.suffix}")
try:
decrypted = unpad(decrypted, AES.block_size)
except ValueError:
# the decrypted data is likely already in the block size boundary
pass
decrypted_path = path.with_suffix(f".decrypted{path.suffix}")
decrypted_path.write_bytes(decrypted)
track.swap(decrypted_path)
track.drm = None
path.unlink()
shutil.move(decrypted_path, path)
@classmethod
def from_m3u_key(cls, m3u_key: Key, proxy: Optional[str] = None) -> ClearKey:
def from_m3u_key(cls, m3u_key: Key, session: Optional[Session] = None) -> ClearKey:
"""
Load a ClearKey from an M3U(8) Playlist's EXT-X-KEY.
Parameters:
m3u_key: A Key object parsed from a m3u(8) playlist using
the `m3u8` library.
session: Optional session used to request external URIs with.
Useful to set headers, proxies, cookies, and so forth.
"""
if not isinstance(m3u_key, Key):
raise ValueError(f"Provided M3U Key is in an unexpected type {m3u_key!r}")
if not isinstance(session, (Session, type(None))):
raise TypeError(f"Expected session to be a {Session}, not a {type(session)}")
if not m3u_key.method.startswith("AES"):
raise ValueError(f"Provided M3U Key is not an AES Clear Key, {m3u_key.method}")
if not m3u_key.uri:
raise ValueError("No URI in M3U Key, unable to get Key.")
res = requests.get(
url=urljoin(m3u_key.base_uri, m3u_key.uri),
headers={
"User-Agent": "smartexoplayer/1.1.0 (Linux;Android 8.0.0) ExoPlayerLib/2.13.3"
},
proxies={"all": proxy} if proxy else None
)
res.raise_for_status()
if not res.content:
raise EOFError("Unexpected Empty Response by M3U Key URI.")
if len(res.content) < 16:
raise EOFError(f"Unexpected Length of Key ({len(res.content)} bytes) in M3U Key.")
if not session:
session = Session()
if not session.headers.get("User-Agent"):
# commonly needed default for HLS playlists
session.headers["User-Agent"] = "smartexoplayer/1.1.0 (Linux;Android 8.0.0) ExoPlayerLib/2.13.3"
if m3u_key.uri.startswith("data:"):
media_types, data = m3u_key.uri[5:].split(",")
media_types = media_types.split(";")
if "base64" in media_types:
data = base64.b64decode(data)
key = data
else:
url = urljoin(m3u_key.base_uri, m3u_key.uri)
res = session.get(url)
res.raise_for_status()
if not res.content:
raise EOFError("Unexpected Empty Response by M3U Key URI.")
if len(res.content) < 16:
raise EOFError(f"Unexpected Length of Key ({len(res.content)} bytes) in M3U Key.")
key = res.content
key = res.content
iv = None
if m3u_key.iv:
iv = bytes.fromhex(m3u_key.iv.replace("0x", ""))
else:
iv = None
return cls(key=key, iv=iv)
__ALL__ = (ClearKey,)
__all__ = ("ClearKey",)

View File

@ -1,9 +1,11 @@
from __future__ import annotations
import base64
import shutil
import subprocess
import sys
from typing import Any, Optional, Union, Callable
import textwrap
from pathlib import Path
from typing import Any, Callable, Optional, Union
from uuid import UUID
import m3u8
@ -12,10 +14,13 @@ from pymp4.parser import Box
from pywidevine.cdm import Cdm as WidevineCdm
from pywidevine.pssh import PSSH
from requests import Session
from rich.text import Text
from devine.core import binaries
from devine.core.config import config
from devine.core.constants import AnyTrack, TrackT
from devine.core.utilities import get_binary_path, get_boxes
from devine.core.console import console
from devine.core.constants import AnyTrack
from devine.core.utilities import get_boxes
from devine.core.utils.subprocess import ffprobe
@ -73,13 +78,8 @@ class Widevine:
pssh_boxes: list[Container] = []
tenc_boxes: list[Container] = []
if track.descriptor == track.Descriptor.M3U:
if track.descriptor == track.Descriptor.HLS:
m3u_url = track.url
if isinstance(m3u_url, list):
# TODO: Find out why exactly the track url could be a list in this
# scenario, as if its a list of segments, they would be files
# not m3u documents
m3u_url = m3u_url[0]
master = m3u8.loads(session.get(m3u_url).text, uri=m3u_url)
pssh_boxes.extend(
Box.parse(base64.b64decode(x.uri.split(",")[-1]))
@ -87,7 +87,7 @@ class Widevine:
if x and x.keyformat and x.keyformat.lower() == WidevineCdm.urn
)
init_data = track.get_init_segment(session)
init_data = track.get_init_segment(session=session)
if init_data:
# try get via ffprobe, needed for non mp4 data e.g. WEBM from Google Play
probe = ffprobe(init_data)
@ -114,6 +114,50 @@ class Widevine:
return cls(pssh=PSSH(pssh), kid=kid)
@classmethod
def from_init_data(cls, init_data: bytes) -> Widevine:
"""
Get PSSH and KID from within Initialization Segment Data.
This should only be used if a PSSH could not be provided directly.
It is *rare* to need to use this.
Raises:
PSSHNotFound - If the PSSH was not found within the data.
KIDNotFound - If the KID was not found within the data or PSSH.
"""
if not init_data:
raise ValueError("Init data should be provided.")
if not isinstance(init_data, bytes):
raise TypeError(f"Expected init data to be bytes, not {init_data!r}")
kid: Optional[UUID] = None
pssh_boxes: list[Container] = list(get_boxes(init_data, b"pssh"))
tenc_boxes: list[Container] = list(get_boxes(init_data, b"tenc"))
# try get via ffprobe, needed for non mp4 data e.g. WEBM from Google Play
probe = ffprobe(init_data)
if probe:
for stream in probe.get("streams") or []:
enc_key_id = stream.get("tags", {}).get("enc_key_id")
if enc_key_id:
kid = UUID(bytes=base64.b64decode(enc_key_id))
pssh_boxes.sort(key=lambda b: {
PSSH.SystemId.Widevine: 0,
PSSH.SystemId.PlayReady: 1
}[b.system_ID])
pssh = next(iter(pssh_boxes), None)
if not pssh:
raise Widevine.Exceptions.PSSHNotFound("PSSH was not found in track data.")
tenc = next(iter(tenc_boxes), None)
if not kid and tenc and tenc.key_ID.int != 0:
kid = tenc.key_ID
return cls(pssh=PSSH(pssh), kid=kid)
@property
def pssh(self) -> PSSH:
"""Get Protection System Specific Header Box."""
@ -161,14 +205,14 @@ class Widevine:
for key in cdm.get_keys(session_id, "CONTENT")
}
if not self.content_keys:
raise ValueError("No Content Keys were returned by the License")
raise Widevine.Exceptions.EmptyLicense("No Content Keys were within the License")
if kid not in self.content_keys:
raise ValueError(f"No Content Key with the KID ({kid.hex}) was returned")
raise Widevine.Exceptions.CEKNotFound(f"No Content Key for KID {kid.hex} within the License")
finally:
cdm.close(session_id)
def decrypt(self, track: TrackT) -> None:
def decrypt(self, path: Path) -> None:
"""
Decrypt a Track with Widevine DRM.
Raises:
@ -179,19 +223,17 @@ class Widevine:
if not self.content_keys:
raise ValueError("Cannot decrypt a Track without any Content Keys...")
platform = {"win32": "win", "darwin": "osx"}.get(sys.platform, sys.platform)
executable = get_binary_path("shaka-packager", f"packager-{platform}", f"packager-{platform}-x64")
if not executable:
if not binaries.ShakaPackager:
raise EnvironmentError("Shaka Packager executable not found but is required.")
if not track.path or not track.path.exists():
raise ValueError("Tried to decrypt a track that has not yet been downloaded.")
if not path or not path.exists():
raise ValueError("Tried to decrypt a file that does not exist.")
decrypted_path = track.path.with_suffix(f".decrypted{track.path.suffix}")
output_path = path.with_stem(f"{path.stem}_decrypted")
config.directories.temp.mkdir(parents=True, exist_ok=True)
try:
subprocess.check_call([
executable,
f"input={track.path},stream=0,output={decrypted_path}",
arguments = [
f"input={path},stream=0,output={output_path},output_format=MP4",
"--enable_raw_key_decryption", "--keys",
",".join([
*[
@ -199,17 +241,62 @@ class Widevine:
for i, (kid, key) in enumerate(self.content_keys.items())
],
*[
# Apple TV+ needs this as their files do not use the KID supplied in its manifest
# some services use a blank KID on the file, but real KID for license server
"label={}:key_id={}:key={}".format(i, "00" * 16, key.lower())
for i, (kid, key) in enumerate(self.content_keys.items(), len(self.content_keys))
]
]),
"--temp_dir", config.directories.temp
])
]
p = subprocess.Popen(
[binaries.ShakaPackager, *arguments],
stdout=subprocess.DEVNULL,
stderr=subprocess.PIPE,
universal_newlines=True
)
stream_skipped = False
had_error = False
shaka_log_buffer = ""
for line in iter(p.stderr.readline, ""):
line = line.strip()
if not line:
continue
if "Skip stream" in line:
# file/segment was so small that it didn't have any actual data, ignore
stream_skipped = True
if ":INFO:" in line:
continue
if ":ERROR:" in line:
had_error = True
if "Insufficient bits in bitstream for given AVC profile" in line:
# this is a warning and is something we don't have to worry about
continue
shaka_log_buffer += f"{line.strip()}\n"
if shaka_log_buffer:
# wrap to console width - padding - '[Widevine]: '
shaka_log_buffer = "\n ".join(textwrap.wrap(
shaka_log_buffer.rstrip(),
width=console.width - 22,
initial_indent=""
))
console.log(Text.from_ansi("\n[Widevine]: " + shaka_log_buffer))
p.wait()
if p.returncode != 0 or had_error:
raise subprocess.CalledProcessError(p.returncode, arguments)
path.unlink()
if not stream_skipped:
shutil.move(output_path, path)
except subprocess.CalledProcessError as e:
raise subprocess.SubprocessError(f"Failed to Decrypt! Shaka Packager Error: {e}")
track.swap(decrypted_path)
track.drm = None
if e.returncode == 0xC000013A: # STATUS_CONTROL_C_EXIT
raise KeyboardInterrupt()
raise
class Exceptions:
class PSSHNotFound(Exception):
@ -218,5 +305,11 @@ class Widevine:
class KIDNotFound(Exception):
"""KID (Encryption Key ID) was not found."""
class CEKNotFound(Exception):
"""CEK (Content Encryption Key) for KID was not found in License."""
__ALL__ = (Widevine,)
class EmptyLicense(Exception):
"""License returned no Content Encryption Keys."""
__all__ = ("Widevine",)

devine/core/events.py Normal file
View File

@ -0,0 +1,79 @@
from __future__ import annotations
from copy import deepcopy
from enum import Enum
from typing import Any, Callable
class Events:
class Types(Enum):
_reserved = 0
# A Track's segment has finished downloading
SEGMENT_DOWNLOADED = 1
# Track has finished downloading
TRACK_DOWNLOADED = 2
# Track has finished decrypting
TRACK_DECRYPTED = 3
# Track has finished repacking
TRACK_REPACKED = 4
# Track is about to be Multiplexed into a Container
TRACK_MULTIPLEX = 5
def __init__(self):
self.__subscriptions: dict[Events.Types, list[Callable]] = {}
self.__ephemeral: dict[Events.Types, list[Callable]] = {}
self.reset()
def reset(self):
"""Reset Event Observer clearing all Subscriptions."""
self.__subscriptions = {
k: []
for k in Events.Types.__members__.values()
}
self.__ephemeral = deepcopy(self.__subscriptions)
def subscribe(self, event_type: Events.Types, callback: Callable, ephemeral: bool = False) -> None:
"""
Subscribe to an Event with a Callback.
Parameters:
event_type: The Events.Type to subscribe to.
callback: The function or lambda to call on event emit.
ephemeral: Unsubscribe the callback from the event on first emit.
Note that this is not thread-safe and may be called multiple
times at roughly the same time.
"""
[self.__subscriptions, self.__ephemeral][ephemeral][event_type].append(callback)
def unsubscribe(self, event_type: Events.Types, callback: Callable) -> None:
"""
Unsubscribe a Callback from an Event.
Parameters:
event_type: The Events.Type to unsubscribe from.
callback: The function or lambda to remove from event emit.
"""
if callback in self.__subscriptions[event_type]:
self.__subscriptions[event_type].remove(callback)
if callback in self.__ephemeral[event_type]:
self.__ephemeral[event_type].remove(callback)
def emit(self, event_type: Events.Types, *args: Any, **kwargs: Any) -> None:
"""
Emit an Event, executing all subscribed Callbacks.
Parameters:
event_type: The Events.Type to emit.
args: Positional arguments to pass to callbacks.
kwargs: Keyword arguments to pass to callbacks.
"""
if event_type not in self.__subscriptions:
raise ValueError(f"Event type \"{event_type}\" is invalid")
for callback in self.__subscriptions[event_type] + self.__ephemeral[event_type]:
callback(*args, **kwargs)
self.__ephemeral[event_type].clear()
events = Events()
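Hypothetical subscriber for the new event bus, matching the keyword arguments used by the emit() calls in the DASH downloader later in this diff:
from devine.core.events import events
def on_track_downloaded(track) -> None:
    print(f"finished: {track}")
events.subscribe(events.Types.TRACK_DOWNLOADED, on_track_downloaded)
events.subscribe(
    events.Types.TRACK_DECRYPTED,
    lambda track, drm, segment: print(f"decrypted: {track}"),
    ephemeral=True  # automatically unsubscribed after the first emit
)
# an emitter elsewhere would then call, e.g.:
events.emit(events.Types.TRACK_DOWNLOADED, track="<Track>")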

View File

@ -1,2 +1,4 @@
from .dash import DASH
from .hls import HLS
__all__ = ("DASH", "HLS")

View File

@ -1,24 +1,32 @@
from __future__ import annotations
import base64
from hashlib import md5
import html
import logging
import math
import re
import sys
from copy import copy
from typing import Any, Optional, Union, Callable
from functools import partial
from pathlib import Path
from typing import Any, Callable, Optional, Union
from urllib.parse import urljoin, urlparse
from uuid import UUID
from zlib import crc32
import requests
from langcodes import Language, tag_is_valid
from lxml.etree import Element, ElementTree
from pywidevine.cdm import Cdm as WidevineCdm
from pywidevine.pssh import PSSH
from requests import Session
from devine.core.constants import DOWNLOAD_CANCELLED, DOWNLOAD_LICENCE_ONLY, AnyTrack
from devine.core.downloaders import requests as requests_downloader
from devine.core.drm import Widevine
from devine.core.tracks import Tracks, Video, Audio, Subtitle
from devine.core.utilities import is_close_match, FPS
from devine.core.events import events
from devine.core.tracks import Audio, Subtitle, Tracks, Video
from devine.core.utilities import is_close_match, try_ensure_utf8
from devine.core.utils.xml import load_xml
@ -50,6 +58,9 @@ class DASH:
raise TypeError(f"Expected session to be a {Session}, not {session!r}")
res = session.get(url, **args)
if res.url != url:
url = res.url
if not res.ok:
raise requests.ConnectionError(
"Failed to request the MPD document.",
@ -74,12 +85,17 @@ class DASH:
return cls(manifest, url)
def to_tracks(self, language: Union[str, Language], period_filter: Optional[Callable] = None) -> Tracks:
def to_tracks(
self,
language: Optional[Union[str, Language]] = None,
period_filter: Optional[Callable] = None
) -> Tracks:
"""
Convert an MPEG-DASH MPD (Media Presentation Description) document to Video, Audio and Subtitle Track objects.
Convert an MPEG-DASH document to Video, Audio and Subtitle Track objects.
Parameters:
language: Language you expect the Primary Track to be in.
language: The Title's Original Recorded Language. It will also be used as a fallback
track language value if the manifest does not list language information.
period_filter: Filter out period's within the manifest.
All Track URLs will be a list of segment URLs.
@ -90,180 +106,121 @@ class DASH:
if callable(period_filter) and period_filter(period):
continue
period_base_url = period.findtext("BaseURL") or self.manifest.findtext("BaseURL")
if not period_base_url or not re.match("^https?://", period_base_url, re.IGNORECASE):
period_base_url = urljoin(self.url, period_base_url)
for adaptation_set in period.findall("AdaptationSet"):
# flags
trick_mode = any(
x.get("schemeIdUri") == "http://dashif.org/guidelines/trickmode"
for x in (
adaptation_set.findall("EssentialProperty") +
adaptation_set.findall("SupplementalProperty")
)
)
descriptive = any(
(x.get("schemeIdUri"), x.get("value")) == ("urn:mpeg:dash:role:2011", "descriptive")
for x in adaptation_set.findall("Accessibility")
) or any(
(x.get("schemeIdUri"), x.get("value")) == ("urn:tva:metadata:cs:AudioPurposeCS:2007", "1")
for x in adaptation_set.findall("Accessibility")
)
forced = any(
(x.get("schemeIdUri"), x.get("value")) == ("urn:mpeg:dash:role:2011", "forced-subtitle")
for x in adaptation_set.findall("Role")
)
cc = any(
(x.get("schemeIdUri"), x.get("value")) == ("urn:mpeg:dash:role:2011", "caption")
for x in adaptation_set.findall("Role")
)
if trick_mode:
if self.is_trick_mode(adaptation_set):
# we don't want trick mode streams (they are only used for fast-forward/rewind)
continue
for rep in adaptation_set.findall("Representation"):
supplements = rep.findall("SupplementalProperty") + adaptation_set.findall("SupplementalProperty")
get = partial(self._get, adaptation_set=adaptation_set, representation=rep)
findall = partial(self._findall, adaptation_set=adaptation_set, representation=rep, both=True)
segment_base = rep.find("SegmentBase")
content_type = adaptation_set.get("contentType") or \
adaptation_set.get("mimeType") or \
rep.get("contentType") or \
rep.get("mimeType")
if not content_type:
raise ValueError("No content type value could be found")
content_type = content_type.split("/")[0]
codecs = get("codecs")
content_type = get("contentType")
mime_type = get("mimeType")
codecs = rep.get("codecs") or adaptation_set.get("codecs")
if not content_type and mime_type:
content_type = mime_type.split("/")[0]
if not content_type and not mime_type:
raise ValueError("Unable to determine the format of a Representation, cannot continue...")
if content_type.startswith("image"):
# we don't want what's likely thumbnails for the seekbar
continue
if content_type == "application":
# possibly application/mp4 which could be mp4-boxed subtitles
if mime_type == "application/mp4" or content_type == "application":
# likely mp4-boxed subtitles
# TODO: It may not actually be subtitles
try:
Subtitle.Codec.from_mime(codecs)
real_codec = Subtitle.Codec.from_mime(codecs)
content_type = "text"
mime_type = f"application/mp4; codecs='{real_codec.value.lower()}'"
except ValueError:
raise ValueError(f"Unsupported content type '{content_type}' with codecs of '{codecs}'")
if content_type == "text":
mime = adaptation_set.get("mimeType")
if mime and not mime.endswith("/mp4"):
codecs = mime.split("/")[1]
if content_type == "text" and mime_type and "/mp4" not in mime_type:
# mimeType likely specifies the subtitle codec better than `codecs`
codecs = mime_type.split("/")[1]
joc = next((
x.get("value")
for x in supplements
if x.get("schemeIdUri") == "tag:dolby.com,2018:dash:EC3_ExtensionComplexityIndex:2018"
), None)
if content_type == "video":
track_type = Video
track_codec = Video.Codec.from_codecs(codecs)
track_fps = get("frameRate")
if not track_fps and segment_base is not None:
track_fps = segment_base.get("timescale")
track_lang = DASH.get_language(rep.get("lang"), adaptation_set.get("lang"), language)
if not track_lang:
raise ValueError(
"One or more Tracks had no Language information. "
"The provided fallback language is not valid or is `None` or `und`."
track_args = dict(
range_=self.get_video_range(
codecs,
findall("SupplementalProperty"),
findall("EssentialProperty")
),
bitrate=get("bandwidth") or None,
width=get("width") or 0,
height=get("height") or 0,
fps=track_fps or None
)
elif content_type == "audio":
track_type = Audio
track_codec = Audio.Codec.from_codecs(codecs)
track_args = dict(
bitrate=get("bandwidth") or None,
channels=next(iter(
rep.xpath("AudioChannelConfiguration/@value")
or adaptation_set.xpath("AudioChannelConfiguration/@value")
), None),
joc=self.get_ddp_complexity_index(adaptation_set, rep),
descriptive=self.is_descriptive(adaptation_set)
)
elif content_type == "text":
track_type = Subtitle
track_codec = Subtitle.Codec.from_codecs(codecs or "vtt")
track_args = dict(
cc=self.is_closed_caption(adaptation_set),
sdh=self.is_sdh(adaptation_set),
forced=self.is_forced(adaptation_set)
)
elif content_type == "image":
# we don't want what's likely thumbnails for the seekbar
continue
else:
raise ValueError(f"Unknown Track Type '{content_type}'")
drm = DASH.get_drm(rep.findall("ContentProtection") + adaptation_set.findall("ContentProtection"))
# from here we need to calculate the Segment Template and compute a final list of URLs
segment_urls = DASH.get_segment_urls(
representation=rep,
period_duration=period.get("duration") or self.manifest.get("mediaPresentationDuration"),
fallback_segment_template=adaptation_set.find("SegmentTemplate"),
fallback_base_url=period_base_url,
fallback_query=urlparse(self.url).query
)
track_lang = self.get_language(adaptation_set, rep, fallback=language)
if not track_lang:
msg = "Language information could not be derived from a Representation."
if language is None:
msg += " No fallback language was provided when calling DASH.to_tracks()."
elif not tag_is_valid((str(language) or "").strip()) or str(language).startswith("und"):
msg += f" The fallback language provided is also invalid: {language}"
raise ValueError(msg)
# for some reason it's incredibly common for services to not provide
# a good and actually unique track ID, sometimes because of the lang
# dialect not being represented in the id, or the bitrate, or such.
# this combines all of them as one and hashes it to keep it small(ish).
track_id = md5("{codec}-{lang}-{bitrate}-{base_url}-{extra}".format(
track_id = hex(crc32("{codec}-{lang}-{bitrate}-{base_url}-{ids}-{track_args}".format(
codec=codecs,
lang=track_lang,
bitrate=rep.get("bandwidth") or 0, # subs may not state bandwidth
bitrate=get("bitrate"),
base_url=(rep.findtext("BaseURL") or "").split("?")[0],
extra=(adaptation_set.get("audioTrackId") or "") + (rep.get("id") or "") +
(period.get("id") or "")
).encode()).hexdigest()
if content_type == "video":
track_type = Video
track_codec = Video.Codec.from_codecs(codecs)
elif content_type == "audio":
track_type = Audio
track_codec = Audio.Codec.from_codecs(codecs)
elif content_type == "text":
track_type = Subtitle
track_codec = Subtitle.Codec.from_codecs(codecs or "vtt")
else:
raise ValueError(f"Unknown Track Type '{content_type}'")
ids=[get("audioTrackId"), get("id"), period.get("id")],
track_args=track_args
).encode()))[2:]
tracks.add(track_type(
id_=track_id,
url=segment_urls,
url=self.url,
codec=track_codec,
language=track_lang,
is_original_lang=not track_lang or not language or is_close_match(track_lang, [language]),
descriptor=Video.Descriptor.MPD,
extra=(rep, adaptation_set),
# video track args
**(dict(
range_=(
Video.Range.DV
if codecs.startswith(("dva1", "dvav", "dvhe", "dvh1")) else
Video.Range.from_cicp(
primaries=next((
int(x.get("value"))
for x in (
adaptation_set.findall("SupplementalProperty")
+ adaptation_set.findall("EssentialProperty")
)
if x.get("schemeIdUri") == "urn:mpeg:mpegB:cicp:ColourPrimaries"
), 0),
transfer=next((
int(x.get("value"))
for x in (
adaptation_set.findall("SupplementalProperty")
+ adaptation_set.findall("EssentialProperty")
)
if x.get("schemeIdUri") == "urn:mpeg:mpegB:cicp:TransferCharacteristics"
), 0),
matrix=next((
int(x.get("value"))
for x in (
adaptation_set.findall("SupplementalProperty")
+ adaptation_set.findall("EssentialProperty")
)
if x.get("schemeIdUri") == "urn:mpeg:mpegB:cicp:MatrixCoefficients"
), 0)
)
),
bitrate=rep.get("bandwidth"),
width=int(rep.get("width") or 0) or adaptation_set.get("width"),
height=int(rep.get("height") or 0) or adaptation_set.get("height"),
fps=(
rep.get("frameRate") or
adaptation_set.get("frameRate") or
FPS.parse(rep.find("SegmentBase").get("timescale"))
),
drm=drm
) if track_type is Video else dict(
bitrate=rep.get("bandwidth"),
channels=next(iter(
rep.xpath("AudioChannelConfiguration/@value")
or adaptation_set.xpath("AudioChannelConfiguration/@value")
), None),
joc=joc,
descriptive=descriptive,
drm=drm
) if track_type is Audio else dict(
forced=forced,
cc=cc
) if track_type is Subtitle else {})
is_original_lang=language and is_close_match(track_lang, [language]),
descriptor=Video.Descriptor.DASH,
data={
"dash": {
"manifest": self.manifest,
"period": period,
"adaptation_set": adaptation_set,
"representation": rep
}
},
**track_args
))
# only get tracks from the first main-content period
@ -272,7 +229,398 @@ class DASH:
return tracks
@staticmethod
def get_language(*options: Any) -> Optional[Language]:
def download_track(
track: AnyTrack,
save_path: Path,
save_dir: Path,
progress: partial,
session: Optional[Session] = None,
proxy: Optional[str] = None,
max_workers: Optional[int] = None,
license_widevine: Optional[Callable] = None
):
if not session:
session = Session()
elif not isinstance(session, Session):
raise TypeError(f"Expected session to be a {Session}, not {session!r}")
if proxy:
session.proxies.update({
"all": proxy
})
log = logging.getLogger("DASH")
manifest: ElementTree = track.data["dash"]["manifest"]
period: Element = track.data["dash"]["period"]
adaptation_set: Element = track.data["dash"]["adaptation_set"]
representation: Element = track.data["dash"]["representation"]
track.drm = DASH.get_drm(
representation.findall("ContentProtection") +
adaptation_set.findall("ContentProtection")
)
manifest_base_url = manifest.findtext("BaseURL")
if not manifest_base_url:
manifest_base_url = track.url
elif not re.match("^https?://", manifest_base_url, re.IGNORECASE):
manifest_base_url = urljoin(track.url, f"./{manifest_base_url}")
period_base_url = urljoin(manifest_base_url, period.findtext("BaseURL"))
rep_base_url = urljoin(period_base_url, representation.findtext("BaseURL"))
period_duration = period.get("duration") or manifest.get("mediaPresentationDuration")
init_data: Optional[bytes] = None
segment_template = representation.find("SegmentTemplate")
if segment_template is None:
segment_template = adaptation_set.find("SegmentTemplate")
segment_list = representation.find("SegmentList")
if segment_list is None:
segment_list = adaptation_set.find("SegmentList")
segment_base = representation.find("SegmentBase")
if segment_base is None:
segment_base = adaptation_set.find("SegmentBase")
segments: list[tuple[str, Optional[str]]] = []
segment_timescale: float = 0
segment_durations: list[int] = []
track_kid: Optional[UUID] = None
if segment_template is not None:
segment_template = copy(segment_template)
start_number = int(segment_template.get("startNumber") or 1)
end_number = int(segment_template.get("endNumber") or 0) or None
segment_timeline = segment_template.find("SegmentTimeline")
segment_timescale = float(segment_template.get("timescale") or 1)
for item in ("initialization", "media"):
value = segment_template.get(item)
if not value:
continue
if not re.match("^https?://", value, re.IGNORECASE):
if not rep_base_url:
raise ValueError("Resolved Segment URL is not absolute, and no Base URL is available.")
value = urljoin(rep_base_url, value)
if not urlparse(value).query:
manifest_url_query = urlparse(track.url).query
if manifest_url_query:
value += f"?{manifest_url_query}"
segment_template.set(item, value)
init_url = segment_template.get("initialization")
if init_url:
res = session.get(DASH.replace_fields(
init_url,
Bandwidth=representation.get("bandwidth"),
RepresentationID=representation.get("id")
))
res.raise_for_status()
init_data = res.content
track_kid = track.get_key_id(init_data)
if segment_timeline is not None:
current_time = 0
for s in segment_timeline.findall("S"):
if s.get("t"):
current_time = int(s.get("t"))
for _ in range(1 + (int(s.get("r") or 0))):
segment_durations.append(current_time)
current_time += int(s.get("d"))
if not end_number:
end_number = len(segment_durations)
for t, n in zip(segment_durations, range(start_number, end_number + 1)):
segments.append((
DASH.replace_fields(
segment_template.get("media"),
Bandwidth=representation.get("bandwidth"),
Number=n,
RepresentationID=representation.get("id"),
Time=t
), None
))
else:
if not period_duration:
raise ValueError("Duration of the Period was unable to be determined.")
period_duration = DASH.pt_to_sec(period_duration)
segment_duration = float(segment_template.get("duration")) or 1
if not end_number:
end_number = math.ceil(period_duration / (segment_duration / segment_timescale))
for s in range(start_number, end_number + 1):
segments.append((
DASH.replace_fields(
segment_template.get("media"),
Bandwidth=representation.get("bandwidth"),
Number=s,
RepresentationID=representation.get("id"),
Time=s
), None
))
# TODO: Should we floor/ceil/round, or is int() ok?
segment_durations.append(int(segment_duration))
elif segment_list is not None:
segment_timescale = float(segment_list.get("timescale") or 1)
init_data = None
initialization = segment_list.find("Initialization")
if initialization is not None:
source_url = initialization.get("sourceURL")
if not source_url:
source_url = rep_base_url
elif not re.match("^https?://", source_url, re.IGNORECASE):
source_url = urljoin(rep_base_url, f"./{source_url}")
if initialization.get("range"):
init_range_header = {"Range": f"bytes={initialization.get('range')}"}
else:
init_range_header = None
res = session.get(url=source_url, headers=init_range_header)
res.raise_for_status()
init_data = res.content
track_kid = track.get_key_id(init_data)
segment_urls = segment_list.findall("SegmentURL")
for segment_url in segment_urls:
media_url = segment_url.get("media")
if not media_url:
media_url = rep_base_url
elif not re.match("^https?://", media_url, re.IGNORECASE):
media_url = urljoin(rep_base_url, f"./{media_url}")
segments.append((
media_url,
segment_url.get("mediaRange")
))
segment_durations.append(int(segment_url.get("duration") or 1))
elif segment_base is not None:
media_range = None
init_data = None
initialization = segment_base.find("Initialization")
if initialization is not None:
if initialization.get("range"):
init_range_header = {"Range": f"bytes={initialization.get('range')}"}
else:
init_range_header = None
res = session.get(url=rep_base_url, headers=init_range_header)
res.raise_for_status()
init_data = res.content
track_kid = track.get_key_id(init_data)
total_size = res.headers.get("Content-Range", "").split("/")[-1]
if total_size:
media_range = f"{len(init_data)}-{total_size}"
segments.append((
rep_base_url,
media_range
))
elif rep_base_url:
segments.append((
rep_base_url,
None
))
else:
log.error("Could not find a way to get segments from this MPD manifest.")
log.debug(track.url)
sys.exit(1)
# TODO: Should we floor/ceil/round, or is int() ok?
track.data["dash"]["timescale"] = int(segment_timescale)
track.data["dash"]["segment_durations"] = segment_durations
if not track.drm and isinstance(track, (Video, Audio)):
try:
track.drm = [Widevine.from_init_data(init_data)]
except Widevine.Exceptions.PSSHNotFound:
# it might not have Widevine DRM, or might not have found the PSSH
log.warning("No Widevine PSSH was found for this track, is it DRM free?")
if track.drm:
# last chance to find the KID, assumes first segment will hold the init data
track_kid = track_kid or track.get_key_id(url=segments[0][0], session=session)
# TODO: What if we don't want to use the first DRM system?
drm = track.drm[0]
if isinstance(drm, Widevine):
# license and grab content keys
try:
if not license_widevine:
raise ValueError("license_widevine func must be supplied to use Widevine DRM")
progress(downloaded="LICENSING")
license_widevine(drm, track_kid=track_kid)
progress(downloaded="[yellow]LICENSED")
except Exception: # noqa
DOWNLOAD_CANCELLED.set() # skip pending track downloads
progress(downloaded="[red]FAILED")
raise
else:
drm = None
if DOWNLOAD_LICENCE_ONLY.is_set():
progress(downloaded="[yellow]SKIPPED")
return
progress(total=len(segments))
downloader = track.downloader
if downloader.__name__ == "aria2c" and any(bytes_range is not None for url, bytes_range in segments):
# aria2(c) is shit and doesn't support the Range header, fallback to the requests downloader
downloader = requests_downloader
log.warning("Falling back to the requests downloader as aria2(c) doesn't support the Range header")
for status_update in downloader(
urls=[
{
"url": url,
"headers": {
"Range": f"bytes={bytes_range}"
} if bytes_range else {}
}
for url, bytes_range in segments
],
output_dir=save_dir,
filename="{i:0%d}.mp4" % (len(str(len(segments)))),
headers=session.headers,
cookies=session.cookies,
proxy=proxy,
max_workers=max_workers
):
file_downloaded = status_update.get("file_downloaded")
if file_downloaded:
events.emit(events.Types.SEGMENT_DOWNLOADED, track=track, segment=file_downloaded)
else:
downloaded = status_update.get("downloaded")
if downloaded and downloaded.endswith("/s"):
status_update["downloaded"] = f"DASH {downloaded}"
progress(**status_update)
# see https://github.com/devine-dl/devine/issues/71
for control_file in save_dir.glob("*.aria2__temp"):
control_file.unlink()
segments_to_merge = [
x
for x in sorted(save_dir.iterdir())
if x.is_file()
]
with open(save_path, "wb") as f:
if init_data:
f.write(init_data)
if len(segments_to_merge) > 1:
progress(downloaded="Merging", completed=0, total=len(segments_to_merge))
for segment_file in segments_to_merge:
segment_data = segment_file.read_bytes()
# TODO: fix encoding after decryption?
if (
not drm and isinstance(track, Subtitle) and
track.codec not in (Subtitle.Codec.fVTT, Subtitle.Codec.fTTML)
):
segment_data = try_ensure_utf8(segment_data)
segment_data = segment_data.decode("utf8"). \
replace("&lrm;", html.unescape("&lrm;")). \
replace("&rlm;", html.unescape("&rlm;")). \
encode("utf8")
f.write(segment_data)
f.flush()
segment_file.unlink()
progress(advance=1)
track.path = save_path
events.emit(events.Types.TRACK_DOWNLOADED, track=track)
if drm:
progress(downloaded="Decrypting", completed=0, total=100)
drm.decrypt(save_path)
track.drm = None
events.emit(
events.Types.TRACK_DECRYPTED,
track=track,
drm=drm,
segment=None
)
progress(downloaded="Decrypting", advance=100)
save_dir.rmdir()
progress(downloaded="Downloaded")
@staticmethod
def _get(
item: str,
adaptation_set: Element,
representation: Optional[Element] = None
) -> Optional[Any]:
"""Helper to get a requested item from the Representation, otherwise from the AdaptationSet."""
adaptation_set_item = adaptation_set.get(item)
if representation is None:
return adaptation_set_item
representation_item = representation.get(item)
if representation_item is not None:
return representation_item
return adaptation_set_item
@staticmethod
def _findall(
item: str,
adaptation_set: Element,
representation: Optional[Element] = None,
both: bool = False
) -> list[Any]:
"""
Helper to get all requested items from the Representation, otherwise from the AdaptationSet.
Optionally, you may pass both=True to keep both values (where available).
"""
adaptation_set_items = adaptation_set.findall(item)
if representation is None:
return adaptation_set_items
representation_items = representation.findall(item)
if both:
return representation_items + adaptation_set_items
if representation_items:
return representation_items
return adaptation_set_items
@staticmethod
def get_language(
adaptation_set: Element,
representation: Optional[Element] = None,
fallback: Optional[Union[str, Language]] = None
) -> Optional[Language]:
"""
Get Language (if any) from the AdaptationSet or Representation.
A fallback language may be provided if no language information could be
retrieved.
"""
options = []
if representation is not None:
options.append(representation.get("lang"))
# derive language from somewhat common id string format
# the format is typically "{rep_id}_{lang}={bitrate}" or similar
rep_id = representation.get("id")
if rep_id:
m = re.match(r"\w+_(\w+)=\d+", rep_id)
if m:
options.append(m.group(1))
options.append(adaptation_set.get("lang"))
if fallback:
options.append(fallback)
for option in options:
option = (str(option) or "").strip()
if not tag_is_valid(option) or option.startswith("und"):
@ -280,8 +628,92 @@ class DASH:
return Language.get(option)
@staticmethod
def get_drm(protections) -> Optional[list[Widevine]]:
def get_video_range(
codecs: str,
all_supplemental_props: list[Element],
all_essential_props: list[Element]
) -> Video.Range:
if codecs.startswith(("dva1", "dvav", "dvhe", "dvh1")):
return Video.Range.DV
return Video.Range.from_cicp(
primaries=next((
int(x.get("value"))
for x in all_supplemental_props + all_essential_props
if x.get("schemeIdUri") == "urn:mpeg:mpegB:cicp:ColourPrimaries"
), 0),
transfer=next((
int(x.get("value"))
for x in all_supplemental_props + all_essential_props
if x.get("schemeIdUri") == "urn:mpeg:mpegB:cicp:TransferCharacteristics"
), 0),
matrix=next((
int(x.get("value"))
for x in all_supplemental_props + all_essential_props
if x.get("schemeIdUri") == "urn:mpeg:mpegB:cicp:MatrixCoefficients"
), 0)
)
@staticmethod
def is_trick_mode(adaptation_set: Element) -> bool:
"""Check if contents of Adaptation Set is a Trick-Mode stream."""
essential_props = adaptation_set.findall("EssentialProperty")
supplemental_props = adaptation_set.findall("SupplementalProperty")
return any(
prop.get("schemeIdUri") == "http://dashif.org/guidelines/trickmode"
for prop in essential_props + supplemental_props
)
@staticmethod
def is_descriptive(adaptation_set: Element) -> bool:
"""Check if contents of Adaptation Set is Descriptive."""
return any(
(x.get("schemeIdUri"), x.get("value")) in (
("urn:mpeg:dash:role:2011", "descriptive"),
("urn:tva:metadata:cs:AudioPurposeCS:2007", "1")
)
for x in adaptation_set.findall("Accessibility")
)
@staticmethod
def is_forced(adaptation_set: Element) -> bool:
"""Check if contents of Adaptation Set is a Forced Subtitle."""
return any(
x.get("schemeIdUri") == "urn:mpeg:dash:role:2011"
and x.get("value") in ("forced-subtitle", "forced_subtitle")
for x in adaptation_set.findall("Role")
)
@staticmethod
def is_sdh(adaptation_set: Element) -> bool:
"""Check if contents of Adaptation Set is for the Hearing Impaired."""
return any(
(x.get("schemeIdUri"), x.get("value")) == ("urn:tva:metadata:cs:AudioPurposeCS:2007", "2")
for x in adaptation_set.findall("Accessibility")
)
@staticmethod
def is_closed_caption(adaptation_set: Element) -> bool:
"""Check if contents of Adaptation Set is a Closed Caption Subtitle."""
return any(
(x.get("schemeIdUri"), x.get("value")) == ("urn:mpeg:dash:role:2011", "caption")
for x in adaptation_set.findall("Role")
)
@staticmethod
def get_ddp_complexity_index(adaptation_set: Element, representation: Optional[Element]) -> Optional[int]:
"""Get the DD+ Complexity Index (if any) from the AdaptationSet or Representation."""
return next((
int(x.get("value"))
for x in DASH._findall("SupplementalProperty", adaptation_set, representation, both=True)
if x.get("schemeIdUri") == "tag:dolby.com,2018:dash:EC3_ExtensionComplexityIndex:2018"
), None)
@staticmethod
def get_drm(protections: list[Element]) -> list[Widevine]:
drm = []
for protection in protections:
# TODO: Add checks for PlayReady, FairPlay, maybe more
urn = (protection.get("schemeIdUri") or "").lower()
@ -314,9 +746,6 @@ class DASH:
kid=kid
))
if not drm:
drm = None
return drm
@staticmethod
@ -345,88 +774,5 @@ class DASH:
url = url.replace(m.group(), f"{value:{m.group(1)}}")
return url
@staticmethod
def get_segment_urls(
representation,
period_duration: str,
fallback_segment_template,
fallback_base_url: Optional[str] = None,
fallback_query: Optional[str] = None
) -> list[str]:
segment_urls: list[str] = []
segment_template = representation.find("SegmentTemplate") or fallback_segment_template
base_url = representation.findtext("BaseURL") or fallback_base_url
if segment_template is None:
# We could implement SegmentBase, but it's basically a list of Byte Range's to download
# So just return the Base URL as a segment, why give the downloader extra effort
return [urljoin(fallback_base_url, base_url)]
segment_template = copy(segment_template)
start_number = int(segment_template.get("startNumber") or 1)
segment_timeline = segment_template.find("SegmentTimeline")
for item in ("initialization", "media"):
value = segment_template.get(item)
if not value:
continue
if not re.match("^https?://", value, re.IGNORECASE):
if not base_url:
raise ValueError("Resolved Segment URL is not absolute, and no Base URL is available.")
value = urljoin(base_url, value)
if not urlparse(value).query and fallback_query:
value += f"?{fallback_query}"
segment_template.set(item, value)
initialization = segment_template.get("initialization")
if initialization:
segment_urls.append(DASH.replace_fields(
initialization,
Bandwidth=representation.get("bandwidth"),
RepresentationID=representation.get("id")
))
if segment_timeline is not None:
seg_time_list = []
current_time = 0
for s in segment_timeline.findall("S"):
if s.get("t"):
current_time = int(s.get("t"))
for _ in range(1 + (int(s.get("r") or 0))):
seg_time_list.append(current_time)
current_time += int(s.get("d"))
seg_num_list = list(range(start_number, len(seg_time_list) + start_number))
segment_urls += [
DASH.replace_fields(
segment_template.get("media"),
Bandwidth=representation.get("bandwidth"),
Number=n,
RepresentationID=representation.get("id"),
Time=t
)
for t, n in zip(seg_time_list, seg_num_list)
]
else:
if not period_duration:
raise ValueError("Duration of the Period was unable to be determined.")
period_duration = DASH.pt_to_sec(period_duration)
segment_duration = (
float(segment_template.get("duration")) / float(segment_template.get("timescale") or 1)
)
total_segments = math.ceil(period_duration / segment_duration)
segment_urls += [
DASH.replace_fields(
segment_template.get("media"),
Bandwidth=representation.get("bandwidth"),
Number=s,
RepresentationID=representation.get("id"),
Time=s
)
for s in range(start_number, start_number + total_segments)
]
return segment_urls
__ALL__ = (DASH,)
__all__ = ("DASH",)

View File

@ -1,20 +1,31 @@
from __future__ import annotations
import re
from hashlib import md5
from typing import Union, Any, Optional
import html
import logging
import shutil
import subprocess
import sys
from functools import partial
from pathlib import Path
from typing import Any, Callable, Optional, Union
from urllib.parse import urljoin
from zlib import crc32
import m3u8
import requests
from langcodes import Language
from langcodes import Language, tag_is_valid
from m3u8 import M3U8
from pywidevine.cdm import Cdm as WidevineCdm
from pywidevine.pssh import PSSH
from requests import Session
from devine.core.drm import ClearKey, Widevine, DRM_T
from devine.core.tracks import Tracks, Video, Audio, Subtitle
from devine.core.utilities import is_close_match
from devine.core import binaries
from devine.core.constants import DOWNLOAD_CANCELLED, DOWNLOAD_LICENCE_ONLY, AnyTrack
from devine.core.downloaders import requests as requests_downloader
from devine.core.drm import DRM_T, ClearKey, Widevine
from devine.core.events import events
from devine.core.tracks import Audio, Subtitle, Tracks, Video
from devine.core.utilities import get_extension, is_close_match, try_ensure_utf8
class HLS:
@ -68,122 +79,112 @@ class HLS:
return cls(master)
def to_tracks(self, language: Union[str, Language], **args: Any) -> Tracks:
def to_tracks(self, language: Union[str, Language]) -> Tracks:
"""
Convert a Variant Playlist M3U(8) document to Video, Audio and Subtitle Track objects.
Parameters:
language: Language you expect the Primary Track to be in.
args: You may pass any arbitrary named header to be passed to all requests made within
this method.
All Track objects' URL will be to another M3U(8) document. However, these documents
will be Invariant Playlists and contain the list of segments URIs among other metadata.
"""
session_drm = HLS.get_drm(self.manifest.session_keys)
session_drm = HLS.get_all_drm(self.manifest.session_keys)
audio_codecs_by_group_id: dict[str, Audio.Codec] = {}
tracks = Tracks()
for playlist in self.manifest.playlists:
url = playlist.uri
if not re.match("^https?://", url):
url = playlist.base_uri + url
audio_group = playlist.stream_info.audio
if audio_group:
audio_codec = Audio.Codec.from_codecs(playlist.stream_info.codecs)
audio_codecs_by_group_id[audio_group] = audio_codec
if session_drm:
drm = session_drm
else:
# keys may be in the invariant playlist instead, annoying...
res = self.session.get(url, **args)
if not res.ok:
raise requests.ConnectionError(
"Failed to request an invariant M3U(8) document.",
response=res
)
invariant_playlist = m3u8.loads(res.text, url)
drm = HLS.get_drm(invariant_playlist.keys)
try:
# TODO: Any better way to figure out the primary track type?
Video.Codec.from_codecs(playlist.stream_info.codecs)
if playlist.stream_info.codecs:
Video.Codec.from_codecs(playlist.stream_info.codecs)
except ValueError:
primary_track_type = Audio
else:
primary_track_type = Video
tracks.add(primary_track_type(
id_=md5(str(playlist).encode()).hexdigest()[0:7], # 7 chars only for filename length
url=url,
codec=primary_track_type.Codec.from_codecs(playlist.stream_info.codecs),
id_=hex(crc32(str(playlist).encode()))[2:],
url=urljoin(playlist.base_uri, playlist.uri),
codec=(
primary_track_type.Codec.from_codecs(playlist.stream_info.codecs)
if playlist.stream_info.codecs else None
),
language=language, # HLS manifests do not seem to have language info
is_original_lang=True, # TODO: All we can do is assume Yes
bitrate=playlist.stream_info.average_bandwidth or playlist.stream_info.bandwidth,
descriptor=Video.Descriptor.M3U,
drm=drm,
extra=playlist,
descriptor=Video.Descriptor.HLS,
drm=session_drm,
data={
"hls": {
"playlist": playlist
}
},
# video track args
**(dict(
range_=Video.Range.DV if any(
codec.split(".")[0] in ("dva1", "dvav", "dvhe", "dvh1")
for codec in playlist.stream_info.codecs.lower().split(",")
for codec in (playlist.stream_info.codecs or "").lower().split(",")
) else Video.Range.from_m3u_range_tag(playlist.stream_info.video_range),
width=playlist.stream_info.resolution[0],
height=playlist.stream_info.resolution[1],
width=playlist.stream_info.resolution[0] if playlist.stream_info.resolution else None,
height=playlist.stream_info.resolution[1] if playlist.stream_info.resolution else None,
fps=playlist.stream_info.frame_rate
) if primary_track_type is Video else {})
))
for media in self.manifest.media:
url = media.uri
if not url:
if not media.uri:
continue
if not re.match("^https?://", url):
url = media.base_uri + url
if media.type == "AUDIO":
if session_drm:
drm = session_drm
else:
# keys may be in the invariant playlist instead, annoying...
res = self.session.get(url, **args)
if not res.ok:
raise requests.ConnectionError(
"Failed to request an invariant M3U(8) document.",
response=res
)
invariant_playlist = m3u8.loads(res.text, url)
drm = HLS.get_drm(invariant_playlist.keys)
else:
drm = None
joc = 0
if media.type == "AUDIO":
track_type = Audio
codec = audio_codecs_by_group_id.get(media.group_id)
if media.channels and media.channels.endswith("/JOC"):
joc = int(media.channels.split("/JOC")[0])
media.channels = "5.1"
else:
track_type = Subtitle
codec = Subtitle.Codec.WebVTT # assuming WebVTT, codec info isn't shown
track_lang = next((
Language.get(option)
for x in (media.language, language)
for option in [(str(x) or "").strip()]
if tag_is_valid(option) and not option.startswith("und")
), None)
if not track_lang:
msg = "Language information could not be derived for a media."
if language is None:
msg += " No fallback language was provided when calling HLS.to_tracks()."
elif not tag_is_valid((str(language) or "").strip()) or str(language).startswith("und"):
msg += f" The fallback language provided is also invalid: {language}"
raise ValueError(msg)
tracks.add(track_type(
id_=md5(str(media).encode()).hexdigest()[0:6], # 6 chars only for filename length
url=url,
id_=hex(crc32(str(media).encode()))[2:],
url=urljoin(media.base_uri, media.uri),
codec=codec,
language=media.language or language, # HLS media may not have language info, fallback if needed
is_original_lang=language and is_close_match(media.language, [language]),
descriptor=Audio.Descriptor.M3U,
drm=drm,
extra=media,
language=track_lang, # HLS media may not have language info, fallback if needed
is_original_lang=language and is_close_match(track_lang, [language]),
descriptor=Audio.Descriptor.HLS,
drm=session_drm if media.type == "AUDIO" else None,
data={
"hls": {
"media": media
}
},
# audio track args
**(dict(
bitrate=0, # TODO: M3U doesn't seem to state bitrate?
channels=media.channels,
joc=joc,
descriptive="public.accessibility.describes-video" in (media.characteristics or ""),
) if track_type is Audio else dict(
forced=media.forced == "YES",
@ -194,24 +195,531 @@ class HLS:
return tracks
@staticmethod
def get_drm(keys: list[Union[m3u8.model.SessionKey, m3u8.model.Key]]) -> list[DRM_T]:
drm = []
def download_track(
track: AnyTrack,
save_path: Path,
save_dir: Path,
progress: partial,
session: Optional[Session] = None,
proxy: Optional[str] = None,
max_workers: Optional[int] = None,
license_widevine: Optional[Callable] = None
) -> None:
if not session:
session = Session()
elif not isinstance(session, Session):
raise TypeError(f"Expected session to be a {Session}, not {session!r}")
if proxy:
session.proxies.update({
"all": proxy
})
log = logging.getLogger("HLS")
master = m3u8.loads(
# should be an invariant m3u8 playlist URI
session.get(track.url).text,
uri=track.url
)
if not master.segments:
log.error("Track's HLS playlist has no segments, expecting an invariant M3U8 playlist.")
sys.exit(1)
if track.drm:
# TODO: What if we don't want to use the first DRM system?
session_drm = track.drm[0]
if isinstance(session_drm, Widevine):
# license and grab content keys
try:
if not license_widevine:
raise ValueError("license_widevine func must be supplied to use Widevine DRM")
progress(downloaded="LICENSING")
license_widevine(session_drm)
progress(downloaded="[yellow]LICENSED")
except Exception: # noqa
DOWNLOAD_CANCELLED.set() # skip pending track downloads
progress(downloaded="[red]FAILED")
raise
else:
session_drm = None
unwanted_segments = [
segment for segment in master.segments
if callable(track.OnSegmentFilter) and track.OnSegmentFilter(segment)
]
total_segments = len(master.segments) - len(unwanted_segments)
progress(total=total_segments)
downloader = track.downloader
if (
downloader.__name__ == "aria2c" and
any(x.byterange for x in master.segments if x not in unwanted_segments)
):
downloader = requests_downloader
log.warning("Falling back to the requests downloader as aria2(c) doesn't support the Range header")
urls: list[dict[str, Any]] = []
segment_durations: list[int] = []
range_offset = 0
for segment in master.segments:
if segment in unwanted_segments:
continue
segment_durations.append(int(segment.duration))
if segment.byterange:
byte_range = HLS.calculate_byte_range(segment.byterange, range_offset)
range_offset = byte_range.split("-")[0]
else:
byte_range = None
urls.append({
"url": urljoin(segment.base_uri, segment.uri),
"headers": {
"Range": f"bytes={byte_range}"
} if byte_range else {}
})
track.data["hls"]["segment_durations"] = segment_durations
segment_save_dir = save_dir / "segments"
for status_update in downloader(
urls=urls,
output_dir=segment_save_dir,
filename="{i:0%d}{ext}" % len(str(len(urls))),
headers=session.headers,
cookies=session.cookies,
proxy=proxy,
max_workers=max_workers
):
file_downloaded = status_update.get("file_downloaded")
if file_downloaded:
events.emit(events.Types.SEGMENT_DOWNLOADED, track=track, segment=file_downloaded)
else:
downloaded = status_update.get("downloaded")
if downloaded and downloaded.endswith("/s"):
status_update["downloaded"] = f"HLS {downloaded}"
progress(**status_update)
# see https://github.com/devine-dl/devine/issues/71
for control_file in segment_save_dir.glob("*.aria2__temp"):
control_file.unlink()
progress(total=total_segments, completed=0, downloaded="Merging")
name_len = len(str(total_segments))
discon_i = 0
range_offset = 0
map_data: Optional[tuple[m3u8.model.InitializationSection, bytes]] = None
if session_drm:
encryption_data: Optional[tuple[Optional[m3u8.Key], DRM_T]] = (None, session_drm)
else:
encryption_data: Optional[tuple[Optional[m3u8.Key], DRM_T]] = None
i = -1
for real_i, segment in enumerate(master.segments):
if segment not in unwanted_segments:
i += 1
is_last_segment = (real_i + 1) == len(master.segments)
def merge(to: Path, via: list[Path], delete: bool = False, include_map_data: bool = False):
"""
Merge all files to a given path, optionally including map data.
Parameters:
to: The output file with all merged data.
via: List of files to merge, in sequence.
delete: Delete the file once it's been merged.
include_map_data: Whether to include the init map data.
"""
with open(to, "wb") as x:
if include_map_data and map_data and map_data[1]:
x.write(map_data[1])
for file in via:
x.write(file.read_bytes())
x.flush()
if delete:
file.unlink()
def decrypt(include_this_segment: bool) -> Path:
"""
Decrypt all segments that use the currently set DRM.
All segments that will be decrypted with this DRM will be merged together
in sequence, prefixed with the init data (if any), and then deleted. Once
merged they will be decrypted. The merged and decrypted file names state
the range of segments that were used.
Parameters:
include_this_segment: Whether to include the current segment in the
list of segments to merge and decrypt. This should be False if
decrypting on EXT-X-KEY changes, or True when decrypting on the
last segment.
Returns the decrypted path.
"""
drm = encryption_data[1]
first_segment_i = next(
int(file.stem)
for file in sorted(segment_save_dir.iterdir())
if file.stem.isdigit()
)
last_segment_i = max(0, i - int(not include_this_segment))
range_len = (last_segment_i - first_segment_i) + 1
segment_range = f"{str(first_segment_i).zfill(name_len)}-{str(last_segment_i).zfill(name_len)}"
merged_path = segment_save_dir / f"{segment_range}{get_extension(master.segments[last_segment_i].uri)}"
decrypted_path = segment_save_dir / f"{merged_path.stem}_decrypted{merged_path.suffix}"
files = [
file
for file in sorted(segment_save_dir.iterdir())
if file.stem.isdigit() and first_segment_i <= int(file.stem) <= last_segment_i
]
if not files:
raise ValueError(f"None of the segment files for {segment_range} exist...")
elif len(files) != range_len:
raise ValueError(f"Missing {range_len - len(files)} segment files for {segment_range}...")
if isinstance(drm, Widevine):
# with widevine we can merge all segments and decrypt once
merge(
to=merged_path,
via=files,
delete=True,
include_map_data=True
)
drm.decrypt(merged_path)
merged_path.rename(decrypted_path)
else:
# with other drm we must decrypt separately and then merge them
# for aes this is because each segment likely has 16-byte padding
for file in files:
drm.decrypt(file)
merge(
to=decrypted_path,
via=files,
delete=True,
include_map_data=True
)
events.emit(
events.Types.TRACK_DECRYPTED,
track=track,
drm=drm,
segment=decrypted_path
)
return decrypted_path
def merge_discontinuity(include_this_segment: bool, include_map_data: bool = True):
"""
Merge all segments of the discontinuity.
All segment files for this discontinuity must already be downloaded and
already decrypted (if it needs to be decrypted).
Parameters:
include_this_segment: Whether to include the current segment in the
list of segments to merge and decrypt. This should be False if
decrypting on EXT-X-KEY changes, or True when decrypting on the
last segment.
include_map_data: Whether to prepend the init map data before the
segment files when merging.
"""
last_segment_i = max(0, i - int(not include_this_segment))
files = [
file
for file in sorted(segment_save_dir.iterdir())
if int(file.stem.replace("_decrypted", "").split("-")[-1]) <= last_segment_i
]
if files:
to_dir = segment_save_dir.parent
to_path = to_dir / f"{str(discon_i).zfill(name_len)}{files[-1].suffix}"
merge(
to=to_path,
via=files,
delete=True,
include_map_data=include_map_data
)
if segment not in unwanted_segments:
if isinstance(track, Subtitle):
segment_file_ext = get_extension(segment.uri)
segment_file_path = segment_save_dir / f"{str(i).zfill(name_len)}{segment_file_ext}"
segment_data = try_ensure_utf8(segment_file_path.read_bytes())
if track.codec not in (Subtitle.Codec.fVTT, Subtitle.Codec.fTTML):
segment_data = segment_data.decode("utf8"). \
replace("&lrm;", html.unescape("&lrm;")). \
replace("&rlm;", html.unescape("&rlm;")). \
encode("utf8")
segment_file_path.write_bytes(segment_data)
if segment.discontinuity and i != 0:
if encryption_data:
decrypt(include_this_segment=False)
merge_discontinuity(
include_this_segment=False,
include_map_data=not encryption_data or not encryption_data[1]
)
discon_i += 1
range_offset = 0 # TODO: Should this be reset or not?
map_data = None
if encryption_data:
encryption_data = (encryption_data[0], encryption_data[1])
if segment.init_section and (not map_data or segment.init_section != map_data[0]):
if segment.init_section.byterange:
init_byte_range = HLS.calculate_byte_range(
segment.init_section.byterange,
range_offset
)
range_offset = init_byte_range.split("-")[0]
init_range_header = {
"Range": f"bytes={init_byte_range}"
}
else:
init_range_header = {}
res = session.get(
url=urljoin(segment.init_section.base_uri, segment.init_section.uri),
headers=init_range_header
)
res.raise_for_status()
map_data = (segment.init_section, res.content)
if segment.keys:
key = HLS.get_supported_key(segment.keys)
if encryption_data and encryption_data[0] != key and i != 0 and segment not in unwanted_segments:
decrypt(include_this_segment=False)
if key is None:
encryption_data = None
elif not encryption_data or encryption_data[0] != key:
drm = HLS.get_drm(key, session)
if isinstance(drm, Widevine):
try:
if map_data:
track_kid = track.get_key_id(map_data[1])
else:
track_kid = None
progress(downloaded="LICENSING")
license_widevine(drm, track_kid=track_kid)
progress(downloaded="[yellow]LICENSED")
except Exception: # noqa
DOWNLOAD_CANCELLED.set() # skip pending track downloads
progress(downloaded="[red]FAILED")
raise
encryption_data = (key, drm)
# TODO: This won't work as we've already downloaded
if DOWNLOAD_LICENCE_ONLY.is_set():
continue
if is_last_segment:
# required as it won't end with EXT-X-DISCONTINUITY nor a new key
if encryption_data:
decrypt(include_this_segment=True)
merge_discontinuity(
include_this_segment=True,
include_map_data=not encryption_data or not encryption_data[1]
)
progress(advance=1)
# TODO: Again, still won't work, we've already downloaded
if DOWNLOAD_LICENCE_ONLY.is_set():
return
segment_save_dir.rmdir()
# finally merge all the discontinuity save files together to the final path
segments_to_merge = [
x
for x in sorted(save_dir.iterdir())
if x.is_file()
]
if len(segments_to_merge) == 1:
shutil.move(segments_to_merge[0], save_path)
else:
progress(downloaded="Merging")
if isinstance(track, (Video, Audio)):
HLS.merge_segments(
segments=segments_to_merge,
save_path=save_path
)
else:
with open(save_path, "wb") as f:
for discontinuity_file in segments_to_merge:
discontinuity_data = discontinuity_file.read_bytes()
f.write(discontinuity_data)
f.flush()
discontinuity_file.unlink()
save_dir.rmdir()
progress(downloaded="Downloaded")
track.path = save_path
events.emit(events.Types.TRACK_DOWNLOADED, track=track)
@staticmethod
def merge_segments(segments: list[Path], save_path: Path) -> int:
"""
Concatenate Segments by first demuxing with FFmpeg.
Returns the file size of the merged file.
"""
if not binaries.FFMPEG:
raise EnvironmentError("FFmpeg executable was not found but is required to merge HLS segments.")
demuxer_file = segments[0].parent / "ffmpeg_concat_demuxer.txt"
demuxer_file.write_text("\n".join([
f"file '{segment}'"
for segment in segments
]))
subprocess.check_call([
binaries.FFMPEG, "-hide_banner",
"-loglevel", "panic",
"-f", "concat",
"-safe", "0",
"-i", demuxer_file,
"-map", "0",
"-c", "copy",
save_path
])
demuxer_file.unlink()
for segment in segments:
segment.unlink()
return save_path.stat().st_size
@staticmethod
def get_supported_key(keys: list[Union[m3u8.model.SessionKey, m3u8.model.Key]]) -> Optional[m3u8.Key]:
"""
Get a supported Key System from a list of Key Systems.
Note that the key systems are chosen in an opinionated order.
Returns None if one of the key systems is method=NONE, which means all segments
from then on should be treated as plain text until another key system is
encountered, unless it's also method=NONE.
Raises NotImplementedError if none of the key systems are supported.
"""
if any(key.method == "NONE" for key in keys):
return None
unsupported_systems = []
for key in keys:
if not key:
continue
# TODO: Add checks for Merlin, FairPlay, PlayReady, maybe more.
if key.method.startswith("AES"):
drm.append(ClearKey.from_m3u_key(key))
# TODO: Add a way to specify which supported key system to use
# TODO: Add support for 'SAMPLE-AES', 'AES-CTR', 'AES-CBC', 'ClearKey'
elif key.method == "AES-128":
return key
elif key.method == "ISO-23001-7":
drm.append(Widevine(PSSH.new(key_ids=[key.uri.split(",")[-1]], system_id=PSSH.SystemId.Widevine)))
return key
elif key.keyformat and key.keyformat.lower() == WidevineCdm.urn:
drm.append(Widevine(
pssh=PSSH(key.uri.split(",")[-1]),
**key._extra_params # noqa
))
return key
else:
unsupported_systems.append(key.method + (f" ({key.keyformat})" if key.keyformat else ""))
else:
raise NotImplementedError(f"None of the key systems are supported: {', '.join(unsupported_systems)}")
@staticmethod
def get_drm(
key: Union[m3u8.model.SessionKey, m3u8.model.Key],
session: Optional[requests.Session] = None
) -> DRM_T:
"""
Convert HLS EXT-X-KEY data to an initialized DRM object.
Parameters:
key: m3u8 key system (EXT-X-KEY) object.
session: Optional session used to request AES-128 URIs.
Useful to set headers, proxies, cookies, and so forth.
Raises a NotImplementedError if the key system is not supported.
"""
if not isinstance(session, (Session, type(None))):
raise TypeError(f"Expected session to be a {Session}, not {type(session)}")
if not session:
session = Session()
# TODO: Add support for 'SAMPLE-AES', 'AES-CTR', 'AES-CBC', 'ClearKey'
if key.method == "AES-128":
drm = ClearKey.from_m3u_key(key, session)
elif key.method == "ISO-23001-7":
drm = Widevine(
pssh=PSSH.new(
key_ids=[key.uri.split(",")[-1]],
system_id=PSSH.SystemId.Widevine
)
)
elif key.keyformat and key.keyformat.lower() == WidevineCdm.urn:
drm = Widevine(
pssh=PSSH(key.uri.split(",")[-1]),
**key._extra_params # noqa
)
else:
raise NotImplementedError(f"The key system is not supported: {key}")
return drm
@staticmethod
def get_all_drm(
keys: list[Union[m3u8.model.SessionKey, m3u8.model.Key]],
proxy: Optional[str] = None
) -> list[DRM_T]:
"""
Convert HLS EXT-X-KEY data to initialized DRM objects.
__ALL__ = (HLS,)
Parameters:
keys: m3u8 key system (EXT-X-KEY) objects.
proxy: Optional proxy string used for requesting AES-128 URIs.
Raises a NotImplementedError if none of the key systems are supported.
"""
unsupported_keys: list[m3u8.Key] = []
drm_objects: list[DRM_T] = []
if any(key.method == "NONE" for key in keys):
return []
for key in keys:
try:
drm = HLS.get_drm(key, proxy)
drm_objects.append(drm)
except NotImplementedError:
unsupported_keys.append(key)
if not drm_objects and unsupported_keys:
raise NotImplementedError(f"None of the key systems are supported: {unsupported_keys}")
return drm_objects
@staticmethod
def calculate_byte_range(m3u_range: str, fallback_offset: int = 0) -> str:
"""
Convert an HLS EXT-X-BYTERANGE value to a more traditional range value.
E.g., '1433@0' -> '0-1432', '357392@1433' -> '1433-358824'.
"""
parts = [int(x) for x in m3u_range.split("@")]
if len(parts) != 2:
parts.append(fallback_offset)
length, offset = parts
return f"{offset}-{offset + length - 1}"
__all__ = ("HLS",)

View File

@ -1,3 +1,5 @@
from .basic import Basic
from .hola import Hola
from .nordvpn import NordVPN
__all__ = ("Basic", "Hola", "NordVPN")

View File

@ -1,13 +1,20 @@
import random
from typing import Optional
import re
from typing import Optional, Union
from requests.utils import prepend_scheme_if_needed
from urllib3.util import parse_url
from devine.core.proxies.proxy import Proxy
class Basic(Proxy):
def __init__(self, **countries):
def __init__(self, **countries: dict[str, Union[str, list[str]]]):
"""Basic Proxy Service using Proxies specified in the config."""
self.countries = countries
self.countries = {
k.lower(): v
for k, v in countries.items()
}
def __repr__(self) -> str:
countries = len(self.countries)
@ -17,14 +24,35 @@ class Basic(Proxy):
def get_proxy(self, query: str) -> Optional[str]:
"""Get a proxy URI from the config."""
servers = self.countries.get(query)
query = query.lower()
match = re.match(r"^([a-z]{2})(\d+)?$", query, re.IGNORECASE)
if not match:
raise ValueError(f"The query \"{query}\" was not recognized...")
country_code = match.group(1)
entry = match.group(2)
servers: Optional[Union[str, list[str]]] = self.countries.get(country_code)
if not servers:
return
return None
proxy = random.choice(servers)
if isinstance(servers, str):
proxy = servers
elif entry:
try:
proxy = servers[int(entry) - 1]
except IndexError:
raise ValueError(
f"There's only {len(servers)} prox{'y' if len(servers) == 1 else 'ies'} "
f"for \"{country_code}\"..."
)
else:
proxy = random.choice(servers)
if "://" not in proxy:
# TODO: Improve the test for a valid URI
proxy = prepend_scheme_if_needed(proxy, "http")
parsed_proxy = parse_url(proxy)
if not parsed_proxy.host:
raise ValueError(f"The proxy '{proxy}' is not a valid proxy URI supported by Python-Requests.")
return proxy
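To make the query handling above concrete, a small hypothetical usage sketch; the proxy URIs and the import path devine.core.proxies.basic are assumptions for illustration only:
from devine.core.proxies.basic import Basic  # assumed module path

proxies = Basic(
    us=["http://us1.proxy.example:8080", "http://us2.proxy.example:8080"],
    de="socks5://de.proxy.example:1080"
)
proxies.get_proxy("us")   # random choice between the two US proxies
proxies.get_proxy("US2")  # case-insensitive; explicitly the second US proxy
proxies.get_proxy("de")   # a single string entry is returned directly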

View File

@ -3,8 +3,8 @@ import re
import subprocess
from typing import Optional
from devine.core import binaries
from devine.core.proxies.proxy import Proxy
from devine.core.utilities import get_binary_path
class Hola(Proxy):
@ -13,7 +13,7 @@ class Hola(Proxy):
Proxy Service using Hola's direct connections via the hola-proxy project.
https://github.com/Snawoot/hola-proxy
"""
self.binary = get_binary_path("hola-proxy")
self.binary = binaries.HolaProxy
if not self.binary:
raise EnvironmentError("hola-proxy executable not found but is required for the Hola proxy provider.")

View File

@ -0,0 +1,44 @@
from typing import Optional, Union
class SearchResult:
def __init__(
self,
id_: Union[str, int],
title: str,
description: Optional[str] = None,
label: Optional[str] = None,
url: Optional[str] = None
):
"""
A Search Result for any supported Title Type.
Parameters:
id_: The search result's Title ID.
title: The primary display text, e.g., the Title's Name.
description: The secondary display text, e.g., the Title's Description or
further title information.
label: The tertiary display text. This will typically be used to display
an informative label or tag to the result. E.g., "unavailable", the
title's price tag, region, etc.
url: A hyperlink to the search result or title's page.
"""
if not isinstance(id_, (str, int)):
raise TypeError(f"Expected id_ to be a {str} or {int}, not {type(id_)}")
if not isinstance(title, str):
raise TypeError(f"Expected title to be a {str}, not {type(title)}")
if not isinstance(description, (str, type(None))):
raise TypeError(f"Expected description to be a {str}, not {type(description)}")
if not isinstance(label, (str, type(None))):
raise TypeError(f"Expected label to be a {str}, not {type(label)}")
if not isinstance(url, (str, type(None))):
raise TypeError(f"Expected url to be a {str}, not {type(url)}")
self.id = id_
self.title = title
self.description = description
self.label = label
self.url = url
__all__ = ("SearchResult",)

View File

@ -1,23 +1,29 @@
from __future__ import annotations
import base64
import logging
from abc import ABCMeta, abstractmethod
from http.cookiejar import MozillaCookieJar, CookieJar
from collections.abc import Generator
from http.cookiejar import CookieJar
from pathlib import Path
from typing import Optional, Union
from urllib.parse import urlparse
import click
import m3u8
import requests
from requests.adapters import Retry, HTTPAdapter
from requests.adapters import HTTPAdapter, Retry
from rich.padding import Padding
from rich.rule import Rule
from devine.core.config import config
from devine.core.constants import AnyTrack
from devine.core.titles import Titles_T, Title_T
from devine.core.tracks import Chapter, Tracks
from devine.core.utilities import get_ip_info
from devine.core.cacher import Cacher
from devine.core.config import config
from devine.core.console import console
from devine.core.constants import AnyTrack
from devine.core.credential import Credential
from devine.core.drm import DRM_T
from devine.core.search_result import SearchResult
from devine.core.titles import Title_T, Titles_T
from devine.core.tracks import Chapters, Tracks
from devine.core.utilities import get_ip_info
class Service(metaclass=ABCMeta):
@ -28,44 +34,60 @@ class Service(metaclass=ABCMeta):
GEOFENCE: tuple[str, ...] = () # list of ip regions required to use the service. empty list == no specific region.
def __init__(self, ctx: click.Context):
console.print(Padding(
Rule(f"[rule.text]Service: {self.__class__.__name__}"),
(1, 2)
))
self.config = ctx.obj.config
assert ctx.parent is not None
assert ctx.parent.parent is not None
self.log = logging.getLogger(self.__class__.__name__)
self.session = self.get_session()
self.cache = Cacher(self.__class__.__name__)
self.proxy = ctx.parent.params["proxy"]
if not self.proxy and self.GEOFENCE:
# no explicit proxy, let's get one to GEOFENCE if needed
current_region = get_ip_info(self.session)["country"].lower()
if not any([x.lower() == current_region for x in self.GEOFENCE]):
requested_proxy = self.GEOFENCE[0] # first is likely main region
self.log.info(f"Current IP region is blocked by the service, getting Proxy to {requested_proxy}")
# current region is not in any of the service's supported regions
for proxy_provider in ctx.obj.proxy_providers:
self.proxy = proxy_provider.get_proxy(requested_proxy)
if self.proxy:
self.log.info(f" + {self.proxy} (from {proxy_provider.__class__.__name__})")
break
if self.proxy:
self.session.proxies.update({"all": self.proxy})
proxy_parse = urlparse(self.proxy)
if proxy_parse.username and proxy_parse.password:
self.session.headers.update({
"Proxy-Authorization": base64.b64encode(
f"{proxy_parse.username}:{proxy_parse.password}".encode("utf8")
).decode()
})
if not ctx.parent or not ctx.parent.params.get("no_proxy"):
if ctx.parent:
proxy = ctx.parent.params["proxy"]
else:
proxy = None
if not proxy:
# don't override the explicit proxy set by the user, even if they may be geoblocked
with console.status("Checking if current region is Geoblocked...", spinner="dots"):
if self.GEOFENCE:
# no explicit proxy, let's get one to GEOFENCE if needed
current_region = get_ip_info(self.session)["country"].lower()
if any(x.lower() == current_region for x in self.GEOFENCE):
self.log.info("Service is not Geoblocked in your region")
else:
requested_proxy = self.GEOFENCE[0] # first is likely main region
self.log.info(f"Service is Geoblocked in your region, getting a Proxy to {requested_proxy}")
for proxy_provider in ctx.obj.proxy_providers:
proxy = proxy_provider.get_proxy(requested_proxy)
if proxy:
self.log.info(f"Got Proxy from {proxy_provider.__class__.__name__}")
break
else:
self.log.info("Service has no Geofence")
if proxy:
self.session.proxies.update({"all": proxy})
proxy_parse = urlparse(proxy)
if proxy_parse.username and proxy_parse.password:
self.session.headers.update({
"Proxy-Authorization": base64.b64encode(
f"{proxy_parse.username}:{proxy_parse.password}".encode("utf8")
).decode()
})
# Optional Abstract functions
# The following functions may be implemented by the Service.
# Otherwise, the base service code (if any) of the function will be executed on call.
# The functions will be executed in shown order.
def get_session(self) -> requests.Session:
@staticmethod
def get_session() -> requests.Session:
"""
Creates a Python-requests Session, adds common headers
from config, cookies, retry handler, and a proxy if available.
@ -78,12 +100,13 @@ class Service(metaclass=ABCMeta):
total=15,
backoff_factor=0.2,
status_forcelist=[429, 500, 502, 503, 504]
)
),
pool_block=True
))
session.mount("http://", session.adapters["https://"])
return session
def authenticate(self, cookies: Optional[MozillaCookieJar] = None, credential: Optional[Credential] = None) -> None:
def authenticate(self, cookies: Optional[CookieJar] = None, credential: Optional[Credential] = None) -> None:
"""
Authenticate the Service with Cookies and/or Credentials (Email/Username and Password).
@ -99,10 +122,22 @@ class Service(metaclass=ABCMeta):
"""
if cookies is not None:
if not isinstance(cookies, CookieJar):
raise TypeError(f"Expected cookies to be a {MozillaCookieJar}, not {cookies!r}.")
raise TypeError(f"Expected cookies to be a {CookieJar}, not {cookies!r}.")
self.session.cookies.update(cookies)
def get_widevine_service_certificate(self, *, challenge: bytes, title: Title_T, track: AnyTrack) -> Union[bytes, str]:
def search(self) -> Generator[SearchResult, None, None]:
"""
Search by query for titles from the Service.
The query must be taken as a CLI argument by the Service class.
Ideally just re-use the title ID argument (i.e. self.title).
Search results will be displayed in the order yielded.
"""
raise NotImplementedError(f"Search functionality has not been implemented by {self.__class__.__name__}")
def get_widevine_service_certificate(self, *, challenge: bytes, title: Title_T, track: AnyTrack) \
-> Union[bytes, str]:
"""
Get the Widevine Service Certificate used for Privacy Mode.
@ -185,25 +220,71 @@ class Service(metaclass=ABCMeta):
"""
@abstractmethod
def get_chapters(self, title: Title_T) -> list[Chapter]:
def get_chapters(self, title: Title_T) -> Chapters:
"""
Get Chapter objects of the Title.
Get Chapters for the Title.
Return a list of Chapter objects. This will be run after get_tracks. If there's anything
from get_tracks that may be needed, e.g. "device_id" or the like, store it on the class
via `self` and re-use the value in get_chapters.
Parameters:
title: The current Title from `get_titles` that is being processed.
How it's used is generally the same as get_titles. These are only separated as to reduce
function complexity and keep them focused on simple tasks.
You must return a Chapters object containing 0 or more Chapter objects.
You do not need to sort or order the chapters in any way. However, you do need to filter
and alter them as needed by the service. No modification is made after get_chapters is
run, so make sure the Chapter objects returned have consistent Chapter Titles
and Chapter Numbers.
You do not need to set a Chapter number or sort/order the chapters in any way as
the Chapters class automatically handles all of that for you. If there's no
descriptive name for a Chapter then do not set a name at all.
:param title: The current `Title` from get_titles that is being executed.
:return: List of Chapter objects, if available, empty list otherwise.
You must not set Chapter names to "Chapter {n}" or the like. If you (or the user)
want "Chapter {n}"-style Chapter names (or similar) then they can use the config
option `chapter_fallback_name`. For example, `"Chapter {i:02}"` for "Chapter 01".
"""
# Optional Event methods
__ALL__ = (Service,)
def on_segment_downloaded(self, track: AnyTrack, segment: Path) -> None:
"""
Called when one of a Track's Segments has finished downloading.
Parameters:
track: The Track object that had a Segment downloaded.
segment: The Path to the Segment that was downloaded.
"""
def on_track_downloaded(self, track: AnyTrack) -> None:
"""
Called when a Track has finished downloading.
Parameters:
track: The Track object that was downloaded.
"""
def on_track_decrypted(self, track: AnyTrack, drm: DRM_T, segment: Optional[m3u8.Segment] = None) -> None:
"""
Called when a Track has finished decrypting.
Parameters:
track: The Track object that was decrypted.
drm: The DRM object it decrypted with.
segment: The HLS segment information that was decrypted.
"""
def on_track_repacked(self, track: AnyTrack) -> None:
"""
Called when a Track has finished repacking.
Parameters:
track: The Track object that was repacked.
"""
def on_track_multiplex(self, track: AnyTrack) -> None:
"""
Called when a Track is about to be Multiplexed into a Container.
Note: Right now only MKV containers are multiplexed but in the future
this may also be called when multiplexing to other containers like
MP4 via ffmpeg/mp4box.
Parameters:
track: The Track object that is about to be multiplexed.
"""
__all__ = ("Service",)

View File

@ -1,5 +1,3 @@
from __future__ import annotations
from pathlib import Path
import click
@ -39,7 +37,15 @@ class Services(click.MultiCommand):
def get_command(self, ctx: click.Context, name: str) -> click.Command:
"""Load the Service and return the Click CLI method."""
tag = Services.get_tag(name)
service = Services.load(tag)
try:
service = Services.load(tag)
except KeyError as e:
available_services = self.list_commands(ctx)
if not available_services:
raise click.ClickException(
f"There are no Services added yet, therefore the '{name}' Service could not be found."
)
raise click.ClickException(f"{e}. Available Services: {', '.join(available_services)}")
if hasattr(service, "cli"):
return service.cli
@ -60,7 +66,7 @@ class Services(click.MultiCommand):
for service in _SERVICES:
if service.parent.stem == tag:
return service.parent
raise click.ClickException(f"Unable to find service by the name '{name}'")
raise KeyError(f"There is no Service added by the Tag '{name}'")
@staticmethod
def get_tag(value: str) -> str:
@ -82,8 +88,8 @@ class Services(click.MultiCommand):
"""Load a Service module by Service tag."""
module = _MODULES.get(tag)
if not module:
raise click.ClickException(f"Unable to find Service by the tag '{tag}'")
raise KeyError(f"There is no Service added by the Tag '{tag}'")
return module
__ALL__ = (Services,)
__all__ = ("Services",)

View File

@ -2,8 +2,10 @@ from typing import Union
from .episode import Episode, Series
from .movie import Movie, Movies
from .song import Song, Album
from .song import Album, Song
Title_T = Union[Movie, Episode, Song]
Titles_T = Union[Movies, Series, Album]
__all__ = ("Episode", "Series", "Movie", "Movies", "Album", "Song", "Title_T", "Titles_T")

View File

@ -1,10 +1,11 @@
import re
from abc import ABC
from collections import Counter
from typing import Any, Optional, Union, Iterable
from typing import Any, Iterable, Optional, Union
from langcodes import Language
from pymediainfo import MediaInfo
from rich.tree import Tree
from sortedcontainers import SortedKeyList
from devine.core.config import config
@ -177,19 +178,33 @@ class Series(SortedKeyList, ABC):
def __str__(self) -> str:
if not self:
return super().__str__()
return self[0].title + (f" ({self[0].year})" if self[0].year else "")
lines = [
f"Series: {self[0].title} ({self[0].year or '?'})",
f"Episodes: ({len(self)})",
*[
f"├─ S{season:02}: {episodes} episodes"
for season, episodes in Counter(x.season for x in self).items()
]
]
last_line = lines.pop(-1)
lines.append(last_line.replace("", ""))
def tree(self, verbose: bool = False) -> Tree:
seasons = Counter(x.season for x in self)
num_seasons = len(seasons)
num_episodes = sum(seasons.values())
tree = Tree(
f"{num_seasons} Season{['s', ''][num_seasons == 1]}, {num_episodes} Episode{['s', ''][num_episodes == 1]}",
guide_style="bright_black"
)
if verbose:
for season, episodes in seasons.items():
season_tree = tree.add(
f"[bold]Season {str(season).zfill(len(str(num_seasons)))}[/]: [bright_black]{episodes} episodes",
guide_style="bright_black"
)
for episode in self:
if episode.season == season:
if episode.name:
season_tree.add(
f"[bold]{str(episode.number).zfill(len(str(episodes)))}.[/] "
f"[bright_black]{episode.name}"
)
else:
season_tree.add(f"[bright_black]Episode {str(episode.number).zfill(len(str(episodes)))}")
return "\n".join(lines)
return tree
__ALL__ = (Episode, Series)
__all__ = ("Episode", "Series")

View File

@ -1,8 +1,9 @@
from abc import ABC
from typing import Any, Optional, Union, Iterable
from typing import Any, Iterable, Optional, Union
from langcodes import Language
from pymediainfo import MediaInfo
from rich.tree import Tree
from sortedcontainers import SortedKeyList
from devine.core.config import config
@ -133,23 +134,23 @@ class Movies(SortedKeyList, ABC):
def __str__(self) -> str:
if not self:
return super().__str__()
# TODO: Assumes there's only one movie
return self[0].name + (f" ({self[0].year})" if self[0].year else "")
if len(self) > 1:
lines = [
f"Movies: ({len(self)})",
*[
f"├─ {movie.name} ({movie.year or '?'})"
for movie in self
]
]
last_line = lines.pop(-1)
lines.append(last_line.replace("", ""))
else:
lines = [
f"Movie: {self[0].name} ({self[0].year or '?'})"
]
def tree(self, verbose: bool = False) -> Tree:
num_movies = len(self)
tree = Tree(
f"{num_movies} Movie{['s', ''][num_movies == 1]}",
guide_style="bright_black"
)
if verbose:
for movie in self:
tree.add(
f"[bold]{movie.name}[/] [bright_black]({movie.year or '?'})",
guide_style="bright_black"
)
return "\n".join(lines)
return tree
__ALL__ = (Movie, Movies)
__all__ = ("Movie", "Movies")

View File

@ -1,8 +1,9 @@
from abc import ABC
from typing import Any, Optional, Union, Iterable
from typing import Any, Iterable, Optional, Union
from langcodes import Language
from pymediainfo import MediaInfo
from rich.tree import Tree
from sortedcontainers import SortedKeyList
from devine.core.config import config
@ -129,20 +130,22 @@ class Album(SortedKeyList, ABC):
def __str__(self) -> str:
if not self:
return super().__str__()
return f"{self[0].artist} - {self[0].album} ({self[0].year or '?'})"
lines = [
f"Album: {self[0].album} ({self[0].year or '?'})",
f"Artist: {self[0].artist}",
f"Tracks: ({len(self)})",
*[
f"├─ {song.track:02}. {song.name}"
for song in self
]
]
last_line = lines.pop(-1)
lines.append(last_line.replace("", ""))
def tree(self, verbose: bool = False) -> Tree:
num_songs = len(self)
tree = Tree(
f"{num_songs} Song{['s', ''][num_songs == 1]}",
guide_style="bright_black"
)
if verbose:
for song in self:
tree.add(
f"[bold]Track {song.track:02}.[/] [bright_black]({song.name})",
guide_style="bright_black"
)
return "\n".join(lines)
return tree
__ALL__ = (Song, Album)
__all__ = ("Song", "Album")

View File

@ -1,7 +1,7 @@
from __future__ import annotations
from abc import abstractmethod
from typing import Optional, Union, Any
from typing import Any, Optional, Union
from langcodes import Language
from pymediainfo import MediaInfo
@ -69,4 +69,4 @@ class Title:
"""
__ALL__ = (Title,)
__all__ = ("Title",)

View File

@ -1,6 +1,9 @@
from .audio import Audio
from .track import Track
from .chapter import Chapter
from .chapters import Chapters
from .subtitle import Subtitle
from .track import Track
from .tracks import Tracks
from .video import Video
__all__ = ("Audio", "Chapter", "Chapters", "Subtitle", "Track", "Tracks", "Video")

View File

@ -0,0 +1,70 @@
from __future__ import annotations
import mimetypes
from pathlib import Path
from typing import Optional, Union
from zlib import crc32
class Attachment:
def __init__(
self,
path: Union[Path, str],
name: Optional[str] = None,
mime_type: Optional[str] = None,
description: Optional[str] = None
):
"""
Create a new Attachment.
If name is not provided it will use the file name (without extension).
If mime_type is not provided, it will try to guess it.
"""
if not isinstance(path, (str, Path)):
raise ValueError("The attachment path must be provided.")
if not isinstance(name, (str, type(None))):
raise ValueError("The attachment name must be provided.")
path = Path(path)
if not path.exists():
raise ValueError("The attachment file does not exist.")
name = (name or path.stem).strip()
mime_type = (mime_type or "").strip() or None
description = (description or "").strip() or None
if not mime_type:
mime_type = {
".ttf": "application/x-truetype-font",
".otf": "application/vnd.ms-opentype"
}.get(path.suffix.lower(), mimetypes.guess_type(path)[0])
if not mime_type:
raise ValueError("The attachment mime-type could not be automatically detected.")
self.path = path
self.name = name
self.mime_type = mime_type
self.description = description
def __repr__(self) -> str:
return "{name}({items})".format(
name=self.__class__.__name__,
items=", ".join([f"{k}={repr(v)}" for k, v in self.__dict__.items()])
)
def __str__(self) -> str:
return " | ".join(filter(bool, [
"ATT",
self.name,
self.mime_type,
self.description
]))
@property
def id(self) -> str:
"""Compute an ID from the attachment data."""
checksum = crc32(self.path.read_bytes())
return hex(checksum)
__all__ = ("Attachment",)

View File

@ -13,9 +13,10 @@ class Audio(Track):
AC3 = "DD" # https://wikipedia.org/wiki/Dolby_Digital
EC3 = "DD+" # https://wikipedia.org/wiki/Dolby_Digital_Plus
OPUS = "OPUS" # https://wikipedia.org/wiki/Opus_(audio_format)
OGG = "VORB" # https://wikipedia.org/wiki/Vorbis
DTS = "DTS" # https://en.wikipedia.org/wiki/DTS_(company)#DTS_Digital_Surround
OGG = "VORB" # https://wikipedia.org/wiki/Vorbis
DTS = "DTS" # https://en.wikipedia.org/wiki/DTS_(company)#DTS_Digital_Surround
ALAC = "ALAC" # https://en.wikipedia.org/wiki/Apple_Lossless_Audio_Codec
FLAC = "FLAC" # https://en.wikipedia.org/wiki/FLAC
@property
def extension(self) -> str:
@ -36,6 +37,8 @@ class Audio(Track):
return Audio.Codec.DTS
if mime == "alac":
return Audio.Codec.ALAC
if mime == "flac":
return Audio.Codec.FLAC
raise ValueError(f"The MIME '{mime}' is not a supported Audio Codec")
@staticmethod
@ -61,40 +64,102 @@ class Audio(Track):
return Audio.Codec.OGG
raise ValueError(f"The Content Profile '{profile}' is not a supported Audio Codec")
def __init__(self, *args: Any, codec: Audio.Codec, bitrate: Union[str, int, float],
channels: Optional[Union[str, int, float]] = None, joc: int = 0, descriptive: bool = False,
**kwargs: Any):
def __init__(
self,
*args: Any,
codec: Optional[Audio.Codec] = None,
bitrate: Optional[Union[str, int, float]] = None,
channels: Optional[Union[str, int, float]] = None,
joc: Optional[int] = None,
descriptive: Union[bool, int] = False,
**kwargs: Any
):
"""
Create a new Audio track object.
Parameters:
codec: An Audio.Codec enum representing the audio codec.
If not specified, MediaInfo will be used to retrieve the codec
once the track has been downloaded.
bitrate: A number or float representing the average bandwidth in bytes/s.
Float values are rounded up to the nearest integer.
channels: A number, float, or string representing the number of audio channels.
Strings may represent numbers or floats. Expanded layouts like 7.1.1 is
not supported. All numbers and strings will be cast to float.
joc: The number of Joint-Object-Coding Channels/Objects in the audio stream.
descriptive: Mark this audio as being descriptive audio for the blind.
Note: If codec, bitrate, channels, or joc is not specified some checks may be
skipped or assume a value. Specifying as much information as possible is highly
recommended.
"""
super().__init__(*args, **kwargs)
# required
if not isinstance(codec, (Audio.Codec, type(None))):
raise TypeError(f"Expected codec to be a {Audio.Codec}, not {codec!r}")
if not isinstance(bitrate, (str, int, float, type(None))):
raise TypeError(f"Expected bitrate to be a {str}, {int}, or {float}, not {bitrate!r}")
if not isinstance(channels, (str, int, float, type(None))):
raise TypeError(f"Expected channels to be a {str}, {int}, or {float}, not {channels!r}")
if not isinstance(joc, (int, type(None))):
raise TypeError(f"Expected joc to be a {int}, not {joc!r}")
if (
not isinstance(descriptive, (bool, int)) or
(isinstance(descriptive, int) and descriptive not in (0, 1))
):
raise TypeError(f"Expected descriptive to be a {bool} or bool-like {int}, not {descriptive!r}")
self.codec = codec
self.bitrate = int(math.ceil(float(bitrate))) if bitrate else None
self.channels = self.parse_channels(channels) if channels else None
# optional
try:
self.bitrate = int(math.ceil(float(bitrate))) if bitrate else None
except (ValueError, TypeError) as e:
raise ValueError(f"Expected bitrate to be a number or float, {e}")
try:
self.channels = self.parse_channels(channels) if channels else None
except (ValueError, NotImplementedError) as e:
raise ValueError(f"Expected channels to be a number, float, or a string, {e}")
self.joc = joc
self.descriptive = bool(descriptive)
def __str__(self) -> str:
return " | ".join(filter(bool, [
"AUD",
f"[{self.codec.value}]" if self.codec else None,
str(self.language),
", ".join(filter(bool, [
str(self.channels) if self.channels else None,
f"JOC {self.joc}" if self.joc else None,
])),
f"{self.bitrate // 1000} kb/s" if self.bitrate else None,
self.get_track_name(),
self.edition
]))
@staticmethod
def parse_channels(channels: Union[str, float]) -> str:
def parse_channels(channels: Union[str, int, float]) -> float:
"""
Converts a string to a float-like string which represents audio channels.
It does not handle values that are incorrect/out of bounds or e.g. 6.0->5.1, as that
isn't what this is intended for.
Converts a Channel string to a float representing audio channel count and layout.
E.g. "3" -> "3.0", "2.1" -> "2.1", ".1" -> "0.1".
This does not validate channel strings as genuine channel counts or valid layouts.
It does not convert the value to assume a sub speaker channel layout, e.g. 5.1->6.0.
It also does not support expanded surround sound channel layout strings like 7.1.2.
"""
# TODO: Support all possible DASH channel configurations (https://datatracker.ietf.org/doc/html/rfc8216)
if channels.upper() == "A000":
return "2.0"
if channels.upper() == "F801":
return "5.1"
if isinstance(channels, str):
# TODO: Support all possible DASH channel configurations (https://datatracker.ietf.org/doc/html/rfc8216)
if channels.upper() == "A000":
return 2.0
elif channels.upper() == "F801":
return 5.1
elif channels.replace("ch", "").replace(".", "", 1).isdigit():
# e.g., '2ch', '2', '2.0', '5.1ch', '5.1'
return float(channels.replace("ch", ""))
raise NotImplementedError(f"Unsupported Channels string value, '{channels}'")
if str(channels).isdigit():
# This is to avoid incorrectly transforming channels=6 to 6.0, for example
return f"{channels}ch"
try:
return str(float(channels))
except ValueError:
return str(channels)
return float(channels)
def get_track_name(self) -> Optional[str]:
"""Return the base Track Name."""
@ -106,16 +171,5 @@ class Audio(Track):
track_name += flag
return track_name or None
def __str__(self) -> str:
return " | ".join(filter(bool, [
"AUD",
f"[{self.codec.value}]",
(self.channels or "2.0?") + (f" (JOC {self.joc})" if self.joc else ""),
f"{self.bitrate // 1000 if self.bitrate else '?'} kb/s",
str(self.language),
self.get_track_name(),
self.edition
]))
__ALL__ = (Audio,)
__all__ = ("Audio",)

View File

@ -1,95 +1,82 @@
from __future__ import annotations
import re
from pathlib import Path
from typing import Optional, Union
from zlib import crc32
TIMESTAMP_FORMAT = re.compile(r"^(?P<hour>\d{2}):(?P<minute>\d{2}):(?P<second>\d{2})(?P<ms>.\d{3}|)$")
class Chapter:
line_1 = re.compile(r"^CHAPTER(?P<number>\d+)=(?P<timecode>[\d\\.]+)$")
line_2 = re.compile(r"^CHAPTER(?P<number>\d+)NAME=(?P<title>[\d\\.]+)$")
def __init__(self, timestamp: Union[str, int, float], name: Optional[str] = None):
"""
Create a new Chapter with a Timestamp and optional name.
def __init__(self, number: int, timecode: str, title: Optional[str] = None):
self.id = f"chapter-{number}"
self.number = number
self.timecode = timecode
self.title = title
The timestamp may be in the following formats:
- "HH:MM:SS" string, e.g., `25:05:23`.
- "HH:MM:SS.mss" string, e.g., `25:05:23.120`.
- a timecode integer in milliseconds, e.g., `90323120` is `25:05:23.120`.
- a timecode float in seconds, e.g., `90323.12` is `25:05:23.120`.
if "." not in self.timecode:
self.timecode += ".000"
If you have a timecode integer in seconds, just multiply it by 1000.
If you have a timecode float in milliseconds (no decimal value), just convert
it to an integer.
"""
if timestamp is None:
raise ValueError("The timestamp must be provided.")
def __bool__(self) -> bool:
return self.number and self.number >= 0 and self.timecode
if not isinstance(timestamp, (str, int, float)):
raise TypeError(f"Expected timestamp to be {str}, {int} or {float}, not {type(timestamp)}")
if not isinstance(name, (str, type(None))):
raise TypeError(f"Expected name to be {str}, not {type(name)}")
if not isinstance(timestamp, str):
if isinstance(timestamp, int): # ms
hours, remainder = divmod(timestamp, 1000 * 60 * 60)
minutes, remainder = divmod(remainder, 1000 * 60)
seconds, ms = divmod(remainder, 1000)
elif isinstance(timestamp, float): # seconds.ms
hours, remainder = divmod(timestamp, 60 * 60)
minutes, remainder = divmod(remainder, 60)
seconds, ms = divmod(int(remainder * 1000), 1000)
else:
raise TypeError
timestamp = f"{int(hours):02}:{int(minutes):02}:{int(seconds):02}.{str(ms).zfill(3)[:3]}"
timestamp_m = TIMESTAMP_FORMAT.match(timestamp)
if not timestamp_m:
raise ValueError(f"The timestamp format is invalid: {timestamp}")
hour, minute, second, ms = timestamp_m.groups()
if not ms:
timestamp += ".000"
self.timestamp = timestamp
self.name = name
def __repr__(self) -> str:
"""
OGM-based Simple Chapter Format intended for use with MKVToolNix.
This format is not officially part of the Matroska spec. This was a format
designed for OGM tools that MKVToolNix has since re-used. More Information:
https://mkvtoolnix.download/doc/mkvmerge.html#mkvmerge.chapters.simple
"""
return "CHAPTER{num}={time}\nCHAPTER{num}NAME={name}".format(
num=f"{self.number:02}",
time=self.timecode,
name=self.title or ""
return "{name}({items})".format(
name=self.__class__.__name__,
items=", ".join([f"{k}={repr(v)}" for k, v in self.__dict__.items()])
)
def __str__(self) -> str:
return " | ".join(filter(bool, [
"CHP",
f"[{self.number:02}]",
self.timecode,
self.title
self.timestamp,
self.name
]))
@property
def id(self) -> str:
"""Compute an ID from the Chapter data."""
checksum = crc32(str(self).encode("utf8"))
return hex(checksum)
@property
def named(self) -> bool:
"""Check if Chapter is named."""
return bool(self.title)
@classmethod
def loads(cls, data: str) -> Chapter:
"""Load chapter data from a string."""
lines = [x.strip() for x in data.strip().splitlines(keepends=False)]
if len(lines) > 2:
return cls.loads("\n".join(lines))
one, two = lines
one_m = cls.line_1.match(one)
two_m = cls.line_2.match(two)
if not one_m or not two_m:
raise SyntaxError(f"An unexpected syntax error near:\n{one}\n{two}")
one_str, timecode = one_m.groups()
two_str, title = two_m.groups()
one_num, two_num = int(one_str.lstrip("0")), int(two_str.lstrip("0"))
if one_num != two_num:
raise SyntaxError(f"The chapter numbers ({one_num},{two_num}) does not match.")
if not timecode:
raise SyntaxError("The timecode is missing.")
if not title:
title = None
return cls(number=one_num, timecode=timecode, title=title)
@classmethod
def load(cls, path: Union[Path, str]) -> Chapter:
"""Load chapter data from a file."""
if isinstance(path, str):
path = Path(path)
return cls.loads(path.read_text(encoding="utf8"))
def dumps(self) -> str:
"""Return chapter data as a string."""
return repr(self)
def dump(self, path: Union[Path, str]) -> int:
"""Write chapter data to a file."""
if isinstance(path, str):
path = Path(path)
return path.write_text(self.dumps(), encoding="utf8")
return bool(self.name)
__ALL__ = (Chapter,)
__all__ = ("Chapter",)

View File

@ -0,0 +1,156 @@
from __future__ import annotations
import re
from abc import ABC
from pathlib import Path
from typing import Any, Iterable, Optional, Union
from zlib import crc32
from sortedcontainers import SortedKeyList
from devine.core.tracks import Chapter
OGM_SIMPLE_LINE_1_FORMAT = re.compile(r"^CHAPTER(?P<number>\d+)=(?P<timestamp>\d{2,}:\d{2}:\d{2}\.\d{3})$")
OGM_SIMPLE_LINE_2_FORMAT = re.compile(r"^CHAPTER(?P<number>\d+)NAME=(?P<name>.*)$")
class Chapters(SortedKeyList, ABC):
def __init__(self, iterable: Optional[Iterable[Chapter]] = None):
super().__init__(key=lambda x: x.timestamp or 0)
for chapter in iterable or []:
self.add(chapter)
def __repr__(self) -> str:
return "{name}({items})".format(
name=self.__class__.__name__,
items=", ".join([f"{k}={repr(v)}" for k, v in self.__dict__.items()])
)
def __str__(self) -> str:
return "\n".join([
" | ".join(filter(bool, [
"CHP",
f"[{i:02}]",
chapter.timestamp,
chapter.name
]))
for i, chapter in enumerate(self, start=1)
])
@classmethod
def loads(cls, data: str) -> Chapters:
"""Load chapter data from a string."""
lines = [
line.strip()
for line in data.strip().splitlines(keepends=False)
]
if len(lines) % 2 != 0:
raise ValueError("The number of chapter lines must be even.")
chapters = []
for line_1, line_2 in zip(lines[::2], lines[1::2]):
line_1_match = OGM_SIMPLE_LINE_1_FORMAT.match(line_1)
if not line_1_match:
raise SyntaxError(f"An unexpected syntax error occurred on: {line_1}")
line_2_match = OGM_SIMPLE_LINE_2_FORMAT.match(line_2)
if not line_2_match:
raise SyntaxError(f"An unexpected syntax error occurred on: {line_2}")
line_1_number, timestamp = line_1_match.groups()
line_2_number, name = line_2_match.groups()
if line_1_number != line_2_number:
raise SyntaxError(
f"The chapter numbers {line_1_number} and {line_2_number} do not match on:\n{line_1}\n{line_2}")
if not timestamp:
raise SyntaxError(f"The timestamp is missing on: {line_1}")
chapters.append(Chapter(timestamp, name))
return cls(chapters)
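A rough usage sketch of Chapters.loads(), assuming only the Chapter(timestamp, name) constructor used above; the chapter text itself is illustrative:
chapters = Chapters.loads(
    "CHAPTER01=00:00:00.000\n"
    "CHAPTER01NAME=Intro\n"
    "CHAPTER02=00:05:30.000\n"
    "CHAPTER02NAME=Part 1"
)
# ordering is handled by the SortedKeyList key (the chapter timestamp);
# the CHAPTERxx numbers are only checked for per-pair consistency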
@classmethod
def load(cls, path: Union[Path, str]) -> Chapters:
"""Load chapter data from a file."""
if isinstance(path, str):
path = Path(path)
return cls.loads(path.read_text(encoding="utf8"))
def dumps(self, fallback_name: str = "") -> str:
"""
Return chapter data in OGM-based Simple Chapter format.
https://mkvtoolnix.download/doc/mkvmerge.html#mkvmerge.chapters.simple
Parameters:
fallback_name: Name used for Chapters without a Name set.
The fallback name can use the following variables in f-string style:
- {i}: The Chapter number starting at 1.
E.g., `"Chapter {i}"`: "Chapter 1", "Intro", "Chapter 3".
- {j}: A number starting at 1 that increments any time a Chapter has no name.
E.g., `"Chapter {j}"`: "Chapter 1", "Intro", "Chapter 2".
These are formatted with f-strings, directives are supported.
For example, `"Chapter {i:02}"` will result in `"Chapter 01"`.
"""
chapters = []
j = 0
for i, chapter in enumerate(self, start=1):
if not chapter.name:
j += 1
chapters.append("CHAPTER{num}={time}\nCHAPTER{num}NAME={name}".format(
num=f"{i:02}",
time=chapter.timestamp,
name=chapter.name or fallback_name.format(
i=i,
j=j
)
))
return "\n".join(chapters)
def dump(self, path: Union[Path, str], *args: Any, **kwargs: Any) -> int:
"""
Write chapter data in OGM-based Simple Chapter format to a file.
Parameters:
path: The file path to write the Chapter data to, overwriting
any existing data.
See `Chapters.dumps` for more parameter documentation.
"""
if isinstance(path, str):
path = Path(path)
path.parent.mkdir(parents=True, exist_ok=True)
ogm_text = self.dumps(*args, **kwargs)
return path.write_text(ogm_text, encoding="utf8")
def add(self, value: Chapter) -> None:
if not isinstance(value, Chapter):
raise TypeError(f"Can only add {Chapter} objects, not {type(value)}")
if any(chapter.timestamp == value.timestamp for chapter in self):
raise ValueError(f"A Chapter with the Timestamp {value.timestamp} already exists")
super().add(value)
if not any(chapter.timestamp == "00:00:00.000" for chapter in self):
self.add(Chapter(0))
@property
def id(self) -> str:
"""Compute an ID from the Chapter data."""
checksum = crc32("\n".join([
chapter.id
for chapter in self
]).encode("utf8"))
return hex(checksum)
__all__ = ("Chapters", "Chapter")

View File

@ -1,21 +1,26 @@
from __future__ import annotations
import re
import subprocess
from collections import defaultdict
from enum import Enum
from functools import partial
from io import BytesIO
from pathlib import Path
from typing import Any, Iterable, Optional
from typing import Any, Callable, Iterable, Optional, Union
import pycaption
import requests
from construct import Container
from pycaption import Caption, CaptionList, CaptionNode, WebVTTReader
from pycaption.geometry import Layout
from pymp4.parser import MP4
from subtitle_filter import Subtitles
from devine.core import binaries
from devine.core.tracks.track import Track
from devine.core.utilities import get_binary_path
from devine.core.utilities import try_ensure_utf8
from devine.core.utils.webvtt import merge_segmented_webvtt
class Subtitle(Track):
@ -71,22 +76,22 @@ class Subtitle(Track):
return Subtitle.Codec.TimedTextMarkupLang
raise ValueError(f"The Content Profile '{profile}' is not a supported Subtitle Codec")
def __init__(self, *args: Any, codec: Subtitle.Codec, cc: bool = False, sdh: bool = False, forced: bool = False,
**kwargs: Any):
def __init__(
self,
*args: Any,
codec: Optional[Subtitle.Codec] = None,
cc: bool = False,
sdh: bool = False,
forced: bool = False,
**kwargs: Any
):
"""
Information on Subtitle Types:
https://bit.ly/2Oe4fLC (3PlayMedia Blog on SUB vs CC vs SDH).
However, I wouldn't pay much attention to the claims about SDH needing to
be in the original source language. It's logically not true.
CC == Closed Captions. Source: Basically every site.
SDH = Subtitles for the Deaf or Hard-of-Hearing. Source: Basically every site.
HOH = Exact same as SDH. Is a term used in the UK. Source: https://bit.ly/2PGJatz (ICO UK)
More in-depth information, examples, and stuff to look for can be found in the Parameter
explanation list below.
Create a new Subtitle track object.
Parameters:
codec: A Subtitle.Codec enum representing the subtitle format.
If not specified, MediaInfo will be used to retrieve the format
once the track has been downloaded.
cc: Closed Caption.
- Intended as if you couldn't hear the audio at all.
- Can have Sound as well as Dialogue, but doesn't have to.
@ -122,17 +127,57 @@ class Subtitle(Track):
no other way to reliably work with Forced subtitles where multiple
forced subtitles may be in the output file. Just know what to expect
with "forced" subtitles.
Note: If codec is not specified some checks may be skipped or assume a value.
Specifying as much information as possible is highly recommended.
Information on Subtitle Types:
https://bit.ly/2Oe4fLC (3PlayMedia Blog on SUB vs CC vs SDH).
However, I wouldn't pay much attention to the claims about SDH needing to
be in the original source language. It's logically not true.
CC == Closed Captions. Source: Basically every site.
SDH = Subtitles for the Deaf or Hard-of-Hearing. Source: Basically every site.
HOH = Exact same as SDH. Is a term used in the UK. Source: https://bit.ly/2PGJatz (ICO UK)
More in-depth information, examples, and stuff to look for can be found in the Parameter
explanation list above.
"""
super().__init__(*args, **kwargs)
if not isinstance(codec, (Subtitle.Codec, type(None))):
raise TypeError(f"Expected codec to be a {Subtitle.Codec}, not {codec!r}")
if not isinstance(cc, (bool, int)) or (isinstance(cc, int) and cc not in (0, 1)):
raise TypeError(f"Expected cc to be a {bool} or bool-like {int}, not {cc!r}")
if not isinstance(sdh, (bool, int)) or (isinstance(sdh, int) and sdh not in (0, 1)):
raise TypeError(f"Expected sdh to be a {bool} or bool-like {int}, not {sdh!r}")
if not isinstance(forced, (bool, int)) or (isinstance(forced, int) and forced not in (0, 1)):
raise TypeError(f"Expected forced to be a {bool} or bool-like {int}, not {forced!r}")
self.codec = codec
self.cc = bool(cc)
self.sdh = bool(sdh)
self.forced = bool(forced)
if self.cc and self.sdh:
raise ValueError("A text track cannot be both CC and SDH.")
self.forced = bool(forced)
if (self.cc or self.sdh) and self.forced:
if self.forced and (self.cc or self.sdh):
raise ValueError("A text track cannot be CC/SDH as well as Forced.")
# TODO: Migrate to new event observer system
# Called after Track has been converted to another format
self.OnConverted: Optional[Callable[[Subtitle.Codec], None]] = None
def __str__(self) -> str:
return " | ".join(filter(bool, [
"SUB",
f"[{self.codec.value}]" if self.codec else None,
str(self.language),
self.get_track_name()
]))
def get_track_name(self) -> Optional[str]:
"""Return the base Track Name."""
track_name = super().get_track_name() or ""
@ -143,42 +188,184 @@ class Subtitle(Track):
track_name += flag
return track_name or None
def download(
self,
session: requests.Session,
prepare_drm: partial,
max_workers: Optional[int] = None,
progress: Optional[partial] = None
):
super().download(session, prepare_drm, max_workers, progress)
if not self.path:
return
if self.codec == Subtitle.Codec.fTTML:
self.convert(Subtitle.Codec.TimedTextMarkupLang)
elif self.codec == Subtitle.Codec.fVTT:
self.convert(Subtitle.Codec.WebVTT)
elif self.codec == Subtitle.Codec.WebVTT:
text = self.path.read_text("utf8")
if self.descriptor == Track.Descriptor.DASH:
if len(self.data["dash"]["segment_durations"]) > 1:
text = merge_segmented_webvtt(
text,
segment_durations=self.data["dash"]["segment_durations"],
timescale=self.data["dash"]["timescale"]
)
elif self.descriptor == Track.Descriptor.HLS:
if len(self.data["hls"]["segment_durations"]) > 1:
text = merge_segmented_webvtt(
text,
segment_durations=self.data["hls"]["segment_durations"],
timescale=1 # ?
)
caption_set = pycaption.WebVTTReader().read(text)
Subtitle.merge_same_cues(caption_set)
subtitle_text = pycaption.WebVTTWriter().write(caption_set)
self.path.write_text(subtitle_text, encoding="utf8")
def convert(self, codec: Subtitle.Codec) -> Path:
"""
Convert this Subtitle to another Format.
The file path location of the Subtitle data will be kept at the same
location but the file extension will be changed appropriately.
Supported formats:
- SubRip - SubtitleEdit or pycaption.SRTWriter
- TimedTextMarkupLang - SubtitleEdit or pycaption.DFXPWriter
- WebVTT - SubtitleEdit or pycaption.WebVTTWriter
- SubStationAlphav4 - SubtitleEdit
- fTTML* - custom code using some pycaption functions
- fVTT* - custom code using some pycaption functions
*: Can read from format, but cannot convert to format
Note: It currently prioritizes using SubtitleEdit over PyCaption as
I have personally noticed more oddities with PyCaption parsing over
SubtitleEdit. Especially when working with TTML/DFXP where it would
often have timecodes and stuff mixed in/duplicated.
Returns the new file path of the Subtitle.
"""
if not self.path or not self.path.exists():
raise ValueError("You must download the subtitle track first.")
if self.codec == codec:
return self.path
output_path = self.path.with_suffix(f".{codec.value.lower()}")
if binaries.SubtitleEdit and self.codec not in (Subtitle.Codec.fTTML, Subtitle.Codec.fVTT):
sub_edit_format = {
Subtitle.Codec.SubStationAlphav4: "AdvancedSubStationAlpha",
Subtitle.Codec.TimedTextMarkupLang: "TimedText1.0"
}.get(codec, codec.name)
sub_edit_args = [
binaries.SubtitleEdit,
"/Convert", self.path, sub_edit_format,
f"/outputfilename:{output_path.name}",
"/encoding:utf8"
]
if codec == Subtitle.Codec.SubRip:
sub_edit_args.append("/ConvertColorsToDialog")
subprocess.run(
sub_edit_args,
check=True,
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL
)
else:
writer = {
# pycaption generally only supports these subtitle formats
Subtitle.Codec.SubRip: pycaption.SRTWriter,
Subtitle.Codec.TimedTextMarkupLang: pycaption.DFXPWriter,
Subtitle.Codec.WebVTT: pycaption.WebVTTWriter,
}.get(codec)
if writer is None:
raise NotImplementedError(f"Cannot yet convert {self.codec.name} to {codec.name}.")
caption_set = self.parse(self.path.read_bytes(), self.codec)
Subtitle.merge_same_cues(caption_set)
subtitle_text = writer().write(caption_set)
output_path.write_text(subtitle_text, encoding="utf8")
self.path = output_path
self.codec = codec
if callable(self.OnConverted):
self.OnConverted(codec)
return output_path
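A hedged usage sketch; the subtitle variable stands for an already-downloaded WebVTT Subtitle track and is illustrative:
new_path = subtitle.convert(Subtitle.Codec.SubRip)
# SubtitleEdit is preferred when installed, otherwise pycaption's SRTWriter is
# used; the new file extension is derived from codec.value.lower()
assert subtitle.codec == Subtitle.Codec.SubRip and subtitle.path == new_path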
@staticmethod
def parse(data: bytes, codec: Subtitle.Codec) -> pycaption.CaptionSet:
# TODO: Use an "enum" for subtitle codecs
if not isinstance(data, bytes):
raise ValueError(f"Subtitle data must be parsed as bytes data, not {type(data).__name__}")
try:
if codec == Subtitle.Codec.fTTML:
captions: dict[str, pycaption.CaptionList] = defaultdict(pycaption.CaptionList)
if codec == Subtitle.Codec.SubRip:
text = try_ensure_utf8(data).decode("utf8")
caption_set = pycaption.SRTReader().read(text)
elif codec == Subtitle.Codec.fTTML:
caption_lists: dict[str, pycaption.CaptionList] = defaultdict(pycaption.CaptionList)
for segment in (
Subtitle.parse(box.data, Subtitle.Codec.TimedTextMarkupLang)
for box in MP4.parse_stream(BytesIO(data))
if box.type == b"mdat"
):
for lang in segment.get_languages():
captions[lang].extend(segment.get_captions(lang))
captions: pycaption.CaptionSet = pycaption.CaptionSet(captions)
return captions
if codec == Subtitle.Codec.TimedTextMarkupLang:
text = data.decode("utf8").replace("tt:", "")
return pycaption.DFXPReader().read(text)
if codec == Subtitle.Codec.fVTT:
caption_lists[lang].extend(segment.get_captions(lang))
caption_set: pycaption.CaptionSet = pycaption.CaptionSet(caption_lists)
elif codec == Subtitle.Codec.TimedTextMarkupLang:
text = try_ensure_utf8(data).decode("utf8")
text = text.replace("tt:", "")
# negative size values aren't allowed in TTML/DFXP spec, replace with 0
text = re.sub(r'"(-\d+(\.\d+)?(px|em|%|c|pt))"', '"0"', text)
caption_set = pycaption.DFXPReader().read(text)
elif codec == Subtitle.Codec.fVTT:
caption_lists: dict[str, pycaption.CaptionList] = defaultdict(pycaption.CaptionList)
caption_list, language = Subtitle.merge_segmented_wvtt(data)
caption_lists[language] = caption_list
caption_set: pycaption.CaptionSet = pycaption.CaptionSet(caption_lists)
return caption_set
if codec == Subtitle.Codec.WebVTT:
text = data.decode("utf8").replace("\r", "").replace("\n\n\n", "\n \n\n").replace("\n\n<", "\n<")
captions: pycaption.CaptionSet = pycaption.WebVTTReader().read(text)
return captions
except pycaption.exceptions.CaptionReadSyntaxError:
raise SyntaxError(f"A syntax error has occurred when reading the \"{codec}\" subtitle")
elif codec == Subtitle.Codec.WebVTT:
text = Subtitle.space_webvtt_headers(data)
caption_set = pycaption.WebVTTReader().read(text)
else:
raise ValueError(f"Unknown Subtitle format \"{codec}\"...")
except pycaption.exceptions.CaptionReadSyntaxError as e:
raise SyntaxError(f"A syntax error has occurred when reading the \"{codec}\" subtitle: {e}")
except pycaption.exceptions.CaptionReadNoCaptions:
return pycaption.CaptionSet({"en": []})
raise ValueError(f"Unknown Subtitle Format \"{codec}\"...")
# remove empty caption lists or some code breaks, especially if it's the first list
for language in caption_set.get_languages():
if not caption_set.get_captions(language):
# noinspection PyProtectedMember
del caption_set._captions[language]
return caption_set
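A minimal sketch of calling the parser directly (the file path is illustrative):
caption_set = Subtitle.parse(Path("example.vtt").read_bytes(), Subtitle.Codec.WebVTT)
srt_text = pycaption.SRTWriter().write(caption_set)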
@staticmethod
def space_webvtt_headers(data: Union[str, bytes]):
"""
Space out the WEBVTT Headers from Captions.
Segmented VTT when merged may have the WEBVTT headers part of the next caption
as they were not separated far enough from the previous caption and ended up
being considered as caption text rather than the header for the next segment.
"""
if isinstance(data, bytes):
data = try_ensure_utf8(data).decode("utf8")
elif not isinstance(data, str):
raise ValueError(f"Expecting data to be a str, not {data!r}")
text = data.replace("WEBVTT", "\n\nWEBVTT").\
replace("\r", "").\
replace("\n\n\n", "\n \n\n").\
replace("\n\n<", "\n<")
return text
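Illustratively (the cue text is made up), a "WEBVTT" header that ended up glued to the previous cue is pushed back onto its own block so WebVTTReader stops treating it as caption text:
merged = "00:00:01.000 --> 00:00:02.000\nHello\nWEBVTT\n\n00:00:03.000 --> 00:00:04.000\nWorld"
fixed = Subtitle.space_webvtt_headers(merged)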
@staticmethod
def merge_same_cues(caption_set: pycaption.CaptionSet):
@ -307,17 +494,16 @@ class Subtitle(Track):
layout: Optional[Layout] = None
nodes: list[CaptionNode] = []
for cue_box in MP4.parse_stream(BytesIO(vttc_box.data)):
for cue_box in vttc_box.children:
if cue_box.type == b"vsid":
# this is a V(?) Source ID box, we don't care
continue
cue_data = cue_box.data.decode("utf8")
if cue_box.type == b"sttg":
layout = Layout(webvtt_positioning=cue_data)
layout = Layout(webvtt_positioning=cue_box.settings)
elif cue_box.type == b"payl":
nodes.extend([
node
for line in cue_data.split("\n")
for line in cue_box.cue_text.split("\n")
for node in [
CaptionNode.create_text(WebVTTReader()._decode(line)),
CaptionNode.create_break()
@ -349,18 +535,23 @@ class Subtitle(Track):
if not self.path or not self.path.exists():
raise ValueError("You must download the subtitle track first.")
executable = get_binary_path("SubtitleEdit")
if executable:
subprocess.run([
executable,
"/Convert", self.path, "srt",
"/overwrite",
"/RemoveTextForHI"
], check=True)
# Remove UTF-8 Byte Order Marks
self.path.write_text(
self.path.read_text(encoding="utf-8-sig"),
encoding="utf8"
if binaries.SubtitleEdit:
if self.codec == Subtitle.Codec.SubStationAlphav4:
output_format = "AdvancedSubStationAlpha"
elif self.codec == Subtitle.Codec.TimedTextMarkupLang:
output_format = "TimedText1.0"
else:
output_format = self.codec.name
subprocess.run(
[
binaries.SubtitleEdit,
"/Convert", self.path, output_format,
"/encoding:utf8",
"/overwrite",
"/RemoveTextForHI"
],
check=True,
stdout=subprocess.DEVNULL
)
else:
sub = Subtitles(self.path)
@ -374,26 +565,35 @@ class Subtitle(Track):
)
sub.save()
def download(self, *args, **kwargs) -> Path:
save_path = super().download(*args, **kwargs)
if self.codec not in (Subtitle.Codec.SubRip, Subtitle.Codec.SubStationAlphav4):
caption_set = self.parse(save_path.read_bytes(), self.codec)
self.merge_same_cues(caption_set)
srt = pycaption.SRTWriter().write(caption_set)
# NowTV sometimes has this header even when the subtitle isn't multi-language, causing mux problems
srt = srt.replace("MULTI-LANGUAGE SRT\n", "")
save_path.write_text(srt, encoding="utf8")
self.codec = Subtitle.Codec.SubRip
self.move(self.path.with_suffix(".srt"))
return save_path
def reverse_rtl(self) -> None:
"""
Reverse RTL (Right to Left) Start/End on Captions.
This can be used to fix the positioning of sentence-ending characters.
"""
if not self.path or not self.path.exists():
raise ValueError("You must download the subtitle track first.")
def __str__(self) -> str:
return " | ".join(filter(bool, [
"SUB",
f"[{self.codec.value}]",
str(self.language),
self.get_track_name()
]))
if not binaries.SubtitleEdit:
raise EnvironmentError("SubtitleEdit executable not found...")
if self.codec == Subtitle.Codec.SubStationAlphav4:
output_format = "AdvancedSubStationAlpha"
elif self.codec == Subtitle.Codec.TimedTextMarkupLang:
output_format = "TimedText1.0"
else:
output_format = self.codec.name
subprocess.run(
[
binaries.SubtitleEdit,
"/Convert", self.path, output_format,
"/ReverseRtlStartEnd",
"/encoding:utf8",
"/overwrite"
],
check=True,
stdout=subprocess.DEVNULL
)
__ALL__ = (Subtitle,)
__all__ = ("Subtitle",)

View File

@ -1,67 +1,129 @@
import asyncio
import base64
import html
import logging
import re
import shutil
import subprocess
from collections import defaultdict
from copy import copy
from enum import Enum
from functools import partial
from pathlib import Path
from typing import Any, Callable, Iterable, Optional, Union
from urllib.parse import urljoin
from uuid import UUID
from zlib import crc32
import m3u8
import requests
from langcodes import Language
from requests import Session
from devine.core.constants import TERRITORY_MAP
from devine.core.downloaders import aria2c
from devine.core.drm import DRM_T
from devine.core.utilities import get_binary_path
from devine.core import binaries
from devine.core.config import config
from devine.core.constants import DOWNLOAD_CANCELLED, DOWNLOAD_LICENCE_ONLY
from devine.core.downloaders import aria2c, curl_impersonate, requests
from devine.core.drm import DRM_T, Widevine
from devine.core.events import events
from devine.core.utilities import get_boxes, try_ensure_utf8
from devine.core.utils.subprocess import ffprobe
class Track:
class DRM(Enum):
pass
class Descriptor(Enum):
URL = 1 # Direct URL, nothing fancy
M3U = 2 # https://en.wikipedia.org/wiki/M3U (and M3U8)
MPD = 3 # https://en.wikipedia.org/wiki/Dynamic_Adaptive_Streaming_over_HTTP
HLS = 2 # https://en.wikipedia.org/wiki/HTTP_Live_Streaming
DASH = 3 # https://en.wikipedia.org/wiki/Dynamic_Adaptive_Streaming_over_HTTP
def __init__(
self,
id_: str,
url: Union[str, list[str]],
language: Union[Language, str],
is_original_lang: bool = False,
descriptor: Descriptor = Descriptor.URL,
needs_proxy: bool = False,
needs_repack: bool = False,
name: Optional[str] = None,
drm: Optional[Iterable[DRM_T]] = None,
edition: Optional[str] = None,
extra: Optional[Any] = None
downloader: Optional[Callable] = None,
data: Optional[Union[dict, defaultdict]] = None,
id_: Optional[str] = None,
) -> None:
self.id = id_
self.url = url
# required basic metadata
self.language = Language.get(language)
self.is_original_lang = bool(is_original_lang)
# optional io metadata
self.descriptor = descriptor
self.needs_proxy = bool(needs_proxy)
self.needs_repack = bool(needs_repack)
# drm
self.drm = drm
# extra data
self.edition: str = edition
self.extra: Any = extra or {} # allow anything for extra, but default to a dict
if not isinstance(url, (str, list)):
raise TypeError(f"Expected url to be a {str}, or list of {str}, not {type(url)}")
if not isinstance(language, (Language, str)):
raise TypeError(f"Expected language to be a {Language} or {str}, not {type(language)}")
if not isinstance(is_original_lang, bool):
raise TypeError(f"Expected is_original_lang to be a {bool}, not {type(is_original_lang)}")
if not isinstance(descriptor, Track.Descriptor):
raise TypeError(f"Expected descriptor to be a {Track.Descriptor}, not {type(descriptor)}")
if not isinstance(needs_repack, bool):
raise TypeError(f"Expected needs_repack to be a {bool}, not {type(needs_repack)}")
if not isinstance(name, (str, type(None))):
raise TypeError(f"Expected name to be a {str}, not {type(name)}")
if not isinstance(id_, (str, type(None))):
raise TypeError(f"Expected id_ to be a {str}, not {type(id_)}")
if not isinstance(edition, (str, type(None))):
raise TypeError(f"Expected edition to be a {str}, not {type(edition)}")
if not isinstance(downloader, (Callable, type(None))):
raise TypeError(f"Expected downloader to be a {Callable}, not {type(downloader)}")
if not isinstance(data, (dict, defaultdict, type(None))):
raise TypeError(f"Expected data to be a {dict} or {defaultdict}, not {type(data)}")
# events
self.OnSegmentFilter: Optional[Callable] = None
self.OnDownloaded: Optional[Callable] = None
self.OnDecrypted: Optional[Callable] = None
self.OnRepacked: Optional[Callable] = None
invalid_urls = ", ".join(set(str(type(x)) for x in url if not isinstance(x, str)))
if invalid_urls:
raise TypeError(f"Expected all items in url to be a {str}, but found {invalid_urls}")
if drm is not None:
try:
iter(drm)
except TypeError:
raise TypeError(f"Expected drm to be an iterable, not {type(drm)}")
if downloader is None:
downloader = {
"aria2c": aria2c,
"curl_impersonate": curl_impersonate,
"requests": requests
}[config.downloader]
# should only be set internally
self.path: Optional[Path] = None
self.url = url
self.language = Language.get(language)
self.is_original_lang = is_original_lang
self.descriptor = descriptor
self.needs_repack = needs_repack
self.name = name
self.drm = drm
self.edition: str = edition
self.downloader = downloader
self._data: defaultdict[Any, Any] = defaultdict(dict)
self.data = data or {}
if self.name is None:
lang = Language.get(self.language)
if (lang.language or "").lower() == (lang.territory or "").lower():
lang.territory = None # e.g. en-en, de-DE
reduced = lang.simplify_script()
extra_parts = []
if reduced.script is not None:
script = reduced.script_name(max_distance=25)
if script and script != "Zzzz":
extra_parts.append(script)
if reduced.territory is not None:
territory = reduced.territory_name(max_distance=25)
if territory and territory != "ZZ":
territory = territory.removesuffix(" SAR China")
extra_parts.append(territory)
self.name = ", ".join(extra_parts) or None
if not id_:
this = copy(self)
this.url = self.url.rsplit("?", maxsplit=1)[0]
checksum = crc32(repr(this).encode("utf8"))
id_ = hex(checksum)[2:]
self.id = id_
# TODO: Currently using OnFoo event naming, change to just segment_filter
self.OnSegmentFilter: Optional[Callable] = None
def __repr__(self) -> str:
return "{name}({items})".format(
@ -69,224 +131,402 @@ class Track:
items=", ".join([f"{k}={repr(v)}" for k, v in self.__dict__.items()])
)
def __eq__(self, other: object) -> bool:
def __eq__(self, other: Any) -> bool:
return isinstance(other, Track) and self.id == other.id
def get_track_name(self) -> Optional[str]:
"""Return the base Track Name. This may be enhanced in sub-classes."""
if (self.language.language or "").lower() == (self.language.territory or "").lower():
self.language.territory = None # e.g. en-en, de-DE
if self.language.territory == "US":
self.language.territory = None
reduced = self.language.simplify_script()
extra_parts = []
if reduced.script is not None:
extra_parts.append(reduced.script_name(max_distance=25))
if reduced.territory is not None:
territory = reduced.territory_name(max_distance=25)
extra_parts.append(TERRITORY_MAP.get(territory, territory))
return ", ".join(extra_parts) or None
def get_init_segment(self, session: Optional[requests.Session] = None) -> bytes:
@property
def data(self) -> defaultdict[Any, Any]:
"""
Get the Track's Initial Segment Data Stream.
If the Track URL is not detected to be an init segment, it will download
up to the first 20,000 (20KB) bytes only.
Arbitrary track data dictionary.
A defaultdict is used with a dict as the factory for easier
nested saving and safer exists-checks.
Reserved keys:
- "hls" used by the HLS class.
- playlist: m3u8.model.Playlist - The primary track information.
- media: m3u8.model.Media - The audio/subtitle track information.
- segment_durations: list[int] - A list of each segment's duration.
- "dash" used by the DASH class.
- manifest: lxml.ElementTree - DASH MPD manifest.
- period: lxml.Element - The period of this track.
- adaptation_set: lxml.Element - The adaptation set of this track.
- representation: lxml.Element - The representation of this track.
- timescale: int - The timescale of the track's segments.
- segment_durations: list[int] - A list of each segment's duration.
You should not add, change, or remove any data within reserved keys.
You may use their data but do note that the values of them may change
or be removed at any point.
"""
if not session:
session = requests.Session()
return self._data
url = None
is_init_stream = False
@data.setter
def data(self, value: Union[dict, defaultdict]) -> None:
if not isinstance(value, (dict, defaultdict)):
raise TypeError(f"Expected data to be a {dict} or {defaultdict}, not {type(value)}")
if isinstance(value, dict):
value = defaultdict(dict, **value)
self._data = value
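A minimal sketch of stashing service-specific data; the "myservice" key and URL are illustrative, while "hls"/"dash" stay reserved as documented above:
track.data["myservice"]["manifest_url"] = "https://example.com/master.m3u8"
# nested writes are safe because the defaultdict factory is dict
segment_count = len(track.data["hls"].get("segment_durations", []))  # read-only use of a reserved key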
if self.descriptor == self.Descriptor.M3U:
master = m3u8.loads(session.get(self.url).text, uri=self.url)
for segment in master.segments:
if not segment.init_section:
continue
# skip any segment that would be skipped from the download
# as we cant consider these a true initial segment
if callable(self.OnSegmentFilter) and self.OnSegmentFilter(segment):
continue
url = ("" if re.match("^https?://", segment.init_section.uri) else segment.init_section.base_uri)
url += segment.init_section.uri
is_init_stream = True
break
def download(
self,
session: Session,
prepare_drm: partial,
max_workers: Optional[int] = None,
progress: Optional[partial] = None
):
"""Download and optionally Decrypt this Track."""
from devine.core.manifests import DASH, HLS
if not url:
url = self.url
if DOWNLOAD_LICENCE_ONLY.is_set():
progress(downloaded="[yellow]SKIPPING")
if isinstance(url, list):
url = url[0]
is_init_stream = True
if DOWNLOAD_CANCELLED.is_set():
progress(downloaded="[yellow]SKIPPED")
return
if is_init_stream:
return session.get(url).content
log = logging.getLogger("track")
# likely a full single-file download, get first 20k bytes
with session.get(url, stream=True) as s:
# assuming enough to contain the pssh/kid
for chunk in s.iter_content(20000):
# we only want the first chunk
return chunk
proxy = next(iter(session.proxies.values()), None)
def download(self, out: Path, name_template: str = "{type}_{id}", headers: Optional[dict] = None,
proxy: Optional[str] = None) -> Path:
"""
Download the Track and apply any necessary post-edits like Subtitle conversion.
Parameters:
out: Output Directory Path for the downloaded track.
name_template: Override the default filename template.
Must contain both `{type}` and `{id}` variables.
headers: Headers to use when downloading.
proxy: Proxy to use when downloading.
Returns:
Where the file was saved, as a Path object.
"""
if out.is_file():
raise ValueError("Path must be to a directory and not a file")
log = logging.getLogger("download")
out.mkdir(parents=True, exist_ok=True)
file_name = name_template.format(
type=self.__class__.__name__,
id=self.id
)
# we must use .mp4 on tracks:
# - as shaka-packager expects mp4 input and mp4 output
# - and mkvtoolnix would try to parse the file in raw-bitstream
save_path = (out / file_name).with_suffix(".mp4")
if self.__class__.__name__ == "Subtitle":
track_type = self.__class__.__name__
save_path = config.directories.temp / f"{track_type}_{self.id}.mp4"
if track_type == "Subtitle":
save_path = save_path.with_suffix(f".{self.codec.extension}")
# these would be files like .decrypted, .repack and such.
# we cannot trust that these files were not interrupted while writing to disc
# lets just delete them before re-attempting a download
for existing_file in save_path.parent.glob(f"{save_path.stem}.*{save_path.suffix}"):
existing_file.unlink()
save_path.with_suffix(".srt").unlink(missing_ok=True)
if self.descriptor != self.Descriptor.URL:
save_dir = save_path.with_name(save_path.name + "_segments")
else:
save_dir = save_path.parent
if self.descriptor == self.Descriptor.M3U:
master = m3u8.loads(
requests.get(
self.url,
headers=headers,
proxies={"all": proxy} if self.needs_proxy and proxy else None
).text,
uri=self.url
)
def cleanup():
# track file (e.g., "foo.mp4")
save_path.unlink(missing_ok=True)
# aria2c control file (e.g., "foo.mp4.aria2" or "foo.mp4.aria2__temp")
save_path.with_suffix(f"{save_path.suffix}.aria2").unlink(missing_ok=True)
save_path.with_suffix(f"{save_path.suffix}.aria2__temp").unlink(missing_ok=True)
if save_dir.exists() and save_dir.name.endswith("_segments"):
shutil.rmtree(save_dir)
if not master.segments:
raise ValueError("Track URI (an M3U8) has no segments...")
if not DOWNLOAD_LICENCE_ONLY.is_set():
if config.directories.temp.is_file():
raise ValueError(f"Temp Directory '{config.directories.temp}' must be a Directory, not a file")
if all(segment.uri == master.segments[0].uri for segment in master.segments):
# all segments use the same file, presumably an EXT-X-BYTERANGE M3U (FUNI)
# TODO: This might be a risky way to deal with these kinds of Playlists
# What if there's an init section, or one segment is reusing a byte-range
segment = master.segments[0]
if not re.match("^https?://", segment.uri):
segment.uri = urljoin(segment.base_uri, segment.uri)
self.url = segment.uri
self.descriptor = self.Descriptor.URL
else:
has_init = False
segments = []
for segment in master.segments:
# merge base uri with uri where needed in both normal and init segments
if not re.match("^https?://", segment.uri):
segment.uri = segment.base_uri + segment.uri
if segment.init_section and not re.match("^https?://", segment.init_section.uri):
segment.init_section.uri = segment.init_section.base_uri + segment.init_section.uri
config.directories.temp.mkdir(parents=True, exist_ok=True)
if segment.discontinuity:
has_init = False
# Delete any pre-existing temp files matching this track.
# We can't re-use or continue downloading these tracks as they do not use a
# lock file. Or at least the majority don't. Even if they did I've encountered
# corruptions caused by sudden interruptions to the lock file.
cleanup()
# skip segments we don't want to download (e.g., bumpers, dub cards)
if callable(self.OnSegmentFilter) and self.OnSegmentFilter(segment):
continue
if segment.init_section and not has_init:
segments.append(segment.init_section.uri)
has_init = True
segments.append(segment.uri)
self.url = list(dict.fromkeys(segments))
is_segmented = isinstance(self.url, list) and len(self.url) > 1
segments_dir = save_path.with_name(save_path.name + "_segments")
attempts = 1
while True:
try:
asyncio.run(aria2c(
self.url,
[save_path, segments_dir][is_segmented],
headers,
proxy if self.needs_proxy else None
))
break
except subprocess.CalledProcessError:
log.info(f" - Download attempt {attempts} failed, {['retrying', 'stopping'][attempts == 3]}...")
if attempts == 3:
raise
attempts += 1
if is_segmented:
# merge the segments together
with open(save_path, "wb") as f:
for file in sorted(segments_dir.iterdir()):
data = file.read_bytes()
# Apple TV+ needs this done to fix audio decryption
data = re.sub(b"(tfhd\x00\x02\x00\x1a\x00\x00\x00\x01\x00\x00\x00)\x02", b"\\g<1>\x01", data)
f.write(data)
file.unlink() # delete, we don't need it anymore
segments_dir.rmdir()
self.path = save_path
if self.path.stat().st_size <= 3: # Empty UTF-8 BOM == 3 bytes
raise IOError(
"Download failed, the downloaded file is empty. "
f"This {'was' if self.needs_proxy else 'was not'} downloaded with a proxy." +
(
" Perhaps you need to set `needs_proxy` as True to use the proxy for this track."
if not self.needs_proxy else ""
try:
if self.descriptor == self.Descriptor.HLS:
HLS.download_track(
track=self,
save_path=save_path,
save_dir=save_dir,
progress=progress,
session=session,
proxy=proxy,
max_workers=max_workers,
license_widevine=prepare_drm
)
)
elif self.descriptor == self.Descriptor.DASH:
DASH.download_track(
track=self,
save_path=save_path,
save_dir=save_dir,
progress=progress,
session=session,
proxy=proxy,
max_workers=max_workers,
license_widevine=prepare_drm
)
elif self.descriptor == self.Descriptor.URL:
try:
if not self.drm and track_type in ("Video", "Audio"):
# the service might not have explicitly defined the `drm` property
# try find widevine DRM information from the init data of URL
try:
self.drm = [Widevine.from_track(self, session)]
except Widevine.Exceptions.PSSHNotFound:
# it might not have Widevine DRM, or might not have found the PSSH
log.warning("No Widevine PSSH was found for this track, is it DRM free?")
return self.path
if self.drm:
track_kid = self.get_key_id(session=session)
drm = self.drm[0] # just use the first supported DRM system for now
if isinstance(drm, Widevine):
# license and grab content keys
if not prepare_drm:
raise ValueError("prepare_drm func must be supplied to use Widevine DRM")
progress(downloaded="LICENSING")
prepare_drm(drm, track_kid=track_kid)
progress(downloaded="[yellow]LICENSED")
else:
drm = None
if DOWNLOAD_LICENCE_ONLY.is_set():
progress(downloaded="[yellow]SKIPPED")
else:
for status_update in self.downloader(
urls=self.url,
output_dir=save_path.parent,
filename=save_path.name,
headers=session.headers,
cookies=session.cookies,
proxy=proxy,
max_workers=max_workers
):
file_downloaded = status_update.get("file_downloaded")
if not file_downloaded:
progress(**status_update)
# see https://github.com/devine-dl/devine/issues/71
save_path.with_suffix(f"{save_path.suffix}.aria2__temp").unlink(missing_ok=True)
self.path = save_path
events.emit(events.Types.TRACK_DOWNLOADED, track=self)
if drm:
progress(downloaded="Decrypting", completed=0, total=100)
drm.decrypt(save_path)
self.drm = None
events.emit(
events.Types.TRACK_DECRYPTED,
track=self,
drm=drm,
segment=None
)
progress(downloaded="Decrypted", completed=100)
if track_type == "Subtitle" and self.codec.name not in ("fVTT", "fTTML"):
track_data = self.path.read_bytes()
track_data = try_ensure_utf8(track_data)
track_data = track_data.decode("utf8"). \
replace("&lrm;", html.unescape("&lrm;")). \
replace("&rlm;", html.unescape("&rlm;")). \
encode("utf8")
self.path.write_bytes(track_data)
progress(downloaded="Downloaded")
except KeyboardInterrupt:
DOWNLOAD_CANCELLED.set()
progress(downloaded="[yellow]CANCELLED")
raise
except Exception:
DOWNLOAD_CANCELLED.set()
progress(downloaded="[red]FAILED")
raise
except (Exception, KeyboardInterrupt):
if not DOWNLOAD_LICENCE_ONLY.is_set():
cleanup()
raise
if DOWNLOAD_CANCELLED.is_set():
# we stopped during the download, let's exit
return
if not DOWNLOAD_LICENCE_ONLY.is_set():
if self.path.stat().st_size <= 3: # Empty UTF-8 BOM == 3 bytes
raise IOError("Download failed, the downloaded file is empty.")
events.emit(events.Types.TRACK_DOWNLOADED, track=self)
def delete(self) -> None:
if self.path:
self.path.unlink()
self.path = None
def move(self, target: Union[Path, str]) -> Path:
"""
Move the Track's file from current location, to target location.
This will overwrite anything at the target path.
Raises:
TypeError: If the target argument is not the expected type.
ValueError: If track has no file to move, or the target does not exist.
OSError: If the file somehow failed to move.
Returns the new location of the track.
"""
if not isinstance(target, (str, Path)):
raise TypeError(f"Expected {target} to be a {Path} or {str}, not {type(target)}")
if not self.path:
raise ValueError("Track has no file to move")
if not isinstance(target, Path):
target = Path(target)
if not target.exists():
raise ValueError(f"Target file {repr(target)} does not exist")
moved_to = Path(shutil.move(self.path, target))
if moved_to.resolve() != target.resolve():
raise OSError(f"Failed to move {self.path} to {target}")
self.path = target
return target
def get_track_name(self) -> Optional[str]:
"""Get the Track Name."""
return self.name
def get_key_id(self, init_data: Optional[bytes] = None, *args, **kwargs) -> Optional[UUID]:
"""
Probe the DRM encryption Key ID (KID) for this specific track.
It currently supports finding the Key ID by probing the track's stream
with ffprobe for `enc_key_id` data, as well as for mp4 `tenc` (Track
Encryption) boxes.
It explicitly ignores PSSH information like the `PSSH` box, as the box
is likely to contain multiple Key IDs that may or may not be for this
specific track.
To retrieve the initialization segment, this method calls :meth:`get_init_segment`
with the positional and keyword arguments. The return value of `get_init_segment`
is then used to determine the Key ID.
Returns:
The Key ID as a UUID object, or None if the Key ID could not be determined.
"""
if not init_data:
init_data = self.get_init_segment(*args, **kwargs)
if not isinstance(init_data, bytes):
raise TypeError(f"Expected init_data to be bytes, not {init_data!r}")
probe = ffprobe(init_data)
if probe:
for stream in probe.get("streams") or []:
enc_key_id = stream.get("tags", {}).get("enc_key_id")
if enc_key_id:
return UUID(bytes=base64.b64decode(enc_key_id))
for tenc in get_boxes(init_data, b"tenc"):
if tenc.key_ID.int != 0:
return tenc.key_ID
for uuid_box in get_boxes(init_data, b"uuid"):
if uuid_box.extended_type == UUID("8974dbce-7be7-4c51-84f9-7148f9882554"): # tenc
tenc = uuid_box.data
if tenc.key_ID.int != 0:
return tenc.key_ID
def get_init_segment(
self,
maximum_size: int = 20000,
url: Optional[str] = None,
byte_range: Optional[str] = None,
session: Optional[Session] = None
) -> bytes:
"""
Get the Track's Initial Segment Data Stream.
HLS and DASH tracks must explicitly provide a URL to the init segment or file.
Providing the byte-range for the init segment is recommended where possible.
If `byte_range` is not set, it will make a HEAD request and check the size of
the file. If the size could not be determined, it will download up to the first
20KB only, which should contain the entirety of the init segment. You may
override this by changing the `maximum_size`.
The default maximum_size of 20000 (20KB) is a tried-and-tested value that
seems to work well across the board.
Parameters:
maximum_size: Size to assume as the content length if byte-range is not
used, the content size could not be determined, or the content size
is larger than it. A value of 20000 (20KB) or higher is recommended.
url: Explicit init map or file URL to probe from.
byte_range: Range of bytes to download from the explicit or implicit URL.
session: Session context, e.g., authorization and headers.
"""
if not isinstance(maximum_size, int):
raise TypeError(f"Expected maximum_size to be an {int}, not {type(maximum_size)}")
if not isinstance(url, (str, type(None))):
raise TypeError(f"Expected url to be a {str}, not {type(url)}")
if not isinstance(byte_range, (str, type(None))):
raise TypeError(f"Expected byte_range to be a {str}, not {type(byte_range)}")
if not isinstance(session, (Session, type(None))):
raise TypeError(f"Expected session to be a {Session}, not {type(session)}")
if not url:
if self.descriptor != self.Descriptor.URL:
raise ValueError(f"An explicit URL must be provided for {self.descriptor.name} tracks")
if not self.url:
raise ValueError("An explicit URL must be provided as the track has no URL")
url = self.url
if not session:
session = Session()
content_length = maximum_size
if byte_range:
if not isinstance(byte_range, str):
raise TypeError(f"Expected byte_range to be a str, not {byte_range!r}")
if not re.match(r"^\d+-\d+$", byte_range):
raise ValueError(f"The value of byte_range is unrecognized: '{byte_range}'")
start, end = byte_range.split("-")
if int(start) > int(end):
raise ValueError(f"The start range cannot be greater than the end range: {start}>{end}")
else:
size_test = session.head(url)
if "Content-Length" in size_test.headers:
content_length_header = int(size_test.headers["Content-Length"])
if content_length_header > 0:
content_length = min(content_length_header, maximum_size)
range_test = session.head(url, headers={"Range": "bytes=0-1"})
if range_test.status_code == 206:
byte_range = f"0-{content_length-1}"
if byte_range:
res = session.get(
url=url,
headers={
"Range": f"bytes={byte_range}"
}
)
res.raise_for_status()
init_data = res.content
else:
init_data = None
with session.get(url, stream=True) as s:
for chunk in s.iter_content(content_length):
init_data = chunk
break
if not init_data:
raise ValueError(f"Failed to read {content_length} bytes from the track URI.")
return init_data
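A hedged usage sketch; the init-map URL and byte range are illustrative, and for HLS/DASH tracks the URL must be passed explicitly as noted above:
init_data = track.get_init_segment(
    url="https://example.com/video/init.mp4",
    byte_range="0-1023",  # recommended whenever the manifest provides it
    session=session
)
track_kid = track.get_key_id(init_data=init_data)  # None if no KID could be probed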
def repackage(self) -> None:
if not self.path or not self.path.exists():
raise ValueError("Cannot repackage a Track that has not been downloaded.")
executable = get_binary_path("ffmpeg")
if not executable:
if not binaries.FFMPEG:
raise EnvironmentError("FFmpeg executable \"ffmpeg\" was not found but is required for this call.")
repacked_path = self.path.with_suffix(f".repack{self.path.suffix}")
original_path = self.path
output_path = original_path.with_stem(f"{original_path.stem}_repack")
def _ffmpeg(extra_args: list[str] = None):
subprocess.run(
[
executable, "-hide_banner",
binaries.FFMPEG, "-hide_banner",
"-loglevel", "error",
"-i", self.path,
"-i", original_path,
*(extra_args or []),
# Following are very important!
"-map_metadata", "-1", # don't transfer metadata to output file
"-fflags", "bitexact", # only have minimal tag data, reproducible mux
"-codec", "copy",
str(repacked_path)
str(output_path)
],
check=True,
stdout=subprocess.PIPE,
@ -302,34 +542,8 @@ class Track:
else:
raise
self.swap(repacked_path)
def move(self, target: Union[str, Path]) -> bool:
"""
Move the Track's file from current location, to target location.
This will overwrite anything at the target path.
"""
if not self.path:
return False
target = Path(target)
ok = self.path.rename(target).resolve() == target.resolve()
if ok:
self.path = target
return ok
def swap(self, target: Union[str, Path]) -> bool:
"""
Swaps the Track's file with the Target file. The current Track's file is deleted.
Returns False if the Track is not yet downloaded, or the target path does not exist.
"""
target = Path(target)
if not target.exists() or not self.path:
return False
self.path.unlink()
ok = target.rename(self.path) == self.path
if not ok:
return False
return self.move(target)
original_path.unlink()
self.path = output_path
__ALL__ = (Track,)
__all__ = ("Track",)

View File

@ -2,26 +2,32 @@ from __future__ import annotations
import logging
import subprocess
from functools import partial
from pathlib import Path
from typing import Callable, Iterator, Optional, Sequence, Union
from Cryptodome.Random import get_random_bytes
from langcodes import Language, closest_supported_match
from rich.progress import BarColumn, Progress, SpinnerColumn, TextColumn, TimeRemainingColumn
from rich.table import Table
from rich.tree import Tree
from devine.core.config import config
from devine.core.constants import LANGUAGE_MAX_DISTANCE, LANGUAGE_MUX_MAP, AnyTrack, TrackT
from devine.core.console import console
from devine.core.constants import LANGUAGE_MAX_DISTANCE, AnyTrack, TrackT
from devine.core.events import events
from devine.core.tracks.attachment import Attachment
from devine.core.tracks.audio import Audio
from devine.core.tracks.track import Track
from devine.core.tracks.chapter import Chapter
from devine.core.tracks.chapters import Chapter, Chapters
from devine.core.tracks.subtitle import Subtitle
from devine.core.tracks.track import Track
from devine.core.tracks.video import Video
from devine.core.utilities import sanitize_filename, is_close_match
from devine.core.utilities import is_close_match, sanitize_filename
from devine.core.utils.collections import as_list, flatten
class Tracks:
"""
Video, Audio, Subtitle, and Chapter Track Store.
Video, Audio, Subtitle, Chapter, and Attachment Track Store.
It provides convenience functions for listing, sorting, and selecting tracks.
"""
@ -29,14 +35,23 @@ class Tracks:
Video: 0,
Audio: 1,
Subtitle: 2,
Chapter: 3
Chapter: 3,
Attachment: 4
}
def __init__(self, *args: Union[Tracks, list[Track], Track]):
def __init__(self, *args: Union[
Tracks,
Sequence[Union[AnyTrack, Chapter, Chapters, Attachment]],
Track,
Chapter,
Chapters,
Attachment
]):
self.videos: list[Video] = []
self.audio: list[Audio] = []
self.subtitles: list[Subtitle] = []
self.chapters: list[Chapter] = []
self.chapters = Chapters()
self.attachments: list[Attachment] = []
if args:
self.add(args)
@ -47,6 +62,20 @@ class Tracks:
def __len__(self) -> int:
return len(self.videos) + len(self.audio) + len(self.subtitles)
def __add__(
self,
other: Union[
Tracks,
Sequence[Union[AnyTrack, Chapter, Chapters, Attachment]],
Track,
Chapter,
Chapters,
Attachment
]
) -> Tracks:
self.add(other)
return self
def __repr__(self) -> str:
return "{name}({items})".format(
name=self.__class__.__name__,
@ -58,7 +87,8 @@ class Tracks:
Video: [],
Audio: [],
Subtitle: [],
Chapter: []
Chapter: [],
Attachment: []
}
tracks = [*list(self), *self.chapters]
@ -86,6 +116,42 @@ class Tracks:
return rep
def tree(self, add_progress: bool = False) -> tuple[Tree, list[partial]]:
all_tracks = [*list(self), *self.chapters, *self.attachments]
progress_callables = []
tree = Tree("", hide_root=True)
for track_type in self.TRACK_ORDER_MAP:
tracks = list(x for x in all_tracks if isinstance(x, track_type))
if not tracks:
continue
num_tracks = len(tracks)
track_type_plural = track_type.__name__ + ("s" if track_type != Audio and num_tracks != 1 else "")
tracks_tree = tree.add(f"[repr.number]{num_tracks}[/] {track_type_plural}")
for track in tracks:
if add_progress and track_type not in (Chapter, Attachment):
progress = Progress(
SpinnerColumn(finished_text=""),
BarColumn(),
"",
TimeRemainingColumn(compact=True, elapsed_when_finished=True),
"",
TextColumn("[progress.data.speed]{task.fields[downloaded]}"),
console=console,
speed_estimate_period=10
)
task = progress.add_task("", downloaded="-")
progress_callables.append(partial(progress.update, task_id=task))
track_table = Table.grid()
track_table.add_row(str(track)[6:], style="text2")
track_table.add_row(progress)
tracks_tree.add(track_table)
else:
tracks_tree.add(str(track)[6:], style="text2")
return tree, progress_callables
def exists(self, by_id: Optional[str] = None, by_url: Optional[Union[str, list[str]]] = None) -> bool:
"""Check if a track already exists by various methods."""
if by_id: # recommended
@ -96,12 +162,19 @@ class Tracks:
def add(
self,
tracks: Union[Tracks, Sequence[Union[AnyTrack, Chapter]], Track, Chapter],
tracks: Union[
Tracks,
Sequence[Union[AnyTrack, Chapter, Chapters, Attachment]],
Track,
Chapter,
Chapters,
Attachment
],
warn_only: bool = False
) -> None:
"""Add a provided track to its appropriate array, ensuring it's not a duplicate."""
if isinstance(tracks, Tracks):
tracks = [*list(tracks), *tracks.chapters]
tracks = [*list(tracks), *tracks.chapters, *tracks.attachments]
duplicates = 0
for track in flatten(tracks):
@ -125,7 +198,9 @@ class Tracks:
elif isinstance(track, Subtitle):
self.subtitles.append(track)
elif isinstance(track, Chapter):
self.chapters.append(track)
self.chapters.add(track)
elif isinstance(track, Attachment):
self.attachments.append(track)
else:
raise ValueError("Track type was not set or is invalid.")
@ -134,12 +209,6 @@ class Tracks:
if duplicates:
log.warning(f" - Found and skipped {duplicates} duplicate tracks...")
def print(self, level: int = logging.INFO) -> None:
"""Print the __str__ to log at a specified level."""
log = logging.getLogger("Tracks")
for line in str(self).splitlines(keepends=False):
log.log(level, line)
def sort_videos(self, by_language: Optional[Sequence[Union[str, Language]]] = None) -> None:
"""Sort video tracks by bitrate, and optionally language."""
if not self.videos:
@ -208,13 +277,6 @@ class Tracks:
continue
self.subtitles.sort(key=lambda x: is_close_match(language, [x.language]), reverse=True)
def sort_chapters(self) -> None:
"""Sort chapter tracks by chapter number."""
if not self.chapters:
return
# number
self.chapters.sort(key=lambda x: x.number)
def select_video(self, x: Callable[[Video], bool]) -> None:
self.videos = list(filter(x, self.videos))
@ -224,42 +286,46 @@ class Tracks:
def select_subtitles(self, x: Callable[[Subtitle], bool]) -> None:
self.subtitles = list(filter(x, self.subtitles))
def with_resolution(self, resolution: int) -> None:
if resolution:
# Note: Do not merge these list comprehensions. They must be done separately so the results
# from the 16:9 canvas check is only used if there's no exact height resolution match.
videos_quality = [x for x in self.videos if x.height == resolution]
if not videos_quality:
videos_quality = [x for x in self.videos if int(x.width * (9 / 16)) == resolution]
self.videos = videos_quality
def export_chapters(self, to_file: Optional[Union[Path, str]] = None) -> str:
"""Export all chapters in order to a string or file."""
self.sort_chapters()
data = "\n".join(map(repr, self.chapters))
if to_file:
to_file = Path(to_file)
to_file.parent.mkdir(parents=True, exist_ok=True)
to_file.write_text(data, encoding="utf8")
return data
def by_resolutions(self, resolutions: list[int], per_resolution: int = 0) -> None:
# Note: Do not merge these list comprehensions. They must be done separately so the results
# from the 16:9 canvas check is only used if there's no exact height resolution match.
selected = []
for resolution in resolutions:
matches = [ # exact matches
x
for x in self.videos
if x.height == resolution
]
if not matches:
matches = [ # 16:9 canvas matches
x
for x in self.videos
if int(x.width * (9 / 16)) == resolution
]
selected.extend(matches[:per_resolution or None])
self.videos = selected
@staticmethod
def select_per_language(tracks: list[TrackT], languages: list[str]) -> list[TrackT]:
"""
Enumerate and return the first Track per language.
You should sort the list so the wanted track is closer to the start of the list.
"""
tracks_ = []
def by_language(tracks: list[TrackT], languages: list[str], per_language: int = 0) -> list[TrackT]:
selected = []
for language in languages:
match = closest_supported_match(language, [str(x.language) for x in tracks], LANGUAGE_MAX_DISTANCE)
if match:
tracks_.append(next(x for x in tracks if str(x.language) == match))
return tracks_
selected.extend([
x
for x in tracks
if closest_supported_match(x.language, [language], LANGUAGE_MAX_DISTANCE)
][:per_language or None])
return selected
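A short selection sketch, assuming a populated Tracks instance named tracks:
tracks.by_resolutions([2160, 1080], per_resolution=1)  # keep the first match per listed resolution
tracks.audio = Tracks.by_language(tracks.audio, ["en", "ja"], per_language=1)
tracks.subtitles = Tracks.by_language(tracks.subtitles, ["en"])  # 0/unset keeps every match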
def mux(self, title: str, delete: bool = True) -> tuple[Path, int]:
def mux(self, title: str, delete: bool = True, progress: Optional[partial] = None) -> tuple[Path, int, list[str]]:
"""
Takes the Video, Audio and Subtitle Tracks, and muxes them into an MKV file.
It will attempt to detect Forced/Default tracks, and will try to parse the language codes of the Tracks
Multiplex all the Tracks into a Matroska Container file.
Parameters:
title: Set the Matroska Container file title. Usually displayed in players
instead of the filename if set.
delete: Delete all track files after multiplexing.
progress: Update a rich progress bar via `completed=...`. This must be the
progress object's update() func, pre-set with task id via functools.partial.
"""
cl = [
"mkvmerge",
@ -272,10 +338,9 @@ class Tracks:
for i, vt in enumerate(self.videos):
if not vt.path or not vt.path.exists():
raise ValueError("Video Track must be downloaded before muxing...")
events.emit(events.Types.TRACK_MULTIPLEX, track=vt)
cl.extend([
"--language", "0:{}".format(LANGUAGE_MUX_MAP.get(
str(vt.language), str(vt.language)
)),
"--language", f"0:{vt.language}",
"--default-track", f"0:{i == 0}",
"--original-flag", f"0:{vt.is_original_lang}",
"--compression", "0:none", # disable extra compression
@ -285,11 +350,10 @@ class Tracks:
for i, at in enumerate(self.audio):
if not at.path or not at.path.exists():
raise ValueError("Audio Track must be downloaded before muxing...")
events.emit(events.Types.TRACK_MULTIPLEX, track=at)
cl.extend([
"--track-name", f"0:{at.get_track_name() or ''}",
"--language", "0:{}".format(LANGUAGE_MUX_MAP.get(
str(at.language), str(at.language)
)),
"--language", f"0:{at.language}",
"--default-track", f"0:{i == 0}",
"--visual-impaired-flag", f"0:{at.descriptive}",
"--original-flag", f"0:{at.is_original_lang}",
@ -300,12 +364,11 @@ class Tracks:
for st in self.subtitles:
if not st.path or not st.path.exists():
raise ValueError("Text Track must be downloaded before muxing...")
events.emit(events.Types.TRACK_MULTIPLEX, track=st)
default = bool(self.audio and is_close_match(st.language, [self.audio[0].language]) and st.forced)
cl.extend([
"--track-name", f"0:{st.get_track_name() or ''}",
"--language", "0:{}".format(LANGUAGE_MUX_MAP.get(
str(st.language), str(st.language)
)),
"--language", f"0:{st.language}",
"--sub-charset", "0:UTF-8",
"--forced-track", f"0:{st.forced}",
"--default-track", f"0:{default}",
@ -318,13 +381,23 @@ class Tracks:
if self.chapters:
chapters_path = config.directories.temp / config.filenames.chapters.format(
title=sanitize_filename(title),
random=get_random_bytes(16).hex()
random=self.chapters.id
)
self.export_chapters(chapters_path)
cl.extend(["--chapters", str(chapters_path)])
self.chapters.dump(chapters_path, fallback_name=config.chapter_fallback_name)
cl.extend(["--chapter-charset", "UTF-8", "--chapters", str(chapters_path)])
else:
chapters_path = None
for attachment in self.attachments:
if not attachment.path or not attachment.path.exists():
raise ValueError("Attachment File was not found...")
cl.extend([
"--attachment-description", attachment.description or "",
"--attachment-mime-type", attachment.mime_type,
"--attachment-name", attachment.name,
"--attach-file", str(attachment.path.resolve())
])
output_path = (
self.videos[0].path.with_suffix(".muxed.mkv") if self.videos else
self.audio[0].path.with_suffix(".muxed.mka") if self.audio else
@ -337,11 +410,18 @@ class Tracks:
# let potential failures go to caller, caller should handle
try:
p = subprocess.run([
errors = []
p = subprocess.Popen([
*cl,
"--output", str(output_path)
])
return output_path, p.returncode
"--output", str(output_path),
"--gui-mode"
], text=True, stdout=subprocess.PIPE)
for line in iter(p.stdout.readline, ""):
if line.startswith("#GUI#error") or line.startswith("#GUI#warning"):
errors.append(line)
if "progress" in line:
progress(total=100, completed=int(line.strip()[14:-1]))
return output_path, p.wait(), errors
finally:
if chapters_path:
# regardless of delete param, we delete as it's a file we made during muxing
@ -351,4 +431,4 @@ class Tracks:
track.delete()
__ALL__ = (Tracks,)
__all__ = ("Tracks",)

View File

@ -10,10 +10,11 @@ from typing import Any, Optional, Union
from langcodes import Language
from devine.core import binaries
from devine.core.config import config
from devine.core.tracks.track import Track
from devine.core.tracks.subtitle import Subtitle
from devine.core.utilities import get_binary_path, get_boxes, FPS
from devine.core.tracks.track import Track
from devine.core.utilities import FPS, get_boxes
class Video(Track):
@ -67,7 +68,7 @@ class Video(Track):
@staticmethod
def from_netflix_profile(profile: str) -> Video.Codec:
profile = profile.lower().strip()
if profile.startswith("playready-h264"):
if profile.startswith(("h264", "playready-h264")):
return Video.Codec.AVC
if profile.startswith("hevc"):
return Video.Codec.HEVC
@ -88,32 +89,40 @@ class Video(Track):
def from_cicp(primaries: int, transfer: int, matrix: int) -> Video.Range:
"""
ISO/IEC 23001-8 Coding-independent code points to Video Range.
Sources for Code points:
https://www.itu.int/rec/dologin_pub.asp?lang=e&id=T-REC-H.Sup19-201903-S!!PDF-E&type=items
Sources:
https://www.itu.int/rec/T-REC-H.Sup19-202104-I
"""
class Primaries(Enum):
Unspecified = 0
BT_709 = 1
BT_601_625 = 5
BT_601_525 = 6
BT_2020 = 9 # BT.2100 shares the same CP
BT_2020_and_2100 = 9
SMPTE_ST_2113_and_EG_4321 = 12 # P3D65
class Transfer(Enum):
Unspecified = 0
SDR_BT_709 = 1
SDR_BT_601_625 = 5
SDR_BT_601_525 = 6
SDR_BT_2020 = 14
SDR_BT_2100 = 15
PQ = 16
HLG = 18
BT_709 = 1
BT_601 = 6
BT_2020 = 14
BT_2100 = 15
BT_2100_PQ = 16
BT_2100_HLG = 18
class Matrix(Enum):
RGB = 0
YCbCr_BT_709 = 1
YCbCr_BT_601_625 = 5
YCbCr_BT_601_525 = 6
YCbCr_BT_2020 = 9 # YCbCr BT.2100 shares the same CP
YCbCr_BT_2020_and_2100 = 9 # YCbCr BT.2100 shares the same CP
ICtCp_BT_2100 = 14
if transfer == 5:
# While not part of any standard, it is typically used as a PAL variant of Transfer.BT_601=6.
# i.e. where Transfer 6 would be for BT.601-NTSC and Transfer 5 would be for BT.601-PAL.
# The codebase is currently agnostic to either, so a manual conversion to 6 is done.
transfer = 6
primaries = Primaries(primaries)
transfer = Transfer(transfer)
@ -123,21 +132,21 @@ class Video(Track):
if (primaries, transfer, matrix) == (0, 0, 0):
return Video.Range.SDR
if primaries in (Primaries.BT_601_525, Primaries.BT_601_625):
elif primaries in (Primaries.BT_601_625, Primaries.BT_601_525):
return Video.Range.SDR
if transfer == Transfer.PQ:
elif transfer == Transfer.BT_2100_PQ:
return Video.Range.HDR10
elif transfer == Transfer.HLG:
elif transfer == Transfer.BT_2100_HLG:
return Video.Range.HLG
else:
return Video.Range.SDR
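As a quick sanity check of the CICP mapping above, a hedged example (assuming devine is installed and the usual Video.Range.from_cicp call path):

from devine.core.tracks import Video  # assumed import path

# (primaries, transfer, matrix) code points -> Video.Range
assert Video.Range.from_cicp(0, 0, 0) == Video.Range.SDR     # unspecified
assert Video.Range.from_cicp(1, 1, 1) == Video.Range.SDR     # BT.709
assert Video.Range.from_cicp(9, 16, 9) == Video.Range.HDR10  # BT.2020/2100 primaries, PQ transfer
assert Video.Range.from_cicp(9, 18, 9) == Video.Range.HLG    # BT.2020/2100 primaries, HLG transfer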
@staticmethod
def from_m3u_range_tag(tag: str) -> Video.Range:
def from_m3u_range_tag(tag: str) -> Optional[Video.Range]:
tag = (tag or "").upper().replace('"', '').strip()
if not tag or tag == "SDR":
if not tag:
return None
if tag == "SDR":
return Video.Range.SDR
elif tag == "PQ":
return Video.Range.HDR10 # technically could be any PQ-transfer range
@ -146,35 +155,110 @@ class Video(Track):
# for some reason there's no Dolby Vision info tag
raise ValueError(f"The M3U Range Tag '{tag}' is not a supported Video Range")
def __init__(self, *args: Any, codec: Video.Codec, range_: Video.Range, bitrate: Union[str, int, float],
width: int, height: int, fps: Optional[Union[str, int, float]] = None, **kwargs: Any) -> None:
def __init__(
self,
*args: Any,
codec: Optional[Video.Codec] = None,
range_: Optional[Video.Range] = None,
bitrate: Optional[Union[str, int, float]] = None,
width: Optional[int] = None,
height: Optional[int] = None,
fps: Optional[Union[str, int, float]] = None,
**kwargs: Any
) -> None:
"""
Create a new Video track object.
Parameters:
codec: A Video.Codec enum representing the video codec.
If not specified, MediaInfo will be used to retrieve the codec
once the track has been downloaded.
range_: A Video.Range enum representing the video color range.
Defaults to SDR if not specified.
bitrate: A number or float representing the average bandwidth in bytes/s.
Float values are rounded up to the nearest integer.
width: The horizontal resolution of the video.
height: The vertical resolution of the video.
fps: A number, float, or string representing the frames/s of the video.
Strings may represent numbers, floats, or a fraction (num/den).
All strings will be cast to either a number or float.
Note: If codec, bitrate, width, height, or fps is not specified some checks
may be skipped or assume a value. Specifying as much information as possible
is highly recommended.
"""
super().__init__(*args, **kwargs)
# required
if not isinstance(codec, (Video.Codec, type(None))):
raise TypeError(f"Expected codec to be a {Video.Codec}, not {codec!r}")
if not isinstance(range_, (Video.Range, type(None))):
raise TypeError(f"Expected range_ to be a {Video.Range}, not {range_!r}")
if not isinstance(bitrate, (str, int, float, type(None))):
raise TypeError(f"Expected bitrate to be a {str}, {int}, or {float}, not {bitrate!r}")
if not isinstance(width, (int, str, type(None))):
raise TypeError(f"Expected width to be a {int}, not {width!r}")
if not isinstance(height, (int, str, type(None))):
raise TypeError(f"Expected height to be a {int}, not {height!r}")
if not isinstance(fps, (str, int, float, type(None))):
raise TypeError(f"Expected fps to be a {str}, {int}, or {float}, not {fps!r}")
self.codec = codec
self.range = range_ or Video.Range.SDR
self.bitrate = int(math.ceil(float(bitrate))) if bitrate else None
self.width = int(width)
self.height = int(height)
# optional
self.fps = FPS.parse(str(fps)) if fps else None
try:
self.bitrate = int(math.ceil(float(bitrate))) if bitrate else None
except (ValueError, TypeError) as e:
raise ValueError(f"Expected bitrate to be a number or float, {e}")
try:
self.width = int(width or 0) or None
except ValueError as e:
raise ValueError(f"Expected width to be a number, not {width!r}, {e}")
try:
self.height = int(height or 0) or None
except ValueError as e:
raise ValueError(f"Expected height to be a number, not {height!r}, {e}")
try:
self.fps = (FPS.parse(str(fps)) or None) if fps else None
except Exception as e:
raise ValueError(
"Expected fps to be a number, float, or a string as numerator/denominator form, " +
str(e)
)
def __str__(self) -> str:
fps = f"{self.fps:.3f}" if self.fps else "Unknown"
return " | ".join(filter(bool, [
"VID",
f"[{self.codec.value}, {self.range.name}]",
"[" + (", ".join(filter(bool, [
self.codec.value if self.codec else None,
self.range.name
]))) + "]",
str(self.language),
f"{self.width}x{self.height} @ {self.bitrate // 1000 if self.bitrate else '?'} kb/s, {fps} FPS",
", ".join(filter(bool, [
" @ ".join(filter(bool, [
f"{self.width}x{self.height}" if self.width and self.height else None,
f"{self.bitrate // 1000} kb/s" if self.bitrate else None
])),
f"{self.fps:.3f} FPS" if self.fps else None
])),
self.edition
]))
def change_color_range(self, range_: int) -> None:
"""Change the Video's Color Range to Limited (0) or Full (1)."""
if not self.path or not self.path.exists():
raise ValueError("Cannot repackage a Track that has not been downloaded.")
raise ValueError("Cannot change the color range flag on a Video that has not been downloaded.")
if not self.codec:
raise ValueError("Cannot change the color range flag on a Video that has no codec specified.")
if self.codec not in (Video.Codec.AVC, Video.Codec.HEVC):
raise NotImplementedError(
"Cannot change the color range flag on this Video as "
f"it's codec, {self.codec.value}, is not yet supported."
)
executable = get_binary_path("ffmpeg")
if not executable:
if not binaries.FFMPEG:
raise EnvironmentError("FFmpeg executable \"ffmpeg\" was not found but is required for this call.")
filter_key = {
@ -182,17 +266,20 @@ class Video(Track):
Video.Codec.HEVC: "hevc_metadata"
}[self.codec]
changed_path = self.path.with_suffix(f".range{range_}{self.path.suffix}")
original_path = self.path
output_path = original_path.with_stem(f"{original_path.stem}_{['limited', 'full'][range_]}_range")
subprocess.run([
executable, "-hide_banner",
binaries.FFMPEG, "-hide_banner",
"-loglevel", "panic",
"-i", self.path,
"-i", original_path,
"-codec", "copy",
"-bsf:v", f"{filter_key}=video_full_range_flag={range_}",
str(changed_path)
str(output_path)
], check=True)
self.swap(changed_path)
self.path = output_path
original_path.unlink()
def ccextractor(
self, track_id: Any, out_path: Union[Path, str], language: Language, original: bool = False
@ -201,29 +288,29 @@ class Video(Track):
if not self.path:
raise ValueError("You must download the track first.")
executable = get_binary_path("ccextractor", "ccextractorwin", "ccextractorwinfull")
if not executable:
if not binaries.CCExtractor:
raise EnvironmentError("ccextractor executable was not found.")
# ccextractor often fails in weird ways unless we repack
self.repackage()
out_path = Path(out_path)
try:
subprocess.run([
executable,
"-trim", "-noru", "-ru1",
self.path, "-o", out_path
], check=True)
binaries.CCExtractor,
"-trim",
"-nobom",
"-noru", "-ru1",
"-o", out_path,
self.path
], check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
except subprocess.CalledProcessError as e:
out_path.unlink(missing_ok=True)
if not e.returncode == 10: # No captions found
raise
if out_path.exists():
if out_path.stat().st_size <= 3:
# An empty UTF-8 file with BOM is 3 bytes.
# If the subtitle file is empty, mkvmerge will fail to mux.
out_path.unlink()
return None
cc_track = Subtitle(
id_=track_id,
url="", # doesn't need to be downloaded
@ -292,8 +379,7 @@ class Video(Track):
if not self.path or not self.path.exists():
raise ValueError("Cannot clean a Track that has not been downloaded.")
executable = get_binary_path("ffmpeg")
if not executable:
if not binaries.FFMPEG:
raise EnvironmentError("FFmpeg executable \"ffmpeg\" was not found but is required for this call.")
log = logging.getLogger("x264-clean")
@ -311,11 +397,12 @@ class Video(Track):
i = file.index(b"x264")
encoding_settings = file[i: i + file[i:].index(b"\x00")].replace(b":", br"\\:").replace(b",", br"\,").decode()
cleaned_path = self.path.with_suffix(f".cleaned{self.path.suffix}")
original_path = self.path
cleaned_path = original_path.with_suffix(f".cleaned{original_path.suffix}")
subprocess.run([
executable, "-hide_banner",
binaries.FFMPEG, "-hide_banner",
"-loglevel", "panic",
"-i", self.path,
"-i", original_path,
"-map_metadata", "-1",
"-fflags", "bitexact",
"-bsf:v", f"filter_units=remove_types=6,h264_metadata=sei_user_data={uuid}+{encoding_settings}",
@ -325,9 +412,10 @@ class Video(Track):
log.info(" + Removed")
self.swap(cleaned_path)
self.path = cleaned_path
original_path.unlink()
return True
__ALL__ = (Video,)
__all__ = ("Video",)

@ -1,18 +1,22 @@
import ast
import contextlib
import importlib.util
import os
import re
import shutil
import socket
import sys
from urllib.parse import urlparse
import pproxy
import requests
import time
import unicodedata
from collections import defaultdict
from datetime import datetime
from pathlib import Path
from types import ModuleType
from typing import Optional, Union, Sequence, AsyncIterator
from typing import Optional, Sequence, Union
from urllib.parse import ParseResult, urlparse
import chardet
import requests
from construct import ValidationError
from langcodes import Language, closest_match
from pymp4.parser import Box
from unidecode import unidecode
@ -21,6 +25,37 @@ from devine.core.config import config
from devine.core.constants import LANGUAGE_MAX_DISTANCE
def rotate_log_file(log_path: Path, keep: int = 20) -> Path:
"""
Update Log Filename and delete old log files.
It keeps only the 20 newest logs by default.
"""
if not log_path:
raise ValueError("A log path must be provided")
try:
log_path.relative_to(Path("")) # file name only
except ValueError:
pass
else:
log_path = config.directories.logs / log_path
log_path = log_path.parent / log_path.name.format_map(defaultdict(
str,
name="root",
time=datetime.now().strftime("%Y%m%d-%H%M%S")
))
if log_path.parent.exists():
log_files = [x for x in log_path.parent.iterdir() if x.suffix == log_path.suffix]
for log_file in log_files[::-1][keep-1:]:
# keep n newest files and delete the rest
log_file.unlink()
log_path.parent.mkdir(parents=True, exist_ok=True)
return log_path
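A hedged illustration of the template expansion rotate_log_file performs (the file name and directory below are hypothetical; the real location comes from config.directories.logs):

from collections import defaultdict
from datetime import datetime
from pathlib import Path

log_path = Path("logs/devine_{name}_{time}.log")  # hypothetical template path
log_path = log_path.parent / log_path.name.format_map(defaultdict(
    str,
    name="root",
    time=datetime.now().strftime("%Y%m%d-%H%M%S"),
))
print(log_path)  # e.g. logs/devine_root_20240517-013000.log
# rotate_log_file() would additionally delete all but the `keep` newest files with that suffix in the directory.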
def import_module_by_path(path: Path) -> ModuleType:
"""Import a Python file by Path as a Module."""
if not path:
@ -51,15 +86,6 @@ def import_module_by_path(path: Path) -> ModuleType:
return module
def get_binary_path(*names: str) -> Optional[Path]:
"""Find the path of the first found binary name."""
for name in names:
path = shutil.which(name)
if path:
return Path(path)
return None
def sanitize_filename(filename: str, spacer: str = ".") -> str:
"""
Sanitize a string to be filename safe.
@ -75,8 +101,8 @@ def sanitize_filename(filename: str, spacer: str = ".") -> str:
filename = filename.\
replace("/", " & ").\
replace(";", " & ") # e.g. multi-episode filenames
filename = re.sub(rf"[:; ]", spacer, filename) # structural chars to (spacer)
filename = re.sub(r"[\\*!?¿,'\"()<>|$#]", "", filename) # not filename safe chars
filename = re.sub(r"[:; ]", spacer, filename) # structural chars to (spacer)
filename = re.sub(r"[\\*!?¿,'\"“”()<>|$#]", "", filename) # not filename safe chars
filename = re.sub(rf"[{spacer}]{{2,}}", spacer, filename) # remove extra neighbouring (spacer)s
return filename
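A hedged walk-through of sanitize_filename with the default "." spacer (assumed import path):

from devine.core.utilities import sanitize_filename  # assumed import path

# "Mission: Impossible (1996)"
#   structural chars (":", " ") -> "."    => "Mission..Impossible.(1996)"
#   unsafe chars ("(", ")") removed       => "Mission..Impossible.1996"
#   neighbouring spacers collapsed        => "Mission.Impossible.1996"
print(sanitize_filename("Mission: Impossible (1996)"))  # Mission.Impossible.1996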
@ -94,26 +120,34 @@ def get_boxes(data: bytes, box_type: bytes, as_bytes: bool = False) -> Box:
"""Scan a byte array for a wanted box, then parse and yield each find."""
# using slicing to get to the wanted box is done because parsing the entire box and recursively
# scanning through each box and its children often wouldn't scan far enough to reach the wanted box.
# since it doesnt care what child box the wanted box is from, this works fine.
# since it doesn't care what child box the wanted box is from, this works fine.
if not isinstance(data, (bytes, bytearray)):
raise ValueError("data must be bytes")
offset = 0
while True:
try:
index = data.index(box_type)
index = data[offset:].index(box_type)
except ValueError:
break
if index < 0:
break
if index > 4:
index -= 4 # size is before box type and is 4 bytes long
data = data[index:]
index -= 4 # size is before box type and is 4 bytes long
try:
box = Box.parse(data)
box = Box.parse(data[offset:][index:])
except IOError:
# TODO: Does this miss any data we may need?
# since get_init_segment might cut off unexpectedly, pymp4 may be unable to read
# the expected amounts of data and complain, so let's just end the function here
break
except ValidationError as e:
if box_type == b"tenc":
# ignore this error on tenc boxes as the tenc definition isn't consistent,
# some services don't even put valid data and mix it up with avc1...
continue
raise e
if as_bytes:
box = Box.build(box)
offset += index + len(Box.build(box))
yield box
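A hedged usage sketch for get_boxes as corrected above (the file path is a placeholder; the pssh field names follow pymp4's usual box definition):

from pathlib import Path

from devine.core.utilities import get_boxes  # assumed import path

init_data = Path("init.mp4").read_bytes()  # placeholder init segment
for pssh in get_boxes(init_data, b"pssh"):
    print(pssh.system_ID, len(pssh.init_data))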
@ -157,35 +191,102 @@ def get_ip_info(session: Optional[requests.Session] = None) -> dict:
return (session or requests.Session()).get("https://ipinfo.io/json").json()
@contextlib.asynccontextmanager
async def start_pproxy(proxy: str) -> AsyncIterator[str]:
proxy = urlparse(proxy)
def time_elapsed_since(start: float) -> str:
"""
Get time elapsed since a timestamp as a string.
E.g., `1h56m2s`, `15m12s`, `0m55s`, etc.
"""
elapsed = int(time.time() - start)
scheme = {
"https": "http+ssl",
"socks5h": "socks"
}.get(proxy.scheme, proxy.scheme)
minutes, seconds = divmod(elapsed, 60)
hours, minutes = divmod(minutes, 60)
remote_server = f"{scheme}://{proxy.hostname}"
if proxy.port:
remote_server += f":{proxy.port}"
if proxy.username or proxy.password:
remote_server += "#"
if proxy.username:
remote_server += proxy.username
if proxy.password:
remote_server += f":{proxy.password}"
time_string = f"{minutes:d}m{seconds:d}s"
if hours:
time_string = f"{hours:d}h{time_string}"
server = pproxy.Server("http://localhost:0") # random port
remote = pproxy.Connection(remote_server)
handler = await server.start_server({"rserver": [remote]})
return time_string
def try_ensure_utf8(data: bytes) -> bytes:
"""
Try to ensure that the given data is encoded in UTF-8.
Parameters:
data: Input data that may or may not yet be UTF-8 or another encoding.
Returns the input data encoded in UTF-8 if successful. If unable to detect the
encoding of the input data, then the original data is returned as-received.
"""
try:
port = handler.sockets[0].getsockname()[1]
yield f"http://localhost:{port}"
finally:
handler.close()
await handler.wait_closed()
data.decode("utf8")
return data
except UnicodeDecodeError:
try:
# CP-1252 is a superset of latin1
return data.decode("cp1252").encode("utf8")
except UnicodeDecodeError:
try:
# last ditch effort to detect encoding
detection_result = chardet.detect(data)
if not detection_result["encoding"]:
return data
return data.decode(detection_result["encoding"]).encode("utf8")
except UnicodeDecodeError:
return data
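Two hedged examples of try_ensure_utf8: a CP-1252 payload is transcoded, while valid UTF-8/ASCII passes through unchanged (assumed import path):

from devine.core.utilities import try_ensure_utf8  # assumed import path

assert try_ensure_utf8("café".encode("cp1252")) == "café".encode("utf8")
assert try_ensure_utf8(b"plain ascii") == b"plain ascii"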
def get_free_port() -> int:
"""
Get an available port to use.
The port is freed as soon as this returns, so it is possible
for the port to be taken before you try to use it.
"""
with contextlib.closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as s:
s.bind(("", 0))
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
return s.getsockname()[1]
def get_extension(value: Union[str, Path, ParseResult]) -> Optional[str]:
"""
Get a URL or Path file extension/suffix.
Note: The returned value will begin with `.`.
"""
if isinstance(value, ParseResult):
value_parsed = value
elif isinstance(value, (str, Path)):
value_parsed = urlparse(str(value))
else:
raise TypeError(f"Expected {str}, {Path}, or {ParseResult}, got {type(value)}")
if value_parsed.path:
ext = os.path.splitext(value_parsed.path)[1]
if ext and ext != ".":
return ext
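Hedged examples of get_extension on a URL, a Path, and an extension-less URL (assumed import path):

from pathlib import Path

from devine.core.utilities import get_extension  # assumed import path

assert get_extension("https://cdn.example.com/video/master.mp4?token=abc") == ".mp4"
assert get_extension(Path("/tmp/subs.srt")) == ".srt"
assert get_extension("https://cdn.example.com/playlist") is None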
def get_system_fonts() -> dict[str, Path]:
if sys.platform == "win32":
import winreg
with winreg.ConnectRegistry(None, winreg.HKEY_LOCAL_MACHINE) as reg:
key = winreg.OpenKey(
reg,
r"SOFTWARE\Microsoft\Windows NT\CurrentVersion\Fonts",
0,
winreg.KEY_READ
)
total_fonts = winreg.QueryInfoKey(key)[1]
return {
name.replace(" (TrueType)", ""): Path(r"C:\Windows\Fonts", filename)
for n in range(0, total_fonts)
for name, filename, _ in [winreg.EnumValue(key, n)]
}
else:
# TODO: Get System Fonts for Linux and mac OS
return {}
class FPS(ast.NodeVisitor):

@ -1,105 +0,0 @@
"""
AtomicSQL - Race-condition and Threading safe SQL Database Interface.
Copyright (C) 2020-2023 rlaphoenix
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
"""
import os
import sqlite3
import time
from threading import Lock
from typing import Any, Callable, Union
import pymysql.cursors
Connections = Union[sqlite3.Connection, pymysql.connections.Connection]
Cursors = Union[sqlite3.Cursor, pymysql.cursors.Cursor]
class AtomicSQL:
"""
Race-condition and Threading safe SQL Database Interface.
"""
def __init__(self) -> None:
self.master_lock = Lock() # prevents race condition
self.db: dict[bytes, Connections] = {} # used to hold the database connections and commit changes and such
self.cursor: dict[bytes, Cursors] = {} # used to execute queries and receive results
self.session_lock: dict[bytes, Lock] = {} # like master_lock, but per-session
def load(self, connection: Connections) -> bytes:
"""
Store SQL Connection object and return a reference ticket.
:param connection: SQLite3 or pymysql Connection object.
:returns: Session ID in which the database connection is referenced with.
"""
self.master_lock.acquire()
try:
# obtain a unique cryptographically random session_id
session_id = None
while not session_id or session_id in self.db:
session_id = os.urandom(16)
self.db[session_id] = connection
self.cursor[session_id] = self.db[session_id].cursor()
self.session_lock[session_id] = Lock()
return session_id
finally:
self.master_lock.release()
def safe_execute(self, session_id: bytes, action: Callable) -> Any:
"""
Execute code on the Database Connection in a race-condition safe way.
:param session_id: Database Connection's Session ID.
:param action: Function or lambda in which to execute, it's provided `db` and `cursor` arguments.
:returns: Whatever `action` returns.
"""
if session_id not in self.db:
raise ValueError(f"Session ID {session_id!r} is invalid.")
self.master_lock.acquire()
self.session_lock[session_id].acquire()
try:
failures = 0
while True:
try:
action(
db=self.db[session_id],
cursor=self.cursor[session_id]
)
break
except sqlite3.OperationalError as e:
failures += 1
delay = 3 * failures
print(f"AtomicSQL.safe_execute failed, {e}, retrying in {delay} seconds...")
time.sleep(delay)
if failures == 10:
raise ValueError("AtomicSQL.safe_execute failed too many time's. Aborting.")
return self.cursor[session_id]
finally:
self.session_lock[session_id].release()
self.master_lock.release()
def commit(self, session_id: bytes) -> bool:
"""
Commit changes to the Database Connection immediately.
This isn't necessary to be run every time you make changes, just ensure it's run
at least before termination.
:param session_id: Database Connection's Session ID.
:returns: True if it committed.
"""
self.safe_execute(
session_id,
lambda db, cursor: db.commit()
)
return True # todo ; actually check if db.commit worked

@ -1,9 +1,8 @@
from __future__ import annotations
import re
from typing import Optional, Union
from typing import Any, Optional, Union
import click
from click.shell_completion import CompletionItem
from pywidevine.cdm import Cdm as WidevineCdm
@ -96,22 +95,90 @@ class LanguageRange(click.ParamType):
return re.split(r"\s*[,;]\s*", value)
class Quality(click.ParamType):
name = "quality"
class QualityList(click.ParamType):
name = "quality_list"
def convert(self, value: str, param: Optional[click.Parameter] = None, ctx: Optional[click.Context] = None) -> int:
try:
return int(value.lower().rstrip("p"))
except TypeError:
def convert(
self,
value: Union[str, list[str]],
param: Optional[click.Parameter] = None,
ctx: Optional[click.Context] = None
) -> list[int]:
if not value:
return []
if not isinstance(value, list):
value = value.split(",")
resolutions = []
for resolution in value:
try:
resolutions.append(int(resolution.lower().rstrip("p")))
except TypeError:
self.fail(
f"Expected string for int() conversion, got {resolution!r} of type {type(resolution).__name__}",
param,
ctx
)
except ValueError:
self.fail(f"{resolution!r} is not a valid integer", param, ctx)
return sorted(resolutions, reverse=True)
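A hedged example of how QualityList normalises a comma-separated --quality value (import path assumed):

from devine.core.utils.click_types import QUALITY_LIST  # assumed import path

assert QUALITY_LIST.convert("720p,2160p,1080") == [2160, 1080, 720]
assert QUALITY_LIST.convert("") == []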
class MultipleChoice(click.Choice):
"""
The multiple choice type allows multiple values to be checked against
a fixed set of supported values.
It is internally based on click.Choice.
"""
name = "multiple_choice"
def __repr__(self) -> str:
return f"MultipleChoice({list(self.choices)})"
def convert(
self,
value: Any,
param: Optional[click.Parameter] = None,
ctx: Optional[click.Context] = None
) -> list[Any]:
if not value:
return []
if isinstance(value, str):
values = value.split(",")
elif isinstance(value, list):
values = value
else:
self.fail(
f"expected string for int() conversion, got {value!r} of type {type(value).__name__}",
f"{value!r} is not a supported value.",
param,
ctx
)
except ValueError:
self.fail(f"{value!r} is not a valid integer", param, ctx)
chosen_values: list[Any] = []
for value in values:
chosen_values.append(super().convert(value, param, ctx))
return chosen_values
def shell_complete(
self,
ctx: click.Context,
param: click.Parameter,
incomplete: str
) -> list[CompletionItem]:
"""
Complete choices that start with the incomplete value.
Parameters:
ctx: Invocation context for this command.
param: The parameter that is requesting completion.
incomplete: Value being completed. May be empty.
"""
incomplete = incomplete.rsplit(",")[-1]
return super().shell_complete(ctx, param, incomplete)
SEASON_RANGE = SeasonRange()
LANGUAGE_RANGE = LanguageRange()
QUALITY = Quality()
QUALITY_LIST = QualityList()

@ -3,11 +3,16 @@ import subprocess
from pathlib import Path
from typing import Union
from devine.core import binaries
def ffprobe(uri: Union[bytes, Path]) -> dict:
"""Use ffprobe on the provided data to get stream information."""
if not binaries.FFProbe:
raise EnvironmentError("FFProbe executable \"ffprobe\" not found but is required.")
args = [
"ffprobe",
binaries.FFProbe,
"-v", "quiet",
"-of", "json",
"-show_streams"

devine/core/utils/webvtt.py (new file, 191 lines)
@ -0,0 +1,191 @@
import re
import sys
import typing
from typing import Optional
from pycaption import Caption, CaptionList, CaptionNode, CaptionReadError, WebVTTReader, WebVTTWriter
class CaptionListExt(CaptionList):
@typing.no_type_check
def __init__(self, iterable=None, layout_info=None):
self.first_segment_mpegts = 0
super().__init__(iterable, layout_info)
class CaptionExt(Caption):
@typing.no_type_check
def __init__(self, start, end, nodes, style=None, layout_info=None, segment_index=0, mpegts=0, cue_time=0.0):
style = style or {}
self.segment_index: int = segment_index
self.mpegts: float = mpegts
self.cue_time: float = cue_time
super().__init__(start, end, nodes, style, layout_info)
class WebVTTReaderExt(WebVTTReader):
# HLS extension support <https://datatracker.ietf.org/doc/html/rfc8216#section-3.5>
RE_TIMESTAMP_MAP = re.compile(r"X-TIMESTAMP-MAP.*")
RE_MPEGTS = re.compile(r"MPEGTS:(\d+)")
RE_LOCAL = re.compile(r"LOCAL:((?:(\d{1,}):)?(\d{2}):(\d{2})\.(\d{3}))")
def _parse(self, lines: list[str]) -> CaptionList:
captions = CaptionListExt()
start = None
end = None
nodes: list[CaptionNode] = []
layout_info = None
found_timing = False
segment_index = -1
mpegts = 0
cue_time = 0.0
# The first segment MPEGTS is needed to calculate the rest. It is possible that
# the first segment contains no cue and is ignored by pycaption, this acts as a fallback.
captions.first_segment_mpegts = 0
for i, line in enumerate(lines):
if "-->" in line:
found_timing = True
timing_line = i
last_start_time = captions[-1].start if captions else 0
try:
start, end, layout_info = self._parse_timing_line(line, last_start_time)
except CaptionReadError as e:
new_msg = f"{e.args[0]} (line {timing_line})"
tb = sys.exc_info()[2]
raise type(e)(new_msg).with_traceback(tb) from None
elif "" == line:
if found_timing and nodes:
found_timing = False
caption = CaptionExt(
start,
end,
nodes,
layout_info=layout_info,
segment_index=segment_index,
mpegts=mpegts,
cue_time=cue_time,
)
captions.append(caption)
nodes = []
elif "WEBVTT" in line:
# Merged segmented VTT doesn't have index information, track manually.
segment_index += 1
mpegts = 0
cue_time = 0.0
elif m := self.RE_TIMESTAMP_MAP.match(line):
if r := self.RE_MPEGTS.search(m.group()):
mpegts = int(r.group(1))
cue_time = self._parse_local(m.group())
# Early assignment in case the first segment contains no cue.
if segment_index == 0:
captions.first_segment_mpegts = mpegts
else:
if found_timing:
if nodes:
nodes.append(CaptionNode.create_break())
nodes.append(CaptionNode.create_text(self._decode(line)))
else:
# it's a comment or some metadata; ignore it
pass
# Add a last caption if there are remaining nodes
if nodes:
caption = CaptionExt(start, end, nodes, layout_info=layout_info, segment_index=segment_index, mpegts=mpegts)
captions.append(caption)
return captions
@staticmethod
def _parse_local(string: str) -> float:
"""
Parse WebVTT LOCAL time and convert it to seconds.
"""
m = WebVTTReaderExt.RE_LOCAL.search(string)
if not m:
return 0
parsed = m.groups()
if not parsed:
return 0
hours = int(parsed[1])
minutes = int(parsed[2])
seconds = int(parsed[3])
milliseconds = int(parsed[4])
return (milliseconds / 1000) + seconds + (minutes * 60) + (hours * 3600)
def merge_segmented_webvtt(vtt_raw: str, segment_durations: Optional[list[int]] = None, timescale: int = 1) -> str:
"""
Merge Segmented WebVTT data.
Parameters:
vtt_raw: The concatenated WebVTT files to merge. All WebVTT headers must be
appropriately spaced apart, or it may produce unwanted effects like
considering headers as captions, timestamp lines, etc.
segment_durations: A list of each segment's duration. If not provided it will try
to get it from the X-TIMESTAMP-MAP headers, specifically the MPEGTS number.
timescale: The number of time units per second.
This parses the X-TIMESTAMP-MAP data to compute new absolute timestamps, replacing
the old start and end timestamp values. All X-TIMESTAMP-MAP header information will
be removed from the output as they are no longer of concern. Consider this function
the opposite of a WebVTT Segmenter, a WebVTT Joiner of sorts.
Algorithm borrowed from N_m3u8DL-RE and shaka-player.
"""
MPEG_TIMESCALE = 90_000
vtt = WebVTTReaderExt().read(vtt_raw)
for lang in vtt.get_languages():
prev_caption = None
duplicate_index: list[int] = []
captions = vtt.get_captions(lang)
if captions[0].segment_index == 0:
first_segment_mpegts = captions[0].mpegts
else:
first_segment_mpegts = segment_durations[0] if segment_durations else captions.first_segment_mpegts
caption: CaptionExt
for i, caption in enumerate(captions):
# DASH WebVTT doesn't have MPEGTS timestamp like HLS. Instead,
# calculate the timestamp from SegmentTemplate/SegmentList duration.
likely_dash = first_segment_mpegts == 0 and caption.mpegts == 0
if likely_dash and segment_durations:
duration = segment_durations[caption.segment_index]
caption.mpegts = MPEG_TIMESCALE * (duration / timescale)
if caption.mpegts == 0:
continue
seconds = (caption.mpegts - first_segment_mpegts) / MPEG_TIMESCALE - caption.cue_time
offset = seconds * 1_000_000 # pycaption use microseconds
if caption.start < offset:
caption.start += offset
caption.end += offset
# If the difference between current and previous captions is <=1ms
# and the payload is equal then splice.
if (
prev_caption
and not caption.is_empty()
and (caption.start - prev_caption.end) <= 1000 # 1ms in microseconds
and caption.get_text() == prev_caption.get_text()
):
prev_caption.end = caption.end
duplicate_index.append(i)
prev_caption = caption
# Remove duplicate
captions[:] = [c for c_index, c in enumerate(captions) if c_index not in set(duplicate_index)]
return WebVTTWriter().write(vtt)
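For the offset maths above, a hedged worked example with illustrative numbers:

MPEG_TIMESCALE = 90_000

first_segment_mpegts = 900_000   # MPEGTS from the first segment's X-TIMESTAMP-MAP
segment_mpegts = 1_440_000       # MPEGTS of a later segment
cue_time = 0.0                   # that segment's LOCAL value, in seconds

seconds = (segment_mpegts - first_segment_mpegts) / MPEG_TIMESCALE - cue_time
offset = seconds * 1_000_000     # pycaption stores cue times in microseconds
print(seconds, offset)           # 6.0 -> 6000000.0: each cue in that segment shifts forward by 6 s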

@ -1,5 +1,3 @@
from __future__ import annotations
from abc import ABCMeta, abstractmethod
from typing import Iterator, Optional, Union
from uuid import UUID
@ -31,11 +29,11 @@ class Vault(metaclass=ABCMeta):
"""Get All Keys from Vault by Service."""
@abstractmethod
def add_key(self, service: str, kid: Union[UUID, str], key: str, commit: bool = False) -> bool:
def add_key(self, service: str, kid: Union[UUID, str], key: str) -> bool:
"""Add KID:KEY to the Vault."""
@abstractmethod
def add_keys(self, service: str, kid_keys: dict[Union[UUID, str], str], commit: bool = False) -> int:
def add_keys(self, service: str, kid_keys: dict[Union[UUID, str], str]) -> int:
"""
Add Multiple Content Keys with Key IDs for Service to the Vault.
Pre-existing Content Keys are ignored/skipped.
@ -47,4 +45,4 @@ class Vault(metaclass=ABCMeta):
"""Get a list of Service Tags from Vault."""
__ALL__ = (Vault,)
__all__ = ("Vault",)

@ -1,11 +1,9 @@
from __future__ import annotations
from typing import Iterator, Optional, Union, Any
from typing import Any, Iterator, Optional, Union
from uuid import UUID
from devine.core.vault import Vault
from devine.core.config import config
from devine.core.utilities import import_module_by_path
from devine.core.vault import Vault
_VAULTS = sorted(
(
@ -57,7 +55,7 @@ class Vaults:
for vault in self.vaults:
if vault != excluding:
try:
success += vault.add_key(self.service, kid, key, commit=True)
success += vault.add_key(self.service, kid, key)
except (PermissionError, NotImplementedError):
pass
return success
@ -70,10 +68,10 @@ class Vaults:
success = 0
for vault in self.vaults:
try:
success += bool(vault.add_keys(self.service, kid_keys, commit=True))
success += bool(vault.add_keys(self.service, kid_keys))
except (PermissionError, NotImplementedError):
pass
return success
__ALL__ = (Vaults,)
__all__ = ("Vaults",)

devine/vaults/API.py (new file, 214 lines)
@ -0,0 +1,214 @@
from typing import Iterator, Optional, Union
from uuid import UUID
from requests import Session
from devine.core import __version__
from devine.core.vault import Vault
class API(Vault):
"""Key Vault using a simple RESTful HTTP API call."""
def __init__(self, name: str, uri: str, token: str):
super().__init__(name)
self.uri = uri.rstrip("/")
self.session = Session()
self.session.headers.update({
"User-Agent": f"Devine v{__version__}"
})
self.session.headers.update({
"Authorization": f"Bearer {token}"
})
def get_key(self, kid: Union[UUID, str], service: str) -> Optional[str]:
if isinstance(kid, UUID):
kid = kid.hex
data = self.session.get(
url=f"{self.uri}/{service.lower()}/{kid}",
headers={
"Accept": "application/json"
}
).json()
code = int(data.get("code", 0))
message = data.get("message")
error = {
0: None,
1: Exceptions.AuthRejected,
2: Exceptions.TooManyRequests,
3: Exceptions.ServiceTagInvalid,
4: Exceptions.KeyIdInvalid
}.get(code, ValueError)
if error:
raise error(f"{message} ({code})")
content_key = data.get("content_key")
if not content_key:
return None
if not isinstance(content_key, str):
raise ValueError(f"Expected {content_key} to be {str}, was {type(content_key)}")
return content_key
def get_keys(self, service: str) -> Iterator[tuple[str, str]]:
page = 1
while True:
data = self.session.get(
url=f"{self.uri}/{service.lower()}",
params={
"page": page,
"total": 10
},
headers={
"Accept": "application/json"
}
).json()
code = int(data.get("code", 0))
message = data.get("message")
error = {
0: None,
1: Exceptions.AuthRejected,
2: Exceptions.TooManyRequests,
3: Exceptions.PageInvalid,
4: Exceptions.ServiceTagInvalid,
}.get(code, ValueError)
if error:
raise error(f"{message} ({code})")
content_keys = data.get("content_keys")
if content_keys:
if not isinstance(content_keys, dict):
raise ValueError(f"Expected {content_keys} to be {dict}, was {type(content_keys)}")
for key_id, key in content_keys.items():
yield key_id, key
pages = int(data["pages"])
if pages <= page:
break
page += 1
def add_key(self, service: str, kid: Union[UUID, str], key: str) -> bool:
if isinstance(kid, UUID):
kid = kid.hex
data = self.session.post(
url=f"{self.uri}/{service.lower()}/{kid}",
json={
"content_key": key
},
headers={
"Accept": "application/json"
}
).json()
code = int(data.get("code", 0))
message = data.get("message")
error = {
0: None,
1: Exceptions.AuthRejected,
2: Exceptions.TooManyRequests,
3: Exceptions.ServiceTagInvalid,
4: Exceptions.KeyIdInvalid,
5: Exceptions.ContentKeyInvalid
}.get(code, ValueError)
if error:
raise error(f"{message} ({code})")
# the kid:key was new to the vault (optional)
added = bool(data.get("added"))
# the key for kid was changed/updated (optional)
updated = bool(data.get("updated"))
return added or updated
def add_keys(self, service: str, kid_keys: dict[Union[UUID, str], str]) -> int:
data = self.session.post(
url=f"{self.uri}/{service.lower()}",
json={
"content_keys": {
str(kid).replace("-", ""): key
for kid, key in kid_keys.items()
}
},
headers={
"Accept": "application/json"
}
).json()
code = int(data.get("code", 0))
message = data.get("message")
error = {
0: None,
1: Exceptions.AuthRejected,
2: Exceptions.TooManyRequests,
3: Exceptions.ServiceTagInvalid,
4: Exceptions.KeyIdInvalid,
5: Exceptions.ContentKeyInvalid
}.get(code, ValueError)
if error:
raise error(f"{message} ({code})")
# each kid:key that was new to the vault (optional)
added = int(data.get("added"))
# each key for a kid that was changed/updated (optional)
updated = int(data.get("updated"))
return added + updated
def get_services(self) -> Iterator[str]:
data = self.session.post(
url=self.uri,
headers={
"Accept": "application/json"
}
).json()
code = int(data.get("code", 0))
message = data.get("message")
error = {
0: None,
1: Exceptions.AuthRejected,
2: Exceptions.TooManyRequests,
}.get(code, ValueError)
if error:
raise error(f"{message} ({code})")
service_list = data.get("service_list", [])
if not isinstance(service_list, list):
raise ValueError(f"Expected {service_list} to be {list}, was {type(service_list)}")
for service in service_list:
yield service
class Exceptions:
class AuthRejected(Exception):
"""Authentication Error Occurred, is your token valid? Do you have permission to make this call?"""
class TooManyRequests(Exception):
"""Rate Limited; Sent too many requests in a given amount of time."""
class PageInvalid(Exception):
"""Requested page does not exist."""
class ServiceTagInvalid(Exception):
"""The Service Tag is invalid."""
class KeyIdInvalid(Exception):
"""The Key ID is invalid."""
class ContentKeyInvalid(Exception):
"""The Content Key is invalid."""

@ -1,5 +1,4 @@
from __future__ import annotations
import threading
from typing import Iterator, Optional, Union
from uuid import UUID
@ -7,7 +6,6 @@ import pymysql
from pymysql.cursors import DictCursor
from devine.core.services import Services
from devine.core.utils.atomicsql import AtomicSQL
from devine.core.vault import Vault
@ -21,15 +19,13 @@ class MySQL(Vault):
"""
super().__init__(name)
self.slug = f"{host}:{database}:{username}"
self.con = pymysql.connect(
self.conn_factory = ConnectionFactory(dict(
host=host,
db=database,
user=username,
cursorclass=DictCursor,
**kwargs
)
self.adb = AtomicSQL()
self.ticket = self.adb.load(self.con)
))
self.permissions = self.get_permissions()
if not self.has_permission("SELECT"):
@ -43,37 +39,42 @@ class MySQL(Vault):
if isinstance(kid, UUID):
kid = kid.hex
c = self.adb.safe_execute(
self.ticket,
lambda db, cursor: cursor.execute(
conn = self.conn_factory.get()
cursor = conn.cursor()
try:
cursor.execute(
# TODO: SQL injection risk
f"SELECT `id`, `key_` FROM `{service}` WHERE `kid`=%s AND `key_`!=%s",
[kid, "0" * 32]
(kid, "0" * 32)
)
).fetchone()
if not c:
return None
return c["key_"]
cek = cursor.fetchone()
if not cek:
return None
return cek["key_"]
finally:
cursor.close()
def get_keys(self, service: str) -> Iterator[tuple[str, str]]:
if not self.has_table(service):
# no table, no keys, simple
return None
c = self.adb.safe_execute(
self.ticket,
lambda db, cursor: cursor.execute(
conn = self.conn_factory.get()
cursor = conn.cursor()
try:
cursor.execute(
# TODO: SQL injection risk
f"SELECT `kid`, `key_` FROM `{service}` WHERE `key_`!=%s",
["0" * 32]
("0" * 32,)
)
)
for row in cursor.fetchall():
yield row["kid"], row["key_"]
finally:
cursor.close()
for row in c.fetchall():
yield row["kid"], row["key_"]
def add_key(self, service: str, kid: Union[UUID, str], key: str, commit: bool = False) -> bool:
def add_key(self, service: str, kid: Union[UUID, str], key: str) -> bool:
if not key or key.count("0") == len(key):
raise ValueError("You cannot add a NULL Content Key to a Vault.")
@ -82,39 +83,37 @@ class MySQL(Vault):
if not self.has_table(service):
try:
self.create_table(service, commit)
self.create_table(service)
except PermissionError:
return False
if isinstance(kid, UUID):
kid = kid.hex
if self.adb.safe_execute(
self.ticket,
lambda db, cursor: cursor.execute(
conn = self.conn_factory.get()
cursor = conn.cursor()
try:
cursor.execute(
# TODO: SQL injection risk
f"SELECT `id` FROM `{service}` WHERE `kid`=%s AND `key_`=%s",
[kid, key]
(kid, key)
)
).fetchone():
# table already has this exact KID:KEY stored
return True
self.adb.safe_execute(
self.ticket,
lambda db, cursor: cursor.execute(
if cursor.fetchone():
# table already has this exact KID:KEY stored
return True
cursor.execute(
# TODO: SQL injection risk
f"INSERT INTO `{service}` (kid, key_) VALUES (%s, %s)",
(kid, key)
)
)
if commit:
self.commit()
finally:
conn.commit()
cursor.close()
return True
def add_keys(self, service: str, kid_keys: dict[Union[UUID, str], str], commit: bool = False) -> int:
def add_keys(self, service: str, kid_keys: dict[Union[UUID, str], str]) -> int:
for kid, key in kid_keys.items():
if not key or key.count("0") == len(key):
raise ValueError("You cannot add a NULL Content Key to a Vault.")
@ -124,7 +123,7 @@ class MySQL(Vault):
if not self.has_table(service):
try:
self.create_table(service, commit)
self.create_table(service)
except PermissionError:
return 0
@ -139,40 +138,47 @@ class MySQL(Vault):
for kid, key_ in kid_keys.items()
}
c = self.adb.safe_execute(
self.ticket,
lambda db, cursor: cursor.executemany(
conn = self.conn_factory.get()
cursor = conn.cursor()
try:
cursor.executemany(
# TODO: SQL injection risk
f"INSERT IGNORE INTO `{service}` (kid, key_) VALUES (%s, %s)",
kid_keys.items()
)
)
if commit:
self.commit()
return c.rowcount
return cursor.rowcount
finally:
conn.commit()
cursor.close()
def get_services(self) -> Iterator[str]:
c = self.adb.safe_execute(
self.ticket,
lambda db, cursor: cursor.execute("SHOW TABLES")
)
for table in c.fetchall():
# each entry has a key named `Tables_in_<db name>`
yield Services.get_tag(list(table.values())[0])
conn = self.conn_factory.get()
cursor = conn.cursor()
try:
cursor.execute("SHOW TABLES")
for table in cursor.fetchall():
# each entry has a key named `Tables_in_<db name>`
yield Services.get_tag(list(table.values())[0])
finally:
cursor.close()
def has_table(self, name: str) -> bool:
"""Check if the Vault has a Table with the specified name."""
return list(self.adb.safe_execute(
self.ticket,
lambda db, cursor: cursor.execute(
"SELECT count(TABLE_NAME) FROM information_schema.TABLES WHERE TABLE_SCHEMA=%s AND TABLE_NAME=%s",
[self.con.db, name]
)
).fetchone().values())[0] == 1
conn = self.conn_factory.get()
cursor = conn.cursor()
def create_table(self, name: str, commit: bool = False):
try:
cursor.execute(
"SELECT count(TABLE_NAME) FROM information_schema.TABLES WHERE TABLE_SCHEMA=%s AND TABLE_NAME=%s",
(conn.db, name)
)
return list(cursor.fetchone().values())[0] == 1
finally:
cursor.close()
def create_table(self, name: str):
"""Create a Table with the specified name if not yet created."""
if self.has_table(name):
return
@ -180,9 +186,11 @@ class MySQL(Vault):
if not self.has_permission("CREATE"):
raise PermissionError(f"MySQL vault {self.slug} has no CREATE permission.")
self.adb.safe_execute(
self.ticket,
lambda db, cursor: cursor.execute(
conn = self.conn_factory.get()
cursor = conn.cursor()
try:
cursor.execute(
# TODO: SQL injection risk
f"""
CREATE TABLE IF NOT EXISTS {name} (
@ -193,23 +201,28 @@ class MySQL(Vault):
);
"""
)
)
if commit:
self.commit()
finally:
conn.commit()
cursor.close()
def get_permissions(self) -> list:
"""Get and parse Grants to a more easily usable list tuple array."""
with self.con.cursor() as c:
c.execute("SHOW GRANTS")
grants = c.fetchall()
conn = self.conn_factory.get()
cursor = conn.cursor()
try:
cursor.execute("SHOW GRANTS")
grants = cursor.fetchall()
grants = [next(iter(x.values())) for x in grants]
grants = [tuple(x[6:].split(" TO ")[0].split(" ON ")) for x in list(grants)]
grants = [(
list(map(str.strip, perms.replace("ALL PRIVILEGES", "*").split(","))),
location.replace("`", "").split(".")
) for perms, location in grants]
return grants
grants = [tuple(x[6:].split(" TO ")[0].split(" ON ")) for x in list(grants)]
grants = [(
list(map(str.strip, perms.replace("ALL PRIVILEGES", "*").split(","))),
location.replace("`", "").split(".")
) for perms, location in grants]
return grants
finally:
conn.commit()
cursor.close()
def has_permission(self, operation: str, database: Optional[str] = None, table: Optional[str] = None) -> bool:
"""Check if the current connection has a specific permission."""
@ -220,6 +233,16 @@ class MySQL(Vault):
grants = [x for x in grants if x[1][1] in (table, "*")]
return bool(grants)
def commit(self):
"""Commit any changes made that has not been written to db."""
self.adb.commit(self.ticket)
class ConnectionFactory:
def __init__(self, con: dict):
self._con = con
self._store = threading.local()
def _create_connection(self) -> pymysql.Connection:
return pymysql.connect(**self._con)
def get(self) -> pymysql.Connection:
if not hasattr(self._store, "conn"):
self._store.conn = self._create_connection()
return self._store.conn
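The ConnectionFactory above replaces the AtomicSQL wrapper with one pymysql connection per thread. A hedged sketch of the behaviour it relies on (credentials are placeholders; running it requires a reachable MySQL server):

import threading

def worker(factory):
    conn = factory.get()           # first .get() in this thread opens a connection
    assert factory.get() is conn   # later .get() calls in the same thread reuse it

factory = ConnectionFactory(dict(
    host="127.0.0.1", db="devine", user="devine", password="placeholder", cursorclass=DictCursor
))
threads = [threading.Thread(target=worker, args=(factory,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()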

@ -1,12 +1,11 @@
from __future__ import annotations
import sqlite3
import threading
from pathlib import Path
from sqlite3 import Connection
from typing import Iterator, Optional, Union
from uuid import UUID
from devine.core.services import Services
from devine.core.utils.atomicsql import AtomicSQL
from devine.core.vault import Vault
@ -17,9 +16,7 @@ class SQLite(Vault):
super().__init__(name)
self.path = Path(path).expanduser()
# TODO: Use a DictCursor or such to get fetches as dict?
self.con = sqlite3.connect(self.path)
self.adb = AtomicSQL()
self.ticket = self.adb.load(self.con)
self.conn_factory = ConnectionFactory(self.path)
def get_key(self, kid: Union[UUID, str], service: str) -> Optional[str]:
if not self.has_table(service):
@ -29,76 +26,79 @@ class SQLite(Vault):
if isinstance(kid, UUID):
kid = kid.hex
c = self.adb.safe_execute(
self.ticket,
lambda db, cursor: cursor.execute(
# TODO: SQL injection risk
f"SELECT `id`, `key_` FROM `{service}` WHERE `kid`=? AND `key_`!=?",
[kid, "0" * 32]
)
).fetchone()
if not c:
return None
conn = self.conn_factory.get()
cursor = conn.cursor()
return c[1] # `key_`
try:
cursor.execute(
f"SELECT `id`, `key_` FROM `{service}` WHERE `kid`=? AND `key_`!=?",
(kid, "0" * 32)
)
cek = cursor.fetchone()
if not cek:
return None
return cek[1]
finally:
cursor.close()
def get_keys(self, service: str) -> Iterator[tuple[str, str]]:
if not self.has_table(service):
# no table, no keys, simple
return None
c = self.adb.safe_execute(
self.ticket,
lambda db, cursor: cursor.execute(
# TODO: SQL injection risk
f"SELECT `kid`, `key_` FROM `{service}` WHERE `key_`!=?",
["0" * 32]
)
)
for (kid, key_) in c.fetchall():
yield kid, key_
def add_key(self, service: str, kid: Union[UUID, str], key: str, commit: bool = False) -> bool:
conn = self.conn_factory.get()
cursor = conn.cursor()
try:
cursor.execute(
f"SELECT `kid`, `key_` FROM `{service}` WHERE `key_`!=?",
("0" * 32,)
)
for (kid, key_) in cursor.fetchall():
yield kid, key_
finally:
cursor.close()
def add_key(self, service: str, kid: Union[UUID, str], key: str) -> bool:
if not key or key.count("0") == len(key):
raise ValueError("You cannot add a NULL Content Key to a Vault.")
if not self.has_table(service):
self.create_table(service, commit)
self.create_table(service)
if isinstance(kid, UUID):
kid = kid.hex
if self.adb.safe_execute(
self.ticket,
lambda db, cursor: cursor.execute(
conn = self.conn_factory.get()
cursor = conn.cursor()
try:
cursor.execute(
# TODO: SQL injection risk
f"SELECT `id` FROM `{service}` WHERE `kid`=? AND `key_`=?",
[kid, key]
(kid, key)
)
).fetchone():
# table already has this exact KID:KEY stored
return True
self.adb.safe_execute(
self.ticket,
lambda db, cursor: cursor.execute(
if cursor.fetchone():
# table already has this exact KID:KEY stored
return True
cursor.execute(
# TODO: SQL injection risk
f"INSERT INTO `{service}` (kid, key_) VALUES (?, ?)",
(kid, key)
)
)
if commit:
self.commit()
finally:
conn.commit()
cursor.close()
return True
def add_keys(self, service: str, kid_keys: dict[Union[UUID, str], str], commit: bool = False) -> int:
def add_keys(self, service: str, kid_keys: dict[Union[UUID, str], str]) -> int:
for kid, key in kid_keys.items():
if not key or key.count("0") == len(key):
raise ValueError("You cannot add a NULL Content Key to a Vault.")
if not self.has_table(service):
self.create_table(service, commit)
self.create_table(service)
if not isinstance(kid_keys, dict):
raise ValueError(f"The kid_keys provided is not a dictionary, {kid_keys!r}")
@ -111,47 +111,56 @@ class SQLite(Vault):
for kid, key_ in kid_keys.items()
}
c = self.adb.safe_execute(
self.ticket,
lambda db, cursor: cursor.executemany(
conn = self.conn_factory.get()
cursor = conn.cursor()
try:
cursor.executemany(
# TODO: SQL injection risk
f"INSERT OR IGNORE INTO `{service}` (kid, key_) VALUES (?, ?)",
kid_keys.items()
)
)
if commit:
self.commit()
return c.rowcount
return cursor.rowcount
finally:
conn.commit()
cursor.close()
def get_services(self) -> Iterator[str]:
c = self.adb.safe_execute(
self.ticket,
lambda db, cursor: cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
)
for (name,) in c.fetchall():
if name != "sqlite_sequence":
yield Services.get_tag(name)
conn = self.conn_factory.get()
cursor = conn.cursor()
try:
cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
for (name,) in cursor.fetchall():
if name != "sqlite_sequence":
yield Services.get_tag(name)
finally:
cursor.close()
def has_table(self, name: str) -> bool:
"""Check if the Vault has a Table with the specified name."""
return self.adb.safe_execute(
self.ticket,
lambda db, cursor: cursor.execute(
"SELECT count(name) FROM sqlite_master WHERE type='table' AND name=?",
[name]
)
).fetchone()[0] == 1
conn = self.conn_factory.get()
cursor = conn.cursor()
def create_table(self, name: str, commit: bool = False):
try:
cursor.execute(
"SELECT count(name) FROM sqlite_master WHERE type='table' AND name=?",
(name,)
)
return cursor.fetchone()[0] == 1
finally:
cursor.close()
def create_table(self, name: str):
"""Create a Table with the specified name if not yet created."""
if self.has_table(name):
return
self.adb.safe_execute(
self.ticket,
lambda db, cursor: cursor.execute(
conn = self.conn_factory.get()
cursor = conn.cursor()
try:
cursor.execute(
# TODO: SQL injection risk
f"""
CREATE TABLE IF NOT EXISTS {name} (
@ -163,11 +172,20 @@ class SQLite(Vault):
);
"""
)
)
finally:
conn.commit()
cursor.close()
if commit:
self.commit()
def commit(self):
"""Commit any changes made that has not been written to db."""
self.adb.commit(self.ticket)
class ConnectionFactory:
def __init__(self, path: Union[str, Path]):
self._path = path
self._store = threading.local()
def _create_connection(self) -> Connection:
return sqlite3.connect(self._path)
def get(self) -> Connection:
if not hasattr(self._store, "conn"):
self._store.conn = self._create_connection()
return self._store.conn
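The per-thread connections matter most for SQLite: by default, a sqlite3 connection may only be used on the thread that created it. A hedged demonstration of the failure the factory avoids (database path is a placeholder):

import sqlite3
import threading

conn = sqlite3.connect("keys.db")  # placeholder path, created on the main thread

def use_from_other_thread():
    try:
        conn.execute("SELECT 1")
    except sqlite3.ProgrammingError as e:
        # raised because sqlite3 defaults to check_same_thread=True
        print("shared connection failed:", e)

t = threading.Thread(target=use_from_other_thread)
t.start()
t.join()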

poetry.lock (generated, 2527 lines changed): diff suppressed because it is too large.
@ -1,78 +1,95 @@
[build-system]
requires = ['poetry-core>=1.0.0']
build-backend = 'poetry.core.masonry.api'
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
[tool.poetry]
name = 'devine'
version = '1.0.0'
description = 'Open-Source Movie, TV, and Music Downloading Solution'
license = 'GPL-3.0-only'
authors = ['rlaphoenix <rlaphoenix@pm.me>']
readme = 'README.md'
homepage = 'https://github.com/devine/devine'
repository = 'https://github.com/devine/devine'
keywords = ['widevine', 'drm', 'downloader']
name = "devine"
version = "3.3.3"
description = "Modular Movie, TV, and Music Archival Software."
license = "GPL-3.0-only"
authors = ["rlaphoenix <rlaphoenix@pm.me>"]
readme = "README.md"
homepage = "https://github.com/devine-dl/devine"
repository = "https://github.com/devine-dl/devine"
keywords = ["python", "downloader", "drm", "widevine"]
classifiers = [
'Development Status :: 4 - Beta',
'Environment :: Console',
'Intended Audience :: End Users/Desktop',
'Natural Language :: English',
'Operating System :: OS Independent',
'Topic :: Multimedia :: Video',
'Topic :: Security :: Cryptography',
"Development Status :: 4 - Beta",
"Environment :: Console",
"Intended Audience :: End Users/Desktop",
"Natural Language :: English",
"Operating System :: OS Independent",
"Topic :: Multimedia :: Video",
"Topic :: Security :: Cryptography",
]
include = [
{ path = "CHANGELOG.md", format = "sdist" },
{ path = "README.md", format = "sdist" },
{ path = "LICENSE", format = "sdist" },
]
[tool.poetry.urls]
"Issues" = "https://github.com/devine-dl/devine/issues"
"Discussions" = "https://github.com/devine-dl/devine/discussions"
"Changelog" = "https://github.com/devine-dl/devine/blob/master/CHANGELOG.md"
[tool.poetry.dependencies]
python = ">=3.8.6,<3.12"
python = ">=3.9,<4.0"
appdirs = "^1.4.4"
Brotli = "^1.0.9"
click = "^8.1.3"
colorama = "^0.4.6"
coloredlogs = "^15.0.1"
Brotli = "^1.1.0"
click = "^8.1.7"
construct = "^2.8.8"
crccheck = "^1.3.0"
jsonpickle = "^3.0.1"
langcodes = { extras = ["data"], version = "^3.3.0" }
lxml = "^4.9.2"
m3u8 = "^3.4.0"
pproxy = "^2.7.8"
protobuf = "4.21.6"
pycaption = "^2.1.1"
pycryptodomex = "^3.17.0"
pyjwt = "^2.6.0"
pymediainfo = "^6.0.1"
pymp4 = "^1.2.0"
pymysql = "^1.0.2"
pywidevine = { extras = ["serve"], version = "^1.6.0" }
PyYAML = "^6.0"
requests = { extras = ["socks"], version = "^2.28.2" }
"ruamel.yaml" = "^0.17.21"
jsonpickle = "^3.0.4"
langcodes = { extras = ["data"], version = "^3.4.0" }
lxml = "^5.2.1"
pproxy = "^2.7.9"
protobuf = "^4.25.3"
pycaption = "^2.2.6"
pycryptodomex = "^3.20.0"
pyjwt = "^2.8.0"
pymediainfo = "^6.1.0"
pymp4 = "^1.4.0"
pymysql = "^1.1.0"
pywidevine = { extras = ["serve"], version = "^1.8.0" }
PyYAML = "^6.0.1"
requests = { extras = ["socks"], version = "^2.31.0" }
rich = "^13.7.1"
"rlaphoenix.m3u8" = "^3.4.0"
"ruamel.yaml" = "^0.18.6"
sortedcontainers = "^2.4.0"
subtitle-filter = "^1.4.4"
tqdm = "^4.64.1"
Unidecode = "^1.3.6"
urllib3 = "^1.26.14"
subtitle-filter = "^1.4.9"
Unidecode = "^1.3.8"
urllib3 = "^2.2.1"
chardet = "^5.2.0"
curl-cffi = "^0.7.0b4"
[tool.poetry.dev-dependencies]
pre-commit = "^3.0.4"
mypy = "^0.991"
mypy-protobuf = "^3.3.0"
types-protobuf = "^3.19.22"
types-PyMySQL = "^1.0.19.2"
types-requests = "^2.28.11.8"
isort = "^5.12.0"
pre-commit = "^3.7.0"
mypy = "^1.9.0"
mypy-protobuf = "^3.6.0"
types-protobuf = "^4.24.0.20240408"
types-PyMySQL = "^1.1.0.1"
types-requests = "^2.31.0.20240406"
isort = "^5.13.2"
ruff = "~0.3.7"
[tool.poetry.scripts]
devine = 'devine.core.__main__:main'
devine = "devine.core.__main__:main"
[tool.ruff]
force-exclude = true
line-length = 120
[tool.ruff.lint]
select = ["E4", "E7", "E9", "F", "W"]
[tool.isort]
line_length = 120
line_length = 118
[tool.mypy]
exclude = '_pb2\.pyi?$'
check_untyped_defs = true
disallow_incomplete_defs = true
disallow_untyped_defs = true
follow_imports = 'silent'
follow_imports = "silent"
ignore_missing_imports = true
no_implicit_optional = true