97 Commits

Author SHA1 Message Date
rlaphoenix
10285c3819 feat(dl): Add *new* --workers to set download threads/workers
The previously named --workers, which is now --downloads, specified how many tracks to download, not how many threads/workers are used per download.

It defaults to nothing, in which case each downloader uses its own default. All current downloaders default to `min(32, (os.cpu_count() or 1) + 4)`, which is also the default for `ThreadPoolExecutor` in general.

This also has the side effect of changing DASH and HLS's forced max_workers of 16 to a more appropriate default that, more importantly, is actually configurable. You can set a default in your config under `dl.workers`.
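
As a rough illustration of the fallback described above (not the actual downloader code), the sketch below resolves an optional worker count and hands it to a thread pool. The names `resolve_workers`, `download_all`, and `fetch` are hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional


def resolve_workers(workers: Optional[int] = None) -> int:
    """Return the explicit worker count, or the fallback quoted in the message."""
    if workers is not None:
        return workers
    return min(32, (os.cpu_count() or 1) + 4)


def download_all(urls: Iterable[str], fetch: Callable[[str], str],
                 workers: Optional[int] = None) -> List[str]:
    """Run `fetch` (a hypothetical per-URL callable) over `urls` in a thread pool."""
    with ThreadPoolExecutor(max_workers=resolve_workers(workers)) as pool:
        # pool.map keeps results in the same order as the input URLs
        return list(pool.map(fetch, urls))
```

Passing `workers=None` mirrors the "defaults to nothing" behaviour; an explicit value always wins.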
2024-04-03 00:58:47 +01:00
rlaphoenix
5a12cb33e2 refactor(Track): Move from OnXyz callables to Event observer
Fixes #85
2024-04-02 18:01:03 +01:00
rlaphoenix
d9873dac25 fix(HLS): Delete video/audio segments after FFmpeg merge 2024-03-24 22:28:15 +00:00
rlaphoenix
774fec6d77 fix(HLS): Delete subtitle segments as they are merged 2024-03-24 22:27:32 +00:00
rlaphoenix
73d9bc4f94 fix(HLS): Remove save dir even if final merge wasn't needed 2024-03-09 19:44:40 +00:00
rlaphoenix
4d6c72ba30 fix(DASH/HLS): Don't merge folders, skip final merge if only 1 segment 2024-03-09 01:37:55 +00:00
rlaphoenix
423ff289db feat(Track): Allow Track to choose downloader to use
The downloader property must be a Callable of the same signature as the aria2c, curl_impersonate, and requests downloader functions. You can pass it these functions by importing, or a custom function of a matching signature.

Note: The chosen downloader is still overridden with a fallback one when it is the aria2c downloader and the download uses the HTTP Range header.

Closes #70
2024-03-08 16:48:44 +00:00
rlaphoenix
ba801739fe fix(aria2c): Support aria2(c) 1.37.0 by handling upstream regression
From aria2c's changelog (2007-09-02):

```
Now *.aria2 contorol file is first saved to *.aria2__temp and if it is successful, then renamed to *.aria2.
This prevents *.aria2 file from being truncated or corrupted when file system becomes out of space.
```

It seems something went wrong in 1.37.0, resulting in these files sometimes not being renamed back to `.aria2` and then being left there for good. The fix for devine would be to simply detect `.aria2__temp` files and delete them once all segments finish downloading. My only worry here is the root cause of the failed rename. Did the download actually complete without error? According to aria2c's RPC, no errors occurred. There's no way to add support for aria2(c) 1.37.0 without this sort of change, as the files do seem to download correctly regardless of the control file not being renamed and then deleted.
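
A minimal sketch of that cleanup step, assuming all segments of a download live in one directory; `remove_stale_control_files` is a hypothetical name and this is not necessarily how devine implements it.

```python
from pathlib import Path


def remove_stale_control_files(save_dir: Path) -> int:
    """Delete leftover *.aria2__temp files in save_dir; return how many were removed."""
    removed = 0
    for leftover in save_dir.glob("*.aria2__temp"):
        leftover.unlink()
        removed += 1
    return removed
```

It should only be called after every segment has finished downloading, since aria2c may still be writing these files mid-download.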

Fixes #71
2024-03-08 16:15:50 +00:00
rlaphoenix
79506dda75 chore(HLS): Remove commented-out code from get_supported_key()
This is code I forgot to remove while testing the HLS rework, which was released in v3.0.0.
2024-03-08 15:48:39 +00:00
rlaphoenix
e0aa0e37d3 feat(ClearKey): Pass session not proxy str in from_m3u_key method
This reduces the amount of connections being made by quite a bit for playlists that constantly change keys, or have new key data for every single segment (e.g., Pluto sometimes).

It also allows you to pass headers and cookies, while still also being able to supply a proxy.
2024-03-08 15:44:41 +00:00
rlaphoenix
6e8efc3f63 fix(HLS): Use filtered out segment key info
Also simplifies calculation of wanted segment range when decrypting. Instead of storing the starting segment index number with the encryption_data variable, we just grab the first segment that isn't already merged.

Fixes #77
2024-03-04 12:51:00 +00:00
rlaphoenix
90c544966a refactor(Track): Rename extra to data, enforce type as dict
Setting data as a dictionary allows more places in the code (including DASH, HLS, Services, etc.) to get/set what they want by key instead of by index (list/tuple). Tuples or lists were typically used in services because DASH and HLS stored needed data as a tuple and services did not want to interrupt or remove that data, even though it would be fine.
2024-03-01 04:29:45 +00:00
rlaphoenix
fa9db335d6 refactor(Track): Rename Descriptor's M3U & MPD to HLS & DASH 2024-03-01 04:11:52 +00:00
rlaphoenix
97efb59e5f Only decode text direction entities in Sub files (cont.)
Already did this for HLS, but somehow forgot to for DASH and direct URLs.
2024-02-29 22:06:57 +00:00
rlaphoenix
eef397f2e8 HLS: Don't include map data if discontinuity/end of playlist was decrypted
The decrypt() call just before it would have included the map data for us, as it was needed to decrypt. Therefore, it would not need to be added again when merge_discontinuity() is called. In some cases re-adding the map data can cause playback or final merge failure.
2024-02-20 20:12:09 +00:00
rlaphoenix
7f898cf2df HLS: Fix map data exists check when merging segments
`map_data` may resolve Truthy, while `map_data[1]` itself could be None, resulting in `None` being written to the stream.
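
The check can be sketched as follows, assuming `map_data` is a two-item tuple whose second item holds the init bytes (the tuple layout and the name `write_map_data` are assumptions for illustration):

```python
from typing import BinaryIO, Optional, Tuple


def write_map_data(stream: BinaryIO,
                   map_data: Optional[Tuple[object, Optional[bytes]]]) -> bool:
    """Write the init bytes only when both the tuple and its payload exist."""
    # A non-empty tuple is truthy even if its second item is None,
    # so the payload has to be checked separately.
    if map_data and map_data[1] is not None:
        stream.write(map_data[1])
        return True
    return False
```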
2024-02-20 02:14:58 +00:00
rlaphoenix
2635d06d58 Set stop event & mark track failed if new HLS DRM fails to license 2024-02-20 01:46:47 +00:00
rlaphoenix
8de3a95c6b Flush file buffers when merging DASH or HLS segments 2024-02-20 01:35:58 +00:00
rlaphoenix
1259a26b14 Create and use new utility to get file extension from URLs/Paths
Fixes #73
2024-02-19 18:14:50 +00:00
rlaphoenix
9e0515609f HLS: Ignore possible folders when doing naive final merge 2024-02-16 18:41:05 +00:00
rlaphoenix
323577a5fd HLS: Update first segment of EXT-X-KEY state data on discontinuity 2024-02-16 18:21:21 +00:00
rlaphoenix
e26e55caf3 HLS: Don't reset EXT-X-KEY state data on discontinuity 2024-02-16 16:50:12 +00:00
rlaphoenix
506ba0f615 HLS: Only merge relevant segments on discontinuity 2024-02-16 16:49:42 +00:00
rlaphoenix
2388c85894 HLS: Ensure all segments to decrypt in range exist 2024-02-16 16:49:13 +00:00
rlaphoenix
7587243aa2 HLS: Don't decrypt on key change if there were no prior segments 2024-02-16 16:48:38 +00:00
rlaphoenix
6a37fe9d1b HLS: Don't merge on discontinuity, if it's the first segment
How the m3u8 parser handles/groups #EXT-X tags into segment objects means the #EXT-X-DISCONTINUITY (`discontinuity` property) is tied to whatever segment is below its line. Therefore, there's never a scenario where we need to merge+decrypt at the very first segment of the for loop, as there are no segments before it.

This can happen from just slightly off-spec playlists (can't blame it) but also from the OnSegmentFilter filtering out all segments before the first EXT-X-DISCONTINUITY. This commonly happens when filtering out bumpers/intros.
2024-02-16 00:15:36 +00:00
rlaphoenix
837015b4ea HLS: Fix incorrect last segment i when decrypting first segment 2024-02-15 23:44:00 +00:00
rlaphoenix
2b7fc929f6 Rework the HLS downloader, add support for new downloaders
- It now downloads all segment files multi-threaded first before any decryption or merging operations (excluding init data, which will be downloaded in sequence/order after all the segments are downloaded)
- Once all segments are downloaded, it then goes through and does any merging, decryption, init data handling, etc. afterwards.
- Segments are no longer decrypted one by one. If segments use the same EXT-X-KEY data, then they will be merged together and then decrypted. This should see a noticeable speed increase for Widevine DRM.
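
The grouping idea in the last point can be sketched as below. Segments are modelled as hypothetical `(name, key)` tuples, which is an assumption and not the downloader's real data model; consecutive segments sharing a key form one run that could be merged and decrypted together.

```python
from itertools import groupby
from typing import Iterable, List, Optional, Tuple

Segment = Tuple[str, Optional[str]]  # (segment name, key identifier or None)


def group_by_key(segments: Iterable[Segment]) -> List[Tuple[Optional[str], List[str]]]:
    """Group consecutive segments that share the same key into runs."""
    return [
        (key, [name for name, _ in run])
        for key, run in groupby(segments, key=lambda seg: seg[1])
    ]
```

Note that `groupby` only merges adjacent items, so a key that reappears later starts a new run, which keeps segment order intact.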
2024-02-15 17:26:39 +00:00
rlaphoenix
709901176e Use CRC32 instead of MD5 for Track IDs in DASH/HLS 2024-02-15 10:56:51 +00:00
rlaphoenix
bd185126b6 HLS: Skip merging continuity if all segments were skipped
If all segments of a continuity are skipped, e.g. by OnSegmentFilter, then this code fails as the folder wouldn't exist.
2024-02-13 17:03:42 +00:00
rlaphoenix
cd194e3192 Add new Track Event, OnSegmentDownloaded
Like OnDownloaded but called every time a DASH or HLS segment is downloaded. The path to the downloaded segment file is passed to the callable.
2024-02-10 18:10:09 +00:00
rlaphoenix
87779f4e7d Move Track OnDownloaded event before decryption 2024-02-10 18:05:35 +00:00
rlaphoenix
c18fe5706b Pass DRM and Segment objects to Track OnDecrypted event 2024-02-10 17:48:26 +00:00
rlaphoenix
439e376b38 No longer pass the track through track events
If you are setting a callable onto a track event, then you have access to the track variable, so just include/use that in your lambda/callable.
2024-02-10 17:47:12 +00:00
rlaphoenix
a544b1e867 Merge HLS segments first by discontinuity then via FFmpeg
HLS playlists where each segment is in an mp4 container seem to corrupt when the EXT-X-MAP is changed out, unless you first merge segments by discontinuity and then merge the merges via FFmpeg (which demuxes all the merged segment continuities and then concatenates them together, probably giving it new init data too).
2024-02-09 08:33:17 +00:00
rlaphoenix
167b45475e Only decode text direction entities in Sub files
Previously, all entities were decoded in Subtitle files because of a problem with SubtitleEdit and its /ReverseRtlStartEnd option not being entity-aware.

It actually ends up reversing the `;` of `&rlm;`, instead of the actual value of `&rlm;`. Therefore, I decoded all entities before SubtitleEdit could have processed the Subtitle, but this has caused problems with more advanced formats like TTML and WebVTT as `&lt;` would decode to `<` causing syntax errors, among other problematic characters.

According to the TTML and WebVTT specs, HTML entity encoding is allowed, and that makes sense or you wouldn't be able to use `<` etc. Any failure of a player to show the decoded character would be a player problem and out of scope for Devine.
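
To illustrate the difference, here is a small sketch that decodes only the direction entities and leaves everything else escaped. It assumes the relevant entities are `&lrm;` and `&rlm;`; the helper name is hypothetical.

```python
import html

# Assumed set of text direction entities and the characters they stand for
DIRECTION_ENTITIES = {
    "&lrm;": "\u200e",  # LEFT-TO-RIGHT MARK
    "&rlm;": "\u200f",  # RIGHT-TO-LEFT MARK
}


def decode_direction_entities(text: str) -> str:
    """Decode only direction entities; leave `&lt;`, `&amp;`, etc. untouched."""
    for entity, char in DIRECTION_ENTITIES.items():
        text = text.replace(entity, char)
    return text


cue = "&rlm;a &lt; b"
selective = decode_direction_entities(cue)  # `&lt;` stays escaped
everything = html.unescape(cue)             # `&lt;` becomes a raw `<`
```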
2024-02-05 12:37:21 +00:00
rlaphoenix
2056e056a4 Unescape HTML Entities in Subtitles after Downloading
This fixes some Subtitles having e.g., `&amp;` instead of just `&`, but especially for special entities like `&rlm;` which enables Right-to-Left mode on Hebrew and Arabic Subtitles.
2024-01-18 16:25:39 +00:00
rlaphoenix
e8e3d4a90f Remove 5-attempt loop from DASH and HLS Downloads
These are unnecessary now as all downloaders have retry functionality built-in.
2024-01-09 13:00:39 +00:00
rlaphoenix
cc4900a2ed Remove uses of the downloader's silent arg in DASH and HLS
This was originally done to prevent *all* aria2c logs unless on the last attempt, at which point, if all attempts had failed, it would let aria2c log the error.

However, that's bad practice as aria2c may produce errors or warnings on say the 3rd attempt, and the 3rd attempt may have otherwise succeeded, with warnings or errors. It also generally shouldn't be necessary.
2024-01-09 12:54:27 +00:00
rlaphoenix
fa3cee11b7 Move Download Cancel/Skip Events to constants 2024-01-09 11:55:05 +00:00
rlaphoenix
ce457df151 Change wording from Download Stopped to Download Cancelled 2024-01-09 11:38:58 +00:00
rlaphoenix
d566aa2547 Show Licensing and Licensed Messages via Rich 2024-01-09 11:34:14 +00:00
rlaphoenix
f28a6dc28a Fix usage of __all__ 2024-01-09 02:31:22 +00:00
rlaphoenix
c0d940b17b Remove Track.needs_proxy
Ok, so there's a few reasons this was done.

1) Design-wise it isn't valid to have --proxy (or via config/otherwise) set a proxy, then unpredictably have it bypassed or disabled. If I specify `--proxy 127.0.0.1:8080`, I would expect it to use that proxy for all communication indefinitely, not switch in and out depending on the track or service.

2) With reason 1, it's also a security problem. The only reason I implemented it in the first place was so I could download faster on my home connection. This means I would authenticate and call APIs under a proxy, then suddenly download manifests, segments, etc. under my home connection. A competent service could see that as an indicator of foul play and flag you.

3) Maintaining this setup across the codebase is extremely annoying, especially because of how proxies are set up/used by Requests in the Session. There's no way to tell a requests session to temporarily disable the proxy and turn it back on later without having to get the proxy from the session (in an annoying way), store it, remove it, make the calls, and then, assuming you're still in the same function, add it back. If you're not in the same function, well, time for some spaghetti code.

---

tldr; -1 ux/design/expectations with CLI, -1 security aspect, -1 code maintenance, but only +1 for potentially increased download speeds in certain scenarios.
2023-12-29 20:25:57 +00:00
rlaphoenix
7cec16d8ab Validate track languages in HLS.to_tracks 2023-12-02 22:40:41 +00:00
Shivelight
c31ee338dc
Add option for automatic subtitle character encoding normalization (#68)
* Add option for automatic subtitle character encoding normalization

The rationale behind this function is that some services use ISO-8859-1
(latin1) or Windows-1252 (CP-1252) instead of UTF-8 encoding, whether
intentionally or accidentally. Some services even stream subtitles with
malformed/mixed encoding (each segment has a different encoding).

* Remove Subtitle parameter `auto_fix_encoding`

Just always attempt to fix encoding. If the subtitle is neither UTF-8 nor CP-1252, then it should realistically error out instead of producing garbage Subtitle data anyway.

* Move Subtitle encoding fixing code out of if drm tree

* Use chardet as a last ditch effort fixing Subs, or return original data

* Move Subtitle.fix_encoding method to utilities as try_ensure_utf8

* Add Shivelight as a contributor

---------

Co-authored-by: rlaphoenix <rlaphoenix@pm.me>
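
A stdlib-only sketch of the fallback order described in the points above: keep valid UTF-8, otherwise try CP-1252, otherwise return the original data. The chardet step is left out here, and `to_utf8_best_effort` is a hypothetical name rather than the utility added by this commit.

```python
def to_utf8_best_effort(data: bytes) -> bytes:
    """Return data as UTF-8 where possible, else the original bytes."""
    try:
        data.decode("utf-8")
        return data  # already valid UTF-8
    except UnicodeDecodeError:
        pass
    try:
        # CP-1252 covers most of ISO-8859-1's printable range as well
        return data.decode("cp1252").encode("utf-8")
    except UnicodeDecodeError:
        return data  # give up and hand back the original data
```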
2023-12-02 11:00:55 +00:00
rlaphoenix
4b8cfabaac Fix all Ruff and isort linter errors 2023-12-02 09:57:13 +00:00
rlaphoenix
6cfbaa7db1 Pass cookies to the aria2c and requests downloaders
For aria2c I've simplified the operation by offloading most of the work for creating a cookie header by just re-doing what Python-requests does. This results in the exact same cookies Python-requests would have used in a requests.get() call or such. It supports multiple of the same-name cookies under different domains/paths based on the URI of the mock request.
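
The idea can be approximated with the standard library alone: build a mock request for the target URI and let the cookie jar decide which cookies apply. This is a sketch under that assumption, not the code from this commit; `make_cookie` and `cookie_header_for` are hypothetical helpers.

```python
from http.cookiejar import Cookie, CookieJar
from typing import Optional
from urllib.request import Request


def make_cookie(name: str, value: str, domain: str, path: str = "/") -> Cookie:
    """Build a simple session cookie scoped to a domain and path."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path=path, path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def cookie_header_for(jar: CookieJar, url: str) -> Optional[str]:
    """Return the Cookie header value the jar would send to `url`, if any."""
    mock_request = Request(url)
    jar.add_cookie_header(mock_request)  # applies domain/path matching rules
    return mock_request.get_header("Cookie")
```

The resulting string could then be passed to an external downloader as a plain `Cookie` header.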
2023-05-29 22:23:39 +01:00
rlaphoenix
fd52073605 Skip merging of HLS segments if --skip-dl is used
Partially fixes #61
2023-05-27 20:20:07 +01:00
rlaphoenix
df2f9b85ae Use urljoin instead of an if check and + op in HLS
This was used even before devine was public, but it was constantly changed back and forth between a urljoin(), another form of urljoin (something custom or something I can't remember), and an if check + addition.

However, I can confirm that a simple if check will not work as the Base URI might not even be in the same relative root. The if checks have also been inconsistent with some checking if it starts with http(s)://, and some checking if it does not have the base URI at the start of the string.

The if check method does not work as well as a urljoin() has the potential to. Switching to urljoin() also fixes some services, as some HLS playlists would have the m3u8 URL on a completely different root, subdomain, or even domain, causing it to completely break when trying to download segments.
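
For illustration, the standard library's `urllib.parse.urljoin` handles each of these cases without any prefix checks. The URLs and the helper name `resolve_segment_uri` are made up for the example.

```python
from urllib.parse import urljoin


def resolve_segment_uri(playlist_url: str, segment_uri: str) -> str:
    """Resolve a segment URI from a playlist against the playlist's own URL."""
    return urljoin(playlist_url, segment_uri)


playlist = "https://cdn.example.com/hls/show/master.m3u8"

relative = resolve_segment_uri(playlist, "seg_001.ts")
rooted = resolve_segment_uri(playlist, "/other/seg_001.ts")
absolute = resolve_segment_uri(playlist, "https://media.example.net/seg_001.ts")
```

A relative URI is resolved next to the playlist, a root-relative one keeps only the scheme and host, and an absolute URI on another domain is returned unchanged.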
2023-05-21 00:06:30 +01:00