90 Commits

Author SHA1 Message Date
rlaphoenix
10285c3819 feat(dl): Add *new* --workers to set download threads/workers
The previously named --workers which is now --downloads specified how many tracks to download, not how many threads/workers are used per-download.

It defaults to nothing, which each downloader then has their own defaults. All current downloaders though currently default to `min(32, (os.cpu_count() or 1) + 4)`, which is also the default for `ThreadPoolExecutor` in general.

This also brings a side effect of changing DASH and HLS's forced max_workers of 16 to now a more appropriate default but more importantly actually configurable. You can set a default in your config under `dl.workers`.
2024-04-03 00:58:47 +01:00
rlaphoenix
5a12cb33e2 refactor(Track): Move from OnXyz callables to Event observer
Fixes #85
2024-04-02 18:01:03 +01:00
rlaphoenix
35501bdb9c fix(DASH): Fix merge regression from recent commit
An else tree was used in 4d6c72ba30 when it shouldn't have been.

Fixes #81
2024-03-09 17:52:50 +00:00
rlaphoenix
4d6c72ba30 fix(DASH/HLS): Don't merge folders, skip final merge if only 1 segment 2024-03-09 01:37:55 +00:00
rlaphoenix
423ff289db feat(Track): Allow Track to choose downloader to use
The downloader property must be a Callable of the same signature as the aria2c, curl_impersonate, and requests downloader functions. You can pass it these functions by importing, or a custom function of a matching signature.

Note: It will still override the chosen downloader and use a fallback one in the case of using aria2c downloader but the download uses the HTTP Range header.

Closes #70
2024-03-08 16:48:44 +00:00
rlaphoenix
ba801739fe fix(aria2c): Support aria2(c) 1.37.0 by handling upstream regression
From aria2c's changelog (2007-09-02):

```
Now *.aria2 contorol file is first saved to *.aria2__temp and if it is successful, then renamed to *.aria2.
This prevents *.aria2 file from being truncated or corrupted when file system becomes out of space.
```

It seems something went wrong in 1.37.0 resulting in these files sometimes not being renamed back to `.aria2` and then being left there for good. The fix for devine would be to simply detect `.aria2__temp` and delete them once all segments finish downloading. My only worry here is the root cause for why it has failed to rename. Did the download actually complete without error? According to aria2c's RPC, no errors occurred. There's no way to add support for Aria2(c) 1.37.0 without this sort of change as the files to seem to download correctly regardless of the file not being renamed and then deleted.

Fixes #71
2024-03-08 16:15:50 +00:00
rlaphoenix
c516f54a07 refactor(DASH): Change how Video FPS is gotten to remove FutureWarning log 2024-03-01 05:15:47 +00:00
rlaphoenix
289808b80c refactor(DASH): Move data values from track url to track data property 2024-03-01 05:08:59 +00:00
rlaphoenix
90c544966a refactor(Track): Rename extra to data, enforce type as dict
Setting data as a dictionary allows more places of code (including DASH, HLS, Services, etc) to get/set what they want by key instead of typically by index (list/tuple). Tuples or lists were typically in services because DASH and HLS stored needed data as a tuple and services did not want to interrupt or remove that data, even though it would be fine.
2024-03-01 04:29:45 +00:00
rlaphoenix
fa9db335d6 refactor(Track): Rename Descriptor's M3U & MPD to HLS & DASH 2024-03-01 04:11:52 +00:00
rlaphoenix
97efb59e5f Only decode text direction entities in Sub files (cont.)
Already did this for HLS, but somehow forgot to for DASH and direct URLs.
2024-02-29 22:06:57 +00:00
rlaphoenix
b829ea5c5e DASH: Detect SDH subtitles via AudioPurposeCS:2007=2 2024-02-20 19:29:21 +00:00
rlaphoenix
8de3a95c6b Flush file buffers when merging DASH or HLS segments 2024-02-20 01:35:58 +00:00
rlaphoenix
c826a702ab DASH: Fix URL concatenation in some edge cases
In some of the urljoin()'s it would end with `/None`, e.g., `http://.../some_base_value/None`, when it should just join with the base value only.
2024-02-19 17:45:40 +00:00
rlaphoenix
1f11ed258b DASH: Update progress bar when merging segments 2024-02-15 20:06:42 +00:00
rlaphoenix
e8b07bf03a DASH: Don't set Range Header if no bytes range value
This caused a HTTP 501 Not Implemented on some CDNs.
2024-02-15 19:10:52 +00:00
rlaphoenix
a1ed083b74 Add support for the new Downloaders to DASH 2024-02-15 17:26:39 +00:00
rlaphoenix
709901176e Use CRC32 instead of MD5 for Track IDs in DASH/HLS 2024-02-15 10:56:51 +00:00
rlaphoenix
cd194e3192 Add new Track Event, OnSegmentDownloaded
Like OnDownloaded but called every time a DASH or HLS segment is downloaded. The path to the downloaded segment file is passed to the callable.
2024-02-10 18:10:09 +00:00
rlaphoenix
87779f4e7d Move Track OnDownloaded event before decryption 2024-02-10 18:05:35 +00:00
rlaphoenix
c18fe5706b Pass DRM and Segment objects to Track OnDecrypted event 2024-02-10 17:48:26 +00:00
rlaphoenix
439e376b38 No longer pass the track through track events
If you are setting a callable onto a track event, then you have access to the track variable, so just include/use that in your lambda/callable.
2024-02-10 17:47:12 +00:00
rlaphoenix
3b62b50e25 Add support for SegmentBase and BaseURL-only DASH Manifests 2024-02-05 10:22:40 +00:00
rlaphoenix
2affb62ad0 Fix SegmentList source/media join with Base URL in DASH download_track() 2024-02-03 05:26:52 +00:00
rlaphoenix
e9dc53735c Fix BaseURLs starting with ../ in DASH download_track() 2024-01-29 03:26:15 +00:00
rlaphoenix
2056e056a4 Unescape HTML Entities in Subtitles after Downloading
This fixes some Subtitles having e.g., `&` instead of just `&`, but especially for special entities like `‏` which enables Right-to-Left mode on Hebrew and Arabic Subtitles.
2024-01-18 16:25:39 +00:00
rlaphoenix
e8e3d4a90f Remove 5-attempt loop from DASH and HLS Downloads
These are unnecessary now as all downloaders have retry functionality built-in.
2024-01-09 13:00:39 +00:00
rlaphoenix
cc4900a2ed Remove uses of the downloader's silent arg in DASH and HLS
This was originally done to prevent *all* aria2c logs unless on the last attempt, at which if it failed all attempts it would let aria2c log the error.

However, that's bad practice as aria2c may produce errors or warnings on say the 3rd attempt, and the 3rd attempt may have otherwise succeeded, with warnings or errors. It also generally shouldn't be necessary.
2024-01-09 12:54:27 +00:00
rlaphoenix
fa3cee11b7 Move Download Cancel/Skip Events to constants 2024-01-09 11:55:05 +00:00
rlaphoenix
ce457df151 Change wording from Download Stopped to Download Cancelled 2024-01-09 11:38:58 +00:00
rlaphoenix
d566aa2547 Show Licensing and Licensed Messages via Rich 2024-01-09 11:34:14 +00:00
rlaphoenix
f28a6dc28a Fix usage of __all__ 2024-01-09 02:31:22 +00:00
rlaphoenix
c0d940b17b Remove Track.needs_proxy
Ok, so there's a few reasons this was done.

1) Design-wise it isn't valid to have --proxy (or via config/otherwise) set a proxy, then unpredictably have it bypassed or disabled. If I specify `--proxy 127.0.0.1:8080`, I would expect it to use that proxy for all communication indefinitely, not switch in and out depending on the track or service.

2) With reason 1, it's also a security problem. The only reason I implemented it in the first place was so I could download faster on my home connection. This means I would authenticate and call APIs under a proxy, then suddenly download manifests and segments e.t.c under my home connection. A competent service could see that as an indicator of bad play and flag you.

3) Maintaining this setup across the codebase is extremely annoying, especially because of how proxies are setup/used by Requests in the Session. There's no way to tell a request session to temporarily disable the proxy and turn it back on later, without having to get the proxy from the session (in an annoying way) store it, then remove it, make the calls, then assuming your still in the same function you can add it back. If you're not in the same function, well, time for some spaghetti code.

---

tldr; -1 ux/design/expectations with CLI, -1 security aspect, -1 code maintenance, but only +1 for potentially increased download speeds in certain scenarios.
2023-12-29 20:25:57 +00:00
rlaphoenix
e87de50940 Exclude fragmented Sub Codecs from DASH UTF-8 checks
Chardet was detecting a mixture of mostly cp1252 and MacRoman encoding, where it should just be left as-is when parsing. The actual text within it perhaps may want to go through `try_ensure_utf8` when parsed, but not the entire box.
2023-12-02 17:44:47 +00:00
Shivelight
c31ee338dc
Add option for automatic subtitle character encoding normalization (#68)
* Add option for automatic subtitle character encoding normalization

The rationale behind this function is that some services use ISO-8859-1
(latin1) or Windows-1252 (CP-1252) instead of UTF-8 encoding, whether
intentionally or accidentally. Some services even stream subtitles with
malformed/mixed encoding (each segment has a different encoding).

* Remove Subtitle parameter `auto_fix_encoding`

Just always attempt to fix encoding. If the subtitle is neither UTF-8 nor CP-1252, then it should realistically error out instead of producing garbage Subtitle data anyway.

* Move Subtitle encoding fixing code out of if drm tree

* Use chardet as a last ditch effort fixing Subs, or return original data

* Move Subtitle.fix_encoding method to utilities as try_ensure_utf8

* Add Shivelight as a contributor

---------

Co-authored-by: rlaphoenix <rlaphoenix@pm.me>
2023-12-02 11:00:55 +00:00
rlaphoenix
4b8cfabaac Fix all Ruff and isort linter errors 2023-12-02 09:57:13 +00:00
rlaphoenix
f3cfaa3ab3 Fix DASH FPS error when SegmentBase is not found 2023-07-15 18:08:01 +01:00
rlaphoenix
6cfbaa7db1 Pass cookies to the aria2c and requests downloaders
For aria2c I've simplified the operation by offloading most of the work for creating a cookie header by just re-doing what Python-requests does. This results in the exact same cookies Python-requests would have used in a requests.get() call or such. It supports multiple of the same-name cookies under different domains/paths based on the URI of the mock request.
2023-05-29 22:23:39 +01:00
rlaphoenix
8ada6165e3 Set stop event & mark track failed if DASH DRM fails to license 2023-05-19 19:07:35 +01:00
rlaphoenix
3e0b7ef200 Fix regression where Range header is accidentally kept and re-used 2023-05-19 00:35:46 +01:00
rlaphoenix
dd64212ad2 Move download_segment() from DASH/HLS download_track() to Class
Various overall small readability improvements have also been made.
2023-05-17 03:20:01 +01:00
rlaphoenix
03c012f88e Move the Downloaded msg after Decrypt mgs in DASH/URL downloads 2023-05-17 02:09:16 +01:00
rlaphoenix
6cdde3efb0 Override the downloader more efficiently in DASH/HLS when Range is used 2023-05-17 01:33:06 +01:00
rlaphoenix
6d4be8620c Only write segment data if the tfhd fix was necessary in DASH 2023-05-17 01:22:59 +01:00
rlaphoenix
681d69d5e5 Mark DASH and URL tracks as Decrypting when using shaka
DASH and normal URL downloads now both decrypt one large single or merged file after all downloads are finished. This leaves a bit of a "pause" between progress bar movement which looks a bit odd. So mark the track as being in a Decrypting state.
2023-05-16 22:01:07 +01:00
rlaphoenix
a45c784569 Replace download speeds with "Downloaded" text when finished 2023-05-16 21:59:03 +01:00
rlaphoenix
2a8307b98d Decrypt DASH downloads after merging all segments
Since DASH doesn't have the ability to change keys dynamically per-track (Representation), there's no need for the DASH downloader to decrypt segments as they are downloaded (like HLS).

This halves the amount of processes needing to be opened as well as the I/O usage. It may result in noticeably lower CPU usage. Since the IOPS is lowered, you may even see an increase in download speed if downloading to something like a meh HDD.

This also fixes decryption in some weird edge-cases where decrypting each segment individually resulted in timestamp anomalies causing shaka to fail.
2023-05-16 21:55:53 +01:00
rlaphoenix
e7dc138c0f Improve readability and documentation of DASH's to_tracks function 2023-05-15 16:19:53 +01:00
rlaphoenix
cb82febb7c Add ability to choose downloader via config 2023-05-12 06:42:33 +01:00
rlaphoenix
b92708ef45 Alter behaviour of --skip-dl to allow DRM licensing
Most people used --skip-dl just to license the DRM pre-v1.3.0. Which makes sense, --skip-dl is otherwise a pointless feature. I've fixed it so that --skip-dl worked like before, allowing license calls, while still supporting the new per-segment features post-v1.3.0.

Fixes #37
2023-05-11 22:17:41 +01:00