17 Commits

Author SHA1 Message Date
rlaphoenix
e76bc7201d Add convert() method to Subtitle class 2024-01-12 00:50:27 +00:00
rlaphoenix
f4d8bc8dd0 Add support for parsing SubRip (SRT) in Subtitle.parse() 2024-01-12 00:37:22 +00:00
rlaphoenix
14ebe4ee1b Ensure input is UTF-8 when parsing TTML and WebVTT Subtitles
This fixes some conversion errors when working with non-Latin scripts like Russian (Cyrillic) and Arabic.
2024-01-12 00:36:43 +00:00
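
The idea in the commit above can be sketched as follows. This is a minimal illustration and not the code from the commit: the function name `ensure_utf8` and the fallback encodings are assumptions, and the fallback order is only a guess, because single-byte codecs such as cp1251 accept almost any input.

```python
from typing import Iterable


def ensure_utf8(data: bytes, fallbacks: Iterable[str] = ("cp1251", "cp1256")) -> bytes:
    """Return `data` re-encoded as UTF-8 (hypothetical helper, not the project's API)."""
    try:
        # "utf-8-sig" also strips a leading byte order mark if one is present.
        text = data.decode("utf-8-sig")
    except UnicodeDecodeError:
        for encoding in fallbacks:
            try:
                text = data.decode(encoding)
                break
            except UnicodeDecodeError:
                continue
        else:
            # Nothing fit: keep what can be read and mark the rest as U+FFFD.
            text = data.decode("utf-8", errors="replace")
    return text.encode("utf-8")
```

Running this before handing the data to a TTML or WebVTT parser means the parser only ever sees UTF-8. A real implementation would more likely use a charset detection library than a fixed fallback list.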
rlaphoenix
96f1cbb260 Remove empty caption lists post-parsing in Subtitle.parse()
This issue is common with Now TV, where the subtitle for some reason parses into "two" languages, "en" and "eng". This results in one empty caption list and one non-empty caption list. The empty caption list tends to be first.

This issue causes snowballing problems further down the codebase. For example, converting to SRT produces a "MULTI-LANGUAGE SRT" header, which most programs, like mkvmerge, do not recognize, causing a mux failure.
2024-01-12 00:30:52 +00:00
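
A minimal sketch of the cleanup described above. It models the parsed result as a plain mapping from language code to caption list, which is an assumption made to keep the example self-contained; the commit itself works on the object returned by the parser, and `drop_empty_languages` is a hypothetical name.

```python
from typing import Dict, List


def drop_empty_languages(captions_by_lang: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return a copy that keeps only the languages which actually have captions."""
    return {lang: caps for lang, caps in captions_by_lang.items() if caps}


# The Now TV case from the commit message: "en" is empty, "eng" holds the captions.
parsed = {"en": [], "eng": ["Hello.", "How are you?"]}
cleaned = drop_empty_languages(parsed)
```

With only one language left, a later SRT conversion has no reason to treat the subtitle as multi-language.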
rlaphoenix
9683c34337 Improve readability of Subtitle.parse() method 2024-01-12 00:27:19 +00:00
rlaphoenix
f28a6dc28a Fix usage of __all__ 2024-01-09 02:31:22 +00:00
rlaphoenix
53de34da51 Add remove_multi_lang_srt_header() method to Subtitle class 2023-12-29 16:39:45 +00:00
rlaphoenix
e7e18a4204 Use same output subtitle format as input codec to SubtitleEdit calls 2023-12-29 16:39:45 +00:00
rlaphoenix
7cc7227f8c Specify utf8 with SubtitleEdit when stripping hearing impaired 2023-12-29 16:02:10 +00:00
rlaphoenix
d369e6134c Add function to fix Start/End Chars on Subtitles 2023-05-30 20:22:40 +01:00
rlaphoenix
96aa7c1e0a Fix segmented vtt merging code
This got 'broken' after moving to my fork of pymp4, because my fork has commits by TrueDread that add support for the vttc, payl, and sttg boxes. Those boxes therefore no longer contain `data` fields but rather specifically parsed fields. I also no longer need to parse the data stream of vttc boxes, as they are already parsed as `children`.
2023-04-04 20:15:18 +01:00
rlaphoenix
f4a9d6c0b1 Replace negative size values in TTML text with 0
Negative size values are not allowed by the spec basically anywhere in the document. Some services seem to accidentally specify a negative value, which puts pycaption on the fritz.
2023-03-17 19:28:55 +00:00
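
One way to do the replacement described above is a regular expression over the raw TTML text. This is a sketch under assumptions: the pattern, the list of units, and the name `clamp_negative_sizes` are illustrative and are not taken from the commit.

```python
import re

# A minus sign, a number, and a length unit. The lookbehind avoids touching
# hyphenated identifiers such as "region-1c"; the lookahead avoids matching
# only the first letter of a longer word.
_NEGATIVE_SIZE = re.compile(r"(?<![\w.])-\d+(?:\.\d+)?(px|em|%|c|pt)(?!\w)")


def clamp_negative_sizes(ttml: str) -> str:
    """Replace every negative size value in `ttml` with 0, keeping its unit."""
    return _NEGATIVE_SIZE.sub(r"0\1", ttml)


fixed = clamp_negative_sizes('<region tts:origin="-10% 5%" tts:padding="-2.5px 3px"/>')
```

Positive values and the rest of the markup are left untouched, so the call is safe to run on every TTML document before parsing.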
rlaphoenix
41018d4574 Don't absorb error messages on Caption Syntax Errors 2023-03-17 18:56:53 +00:00
rlaphoenix
6e888a095e Silence SubtitleEdit when stripping SDH 2023-03-16 20:49:23 +00:00
rlaphoenix
42aaa03941 Completely rewrite downloading system
The new system now downloads and decrypts segments individually instead of downloading all segments, merging them, and then decrypting. Overall the download system now acts more like a normal player.

This fixes #23 as the new HLS download system detects changes in keys and init segments as segments are downloaded. DASH still only supports one period, and one period only, but hopefully I can change that in the future.

Downloading code is now also moved from the Track classes to the manifest classes. Download progress is now also actually helpful for segmented downloads (all HLS, and most DASH streams). It uses TQDM to show a progress bar based on how many segments it needs to download, and how fast it downloads them.

There's only one downside currently. Downloading of segmented videos no longer has the benefit of aria2c's -j parameter, which lets it download n URLs concurrently. Aria2c is still used, but only -x and -s are going to make a difference.

In the future I will make HLS and DASH download in a multi-threaded way, sort of a manual version of -j.
2023-02-21 06:00:39 +00:00
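
The per-segment approach described above can be illustrated with a small sketch that walks a segment list and flags where the key or init segment changes. The `Segment` type, its fields, and `iter_with_changes` are hypothetical names for illustration only; they are not the project's classes, and the actual download and decryption steps are left out.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple


@dataclass(frozen=True)
class Segment:
    uri: str
    key_uri: Optional[str] = None   # None means the segment is not encrypted
    init_uri: Optional[str] = None  # None means no init segment applies


def iter_with_changes(segments: Iterable[Segment]) -> Iterator[Tuple[Segment, bool, bool]]:
    """Yield (segment, key_changed, init_changed) in playlist order."""
    last_key: object = object()   # sentinel that never equals a real value
    last_init: object = object()
    for segment in segments:
        yield segment, segment.key_uri != last_key, segment.init_uri != last_init
        last_key, last_init = segment.key_uri, segment.init_uri


playlist = [
    Segment("seg1.m4s", key_uri="key-a", init_uri="init-1.mp4"),
    Segment("seg2.m4s", key_uri="key-a", init_uri="init-1.mp4"),
    Segment("seg3.m4s", key_uri="key-b", init_uri="init-1.mp4"),
]
flags = [(key, init) for _, key, init in iter_with_changes(playlist)]
```

A downloader built on this would fetch a new key or init segment whenever the matching flag is true, then decrypt that one segment before moving on. Merging every segment first and decrypting afterwards cannot do this, because the merged file no longer records which key applied to which segment.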
rlaphoenix
4b5a2c703b Fix subtitle conversion error where WEBVTT header is kept
This happened because the WEBVTT headers of the segments were appended to each other without enough newline separation, so pycaption treated the header as an actual caption and kept it.
2023-02-11 22:17:43 +00:00
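
A minimal sketch of merging segmented WebVTT text so that repeated headers cannot be mistaken for captions. It is an assumption-based illustration and not the fix from the commit: `merge_vtt_segments` is a hypothetical name, and this version drops each segment's header block entirely, including any header metadata, and writes a single header at the top.

```python
import re
from typing import Iterable


def merge_vtt_segments(segments: Iterable[str]) -> str:
    """Join WebVTT segments into one document with a single WEBVTT header."""
    blocks = []
    for segment in segments:
        # Normalise line endings and drop a leading byte order mark, if any.
        text = segment.replace("\r\n", "\n").lstrip("\ufeff").strip()
        # Blocks in WebVTT are separated by one or more blank lines.
        for block in re.split(r"\n\s*\n", text):
            block = block.strip()
            if block and not block.startswith("WEBVTT"):
                blocks.append(block)
    # A blank line between every block keeps cues and header clearly apart.
    return "WEBVTT\n\n" + "\n\n".join(blocks) + "\n"


merged = merge_vtt_segments([
    "WEBVTT\n\n00:00:01.000 --> 00:00:02.000\nHello\n",
    "WEBVTT\n\n00:00:03.000 --> 00:00:04.000\nWorld\n",
])
```

Separating every block with a blank line is the important part: a parser then sees exactly one header followed by cue blocks.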
rlaphoenix
7fd87b8aa2 Initial commit 2023-02-06 02:41:29 +00:00