Compare commits

34 Commits

Author SHA1 Message Date
retouching
09eda16882
fix(dl): delete old file after repackage (#114)
* fix(dl): delete old file after repackage

* fix(dl): using original_path instead of self.path in repackage method
2024-06-03 16:57:26 +01:00
rlaphoenix
a95d32de9e chore: Add config to gitignore 2024-05-17 02:29:46 +01:00
rlaphoenix
221cd145c4 refactor(dl): Make Widevine CDM config optional
With this change you no longer have to define/configure a CDM to load. This is something that isn't necessary for a lot of services.

Note: It's also now less hand-holdy in terms of correct config formatting/values. I.e. if you define a cdm by profile for a service slightly incorrectly, say a typo on the service or profile name, it will no longer warn you.
2024-05-17 01:52:45 +01:00
rlaphoenix
0310646cb2 fix(Subtitle): Skip merging segmented WebVTT if only 1 segment 2024-05-17 01:42:44 +01:00
rlaphoenix
3426fc145f fix(HLS): Decrypt AES-encrypted segments separately
We cannot merge all the encrypted AES-128-CBC (ClearKey) segments and then decrypt them in one go because each segment should be padded to a 16-byte boundary in CBC mode.

Since it uses PKCS#5 or #7 style padding (can't remember which), the merged file has a 15 in 16 chance of failing the boundary check. And in the 1 in 16 chance that it passes the boundary check, it will not decrypt properly, as each segment's padding will be treated as actual data rather than padding.
2024-05-17 01:15:37 +01:00
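The padding problem described in this commit can be illustrated without any encryption at all. The sketch below is a minimal model, assuming PKCS#7-style padding on a 16-byte block; `pkcs7_pad` and `pkcs7_unpad` are hypothetical helpers written for this example, not functions from Devine or Cryptodome.

```python
BLOCK = 16  # AES block size in bytes


def pkcs7_pad(data: bytes, block: int = BLOCK) -> bytes:
    """Append n bytes of value n so the length is a multiple of `block`."""
    n = block - len(data) % block
    return data + bytes([n]) * n


def pkcs7_unpad(data: bytes, block: int = BLOCK) -> bytes:
    """Strip the trailing padding, validating it first."""
    n = data[-1]
    if not 1 <= n <= block or data[-n:] != bytes([n]) * n:
        raise ValueError("bad padding")
    return data[:-n]


# Three stand-in "segments", each padded on its own (as a server would do
# before encrypting each segment separately).
segments = [b"first segment", b"second one!", b"third"]
padded = [pkcs7_pad(s) for s in segments]

# Handling each segment separately, then merging, recovers the original data.
per_segment = b"".join(pkcs7_unpad(p) for p in padded)

# Merging first and unpadding once strips only the final segment's padding;
# the padding of every earlier segment stays in the output as if it were data.
merged_first = pkcs7_unpad(b"".join(padded))
```

Here `per_segment` equals the concatenated originals, while `merged_first` still carries the padding bytes of the first two segments.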
rlaphoenix
e57d755837 fix(clearkey): Do not pad data before decryption
This is seemingly unnecessary and simply incorrect at least for two sources (VGTV, and TRUTV).

Without this change it is not possible to correctly merge all segments without at least some problem in the resulting file.
2024-05-17 01:00:11 +01:00
rlaphoenix
03f3fec5cc refactor(dl): Only log errors/warnings from mkvmerge, list after message 2024-05-16 18:12:57 +01:00
rlaphoenix
2acee30e54 fix(utilities): Prevent finding the same box index over and over
Since it removed the data before the found box's index (minus 4), every subsequent loop would find the same box again, but this time at index 4, because all preceding data was removed in the prior loop. Since the index -= 4 code only runs if the index > 4, it never ran on the second loop, and because the data then no longer included the box length, Box.parse failed with an IOError.

This corrects looping through boxes and correctly obtains and parses each box.
2024-05-15 17:54:21 +01:00
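One way to avoid re-finding the same box is to keep the data intact and advance a search offset instead of truncating. The sketch below shows that alternative approach; it is not the change made in this commit. `find_boxes` and `make_box` are hypothetical names, and the sketch assumes the simple box layout of a 4-byte big-endian size followed by a 4-byte type.

```python
import struct


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Build a minimal box: 4-byte big-endian size, 4-byte type, payload."""
    return struct.pack(">I", 8 + len(payload)) + box_type + payload


def find_boxes(data: bytes, box_type: bytes) -> list[bytes]:
    """Return the raw bytes of every box of `box_type` found in `data`."""
    boxes = []
    offset = 0
    while True:
        index = data.find(box_type, offset)
        if index == -1:
            break
        start = index - 4  # the size field sits just before the type
        if start < 0:
            offset = index + 1
            continue
        (size,) = struct.unpack(">I", data[start:start + 4])
        if size < 8 or start + size > len(data):
            # Not a plausible box header, keep scanning past this match.
            offset = index + 1
            continue
        boxes.append(data[start:start + size])
        offset = start + size  # continue after this box, never before it
    return boxes


sample = (
    make_box(b"free", b"xx")
    + make_box(b"pssh", b"AAAA")
    + make_box(b"free", b"")
    + make_box(b"pssh", b"BB")
)
found = find_boxes(sample, b"pssh")
```

Because `offset` only ever increases, each box is visited once and its size field is always still present when it is read.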
rlaphoenix
2e697d93fc fix(dl): Log output from mkvmerge on failure 2024-05-15 14:00:38 +01:00
rlaphoenix
f08402d795 refactor: Warn falling back to requests as aria2c doesn't support Range 2024-05-11 22:59:31 +01:00
rlaphoenix
5ef95e942a fix(DASH): Use SegmentTemplate endNumber if available 2024-05-11 22:15:05 +01:00
rlaphoenix
dde55fd708 fix(DASH): Correct SegmentTemplate range stop value
Since range(start, stop) is start-inclusive but stop-exclusive, and DASH startNumber of SegmentTemplate typically will be 1 or not specified (defaulting to 1) it effectively worked by coincidence.

However, if startNumber was anything other than 1, then we would have a problem.
2024-05-11 22:13:28 +01:00
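The start-inclusive, stop-exclusive behaviour of `range()` mentioned in this commit is easy to check in isolation. The two helpers below are hypothetical and only illustrate the arithmetic; they are not taken from the DASH class.

```python
def numbers_to_end(start_number: int, end_number: int) -> list[int]:
    """Segment numbers from start_number through end_number, inclusive."""
    # range() excludes its stop value, so add 1 to include end_number.
    return list(range(start_number, end_number + 1))


def numbers_for_count(start_number: int, count: int) -> list[int]:
    """Exactly `count` segment numbers, beginning at start_number."""
    return list(range(start_number, start_number + count))


# With startNumber == 1, "last number" and "segment count" are the same value,
# so either helper gives the same answer.
same_when_start_is_one = numbers_to_end(1, 5) == numbers_for_count(1, 5)

# With any other startNumber the two interpretations diverge.
to_end = numbers_to_end(5, 9)
for_count = numbers_for_count(5, 9)
```

This is why a stop value that only happens to be right for a start of 1 can go unnoticed for a long time.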
rlaphoenix
345cc5aba6
Merge pull request #110 from adbbbb/master
Adding Arm64 OSX Shaka support
2024-05-11 20:13:30 +01:00
rlaphoenix
145e7a6c17 docs(contributors): Add adbbbb to Contributor list 2024-05-11 20:13:01 +01:00
Adam
5706bb1417 fix(binaries): Search for Arm64 builds of Shaka-Packager 2024-05-11 20:11:29 +01:00
rlaphoenix
85246ab419
Merge pull request #109 from pandamoon21/master
Fix uppercase letters in the fonts extension - Font attachment
2024-05-11 17:46:04 +01:00
rlaphoenix
71a3a4e2c4 docs(contributors): Add pandamoon21 to Contributor list 2024-05-11 17:45:10 +01:00
pandamoon21
06d414975c fix(Attachment): Check mime-type case-insensitively 2024-05-11 17:43:32 +01:00
rlaphoenix
f419e04fad refactor(Track): Ensure data property is a defaultdict with dict factory
This is so both internal code and service code can save data to sub-keys without the parent keys needing to exist.

A doc-string is now set to the data property denoting some keys as reserved as well as their typing and meaning.

This also fixes a bug introduced in v3.3.3 where it will fail to download tracks without the "hls" key in the data property. This can happen when manually making Audio tracks using the HLS descriptor, and not putting any of the hls data the HLS class sets in to_tracks().
2024-05-09 15:15:22 +01:00
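The benefit of a `defaultdict` with a `dict` factory can be seen with the standard library alone. This is a minimal sketch; the variable names and the sample durations are made up for illustration and the real Track class is not involved.

```python
from collections import defaultdict

# Sub-keys can be written or read without creating the parent key first.
track_data = defaultdict(dict)
track_data["hls"]["segment_durations"] = [6, 6, 4]
timescale = track_data["dash"].get("timescale")  # missing, but no KeyError

# A plain dict raises as soon as the parent key is missing.
plain_data = {}
try:
    plain_data["hls"]["segment_durations"] = [6, 6, 4]
    plain_failed = False
except KeyError:
    plain_failed = True
```

Note that merely reading `track_data["dash"]` creates an empty `"dash"` entry, which is the usual trade-off of `defaultdict`.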
rlaphoenix
50d6f3a64d docs(changelog): Add v3.3.3 Changes 2024-05-07 07:10:20 +01:00
rlaphoenix
259434b59d docs(version): Bump to v3.3.3 2024-05-07 07:10:02 +01:00
rlaphoenix
7df8be46da build(poetry): Update dependencies
We can remove explicit dependency on language-data and marisa-trie because langcodes v3.3.0 now depends on language-data 1.2.0 and language-data 1.2.0 now depends on marisa-trie 1.1.0.
2024-05-07 07:06:22 +01:00
rlaphoenix
7aa797a4cc
Merge pull request #67 from Shivelight/feature/fix-webvtt-timestamp
Correct timestamps when merging fragmented WebVTT
2024-05-07 06:54:42 +01:00
Shivelight
0ba45decc6 fix(Subtitle): Correct timestamps when merging fragmented WebVTT
This applies the X-TIMESTAMP-MAP data to timestamps as it reads through a concatenated (merged) WebVTT file to correct timestamps on segmented WebVTT streams. It then removes the X-TIMESTAMP-MAP header.

The timescale and segment duration information is saved in the Subtitle's data dictionary under the hls/dash key: timescale (dash-only) and segment_durations. Note that this information will only be available post-download.

This is done regardless of whether you are converting to another subtitle format, since the downloader automatically and forcefully concatenates the segmented subtitle data. We do not support the use of segmented Subtitles for downloading or otherwise, nor do we plan to.
2024-05-06 18:18:23 +01:00
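As a rough illustration of the header this commit works with: in HLS, an `X-TIMESTAMP-MAP` line maps a local cue time to an MPEG-2 timestamp on a 90 kHz clock. The sketch below only turns such a header into an offset in seconds. The function names are hypothetical, and the actual merge logic, which also uses the segment durations and timescale, is not reproduced here.

```python
import re

MPEG_TS_CLOCK = 90_000  # MPEG-2 timestamps tick at 90 kHz


def parse_vtt_time(value: str) -> float:
    """Convert 'HH:MM:SS.mmm' or 'MM:SS.mmm' to seconds."""
    parts = value.split(":")
    seconds = float(parts[-1])
    minutes = int(parts[-2])
    hours = int(parts[-3]) if len(parts) == 3 else 0
    return hours * 3600 + minutes * 60 + seconds


def timestamp_map_offset(header: str) -> float:
    """Seconds by which cue times are shifted, per an X-TIMESTAMP-MAP line."""
    mpegts = re.search(r"MPEGTS:(\d+)", header)
    local = re.search(r"LOCAL:([\d:.]+)", header)
    if not mpegts or not local:
        raise ValueError("not an X-TIMESTAMP-MAP header")
    return int(mpegts.group(1)) / MPEG_TS_CLOCK - parse_vtt_time(local.group(1))


offset = timestamp_map_offset("X-TIMESTAMP-MAP=MPEGTS:900000,LOCAL:00:00:00.000")
```

The two fields may appear in either order in the header, which is why each is matched separately.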
rlaphoenix
af95ba062a refactor(env): Shorten paths on Windows with env vars 2024-04-24 05:56:05 +01:00
rlaphoenix
3bfd96d53c fix(dl): Automatically convert TTML Subs to WebVTT for MKV support 2024-04-24 05:35:24 +01:00
rlaphoenix
f23100077e refactor(dl): Improve readability of download worker errors
Now it will no longer print the full traceback for errors caused by a missing binary file. Other errors still include it and now explicitly label them as unexpected. CalledProcessError handling is now merged with all non-environment related errors and explicitly mentions that a binary call failed.
2024-04-24 05:28:10 +01:00
rlaphoenix
fd64e6acf4 refactor(utilities): Remove get_binary_path, use binaries.find instead
The function now located at core/binaries should only be used by services to find a specific binary not listed in there already, or if the name of the binary needed by your service differs.
2024-04-24 05:10:34 +01:00
rlaphoenix
677fd9c56a feat(binaries): Move all binary definitions to core/binaries file
This simplifies and centralizes all definitions on where these binaries can be found to a singular reference, making it easier to modify, edit, and improve.
2024-04-24 05:07:25 +01:00
rlaphoenix
9768de8bf2 feat(env): List possible config path locations when not found 2024-04-19 19:28:15 +01:00
rlaphoenix
959b62222e fix(env): List all directories as table in info 2024-04-19 19:27:33 +01:00
rlaphoenix
c101136d55 refactor(Config): Move possible config paths out of func to constant 2024-04-19 19:23:56 +01:00
rlaphoenix
4f1dfd7dd1 refactor(curl-impersonate): Update the default browser to chrome124 2024-04-18 09:50:17 +01:00
rlaphoenix
c859465af2 refactor(curl-impersonate): Remove manual fix for curl proxy SSL
The new version of curl-cffi includes the proper fix for applying ca-bundles to proxy connections making this manual fix no longer required.
2024-04-18 09:49:35 +01:00
28 changed files with 597 additions and 192 deletions

2
.gitignore vendored

@ -1,4 +1,6 @@
# devine
devine.yaml
devine.yml
*.mkv
*.mp4
*.exe

@ -7,6 +7,25 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
Versions [3.0.0] and older use a format based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
but versions thereafter use a custom changelog format using [git-cliff](https://git-cliff.org).
## [3.3.3] - 2024-05-07
### Bug Fixes
- *dl*: Automatically convert TTML Subs to WebVTT for MKV support
- *Subtitle*: Correct timestamps when merging fragmented WebVTT
### Changes
- *env*: List all directories as table in info
- *env*: List possible config path locations when not found
- *binaries*: Move all binary definitions to core/binaries file
- *curl-impersonate*: Remove manual fix for curl proxy SSL
- *curl-impersonate*: Update the default browser to chrome124
- *Config*: Move possible config paths out of func to constant
- *utilities*: Remove get_binary_path, use binaries.find instead
- *dl*: Improve readability of download worker errors
- *env*: Shorten paths on Windows with env vars
## [3.3.2] - 2024-04-16
### Bug Fixes
@ -795,6 +814,7 @@ This release brings a huge change to the fundamentals of Devine's logging, UI, a
Initial public release under the name Devine.
[3.3.3]: https://github.com/devine-dl/devine/releases/tag/v3.3.3
[3.3.2]: https://github.com/devine-dl/devine/releases/tag/v3.3.2
[3.3.1]: https://github.com/devine-dl/devine/releases/tag/v3.3.1
[3.3.0]: https://github.com/devine-dl/devine/releases/tag/v3.3.0

@ -343,6 +343,8 @@ Please refrain from spam or asking for questions that infringe upon a Service's
<a href="https://github.com/Shivelight"><img src="https://images.weserv.nl/?url=avatars.githubusercontent.com/u/20620780?v=4&h=25&w=25&fit=cover&mask=circle&maxage=7d" alt="Shivelight"/></a>
<a href="https://github.com/knowhere01"><img src="https://images.weserv.nl/?url=avatars.githubusercontent.com/u/113712042?v=4&h=25&w=25&fit=cover&mask=circle&maxage=7d" alt="knowhere01"/></a>
<a href="https://github.com/retouching"><img src="https://images.weserv.nl/?url=avatars.githubusercontent.com/u/33735357?v=4&h=25&w=25&fit=cover&mask=circle&maxage=7d" alt="retouching"/></a>
<a href="https://github.com/pandamoon21"><img src="https://images.weserv.nl/?url=avatars.githubusercontent.com/u/33972938?v=4&h=25&w=25&fit=cover&mask=circle&maxage=7d" alt="pandamoon21"/></a>
<a href="https://github.com/adbbbb"><img src="https://images.weserv.nl/?url=avatars.githubusercontent.com/u/56319336?v=4&h=25&w=25&fit=cover&mask=circle&maxage=7d" alt="adbbbb"/></a>
## Licensing

@ -38,6 +38,7 @@ from rich.table import Table
from rich.text import Text
from rich.tree import Tree
from devine.core import binaries
from devine.core.config import config
from devine.core.console import console
from devine.core.constants import DOWNLOAD_LICENCE_ONLY, AnyTrack, context_settings
@ -51,7 +52,7 @@ from devine.core.titles import Movie, Song, Title_T
from devine.core.titles.episode import Episode
from devine.core.tracks import Audio, Subtitle, Tracks, Video
from devine.core.tracks.attachment import Attachment
from devine.core.utilities import get_binary_path, get_system_fonts, is_close_match, time_elapsed_since
from devine.core.utilities import get_system_fonts, is_close_match, time_elapsed_since
from devine.core.utils.click_types import LANGUAGE_RANGE, QUALITY_LIST, SEASON_RANGE, ContextData, MultipleChoice
from devine.core.utils.collections import merge_dict
from devine.core.utils.subprocess import ffprobe
@ -177,6 +178,7 @@ class dl:
except ValueError as e:
self.log.error(f"Failed to load Widevine CDM, {e}")
sys.exit(1)
if self.cdm:
self.log.info(
f"Loaded {self.cdm.__class__.__name__} Widevine CDM: {self.cdm.system_id} (L{self.cdm.security_level})"
)
@ -198,7 +200,7 @@ class dl:
self.proxy_providers.append(Basic(**config.proxy_providers["basic"]))
if config.proxy_providers.get("nordvpn"):
self.proxy_providers.append(NordVPN(**config.proxy_providers["nordvpn"]))
if get_binary_path("hola-proxy"):
if binaries.HolaProxy:
self.proxy_providers.append(Hola())
for proxy_provider in self.proxy_providers:
self.log.info(f"Loaded {proxy_provider.__class__.__name__}: {proxy_provider}")
@ -546,13 +548,16 @@ class dl:
except Exception as e: # noqa
error_messages = [
":x: Download Failed...",
" One of the track downloads had an error!",
" See the error trace above for more information."
]
if isinstance(e, subprocess.CalledProcessError):
# ignore process exceptions as proper error logs are already shown
error_messages.append(f" Process exit code: {e.returncode}")
if isinstance(e, EnvironmentError):
error_messages.append(f" {e}")
else:
error_messages.append(" An unexpected error occurred in one of the download workers.",)
if hasattr(e, "returncode"):
error_messages.append(f" Binary call failed, Process exit code: {e.returncode}")
error_messages.append(" See the error trace above for more information.")
if isinstance(e, subprocess.CalledProcessError):
# CalledProcessError already lists the exception trace
console.print_exception()
console.print(Padding(
Group(*error_messages),
@ -610,11 +615,14 @@ class dl:
break
video_track_n += 1
if sub_format:
with console.status(f"Converting Subtitles to {sub_format.name}..."):
with console.status("Converting Subtitles..."):
for subtitle in title.tracks.subtitles:
if sub_format:
if subtitle.codec != sub_format:
subtitle.convert(sub_format)
elif subtitle.codec == Subtitle.Codec.TimedTextMarkupLang:
# MKV does not support TTML, VTT is the next best option
subtitle.convert(Subtitle.Codec.WebVTT)
with console.status("Checking Subtitles for Fonts..."):
font_names = []
@ -694,16 +702,22 @@ class dl:
):
for task_id, task_tracks in multiplex_tasks:
progress.start_task(task_id) # TODO: Needed?
muxed_path, return_code = task_tracks.mux(
muxed_path, return_code, errors = task_tracks.mux(
str(title),
progress=partial(progress.update, task_id=task_id),
delete=False
)
muxed_paths.append(muxed_path)
if return_code == 1:
self.log.warning("mkvmerge had at least one warning, will continue anyway...")
elif return_code >= 2:
self.log.error(f"Failed to Mux video to Matroska file ({return_code})")
if return_code >= 2:
self.log.error(f"Failed to Mux video to Matroska file ({return_code}):")
elif return_code == 1 or errors:
self.log.warning("mkvmerge had at least one warning or error, continuing anyway...")
for line in errors:
if line.startswith("#GUI#error"):
self.log.error(line)
else:
self.log.warning(line)
if return_code >= 2:
sys.exit(1)
for video_track in task_tracks.videos:
video_track.delete()
@ -923,21 +937,21 @@ class dl:
return Credential.loads(credentials) # type: ignore
@staticmethod
def get_cdm(service: str, profile: Optional[str] = None) -> WidevineCdm:
def get_cdm(service: str, profile: Optional[str] = None) -> Optional[WidevineCdm]:
"""
Get CDM for a specified service (either Local or Remote CDM).
Raises a ValueError if there's a problem getting a CDM.
"""
cdm_name = config.cdm.get(service) or config.cdm.get("default")
if not cdm_name:
raise ValueError("A CDM to use wasn't listed in the config")
return None
if isinstance(cdm_name, dict):
if not profile:
raise ValueError("CDM config is mapped for profiles, but no profile was chosen")
return None
cdm_name = cdm_name.get(profile) or config.cdm.get("default")
if not cdm_name:
raise ValueError(f"A CDM to use was not mapped for the profile {profile}")
return None
cdm_api = next(iter(x for x in config.remote_cdm if x["name"] == cdm_name), None)
if cdm_api:

@ -1,10 +1,17 @@
import logging
import os
import shutil
import sys
from pathlib import Path
from typing import Optional
import click
from rich.padding import Padding
from rich.table import Table
from rich.tree import Tree
from devine.core.config import config, config_path
from devine.core.config import POSSIBLE_CONFIG_PATHS, config, config_path
from devine.core.console import console
from devine.core.constants import context_settings
from devine.core.services import Services
@ -18,13 +25,42 @@ def env() -> None:
def info() -> None:
"""Displays information about the current environment."""
log = logging.getLogger("env")
log.info(f"[Config] : {config_path or '--'}")
log.info(f"[Cookies] : {config.directories.cookies}")
log.info(f"[WVDs] : {config.directories.wvds}")
log.info(f"[Cache] : {config.directories.cache}")
log.info(f"[Logs] : {config.directories.logs}")
log.info(f"[Temp Files] : {config.directories.temp}")
log.info(f"[Downloads] : {config.directories.downloads}")
if config_path:
log.info(f"Config loaded from {config_path}")
else:
tree = Tree("No config file found, you can use any of the following locations:")
for i, path in enumerate(POSSIBLE_CONFIG_PATHS, start=1):
tree.add(f"[repr.number]{i}.[/] [text2]{path.resolve()}[/]")
console.print(Padding(
tree,
(0, 5)
))
table = Table(title="Directories", expand=True)
table.add_column("Name", no_wrap=True)
table.add_column("Path")
path_vars = {
x: Path(os.getenv(x))
for x in ("TEMP", "APPDATA", "LOCALAPPDATA", "USERPROFILE")
if sys.platform == "win32" and os.getenv(x)
}
for name in sorted(dir(config.directories)):
if name.startswith("__") or name == "app_dirs":
continue
path = getattr(config.directories, name).resolve()
for var, var_path in path_vars.items():
if path.is_relative_to(var_path):
path = rf"%{var}%\{path.relative_to(var_path)}"
break
table.add_row(name.title(), str(path))
console.print(Padding(
table,
(1, 5)
))
@env.group(name="clear", short_help="Clear an environment directory.", context_settings=context_settings)

@ -12,13 +12,13 @@ from rich.rule import Rule
from rich.tree import Tree
from devine.commands.dl import dl
from devine.core import binaries
from devine.core.config import config
from devine.core.console import console
from devine.core.constants import context_settings
from devine.core.proxies import Basic, Hola, NordVPN
from devine.core.service import Service
from devine.core.services import Services
from devine.core.utilities import get_binary_path
from devine.core.utils.click_types import ContextData
from devine.core.utils.collections import merge_dict
@ -72,7 +72,7 @@ def search(
proxy_providers.append(Basic(**config.proxy_providers["basic"]))
if config.proxy_providers.get("nordvpn"):
proxy_providers.append(NordVPN(**config.proxy_providers["nordvpn"]))
if get_binary_path("hola-proxy"):
if binaries.HolaProxy:
proxy_providers.append(Hola())
for proxy_provider in proxy_providers:
log.info(f"Loaded {proxy_provider.__class__.__name__}: {proxy_provider}")

@ -2,9 +2,9 @@ import subprocess
import click
from devine.core import binaries
from devine.core.config import config
from devine.core.constants import context_settings
from devine.core.utilities import get_binary_path
@click.command(
@ -29,11 +29,10 @@ def serve(host: str, port: int, caddy: bool) -> None:
from pywidevine import serve
if caddy:
executable = get_binary_path("caddy")
if not executable:
if not binaries.Caddy:
raise click.ClickException("Caddy executable \"caddy\" not found but is required for --caddy.")
caddy_p = subprocess.Popen([
executable,
binaries.Caddy,
"run",
"--config", str(config.directories.user_configs / "Caddyfile")
])

@ -4,8 +4,8 @@ from pathlib import Path
import click
from pymediainfo import MediaInfo
from devine.core import binaries
from devine.core.constants import context_settings
from devine.core.utilities import get_binary_path
@click.group(short_help="Various helper scripts and programs.", context_settings=context_settings)
@ -38,8 +38,7 @@ def crop(path: Path, aspect: str, letter: bool, offset: int, preview: bool) -> N
as it may go from being 2px away from a perfect crop, to 20px over-cropping
again due to sub-sampled chroma.
"""
executable = get_binary_path("ffmpeg")
if not executable:
if not binaries.FFMPEG:
raise click.ClickException("FFmpeg executable \"ffmpeg\" not found but is required.")
if path.is_dir():
@ -87,7 +86,7 @@ def crop(path: Path, aspect: str, letter: bool, offset: int, preview: bool) -> N
]))))]
ffmpeg_call = subprocess.Popen([
executable, "-y",
binaries.FFMPEG, "-y",
"-i", str(video_path),
"-map", "0:v:0",
"-c", "copy",
@ -95,7 +94,7 @@ def crop(path: Path, aspect: str, letter: bool, offset: int, preview: bool) -> N
] + out_path, stdout=subprocess.PIPE)
try:
if preview:
previewer = get_binary_path("mpv", "ffplay")
previewer = binaries.MPV or binaries.FFPlay
if not previewer:
raise click.ClickException("MPV/FFplay executables weren't found but are required for previewing.")
subprocess.Popen((previewer, "-"), stdin=ffmpeg_call.stdout)
@ -120,8 +119,7 @@ def range_(path: Path, full: bool, preview: bool) -> None:
then your video may have the range set to the wrong value. Flip its range to the
opposite value and see if that fixes it.
"""
executable = get_binary_path("ffmpeg")
if not executable:
if not binaries.FFMPEG:
raise click.ClickException("FFmpeg executable \"ffmpeg\" not found but is required.")
if path.is_dir():
@ -157,7 +155,7 @@ def range_(path: Path, full: bool, preview: bool) -> None:
]))))]
ffmpeg_call = subprocess.Popen([
executable, "-y",
binaries.FFMPEG, "-y",
"-i", str(video_path),
"-map", "0:v:0",
"-c", "copy",
@ -165,7 +163,7 @@ def range_(path: Path, full: bool, preview: bool) -> None:
] + out_path, stdout=subprocess.PIPE)
try:
if preview:
previewer = get_binary_path("mpv", "ffplay")
previewer = binaries.MPV or binaries.FFPlay
if not previewer:
raise click.ClickException("MPV/FFplay executables weren't found but are required for previewing.")
subprocess.Popen((previewer, "-"), stdin=ffmpeg_call.stdout)
@ -188,8 +186,7 @@ def test(path: Path, map_: str) -> None:
You may choose specific streams using the -m/--map parameter. E.g.,
'0:v:0' to test the first video stream, or '0:a' to test all audio streams.
"""
executable = get_binary_path("ffmpeg")
if not executable:
if not binaries.FFMPEG:
raise click.ClickException("FFmpeg executable \"ffmpeg\" not found but is required.")
if path.is_dir():
@ -199,7 +196,7 @@ def test(path: Path, map_: str) -> None:
for video_path in paths:
print("Starting...")
p = subprocess.Popen([
executable, "-hide_banner",
binaries.FFMPEG, "-hide_banner",
"-benchmark",
"-i", str(video_path),
"-map", map_,

@ -1 +1 @@
__version__ = "3.3.2"
__version__ = "3.3.3"

46
devine/core/binaries.py Normal file

@ -0,0 +1,46 @@
import shutil
import sys
from pathlib import Path
from typing import Optional
__shaka_platform = {
"win32": "win",
"darwin": "osx"
}.get(sys.platform, sys.platform)
def find(*names: str) -> Optional[Path]:
"""Find the path of the first found binary name."""
for name in names:
path = shutil.which(name)
if path:
return Path(path)
return None
FFMPEG = find("ffmpeg")
FFProbe = find("ffprobe")
FFPlay = find("ffplay")
SubtitleEdit = find("SubtitleEdit")
ShakaPackager = find(
"shaka-packager",
"packager",
f"packager-{__shaka_platform}",
f"packager-{__shaka_platform}-arm64",
f"packager-{__shaka_platform}-x64"
)
Aria2 = find("aria2c", "aria2")
CCExtractor = find(
"ccextractor",
"ccextractorwin",
"ccextractorwinfull"
)
HolaProxy = find("hola-proxy")
MPV = find("mpv")
Caddy = find("caddy")
__all__ = (
"FFMPEG", "FFProbe", "FFPlay", "SubtitleEdit", "ShakaPackager",
"Aria2", "CCExtractor", "HolaProxy", "MPV", "Caddy", "find"
)

@ -77,29 +77,27 @@ class Config:
return cls(**yaml.safe_load(path.read_text(encoding="utf8")) or {})
# noinspection PyProtectedMember
POSSIBLE_CONFIG_PATHS = (
# The Devine Namespace Folder (e.g., %appdata%/Python/Python311/site-packages/devine)
Config._Directories.namespace_dir / Config._Filenames.root_config,
# The Parent Folder to the Devine Namespace Folder (e.g., %appdata%/Python/Python311/site-packages)
Config._Directories.namespace_dir.parent / Config._Filenames.root_config,
# The AppDirs User Config Folder (e.g., %localappdata%/devine)
Config._Directories.user_configs / Config._Filenames.root_config
)
def get_config_path() -> Optional[Path]:
"""
Get Path to Config from various locations.
Looks for a config file in the following folders in order:
1. The Devine Namespace Folder (e.g., %appdata%/Python/Python311/site-packages/devine)
2. The Parent Folder to the Devine Namespace Folder (e.g., %appdata%/Python/Python311/site-packages)
3. The AppDirs User Config Folder (e.g., %localappdata%/devine)
Get Path to Config from any one of the possible locations.
Returns None if no config file could be found.
"""
# noinspection PyProtectedMember
path = Config._Directories.namespace_dir / Config._Filenames.root_config
if not path.exists():
# noinspection PyProtectedMember
path = Config._Directories.namespace_dir.parent / Config._Filenames.root_config
if not path.exists():
# noinspection PyProtectedMember
path = Config._Directories.user_configs / Config._Filenames.root_config
if not path.exists():
path = None
for path in POSSIBLE_CONFIG_PATHS:
if path.exists():
return path
return None
config_path = get_config_path()

@ -15,10 +15,11 @@ from requests.cookies import cookiejar_from_dict, get_cookie_header
from rich import filesize
from rich.text import Text
from devine.core import binaries
from devine.core.config import config
from devine.core.console import console
from devine.core.constants import DOWNLOAD_CANCELLED
from devine.core.utilities import get_binary_path, get_extension, get_free_port
from devine.core.utilities import get_extension, get_free_port
def rpc(caller: Callable, secret: str, method: str, params: Optional[list[Any]] = None) -> Any:
@ -87,8 +88,7 @@ def download(
if not isinstance(urls, list):
urls = [urls]
executable = get_binary_path("aria2c", "aria2")
if not executable:
if not binaries.Aria2:
raise EnvironmentError("Aria2c executable not found...")
if proxy and not proxy.lower().startswith("http://"):
@ -186,7 +186,7 @@ def download(
try:
p = subprocess.Popen(
[
executable,
binaries.Aria2,
*arguments
],
stdin=subprocess.PIPE,

@ -6,7 +6,6 @@ from http.cookiejar import CookieJar
from pathlib import Path
from typing import Any, Generator, MutableMapping, Optional, Union
from curl_cffi import CurlOpt
from curl_cffi.requests import Session
from rich import filesize
@ -18,7 +17,7 @@ MAX_ATTEMPTS = 5
RETRY_WAIT = 2
CHUNK_SIZE = 1024
PROGRESS_WINDOW = 5
BROWSER = config.curl_impersonate.get("browser", "chrome120")
BROWSER = config.curl_impersonate.get("browser", "chrome124")
def download(
@ -53,11 +52,6 @@ def download(
for one-time request changes like a header, cookie, or proxy. For example,
to request Byte-ranges use e.g., `headers={"Range": "bytes=0-128"}`.
"""
# https://github.com/yifeikong/curl_cffi/issues/6#issuecomment-2028518677
# must be applied here since the `session.curl` is thread-localized
# noinspection PyProtectedMember
session.curl.setopt(CurlOpt.PROXY_CAINFO, session.curl._cacert)
save_dir = save_path.parent
control_file = save_path.with_name(f"{save_path.name}.!dev")

@ -7,7 +7,7 @@ from typing import Optional, Union
from urllib.parse import urljoin
from Cryptodome.Cipher import AES
from Cryptodome.Util.Padding import pad, unpad
from Cryptodome.Util.Padding import unpad
from m3u8.model import Key
from requests import Session
@ -43,7 +43,7 @@ class ClearKey:
decrypted = AES. \
new(self.key, AES.MODE_CBC, self.iv). \
decrypt(pad(path.read_bytes(), AES.block_size))
decrypt(path.read_bytes())
try:
decrypted = unpad(decrypted, AES.block_size)

@ -3,7 +3,6 @@ from __future__ import annotations
import base64
import shutil
import subprocess
import sys
import textwrap
from pathlib import Path
from typing import Any, Callable, Optional, Union
@ -17,10 +16,11 @@ from pywidevine.pssh import PSSH
from requests import Session
from rich.text import Text
from devine.core import binaries
from devine.core.config import config
from devine.core.console import console
from devine.core.constants import AnyTrack
from devine.core.utilities import get_binary_path, get_boxes
from devine.core.utilities import get_boxes
from devine.core.utils.subprocess import ffprobe
@ -223,9 +223,7 @@ class Widevine:
if not self.content_keys:
raise ValueError("Cannot decrypt a Track without any Content Keys...")
platform = {"win32": "win", "darwin": "osx"}.get(sys.platform, sys.platform)
executable = get_binary_path("shaka-packager", "packager", f"packager-{platform}", f"packager-{platform}-x64")
if not executable:
if not binaries.ShakaPackager:
raise EnvironmentError("Shaka Packager executable not found but is required.")
if not path or not path.exists():
raise ValueError("Tried to decrypt a file that does not exist.")
@ -252,7 +250,7 @@ class Widevine:
]
p = subprocess.Popen(
[executable, *arguments],
[binaries.ShakaPackager, *arguments],
stdout=subprocess.DEVNULL,
stderr=subprocess.PIPE,
universal_newlines=True

@ -285,12 +285,16 @@ class DASH:
segment_base = adaptation_set.find("SegmentBase")
segments: list[tuple[str, Optional[str]]] = []
segment_timescale: float = 0
segment_durations: list[int] = []
track_kid: Optional[UUID] = None
if segment_template is not None:
segment_template = copy(segment_template)
start_number = int(segment_template.get("startNumber") or 1)
end_number = int(segment_template.get("endNumber") or 0) or None
segment_timeline = segment_template.find("SegmentTimeline")
segment_timescale = float(segment_template.get("timescale") or 1)
for item in ("initialization", "media"):
value = segment_template.get(item)
@ -318,17 +322,18 @@ class DASH:
track_kid = track.get_key_id(init_data)
if segment_timeline is not None:
seg_time_list = []
current_time = 0
for s in segment_timeline.findall("S"):
if s.get("t"):
current_time = int(s.get("t"))
for _ in range(1 + (int(s.get("r") or 0))):
seg_time_list.append(current_time)
segment_durations.append(current_time)
current_time += int(s.get("d"))
seg_num_list = list(range(start_number, len(seg_time_list) + start_number))
for t, n in zip(seg_time_list, seg_num_list):
if not end_number:
end_number = len(segment_durations)
for t, n in zip(segment_durations, range(start_number, end_number + 1)):
segments.append((
DASH.replace_fields(
segment_template.get("media"),
@ -342,11 +347,12 @@ class DASH:
if not period_duration:
raise ValueError("Duration of the Period was unable to be determined.")
period_duration = DASH.pt_to_sec(period_duration)
segment_duration = float(segment_template.get("duration"))
segment_timescale = float(segment_template.get("timescale") or 1)
total_segments = math.ceil(period_duration / (segment_duration / segment_timescale))
segment_duration = float(segment_template.get("duration")) or 1
for s in range(start_number, start_number + total_segments):
if not end_number:
end_number = math.ceil(period_duration / (segment_duration / segment_timescale))
for s in range(start_number, end_number + 1):
segments.append((
DASH.replace_fields(
segment_template.get("media"),
@ -356,7 +362,11 @@ class DASH:
Time=s
), None
))
# TODO: Should we floor/ceil/round, or is int() ok?
segment_durations.append(int(segment_duration))
elif segment_list is not None:
segment_timescale = float(segment_list.get("timescale") or 1)
init_data = None
initialization = segment_list.find("Initialization")
if initialization is not None:
@ -388,6 +398,7 @@ class DASH:
media_url,
segment_url.get("mediaRange")
))
segment_durations.append(int(segment_url.get("duration") or 1))
elif segment_base is not None:
media_range = None
init_data = None
@ -420,6 +431,10 @@ class DASH:
log.debug(track.url)
sys.exit(1)
# TODO: Should we floor/ceil/round, or is int() ok?
track.data["dash"]["timescale"] = int(segment_timescale)
track.data["dash"]["segment_durations"] = segment_durations
if not track.drm and isinstance(track, (Video, Audio)):
try:
track.drm = [Widevine.from_init_data(init_data)]
@ -457,6 +472,7 @@ class DASH:
if downloader.__name__ == "aria2c" and any(bytes_range is not None for url, bytes_range in segments):
# aria2(c) is shit and doesn't support the Range header, fallback to the requests downloader
downloader = requests_downloader
log.warning("Falling back to the requests downloader as aria2(c) doesn't support the Range header")
for status_update in downloader(
urls=[
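
Reviewer note on the SegmentTemplate hunk above: when `end_number` is not set, the last segment number falls back to the period duration divided by the per-segment duration in seconds, rounded up. The sketch below reproduces only that arithmetic; `template_segment_numbers` and its parameter names are hypothetical and not part of devine.

```python
import math


def template_segment_numbers(period_seconds, duration, timescale=1, start_number=1, end_number=None):
    """Return the inclusive list of segment numbers (hypothetical helper)."""
    if not end_number:
        # Each segment lasts duration / timescale seconds.
        end_number = math.ceil(period_seconds / (duration / timescale))
    return list(range(start_number, end_number + 1))


# A 10 second period with 4 second segments needs three segments.
numbers = template_segment_numbers(10.0, duration=4000, timescale=1000)
```

With `start_number=1` the fallback yields exactly the computed segment count. For any other start number the inclusive range is longer or shorter than that count, which may be worth checking against the previous `start_number + total_segments` bound.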


@ -19,12 +19,13 @@ from pywidevine.cdm import Cdm as WidevineCdm
from pywidevine.pssh import PSSH
from requests import Session
from devine.core import binaries
from devine.core.constants import DOWNLOAD_CANCELLED, DOWNLOAD_LICENCE_ONLY, AnyTrack
from devine.core.downloaders import requests as requests_downloader
from devine.core.drm import DRM_T, ClearKey, Widevine
from devine.core.events import events
from devine.core.tracks import Audio, Subtitle, Tracks, Video
from devine.core.utilities import get_binary_path, get_extension, is_close_match, try_ensure_utf8
from devine.core.utilities import get_extension, is_close_match, try_ensure_utf8
class HLS:
@ -253,17 +254,24 @@ class HLS:
progress(total=total_segments)
downloader = track.downloader
if (
downloader.__name__ == "aria2c" and
any(x.byterange for x in master.segments if x not in unwanted_segments)
):
downloader = requests_downloader
log.warning("Falling back to the requests downloader as aria2(c) doesn't support the Range header")
urls: list[dict[str, Any]] = []
segment_durations: list[int] = []
range_offset = 0
for segment in master.segments:
if segment in unwanted_segments:
continue
segment_durations.append(int(segment.duration))
if segment.byterange:
if downloader.__name__ == "aria2c":
# aria2(c) is shit and doesn't support the Range header, fallback to the requests downloader
downloader = requests_downloader
byte_range = HLS.calculate_byte_range(segment.byterange, range_offset)
range_offset = byte_range.split("-")[0]
else:
@ -276,6 +284,8 @@ class HLS:
} if byte_range else {}
})
track.data["hls"]["segment_durations"] = segment_durations
segment_save_dir = save_dir / "segments"
for status_update in downloader(
@ -377,15 +387,27 @@ class HLS:
elif len(files) != range_len:
raise ValueError(f"Missing {range_len - len(files)} segment files for {segment_range}...")
if isinstance(drm, Widevine):
# with widevine we can merge all segments and decrypt once
merge(
to=merged_path,
via=files,
delete=True,
include_map_data=True
)
drm.decrypt(merged_path)
merged_path.rename(decrypted_path)
else:
# with other drm we must decrypt separately and then merge them
# for aes this is because each segment likely has 16-byte padding
for file in files:
drm.decrypt(file)
merge(
to=merged_path,
via=files,
delete=True,
include_map_data=True
)
events.emit(
events.Types.TRACK_DECRYPTED,
@ -556,8 +578,7 @@ class HLS:
Returns the file size of the merged file.
"""
ffmpeg = get_binary_path("ffmpeg")
if not ffmpeg:
if not binaries.FFMPEG:
raise EnvironmentError("FFmpeg executable was not found but is required to merge HLS segments.")
demuxer_file = segments[0].parent / "ffmpeg_concat_demuxer.txt"
@ -567,7 +588,7 @@ class HLS:
]))
subprocess.check_call([
ffmpeg, "-hide_banner",
binaries.FFMPEG, "-hide_banner",
"-loglevel", "panic",
"-f", "concat",
"-safe", "0",
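
Reviewer note on the decrypt-then-merge hunk above: the comment says each AES segment likely carries its own 16-byte padding. The sketch below shows only the plaintext side of that problem, assuming PKCS#7-style padding and leaving the actual AES-CBC step out; the helper names are hypothetical.

```python
BLOCK = 16


def pkcs7_pad(data: bytes) -> bytes:
    """Pad data up to the next 16-byte boundary (always adds 1..16 bytes)."""
    n = BLOCK - len(data) % BLOCK
    return data + bytes([n]) * n


def pkcs7_unpad(data: bytes) -> bytes:
    """Strip the padding found at the very end of data."""
    n = data[-1]
    if not 1 <= n <= BLOCK or data[-n:] != bytes([n]) * n:
        raise ValueError("bad padding")
    return data[:-n]


segments = [b"first segment", b"second one!"]
padded = [pkcs7_pad(s) for s in segments]

# Unpadding every segment on its own recovers the payloads exactly.
separate = b"".join(pkcs7_unpad(p) for p in padded)

# Unpadding the concatenation strips only the final segment's padding,
# so the first segment's padding bytes stay inside the stream as data.
merged = pkcs7_unpad(b"".join(padded))
```

This is why the non-Widevine branch handles each file before calling `merge`.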


@ -3,8 +3,8 @@ import re
import subprocess
from typing import Optional
from devine.core import binaries
from devine.core.proxies.proxy import Proxy
from devine.core.utilities import get_binary_path
class Hola(Proxy):
@ -13,7 +13,7 @@ class Hola(Proxy):
Proxy Service using Hola's direct connections via the hola-proxy project.
https://github.com/Snawoot/hola-proxy
"""
self.binary = get_binary_path("hola-proxy")
self.binary = binaries.HolaProxy
if not self.binary:
raise EnvironmentError("hola-proxy executable not found but is required for the Hola proxy provider.")


@ -37,7 +37,7 @@ class Attachment:
mime_type = {
".ttf": "application/x-truetype-font",
".otf": "application/vnd.ms-opentype"
}.get(path.suffix, mimetypes.guess_type(path)[0])
}.get(path.suffix.lower(), mimetypes.guess_type(path)[0])
if not mime_type:
raise ValueError("The attachment mime-type could not be automatically detected.")


@ -7,7 +7,7 @@ from enum import Enum
from functools import partial
from io import BytesIO
from pathlib import Path
from typing import Any, Callable, Iterable, Optional
from typing import Any, Callable, Iterable, Optional, Union
import pycaption
import requests
@ -17,8 +17,10 @@ from pycaption.geometry import Layout
from pymp4.parser import MP4
from subtitle_filter import Subtitles
from devine.core import binaries
from devine.core.tracks.track import Track
from devine.core.utilities import get_binary_path, try_ensure_utf8
from devine.core.utilities import try_ensure_utf8
from devine.core.utils.webvtt import merge_segmented_webvtt
class Subtitle(Track):
@ -201,6 +203,26 @@ class Subtitle(Track):
self.convert(Subtitle.Codec.TimedTextMarkupLang)
elif self.codec == Subtitle.Codec.fVTT:
self.convert(Subtitle.Codec.WebVTT)
elif self.codec == Subtitle.Codec.WebVTT:
text = self.path.read_text("utf8")
if self.descriptor == Track.Descriptor.DASH:
if len(self.data["dash"]["segment_durations"]) > 1:
text = merge_segmented_webvtt(
text,
segment_durations=self.data["dash"]["segment_durations"],
timescale=self.data["dash"]["timescale"]
)
elif self.descriptor == Track.Descriptor.HLS:
if len(self.data["hls"]["segment_durations"]) > 1:
text = merge_segmented_webvtt(
text,
segment_durations=self.data["hls"]["segment_durations"],
timescale=1 # ?
)
caption_set = pycaption.WebVTTReader().read(text)
Subtitle.merge_same_cues(caption_set)
subtitle_text = pycaption.WebVTTWriter().write(caption_set)
self.path.write_text(subtitle_text, encoding="utf8")
def convert(self, codec: Subtitle.Codec) -> Path:
"""
@ -233,14 +255,13 @@ class Subtitle(Track):
output_path = self.path.with_suffix(f".{codec.value.lower()}")
sub_edit_executable = get_binary_path("SubtitleEdit")
if sub_edit_executable and self.codec not in (Subtitle.Codec.fTTML, Subtitle.Codec.fVTT):
if binaries.SubtitleEdit and self.codec not in (Subtitle.Codec.fTTML, Subtitle.Codec.fVTT):
sub_edit_format = {
Subtitle.Codec.SubStationAlphav4: "AdvancedSubStationAlpha",
Subtitle.Codec.TimedTextMarkupLang: "TimedText1.0"
}.get(codec, codec.name)
sub_edit_args = [
sub_edit_executable,
binaries.SubtitleEdit,
"/Convert", self.path, sub_edit_format,
f"/outputfilename:{output_path.name}",
"/encoding:utf8"
@ -308,14 +329,7 @@ class Subtitle(Track):
caption_lists[language] = caption_list
caption_set: pycaption.CaptionSet = pycaption.CaptionSet(caption_lists)
elif codec == Subtitle.Codec.WebVTT:
text = try_ensure_utf8(data).decode("utf8")
# Segmented VTT when merged may have the WEBVTT headers part of the next caption
# if they are not separated far enough from the previous caption, hence the \n\n
text = text. \
replace("WEBVTT", "\n\nWEBVTT"). \
replace("\r", ""). \
replace("\n\n\n", "\n \n\n"). \
replace("\n\n<", "\n<")
text = Subtitle.space_webvtt_headers(data)
caption_set = pycaption.WebVTTReader().read(text)
else:
raise ValueError(f"Unknown Subtitle format \"{codec}\"...")
@ -332,6 +346,27 @@ class Subtitle(Track):
return caption_set
@staticmethod
def space_webvtt_headers(data: Union[str, bytes]):
"""
Space out the WEBVTT Headers from Captions.
Segmented VTT when merged may have the WEBVTT headers part of the next caption
as they were not separated far enough from the previous caption and ended up
being considered as caption text rather than the header for the next segment.
"""
if isinstance(data, bytes):
data = try_ensure_utf8(data).decode("utf8")
elif not isinstance(data, str):
raise ValueError(f"Expecting data to be a str, not {data!r}")
text = data.replace("WEBVTT", "\n\nWEBVTT").\
replace("\r", "").\
replace("\n\n\n", "\n \n\n").\
replace("\n\n<", "\n<")
return text
@staticmethod
def merge_same_cues(caption_set: pycaption.CaptionSet):
"""Merge captions with the same timecodes and text as one in-place."""
@ -500,8 +535,7 @@ class Subtitle(Track):
if not self.path or not self.path.exists():
raise ValueError("You must download the subtitle track first.")
executable = get_binary_path("SubtitleEdit")
if executable:
if binaries.SubtitleEdit:
if self.codec == Subtitle.Codec.SubStationAlphav4:
output_format = "AdvancedSubStationAlpha"
elif self.codec == Subtitle.Codec.TimedTextMarkupLang:
@ -510,7 +544,7 @@ class Subtitle(Track):
output_format = self.codec.name
subprocess.run(
[
executable,
binaries.SubtitleEdit,
"/Convert", self.path, output_format,
"/encoding:utf8",
"/overwrite",
@ -539,8 +573,7 @@ class Subtitle(Track):
if not self.path or not self.path.exists():
raise ValueError("You must download the subtitle track first.")
executable = get_binary_path("SubtitleEdit")
if not executable:
if not binaries.SubtitleEdit:
raise EnvironmentError("SubtitleEdit executable not found...")
if self.codec == Subtitle.Codec.SubStationAlphav4:
@ -552,7 +585,7 @@ class Subtitle(Track):
subprocess.run(
[
executable,
binaries.SubtitleEdit,
"/Convert", self.path, output_format,
"/ReverseRtlStartEnd",
"/encoding:utf8",


@ -4,6 +4,7 @@ import logging
import re
import shutil
import subprocess
from collections import defaultdict
from copy import copy
from enum import Enum
from functools import partial
@ -15,12 +16,13 @@ from zlib import crc32
from langcodes import Language
from requests import Session
from devine.core import binaries
from devine.core.config import config
from devine.core.constants import DOWNLOAD_CANCELLED, DOWNLOAD_LICENCE_ONLY
from devine.core.downloaders import aria2c, curl_impersonate, requests
from devine.core.drm import DRM_T, Widevine
from devine.core.events import events
from devine.core.utilities import get_binary_path, get_boxes, try_ensure_utf8
from devine.core.utilities import get_boxes, try_ensure_utf8
from devine.core.utils.subprocess import ffprobe
@ -41,7 +43,7 @@ class Track:
drm: Optional[Iterable[DRM_T]] = None,
edition: Optional[str] = None,
downloader: Optional[Callable] = None,
data: Optional[dict] = None,
data: Optional[Union[dict, defaultdict]] = None,
id_: Optional[str] = None,
) -> None:
if not isinstance(url, (str, list)):
@ -62,8 +64,8 @@ class Track:
raise TypeError(f"Expected edition to be a {str}, not {type(edition)}")
if not isinstance(downloader, (Callable, type(None))):
raise TypeError(f"Expected downloader to be a {Callable}, not {type(downloader)}")
if not isinstance(data, (dict, type(None))):
raise TypeError(f"Expected data to be a {dict}, not {type(data)}")
if not isinstance(data, (dict, defaultdict, type(None))):
raise TypeError(f"Expected data to be a {dict} or {defaultdict}, not {type(data)}")
invalid_urls = ", ".join(set(type(x) for x in url if not isinstance(x, str)))
if invalid_urls:
@ -92,6 +94,7 @@ class Track:
self.drm = drm
self.edition: str = edition
self.downloader = downloader
self._data: defaultdict[Any, Any] = defaultdict(dict)
self.data = data or {}
if self.name is None:
@ -131,6 +134,42 @@ class Track:
def __eq__(self, other: Any) -> bool:
return isinstance(other, Track) and self.id == other.id
@property
def data(self) -> defaultdict[Any, Any]:
"""
Arbitrary track data dictionary.
A defaultdict is used with a dict as the factory for easier
nested saving and safer exists-checks.
Reserved keys:
- "hls" used by the HLS class.
- playlist: m3u8.model.Playlist - The primary track information.
- media: m3u8.model.Media - The audio/subtitle track information.
- segment_durations: list[int] - A list of each segment's duration.
- "dash" used by the DASH class.
- manifest: lxml.ElementTree - DASH MPD manifest.
- period: lxml.Element - The period of this track.
- adaptation_set: lxml.Element - The adaptation set of this track.
- representation: lxml.Element - The representation of this track.
- timescale: int - The timescale of the track's segments.
- segment_durations: list[int] - A list of each segment's duration.
You should not add, change, or remove any data within reserved keys.
You may use their data but do note that the values of them may change
or be removed at any point.
"""
return self._data
@data.setter
def data(self, value: Union[dict, defaultdict]) -> None:
if not isinstance(value, (dict, defaultdict)):
raise TypeError(f"Expected data to be a {dict} or {defaultdict}, not {type(value)}")
if isinstance(value, dict):
value = defaultdict(dict, value)  # positional mapping also accepts non-string keys
self._data = value
def download(
self,
session: Session,
@ -470,8 +509,7 @@ class Track:
if not self.path or not self.path.exists():
raise ValueError("Cannot repackage a Track that has not been downloaded.")
executable = get_binary_path("ffmpeg")
if not executable:
if not binaries.FFMPEG:
raise EnvironmentError("FFmpeg executable \"ffmpeg\" was not found but is required for this call.")
original_path = self.path
@ -480,7 +518,7 @@ class Track:
def _ffmpeg(extra_args: list[str] = None):
subprocess.run(
[
executable, "-hide_banner",
binaries.FFMPEG, "-hide_banner",
"-loglevel", "error",
"-i", original_path,
*(extra_args or []),
@ -504,6 +542,7 @@ class Track:
else:
raise
original_path.unlink()
self.path = output_path
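
Reviewer note on the `data` property above: the docstring says a `defaultdict` with `dict` as the factory makes nested saving easier. A minimal sketch of that behaviour using only the standard library; the keys mirror the reserved ones in the docstring, and the values are made up.

```python
from collections import defaultdict

data = defaultdict(dict)

# Nested assignment works without creating data["hls"] first.
data["hls"]["segment_durations"] = [4, 4, 2]

# Reading a reserved key that was never set returns an empty dict,
# so a chained .get() does not raise KeyError.
timescale = data["dash"].get("timescale")

# Wrapping an existing plain dict keeps its entries.
wrapped = defaultdict(dict, {"custom": {"note": "example"}})
```

One trade-off: indexing a missing key inserts it, so `"dash" in data` is true after the read above. Use `in` or `.get()` on the outer mapping when a check should not create the key.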


@ -316,7 +316,7 @@ class Tracks:
][:per_language or None])
return selected
def mux(self, title: str, delete: bool = True, progress: Optional[partial] = None) -> tuple[Path, int]:
def mux(self, title: str, delete: bool = True, progress: Optional[partial] = None) -> tuple[Path, int, list[str]]:
"""
Multiplex all the Tracks into a Matroska Container file.
@ -410,15 +410,18 @@ class Tracks:
# let potential failures go to caller, caller should handle
try:
errors = []
p = subprocess.Popen([
*cl,
"--output", str(output_path),
"--gui-mode"
], text=True, stdout=subprocess.PIPE)
for line in iter(p.stdout.readline, ""):
if line.startswith("#GUI#error") or line.startswith("#GUI#warning"):
errors.append(line)
if "progress" in line:
progress(total=100, completed=int(line.strip()[14:-1]))
return output_path, p.wait()
return output_path, p.wait(), errors
finally:
if chapters_path:
# regardless of delete param, we delete as it's a file we made during muxing
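
Reviewer note on the `mux` hunk above: the loop reads mkvmerge's `--gui-mode` output line by line, collects `#GUI#error` and `#GUI#warning` lines, and slices the percentage out of progress lines. The sketch below classifies such lines; `parse_gui_line` is a hypothetical helper, and the sample line shapes are assumptions inferred from the `[14:-1]` slice in the diff.

```python
def parse_gui_line(line: str):
    """Classify one line of --gui-mode output (hypothetical helper)."""
    line = line.strip()
    prefix = "#GUI#progress "  # 14 characters, matching the [14:-1] slice
    if line.startswith(prefix):
        return "progress", int(line[len(prefix):].rstrip("%"))
    if line.startswith(("#GUI#error", "#GUI#warning")):
        return "message", line
    return None


progress = parse_gui_line("#GUI#progress 42%\n")
message = parse_gui_line("#GUI#warning example warning text\n")
other = parse_gui_line("Multiplexing took 1 second.\n")
```

Matching the full prefix rather than the substring `"progress"` avoids misreading an error or warning line that happens to contain that word.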


@ -10,10 +10,11 @@ from typing import Any, Optional, Union
from langcodes import Language
from devine.core import binaries
from devine.core.config import config
from devine.core.tracks.subtitle import Subtitle
from devine.core.tracks.track import Track
from devine.core.utilities import FPS, get_binary_path, get_boxes
from devine.core.utilities import FPS, get_boxes
class Video(Track):
@ -257,8 +258,7 @@ class Video(Track):
f"it's codec, {self.codec.value}, is not yet supported."
)
executable = get_binary_path("ffmpeg")
if not executable:
if not binaries.FFMPEG:
raise EnvironmentError("FFmpeg executable \"ffmpeg\" was not found but is required for this call.")
filter_key = {
@ -270,7 +270,7 @@ class Video(Track):
output_path = original_path.with_stem(f"{original_path.stem}_{['limited', 'full'][range_]}_range")
subprocess.run([
executable, "-hide_banner",
binaries.FFMPEG, "-hide_banner",
"-loglevel", "panic",
"-i", original_path,
"-codec", "copy",
@ -288,8 +288,7 @@ class Video(Track):
if not self.path:
raise ValueError("You must download the track first.")
executable = get_binary_path("ccextractor", "ccextractorwin", "ccextractorwinfull")
if not executable:
if not binaries.CCExtractor:
raise EnvironmentError("ccextractor executable was not found.")
# ccextractor often fails in weird ways unless we repack
@ -299,7 +298,7 @@ class Video(Track):
try:
subprocess.run([
executable,
binaries.CCExtractor,
"-trim",
"-nobom",
"-noru", "-ru1",
@ -380,8 +379,7 @@ class Video(Track):
if not self.path or not self.path.exists():
raise ValueError("Cannot clean a Track that has not been downloaded.")
executable = get_binary_path("ffmpeg")
if not executable:
if not binaries.FFMPEG:
raise EnvironmentError("FFmpeg executable \"ffmpeg\" was not found but is required for this call.")
log = logging.getLogger("x264-clean")
@ -402,7 +400,7 @@ class Video(Track):
original_path = self.path
cleaned_path = original_path.with_suffix(f".cleaned{original_path.suffix}")
subprocess.run([
executable, "-hide_banner",
binaries.FFMPEG, "-hide_banner",
"-loglevel", "panic",
"-i", original_path,
"-map_metadata", "-1",


@ -3,7 +3,6 @@ import contextlib
import importlib.util
import os
import re
import shutil
import socket
import sys
import time
@ -87,15 +86,6 @@ def import_module_by_path(path: Path) -> ModuleType:
return module
def get_binary_path(*names: str) -> Optional[Path]:
"""Find the path of the first found binary name."""
for name in names:
path = shutil.which(name)
if path:
return Path(path)
return None
def sanitize_filename(filename: str, spacer: str = ".") -> str:
"""
Sanitize a string to be filename safe.
@ -133,18 +123,18 @@ def get_boxes(data: bytes, box_type: bytes, as_bytes: bool = False) -> Box:
# since it doesn't care what child box the wanted box is from, this works fine.
if not isinstance(data, (bytes, bytearray)):
raise ValueError("data must be bytes")
offset = 0
while True:
try:
index = data.index(box_type)
index = data[offset:].index(box_type)
except ValueError:
break
if index < 0:
break
if index > 4:
index -= 4 # size is before box type and is 4 bytes long
data = data[index:]
try:
box = Box.parse(data)
box = Box.parse(data[offset:][index:])
except IOError:
# since get_init_segment might cut off unexpectedly, pymp4 may be unable to read
# the expected amounts of data and complain, so let's just end the function here
@ -157,6 +147,7 @@ def get_boxes(data: bytes, box_type: bytes, as_bytes: bool = False) -> Box:
raise e
# advance past this box while it is still a parsed container
offset += index + len(Box.build(box))
if as_bytes:
box = Box.build(box)
yield box
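
Reviewer note on the `get_boxes` hunk above: the fix keeps a running `offset` so each search resumes after the previous hit instead of finding the same index again. The sketch below shows that idea on plain bytes. It swaps in `bytes.find` with a start position and advances by the needle length, whereas the diff slices `data[offset:]` and advances by the built box length; `find_all` is a hypothetical name.

```python
def find_all(data: bytes, needle: bytes):
    """Yield the index of every non-overlapping occurrence of needle."""
    offset = 0
    while True:
        index = data.find(needle, offset)
        if index == -1:
            return
        yield index
        # Resume after this match so it is not found again.
        offset = index + len(needle)


positions = list(find_all(b"..moof....moof..", b"moof"))
```

Without the `offset` update the loop would yield the first position forever, which is the behaviour the commit title describes.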


@ -3,11 +3,16 @@ import subprocess
from pathlib import Path
from typing import Union
from devine.core import binaries
def ffprobe(uri: Union[bytes, Path]) -> dict:
"""Use ffprobe on the provided data to get stream information."""
if not binaries.FFProbe:
raise EnvironmentError("FFProbe executable \"ffprobe\" not found but is required.")
args = [
"ffprobe",
binaries.FFProbe,
"-v", "quiet",
"-of", "json",
"-show_streams"

devine/core/utils/webvtt.py (new file, 191 lines)

@ -0,0 +1,191 @@
import re
import sys
import typing
from typing import Optional
from pycaption import Caption, CaptionList, CaptionNode, CaptionReadError, WebVTTReader, WebVTTWriter
class CaptionListExt(CaptionList):
@typing.no_type_check
def __init__(self, iterable=None, layout_info=None):
self.first_segment_mpegts = 0
super().__init__(iterable, layout_info)
class CaptionExt(Caption):
@typing.no_type_check
def __init__(self, start, end, nodes, style=None, layout_info=None, segment_index=0, mpegts=0, cue_time=0.0):
style = style or {}
self.segment_index: int = segment_index
self.mpegts: float = mpegts
self.cue_time: float = cue_time
super().__init__(start, end, nodes, style, layout_info)
class WebVTTReaderExt(WebVTTReader):
# HLS extension support <https://datatracker.ietf.org/doc/html/rfc8216#section-3.5>
RE_TIMESTAMP_MAP = re.compile(r"X-TIMESTAMP-MAP.*")
RE_MPEGTS = re.compile(r"MPEGTS:(\d+)")
RE_LOCAL = re.compile(r"LOCAL:((?:(\d{1,}):)?(\d{2}):(\d{2})\.(\d{3}))")
def _parse(self, lines: list[str]) -> CaptionList:
captions = CaptionListExt()
start = None
end = None
nodes: list[CaptionNode] = []
layout_info = None
found_timing = False
segment_index = -1
mpegts = 0
cue_time = 0.0
# The first segment MPEGTS is needed to calculate the rest. It is possible that
# the first segment contains no cue and is ignored by pycaption, this acts as a fallback.
captions.first_segment_mpegts = 0
for i, line in enumerate(lines):
if "-->" in line:
found_timing = True
timing_line = i
last_start_time = captions[-1].start if captions else 0
try:
start, end, layout_info = self._parse_timing_line(line, last_start_time)
except CaptionReadError as e:
new_msg = f"{e.args[0]} (line {timing_line})"
tb = sys.exc_info()[2]
raise type(e)(new_msg).with_traceback(tb) from None
elif "" == line:
if found_timing and nodes:
found_timing = False
caption = CaptionExt(
start,
end,
nodes,
layout_info=layout_info,
segment_index=segment_index,
mpegts=mpegts,
cue_time=cue_time,
)
captions.append(caption)
nodes = []
elif "WEBVTT" in line:
# Merged segmented VTT doesn't have index information, track manually.
segment_index += 1
mpegts = 0
cue_time = 0.0
elif m := self.RE_TIMESTAMP_MAP.match(line):
if r := self.RE_MPEGTS.search(m.group()):
mpegts = int(r.group(1))
cue_time = self._parse_local(m.group())
# Early assignment in case the first segment contains no cue.
if segment_index == 0:
captions.first_segment_mpegts = mpegts
else:
if found_timing:
if nodes:
nodes.append(CaptionNode.create_break())
nodes.append(CaptionNode.create_text(self._decode(line)))
else:
# it's a comment or some metadata; ignore it
pass
# Add a last caption if there are remaining nodes
if nodes:
caption = CaptionExt(start, end, nodes, layout_info=layout_info, segment_index=segment_index, mpegts=mpegts, cue_time=cue_time)
captions.append(caption)
return captions
@staticmethod
def _parse_local(string: str) -> float:
"""
Parse WebVTT LOCAL time and convert it to seconds.
"""
m = WebVTTReaderExt.RE_LOCAL.search(string)
if not m:
return 0
parsed = m.groups()
if not parsed:
return 0
hours = int(parsed[1] or 0)  # the hours component is optional in the regex
minutes = int(parsed[2])
seconds = int(parsed[3])
milliseconds = int(parsed[4])
return (milliseconds / 1000) + seconds + (minutes * 60) + (hours * 3600)
def merge_segmented_webvtt(vtt_raw: str, segment_durations: Optional[list[int]] = None, timescale: int = 1) -> str:
"""
Merge Segmented WebVTT data.
Parameters:
vtt_raw: The concatenated WebVTT files to merge. All WebVTT headers must be
appropriately spaced apart, or it may produce unwanted effects like
considering headers as captions, timestamp lines, etc.
segment_durations: A list of each segment's duration. If not provided it will try
to get it from the X-TIMESTAMP-MAP headers, specifically the MPEGTS number.
timescale: The number of time units per second.
This parses the X-TIMESTAMP-MAP data to compute new absolute timestamps, replacing
the old start and end timestamp values. All X-TIMESTAMP-MAP header information will
be removed from the output as they are no longer of concern. Consider this function
the opposite of a WebVTT Segmenter, a WebVTT Joiner of sorts.
Algorithm borrowed from N_m3u8DL-RE and shaka-player.
"""
MPEG_TIMESCALE = 90_000
vtt = WebVTTReaderExt().read(vtt_raw)
for lang in vtt.get_languages():
prev_caption = None
duplicate_index: list[int] = []
captions = vtt.get_captions(lang)
if captions[0].segment_index == 0:
first_segment_mpegts = captions[0].mpegts
else:
first_segment_mpegts = segment_durations[0] if segment_durations else captions.first_segment_mpegts
caption: CaptionExt
for i, caption in enumerate(captions):
# DASH WebVTT doesn't have MPEGTS timestamp like HLS. Instead,
# calculate the timestamp from SegmentTemplate/SegmentList duration.
likely_dash = first_segment_mpegts == 0 and caption.mpegts == 0
if likely_dash and segment_durations:
duration = segment_durations[caption.segment_index]
caption.mpegts = MPEG_TIMESCALE * (duration / timescale)
if caption.mpegts == 0:
continue
seconds = (caption.mpegts - first_segment_mpegts) / MPEG_TIMESCALE - caption.cue_time
offset = seconds * 1_000_000  # pycaption uses microseconds
if caption.start < offset:
caption.start += offset
caption.end += offset
# If the difference between current and previous captions is <=1ms
# and the payload is equal then splice.
if (
prev_caption
and not caption.is_empty()
and (caption.start - prev_caption.end) <= 1000 # 1ms in microseconds
and caption.get_text() == prev_caption.get_text()
):
prev_caption.end = caption.end
duplicate_index.append(i)
prev_caption = caption
# Remove duplicate
captions[:] = [c for c_index, c in enumerate(captions) if c_index not in set(duplicate_index)]
return WebVTTWriter().write(vtt)
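
Reviewer note on `merge_segmented_webvtt` above: each cue is shifted by the distance between its segment's MPEGTS value and the first segment's, converted from the 90 kHz MPEG clock to seconds, minus the LOCAL cue time. A minimal sketch of that offset arithmetic only; `cue_offset_us` is a hypothetical helper and the sample numbers are made up.

```python
MPEG_TIMESCALE = 90_000  # MPEG-TS clock ticks per second


def cue_offset_us(mpegts, first_segment_mpegts, cue_time=0.0):
    """Offset to add to a cue's start and end, in microseconds."""
    seconds = (mpegts - first_segment_mpegts) / MPEG_TIMESCALE - cue_time
    return seconds * 1_000_000


# A segment stamped 540,000 ticks after the first one starts 6 seconds later.
offset = cue_offset_us(1_440_000, 900_000)
offset_with_local = cue_offset_us(1_440_000, 900_000, cue_time=1.0)
```

In the function above the shift is applied only when `caption.start < offset`, so cues that already carry absolute timestamps are left alone.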

poetry.lock (generated, 42 changed lines)

@ -523,28 +523,31 @@ testing = ["cssselect", "importlib-resources", "jaraco.test (>=5.1)", "lxml", "p
[[package]]
name = "curl-cffi"
version = "0.6.2"
description = "libcurl ffi bindings for Python, with impersonation support"
version = "0.7.0b4"
description = "libcurl ffi bindings for Python, with impersonation support."
optional = false
python-versions = ">=3.8"
files = [
{file = "curl_cffi-0.6.2-cp38-abi3-macosx_10_9_x86_64.whl", hash = "sha256:23b8a2872b160718c04b06b1f8aa4fb1a2f4f94bce7040493515e081a27cad19"},
{file = "curl_cffi-0.6.2-cp38-abi3-macosx_11_0_arm64.whl", hash = "sha256:ad3c1cf5360810825ec4bc3da425f26ee4098878a615dab9d309a99afd883ba9"},
{file = "curl_cffi-0.6.2-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:3d01de6ed737ad1924aaa0198195b9020c38e77ce90ea3d72b9eacf4938c7adf"},
{file = "curl_cffi-0.6.2-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:37e513cc149d024a2d625e202f2cc9d4423d2937343ea2e06f797d99779e62dc"},
{file = "curl_cffi-0.6.2-cp38-abi3-win32.whl", hash = "sha256:12e829af97cbf7c1d5afef177e786f6f404ddf163b08897a1ed087cadbeb4837"},
{file = "curl_cffi-0.6.2-cp38-abi3-win_amd64.whl", hash = "sha256:3791b7a9ae4cb1298165300f2dc2d60a86779f055570ae83163fc2d8a74bf714"},
{file = "curl_cffi-0.6.2.tar.gz", hash = "sha256:9ee519e960b5fc6e0bbf13d0ecba9ce5f6306cb929354504bf03cc30f59a8f63"},
{file = "curl_cffi-0.7.0b4-cp38-abi3-macosx_10_9_x86_64.whl", hash = "sha256:694d88f7065c59c651970f14bc415431f65ac601a9ba537463d70f432a48ccfc"},
{file = "curl_cffi-0.7.0b4-cp38-abi3-macosx_11_0_arm64.whl", hash = "sha256:6faf01aa8d98d322b877d3d801544692c73729ea6eb4a45af83514a4ecd1c8fe"},
{file = "curl_cffi-0.7.0b4-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:5d39849371bbf3eab048113693715a8da5c729c494cccfa1128d768d96fdc31e"},
{file = "curl_cffi-0.7.0b4-cp38-abi3-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:e3a5099b98c4bf12cc1afecb3409a9c57e7ebce9447a03c96dfb661ad8fa5e79"},
{file = "curl_cffi-0.7.0b4-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:7e3616141a2a0be7896e7dc5da1ed3965e1a78aa2e563d8aba7a641135aeaf1b"},
{file = "curl_cffi-0.7.0b4-cp38-abi3-musllinux_1_1_aarch64.whl", hash = "sha256:bd16cccc0d3e93c2fbc4f4cb7cce0e10cb2ef7f8957352f3f0d770f0d6e05702"},
{file = "curl_cffi-0.7.0b4-cp38-abi3-musllinux_1_1_x86_64.whl", hash = "sha256:d65aa649abb24020c2ad7b3ce45e2816d1ffe25df06f1a6b0f52fbf353af82e0"},
{file = "curl_cffi-0.7.0b4-cp38-abi3-win32.whl", hash = "sha256:b55c53bb6dff713cb63f76e2f147e2d54c984b1b09df66b08f52f3acae1aeca0"},
{file = "curl_cffi-0.7.0b4-cp38-abi3-win_amd64.whl", hash = "sha256:449ab07e07335558997cd62296b5c4f16ce27630de7830e4ad22441049a0ef1e"},
{file = "curl_cffi-0.7.0b4.tar.gz", hash = "sha256:c09a062b8aac93d4890d2c33b7053c0e1a5cf275328b80c1fb1a950310df75f2"},
]
[package.dependencies]
certifi = "*"
certifi = ">=2024.2.2"
cffi = ">=1.12.0"
[package.extras]
build = ["cibuildwheel", "wheel"]
dev = ["autoflake (==1.4)", "coverage (==6.4.1)", "cryptography (==38.0.3)", "flake8 (==6.0.0)", "flake8-bugbear (==22.7.1)", "flake8-pie (==0.15.0)", "httpx (==0.23.1)", "mypy (==0.971)", "nest-asyncio (==1.6.0)", "pytest (==7.1.2)", "pytest-asyncio (==0.19.0)", "pytest-trio (==0.7.0)", "ruff (==0.1.14)", "trio (==0.21.0)", "trio-typing (==0.7.0)", "trustme (==0.9.0)", "types-certifi (==2021.10.8.2)", "uvicorn (==0.18.3)", "websockets (==11.0.3)"]
test = ["cryptography (==38.0.3)", "fastapi (==0.100.0)", "httpx (==0.23.1)", "nest-asyncio (==1.6.0)", "proxy.py (==2.4.3)", "pytest (==7.1.2)", "pytest-asyncio (==0.19.0)", "pytest-trio (==0.7.0)", "python-multipart (==0.0.6)", "trio (==0.21.0)", "trio-typing (==0.7.0)", "trustme (==0.9.0)", "types-certifi (==2021.10.8.2)", "uvicorn (==0.18.3)", "websockets (==11.0.3)"]
dev = ["charset-normalizer (>=3.3.2,<4.0)", "coverage (>=6.4.1,<7.0)", "cryptography (>=42.0.5,<43.0)", "httpx (==0.23.1)", "mypy (>=1.9.0,<2.0)", "pytest (>=8.1.1,<9.0)", "pytest-asyncio (>=0.23.6,<1.0)", "pytest-trio (>=0.8.0,<1.0)", "ruff (>=0.3.5,<1.0)", "trio (>=0.25.0,<1.0)", "trustme (>=1.1.0,<2.0)", "uvicorn (>=0.29.0,<1.0)", "websockets (>=12.0,<13.0)"]
test = ["charset-normalizer (>=3.3.2,<4.0)", "cryptography (>=42.0.5,<43.0)", "fastapi (==0.110.0)", "httpx (==0.23.1)", "proxy.py (>=2.4.3,<3.0)", "pytest (>=8.1.1,<9.0)", "pytest-asyncio (>=0.23.6,<1.0)", "pytest-trio (>=0.8.0,<1.0)", "python-multipart (>=0.0.9,<1.0)", "trio (>=0.25.0,<1.0)", "trustme (>=1.1.0,<2.0)", "uvicorn (>=0.29.0,<1.0)", "websockets (>=12.0,<13.0)"]
[[package]]
name = "distlib"
@ -727,20 +730,21 @@ testing = ["bson", "ecdsa", "feedparser", "gmpy2", "numpy", "pandas", "pymongo",
[[package]]
name = "langcodes"
version = "3.3.0"
version = "3.4.0"
description = "Tools for labeling human languages with IETF language tags"
optional = false
python-versions = ">=3.6"
python-versions = ">=3.8"
files = [
{file = "langcodes-3.3.0-py3-none-any.whl", hash = "sha256:4d89fc9acb6e9c8fdef70bcdf376113a3db09b67285d9e1d534de6d8818e7e69"},
{file = "langcodes-3.3.0.tar.gz", hash = "sha256:794d07d5a28781231ac335a1561b8442f8648ca07cd518310aeb45d6f0807ef6"},
{file = "langcodes-3.4.0-py3-none-any.whl", hash = "sha256:10a4cc078b8e8937d8485d3352312a0a89a3125190db9f2bb2074250eef654e9"},
{file = "langcodes-3.4.0.tar.gz", hash = "sha256:ae5a77d1a01d0d1e91854a671890892b7ce9abb601ab7327fc5c874f899e1979"},
]
[package.dependencies]
language-data = {version = ">=1.1,<2.0", optional = true, markers = "extra == \"data\""}
language-data = ">=1.2"
[package.extras]
data = ["language-data (>=1.1,<2.0)"]
build = ["build", "twine"]
test = ["pytest", "pytest-cov"]
[[package]]
name = "language-data"
@ -2004,4 +2008,4 @@ multidict = ">=4.0"
[metadata]
lock-version = "2.0"
python-versions = ">=3.9,<4.0"
content-hash = "db110c0b1b9e30309fcd4e0f80e4369f20b055651f1bef81d0f5e6153a250dec"
content-hash = "8bbbd788ab179a0669e8d7c6f45c0746e79f11c24ac39f6d4856563e76ec2f94"


@ -4,7 +4,7 @@ build-backend = "poetry.core.masonry.api"
[tool.poetry]
name = "devine"
version = "3.3.2"
version = "3.3.3"
description = "Modular Movie, TV, and Music Archival Software."
license = "GPL-3.0-only"
authors = ["rlaphoenix <rlaphoenix@pm.me>"]
@ -40,7 +40,7 @@ click = "^8.1.7"
construct = "^2.8.8"
crccheck = "^1.3.0"
jsonpickle = "^3.0.4"
langcodes = { extras = ["data"], version = "^3.3.0" }
langcodes = { extras = ["data"], version = "^3.4.0" }
lxml = "^5.2.1"
pproxy = "^2.7.9"
protobuf = "^4.25.3"
@ -61,9 +61,7 @@ subtitle-filter = "^1.4.9"
Unidecode = "^1.3.8"
urllib3 = "^2.2.1"
chardet = "^5.2.0"
curl-cffi = "^0.6.2"
language-data = "^1.2.0"
marisa-trie = "^1.1.0"
curl-cffi = "^0.7.0b4"
[tool.poetry.dev-dependencies]
pre-commit = "^3.7.0"