Update Detecting track language from filename

Moritz Bunkus 2020-07-30 15:07:45 +00:00
parent 19a599bd31
commit 71b28ab9ab

@ -6,6 +6,33 @@ You have several files that contain the language or language code in their name,
## The answer
Up to and including v21 the GUI did not support such functionality due to various concerns. I've since reconsidered. Release v22 will contain such functionality.
Starting with v21 the GUI supports deriving the track language from file names. Several aspects of this functionality can be configured.
First off, the GUI only recognizes languages, not flags such as "forced display" or "default track".
As for the languages, you configure how recognition works (and whether it is active at all) in the preferences → "Multiplexer" → "Deriving track languages". In order to enable recognition, follow these steps:
1. Enable it for subtitles in whatever mode you prefer. For SRTs any mode should work as the SRT format doesn't provide a language for tracks, but if you choose "also if the track language is 'undefined'" it might also work for container formats that do provide a language for tracks such as Matroska or MP4.
2. Decide which languages you want the GUI to detect and put them in the "Selected" side for "Recognized languages". The default is to detect all supported languages, which might lead to interesting mis-detections as the list is long and several language codes are regular English words, too.
3. When in doubt, reset the "regular expression" to its default by clicking the yellow round-arrow-thingy button on the right side. This "regular expression" is used for matching against the file name (without the path). It determines which characters the language's name must be surrounded by.
You're set now. The default "regular expression" will recognize languages if they're surrounded on the left and the right by one of the following special characters:
* on the left: `[ ( { . + = # -`
* on the right: `] ) } . + = # -`
Note that a space is NOT a valid boundary; I've only inserted spaces above in order to make the characters easier to recognize. You might also notice that the underscore isn't part of the default "regular expression" either. If you need either character to be recognized as a boundary, you'll have to adjust the expression and insert it in the appropriate places.
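To illustrate what adding boundary characters means, here is a minimal Python sketch. It is not the GUI's actual expression: the `LANGUAGES` list and the `build_pattern` helper are made up for this example, and the real default expression may be structured differently. It only shows the idea of one character class on each side of the language, and how adding the underscore and the space to both classes changes what matches.

```python
import re

# Hypothetical stand-in for the GUI's language list; the real one is much longer.
LANGUAGES = "de|ger|deu|German"

# Boundary characters as listed above (no space, no underscore).
LEFT = "[({.+=#-"
RIGHT = "])}.+=#-"


def build_pattern(left, right, languages=LANGUAGES):
    """Require one boundary character on each side of the language."""
    return re.compile(
        "[" + re.escape(left) + "](" + languages + ")[" + re.escape(right) + "]"
    )


# Approximation of the default behaviour.
default_pattern = build_pattern(LEFT, RIGHT)

# Same expression with underscore and space added on both sides.
extended_pattern = build_pattern(LEFT + "_ ", RIGHT + "_ ")

for name in ["movie[ger].srt", "movie_ger.srt", "movie ger.srt"]:
    print(name,
          "default:", bool(default_pattern.search(name)),
          "extended:", bool(extended_pattern.search(name)))
```

Case handling is not modeled here; the sketch matches the spellings exactly as written.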
As for what exactly you must use as the language: the default "regular expression" allows for:
* ISO 639-1 language codes (e.g. `de` for German)
* ISO 639-2 language codes, both the bibliographic and the terminology versions (e.g. `ger` or `deu` for German)
* English names of languages as found in the ISO 639-2 language list (e.g. `German` for German)
When in doubt use ISO 639-2 codes (`ger`).
Putting it all together: `movie[ger].srt` should work with the aforementioned settings whereas neither `movie (Deutsch).srt` ("Deutsch" is not an English name of a language) nor `movie_ger.srt` will (underscores aren't valid boundary characters by default).
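If you want to check your own file names before adding them, the following Python sketch mimics the rules described above. It is an approximation under assumptions, not the GUI's implementation: `SPELLINGS` is a tiny hypothetical excerpt of the language list, and `PATTERN` and `derive_language` are names invented for this example.

```python
import re
from pathlib import PurePath

# Hypothetical excerpt of the recognized languages, mapping each accepted
# spelling to an ISO 639-2 code. The GUI's real list is far longer.
SPELLINGS = {
    "de": "ger", "ger": "ger", "deu": "ger", "German": "ger",
    "en": "eng", "eng": "eng", "English": "eng",
}

# Assumed approximation of the default expression: one boundary character,
# the language, one boundary character.
PATTERN = re.compile(
    r"[\[({.+=#-](" + "|".join(map(re.escape, SPELLINGS)) + r")[\])}.+=#-]"
)


def derive_language(file_name):
    """Return the ISO 639-2 code found in the file name, or None."""
    base_name = PurePath(file_name).name  # match against the name without the path
    match = PATTERN.search(base_name)
    return SPELLINGS[match.group(1)] if match else None


for name in ["movie[ger].srt", "movie (Deutsch).srt", "movie_ger.srt"]:
    print(name, "->", derive_language(name))
```

Only the first of the three names yields a language here, for the reasons given above.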
You don't have to worry about the "default values" section in the preferences, BTW. Those are only fallback values. The language recognized from the file name takes precedence.
Categories: [merging](FAQ#category-merging), [metadata](FAQ#category-metadata)