Detecting track language from filename
Moritz Bunkus edited this page 2020-07-30 15:07:45 +00:00

Can the GUI derive the track language from the file name?

The problem

You have several files that contain the language or language code in their name, e.g. The Incredibles [fre].srt. You'd like the GUI to auto-select fre as the track's language when you add such a file.

The answer

Starting with v21, the GUI supports deriving the track language from file names. Several aspects of this functionality can be configured.

First off, the GUI only recognizes languages, not flags such as "forced display" or "default track".

As for the languages, you configure how recognition works, and whether it is enabled at all, in the preferences → "Multiplexer" → "Deriving track languages". To enable recognition, follow these steps:

  1. Enable it for subtitles in whichever mode you prefer. For SRT files any mode should work, as the SRT format doesn't provide a language for tracks. If you choose "also if the track language is 'undefined'", it might also work for container formats that do provide a language for tracks, such as Matroska or MP4.
  2. Decide which languages you want the GUI to detect and put them in the "Selected" side for "Recognized languages". The default is to detect all supported languages, which might lead to mis-detections as the list is long and several language codes are also regular English words.
  3. When in doubt, reset the "regular expression" to its default by clicking the yellow round-arrow button on the right side. This "regular expression" is matched against the file name (without the path). It determines which characters must surround the language's name.

You're set now. The default "regular expression" will recognize languages if they're surrounded on the left and the right by one of the following special characters:

  • on the left: [ ( { . + = # -
  • on the right: ] ) } . + = # -

Note that a space is NOT a valid boundary; I've only inserted spaces above in order to make the characters easier to recognize. You might also notice that the underscore isn't part of the default "regular expression" either. If you need a space or an underscore to be recognized as a boundary, you'll have to adjust the expression and insert the character in the appropriate places.
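To illustrate the boundary rule, here is a minimal Python sketch. It is not the GUI's actual default "regular expression", only an approximation built from the two character lists above; the names LEFT, RIGHT and build_pattern are made up for this example.

```python
import re

# Boundary characters taken from the two lists above.
# Assumption: this only approximates the GUI's default expression.
LEFT = "[({.+=#-"
RIGHT = "])}.+=#-"


def build_pattern(language, left=LEFT, right=RIGHT):
    """Return a regex matching `language` enclosed by one left and one right boundary character."""
    left_class = "[" + re.escape(left) + "]"
    right_class = "[" + re.escape(right) + "]"
    return re.compile(left_class + re.escape(language) + right_class)


# Brackets and dots are valid boundaries.
print(bool(build_pattern("fre").search("The Incredibles [fre].srt")))
print(bool(build_pattern("ger").search("movie.ger.srt")))

# Underscores and spaces are not boundaries unless you add them yourself.
print(bool(build_pattern("ger").search("movie_ger.srt")))
with_underscore = build_pattern("ger", left=LEFT + "_", right=RIGHT + "_")
print(bool(with_underscore.search("movie_ger.srt")))
```

Adding the underscore to both character sets mirrors what you'd do when adjusting the expression in the preferences: the character has to be allowed on each side where it may occur.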

As for what exactly you must use as the language: the default "regular expression" allows for:

  • ISO 639-1 language codes (e.g. de for German)
  • ISO 639-2 language codes, both the bibliographic and the terminology versions (e.g. ger or deu for German)
  • English names of languages as found in the ISO 639-2 language list (e.g. German for German)

When in doubt use ISO 639-2 codes (ger).

Putting it all together: movie[ger].srt should work with the aforementioned settings. Neither movie (Deutsch).srt ("Deutsch" is not an English name of a language) nor movie_ger.srt (underscores aren't valid boundary characters by default) will.
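The same examples can be played through in a small Python sketch. It is only an approximation of the behavior described here, not the GUI's implementation: KNOWN_LANGUAGES is a tiny made-up excerpt instead of the full language list, derive_language is a hypothetical helper, and the case-insensitive lookup is an assumption.

```python
import os
import re

# Tiny illustrative excerpt; the GUI uses the languages selected in the preferences.
# Keys: ISO 639-1 codes, ISO 639-2 codes (both versions) and English names.
KNOWN_LANGUAGES = {
    "de": "ger", "ger": "ger", "deu": "ger", "german": "ger",
    "fr": "fre", "fre": "fre", "fra": "fre", "french": "fre",
}

# One left boundary character, a run of letters, then a right boundary character.
# The right boundary is a lookahead so it can double as the next left boundary.
CANDIDATE = re.compile(r"[\[({.+=#-]([A-Za-z]+)(?=[\])}.+=#-])")


def derive_language(path):
    """Return the ISO 639-2 code found in the file name, or None if there is none."""
    name = os.path.basename(path)  # only the file name is examined, not the path
    for match in CANDIDATE.finditer(name):
        # Assumption: the lookup ignores case.
        code = KNOWN_LANGUAGES.get(match.group(1).lower())
        if code is not None:
            return code
    return None


for name in ["movie[ger].srt", "movie (Deutsch).srt", "movie_ger.srt"]:
    print(name, "->", derive_language(name))
```

Only the first file name yields a language with these settings. The other two fail for the reasons given above: an unknown language name and a missing boundary character.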

You don't have to worry about the "default values" section in the preferences, BTW. Those are only fallback values. The language-in-file-name-recognition has precedence.

Categories: merging, metadata