Created Detecting track language from filename (markdown)

Moritz Bunkus 2015-02-07 18:20:33 +01:00
parent 415cd0b0bf
commit f1ff1b2447

@ -0,0 +1,19 @@
# Can mmg derive the track language from the file name?
## The problem
You have several files that contain the language or language code in their name, e.g. `The Incredibles [fre].srt`. You'd like mmg to auto-select `fre` as the track's language when you add such a file.
## The answer
mmg does not support this and most likely never will. There are simply way too many cases in which such a detection cannot be performed reliably and correctly. Here are a couple of examples/problems:
1. If I wanted to detect full language names (e.g. "English") then how should `Johnny English.srt` or, even worse, `The French Connection.srt` be handled?
2. If I wanted to detect full language names, the detection would be restricted to their English representations. But what if the file name contains a localized language name? In the following example the language would be "English" again, but the whole file name is in French: `Le fabuleux destin d'Amélie Poulain (Anglais).srt`.
3. So we're back to short ISO 639-2 codes. Even if I only try to detect them at word boundaries there's a huge problem, as some of those three-letter codes are actual words in certain languages. Take `Le juge est une femme.srt`: here `est` is the language code for Estonian, yet in this French title it is just the word for "is".
4. Another problem occurs with files that can contain multiple tracks. Let's take `Life of Brian (FRE).mkv` as an example. What if there's one audio track and one subtitle track in that file? Which of those would `FRE` refer to?
5. Detecting the language code is one thing; making the user aware that mmg has tried to detect it is another matter entirely. If the language drop-down box is simply set accordingly, the user will only notice by clicking on each added track in turn. And they have to do exactly that, because the mechanism is so frail that its results must be verified. Worse, there's no way for them to tell how reliable a pre-selected language is. It could come from the container (e.g. language codes stored in the `.idx` file of a VobSub subtitle), in which case it is reliable. Or it could come from the file name, in which case it simply isn't.
And those are just the examples I can think of off the top of my head.
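
To make problems 1 to 3 concrete, here is a minimal Python sketch of the kind of naive detection discussed above. It is not mmg code: the function `naive_guess` and the tiny lookup tables `CODES` and `NAMES` are hypothetical and exist only for this illustration (the real ISO 639-2 list is far longer, which only makes collisions more likely).

```python
import re
from pathlib import PurePath

# Hypothetical, deliberately tiny lookup tables for illustration only.
CODES = {"eng", "fre", "est"}                # a few ISO 639-2 codes
NAMES = {"english": "eng", "french": "fre"}  # English language names only


def naive_guess(file_name):
    """Return every language code a word-boundary scan finds in the name."""
    stem = PurePath(file_name).stem          # drop the extension
    # Split into runs of letters, so brackets and spaces act as boundaries.
    words = re.findall(r"[^\W\d_]+", stem.lower())
    hits = []
    for word in words:
        if word in CODES:
            hits.append(word)
        elif word in NAMES:
            hits.append(NAMES[word])
    return hits


for name in [
    "The Incredibles [fre].srt",
    "Johnny English.srt",
    "The French Connection.srt",
    "Le fabuleux destin d'Amélie Poulain (Anglais).srt",
    "Le juge est une femme.srt",
]:
    print(f"{name!r}: {naive_guess(name)}")
```

Only the first file name yields the intended result. The second, third and fifth produce a confident but wrong guess, and the fourth produces nothing at all because the language name is localized. The scan itself cannot tell these cases apart.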
Categories: [merging](Category-merging), [metadata](Category-metadata)