Characters (e.g. Umlauts) aren't shown correctly
Moritz Bunkus edited this page 2016-04-24 19:00:31 +02:00

Why are my Umlauts in subtitles or chapters not shown correctly?

Most subtitle files are just plain text files. The problem with plain text files is that they can use any encoding scheme, and an SRT file does not record which one it uses. In general, a text file's encoding is not part of the text file itself, which makes this situation rather common.

MKVToolNix makes certain assumptions about the encoding of text files. By default it assumes that a text file uses the same encoding as the operating system MKVToolNix is run on. This depends both on the OS and on the OS's language. On Linux this nowadays usually means UTF-8; on a European Windows, for example, it's usually Windows-1252 (also called CP-1252) or one of the related encodings in that family.
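To get a rough idea of what the platform default is on your own system, a small Python sketch can help. It shows Python's view of the locale's preferred encoding, which only approximates what MKVToolNix derives from the operating system; the function name is made up for this illustration.

```python
import locale


def platform_default_encoding() -> str:
    """Return the encoding Python considers the locale's default for text."""
    # Passing False queries the current locale settings without changing them.
    return locale.getpreferredencoding(False)


if __name__ == "__main__":
    print(platform_default_encoding())
```

If the name printed here differs from the encoding your subtitle file was saved in, the default assumption is likely to go wrong for that file.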

So if non-ASCII characters in your subtitles don't display correctly, the subtitle file's encoding is most likely not the one derived from your operating system, and you'll have to tell mkvmerge the actual encoding. Reliably deriving the encoding from the text file alone is technically impossible. For this purpose the GUI provides the "subtitle character set" drop-down box when you've got a text subtitle track selected, and a similar one for chapters on the "output" tab.
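On the command line, mkvmerge offers the `--sub-charset` option for subtitle tracks and `--chapter-charset` for chapter files. The sketch below only assembles a subtitle command in Python and does not run it. The file names, the track ID `0`, the spelling of the charset name and the helper function itself are assumptions for illustration and need adjusting to your files.

```python
import shlex


def build_mkvmerge_command(output, video, subtitle, sub_charset, track_id=0):
    """Assemble an mkvmerge call that states the subtitle file's encoding."""
    return [
        "mkvmerge", "-o", output,
        video,
        # The charset option is placed before the file it applies to.
        "--sub-charset", f"{track_id}:{sub_charset}",
        subtitle,
    ]


if __name__ == "__main__":
    # Hypothetical file names; replace them with your own.
    cmd = build_mkvmerge_command("movie.mkv", "movie.mp4", "movie.srt", "WINDOWS-1252")
    print(shlex.join(cmd))
```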

A feature implemented in release 8.5.0 makes this a bit easier. Next to said drop-down box there's now a button. Pressing it opens a preview window showing you the file's content interpreted according to whatever encoding you've currently chosen.

If you have to guess, I suggest you try UTF-8 first. Text files usually use either the platform's native encoding (Windows-1252 and similar) or UTF-8, so UTF-8 is a good place to start.
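The "try UTF-8 first" advice can be sketched in Python as follows. This is only a heuristic under stated assumptions: the function name and the `cp1252` fallback are made up for this example, and a successful UTF-8 decode does not prove the file really is UTF-8.

```python
def guess_encoding(data: bytes, fallback: str = "cp1252") -> str:
    """Return "utf-8" if the bytes decode cleanly as UTF-8, else the fallback."""
    try:
        data.decode("utf-8")  # strict decoding raises on invalid byte sequences
    except UnicodeDecodeError:
        return fallback
    return "utf-8"


if __name__ == "__main__":
    # Hypothetical file name; replace it with your own subtitle file.
    with open("movie.srt", "rb") as handle:
        print(guess_encoding(handle.read()))
```

Text with umlauts saved as Windows-1252 is generally not valid UTF-8, which is why the strict decode tends to fail for such files. Pure ASCII files decode fine under both encodings, so the answer does not matter for them.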

But MediaInfo shows subtitles in Matroska are already UTF-8 encoded!

Well, Matroska requires its content to be encoded in UTF-8, so the result is always stored as UTF-8 and tools like MediaInfo will report it as such. However, mkvmerge has to know the text file's actual encoding so it can convert from the right one to UTF-8. If it assumes the wrong one, this conversion is what breaks the non-ASCII characters.
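The effect can be reproduced in a few lines of Python. This is a sketch of the general principle, not of mkvmerge's actual code; the helper name is invented for the example. The output is valid UTF-8 either way, but the characters are only right if the assumed source encoding was right.

```python
def convert_to_utf8(raw: bytes, assumed_encoding: str) -> bytes:
    """Decode bytes using the assumed source encoding, then re-encode as UTF-8."""
    return raw.decode(assumed_encoding).encode("utf-8")


if __name__ == "__main__":
    file_bytes = "Käse".encode("utf-8")  # the file is really UTF-8

    good = convert_to_utf8(file_bytes, "utf-8")
    bad = convert_to_utf8(file_bytes, "cp1252")  # wrong assumption

    print(good.decode("utf-8"))  # Käse
    print(bad.decode("utf-8"))   # KÃ¤se
```

Both results are well-formed UTF-8, which is why a tool inspecting the Matroska file cannot tell that anything went wrong.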

Categories: merging, playback