With this change both legacy language/country elements and IETF BCP 47
language tags will be normalized when chapters are read or
written. This fixes a couple of corner cases in all programs dealing
with chapters:
1. IETF BCP 47 elements will now always be created before writing
chapters unless IETF BCP 47 elements are disabled. This wasn't
always the case when chapters were read from Matroska files.
2. When a chapter display element contains legacy language & country
elements but no IETF BCP 47 elements and IETF BCP 47 elements
aren't disabled, the IETF BCP 47 elements created will contain the
region from the legacy country element. Before the change the
elements created didn't contain a region, which changed the
semantics as IETF BCP 47 elements take precedence over all legacy
elements when they're present.
3. Legacy country elements are now created when IETF BCP 47 elements
are present & contain a region code allowed in legacy country
elements.
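
The three cases above can be sketched roughly as follows. This is only
an illustration, not the actual implementation: `ChapterDisplay`,
`normalize_display` and the tiny language table are hypothetical names,
each display is assumed to hold a single language, and "allowed in
legacy country elements" is assumed to mean a two-letter alphabetic
region code stored in lower case.

```python
from dataclasses import dataclass

# Hypothetical, deliberately tiny mapping from legacy ISO 639-2 codes
# to the primary language subtag used in IETF BCP 47 tags.
LEGACY_TO_IETF = {"spa": "es", "ger": "de", "eng": "en"}


@dataclass
class ChapterDisplay:
    legacy_language: str = ""
    legacy_country: str = ""
    ietf_language: str = ""


def region_for_legacy_country(tag: str) -> str:
    """Return the tag's region if it is a two-letter code, else ""."""
    for part in tag.split("-")[1:]:
        if len(part) == 1:          # extension/private-use singleton: stop
            break
        if len(part) == 2 and part.isalpha():
            return part.lower()     # assumption: legacy codes are lower case
    return ""


def normalize_display(display: ChapterDisplay,
                      ietf_disabled: bool = False) -> ChapterDisplay:
    if display.ietf_language:
        # Case 3: derive the legacy country from the IETF tag's region.
        region = region_for_legacy_country(display.ietf_language)
        if region:
            display.legacy_country = region
    elif not ietf_disabled and display.legacy_language:
        # Cases 1 & 2: create the IETF tag and carry the region over.
        tag = LEGACY_TO_IETF.get(display.legacy_language,
                                 display.legacy_language)
        if display.legacy_country:
            tag += "-" + display.legacy_country.upper()
        display.ietf_language = tag
    return display


legacy_only = normalize_display(ChapterDisplay("spa", "mx"))
print(legacy_only.ietf_language)    # es-MX
ietf_only = normalize_display(ChapterDisplay(ietf_language="es-ES"))
print(ietf_only.legacy_country)     # es
```

A numeric region such as the one in `es-419` yields no legacy country
in this sketch, as it doesn't fit the assumed two-letter form.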
Part of the fix for #3193.
When the source file doesn't start with a major sync frame (e.g. if
it's the result of splitting between major sync frames), reading only
five frames from the Matroska file might not yield a major sync frame.
Several sample files I have contain more than a hundred regular frames
between major sync frames.
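
A minimal sketch of why a small probe limit fails in this situation.
The function name, the frame representation and the larger limit of
200 are made up for illustration; they are not taken from the actual
code.

```python
def probe_for_major_sync(frames, max_frames, is_major_sync):
    """Return the index of the first major sync frame found within the
    first `max_frames` frames, or None if there is none."""
    for index, frame in enumerate(frames):
        if index >= max_frames:
            return None
        if is_major_sync(frame):
            return index
    return None


# Hypothetical stream: 120 regular frames before the first major sync.
frames = ["regular"] * 120 + ["major"] + ["regular"] * 10


def is_major(frame):
    return frame == "major"


print(probe_for_major_sync(frames, 5, is_major))    # None
print(probe_for_major_sync(frames, 200, is_major))  # 120
```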
When using language tags for selecting which tracks to keep or
discard, mkvmerge has so far compared the given language tag verbatim
with the ones in the file (after normalizing each). This meant that in
order to always discard all Spanish tracks but keep the others,
`--stracks !es` would not work reliably as a track in the file might
be tagged as `es-ES`, and verbatim comparison simply didn't treat `es`
and `es-ES` as the same.
For users this is somewhat counterintuitive. The idea behind allowing
languages for track selection has always been to give human beings an
easy-to-remember, easy-to-use way to select tracks without having to
look through the file identification first. Verbatim comparison worked
fine until support for IETF BCP 47 language tags came along: until
that point languages in Matroska files only ever contained a language
component but not e.g. a region or a variant.
This commit changes the selection to use a matching algorithm similar
to how IETF BCP 47 describes language tag matching. Basically it takes
a track's existing language, normalizes it & splits it into its
components. The same is then done with each language listed in the
track selection option currently being evaluated.
For each language listed in the track selection all components that
are actually set are compared with the corresponding components of the
track's language. If all of them are equal, the track is considered to
be matched. Components set in the track's language but not in the
selection's language are simply ignored.
This means that specifying `--stracks !es` in the example above will
now match all tracks whose language is some kind of Spanish, no matter
whether the track's language tag contains a region, variants or other
components (e.g. it would drop tracks marked as `es`, `es-MX`,
`es-Latn-ES` etc.).
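
The matching described above can be sketched as follows. This is a
simplified illustration and not mkvmerge's actual code: `parse_tag`,
`matches` and `keep_track` are hypothetical names, and the parser only
understands language, script, region and variants (no extended
language subtags, extensions or private use).

```python
from typing import NamedTuple, Tuple


class LanguageTag(NamedTuple):
    language: str
    script: str
    region: str
    variants: Tuple[str, ...]


def parse_tag(tag: str) -> LanguageTag:
    """Normalize case and split `language[-Script][-REGION][-variant...]`."""
    parts = tag.split("-")
    script, region, variants = "", "", []
    for part in parts[1:]:
        if (not script and not region and not variants
                and len(part) == 4 and part.isalpha()):
            script = part.title()
        elif not region and not variants and (
                (len(part) == 2 and part.isalpha())
                or (len(part) == 3 and part.isdigit())):
            region = part.upper()
        else:
            variants.append(part.lower())
    return LanguageTag(parts[0].lower(), script, region, tuple(variants))


def matches(selection: str, track_language: str) -> bool:
    """Compare only the components that are set in the selection."""
    wanted, actual = parse_tag(selection), parse_tag(track_language)
    return all(not w or w == a for w, a in zip(wanted, actual))


def keep_track(track_language: str, spec: str) -> bool:
    """Evaluate a spec such as `es,fr` or `!es` for one track."""
    negate = spec.startswith("!")
    selections = spec.lstrip("!").split(",")
    matched = any(matches(s, track_language) for s in selections)
    return matched != negate


tracks = ["es", "es-MX", "es-Latn-ES", "en", "fr-CA"]
print([t for t in tracks if keep_track(t, "!es")])   # ['en', 'fr-CA']
print(matches("es-ES", "es"))                        # False
```

The last line shows the asymmetry: a region set in the selection must
be present in the track's language, while additional components in the
track's language are ignored.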