Commit Graph

1488 Commits

Author SHA1 Message Date
Moritz Bunkus
6106416fa4
add test case for keeping colour properties 2022-05-15 18:12:28 +02:00
Moritz Bunkus
357ec495ed
SRT reader: skip entries where the end timestamp is less than or equal to the start timestamp
Fixes #3332.
2022-04-20 19:45:50 +02:00
Moritz Bunkus
2e710f9410
tests: intentional update due to translation updates 2022-04-09 12:17:30 +02:00
Moritz Bunkus
d8a4eef67f
add translation to Chinese Simplified (Singapore) by Dian Li 2022-04-06 21:44:54 +02:00
Moritz Bunkus
c50e582fa4
GUI: BCP47: show warning if script should be suppressed
Part of the implementation of #3307.
2022-03-29 21:15:53 +02:00
Moritz Bunkus
d04ac487d1
BCP47: add more unit tests 2022-03-29 21:15:53 +02:00
Moritz Bunkus
3048c61d92
BCP47: add function for finding the first variant not matching its prefixes
Part of the implementation of #3307.
2022-03-29 13:01:53 +02:00
Moritz Bunkus
14a0bec2cf
BCP47: normalize DCNC tags from BCP47 "private use" range to BCP47 equivalents
Replaces e.g. `QMS` with `cmn-Hans`.

Part of the implementation of #3307.
2022-03-28 18:46:09 +02:00
Moritz Bunkus
d91ee1da26
tests: use source file that's faster to process 2022-03-28 18:46:09 +02:00
Moritz Bunkus
4ce17cfddb
BCP47: default to normalize to canonical form
Normalization can be turned off via the `--normalize-language-ietf
off` command line arguments.

Part of the implementation of #3307.
2022-03-28 18:46:09 +02:00
Moritz Bunkus
d3acb1b5aa
BCP47: test: forcefully set normalization mode to use during parsing 2022-03-28 17:01:50 +02:00
Moritz Bunkus
65f599e407
mkvpropedit: add option for normalizing IETF BCP 47 language tags
Part of the implementation of #3307.
2022-03-27 18:02:56 +02:00
Moritz Bunkus
4a6c930adc
tests: refactor language comparison functions to shared functions 2022-03-27 18:02:55 +02:00
Moritz Bunkus
b843544a0c
mkvmerge: add option for normalizing IETF BCP 47 language tags
Part of the implementation of #3307.
2022-03-27 18:02:55 +02:00
Moritz Bunkus
687ace6683
BCP47: add optional normalization & global normalization to parsing
Part of the implementation of #3307.
2022-03-26 22:47:04 +01:00
Moritz Bunkus
252ad22582
tests: intentional update due to translation updates 2022-03-26 17:27:37 +01:00
Moritz Bunkus
0eaf93d81a
BCP47: add function for calculating the extlang normalization form
Part of the implementation of #3307.
2022-03-26 16:59:10 +01:00
Moritz Bunkus
ed0f96e83a
BCP47: add function for cloning a language_c instance
Part of the implementation of #3307.
2022-03-26 16:59:10 +01:00
Moritz Bunkus
23a00653b7
BCP47: only sort extensions during canonicalization, not during parsing
Part of the implementation of #3307.
2022-03-26 16:59:10 +01:00
Moritz Bunkus
bf06ef5f2b
BCP47: apply all preferred-value replacements, not just the first matched
Part of the implementation of #3307.
2022-03-26 13:40:09 +01:00
Moritz Bunkus
92ef8628f2
BCP47: add regions present in IANA registry but not part of ISO 3166/UN M.49
Part of the implementation of #3307.
2022-03-26 13:40:09 +01:00
Moritz Bunkus
5b6aab5f32
BCP47: add function to convert to canonical form
Part of the implementation of #3307.
2022-03-26 00:22:09 +01:00
Moritz Bunkus
06dce25865
BCP 47: add support for grandfathered language tags
Part of the implementation of #3307.
2022-03-24 22:39:32 +01:00
Moritz Bunkus
a73c424e5e
BCP 47: don't enforce prefixes for variants; enforce uniqueness of variants
BCP 47's verbiage is pretty lax wrt. variants & their prefixes. It
states[1]:

> Variant subtag records in the Language Subtag Registry MAY include
> one or more 'Prefix' (Section 3.1.8) fields.  Each 'Prefix'
> indicates a suitable sequence of subtags for forming (with other
> subtags, as appropriate) a language tag when using the variant.

Therefore a hard check whether a variant is used with only the listed
prefixes is inappropriate.

Furthermore there are other semi-normative sources stating the
same. For example, the W3C[2] says:

> Check the context and ordering for variant subtags. Most variant
> subtag records in the registry have one or more Prefix fields. The
> prefixes indicate with which subtags it is usually appropriate to
> use this variant.

…

> If you have a good reason, you could use a variant subtag with
> different subtags, eg. cmn-Latn-pinyin would be a perfectly legal
> way to say Mandarin Chinese written with pinyin.

And `pinyin` lists neither `cmn` nor `cmn-Latn` as a prefix.

BCP 47 goes on to state that "Most variants that share a prefix are
mutually exclusive", but there's actually no way to identify the
variants for which this holds true automatically. Therefore this
property isn't enforced either.

Lastly BCP 47 does have one hard requirement on variants in [1]:

>  5. The same variant subtag MUST NOT be used more than once within a
>     language tag.

This is now enforced.
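
The uniqueness rule can be illustrated with a minimal sketch. This is not the code from the commit; the function name `find_duplicate_variant` is hypothetical, and the case-insensitive comparison is an assumption based on BCP 47 treating subtags as case-insensitive.

```python
from typing import Iterable, Optional


def find_duplicate_variant(variants: Iterable[str]) -> Optional[str]:
    """Return the first variant subtag that occurs a second time, or None.

    Hypothetical helper: it only illustrates the "MUST NOT be used more
    than once" rule and does not validate the subtags themselves.
    """
    seen = set()
    for variant in variants:
        key = variant.lower()  # assumption: compare case-insensitively
        if key in seen:
            return key
        seen.add(key)
    return None
```

A parser following this approach would reject a tag as soon as the helper returns something other than `None`, while leaving the variants' prefixes unchecked.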

Part of the implementation/fix of #3307.

[1]  https://www.rfc-editor.org/rfc/rfc5646.html#section-2.2.5
[2]  https://www.w3.org/International/questions/qa-choosing-language-tags#variants
2022-03-24 21:32:24 +01:00
Moritz Bunkus
20a169e2d5
BCP 47: remove obsolete function get_iso639_2_alpha_3_code_or
Superseded by get_closest_iso639_2_alpha_3_code

Part of the implementation of #3307.
2022-03-23 23:22:39 +01:00
Moritz Bunkus
3a0af2592f
BCP 47: deriving legacy language element's code via extlang prefix as fallback
There are several languages that aren't part of ISO 639-2 but are part
of ISO 639-3 or 639-5. For those languages the legacy Matroska language
elements cannot be set to the ISO 639 alpha 3 code of the BCP 47
language tag.

However, there are a lot of such languages whose ISO 639 alpha 3 code
is a valid extlang subtag of a BCP 47 tag. For example: the language
"Yue Chinese" has an ISO 639 alpha 3 code of `yue` but isn't part of
ISO 639-2. However, `yue` is also a valid extlang.

As each extlang must have a prefix for which it is valid (in the case
of `yue` it's `zh`) and as that prefix must in turn be an ISO 639 code
itself, that prefix language's ISO 639-2 code is the closest
representation.

Part of the implementation of #3307.
2022-03-23 23:22:39 +01:00
Moritz Bunkus
1015808193
BCP 47: add function for getting the closest ISO 639-2 code for a tag
There are several languages that aren't part of ISO 639-2 but are part
of ISO 639-3 or 639-5. For those languages the legacy Matroska language
elements cannot be set to the ISO 639 alpha 3 code of the BCP 47
language tag.

However, there are a lot of such languages whose ISO 639 alpha 3 code
is a valid extlang subtag of a BCP 47 tag. For example: the language
"Yue Chinese" has an ISO 639 alpha 3 code of `yue` but isn't part of
ISO 639-2. However, `yue` is also a valid extlang.

As each extlang must have a prefix for which it is valid (in the case
of `yue` it's `zh`) and as that prefix must in turn be an ISO 639 code
itself, that prefix language's ISO 639-2 code is the closest
representation.
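
The fallback described above might look roughly like the following sketch. It is not the actual implementation: the function name `closest_iso639_2_code` and both lookup tables are hypothetical, and the tables hold only a few illustrative entries instead of the full ISO 639 and IANA registry data.

```python
from typing import Optional

# Hypothetical excerpt: language subtags that have an ISO 639-2 code.
ISO_639_2_CODES = {
    "zh": "chi",  # Chinese, ISO 639-2/B
    "de": "ger",  # German, ISO 639-2/B
}

# Hypothetical excerpt: extlang subtags and the prefix they are valid for.
EXTLANG_PREFIXES = {
    "yue": "zh",  # Yue Chinese is an extlang of Chinese
}


def closest_iso639_2_code(language: str) -> Optional[str]:
    """Return the ISO 639-2 code for a language, falling back to the
    code of its extlang prefix; return None if neither is known."""
    if language in ISO_639_2_CODES:
        return ISO_639_2_CODES[language]
    prefix = EXTLANG_PREFIXES.get(language)
    if prefix is not None:
        return ISO_639_2_CODES.get(prefix)
    return None
```

With these tables, `yue` resolves to `chi` via its prefix `zh`, which is the kind of value a legacy language element could still carry.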

Part of the implementation of #3307.
2022-03-23 23:22:39 +01:00
Moritz Bunkus
1c75ad28b2
tests: intentional update due to 6487dba8dc 2022-02-20 18:30:30 +01:00
Moritz Bunkus
1d80b65a68
mkvmerge: add option for setting "track enabled" flag 2022-02-03 21:21:57 +01:00
Moritz Bunkus
98970ebfa8
MP4 reader: map "enabled" flags in tkhd atom to Matroska's "enabled" flag
Implements #3272.
2022-02-03 21:11:39 +01:00
Moritz Bunkus
447a6147bd
utf8-cpp: update to v3.2 revision 6a76caccbe0c186b00cab34df1e4281fa 2021-12-26 13:59:34 +01:00
Moritz Bunkus
20252a2f8b
VobSub reader: support id: -- lines indicating language is unknown
Fixes #3246.
2021-12-19 11:03:58 +01:00
Moritz Bunkus
d0ea7c40b6
AC-3 parser: support E-AC-3 with BSID values > 10 and ≤ 15
Implements #3211.
2021-10-10 16:45:30 +02:00
Moritz Bunkus
b4b4885df1
mkvmerge, mkvpropedit: chapters: write defaulted elements, too
Implements #3210.
2021-10-10 12:25:11 +02:00
Moritz Bunkus
575bd79673
unit tests: fix running on Windows 2021-09-06 15:02:01 +02:00
Moritz Bunkus
df8078fbfc
unit tests: avoid compiler warnings with mingw & gtest 2021-09-06 15:00:04 +02:00
Moritz Bunkus
afb7a10f34
tests: fix compilation on mingw 2021-09-05 10:30:14 +02:00
Moritz Bunkus
00c0d12d34
probing: prefer AVC & HEVC at start of file over audio detection
Even though AVC & HEVC are often mis-detected in the middle of other
container formats, it is pretty unambiguous if the file starts with
the typical NALU marker. So try to detect AVC & HEVC before trying
audio types if the file starts with a NALU marker as audio types are
often mis-detected as well.
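
As a rough illustration only: the sketch below assumes that "the typical NALU marker" refers to the three- or four-byte start code of the Annex B byte stream format (`00 00 01` or `00 00 00 01`). The function names and the format lists are hypothetical and do not reflect the real probing code.

```python
from typing import List

# Assumption: Annex B start codes that precede a NAL unit.
NALU_START_CODES = (b"\x00\x00\x01", b"\x00\x00\x00\x01")

# Hypothetical probe lists, for illustration only.
VIDEO_PROBES = ["avc", "hevc"]
AUDIO_PROBES = ["ac3", "mp3", "aac"]


def starts_with_nalu_marker(data: bytes) -> bool:
    """Check whether the buffer begins with a NALU start code."""
    return data.startswith(NALU_START_CODES)


def probe_order(data: bytes) -> List[str]:
    """Try AVC/HEVC first only if the file starts with a NALU marker."""
    if starts_with_nalu_marker(data):
        return VIDEO_PROBES + AUDIO_PROBES
    return AUDIO_PROBES + VIDEO_PROBES
```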

Fixes #3201.
2021-09-04 15:02:09 +02:00
Moritz Bunkus
83fee3c98a
memory_c: add operator[] 2021-09-04 14:41:28 +02:00
Moritz Bunkus
fe7521f507
HEVC ES: fix only marking SLNR pictures as discardable
The prior commit didn't take `max_sub_layers_minus1` into account.

Second part of the fix of #3192.
2021-09-03 17:09:50 +02:00
Moritz Bunkus
97a1ada78c
HEVC ES: only mark sub-layer non-reference pictures as B frames/discardable
Fixes #3192.
2021-09-03 17:00:04 +02:00
Moritz Bunkus
88b97761bf
all: chapters: normalize legacy & IETF BCP 47 language/country elements
With this change both legacy language/country elements and IETF BCP 47
language tags will be normalized when chapters are read or
written. This fixes a couple of corner cases in all programs dealing
with chapters:

1. IETF BCP 47 elements will now always be created before writing
   chapters unless IETF BCP 47 elements are disabled. This wasn't
   always the case when chapters were read from Matroska files.

2. When a chapter display element contains legacy language & country
   elements but no IETF BCP 47 elements and IETF BCP 47 elements
   aren't disabled, the IETF BCP 47 elements created will contain the
   region from the legacy element. Before the change the elements
   created didn't contain a country, leading to a change in semantics
   as IETF BCP 47 elements take precedence over all legacy elements
   when they're present.

3. Legacy country elements are now created when IETF BCP 47 elements
   are present & contain a region code allowed in legacy country
   elements.

Part of the fix of #3193.
2021-09-01 22:24:44 +02:00
Moritz Bunkus
9a7a76b565
BCP 47: helper for getting ISO 3166-1 alpha 2/top-level country domain codes 2021-09-01 22:20:42 +02:00
Moritz Bunkus
2523a44459
tests: add one more test for negative track selection by language 2021-08-29 21:52:56 +02:00
Moritz Bunkus
0394a674bd
track selection: use language tag matching instead of verbatim equality
When using language tags for selecting which tracks to keep or
discard, mkvmerge was so far comparing the given language tag with the
ones in the file (after normalizing each). This meant that in order to
always discard all Spanish tracks but keep the others, `--stracks !es`
would not work reliably as a track in the file might be specified as
`es-ES`, and verbatim comparison simply didn't treat `es` and `es-ES`
as the same.

For users this is somewhat counterintuitive. The idea behind allowing
languages for track selection has always been to provide an easy to
remember, easy to use way to select tracks for human beings without
having to look through file identification first. Verbatim comparison
worked fine until support for IETF BCP 47 language tags came along as
until that point languages in Matroska files only ever contained a
language component but not e.g. a region or a variant.

This commit changes the selection to use a matching algorithm similar
to how IETF BCP 47 describes language tag matching. Basically it takes
a track's existing language, normalizes it & splits it into its
components. Then the same is done with all the languages mentioned
with the track selection option currently evaluated.

For each language listed in the track selection all components that
are actually set are compared with the track's language's
corresponding components. If all of them are equal, the track is
considered to be matched. Components set in the track's language but
not in the selection's language are simply ignored.
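
The comparison can be sketched as follows. This is a simplified model, not the code from the commit: the `Language` class, its four fields and the `matches` function are hypothetical, and parsing and normalization are assumed to have happened already.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Language:
    """Hypothetical, reduced set of components of a normalized tag."""
    language: Optional[str] = None
    script: Optional[str] = None
    region: Optional[str] = None
    variant: Optional[str] = None


def matches(track: Language, selection: Language) -> bool:
    """A track matches if every component set in the selection equals
    the track's corresponding component; components set only in the
    track's language are ignored."""
    for field in fields(Language):
        wanted = getattr(selection, field.name)
        if wanted is not None and wanted != getattr(track, field.name):
            return False
    return True
```

Under this model a selection of `es` matches a track tagged `es-ES`, whereas a selection of `es-ES` does not match a track tagged only `es`.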

This means that specifying `--stracks !es` in the example above will
now match all tracks whose language is some kind of Spanish, no matter
if the track's language tag contains a region, variants or
whatever (e.g. it would drop tracks marked as `es`, `es-MX`,
`es-Latn-ES` etc.).
2021-08-29 21:33:13 +02:00
Moritz Bunkus
f3e8b50b04
BCP 47: functions for matching languages against others 2021-08-29 12:39:03 +02:00
Moritz Bunkus
bd81867612
BCP 47 tests: use namespace for shorter, easier-to-read lines 2021-08-29 12:39:03 +02:00
Moritz Bunkus
56e97032ab
HEVC/H.265 parser: fix P/B frame type signaling 2021-08-16 21:48:26 +02:00
Moritz Bunkus
cf1529307c
kax_info_c: output frame summary at end of block group
The frame summary requires the number of references to be known in
order to be able to determine the frame type. That number is only known
once the whole block group has been parsed as the reference block
elements are usually located behind the block elements.
2021-08-16 21:46:28 +02:00
Moritz Bunkus
648900e9dd
tests: allow specifying test case to run via file name 2021-08-16 20:29:41 +02:00