14840 Commits

Author SHA1 Message Date
Moritz Bunkus
4ce17cfddb
BCP47: default to normalize to canonical form
Normalization can be turned off via the command line arguments
`--normalize-language-ietf off`.

Part of the implementation of #3307.
2022-03-28 18:46:09 +02:00
Moritz Bunkus
d3acb1b5aa
BCP47: test: forcefully set normalization mode to use during parsing 2022-03-28 17:01:50 +02:00
Moritz Bunkus
cc6a7b39ff
BCP47: support all language codes reserved for private/local use qaa–qtz
Part of the implementation of #3307.
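
A minimal sketch of such a range check, written in Python rather than the project's C++. The function name `is_private_use_language` is hypothetical and not taken from the MKVToolNix sources; the only fact assumed is the reserved range qaa through qtz named in the commit subject.

```python
import re


def is_private_use_language(code: str) -> bool:
    """Return True if `code` lies in the reserved range qaa..qtz.

    Hypothetical helper for illustration; the comparison is
    case-insensitive.
    """
    # Exactly three ASCII letters: 'q', then 'a'..'t', then 'a'..'z'.
    return re.fullmatch(r"q[a-t][a-z]", code.lower()) is not None
```

With this sketch, `qaa`, `qtz` and `QAD` are accepted, while `qua` and `eng` fall outside the range.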
2022-03-28 16:39:38 +02:00
Moritz Bunkus
7ee98e0798
translations: update list of translatable strings; update German translation 2022-03-27 19:14:10 +02:00
Dian Li
77cb5664c7
translations: update Chinese Simplified 2022-03-27 19:14:09 +02:00
Burak Yavuz
144cfb40bc
translations: update Turkish 2022-03-27 19:14:09 +02:00
Andrei Stepanov
c6bfb3c891
translations: update Russian 2022-03-27 19:14:09 +02:00
Roberto Boriotti
db4286fcad
translations: update Italian 2022-03-27 19:14:09 +02:00
Moritz Bunkus
d7bc51f35a
GUI: BCP47: make verbiage explicit (no "extlang" abbreviation) 2022-03-27 19:14:09 +02:00
Moritz Bunkus
faf86c747e
BCP47: reduce maximum number of extended language subtags from 3 to 1
This is in accordance with RFC 5646 section 2.2.2 which states:

> 4.  Although the ABNF production 'extlang' permits up to three
>     extended language tags in the language tag, extended language
>     subtags MUST NOT include another extended language subtag in
>     their 'Prefix'.  That is, the second and third extended language
>     subtag positions in a language tag are permanently reserved and
>     tags that include those subtags in that position are, and will
>     always remain, invalid.

Part of the implementation of #3307.
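
The rule quoted above can be sketched as a purely syntactic check, here in Python rather than the project's C++. The function names are hypothetical, the sketch does not consult the IANA registry, and it does not special-case grandfathered tags; it only counts three-letter subtags that directly follow a two- or three-letter primary language subtag, which is where the ABNF places extended language subtags.

```python
def count_extlangs(tag: str) -> int:
    """Count extlang-shaped subtags directly after the primary language.

    Hypothetical helper: syntax only, no registry lookup, no handling
    of grandfathered tags.
    """
    subtags = tag.lower().split("-")
    primary = subtags[0]

    # Per the ABNF, extlangs may only follow a 2- or 3-letter language.
    if not (2 <= len(primary) <= 3 and primary.isascii() and primary.isalpha()):
        return 0

    count = 0
    for subtag in subtags[1:]:
        # An extlang is exactly three ASCII letters; anything else
        # (script, region, variant, ...) ends the extlang sequence.
        if len(subtag) == 3 and subtag.isascii() and subtag.isalpha():
            count += 1
        else:
            break
    return count


def has_valid_extlang_count(tag: str) -> bool:
    """Accept at most one extended language subtag."""
    return count_extlangs(tag) <= 1
```

Under these assumptions `zh-yue-Hant-HK` passes with one extended language subtag, whereas `zh-cmn-yue` is rejected because it carries two.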
2022-03-27 18:58:13 +02:00
Moritz Bunkus
75c4b69160
GUI: BCP47: support for always normalizing to canonical or extlang form
Part of the implementation of #3307.
2022-03-27 18:04:30 +02:00
Moritz Bunkus
9c36e1fb07
GUI: chapters: prevent superfluous third column in name view 2022-03-27 18:02:56 +02:00
Moritz Bunkus
0bb7f11756
docs: add NEWS & man for mkvmerge's/mkvpropedit's --normalize-language-ietf options
Part of the implementation of #3307.
2022-03-27 18:02:56 +02:00
Moritz Bunkus
65f599e407
mkvpropedit: add option for normalizing IETF BCP 47 language tags
Part of the implementation of #3307.
2022-03-27 18:02:56 +02:00
Moritz Bunkus
4a6c930adc
tests: refactor language comparison functions to shared functions 2022-03-27 18:02:55 +02:00
Moritz Bunkus
23f34b59e3
CLI parser: support for parsing certain options first 2022-03-27 18:02:55 +02:00
Moritz Bunkus
b843544a0c
mkvmerge: add option for normalizing IETF BCP 47 language tags
Part of the implementation of #3307.
2022-03-27 18:02:55 +02:00
Moritz Bunkus
687ace6683
BCP47: add optional normalization & global normalization to parsing
Part of the implementation of #3307.
2022-03-26 22:47:04 +01:00
Moritz Bunkus
0356ab5e72
mkvmerge: refactor: use std::optional for CLI parser "next arg" handling 2022-03-26 20:59:00 +01:00
Moritz Bunkus
252ad22582
tests: intentional update due to translation updates 2022-03-26 17:27:37 +01:00
Moritz Bunkus
6e39c1838a
NEWS: add entries for BCP47 changes 2022-03-26 17:26:43 +01:00
Moritz Bunkus
8a0476a58e
translations: update list of translatable strings; update German translation 2022-03-26 17:13:17 +01:00
Moritz Bunkus
3be4eaf97c
GUI: BCP47: adjust warnings verbiage 2022-03-26 17:07:11 +01:00
Dian Li
be8bbcd891
translations: update Chinese Simplified 2022-03-26 17:04:02 +01:00
Dian Li
b353821d65
man page translations: update Chinese Simplified 2022-03-26 17:03:47 +01:00
Andrei Stepanov
c46a66ad7b
man page translations: update Russian 2022-03-26 17:03:31 +01:00
Antoni Bella Pérez
3abc86404b
translations: update Catalan 2022-03-26 17:03:12 +01:00
Antoni Bella Pérez
339a9f5a09
man page translations: update Catalan 2022-03-26 17:03:06 +01:00
Moritz Bunkus
4438941991
GUI: BCP47: add warnings if canonical/extlang forms are different than input
Part of the implementation of #3307.
2022-03-26 16:59:10 +01:00
Moritz Bunkus
0eaf93d81a
BCP47: add function for calculating the extlang normalization form
Part of the implementation of #3307.
2022-03-26 16:59:10 +01:00
Moritz Bunkus
ed0f96e83a
BCP47: add function for cloning a language_c instance
Part of the implementation of #3307.
2022-03-26 16:59:10 +01:00
Moritz Bunkus
23a00653b7
BCP47: only sort extensions during canonicalization, not during parsing
Part of the implementation of #3307.
2022-03-26 16:59:10 +01:00
Moritz Bunkus
71036260ac
BCP47: invalidate internal format cache on canonicalization
Part of the implementation of #3307.
2022-03-26 16:59:10 +01:00
Moritz Bunkus
4767a4679a
GUI: BCP47: show warnings if deprecated subtags are used
Part of the implementation of #3307.
2022-03-26 16:59:10 +01:00
Moritz Bunkus
6201411475
BCP47: languages: include fact whether entries are deprecated
Part of the implementation of #3307.
2022-03-26 13:40:09 +01:00
Moritz Bunkus
82658cf43d
BCP47: add languages present in IANA registry but not part of ISO 639
Part of the implementation of #3307.
2022-03-26 13:40:09 +01:00
Moritz Bunkus
09b6ebdf9c
BCP47: scripts: include fact whether entries are deprecated
Part of the implementation of #3307.
2022-03-26 13:40:09 +01:00
Moritz Bunkus
7767846416
BCP47: extlangs, variants: include fact whether entries are deprecated
Part of the implementation of #3307.
2022-03-26 13:40:09 +01:00
Moritz Bunkus
65752aedcc
BCP47: regions: include fact whether entries are deprecated
Part of the implementation of #3307.
2022-03-26 13:40:09 +01:00
Moritz Bunkus
ac49424ca7
BCP47: add scripts present in IANA registry but not part of ISO 15924
There actually are none, but it's good to have the code to do it in
case this ever happens.

Part of the implementation of #3307.
2022-03-26 13:40:09 +01:00
Moritz Bunkus
bf06ef5f2b
BCP47: apply all preferred-value replacements, not just the first matched
Part of the implementation of #3307.
2022-03-26 13:40:09 +01:00
Moritz Bunkus
92ef8628f2
BCP47: add regions present in IANA registry but not part of ISO 3166/UN M.49
Part of the implementation of #3307.
2022-03-26 13:40:09 +01:00
Moritz Bunkus
5b6aab5f32
BCP47: add function to convert to canonical form
Part of the implementation of #3307.
2022-03-26 00:22:09 +01:00
Moritz Bunkus
37d48d5d2f
IANA registry parser: parse & format entries for mapping to preferred values
Part of the implementation of #3307.
2022-03-26 00:21:58 +01:00
Moritz Bunkus
c0e49abf8e
IANA registry parser: refactor lambdas to methods 2022-03-25 23:43:34 +01:00
Moritz Bunkus
a82dfbc112
BCP 47: add grandfathered handling in several places 2022-03-25 23:43:27 +01:00
Moritz Bunkus
06dce25865
BCP 47: add support for grandfathered language tags
Part of the implementation of #3307.
2022-03-24 22:39:32 +01:00
Moritz Bunkus
419024293e
update ISO 15924 script code list 2022-03-24 21:39:43 +01:00
Moritz Bunkus
beedd3183d
build system: fix download location for ISO 15924 script code list 2022-03-24 21:39:23 +01:00
Moritz Bunkus
a73c424e5e
BCP 47: don't enforce prefixes for variants; enforce uniqueness of variants
BCP 47's verbiage is pretty lax wrt. variants & their prefixes. It
states[1]:

> Variant subtag records in the Language Subtag Registry MAY include
> one or more 'Prefix' (Section 3.1.8) fields.  Each 'Prefix'
> indicates a suitable sequence of subtags for forming (with other
> subtags, as appropriate) a language tag when using the variant.

Therefore a hard check whether a variant is used with only the listed
prefixes is inappropriate.

Furthermore there are other semi-normative sources stating the
same. For example, the W3C[2] says:

> Check the context and ordering for variant subtags. Most variant
> subtag records in the registry have one or more Prefix fields. The
> prefixes indicate with which subtags it is usually appropriate to
> use this variant.

…

> If you have a good reason, you could use a variant subtag with
> different subtags, eg. cmn-Latn-pinyin would be a perfectly legal
> way to say Mandarin Chinese written with pinyin.

And `pinyin` lists neither `cmn` nor `cmn-Latn` as a prefix.

BCP 47 goes on to state that "Most variants that share a prefix are
mutually exclusive", but there's actually no way to identify the
variants for which this holds true automatically. Therefore this
property isn't enforced either.

Lastly BCP 47 does have one hard requirement on variants in [1]:

>  5. The same variant subtag MUST NOT be used more than once within a
>     language tag.

This is now enforced.
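
As an illustration only, the uniqueness requirement could be checked roughly like this. The sketch is in Python rather than the project's C++, the function name `find_duplicate_variants` is hypothetical, and it assumes the variant subtags have already been extracted from the tag by a parser.

```python
from typing import List


def find_duplicate_variants(variants: List[str]) -> List[str]:
    """Return variant subtags that occur more than once.

    Hypothetical helper: `variants` is assumed to be the list of
    already-parsed variant subtags. Comparison is case-insensitive,
    and each duplicate is reported once, in lower case.
    """
    seen = set()
    duplicates = []
    for variant in variants:
        key = variant.lower()
        if key in seen and key not in duplicates:
            duplicates.append(key)
        seen.add(key)
    return duplicates
```

A tag would then be rejected whenever the returned list is non-empty, for example for the variant list `["1996", "1996"]`, while distinct variants such as `["rozaj", "biske"]` pass.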

Part of the implementation/fix of #3307.

[1]  https://www.rfc-editor.org/rfc/rfc5646.html#section-2.2.5
[2]  https://www.w3.org/International/questions/qa-choosing-language-tags#variants
2022-03-24 21:32:24 +01:00