
21 Commits

Author SHA1 Message Date
Moritz Bunkus
2fe727f6a7
build system: implement adding new empty translation for man pages
2024-01-20 19:22:34 +01:00
Moritz Bunkus
39529c226b
languages/scripts/regions/IANA lists: use different method of initialization
The prior method was to generate one line of
`g_container.emplace_back(…)` per entry in the list & let the
compiler chew on that. Each string argument in that call was written as
`u8"Some Name"s`, meaning as a std::string instance.

Drawbacks:

• takes the compiler ages to compile, even forcing me to drop all
  optimizations for the ISO-639 language list file

• even smaller files such as the IANA language subtag registry lists
  take more than 30s to compile

• due to the lack of optimizations, initialization is actually not as
  fast as it could be

The new method uses a plain C-style array of structs with
`char const *` entries for the initial list. The initialization method
then copies the entries from that list to the actual container, again
using `emplace_back(…)`.

This yields sub-1s compilation times even with the longest file, the
ISO-639 language list, and the runtime initialization is actually
faster.
2022-04-23 00:00:15 +02:00
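The initialization method described in the commit message above can be sketched roughly like this. It is an illustration under assumptions, not the generated file itself: `language_t` and `g_languages` appear elsewhere in this log, but the three fields, the helper struct `language_init_t`, the array `s_languages_init` and the function `init_languages()` are hypothetical names chosen for the example.

```cpp
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified entry type; the real struct has more fields.
struct language_t {
  std::string english_name, alpha_3_code, alpha_2_code;

  language_t(std::string &&p_english_name, std::string &&p_alpha_3_code, std::string &&p_alpha_2_code)
    : english_name{std::move(p_english_name)}
    , alpha_3_code{std::move(p_alpha_3_code)}
    , alpha_2_code{std::move(p_alpha_2_code)}
  {}
};

// Plain aggregate holding only C string pointers. The array below is
// constant-initialized, so the compiler has no per-entry std::string
// construction code to generate and optimize.
struct language_init_t {
  char const *english_name, *alpha_3_code, *alpha_2_code;
};

// Illustrative excerpt; a generated file would contain thousands of rows.
static language_init_t const s_languages_init[] = {
  { "English", "eng", "en" },
  { "French",  "fra", "fr" },
  { "Ainu",    "ain", ""   },
};

std::vector<language_t> g_languages;

// Copies the static table into the actual container at runtime.
void init_languages() {
  g_languages.reserve(std::size(s_languages_init));
  for (auto const &entry : s_languages_init)
    // Each char const * is converted to a temporary std::string, which
    // binds to the constructor's rvalue reference parameters.
    g_languages.emplace_back(entry.english_name, entry.alpha_3_code, entry.alpha_2_code);
}
```

The compile-time cost now lies in a single loop instead of one call per entry, which is consistent with the sub-1s compilation times reported above.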
Moritz Bunkus
cc6a7b39ff
BCP47: support all language codes reserved for private/local use qaa–qtz
Part of the implementation of #3307.
2022-03-28 16:39:38 +02:00
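The range qaa–qtz covers every three-letter code starting with `q` whose second letter lies between `a` and `t`, i.e. 20 × 26 = 520 codes. A minimal sketch of enumerating and recognizing them follows; the helper names `private_use_language_codes()` and `is_private_use_language()` are hypothetical and not taken from the commit, and the character loops assume the ASCII ordering of lowercase letters.

```cpp
#include <string>
#include <vector>

// Builds all 520 codes from qaa to qtz (second letter a–t, third letter a–z).
std::vector<std::string> private_use_language_codes() {
  std::vector<std::string> codes;
  codes.reserve(20 * 26);
  for (char second = 'a'; second <= 't'; ++second)
    for (char third = 'a'; third <= 'z'; ++third)
      codes.push_back(std::string{'q', second, third});
  return codes;
}

// True if `code` is a lowercase three-letter code in the range qaa–qtz.
bool is_private_use_language(std::string const &code) {
  return (code.size() == 3)
      && (code[0] == 'q')
      && (code[1] >= 'a') && (code[1] <= 't')
      && (code[2] >= 'a') && (code[2] <= 'z');
}
```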
Moritz Bunkus
6201411475
BCP47: languages: include fact whether entries are deprecated
Part of the implementation of #3307.
2022-03-26 13:40:09 +01:00
Moritz Bunkus
82658cf43d
BCP47: add languages present in IANA registry but not part of ISO 639
Part of the implementation of #3307.
2022-03-26 13:40:09 +01:00
Moritz Bunkus
200bdaec05
ISO 639: provide name overrides for ancient & modern Greek
Makes them easier to find.
2021-12-26 11:27:51 +01:00
Moritz Bunkus
1acd9a3497
BCP 47: add remaining ISO 639 languages
2021-10-10 12:33:54 +02:00
Moritz Bunkus
e3736e9def
ISO 639: add codes from 639-5 that aren't part of 639-2
2021-07-21 22:34:26 +02:00
Moritz Bunkus
f690be057b
build system: move HTML table data extraction to separate function
2021-07-17 13:51:17 +02:00
Moritz Bunkus
20437cb0f6
build system: move file download handling to dedicated module
2021-07-17 12:26:48 +02:00
Moritz Bunkus
c649427ff2
ISO 639: re-add entries only present in 639-2 but not 639-3
2021-07-15 19:35:29 +02:00
Moritz Bunkus
50505d3d1b
ISO 639: generate language list directly from latest ISO lists
2021-07-14 22:52:56 +02:00
Moritz Bunkus
6a5b4b97db
BCP 47: add ISO 639-3 languages (only those of type "living")
Part of the implementation of #3007.
2021-02-17 22:19:10 +01:00
Moritz Bunkus
c9884c3e77
build system: don't optimize when compiling iso639_language_list.cpp
Even `-O1` causes compilation time & memory usage to skyrocket,
possibly exponentially, with the number of entries to `emplace_back()`
into the vector.

This isn't so bad with the current number of entries (489). In that
case compilation with `-O3` only takes 7.2s.

However, extending the list to cover ISO 639-3 means that the list
will include 7160 entries. With that many entries things are much,
much more severe:

• with `-O1` alone, compilation already takes 11m 23s.
• with `-O3`, memory usage exceeded 20 GB after six minutes, at which
  point I had to abort because other running applications were getting
  killed.

The runtime cost is negligible, as a micro benchmark I ran shows: with
all 7160 entries and no optimizations (`-O0`), the one-time
initialization on startup takes ~1.4 milliseconds; with optimizations
(`-O1`) it still takes ~570 microseconds.

Part of the implementation of #3007.
2021-02-17 22:18:20 +01:00
Moritz Bunkus
5276839f16
BCP 47: use emplace_back for initialization of ISO 639 language list
It's much faster than using the initializer lists. Here's the result
from a micro benchmark I ran:

2021-01-25T23:49:20+01:00
Running ./bench.g++
Run on (8 X 4500 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x4)
  L1 Instruction 32 KiB (x4)
  L2 Unified 256 KiB (x4)
  L3 Unified 8192 KiB (x1)
Load Average: 1.08, 0.72, 0.60
-------------------------------------------------------------
Benchmark                   Time             CPU   Iterations
-------------------------------------------------------------
BM_InitializerList      59667 ns        59625 ns        70526
BM_EmplaceBack          24515 ns        24497 ns       176817
BM_EmplaceBack2         16970 ns        16961 ns       247652
BM_PushBack             52831 ns        52796 ns        79202
BM_PushBack2            52858 ns        52823 ns        79004

The five benchmarks were:

• BM_InitializerList — the old way with initializer lists. Basically
  the same code currently being replaced.

• BM_EmplaceBack — Reserving space & adding each entry with
  g_languages.emplace_back(). A constructor was added to the language_t
  struct taking the std::strings as const references (std::string
  const &) and assigning them to the member variables normally.

• BM_EmplaceBack2 — Reserving space & adding each entry with
  g_languages.emplace_back(). A constructor was added to the language_t
  struct taking the std::strings as rvalue references (std::string &&)
  and assigning them to the member variables using std::move().

• BM_PushBack — Reserving space & adding each entry with
  g_languages.push_back(). A constructor was added to the language_t
  struct taking the std::strings as const references (std::string
  const &) and assigning them to the member variables normally.

• BM_PushBack2 — Reserving space & adding each entry with
  g_languages.push_back(). A constructor was added to the language_t
  struct taking the std::strings as rvalue references (std::string &&)
  and assigning them to the member variables using std::move().
2021-01-26 14:53:30 +01:00
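The difference between the BM_InitializerList and BM_EmplaceBack2 variants described above can be sketched as follows. This is an illustration, not the benchmark code: the two-field `language_t` and the sample entries are assumptions. One likely contributor to the measured gap is that the elements of a `std::initializer_list` are const, so the vector has to copy them (including every std::string) rather than move them.

```cpp
#include <string>
#include <utility>
#include <vector>

using namespace std::string_literals;

// Simplified, hypothetical stand-in for the real language_t struct.
struct language_t {
  std::string english_name, alpha_3_code;

  // Same shape as the BM_EmplaceBack2 variant: rvalue references + std::move().
  language_t(std::string &&p_english_name, std::string &&p_alpha_3_code)
    : english_name{std::move(p_english_name)}
    , alpha_3_code{std::move(p_alpha_3_code)}
  {}
};

// Old style: each element is first built in the initializer list's const
// backing array, then copied into the vector.
std::vector<language_t> make_with_initializer_list() {
  return {
    { "English"s, "eng"s },
    { "French"s,  "fra"s },
    { "Ainu"s,    "ain"s },
  };
}

// New style: allocate once, then construct each element in place inside
// the vector, moving the temporary strings into the members.
std::vector<language_t> make_with_emplace_back() {
  std::vector<language_t> languages;
  languages.reserve(3);
  languages.emplace_back("English"s, "eng"s);
  languages.emplace_back("French"s,  "fra"s);
  languages.emplace_back("Ainu"s,    "ain"s);
  return languages;
}
```

Both functions produce the same contents; only the number of string constructions and copies differs.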
Moritz Bunkus
68a38909d4
BCP 47: ISO 639 code list: include bool to say if part of ISO 639-2
Part of #3007.
2021-01-26 14:53:30 +01:00
Moritz Bunkus
ed309582ce
BCP 47: various lists: cosmetics (remove superfluous space at end of row)
2021-01-26 14:53:30 +01:00
Moritz Bunkus
d5dbdb0a7e
replace outdated link to GPLv2 with current one
2020-08-01 18:03:54 +02:00
Moritz Bunkus
bf89f72189
ISO 639 code: move to namespace mtx::iso639
Part of the implementation of #2419.
2020-07-05 11:35:20 +02:00
Moritz Bunkus
572bf8d552
Rakefile: fix name/description of target generating ISO 639 language list
2020-07-02 19:09:47 +02:00
Moritz Bunkus
3e4f59d3ab
build system: add dev target for updating ISO 639 language list
2020-06-02 19:33:10 +02:00