Commit Graph

14 Commits

Author SHA1 Message Date
Moritz Bunkus
e3736e9def
ISO 639: add codes from 639-5 that aren't part of 639-2 2021-07-21 22:34:26 +02:00
Moritz Bunkus
f690be057b
build system: move HTML table data extraction to separate function 2021-07-17 13:51:17 +02:00
Moritz Bunkus
20437cb0f6
build system: move file download handling to dedicated module 2021-07-17 12:26:48 +02:00
Moritz Bunkus
c649427ff2
ISO 639: re-add entries only present in 639-2 but not 639-3 2021-07-15 19:35:29 +02:00
Moritz Bunkus
50505d3d1b
ISO 639: generate language list directly from latest ISO lists 2021-07-14 22:52:56 +02:00
Moritz Bunkus
6a5b4b97db
BCP 47: add ISO 639-3 languages (only those of type "living")
Part of the implementation of #3007.
2021-02-17 22:19:10 +01:00
Moritz Bunkus
c9884c3e77
build system: don't optimize when compiling iso639_language_list.cpp
Even `-O1` causes compilation time & memory usage to skyrocket,
possibly exponentially, with the number of entries added to the vector
via `emplace_back()`.

This isn't a problem with the current number of entries (489):
compilation with `-O3` takes only 7.2s.

However, extending the list to cover ISO 639-3 means that the list
will include 7160 entries. With that many entries things are much,
much more severe:

• with `-O1` alone, compilation already takes 11m 23s.
• with `-O3`, memory usage exceeded 20 GB after six minutes, at which
  point I had to abort because other running applications were getting
  killed.

The runtime cost is negligible. According to a micro benchmark I ran
with all 7160 entries, the one-time initialization on startup takes
~1.4 milliseconds without optimizations (`-O0`) and still ~570
microseconds with optimizations (`-O1`).
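The benchmark code itself is not part of this commit. A minimal sketch of how a one-time initialization like this could be timed with `std::chrono::steady_clock` follows; `entry_t`, `build_list()` and `time_once_us()` are hypothetical names, and the loop only fakes 7160 synthetic entries. The real generated file contains one separate `emplace_back()` call per entry, which is what the commit above identifies as the cause of the compile-time blow-up; a loop like this one does not reproduce that.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified entry type; not the real language_t.
struct entry_t {
  std::string name, code;
  entry_t(std::string &&n, std::string &&c) : name{std::move(n)}, code{std::move(c)} {}
};

// Builds a list of `count` synthetic entries (sample data only).
std::vector<entry_t> build_list(std::size_t count) {
  std::vector<entry_t> list;
  list.reserve(count);
  for (std::size_t i = 0; i < count; ++i)
    list.emplace_back("Language " + std::to_string(i), "x" + std::to_string(i));
  assert(list.size() == count);
  return list;
}

// Runs `f` once and returns the elapsed wall-clock time in microseconds.
template<typename F>
long long time_once_us(F &&f) {
  auto const start = std::chrono::steady_clock::now();
  f();
  auto const end = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}
```

Typical use is `time_once_us([&] { list = build_list(7160); })`; for stable numbers a harness such as Google Benchmark, which the next commit's output comes from, repeats the measurement many times.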

Part of the implementation of #3007.
2021-02-17 22:18:20 +01:00
Moritz Bunkus
5276839f16
BCP 47: use emplace_back for initialization of ISO 639 language list
It's much faster than using initializer lists. Here's the result
from a micro benchmark I ran:

2021-01-25T23:49:20+01:00
Running ./bench.g++
Run on (8 X 4500 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x4)
  L1 Instruction 32 KiB (x4)
  L2 Unified 256 KiB (x4)
  L3 Unified 8192 KiB (x1)
Load Average: 1.08, 0.72, 0.60
-------------------------------------------------------------
Benchmark                   Time             CPU   Iterations
-------------------------------------------------------------
BM_InitializerList      59667 ns        59625 ns        70526
BM_EmplaceBack          24515 ns        24497 ns       176817
BM_EmplaceBack2         16970 ns        16961 ns       247652
BM_PushBack             52831 ns        52796 ns        79202
BM_PushBack2            52858 ns        52823 ns        79004

The five benchmarks were:

• BM_InitializerList — the old way with initializer lists. Essentially
  the same as the code currently being replaced.

• BM_EmplaceBack — Reserving space & adding each entry with
  g_languages.emplace_back(). A constructor was added to the language_t
  struct taking the std::strings as const references (std::string
  const &), assigning them to the member variables normally.

• BM_EmplaceBack2 — Reserving space & adding each entry with
  g_languages.emplace_back(). A constructor was added to the language_t
  struct taking the std::strings as rvalue references (std::string &&),
  assigning them to the member variables using std::move().

• BM_PushBack — Reserving space & adding each entry with
  g_languages.push_back(). A constructor was added to the language_t
  struct taking the std::strings as const references (std::string
  const &), assigning them to the member variables normally.

• BM_PushBack2 — Reserving space & adding each entry with
  g_languages.push_back(). A constructor was added to the language_t
  struct taking the std::strings as rvalue references (std::string &&),
  assigning them to the member variables using std::move().
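The fastest variant, BM_EmplaceBack2, can be sketched roughly as follows. This is an illustration under assumptions, not the generated MKVToolNix code: the member names and the three sample rows are made up. It reserves space once, then constructs each entry in place through a constructor taking `std::string &&` and moving into the members.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified entry type; the real language_t has its own members.
struct language_t {
  std::string english_name;
  std::string alpha_3_code;
  std::string alpha_2_code;

  // BM_EmplaceBack2 style: take rvalue references and move them into the
  // members instead of copying.
  language_t(std::string &&name, std::string &&alpha_3, std::string &&alpha_2)
    : english_name{std::move(name)}
    , alpha_3_code{std::move(alpha_3)}
    , alpha_2_code{std::move(alpha_2)}
  {
  }
};

std::vector<language_t> g_languages;

// Sketch of the generated initialization: one reserve(), then one
// emplace_back() per entry. The string literals are converted to temporary
// std::strings, which bind to the rvalue reference parameters.
void init_languages() {
  g_languages.reserve(3);
  auto const capacity = g_languages.capacity();

  g_languages.emplace_back("English", "eng", "en");
  g_languages.emplace_back("German",  "ger", "de");
  g_languages.emplace_back("French",  "fre", "fr");

  // Thanks to reserve(), none of the emplace_back() calls reallocated.
  assert(g_languages.capacity() == capacity);
}
```

The push_back variants differ in that each call site has to construct a `language_t` temporary first (e.g. `g_languages.push_back(language_t{"English", "eng", "en"})`), which is then moved into the vector, whereas `emplace_back()` forwards the arguments straight to the constructor.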
2021-01-26 14:53:30 +01:00
Moritz Bunkus
68a38909d4
BCP 47: ISO 639 code list: include bool to say if part of ISO 639-2
Part of #3007.
2021-01-26 14:53:30 +01:00
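A minimal sketch of what such a flag allows, assuming a hypothetical `is_part_of_iso639_2` member (the real field name and the code that consumes it are not shown in this log): selecting only the entries that also exist in ISO 639-2.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical entry type; field names are assumptions, not the real
// language_t members.
struct language_t {
  std::string english_name;
  std::string alpha_3_code;
  bool is_part_of_iso639_2;
};

// Returns only the entries flagged as part of ISO 639-2.
std::vector<language_t> iso639_2_subset(std::vector<language_t> const &all) {
  std::vector<language_t> result;
  for (auto const &language : all)
    if (language.is_part_of_iso639_2)
      result.push_back(language);
  return result;
}
```

Storing the flag per entry lets a single combined list serve callers that need the full ISO 639-3 range as well as callers that must stay within ISO 639-2.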
Moritz Bunkus
ed309582ce
BCP 47: various lists: cosmetics (remove superfluous space at end of row) 2021-01-26 14:53:30 +01:00
Moritz Bunkus
d5dbdb0a7e
replace outdated link to GPLv2 with current one 2020-08-01 18:03:54 +02:00
Moritz Bunkus
bf89f72189
ISO 639 code: move to namespace mtx::iso639
Part of the implementation of #2419.
2020-07-05 11:35:20 +02:00
Moritz Bunkus
572bf8d552
Rakefile: fix name/description of target generating ISO 639 language list 2020-07-02 19:09:47 +02:00
Moritz Bunkus
3e4f59d3ab
build system: add dev target for updating ISO 639 language list 2020-06-02 19:33:10 +02:00