mkvtoolnix/rake.d/iso639.rb

require "json"

def create_iso639_language_list_file
  cpp_file_name = "src/common/iso639_language_list.cpp"

  # ISO 639-2 entries, minus the "qaa-qtz" local-use range (qaa..qad are re-added below).
  iso639_2 = JSON.parse(File.read("/usr/share/iso-codes/json/iso_639-2.json"))["639-2"].
    reject { |entry| %r{^qaa}.match(entry["alpha_3"]) }.
    map do |entry|
      entry["has_639_2"]      = true
      # Prefer the bibliographic code (e.g. "ger") over the terminologic one (e.g. "deu").
      entry["alpha_3_to_use"] = entry["bibliographic"] || entry["alpha_3"]
      entry
    end

  # Every alpha-3 code (terminologic and bibliographic) already covered by ISO 639-2.
  used_codes = Hash[ *iso639_2.map { |entry| [ entry["alpha_3"], true, entry["bibliographic"], true ] }.flatten ]

  # Append living languages (type "L") from ISO 639-3 that ISO 639-2 does not cover.
  JSON.parse(File.read("/usr/share/iso-codes/json/iso_639-3.json"))["639-3"].
    reject { |entry| entry["type"] != "L" }.
    reject { |entry| used_codes.include?(entry["alpha_3"]) }.
    each do |entry|
      iso639_2 << {
        "name"           => entry["name"],
        "alpha_3"        => entry["alpha_3"],
        "alpha_3_to_use" => entry["alpha_3"],
        "has_639_2"      => false,
      }
    end

  # One row per language; each row becomes the argument list of an emplace_back() call.
  rows = iso639_2.
    map do |entry|
      [ entry["name"].to_u8_cpp_string,
        entry["alpha_3_to_use"].to_cpp_string,
        (entry["alpha_2"] || '').to_cpp_string,
        entry["bibliographic"] ? entry["alpha_3"].to_cpp_string : '""s',
        entry["has_639_2"].to_s,
      ]
    end

  rows += ("a".."d").map do |letter|
    # Commit note (2021-01-25 23:06:46 +00:00):
    # "BCP 47: use emplace_back for initialization of ISO 639 language list"
    #
    # It's much faster than using the initializer lists. Here's the result
    # from a micro benchmark I ran:
    #
    #   2021-01-25T23:49:20+01:00
    #   Running ./bench.g++
    #   Run on (8 X 4500 MHz CPU s)
    #   CPU Caches:
    #     L1 Data 32 KiB (x4)
    #     L1 Instruction 32 KiB (x4)
    #     L2 Unified 256 KiB (x4)
    #     L3 Unified 8192 KiB (x1)
    #   Load Average: 1.08, 0.72, 0.60
    #   -------------------------------------------------------------
    #   Benchmark                   Time             CPU   Iterations
    #   -------------------------------------------------------------
    #   BM_InitializerList      59667 ns        59625 ns        70526
    #   BM_EmplaceBack          24515 ns        24497 ns       176817
    #   BM_EmplaceBack2         16970 ns        16961 ns       247652
    #   BM_PushBack             52831 ns        52796 ns        79202
    #   BM_PushBack2            52858 ns        52823 ns        79004
    #
    # The five benchmarks were:
    #
    # * BM_InitializerList: the old way with initializer lists. Basically the
    #   same code currently being replaced.
    # * BM_EmplaceBack: reserving space & adding each entry with
    #   g_languages.emplace_back(). A constructor was added to the language_t
    #   struct taking the std::strings as const references
    #   (std::string const &), assigning them to the member variables normally.
    # * BM_EmplaceBack2: reserving space & adding each entry with
    #   g_languages.emplace_back(). A constructor was added to the language_t
    #   struct taking the std::strings as rvalue references (std::string &&),
    #   assigning them to the member variables using std::move().
    # * BM_PushBack: reserving space & adding each entry with
    #   g_languages.push_back(). A constructor was added to the language_t
    #   struct taking the std::strings as const references
    #   (std::string const &), assigning them to the member variables normally.
    # * BM_PushBack2: reserving space & adding each entry with
    #   g_languages.push_back(). A constructor was added to the language_t
    #   struct taking the std::strings as rvalue references (std::string &&),
    #   assigning them to the member variables using std::move().
    [ %Q{u8"Reserved for local use: qa#{letter}"s},
      %Q{u8"qa#{letter}"s},
      '""s',
      '""s',
      'true ',
    ]
  end

  header = <<EOT
/*
mkvmerge -- utility for splicing together matroska files
from component media subtypes
Distributed under the GPL v2
see the file COPYING for details
or visit https://www.gnu.org/licenses/old-licenses/gpl-2.0.html
ISO 639 language definitions, lookup functions
Written by Moritz Bunkus <moritz@bunkus.org>.
*/
// -----------------------------------------------------------------------
// NOTE: this file is auto-generated by the "dev:iso639_list" rake target.
// -----------------------------------------------------------------------
#include "common/iso639_types.h"
using namespace std::string_literals;
namespace mtx::iso639 {
std::vector<language_t> g_languages;
void
init() {
g_languages.reserve(#{rows.size});
EOT

  footer = <<EOT
}
} // namespace mtx::iso639
EOT

  # Header, then one emplace_back() call per sorted row, then footer.
  content = header +
    format_table(rows.sort, :column_suffix => ',', :row_prefix => " g_languages.emplace_back(", :row_suffix => ");").join("\n") +
    "\n" + footer

  runq("write", cpp_file_name) { IO.write("#{$source_dir}/#{cpp_file_name}", content); 0 }
end
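
The merge step in the function above (ISO 639-2 first, then living ISO 639-3 languages not already covered) can be tried without the iso-codes JSON files. The following is a minimal sketch, not part of the rake file: `merge_language_lists` and the two sample constants are hypothetical names, and the sample entries are hand-written stand-ins rather than data read from `/usr/share/iso-codes`.

```ruby
# Hypothetical sample data standing in for iso_639-2.json and iso_639-3.json.
SAMPLE_ISO639_2 = [
  { "alpha_3" => "deu", "bibliographic" => "ger", "alpha_2" => "de", "name" => "German" },
  { "alpha_3" => "eng", "alpha_2" => "en", "name" => "English" },
  { "alpha_3" => "qaa-qtz", "name" => "Reserved for local use" },
]

SAMPLE_ISO639_3 = [
  { "alpha_3" => "deu", "type" => "L", "name" => "German" },   # already in 639-2
  { "alpha_3" => "bar", "type" => "L", "name" => "Bavarian" }, # only in 639-3
  { "alpha_3" => "xpr", "type" => "A", "name" => "Parthian" }, # not type "L"
]

# Hypothetical helper mirroring the merge logic of the rake task.
def merge_language_lists(iso639_2_entries, iso639_3_entries)
  # Drop the local-use range and mark the remaining entries as ISO 639-2.
  merged = iso639_2_entries.
    reject { |entry| entry["alpha_3"].start_with?("qaa") }.
    map do |entry|
      entry.merge(
        "has_639_2"      => true,
        "alpha_3_to_use" => entry["bibliographic"] || entry["alpha_3"],
      )
    end

  # Codes already taken, both terminologic and bibliographic.
  used_codes = merged.flat_map { |entry| [ entry["alpha_3"], entry["bibliographic"] ] }.compact

  # Append living 639-3 languages whose code is not yet in use.
  iso639_3_entries.
    select { |entry| entry["type"] == "L" && !used_codes.include?(entry["alpha_3"]) }.
    each do |entry|
      merged << {
        "name"           => entry["name"],
        "alpha_3"        => entry["alpha_3"],
        "alpha_3_to_use" => entry["alpha_3"],
        "has_639_2"      => false,
      }
    end

  merged
end

merged = merge_language_lists(SAMPLE_ISO639_2, SAMPLE_ISO639_3)
p merged.map { |entry| entry["alpha_3_to_use"] }   # prints ["ger", "eng", "bar"]
```

Unlike the rake task, this sketch uses `Hash#merge` so the input entries are not modified in place; the resulting list is the same.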