source/data/translit/Hira_Kana.txt - Issue 2440913002: Update ICU to 58.1

Unified Diff: source/data/translit/Hira_Kana.txt

Issue 2440913002: Update ICU to 58.1

Patch Set: Created 4 years, 2 months ago

Use n/p to move between diff chunks; N/P to move between comments. Draft comments are only viewable by you.

Jump to:

View side-by-side diff with in-line comments

Index: source/data/translit/Hira_Kana.txt

diff --git a/source/data/translit/Hira_Kana.txt b/source/data/translit/Hira_Kana.txt

new file mode 100644

index 0000000000000000000000000000000000000000..2b871e2531ad92c40ccac3fb49666c1b1c1813d8

--- /dev/null

+++ b/source/data/translit/Hira_Kana.txt

@@ -0,0 +1,185 @@

+# License & terms of use: http://www.unicode.org/copyright.html#License

+# File: Hira_Kana.txt

+# Generated from CLDR

+# note: a global filter is more efficient, but MUST include all source chars

+:: [\u0000-\u007E 、。 \u3099-゜ァ-ー｡-ﾟー[:Hiragana:] [:Katakana:] [:nonspacing mark:]] ;

+:: NFKC ();

+# Hiragana-Katakana

+# This is largely a one-to-one mapping, but it has a

+# few kinks:

+# 1. The Katakana va/vi/ve/vo (30F7-30FA) have no

+# Hiragana equivalents. We use Hiragana wa/wi/we/wo

+# (308F-3092) with a voicing mark (3099), which is

+# semantically equivalent. However, this is a non-

+# roundtripping transformation.

+# 2. The Katakana small ka/ke (30F5,30F6) have no

+# Hiragana equiavlents. We convert them to normal

+# Hiragana ka/ke (304B,3051). This is a one-way

+# information-losing transformation and precludes

+# round-tripping of 30F5 and 30F6.

+# 3. The combining marks 3099-309C are in the Hiragana

+# block, but they apply to Katakana as well, so we

+# leave them untouched.

+# 4. The Katakana prolonged sound mark 30FC doubles the

+# preceding vowel. This is a one-way information-

+# losing transformation from Katakana to Hiragana.

+# 5. The Katakana middle dot separates words in foreign

+# expressions; we leave this unmodified.

+# The above points preclude successful round-trip

+# transformations of arbitrary input text. However,

+# they provide naturalistic results that should conform

+# to user expectations.

+# Combining equivalents va/vi/ve/vo

+わ\u3099 ↔ ヷ;

+ゐ\u3099 ↔ ヸ;

+ゑ\u3099 ↔ ヹ;

+を\u3099 ↔ ヺ;

+# One-to-one mappings, main block

+# 3041:3094 ↔ 30A1:30F4

+# 309D,E ↔ 30FD,E

+ぁ ↔ ァ;

+あ ↔ ア;

+ぃ ↔ ィ;

+い ↔ イ;

+ぅ ↔ ゥ;

+う ↔ ウ;

+ぇ ↔ ェ;

+え ↔ エ;

+ぉ ↔ ォ;

+お ↔ オ;

+か ↔ カ;

+が ↔ ガ;

+き ↔ キ;

+ぎ ↔ ギ;

+く ↔ ク;

+ぐ ↔ グ;

+け ↔ ケ;

+げ ↔ ゲ;

+こ ↔ コ;

+ご ↔ ゴ;

+さ ↔ サ;

+ざ ↔ ザ;

+し ↔ シ;

+じ ↔ ジ;

+す ↔ ス;

+ず ↔ ズ;

+せ ↔ セ;

+ぜ ↔ ゼ;

+そ ↔ ソ;

+ぞ ↔ ゾ;

+た ↔ タ;

+だ ↔ ダ;

+ち ↔ チ;

+ぢ ↔ ヂ;

+っ ↔ ッ;

+つ ↔ ツ;

+づ ↔ ヅ;

+て ↔ テ;

+で ↔ デ;

+と ↔ ト;

+ど ↔ ド;

+な ↔ ナ;

+に ↔ ニ;

+ぬ ↔ ヌ;

+ね ↔ ネ;

+の ↔ ノ;

+は ↔ ハ;

+ば ↔ バ;

+ぱ ↔ パ;

+ひ ↔ ヒ;

+び ↔ ビ;

+ぴ ↔ ピ;

+ふ ↔ フ;

+ぶ ↔ ブ;

+ぷ ↔ プ;

+へ ↔ ヘ;

+べ ↔ ベ;

+ぺ ↔ ペ;

+ほ ↔ ホ;

+ぼ ↔ ボ;

+ぽ ↔ ポ;

+ま ↔ マ;

+み ↔ ミ;

+む ↔ ム;

+め ↔ メ;

+も ↔ モ;

+ゃ ↔ ャ;

+や ↔ ヤ;

+ゅ ↔ ュ;

+ゆ ↔ ユ;

+ょ ↔ ョ;

+よ ↔ ヨ;

+ら ↔ ラ;

+り ↔ リ;

+る ↔ ル;

+れ ↔ レ;

+ろ ↔ ロ;

+ゎ ↔ ヮ;

+わ ↔ ワ;

+ゐ ↔ ヰ;

+ゑ ↔ ヱ;

+を ↔ ヲ;

+ん ↔ ン;

+ゔ ↔ ヴ;

+ゝ ↔ ヽ;

+ゞ ↔ ヾ;

+# One-way Katakana-Hiragana xform of small K ka/ke to

+# normal H ka/ke.

+か ← ヵ;

+け ← ヶ;

+# Katakana followed by a prolonged sound mark 30FC has

+# its final vowel doubled. This is a Katakana-Hiragana

+# one-way information-losing transformation. We

+# include the small Katakana (e.g., small A 3041) and

+# do not distinguish them from their large

+# counterparts. It doesn't make sense to double a

+# small counterpart vowel as a small Hiragana vowel, so

+# we don't do so. In natural text this should never

+# occur anyway. If a 30FC is seen without a preceding

+# vowel sound (e.g., after n 30F3) we do not change it.

+### $long = ー;

+# The following categories are Hiragana, not Katakana

+# as might be expected, since by the time we get to the

+# 30FC, the preceding character will have already been

+# transformed to Hiragana.

+# {The following mechanically generated from the

+# Unicode 3.0 data:}

+$xa = [ \

+ぁあかがさざ \

+ただなはばぱ \

+まゃやらゎわ \

+];

+$xi = [ \

+ぃいきぎしじ \

+ちぢにひびぴ \

+みりゐ \

+];

+$xu = [ \

+ぅうくぐすず \

+っつづぬふぶ \

+ぷむゅゆるゔ \

+];

+$xe = [ \

+ぇえけげせぜ \

+てでねへべぺ \

+めれゑ \

+];

+$xo = [ \

+ぉおこごそぞ \

+とどのほぼぽ \

+もょよろを \

+];

+あ ← $xa {ー};

+い ← $xi {ー};

+う ← $xu {ー};

+え ← $xe {ー};

+お ← $xo {ー};

+:: (NFKC) ;

+# note: a global filter is more efficient, but MUST include all source chars!!

+:: ([\u0000-\u007E 、。 \u3099-゜ァ-ー｡-ﾟー[:Hiragana:] [:Katakana:] [:nonspacing mark:]]);

+# eof

« no previous file with comments | « source/data/translit/Hebrew_Latin_BGN.txt ('k') | source/data/translit/Hira_Latn.txt » ('j') | no next file with comments »