Chromium Code Reviews

Unified Diff: source/data/translit/sat_Olck_sat_FONIPA.txt

Issue 2440913002: Update ICU to 58.1
Patch Set: Created 4 years, 2 months ago
Index: source/data/translit/sat_Olck_sat_FONIPA.txt
diff --git a/source/data/translit/sat_Olck_sat_FONIPA.txt b/source/data/translit/sat_Olck_sat_FONIPA.txt
new file mode 100644
index 0000000000000000000000000000000000000000..4a6105d0b0d86eda4f25aff43dfb52ffded6c7d1
--- /dev/null
+++ b/source/data/translit/sat_Olck_sat_FONIPA.txt
@@ -0,0 +1,180 @@
+# © 2016 and later: Unicode, Inc. and others.
+# License & terms of use: http://www.unicode.org/copyright.html#License
+#
+# File: sat_Olck_sat_FONIPA.txt
+# Generated from CLDR
+#
+
+# Santali (Ol Chiki) → Santali (International Phonetic Alphabet)
+# Output
+# ------
+# m mː n nː ɳ ɳː ɲ ɲː ŋ ŋː
+# p pʰ pʼ b bʰ t tʰ tʼ d dʰ ʈ ʈʰ ɖ ɖʰ c cʰ cʼ k kʰ kʼ ɡ ʔ
+# s sː h
+# d\u0361ʒ
+# ɽ r
+# l lː
+# w wː w\u0303 w\u0303ː
+#
+# i iː ĩ ĩː u uː ũ ũː
+# e eː ẽ ẽː ə əː ə\u0303 ə\u0303ː o oː õ õː
+# ɛ ɛː ɛ\u0303 ɛ\u0303ː ɔ ɔː ɔ\u0303 ɔ\u0303ː
+# a aː ã ãː
+# References
+# ----------
+# [1] Michael Everson: Final proposal to encode the Ol Chiki script
+# in the UCS. ISO/IEC JTC1/SC2/WG2 Working Group Document N2984R,
+# September 21, 2005. http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2984.pdf
+#
+# [2] George L. Campbell: Compendium of the World's Languages.
+# Volume 2: Ladakhi to Zuni. ISBN 0-415-20297-3. Taylor & Francis, 2000.
+# Pages 1454 to 1458.
+# Notes
+# -----
+# According to [1] (page 3), ᱽ can only follow the four ejective
+# consonants ᱵ /pʼ/, ᱡ /cʼ/, ᱫ /tʼ/, and ᱜ /kʼ/; these become
+# ᱵᱽ /b/, ᱫᱽ /d/, ᱡᱽ /d\u0361ʒ/, and ᱜᱽ /ɡ/. In online texts, however,
+# we have occasionally encountered ᱽ following non-ejective plosives,
+# for example after ᱯ /p/. These might possibly be typos. Our rules
+# try to be resilient and handle ᱯᱽ as /b/.
+#
+# According to [1] (page 2), U+1C7C PHAARKAA follows the four “glottal”
+# consonants ᱵ /pʼ/, ᱡ /cʼ/, ᱫ /tʼ/, and ᱜ /kʼ/ (these are actually
+# ejective, not glottal). In online texts, however, we have frequently
+# encountered ᱼ following non-ejective consonants.
+$inword = [[:L:][:M:]];
+# Some online texts use a decomposed form of U+1C7A MU-GAAHLAA TTUDDAAG.
+ᱹᱸ → ᱺ ;
+ᱸᱹ → ᱺ ;
+::null();
+# To simplify the rules below, enforce a uniform ordering of marks.
+ᱻᱹ → ᱹᱻ ;
+ᱻᱸ → ᱸᱻ ;
+ᱻᱺ → ᱺᱻ ;
+ᱼᱹ → ᱹᱼ ;
+ᱼᱸ → ᱸᱼ ;
+ᱼᱺ → ᱺᱼ ;
+::null();
+# Some online texts use U+1C7C PHAARKAA instead of U+1C7B RELAA for indicating
+# long phonemes, presumably because the graphemes look similar in some fonts.
+# Since phaarkaa is used for voicing ejectives and plosives (which cannot
+# be lengthened), we rewrite phaarkaa to relaa.
+[ᱚᱟᱤᱩᱮᱳᱶᱢᱝᱞᱱ] [ᱹᱸᱺ]* {ᱼ} → ᱻ ;
+::null();
+ᱚᱹᱻ → ɔː ;
+ᱚᱹ → ɔ ;
+ᱚᱸᱻ → ɔ\u0303ː ;
+ᱚᱸ → ɔ\u0303 ;
+ᱚᱺᱻ → ɔ\u0303ː ;
+ᱚᱺ → ɔ\u0303 ;
+ᱚᱻ → ɔː ;
+ᱚ → ɔ ;
+ᱛᱼ → t ;
+ᱛᱷ → tʰ ;
+ᱛᱽ → d ;
+$inword {ᱛ} → d ;
+ᱛ → t ;
+ᱜᱼ → kʼ ;
+ᱜᱷ → kʰ ;
+ᱜᱽ → ɡ ;
+$inword {ᱜ} → ɡ ;
+ᱜ → kʼ ;
+ᱝᱻ → ŋː ;
+ᱝ → ŋ ;
+ᱞᱻ → lː ;
+ᱞ → l ;
+ᱟᱹᱻ → əː ;
+ᱟᱹ → ə ;
+ᱟᱸᱻ → ãː ;
+ᱟᱸ → ã ;
+ᱟᱺᱻ → ə\u0303ː ;
+ᱟᱺ → ə\u0303 ;
+ᱟᱻ → aː ;
+ᱟ → a ;
+ᱠᱼ → k ;
+ᱠᱷ → kʰ ;
+ᱠᱽ → ɡ ;
+ᱠ → k ;
+ᱡᱼ → cʼ ;
+ᱡᱷ → cʰ ;
+ᱡᱽ → d\u0361ʒ ;
+$inword {ᱡ} → d\u0361ʒ ;
+ᱡ → cʼ ;
+ᱢᱻ → mː ;
+ᱢ → m ;
+# According to [1], ᱣ is sometimes /v/ and sometimes /w/.
+# TODO: Find out if there is a rule for this.
+ᱣᱸ → w\u0303 ;
+ᱣ → w ;
+ᱤᱹᱻ → iː ;
+ᱤᱹ → i ;
+ᱤᱸᱻ → ĩː ;
+ᱤᱸ → ĩ ;
+ᱤᱺᱻ → ĩː ;
+ᱤᱺ → ĩ ;
+ᱤᱻ → iː ;
+ᱤ → i ;
+ᱥᱻ → sː ;
+ᱥ → s ;
+# According to [1], ᱦ is sometimes /h/ and sometimes /ʔ/.
+# TODO: Find out if there is a rule for this.
+ᱦ → h ;
+ᱧᱻ → ɲː ;
+ᱧ → ɲ ;
+ᱨᱻ → r ;
+ᱨ → r ;
+ᱩᱹᱻ → uː ;
+ᱩᱹ → u ;
+ᱩᱸᱻ → ũː ;
+ᱩᱸ → ũ ;
+ᱩᱺᱻ → ũː ;
+ᱩᱺ → ũ ;
+ᱩᱻ → uː ;
+ᱩ → u ;
+ᱪᱼ → c ;
+ᱪᱷ → cʰ ;
+ᱪᱽ → d\u0361ʒ ;
+ᱪ → c ;
+ᱫᱼ → tʼ ;
+ᱫᱷ → tʰ ;
+ᱫᱽ → d ;
+$inword {ᱫ} → d ;
+ᱫ → tʼ ;
+ᱬᱻ → ɳː ;
+ᱬ → ɳ ;
+# TODO: ᱵᱷᱭᱨᱚᱵ → bʰhrɔb seems unlikely; would be good to verify.
+ᱭ → h ;
+ᱮᱹᱻ → ɛː ;
+ᱮᱹ → ɛ ;
+ᱮᱺᱻ → ɛ\u0303ː ;
+ᱮᱺ → ɛ\u0303 ;
+ᱮᱸᱻ → ẽː ;
+ᱮᱸ → ẽ ;
+ᱮᱻ → eː ;
+ᱮ → e ;
+ᱯᱼ → p ;
+ᱯᱷ → pʰ ;
+ᱯᱽ → b ;
+ᱯ → p ;
+ᱰᱷ → ɖʰ ;
+ᱰ → ɖ ;
+ᱱᱻ → nː ;
+ᱱ → n ;
+ᱲᱻ → ɽ ;
+ᱲ → ɽ ;
+ᱳᱸᱻ → õː ;
+ᱳᱸ → õ ;
+ᱳᱻ → oː ;
+ᱳ → o ;
+ᱴᱼ → ʈ ;
+ᱴᱷ → ʈʰ ;
+ᱴᱽ → ɖ ;
+ᱴ → ʈ ;
+ᱵᱼ → pʼ ;
+ᱵᱷ → bʰ ;
+ᱵᱽ → b ;
+$inword {ᱵ} → b ;
+ᱵ → pʼ ;
+ᱶᱻ → w\u0303ː ;
+ᱶ → w\u0303 ;
+
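The three normalization passes at the top of the rule file (composing U+1C7A, putting tone and nasalization marks before length marks, and rewriting phaarkaa to relaa after the listed letters) can be approximated outside ICU. The following is a minimal Python sketch, not the ICU implementation: the function name `normalize_marks` and the constant names are hypothetical, and it assumes that one left-to-right regular-expression substitution per pass is close enough to how each ICU pass behaves. It does not perform the later letter-to-IPA mapping.

```python
import re

# Ol Chiki modifier marks, written as escapes so the code points are explicit.
MU = "\u1c78"          # ᱸ
GAAHLAA = "\u1c79"     # ᱹ
MU_GAAHLAA = "\u1c7a"  # ᱺ (the composed form of ᱹ + ᱸ)
RELAA = "\u1c7b"       # ᱻ (length mark)
PHAARKAA = "\u1c7c"    # ᱼ

VOWEL_MARKS = MU + GAAHLAA + MU_GAAHLAA

# Letters named in the phaarkaa-to-relaa rule: ᱚᱟᱤᱩᱮᱳᱶᱢᱝᱞᱱ
LENGTHENABLE = (
    "\u1c5a\u1c5f\u1c64\u1c69\u1c6e\u1c73"  # ᱚ ᱟ ᱤ ᱩ ᱮ ᱳ
    "\u1c76\u1c62\u1c5d\u1c5e\u1c71"        # ᱶ ᱢ ᱝ ᱞ ᱱ
)


def normalize_marks(text: str) -> str:
    """Hypothetical helper mirroring the first three passes of the rules."""
    # Pass 1: compose the two-mark sequences (in either order) into U+1C7A.
    text = re.sub(f"{GAAHLAA}{MU}|{MU}{GAAHLAA}", MU_GAAHLAA, text)
    # Pass 2: move relaa/phaarkaa behind a following tone or nasal mark.
    text = re.sub(f"([{RELAA}{PHAARKAA}])([{VOWEL_MARKS}])", r"\2\1", text)
    # Pass 3: after a listed letter (plus optional marks),
    # treat phaarkaa as relaa.
    text = re.sub(
        f"([{LENGTHENABLE}][{VOWEL_MARKS}]*){PHAARKAA}",
        lambda m: m.group(1) + RELAA,
        text,
    )
    return text
```

For example, `normalize_marks("\u1c5f\u1c7c\u1c78\u1c79")` yields `"\u1c5f\u1c7a\u1c7b"` (ᱟᱺᱻ), which the rules above then map to a long nasalized ə. A sequence such as ᱜᱼ is left unchanged, because ᱜ is not in the listed set.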