source/data/unidata/changes.txt - Issue 2440913002: Update ICU to 58.1

Unified Diff: source/data/unidata/changes.txt


Patch Set: Created 4 years, 2 months ago


Index: source/data/unidata/changes.txt

diff --git a/source/data/unidata/changes.txt b/source/data/unidata/changes.txt

index 2060212970786a73d5a0ede4964126b24665fbf7..74425830ccb1e738476dc59fe7a6ac8abd9f4eea 100644

--- a/source/data/unidata/changes.txt

+++ b/source/data/unidata/changes.txt

@@ -1,4 +1,6 @@

+* License & terms of use: http://www.unicode.org/copyright.html

* file name: changes.txt

@@ -15,33 +17,445 @@

* New ISO 15924 script codes

-Starting with ICU 55, we do not add UScriptCode constants any more until their scripts

-are encoded in Unicode, or can be assumed to be encoded in the next Unicode version.

+Starting with ICU 55, we do not add UScriptCode constants for new scripts any more

+until they are encoded in Unicode,

+or can be assumed to be encoded in the next Unicode version.

Script enum constant names want to follow the Unicode script property value aliases,

which are assigned only when the scripts are encoded.

When we encode scripts early and guess wrong, then we have confusing enum constants

and have sometimes added aliases.

-Exception: Script codes like Latf and Aran that are not subject to separate encoding

+Variant script codes like Latf and Aran that are not subject to separate encoding

can be added at any time.

+(For example, Aran could be added as USCRIPT_ARABIC_NASTALIQ.)

-Script codes not yet in ICU: http://www.unicode.org/iso15924/codechanges.html

+We add script codes used in CLDR or in the spoof checker.

+This includes combination/alias codes like Hanb and Jamo.

+See http://unicode.org/reports/tr35/#unicode_script_subtag_validity

+and look for "alias" on http://unicode.org/iso15924/iso15924-codes.html

-Added 2014-11-15, see http://bugs.icu-project.org/trac/ticket/11561

-- Adlm 166 Adlam

-- Aran 161 Arabic (Nastaliq variant)

-- Kitl 505 Khitan large script

-- Kits 288 Khitan small script

-- Marc 332 Marchen

-- Osge 219 Osage

+We add special Z* script codes like Zsye.

-Aran can be added as USCRIPT_ARABIC_NASTALIQ at any time.

+For new script codes see http://www.unicode.org/iso15924/codechanges.html

-Adlam, Marchen, and Osage are expected to go into Unicode 9;

-we should assign Unicode script property value aliases for them

-soon after Unicode 8 is released, and add them in ICU 56.

+---------------------------------------------------------------------------- ***

+Unicode 9.0 update for ICU 58

+* Command-line environment setup

+ICU_ROOT=~/svn.icu/trunk

+ICU_SRC_DIR=$ICU_ROOT/src

+ICUDT=icudt58b

+export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib

+SRC_DATA_IN=$ICU_SRC_DIR/source/data/in

+UNIDATA=$ICU_SRC_DIR/source/data/unidata

+http://www.unicode.org/review/pri323/ -- beta review

+http://www.unicode.org/reports/uax-proposed-updates.html

+http://www.unicode.org/versions/beta-9.0.0.html

+http://www.unicode.org/versions/Unicode9.0.0/

+http://www.unicode.org/reports/tr44/tr44-17.html

+*** ICU Trac

+- ticket:12526: integrate Unicode 9

+- C++ ^/icu/branches/markus/uni90, ^/icu/branches/markus/uni90b

+- Java ^/icu4j/branches/markus/uni90, ^/icu4j/branches/markus/uni90b

+*** CLDR Trac

-Khitan scripts will be encoded later.

+- cldrbug 9414: UCA 9

+- ^/branches/markus/uni90 at r11518 from trunk at r11517

+- cldrbug 8745: Unicode 9.0 script metadata

+*** Unicode version numbers

+- makedata.mak

+- uchar.h

+- com.ibm.icu.util.VersionInfo

+- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_

+- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h

+ so that the makefiles see the new version number.

+*** data files & enums & parser code

+* file preparation

+- download UCD & IDNA files

+- make sure that the Unicode data folder passed into preparseucd.py

+ includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)

+- only for manual diffs: remove version suffixes from the file names

+ ~/unidata/uni70/20140403$ ../../desuffixucd.py .

+ (see https://sites.google.com/site/unicodetools/inputdata)

+- only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip

+- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni90/20160603 $ICU_SRC_DIR ~/svn.icutools/trunk/src

+- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.

+- also: from http://unicode.org/Public/security/9.0.0/ download new confusables.txt

+ and copy to $UNIDATA

+ cp ~/unidata/uni90/20160603/security/confusables.txt $UNIDATA

+* preparseucd.py changes

+- remove or add new Unicode scripts from/to the

+ only-in-ISO-15924 list according to the error messages:

+ ValueError: remove ['Tang'] from _scripts_only_in_iso15924

+ ValueError: sc = Hanb (uchar.h USCRIPT_HAN_WITH_BOPOMOFO) not in the UCD

+ ValueError: sc = Jamo (uchar.h USCRIPT_JAMO) not in the UCD

+ ValueError: sc = Zsye (uchar.h USCRIPT_SYMBOLS_EMOJI) not in the UCD

+ -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()

+ and in com.ibm.icu.dev.test.lang.TestUScript.java

+- DerivedNumericValues.txt new numeric values

+ 0D58 ; 0.00625 ; ; 1/160 # No MALAYALAM FRACTION ONE ONE-HUNDRED-AND-SIXTIETH

+ 0D59 ; 0.025 ; ; 1/40 # No MALAYALAM FRACTION ONE FORTIETH

+ 0D5A ; 0.0375 ; ; 3/80 # No MALAYALAM FRACTION THREE EIGHTIETHS

+ 0D5B ; 0.05 ; ; 1/20 # No MALAYALAM FRACTION ONE TWENTIETH

+ 0D5D ; 0.15 ; ; 3/20 # No MALAYALAM FRACTION THREE TWENTIETHS

+ -> change uprops.h, corepropsbuilder.cpp/encodeNumericValue(),

+ uchar.c, UCharacterProperty.java

+ to support a new series of values
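Note on the new numeric values: the five Malayalam fractions quoted above can be cross-checked mechanically. The sketch below is illustrative only. It re-parses the DerivedNumericValues.txt lines exactly as quoted in this log, confirms that each decimal field equals its rational field, and checks that every denominator has the shape 20 * 2^k, which is what makes them a "new series". The helper names are hypothetical, and the actual encoding chosen in uprops.h and encodeNumericValue() is not shown in this diff.

```python
from fractions import Fraction

# Lines quoted from the log above (DerivedNumericValues.txt, Unicode 9.0).
LINES = """\
0D58 ; 0.00625 ; ; 1/160 # No MALAYALAM FRACTION ONE ONE-HUNDRED-AND-SIXTIETH
0D59 ; 0.025 ; ; 1/40 # No MALAYALAM FRACTION ONE FORTIETH
0D5A ; 0.0375 ; ; 3/80 # No MALAYALAM FRACTION THREE EIGHTIETHS
0D5B ; 0.05 ; ; 1/20 # No MALAYALAM FRACTION ONE TWENTIETH
0D5D ; 0.15 ; ; 3/20 # No MALAYALAM FRACTION THREE TWENTIETHS
""".splitlines()


def parse_line(line):
    """Return (code point, decimal value, rational value) for one data line."""
    data = line.split("#", 1)[0]            # drop the trailing comment
    fields = [f.strip() for f in data.split(";")]
    code_point = int(fields[0], 16)
    decimal = Fraction(fields[1])           # exact value of e.g. "0.00625"
    rational = Fraction(fields[3])          # exact value of e.g. "1/160"
    return code_point, decimal, rational


def is_twenty_times_power_of_two(denominator):
    """True if denominator == 20 * 2**k for some k >= 0."""
    if denominator % 20 != 0:
        return False
    quotient = denominator // 20
    return quotient & (quotient - 1) == 0


parsed = [parse_line(line) for line in LINES]
for code_point, decimal, rational in parsed:
    ok = decimal == rational and is_twenty_times_power_of_two(rational.denominator)
    print(f"U+{code_point:04X} {rational} consistent={ok}")
```

Using Fraction rather than float keeps the comparison exact, so a mismatch would point at the data and not at rounding.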

+- adjust preparseucd.py for Tangut algorithmic names

+ in ppucd.txt:

+ algnamesrange;17000..187EC;han;CJK UNIFIED IDEOGRAPH-

+ ->

+ algnamesrange;17000..187EC;han;TANGUT IDEOGRAPH-
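Note on the algnamesrange change: for a "han"-type range the character name is the prefix followed by the code point in uppercase hex, which is why only the prefix has to change for Tangut. A minimal sketch of that rule, assuming the four-field line layout shown above; the function names are hypothetical and this is not the code in preparseucd.py:

```python
def parse_algnamesrange(line):
    """Split a ppucd.txt algnamesrange line into (start, end, type, prefix)."""
    kind, cp_range, name_type, prefix = line.split(";")
    if kind != "algnamesrange":
        raise ValueError("not an algnamesrange line: " + line)
    start, end = (int(cp, 16) for cp in cp_range.split(".."))
    return start, end, name_type, prefix


def algorithmic_name(code_point, parsed_range):
    """Return prefix + hex code point, or None if outside the range."""
    start, end, _name_type, prefix = parsed_range
    if not start <= code_point <= end:
        return None
    return prefix + format(code_point, "04X")


# The corrected line from the log above.
TANGUT = parse_algnamesrange("algnamesrange;17000..187EC;han;TANGUT IDEOGRAPH-")
print(algorithmic_name(0x17000, TANGUT))
```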

+- avoid block-compressing most String/Miscellaneous property values,

+ triggered by genprops not coping with a multi-code point Case_Folding on

+ block;1C80..1C8F;...;Cased;cf=0442;CWCF;...

+ keep block-compressing empty-string mappings NFKC_CF="" for tags and variation selectors

+* PropertyAliases.txt changes

+- 1 new property PCM=Prepended_Concatenation_Mark

+ Ignore: Only useful for layout engines.

+ Ok to list in ppucd.txt.

+* PropertyValueAliases.txt new property values

+ blk; Adlam ; Adlam

+ blk; Bhaiksuki ; Bhaiksuki

+ blk; Cyrillic_Ext_C ; Cyrillic_Extended_C

+ blk; Glagolitic_Sup ; Glagolitic_Supplement

+ blk; Ideographic_Symbols ; Ideographic_Symbols_And_Punctuation

+ blk; Marchen ; Marchen

+ blk; Mongolian_Sup ; Mongolian_Supplement

+ blk; Newa ; Newa

+ blk; Osage ; Osage

+ blk; Tangut ; Tangut

+ blk; Tangut_Components ; Tangut_Components

+ -> add to uchar.h

+ use long property names for enum constants

+ -> add to UCharacter.UnicodeBlock IDs

+ Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)

+ replace public static final int \1_ID = \2; \3

+ -> add to UCharacter.UnicodeBlock objects

+ Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)

+ replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2

+ GCB; EB ; E_Base

+ GCB; EBG ; E_Base_GAZ

+ GCB; EM ; E_Modifier

+ GCB; GAZ ; Glue_After_Zwj

+ GCB; ZWJ ; ZWJ

+ -> uchar.h & UCharacter.GraphemeClusterBreak

+ jg ; African_Feh ; African_Feh

+ jg ; African_Noon ; African_Noon

+ jg ; African_Qaf ; African_Qaf

+ -> uchar.h & UCharacter.JoiningGroup

+ lb ; EB ; E_Base

+ lb ; EM ; E_Modifier

+ lb ; ZWJ ; ZWJ

+ -> uchar.h & UCharacter.LineBreak

+ sc ; Adlm ; Adlam

+ sc ; Bhks ; Bhaiksuki

+ sc ; Marc ; Marchen

+ sc ; Newa ; Newa

+ sc ; Osge ; Osage

+ sc ; Tang ; Tangut

+ -> all of them had been added already to uscript.h & com.ibm.icu.lang.UScript

+ WB ; EB ; E_Base

+ WB ; EBG ; E_Base_GAZ

+ WB ; EM ; E_Modifier

+ WB ; GAZ ; Glue_After_Zwj

+ WB ; ZWJ ; ZWJ

+ -> uchar.h & UCharacter.WordBreak
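Note on the Eclipse find/replace patterns for UCharacter.UnicodeBlock: the same two substitutions can be tried outside Eclipse. The sketch below applies the patterns from the log with Python's re module to one sample line. The sample line, including its numeric value, is only an illustration of the uchar.h layout that the patterns expect and is not copied from this diff.

```python
import re

# Illustrative uchar.h enum line (layout assumed from the find patterns).
SAMPLE = "UBLOCK_ADLAM = 263, /*[1E900]*/"

# Pattern pair for the UnicodeBlock *_ID constants.
ID_FIND = r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)"
ID_REPLACE = r"public static final int \1_ID = \2; \3"

# Pattern pair for the UnicodeBlock objects.
OBJ_FIND = r"UBLOCK_([^ ]+) = [0-9]+, (/.+)"
OBJ_REPLACE = (
    r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2'
)

id_line = re.sub(ID_FIND, ID_REPLACE, SAMPLE)
obj_line = re.sub(OBJ_FIND, OBJ_REPLACE, SAMPLE)

print(id_line)
print(obj_line)
```

Both patterns require exactly one space around "=" and after the comma, so lines that are column-aligned with extra spaces would need the patterns loosened first.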

+* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata

+ (not strictly necessary for NOT_ENCODED scripts)

+ ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt

+* generate normalization data files

+ cd $ICU_ROOT/dbg

+ bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource

+ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt

+ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt

+ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt

+ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt

+* build ICU (make install)

+ so that the tools build can pick up the new definitions from the installed header files.

+ $ICU_ROOT/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 30 out.txt

+* build Unicode tools using CMake+make

+~/svn.icutools/trunk/src/unicode/c/icudefs.txt:

+ # Location (--prefix) of where ICU was installed.

+ set(ICU_INST_DIR /home/mscherer/svn.icu/trunk/inst)

+ # Location of the ICU source tree.

+ set(ICU_SRC_DIR /home/mscherer/svn.icu/trunk/src)

+ ~/svn.icutools/trunk/dbg/unicode/c$

+ cmake ../../../src/unicode/c

+ make

+* generate core properties data files

+ ~/svn.icutools/trunk/dbg/unicode/c$

+ genprops/genprops $ICU_SRC_DIR

+ genuca/genuca --hanOrder implicit $ICU_SRC_DIR

+ genuca/genuca --hanOrder radical-stroke $ICU_SRC_DIR

+- rebuild ICU (make install) & tools

+* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to

+ sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)

+- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters

+- Unicode 6.0..9.0: U+2260, U+226E, U+226F

+- nothing new in 9.0, no test file to update
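Note on the grep step: the check for "disallowed_STD3_valid" on non-ASCII characters can also be scripted. A minimal sketch, assuming the usual semicolon-separated layout of IdnaMappingTable.txt (code point or range, status, optional mapping, then a "#" comment). The sample lines are illustrative stand-ins for the real file, and the function name is hypothetical.

```python
def non_ascii_std3_valid(lines):
    """Return code points above U+007F whose status is disallowed_STD3_valid."""
    found = []
    for line in lines:
        data = line.split("#", 1)[0].strip()   # drop comments and blank lines
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        first, _, last = fields[0].partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        found.extend(cp for cp in range(start, end + 1) if cp > 0x7F)
    return found


# Illustrative lines in the file's format, not a copy of the real data file.
SAMPLE_LINES = [
    "# IdnaMappingTable.txt (excerpt for illustration)",
    "003D          ; disallowed_STD3_valid    # EQUALS SIGN",
    "00C0          ; mapped ; 00E0            # LATIN CAPITAL LETTER A WITH GRAVE",
    "2260          ; disallowed_STD3_valid    # NOT EQUAL TO",
    "226E..226F    ; disallowed_STD3_valid    # NOT LESS-THAN..NOT GREATER-THAN",
]

hits = non_ascii_std3_valid(SAMPLE_LINES)
print(["U+%04X" % cp for cp in hits])
```

Comparing the result for a new Unicode version against the list recorded in the log (U+2260, U+226E, U+226F) shows whether uts46test.cpp and UTS46Test.java need new cases.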

+* run & fix ICU4C tests

+- Andy handles RBBI & spoof check test failures

+* collation: CLDR collation root, UCA DUCET

+- UCA DUCET goes into Mark's Unicode tools, see

+ https://sites.google.com/site/unicodetools/home#TOC-UCA

+- CLDR root data files are checked into (CLDR UCA branch)/common/uca/

+ cp (UCA generated)/CollationAuxiliary/* ~/svn.cldr/trunk/common/uca/

+- cd (CLDR UCA branch)/common/uca/

+- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt

+ cp FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt

+- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt

+ cp $ICU_SRC_DIR/source/data/unidata/UCARules.txt /tmp/UCARules-old.txt

+ (note removing the underscore before "Rules")

+ cp UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt

+- restore TODO diffs in UCARules.txt

+ meld /tmp/UCARules-old.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt

+- update (ICU4C)/source/test/testdata/CollationTest_*.txt

+ and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt

+ from the CLDR root files (..._CLDR_..._SHORT.txt)

+ cp CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt

+ cp CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt

+ cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data

+- if CLDR common/uca/unihan-index.txt changes, then update

+ CLDR common/collation/root.xml <collation type="private-unihan">

+ and regenerate (or update in parallel) $ICU_SRC_DIR/source/data/coll/root.txt

+- run genuca, see command line above;

+ deal with

+ Error: Unknown script for first-primary sample character U+104B5 on line 32599 of /home/mscherer/svn.icu/trunk/src/source/data/unidata/FractionalUCA.txt:

+ FDD1 104B5; [75 B8 02, 05, 05] # Osage first primary (compressible)

+ (add the character to genuca.cpp sampleCharsToScripts[])

+ + look up the USCRIPT_ code for the new sample characters

+ (should be obvious from the comment in the error output)

+ + *add* mappings to sampleCharsToScripts[], do not replace them

+ (in case the script sample characters flip-flop)

+ + insert new scripts in DUCET script order, see the top_byte table

+ at the beginning of FractionalUCA.txt
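Note on the genuca error: the sample character and the script name can be read straight off the FractionalUCA.txt line quoted in the error output. A small sketch of that lookup, assuming first-primary lines always have the "FDD1 <code point>; [...] # <script> first primary" shape seen above. The regular expression and function name are hypothetical, and mapping the script name to its USCRIPT_ constant remains a manual step.

```python
import re

# Matches lines such as the one quoted in the genuca error message above.
FIRST_PRIMARY = re.compile(r"^FDD1 ([0-9A-F]{4,6});.*# (.+?) first primary")


def sample_char_and_script(line):
    """Return (sample code point, script name) or None if the line differs."""
    match = FIRST_PRIMARY.match(line)
    if match is None:
        return None
    return int(match.group(1), 16), match.group(2)


LINE = "FDD1 104B5; [75 B8 02, 05, 05] # Osage first primary (compressible)"
result = sample_char_and_script(LINE)
print(result)
```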

+- rebuild ICU4C

+* Unihan collators

+- run Unicode Tools

+ org.unicode.draft.GenerateUnihanCollators

+ with VM arguments

+ -DSVN_WORKSPACE=/home/mscherer/svn.unitools/trunk

+ -DOTHER_WORKSPACE=/home/mscherer/svn.unitools

+ -DUCD_DIR=/home/mscherer/svn.unitools/trunk/data

+ -DCLDR_DIR=/home/mscherer/svn.cldr/trunk

+ -DUVERSION=9.0.0

+ -ea

+- run Unicode Tools

+ org.unicode.draft.GenerateUnihanCollatorFiles

+ with the same arguments

+- check CLDR diffs

+ cd ~/svn.cldr/trunk

+ meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml

+ meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml

+- copy to CLDR

+ cd ~/svn.cldr/trunk

+ cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml

+ cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml

+- commit to CLDR

+- generate ICU zh collation data: run CLDR

+ org.unicode.cldr.icu.NewLdml2IcuConverter

+ with program arguments

+ -t collation

+ -s /home/mscherer/svn.cldr/trunk/common/collation

+ -m /home/mscherer/svn.cldr/trunk/common/supplemental

+ -d /home/mscherer/svn.icu/trunk/src/source/data/coll

+ -p /home/mscherer/svn.icu/trunk/src/source/data/xml/collation

+ zh

+ and VM arguments

+ -DCLDR_DIR=/home/mscherer/svn.cldr/trunk

+- rebuild ICU4C

+* run & fix ICU4C tests, now with new CLDR collation root data

+- run all tests with the collation test data *_SHORT.txt or the full files

+ (the full ones have comments, useful for debugging)

+- note on intltest: if collate/UCAConformanceTest fails, then

+ utility/MultithreadTest/TestCollators will fail as well;

+ fix the conformance test before looking into the multi-thread test

+* update Java data files

+- refresh just the UCD/UCA-related/derived files, just to be safe

+- see (ICU4C)/source/data/icu4j-readme.txt

+- mkdir /tmp/icu4j

+- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install

+ output:

+ ...

+ Unicode .icu files built to ./out/build/icudt58l

+ echo timestamp > uni-core-data

+ mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt58b

+ mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt58b

+ echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt

+ LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt58l.dat ./out/icu4j/icudt58b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt58l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt58b

+ mv ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt58b"

+ jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt58b/

+ mkdir -p /tmp/icu4j/main/shared/data

+ cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data

+ jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt58b/

+ mkdir -p /tmp/icu4j/main/shared/data

+ cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data

+ make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/dbg/data'

+- copy the big-endian Unicode data files to another location,

+ separate from the other data files,

+ and then refresh ICU4J

+ cd ~/svn.icu/trunk/dbg/data/out/icu4j

+ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll

+ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr

+ cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT

+ cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT

+ rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu

+ cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT

+ cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll

+ cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr

+ jar uvf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT

+* When refreshing all of ICU4J data from ICU4C

+- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install

+- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data

+or

+- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install

+* update CollationFCD.java

+ + copy & paste the initializers of lcccIndex[] etc. from

+ ICU4C/source/i18n/collationfcd.cpp to

+ ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java

+* refresh Java test .txt files

+- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode

+ cd $ICU_SRC_DIR/source/data/unidata

+ cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode

+ cd ../../test/testdata

+ cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode

+ cp ~/unidata/uni90/20160603/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode

+* run & fix ICU4J tests

+*** LayoutEngine script information

+* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.

+ This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp

+ in the working directory.

+ (It also generates ScriptRunData.cpp, which is no longer needed.)

+ It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages

+ (a plain text file)

+ which maps ICU versions to the numbers of script/language constants

+ that were added then.

+ (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)

+ The generated files have a current copyright date and "@deprecated" statement.

+* Review changes, fix Java tool if necessary, and copy to ICU4C

+ cd ~/svn.icu4j/trunk/src

+ meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout

+ cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout

+ cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout

+*** API additions

+- send notice to icu-design about new born-@stable API (enum constants etc.)

+*** merge the Unicode update branches back onto the trunk

+- do not merge the icudata.jar and testdata.jar,

+ instead rebuild them from merged & tested ICU4C

+- make sure that changes to Unicode tools & ICU tools are checked in

+ http://www.unicode.org/utility/trac/log/trunk/unicodetools

+ http://bugs.icu-project.org/trac/log/tools/trunk

+---------------------------------------------------------------------------- ***

+New script codes early in ICU 58: http://bugs.icu-project.org/trac/ticket/11764

+Adding

+- new scripts in Unicode 9: Adlm, Bhks, Marc, Newa, Osge

+- new combination/alias codes: Hanb, Jamo

+ - used in CLDR 29 and in spoof checker

+- new Z* code: Zsye

+Add new codes to uscript.h & UScript.java, see Unicode update logs.

+ -> com.ibm.icu.lang.UScript

+ find USCRIPT_([^ ]+) *= ([0-9]+),(.+)

+ replace public static final int \1 = \2; \3

+Manually edit ppucd.txt and icutools:unicode/c/genprops/pnames_data.h,

+add new script codes.

+"Long" script names only where established in Unicode 9 PropertyValueAliases.txt.

+Note: If we have to run preparseucd.py again before the Unicode 9 update,

+then we need to manually keep/restore the new script codes.

+ICU_ROOT=~/svn.icu/trunk

+ICU_SRC_DIR=$ICU_ROOT/src

+ICUDT=icudt57b

+export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib

+SRC_DATA_IN=$ICU_SRC_DIR/source/data/in

+UNIDATA=$ICU_SRC_DIR/source/data/unidata

+Adjust unicode/c/genprops/*builder.cpp for #ifndef/#ifdef changes in _data.h files,

+see http://bugs.icu-project.org/trac/ticket/12141

+make install, then icutools cmake & make, then

+~/svn.icutools/trunk/dbg/unicode/c$ make && genprops/genprops $ICU_SRC_DIR

+Generate Java data as usual, only update pnames.icu & uprops.icu.

+*** LayoutEngine script information

+* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.

+ This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp

+ in the working directory.

+ (It also generates ScriptRunData.cpp, which is no longer needed.)

+ It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages

+ (a plain text file)

+ which maps ICU versions to the numbers of script/language constants

+ that were added then.

+ (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)

+ The generated files have a current copyright date and "@deprecated" statement.

+* Review changes, fix Java tool if necessary, and copy to ICU4C

+ cd ~/svn.icu4j/trunk/src

+ meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout

+ cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout

+ cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout

---------------------------------------------------------------------------- ***
