Index: source/data/unidata/changes.txt |
diff --git a/source/data/unidata/changes.txt b/source/data/unidata/changes.txt |
index 2060212970786a73d5a0ede4964126b24665fbf7..74425830ccb1e738476dc59fe7a6ac8abd9f4eea 100644 |
--- a/source/data/unidata/changes.txt |
+++ b/source/data/unidata/changes.txt |
@@ -1,4 +1,6 @@ |
-* Copyright (C) 2004-2016, International Business Machines |
+* Copyright (C) 2016 and later: Unicode, Inc. and others. |
+* License & terms of use: http://www.unicode.org/copyright.html |
+* Copyright (C) 2004-2016, International Business Machines |
* Corporation and others. All Rights Reserved. |
* |
* file name: changes.txt |
@@ -15,33 +17,445 @@ |
* New ISO 15924 script codes |
-Starting with ICU 55, we do not add UScriptCode constants any more until their scripts |
-are encoded in Unicode, or can be assumed to be encoded in the next Unicode version. |
+Starting with ICU 55, we do not add UScriptCode constants for new scripts any more |
+until they are encoded in Unicode, |
+or can be assumed to be encoded in the next Unicode version. |
Script enum constant names want to follow the Unicode script property value aliases, |
which are assigned only when the scripts are encoded. |
When we encode scripts early and guess wrong, then we have confusing enum constants |
and have sometimes added aliases. |
-Exception: Script codes like Latf and Aran that are not subject to separate encoding |
+Variant script codes like Latf and Aran that are not subject to separate encoding |
can be added at any time. |
+(For example, Aran could be added as USCRIPT_ARABIC_NASTALIQ.) |
-Script codes not yet in ICU: http://www.unicode.org/iso15924/codechanges.html |
+We add script codes used in CLDR or in the spoof checker. |
+This includes combination/alias codes like Hanb and Jamo. |
+See http://unicode.org/reports/tr35/#unicode_script_subtag_validity |
+and look for "alias" on http://unicode.org/iso15924/iso15924-codes.html |
-Added 2014-11-15, see http://bugs.icu-project.org/trac/ticket/11561 |
-- Adlm 166 Adlam |
-- Aran 161 Arabic (Nastaliq variant) |
-- Kitl 505 Khitan large script |
-- Kits 288 Khitan small script |
-- Marc 332 Marchen |
-- Osge 219 Osage |
+We add special Z* script codes like Zsye. |
-Aran can be added as USCRIPT_ARABIC_NASTALIQ at any time. |
+For new script codes see http://www.unicode.org/iso15924/codechanges.html |
-Adlam, Marchen, and Osage are expected to go into Unicode 9; |
-we should assign Unicode script property value aliases for them |
-soon after Unicode 8 is released, and add them in ICU 56. |
+---------------------------------------------------------------------------- *** |
+ |
+Unicode 9.0 update for ICU 58 |
+ |
+* Command-line environment setup |
+ |
+ICU_ROOT=~/svn.icu/trunk |
+ICU_SRC_DIR=$ICU_ROOT/src |
+ICUDT=icudt58b |
+export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib |
+SRC_DATA_IN=$ICU_SRC_DIR/source/data/in |
+UNIDATA=$ICU_SRC_DIR/source/data/unidata |
+ |
+http://www.unicode.org/review/pri323/ -- beta review |
+http://www.unicode.org/reports/uax-proposed-updates.html |
+http://www.unicode.org/versions/beta-9.0.0.html |
+http://www.unicode.org/versions/Unicode9.0.0/ |
+http://www.unicode.org/reports/tr44/tr44-17.html |
+ |
+*** ICU Trac |
+ |
+- ticket:12526: integrate Unicode 9 |
+- C++ ^/icu/branches/markus/uni90, ^/icu/branches/markus/uni90b |
+- Java ^/icu4j/branches/markus/uni90, ^/icu4j/branches/markus/uni90b |
+ |
+*** CLDR Trac |
-Khitan scripts will be encoded later. |
+- cldrbug 9414: UCA 9 |
+- ^/branches/markus/uni90 at r11518 from trunk at r11517 |
+ |
+- cldrbug 8745: Unicode 9.0 script metadata |
+ |
+*** Unicode version numbers |
+- makedata.mak |
+- uchar.h |
+- com.ibm.icu.util.VersionInfo |
+- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_ |
+ |
+- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h |
+ so that the makefiles see the new version number. |
+ |
+*** data files & enums & parser code |
+ |
+* file preparation |
+ |
+- download UCD & IDNA files |
+- make sure that the Unicode data folder passed into preparseucd.py |
+ includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder) |
+- only for manual diffs: remove version suffixes from the file names |
+ ~/unidata/uni70/20140403$ ../../desuffixucd.py . |
+ (see https://sites.google.com/site/unicodetools/inputdata) |
+- only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip |
+- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni90/20160603 $ICU_SRC_DIR ~/svn.icutools/trunk/src |
+- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders. |
+ |
+- also: from http://unicode.org/Public/security/9.0.0/ download new confusables.txt |
+ and copy to $UNIDATA |
+ cp ~/unidata/uni90/20160603/security/confusables.txt $UNIDATA |
+ |
+* preparseucd.py changes |
+- remove or add new Unicode scripts from/to the |
+ only-in-ISO-15924 list according to the error messages: |
+ ValueError: remove ['Tang'] from _scripts_only_in_iso15924 |
+ ValueError: sc = Hanb (uchar.h USCRIPT_HAN_WITH_BOPOMOFO) not in the UCD |
+ ValueError: sc = Jamo (uchar.h USCRIPT_JAMO) not in the UCD |
+ ValueError: sc = Zsye (uchar.h USCRIPT_SYMBOLS_EMOJI) not in the UCD |
+ -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI() |
+ and in com.ibm.icu.dev.test.lang.TestUScript.java |
+- DerivedNumericValues.txt new numeric values |
+ 0D58 ; 0.00625 ; ; 1/160 # No MALAYALAM FRACTION ONE ONE-HUNDRED-AND-SIXTIETH |
+ 0D59 ; 0.025 ; ; 1/40 # No MALAYALAM FRACTION ONE FORTIETH |
+ 0D5A ; 0.0375 ; ; 3/80 # No MALAYALAM FRACTION THREE EIGHTIETHS |
+ 0D5B ; 0.05 ; ; 1/20 # No MALAYALAM FRACTION ONE TWENTIETH |
+ 0D5D ; 0.15 ; ; 3/20 # No MALAYALAM FRACTION THREE TWENTIETHS |
+ -> change uprops.h, corepropsbuilder.cpp/encodeNumericValue(), |
+ uchar.c, UCharacterProperty.java |
+ to support a new series of values |
+- adjust preparseucd.py for Tangut algorithmic names |
+ in ppucd.txt: |
+ algnamesrange;17000..187EC;han;CJK UNIFIED IDEOGRAPH- |
+ -> |
+ algnamesrange;17000..187EC;han;TANGUT IDEOGRAPH- |
+- avoid block-compressing most String/Miscellaneous property values, |
+ triggered by genprops not coping with a multi-code point Case_Folding on |
+ block;1C80..1C8F;...;Cased;cf=0442;CWCF;... |
+ keep block-compressing empty-string mappings NFKC_CF="" for tags and variation selectors |
+ |
+* PropertyAliases.txt changes |
+- 1 new property PCM=Prepended_Concatenation_Mark |
+ Ignore: Only useful for layout engines. |
+ Ok to list in ppucd.txt. |
+ |
+* PropertyValueAliases.txt new property values |
+ blk; Adlam ; Adlam |
+ blk; Bhaiksuki ; Bhaiksuki |
+ blk; Cyrillic_Ext_C ; Cyrillic_Extended_C |
+ blk; Glagolitic_Sup ; Glagolitic_Supplement |
+ blk; Ideographic_Symbols ; Ideographic_Symbols_And_Punctuation |
+ blk; Marchen ; Marchen |
+ blk; Mongolian_Sup ; Mongolian_Supplement |
+ blk; Newa ; Newa |
+ blk; Osage ; Osage |
+ blk; Tangut ; Tangut |
+ blk; Tangut_Components ; Tangut_Components |
+ -> add to uchar.h |
+ use long property names for enum constants |
+ -> add to UCharacter.UnicodeBlock IDs |
+ Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+) |
+ replace public static final int \1_ID = \2; \3 |
+ -> add to UCharacter.UnicodeBlock objects |
+ Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+) |
+ replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2 |
+ |
+ GCB; EB ; E_Base |
+ GCB; EBG ; E_Base_GAZ |
+ GCB; EM ; E_Modifier |
+ GCB; GAZ ; Glue_After_Zwj |
+ GCB; ZWJ ; ZWJ |
+ -> uchar.h & UCharacter.GraphemeClusterBreak |
+ |
+ jg ; African_Feh ; African_Feh |
+ jg ; African_Noon ; African_Noon |
+ jg ; African_Qaf ; African_Qaf |
+ -> uchar.h & UCharacter.JoiningGroup |
+ |
+ lb ; EB ; E_Base |
+ lb ; EM ; E_Modifier |
+ lb ; ZWJ ; ZWJ |
+ -> uchar.h & UCharacter.LineBreak |
+ |
+ sc ; Adlm ; Adlam |
+ sc ; Bhks ; Bhaiksuki |
+ sc ; Marc ; Marchen |
+ sc ; Newa ; Newa |
+ sc ; Osge ; Osage |
+ sc ; Tang ; Tangut |
+ -> all of them had been added already to uscript.h & com.ibm.icu.lang.UScript |
+ |
+ WB ; EB ; E_Base |
+ WB ; EBG ; E_Base_GAZ |
+ WB ; EM ; E_Modifier |
+ WB ; GAZ ; Glue_After_Zwj |
+ WB ; ZWJ ; ZWJ |
+ -> uchar.h & UCharacter.WordBreak |
+ |
+* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata |
+ (not strictly necessary for NOT_ENCODED scripts) |
+ ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt |
+ |
+* generate normalization data files |
+ cd $ICU_ROOT/dbg |
+ bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource |
+ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt |
+ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt |
+ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt |
+ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt |
+ |
+* build ICU (make install) |
+ so that the tools build can pick up the new definitions from the installed header files. |
+ |
+ $ICU_ROOT/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 30 out.txt |
+ |
+* build Unicode tools using CMake+make |
+ |
+~/svn.icutools/trunk/src/unicode/c/icudefs.txt: |
+ |
+ # Location (--prefix) of where ICU was installed. |
+ set(ICU_INST_DIR /home/mscherer/svn.icu/trunk/inst) |
+ # Location of the ICU source tree. |
+ set(ICU_SRC_DIR /home/mscherer/svn.icu/trunk/src) |
+ |
+ ~/svn.icutools/trunk/dbg/unicode/c$ |
+ cmake ../../../src/unicode/c |
+ make |
+ |
+* generate core properties data files |
+ ~/svn.icutools/trunk/dbg/unicode/c$ |
+ genprops/genprops $ICU_SRC_DIR |
+ genuca/genuca --hanOrder implicit $ICU_SRC_DIR |
+ genuca/genuca --hanOrder radical-stroke $ICU_SRC_DIR |
+- rebuild ICU (make install) & tools |
+ |
+* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to |
+ sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar) |
+- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters |
+- Unicode 6.0..9.0: U+2260, U+226E, U+226F |
+- nothing new in 9.0, no test file to update |
+ |
+* run & fix ICU4C tests |
+- Andy handles RBBI & spoof check test failures |
+ |
+* collation: CLDR collation root, UCA DUCET |
+ |
+- UCA DUCET goes into Mark's Unicode tools, see |
+ https://sites.google.com/site/unicodetools/home#TOC-UCA |
+- CLDR root data files are checked into (CLDR UCA branch)/common/uca/ |
+ cp (UCA generated)/CollationAuxiliary/* ~/svn.cldr/trunk/common/uca/ |
+ |
+- cd (CLDR UCA branch)/common/uca/ |
+- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt |
+ cp FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt |
+- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt |
+ cp $ICU_SRC_DIR/source/data/unidata/UCARules.txt /tmp/UCARules-old.txt |
+ cp UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt |
+ (note the target file name: no underscore before "Rules") |
+- restore TODO diffs in UCARules.txt |
+ meld /tmp/UCARules-old.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt |
+- update (ICU4C)/source/test/testdata/CollationTest_*.txt |
+ and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt |
+ from the CLDR root files (..._CLDR_..._SHORT.txt) |
+ cp CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt |
+ cp CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt |
+ cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data |
+- if CLDR common/uca/unihan-index.txt changes, then update |
+ CLDR common/collation/root.xml <collation type="private-unihan"> |
+ and regenerate (or update in parallel) $ICU_SRC_DIR/source/data/coll/root.txt |
+ |
+- run genuca, see command line above; |
+ deal with |
+ Error: Unknown script for first-primary sample character U+104B5 on line 32599 of /home/mscherer/svn.icu/trunk/src/source/data/unidata/FractionalUCA.txt: |
+ FDD1 104B5; [75 B8 02, 05, 05] # Osage first primary (compressible) |
+ (add the character to genuca.cpp sampleCharsToScripts[]) |
+ + look up the USCRIPT_ code for the new sample characters |
+ (should be obvious from the comment in the error output) |
+ + *add* mappings to sampleCharsToScripts[], do not replace them |
+ (in case the script sample characters flip-flop) |
+ + insert new scripts in DUCET script order, see the top_byte table |
+ at the beginning of FractionalUCA.txt |
+- rebuild ICU4C |
+ |
+* Unihan collators |
+- run Unicode Tools |
+ org.unicode.draft.GenerateUnihanCollators |
+ with VM arguments |
+ -DSVN_WORKSPACE=/home/mscherer/svn.unitools/trunk |
+ -DOTHER_WORKSPACE=/home/mscherer/svn.unitools |
+ -DUCD_DIR=/home/mscherer/svn.unitools/trunk/data |
+ -DCLDR_DIR=/home/mscherer/svn.cldr/trunk |
+ -DUVERSION=9.0.0 |
+ -ea |
+- run Unicode Tools |
+ org.unicode.draft.GenerateUnihanCollatorFiles |
+ with the same arguments |
+- check CLDR diffs |
+ cd ~/svn.cldr/trunk |
+ meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml |
+ meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml |
+- copy to CLDR |
+ cd ~/svn.cldr/trunk |
+ cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml |
+ cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml |
+- commit to CLDR |
+- generate ICU zh collation data: run CLDR |
+ org.unicode.cldr.icu.NewLdml2IcuConverter |
+ with program arguments |
+ -t collation |
+ -s /home/mscherer/svn.cldr/trunk/common/collation |
+ -m /home/mscherer/svn.cldr/trunk/common/supplemental |
+ -d /home/mscherer/svn.icu/trunk/src/source/data/coll |
+ -p /home/mscherer/svn.icu/trunk/src/source/data/xml/collation |
+ zh |
+ and VM arguments |
+ -DCLDR_DIR=/home/mscherer/svn.cldr/trunk |
+- rebuild ICU4C |
+ |
+* run & fix ICU4C tests, now with new CLDR collation root data |
+- run all tests with the collation test data *_SHORT.txt or the full files |
+ (the full ones have comments, useful for debugging) |
+- note on intltest: if collate/UCAConformanceTest fails, then |
+ utility/MultithreadTest/TestCollators will fail as well; |
+ fix the conformance test before looking into the multi-thread test |
+ |
+* update Java data files |
+- refresh only the UCD/UCA-related/derived files, to be safe |
+- see (ICU4C)/source/data/icu4j-readme.txt |
+- mkdir /tmp/icu4j |
+- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install |
+ output: |
+ ... |
+ Unicode .icu files built to ./out/build/icudt58l |
+ echo timestamp > uni-core-data |
+ mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt58b |
+ mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt58b |
+ echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt |
+ LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt58l.dat ./out/icu4j/icudt58b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt58l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt58b |
+ mv ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt58b" |
+ jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt58b/ |
+ mkdir -p /tmp/icu4j/main/shared/data |
+ cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data |
+ jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt58b/ |
+ mkdir -p /tmp/icu4j/main/shared/data |
+ cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data |
+ make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/dbg/data' |
+- copy the big-endian Unicode data files to another location, |
+ separate from the other data files, |
+ and then refresh ICU4J |
+ cd ~/svn.icu/trunk/dbg/data/out/icu4j |
+ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll |
+ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr |
+ cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT |
+ cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT |
+ rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu |
+ cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT |
+ cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll |
+ cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr |
+ jar uvf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT |
+ |
+* When refreshing all of ICU4J data from ICU4C |
+- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install |
+- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data |
+or |
+- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install |
+ |
+* update CollationFCD.java |
+ + copy & paste the initializers of lcccIndex[] etc. from |
+ ICU4C/source/i18n/collationfcd.cpp to |
+ ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java |
+ |
+* refresh Java test .txt files |
+- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode |
+ cd $ICU_SRC_DIR/source/data/unidata |
+ cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode |
+ cd ../../test/testdata |
+ cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode |
+ cp ~/unidata/uni90/20160603/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode |
+ |
+* run & fix ICU4J tests |
+ |
+*** LayoutEngine script information |
+ |
+* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder. |
+ This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp |
+ in the working directory. |
+ |
+ (It also generates ScriptRunData.cpp, which is no longer needed.) |
+ |
+ It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages |
+ (a plain text file) |
+ which maps ICU versions to the numbers of script/language constants |
+ that were added then. |
+ (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.) |
+ |
+ The generated files have a current copyright date and "@deprecated" statement. |
+ |
+* Review changes, fix Java tool if necessary, and copy to ICU4C |
+ cd ~/svn.icu4j/trunk/src |
+ meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout |
+ cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout |
+ cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout |
+ |
+*** API additions |
+- send notice to icu-design about new born-@stable API (enum constants etc.) |
+ |
+*** merge the Unicode update branches back onto the trunk |
+- do not merge the icudata.jar and testdata.jar, |
+ instead rebuild them from merged & tested ICU4C |
+- make sure that changes to Unicode tools & ICU tools are checked in |
+ http://www.unicode.org/utility/trac/log/trunk/unicodetools |
+ http://bugs.icu-project.org/trac/log/tools/trunk |
+ |
+---------------------------------------------------------------------------- *** |
+ |
+New script codes early in ICU 58: http://bugs.icu-project.org/trac/ticket/11764 |
+ |
+Adding |
+- new scripts in Unicode 9: Adlm, Bhks, Marc, Newa, Osge |
+- new combination/alias codes: Hanb, Jamo |
+ - used in CLDR 29 and in spoof checker |
+- new Z* code: Zsye |
+ |
+Add new codes to uscript.h & UScript.java, see Unicode update logs. |
+ -> com.ibm.icu.lang.UScript |
+ find USCRIPT_([^ ]+) *= ([0-9]+),(.+) |
+ replace public static final int \1 = \2; \3 |
+ |
+Manually edit ppucd.txt and icutools:unicode/c/genprops/pnames_data.h, |
+add new script codes. |
+"Long" script names only where established in Unicode 9 PropertyValueAliases.txt. |
+ |
+Note: If we have to run preparseucd.py again before the Unicode 9 update, |
+then we need to manually keep/restore the new script codes. |
+ |
+ICU_ROOT=~/svn.icu/trunk |
+ICU_SRC_DIR=$ICU_ROOT/src |
+ICUDT=icudt57b |
+export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib |
+SRC_DATA_IN=$ICU_SRC_DIR/source/data/in |
+UNIDATA=$ICU_SRC_DIR/source/data/unidata |
+ |
+Adjust unicode/c/genprops/*builder.cpp for #ifndef/#ifdef changes in _data.h files, |
+see http://bugs.icu-project.org/trac/ticket/12141 |
+ |
+make install, then icutools cmake & make, then |
+~/svn.icutools/trunk/dbg/unicode/c$ make && genprops/genprops $ICU_SRC_DIR |
+ |
+Generate Java data as usual, only update pnames.icu & uprops.icu. |
+ |
+*** LayoutEngine script information |
+ |
+* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder. |
+ This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp |
+ in the working directory. |
+ |
+ (It also generates ScriptRunData.cpp, which is no longer needed.) |
+ |
+ It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages |
+ (a plain text file) |
+ which maps ICU versions to the numbers of script/language constants |
+ that were added then. |
+ (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.) |
+ |
+ The generated files have a current copyright date and "@deprecated" statement. |
+ |
+* Review changes, fix Java tool if necessary, and copy to ICU4C |
+ cd ~/svn.icu4j/trunk/src |
+ meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout |
+ cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout |
+ cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout |
---------------------------------------------------------------------------- *** |
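
Note on the Eclipse find/replace pairs in the log (UBLOCK_ constants to UCharacter.UnicodeBlock, USCRIPT_ constants to UScript): the same patterns can be tried outside Eclipse. The following is a minimal sketch, not part of the ICU tools. The patterns are copied from the notes; the function name convert() and the two sample header lines are assumptions made for illustration, and the numeric values in them should be checked against the real uchar.h and uscript.h.

```python
import re

# Find/replace pairs copied from the notes. Python's re module accepts
# the Eclipse patterns unchanged, including \1-style backreferences.
BLOCK_FIND = r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)"
BLOCK_ID_REPLACE = r"public static final int \1_ID = \2; \3"
BLOCK_OBJ_FIND = r"UBLOCK_([^ ]+) = [0-9]+, (/.+)"
BLOCK_OBJ_REPLACE = (
    r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2'
)
SCRIPT_FIND = r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)"
SCRIPT_REPLACE = r"public static final int \1 = \2; \3"


def convert(line, find, replace):
    """Apply one find/replace pair to a single C header line (hypothetical helper)."""
    return re.sub(find, replace, line.strip())


# Illustrative sample lines in the style of uchar.h and uscript.h (assumed values).
block_line = "    UBLOCK_ADLAM = 263, /*[1E900]*/"
script_line = "      USCRIPT_ADLAM = 167,/* Adlm */"

block_id = convert(block_line, BLOCK_FIND, BLOCK_ID_REPLACE)
block_obj = convert(block_line, BLOCK_OBJ_FIND, BLOCK_OBJ_REPLACE)
script_id = convert(script_line, SCRIPT_FIND, SCRIPT_REPLACE)

print(block_id)
print(block_obj)
print(script_id)
```

Lines that do not match a pattern pass through unchanged (apart from stripped whitespace), so the output still needs a manual review before it is pasted into the Java sources.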