| Index: source/data/unidata/changes.txt
|
| diff --git a/source/data/unidata/changes.txt b/source/data/unidata/changes.txt
|
| index 2060212970786a73d5a0ede4964126b24665fbf7..74425830ccb1e738476dc59fe7a6ac8abd9f4eea 100644
|
| --- a/source/data/unidata/changes.txt
|
| +++ b/source/data/unidata/changes.txt
|
| @@ -1,4 +1,6 @@
|
| -* Copyright (C) 2004-2016, International Business Machines
|
| +* Copyright (C) 2016 and later: Unicode, Inc. and others.
|
| +* License & terms of use: http://www.unicode.org/copyright.html
|
| +* Copyright (C) 2004-2016, International Business Machines
|
| * Corporation and others. All Rights Reserved.
|
| *
|
| * file name: changes.txt
|
| @@ -15,33 +17,445 @@
|
|
|
| * New ISO 15924 script codes
|
|
|
| -Starting with ICU 55, we do not add UScriptCode constants any more until their scripts
|
| -are encoded in Unicode, or can be assumed to be encoded in the next Unicode version.
|
| +Starting with ICU 55, we do not add UScriptCode constants for new scripts any more
|
| +until they are encoded in Unicode,
|
| +or can be assumed to be encoded in the next Unicode version.
|
| Script enum constant names want to follow the Unicode script property value aliases,
|
| which are assigned only when the scripts are encoded.
|
| When we encode scripts early and guess wrong, then we have confusing enum constants
|
| and have sometimes added aliases.
|
|
|
| -Exception: Script codes like Latf and Aran that are not subject to separate encoding
|
| +Variant script codes like Latf and Aran that are not subject to separate encoding
|
| can be added at any time.
|
| +(For example, Aran could be added as USCRIPT_ARABIC_NASTALIQ.)
|
|
|
| -Script codes not yet in ICU: http://www.unicode.org/iso15924/codechanges.html
|
| +We add script codes used in CLDR or in the spoof checker.
|
| +This includes combination/alias codes like Hanb and Jamo.
|
| +See http://unicode.org/reports/tr35/#unicode_script_subtag_validity
|
| +and look for "alias" on http://unicode.org/iso15924/iso15924-codes.html
|
|
|
| -Added 2014-11-15, see http://bugs.icu-project.org/trac/ticket/11561
|
| -- Adlm 166 Adlam
|
| -- Aran 161 Arabic (Nastaliq variant)
|
| -- Kitl 505 Khitan large script
|
| -- Kits 288 Khitan small script
|
| -- Marc 332 Marchen
|
| -- Osge 219 Osage
|
| +We add special Z* script codes like Zsye.
|
|
|
| -Aran can be added as USCRIPT_ARABIC_NASTALIQ at any time.
|
| +For new script codes see http://www.unicode.org/iso15924/codechanges.html
|
|
|
| -Adlam, Marchen, and Osage are expected to go into Unicode 9;
|
| -we should assign Unicode script property value aliases for them
|
| -soon after Unicode 8 is released, and add them in ICU 56.
|
| +---------------------------------------------------------------------------- ***
|
| +
|
| +Unicode 9.0 update for ICU 58
|
| +
|
| +* Command-line environment setup
|
| +
|
| +ICU_ROOT=~/svn.icu/trunk
|
| +ICU_SRC_DIR=$ICU_ROOT/src
|
| +ICUDT=icudt58b
|
| +export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
|
| +SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
|
| +UNIDATA=$ICU_SRC_DIR/source/data/unidata
|
| +
|
| +http://www.unicode.org/review/pri323/ -- beta review
|
| +http://www.unicode.org/reports/uax-proposed-updates.html
|
| +http://www.unicode.org/versions/beta-9.0.0.html
|
| +http://www.unicode.org/versions/Unicode9.0.0/
|
| +http://www.unicode.org/reports/tr44/tr44-17.html
|
| +
|
| +*** ICU Trac
|
| +
|
| +- ticket:12526: integrate Unicode 9
|
| +- C++ ^/icu/branches/markus/uni90, ^/icu/branches/markus/uni90b
|
| +- Java ^/icu4j/branches/markus/uni90, ^/icu4j/branches/markus/uni90b
|
| +
|
| +*** CLDR Trac
|
|
|
| -Khitan scripts will be encoded later.
|
| +- cldrbug 9414: UCA 9
|
| +- ^/branches/markus/uni90 at r11518 from trunk at r11517
|
| +
|
| +- cldrbug 8745: Unicode 9.0 script metadata
|
| +
|
| +*** Unicode version numbers
|
| +- makedata.mak
|
| +- uchar.h
|
| +- com.ibm.icu.util.VersionInfo
|
| +- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
|
| +
|
| +- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
|
| + so that the makefiles see the new version number.
|
| +
|
| +*** data files & enums & parser code
|
| +
|
| +* file preparation
|
| +
|
| +- download UCD & IDNA files
|
| +- make sure that the Unicode data folder passed into preparseucd.py
|
| + includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
|
| +- only for manual diffs: remove version suffixes from the file names
|
| + ~/unidata/uni70/20140403$ ../../desuffixucd.py .
|
| + (see https://sites.google.com/site/unicodetools/inputdata)
|
| +- only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
|
| +- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni90/20160603 $ICU_SRC_DIR ~/svn.icutools/trunk/src
|
| +- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
|
| +
|
| +- also: from http://unicode.org/Public/security/9.0.0/ download new confusables.txt
|
| + and copy to $UNIDATA
|
| + cp ~/unidata/uni90/20160603/security/confusables.txt $UNIDATA
|
| +
|
| +* preparseucd.py changes
|
| +- remove or add new Unicode scripts from/to the
|
| + only-in-ISO-15924 list according to the error messages:
|
| + ValueError: remove ['Tang'] from _scripts_only_in_iso15924
|
| + ValueError: sc = Hanb (uchar.h USCRIPT_HAN_WITH_BOPOMOFO) not in the UCD
|
| + ValueError: sc = Jamo (uchar.h USCRIPT_JAMO) not in the UCD
|
| + ValueError: sc = Zsye (uchar.h USCRIPT_SYMBOLS_EMOJI) not in the UCD
|
| + -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
|
| + and in com.ibm.icu.dev.test.lang.TestUScript.java
|
| +- DerivedNumericValues.txt new numeric values
|
| + 0D58 ; 0.00625 ; ; 1/160 # No MALAYALAM FRACTION ONE ONE-HUNDRED-AND-SIXTIETH
|
| + 0D59 ; 0.025 ; ; 1/40 # No MALAYALAM FRACTION ONE FORTIETH
|
| + 0D5A ; 0.0375 ; ; 3/80 # No MALAYALAM FRACTION THREE EIGHTIETHS
|
| + 0D5B ; 0.05 ; ; 1/20 # No MALAYALAM FRACTION ONE TWENTIETH
|
| + 0D5D ; 0.15 ; ; 3/20 # No MALAYALAM FRACTION THREE TWENTIETHS
|
| + -> change uprops.h, corepropsbuilder.cpp/encodeNumericValue(),
|
| + uchar.c, UCharacterProperty.java
|
| + to support a new series of values
|
| +- adjust preparseucd.py for Tangut algorithmic names
|
| + in ppucd.txt:
|
| + algnamesrange;17000..187EC;han;CJK UNIFIED IDEOGRAPH-
|
| + ->
|
| + algnamesrange;17000..187EC;han;TANGUT IDEOGRAPH-
|
| +- avoid block-compressing most String/Miscellaneous property values,
|
| + triggered by genprops not coping with a multi-code point Case_Folding on
|
| + block;1C80..1C8F;...;Cased;cf=0442;CWCF;...
|
| + keep block-compressing empty-string mappings NFKC_CF="" for tags and variation selectors
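The decimal and rational forms quoted above for the new Malayalam fractions can be cross-checked mechanically before the encoder is changed. This is a minimal sketch and not part of preparseucd.py: the table is copied from the DerivedNumericValues.txt lines quoted in this log, and `check_numeric_values` is a hypothetical helper name.

```python
from fractions import Fraction

# (code point, decimal field, fraction field), copied from the
# DerivedNumericValues.txt lines quoted in the log above.
NEW_VALUES = [
    (0x0D58, "0.00625", "1/160"),
    (0x0D59, "0.025", "1/40"),
    (0x0D5A, "0.0375", "3/80"),
    (0x0D5B, "0.05", "1/20"),
    (0x0D5D, "0.15", "3/20"),
]


def check_numeric_values(rows):
    """Return the rows whose decimal field differs from the fraction field."""
    return [row for row in rows if Fraction(row[1]) != Fraction(row[2])]


mismatches = check_numeric_values(NEW_VALUES)
print("mismatches:", mismatches)
```

Exact rational arithmetic avoids floating-point rounding, so an empty result means the two columns agree exactly.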
|
| +
|
| +* PropertyAliases.txt changes
|
| +- 1 new property PCM=Prepended_Concatenation_Mark
|
| + Ignore: Only useful for layout engines.
|
| + Ok to list in ppucd.txt.
|
| +
|
| +* PropertyValueAliases.txt new property values
|
| + blk; Adlam ; Adlam
|
| + blk; Bhaiksuki ; Bhaiksuki
|
| + blk; Cyrillic_Ext_C ; Cyrillic_Extended_C
|
| + blk; Glagolitic_Sup ; Glagolitic_Supplement
|
| + blk; Ideographic_Symbols ; Ideographic_Symbols_And_Punctuation
|
| + blk; Marchen ; Marchen
|
| + blk; Mongolian_Sup ; Mongolian_Supplement
|
| + blk; Newa ; Newa
|
| + blk; Osage ; Osage
|
| + blk; Tangut ; Tangut
|
| + blk; Tangut_Components ; Tangut_Components
|
| + -> add to uchar.h
|
| + use long property names for enum constants
|
| + -> add to UCharacter.UnicodeBlock IDs
|
| + Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
|
| + replace public static final int \1_ID = \2; \3
|
| + -> add to UCharacter.UnicodeBlock objects
|
| + Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
|
| + replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
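The two Eclipse find/replace steps for UBLOCK_ constants can also be reproduced outside the IDE. The sketch below applies the same patterns with Python's `re` module. The sample input line uses a made-up block name and number (`UBLOCK_EXAMPLE_BLOCK`, 999) in the assumed layout of a uchar.h enum line, and the function names are hypothetical.

```python
import re

# Same patterns as the Eclipse find/replace steps in the log.
ID_FIND = r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)"
ID_REPLACE = r"public static final int \1_ID = \2; \3"
OBJ_FIND = r"UBLOCK_([^ ]+) = [0-9]+, (/.+)"
OBJ_REPLACE = r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2'


def to_java_id(line):
    """Turn a C enum line into a Java *_ID int constant."""
    return re.sub(ID_FIND, ID_REPLACE, line)


def to_java_object(line):
    """Turn a C enum line into a Java UnicodeBlock object declaration."""
    return re.sub(OBJ_FIND, OBJ_REPLACE, line)


# Hypothetical input line; name and number are placeholders, not real values.
sample = "UBLOCK_EXAMPLE_BLOCK = 999, /*[10000]*/"
id_line = to_java_id(sample)
object_line = to_java_object(sample)
print(id_line)
print(object_line)
```

Lines that do not match either pattern pass through unchanged, so the functions can be mapped over a whole pasted enum block.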
|
| +
|
| + GCB; EB ; E_Base
|
| + GCB; EBG ; E_Base_GAZ
|
| + GCB; EM ; E_Modifier
|
| + GCB; GAZ ; Glue_After_Zwj
|
| + GCB; ZWJ ; ZWJ
|
| + -> uchar.h & UCharacter.GraphemeClusterBreak
|
| +
|
| + jg ; African_Feh ; African_Feh
|
| + jg ; African_Noon ; African_Noon
|
| + jg ; African_Qaf ; African_Qaf
|
| + -> uchar.h & UCharacter.JoiningGroup
|
| +
|
| + lb ; EB ; E_Base
|
| + lb ; EM ; E_Modifier
|
| + lb ; ZWJ ; ZWJ
|
| + -> uchar.h & UCharacter.LineBreak
|
| +
|
| + sc ; Adlm ; Adlam
|
| + sc ; Bhks ; Bhaiksuki
|
| + sc ; Marc ; Marchen
|
| + sc ; Newa ; Newa
|
| + sc ; Osge ; Osage
|
| + sc ; Tang ; Tangut
|
| + -> all of them had been added already to uscript.h & com.ibm.icu.lang.UScript
|
| +
|
| + WB ; EB ; E_Base
|
| + WB ; EBG ; E_Base_GAZ
|
| + WB ; EM ; E_Modifier
|
| + WB ; GAZ ; Glue_After_Zwj
|
| + WB ; ZWJ ; ZWJ
|
| + -> uchar.h & UCharacter.WordBreak
|
| +
|
| +* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
|
| + (not strictly necessary for NOT_ENCODED scripts)
|
| + ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt
|
| +
|
| +* generate normalization data files
|
| + cd $ICU_ROOT/dbg
|
| + bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
|
| + bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
|
| + bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
|
| + bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
|
| + bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
|
| +
|
| +* build ICU (make install)
|
| + so that the tools build can pick up the new definitions from the installed header files.
|
| +
|
| + $ICU_ROOT/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 30 out.txt
|
| +
|
| +* build Unicode tools using CMake+make
|
| +
|
| +~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
|
| +
|
| + # Location (--prefix) of where ICU was installed.
|
| + set(ICU_INST_DIR /home/mscherer/svn.icu/trunk/inst)
|
| + # Location of the ICU source tree.
|
| + set(ICU_SRC_DIR /home/mscherer/svn.icu/trunk/src)
|
| +
|
| + ~/svn.icutools/trunk/dbg/unicode/c$
|
| + cmake ../../../src/unicode/c
|
| + make
|
| +
|
| +* generate core properties data files
|
| + ~/svn.icutools/trunk/dbg/unicode/c$
|
| + genprops/genprops $ICU_SRC_DIR
|
| + genuca/genuca --hanOrder implicit $ICU_SRC_DIR
|
| + genuca/genuca --hanOrder radical-stroke $ICU_SRC_DIR
|
| +- rebuild ICU (make install) & tools
|
| +
|
| +* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
|
| + sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
|
| +- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
|
| +- Unicode 6.0..9.0: U+2260, U+226E, U+226F
|
| +- nothing new in 9.0, no test file to update
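The grep step above can be made more precise by parsing the code point field and keeping only non-ASCII entries. This is a rough sketch under the assumption that each data line has the form `code or range ; status ; ... # comment`. The sample lines are abbreviated illustrations in that layout rather than verbatim file contents, and `non_ascii_std3_valid` is a hypothetical helper name.

```python
def non_ascii_std3_valid(lines):
    """Return (first, last) code point ranges with status
    disallowed_STD3_valid that extend beyond ASCII."""
    hits = []
    for line in lines:
        data = line.split("#", 1)[0]  # drop the trailing comment
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, _, end = fields[0].partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first
        if last > 0x7F:
            hits.append((first, last))
    return hits


# Illustrative lines only; the real file has many more entries.
sample_lines = [
    "# comment line",
    "0000..002C    ; disallowed_STD3_valid   # ASCII range, ignored",
    "00C0          ; mapped ; 00E0           # not the status we want",
    "2260          ; disallowed_STD3_valid   # NOT EQUAL TO",
]
hits = non_ascii_std3_valid(sample_lines)
print([f"U+{first:04X}" for first, _ in hits])
```

Comparing the result for a new Unicode version against the list already recorded in the log shows whether the UTS #46 tests need new cases.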
|
| +
|
| +* run & fix ICU4C tests
|
| +- Andy handles RBBI & spoof check test failures
|
| +
|
| +* collation: CLDR collation root, UCA DUCET
|
| +
|
| +- UCA DUCET goes into Mark's Unicode tools, see
|
| + https://sites.google.com/site/unicodetools/home#TOC-UCA
|
| +- CLDR root data files are checked into (CLDR UCA branch)/common/uca/
|
| + cp (UCA generated)/CollationAuxiliary/* ~/svn.cldr/trunk/common/uca/
|
| +
|
| +- cd (CLDR UCA branch)/common/uca/
|
| +- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
|
| + cp FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
|
| +- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
|
| + cp $ICU_SRC_DIR/source/data/unidata/UCARules.txt /tmp/UCARules-old.txt
|
| + (note: the following copy removes the underscore before "Rules" from the file name)
|
| + cp UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
|
| +- restore TODO diffs in UCARules.txt
|
| + meld /tmp/UCARules-old.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
|
| +- update (ICU4C)/source/test/testdata/CollationTest_*.txt
|
| + and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
|
| + from the CLDR root files (..._CLDR_..._SHORT.txt)
|
| + cp CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
|
| + cp CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
|
| + cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
|
| +- if CLDR common/uca/unihan-index.txt changes, then update
|
| + CLDR common/collation/root.xml <collation type="private-unihan">
|
| + and regenerate (or update in parallel) $ICU_SRC_DIR/source/data/coll/root.txt
|
| +
|
| +- run genuca, see command line above;
|
| + deal with
|
| + Error: Unknown script for first-primary sample character U+104B5 on line 32599 of /home/mscherer/svn.icu/trunk/src/source/data/unidata/FractionalUCA.txt:
|
| + FDD1 104B5; [75 B8 02, 05, 05] # Osage first primary (compressible)
|
| + (add the character to genuca.cpp sampleCharsToScripts[])
|
| + + look up the USCRIPT_ code for the new sample characters
|
| + (should be obvious from the comment in the error output)
|
| + + *add* mappings to sampleCharsToScripts[], do not replace them
|
| + (in case the script sample characters flip-flop)
|
| + + insert new scripts in DUCET script order, see the top_byte table
|
| + at the beginning of FractionalUCA.txt
|
| +- rebuild ICU4C
|
| +
|
| +* Unihan collators
|
| +- run Unicode Tools
|
| + org.unicode.draft.GenerateUnihanCollators
|
| + with VM arguments
|
| + -DSVN_WORKSPACE=/home/mscherer/svn.unitools/trunk
|
| + -DOTHER_WORKSPACE=/home/mscherer/svn.unitools
|
| + -DUCD_DIR=/home/mscherer/svn.unitools/trunk/data
|
| + -DCLDR_DIR=/home/mscherer/svn.cldr/trunk
|
| + -DUVERSION=9.0.0
|
| + -ea
|
| +- run Unicode Tools
|
| + org.unicode.draft.GenerateUnihanCollatorFiles
|
| + with the same arguments
|
| +- check CLDR diffs
|
| + cd ~/svn.cldr/trunk
|
| + meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
|
| + meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
|
| +- copy to CLDR
|
| + cd ~/svn.cldr/trunk
|
| + cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
|
| + cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
|
| +- commit to CLDR
|
| +- generate ICU zh collation data: run CLDR
|
| + org.unicode.cldr.icu.NewLdml2IcuConverter
|
| + with program arguments
|
| + -t collation
|
| + -s /home/mscherer/svn.cldr/trunk/common/collation
|
| + -m /home/mscherer/svn.cldr/trunk/common/supplemental
|
| + -d /home/mscherer/svn.icu/trunk/src/source/data/coll
|
| + -p /home/mscherer/svn.icu/trunk/src/source/data/xml/collation
|
| + zh
|
| + and VM arguments
|
| + -DCLDR_DIR=/home/mscherer/svn.cldr/trunk
|
| +- rebuild ICU4C
|
| +
|
| +* run & fix ICU4C tests, now with new CLDR collation root data
|
| +- run all tests with the collation test data *_SHORT.txt or the full files
|
| + (the full ones have comments, useful for debugging)
|
| +- note on intltest: if collate/UCAConformanceTest fails, then
|
| + utility/MultithreadTest/TestCollators will fail as well;
|
| + fix the conformance test before looking into the multi-thread test
|
| +
|
| +* update Java data files
|
| +- refresh just the UCD/UCA-related/derived files, just to be safe
|
| +- see (ICU4C)/source/data/icu4j-readme.txt
|
| +- mkdir /tmp/icu4j
|
| +- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
|
| + output:
|
| + ...
|
| + Unicode .icu files built to ./out/build/icudt58l
|
| + echo timestamp > uni-core-data
|
| + mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt58b
|
| + mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt58b
|
| + echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
|
| + LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt58l.dat ./out/icu4j/icudt58b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt58l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt58b
|
| + mv ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt58b"
|
| + jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt58b/
|
| + mkdir -p /tmp/icu4j/main/shared/data
|
| + cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
|
| + jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt58b/
|
| + mkdir -p /tmp/icu4j/main/shared/data
|
| + cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
|
| + make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/dbg/data'
|
| +- copy the big-endian Unicode data files to another location,
|
| + separate from the other data files,
|
| + and then refresh ICU4J
|
| + cd ~/svn.icu/trunk/dbg/data/out/icu4j
|
| + mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
|
| + mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
|
| + cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
|
| + cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
|
| + rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
|
| + cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
|
| + cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
|
| + cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
|
| + jar uvf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
|
| +
|
| +* When refreshing all of ICU4J data from ICU4C
|
| +- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
|
| +- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
|
| +or
|
| +- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
|
| +
|
| +* update CollationFCD.java
|
| + + copy & paste the initializers of lcccIndex[] etc. from
|
| + ICU4C/source/i18n/collationfcd.cpp to
|
| + ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
|
| +
|
| +* refresh Java test .txt files
|
| +- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
|
| + cd $ICU_SRC_DIR/source/data/unidata
|
| + cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
|
| + cd ../../test/testdata
|
| + cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
|
| + cp ~/unidata/uni90/20160603/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
|
| +
|
| +* run & fix ICU4J tests
|
| +
|
| +*** LayoutEngine script information
|
| +
|
| +* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
|
| + This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
|
| + in the working directory.
|
| +
|
| + (It also generates ScriptRunData.cpp, which is no longer needed.)
|
| +
|
| + It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
|
| + (a plain text file)
|
| + which maps ICU versions to the numbers of script/language constants
|
| + that were added then.
|
| + (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)
|
| +
|
| + The generated files have a current copyright date and "@deprecated" statement.
|
| +
|
| +* Review changes, fix Java tool if necessary, and copy to ICU4C
|
| + cd ~/svn.icu4j/trunk/src
|
| + meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
|
| + cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
|
| + cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
|
| +
|
| +*** API additions
|
| +- send notice to icu-design about new born-@stable API (enum constants etc.)
|
| +
|
| +*** merge the Unicode update branches back onto the trunk
|
| +- do not merge the icudata.jar and testdata.jar;
|
| + instead rebuild them from merged & tested ICU4C
|
| +- make sure that changes to Unicode tools & ICU tools are checked in
|
| + http://www.unicode.org/utility/trac/log/trunk/unicodetools
|
| + http://bugs.icu-project.org/trac/log/tools/trunk
|
| +
|
| +---------------------------------------------------------------------------- ***
|
| +
|
| +New script codes early in ICU 58: http://bugs.icu-project.org/trac/ticket/11764
|
| +
|
| +Adding
|
| +- new scripts in Unicode 9: Adlm, Bhks, Marc, Newa, Osge
|
| +- new combination/alias codes: Hanb, Jamo
|
| + - used in CLDR 29 and in spoof checker
|
| +- new Z* code: Zsye
|
| +
|
| +Add new codes to uscript.h & UScript.java, see Unicode update logs.
|
| + -> com.ibm.icu.lang.UScript
|
| + find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
|
| + replace public static final int \1 = \2; \3
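The USCRIPT_ find/replace differs from the UBLOCK_ one in that it tolerates padding spaces before the `=` sign. A minimal sketch of the same substitution in Python follows. The input line uses a made-up constant (`USCRIPT_EXAMPLE_SCRIPT`, 999) in the assumed layout of a uscript.h enum line, and `to_java_script_constant` is a hypothetical name.

```python
import re

# Same pattern as the find/replace step in the log.
FIND = r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)"
REPLACE = r"public static final int \1 = \2; \3"


def to_java_script_constant(line):
    """Turn a C USCRIPT_ enum line into a Java int constant."""
    return re.sub(FIND, REPLACE, line)


# Hypothetical input line; name and number are placeholders, not real values.
sample = "USCRIPT_EXAMPLE_SCRIPT            = 999,/* Exmp */"
java_line = to_java_script_constant(sample)
print(java_line)
```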
|
| +
|
| +Manually edit ppucd.txt and icutools:unicode/c/genprops/pnames_data.h,
|
| +add new script codes.
|
| +Use "long" script names only where they are established in Unicode 9 PropertyValueAliases.txt.
|
| +
|
| +Note: If we have to run preparseucd.py again before the Unicode 9 update,
|
| +then we need to manually keep/restore the new script codes.
|
| +
|
| +ICU_ROOT=~/svn.icu/trunk
|
| +ICU_SRC_DIR=$ICU_ROOT/src
|
| +ICUDT=icudt57b
|
| +export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
|
| +SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
|
| +UNIDATA=$ICU_SRC_DIR/source/data/unidata
|
| +
|
| +Adjust unicode/c/genprops/*builder.cpp for #ifndef/#ifdef changes in _data.h files,
|
| +see http://bugs.icu-project.org/trac/ticket/12141
|
| +
|
| +make install, then icutools cmake & make, then
|
| +~/svn.icutools/trunk/dbg/unicode/c$ make && genprops/genprops $ICU_SRC_DIR
|
| +
|
| +Generate Java data as usual, only update pnames.icu & uprops.icu.
|
| +
|
| +*** LayoutEngine script information
|
| +
|
| +* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
|
| + This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
|
| + in the working directory.
|
| +
|
| + (It also generates ScriptRunData.cpp, which is no longer needed.)
|
| +
|
| + It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
|
| + (a plain text file)
|
| + which maps ICU versions to the numbers of script/language constants
|
| + that were added then.
|
| + (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)
|
| +
|
| + The generated files have a current copyright date and "@deprecated" statement.
|
| +
|
| +* Review changes, fix Java tool if necessary, and copy to ICU4C
|
| + cd ~/svn.icu4j/trunk/src
|
| + meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
|
| + cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
|
| + cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
|
|
|
| ---------------------------------------------------------------------------- ***
|
|
|
|
|