PHP Language discussion:

GB2312 to Unicode
  1. #1
    Experienced member
    Joined: December 2009
    Posts: 27
    Location: France

    GB2312 to Unicode
    Hello,

    I am currently doing an IT internship at a Chinese company where there is no developer who can advise me on one very specific problem.

    The users of the website I have to build will write in Chinese and will have Chinese user names. In phpMyAdmin, I therefore set the collation of the relevant columns to gb2312_chinese_ci. The problem I am running into is in the PHP script.

    After searching the web, I tried the functions unicode_encode($string) and unicode_decode($string). Unfortunately, the browser shows an error saying that these functions do not exist, even though plenty of information about them can be found online.
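As far as I can tell, unicode_encode() and unicode_decode() belonged to the PHP 6 development branch, which was never released, and that would explain the error. A minimal sketch of the same conversion with functions that ship with current PHP builds is given here. It assumes that the mbstring or iconv extension is enabled and that the input bytes are GB2312 in its EUC-CN form; the helper name gb2312_to_utf8 is hypothetical.

```php
<?php
// Hypothetical helper: converts a GB2312 (EUC-CN) byte string to UTF-8.
// Assumes the mbstring or the iconv extension is available.
function gb2312_to_utf8(string $gb): string
{
    if (function_exists('mb_convert_encoding')) {
        // 'EUC-CN' is the mbstring name for the EUC form of GB2312.
        return mb_convert_encoding($gb, 'UTF-8', 'EUC-CN');
    }
    if (function_exists('iconv')) {
        $out = iconv('GB2312', 'UTF-8', $gb);
        if ($out !== false) {
            return $out;
        }
    }
    throw new RuntimeException('No usable conversion extension (mbstring or iconv).');
}

// The bytes D6 D0 CE C4 are the GB2312 encoding of the two characters U+4E2D U+6587.
$utf8 = gb2312_to_utf8("\xD6\xD0\xCE\xC4");
echo bin2hex($utf8), "\n"; // e4b8ade69687
```

With this approach no external mapping file is needed, and the result is real UTF-8 text rather than HTML numeric entities, provided the page itself is served as UTF-8.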

    So I tried another approach. I found the following function, written by another programmer, which is supposed to do what I want:

    Code:
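    function gb2unicode($gb)
    {
    	if (trim($gb) === '')
    		return $gb;
    	// Build the lookup table: GB2312 code (integer) => Unicode code point (4 hex digits).
    	$codetable = array();
    	foreach (file("gb2312.txt") as $line)
    	{
    		// Accepts "0x2121<TAB>0x3000" as well as "0x2121=>0x3000"; comment lines do not match.
    		if (preg_match('/^\s*0x([0-9A-Fa-f]{4})\s*(?:=>)?\s*0x([0-9A-Fa-f]{4})/', $line, $m))
    			$codetable[hexdec($m[1])] = $m[2];
    	}
    	$utf = "";
    	$len = strlen($gb);
    	for ($i = 0; $i < $len; )
    	{
    		if (ord($gb[$i]) > 127 && $i + 1 < $len)
    		{
    			// Two-byte EUC sequence: subtract 0x8080 to obtain the GB2312 code.
    			$code = hexdec(bin2hex(substr($gb, $i, 2))) - 0x8080;
    			// Unknown codes become "?" instead of an empty "&#x;" entity.
    			$utf .= isset($codetable[$code]) ? "&#x" . $codetable[$code] . ";" : "?";
    			$i += 2;
    		}
    		else
    		{
    			// Single byte (ASCII): copy it before moving on (the original skipped it first).
    			$utf .= $gb[$i];
    			$i++;
    		}
    	}
    	return $utf;
    }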
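The key arithmetic in the function is the subtraction of 0x8080, which turns a two-byte EUC sequence into the GB2312 code used as the table key. The header of the gb2312.txt file describes this conversion and the kuten form. The sketch below only illustrates that arithmetic; the helper names euc_pair_to_gb2312_code and gb2312_code_to_kuten are hypothetical and not part of the original function.

```php
<?php
// Hypothetical helpers; the arithmetic follows the notes in the gb2312.txt header.

/** Turn a two-byte EUC sequence into the GB2312 code used as the table key. */
function euc_pair_to_gb2312_code(string $pair): int
{
    if (strlen($pair) !== 2) {
        throw new InvalidArgumentException('Expected exactly two bytes');
    }
    // Big-endian 16-bit value of the two bytes, minus 0x8080.
    return ((ord($pair[0]) << 8) | ord($pair[1])) - 0x8080;
}

/** Turn a GB2312 code into its kuten form (ku and ten, two decimal digits each). */
function gb2312_code_to_kuten(int $code): string
{
    $shifted = $code - 0x2020;
    return sprintf('%02d%02d', $shifted >> 8, $shifted & 0xFF);
}

printf("0x%04X\n", euc_pair_to_gb2312_code("\xA1\xA1")); // 0x2121
echo gb2312_code_to_kuten(0x2121), "\n";                  // 0101
echo gb2312_code_to_kuten(0x777E), "\n";                  // 8794
```

The two kuten results match the worked examples given in the file header (0x2121 -> 0101 and 0x777E -> 8794).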
    And here is the gb2312.txt file that the function uses:

    Code:
    # gb2312.txt --
    #
    #	GB2312 to Unicode table (modified)
    # from: 
    # http://tcl.apache.org/sources/tcl/tools/encoding/gb2312.txt
    # ftp://ftp.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/GB/GB2312.TXT
    #
    # Copyright (c) 1998-1999 by Scriptics Corporation.
    #
    # See the file "license.terms" for information on usage and redistribution
    # of this file, and for a DISCLAIMER OF ALL WARRANTIES.
    #
    # RCS: @(#) $Id: gb2312.txt,v 1.2 1999/04/16 00:47:55 stanton Exp $
    #
    # NOTE: this table has been modified to include the 7-bit ASCII
    # characters that are allowed in GB2312 files.
    #
    #
    #	Name:             GB2312-80 to Unicode table (complete, hex format)
    #	Unicode version:  1.1
    #	Table version:    0.0d2
    #	Table format:     Format A
    #	Date:             6 December 1993
    #	Author:           Glenn Adams <glenn@metis.com>
    #                     John H. Jenkins <John_Jenkins@taligent.com>
    #
    #	Copyright (c) 1991-1994 Unicode, Inc.  All Rights reserved.
    #
    #	This file is provided as-is by Unicode, Inc. (The Unicode Consortium).
    #	No claims are made as to fitness for any particular purpose.  No
    #	warranties of any kind are expressed or implied.  The recipient
    #	agrees to determine applicability of information provided.  If this
    #	file has been provided on magnetic media by Unicode, Inc., the sole
    #	remedy for any claim will be exchange of defective media within 90
    #	days of receipt.
    #
    #	Recipient is granted the right to make copies in any form for
    #	internal distribution and to freely use the information supplied
    #	in the creation of products supporting Unicode.  Unicode, Inc.
    #	specifically excludes the right to re-distribute this file directly
    #	to third parties or other organizations whether for profit or not.
    #
    #	General notes:
    #
    #	This table contains the data Metis and Taligent currently have on how
    #       GB2312-80 characters map into Unicode.
    #
    #	Format:  Three tab-separated columns
    #		 Column #1 is the GB2312 code (in hex as 0xXXXX)
    #		 Column #2 is the Unicode (in hex as 0xXXXX)
    #		 Column #3 the Unicode name (follows a comment sign, '#')
    #					The official names for Unicode characters U+4E00
    #					to U+9FA5, inclusive, is "CJK UNIFIED IDEOGRAPH-XXXX",
    #					where XXXX is the code point.  Including all these
    #					names in this file increases its size substantially
    #					and needlessly.  The token "<CJK>" is used for the
    #					name of these characters.  If necessary, it can be
    #					expanded algorithmically by a parser or editor.
    #
    #	The entries are in GB2312 order
    #
    #	The following algorithms can be used to change the hex form
    #		of GB2312 to other standard forms:
    #
    #		To change hex to EUC form, add 0x8080
    #		To change hex to kuten form, first subtract 0x2020.  Then
    #			the high and low bytes correspond to the ku and ten of
    #			the kuten form.  For example, 0x2121 -> 0x0101 -> 0101;
    #			0x777E -> 0x575E -> 8794
    #
    #	Any comments or problems, contact <John_Jenkins@taligent.com>
    #
    #
    0x2121=>0x3000, # IDEOGRAPHIC SPACE
    0x2122=>0x3001, # IDEOGRAPHIC COMMA
    0x2123=>0x3002, # IDEOGRAPHIC FULL STOP
    0x2124=>0x30FB, # KATAKANA MIDDLE DOT
    0x2125=>0x02C9, # MODIFIER LETTER MACRON (Mandarin Chinese first tone)
    0x2126=>0x02C7, # CARON (Mandarin Chinese third tone)
    0x2127=>0x00A8, # DIAERESIS
    0x2128=>0x3003, # DITTO MARK
    0x2129=>0x3005, # IDEOGRAPHIC ITERATION MARK
    0x212A=>0x2015, # HORIZONTAL BAR
    0x212B=>0xFF5E, # FULLWIDTH TILDE
    0x212C=>0x2225, # PARALLEL TO
    0x212D=>0x2026, # HORIZONTAL ELLIPSIS
    0x212E=>0x2018, # LEFT SINGLE QUOTATION MARK
    0x212F=>0x2019, # RIGHT SINGLE QUOTATION MARK
    0x2130=>0x201C, # LEFT DOUBLE QUOTATION MARK
    0x2131=>0x201D, # RIGHT DOUBLE QUOTATION MARK
    0x2132=>0x3014, # LEFT TORTOISE SHELL BRACKET
    0x2133=>0x3015, # RIGHT TORTOISE SHELL BRACKET
    0x2134=>0x3008, # LEFT ANGLE BRACKET
    0x2135=>0x3009, # RIGHT ANGLE BRACKET
    0x2136=>0x300A, # LEFT DOUBLE ANGLE BRACKET
    0x2137=>0x300B, # RIGHT DOUBLE ANGLE BRACKET
    0x2138=>0x300C, # LEFT CORNER BRACKET
    0x2139=>0x300D, # RIGHT CORNER BRACKET
    0x213A=>0x300E, # LEFT WHITE CORNER BRACKET
    0x213B=>0x300F, # RIGHT WHITE CORNER BRACKET
    0x213C=>0x3016, # LEFT WHITE LENTICULAR BRACKET
    0x213D=>0x3017, # RIGHT WHITE LENTICULAR BRACKET
    0x213E=>0x3010, # LEFT BLACK LENTICULAR BRACKET
    0x213F=>0x3011, # RIGHT BLACK LENTICULAR BRACKET
    0x2140=>0x00B1, # PLUS-MINUS SIGN
    0x2141=>0x00D7, # MULTIPLICATION SIGN
    0x2142=>0x00F7, # DIVISION SIGN
    0x2143=>0x2236, # RATIO
    0x2144=>0x2227, # LOGICAL AND
    0x2145=>0x2228, # LOGICAL OR
    0x2146=>0x2211, # N-ARY SUMMATION
    0x2147=>0x220F, # N-ARY PRODUCT
    0x2148=>0x222A, # UNION
    0x2149=>0x2229, # INTERSECTION
    0x214A=>0x2208, # ELEMENT OF
    0x214B=>0x2237, # PROPORTION
    0x214C=>0x221A, # SQUARE ROOT
    0x214D=>0x22A5, # UP TACK
    0x214E=>0x2225, # PARALLEL TO
    0x214F=>0x2220, # ANGLE
    0x2150=>0x2312, # ARC
    0x2151=>0x2299, # CIRCLED DOT OPERATOR
    0x2152=>0x222B, # INTEGRAL
    0x2153=>0x222E, # CONTOUR INTEGRAL
    0x2154=>0x2261, # IDENTICAL TO
    0x2155=>0x224C, # ALL EQUAL TO
    0x2156=>0x2248, # ALMOST EQUAL TO
    0x2157=>0x223D, # REVERSED TILDE
    0x2158=>0x221D, # PROPORTIONAL TO
    0x2159=>0x2260, # NOT EQUAL TO
    0x215A=>0x226E, # NOT LESS-THAN
    0x215B=>0x226F, # NOT GREATER-THAN
    0x215C=>0x2264, # LESS-THAN OR EQUAL TO
    0x215D=>0x2265, # GREATER-THAN OR EQUAL TO
    0x215E=>0x221E, # INFINITY
    0x215F=>0x2235, # BECAUSE
    0x2160=>0x2234, # THEREFORE
    0x2161=>0x2642, # MALE SIGN
    0x2162=>0x2640, # FEMALE SIGN
    0x2163=>0x00B0, # DEGREE SIGN
    0x2164=>0x2032, # PRIME
    0x2165=>0x2033, # DOUBLE PRIME
    0x2166=>0x2103, # DEGREE CELSIUS
    0x2167=>0xFF04, # FULLWIDTH DOLLAR SIGN
    0x2168=>0x00A4, # CURRENCY SIGN
    0x2169=>0xFFE0, # FULLWIDTH CENT SIGN
    0x216A=>0xFFE1, # FULLWIDTH POUND SIGN
    0x216B=>0x2030, # PER MILLE SIGN
    0x216C=>0x00A7, # SECTION SIGN
    0x216D=>0x2116, # NUMERO SIGN
    0x216E=>0x2606, # WHITE STAR
    0x216F=>0x2605, # BLACK STAR
    0x2170=>0x25CB, # WHITE CIRCLE
    0x2171=>0x25CF, # BLACK CIRCLE
    0x2172=>0x25CE, # BULLSEYE
    0x2173=>0x25C7, # WHITE DIAMOND
    0x2174=>0x25C6, # BLACK DIAMOND
    0x2175=>0x25A1, # WHITE SQUARE
    0x2176=>0x25A0, # BLACK SQUARE
    0x2177=>0x25B3, # WHITE UP-POINTING TRIANGLE
    0x2178=>0x25B2, # BLACK UP-POINTING TRIANGLE
    0x2179=>0x203B, # REFERENCE MARK
    0x217A=>0x2192, # RIGHTWARDS ARROW
    0x217B=>0x2190, # LEFTWARDS ARROW
    0x217C=>0x2191, # UPWARDS ARROW
    0x217D=>0x2193, # DOWNWARDS ARROW
    0x217E=>0x3013, # GETA MARK
    0x2231=>0x2488, # DIGIT ONE FULL STOP
    0x2232=>0x2489, # DIGIT TWO FULL STOP
    0x2233=>0x248A, # DIGIT THREE FULL STOP
    0x2234=>0x248B, # DIGIT FOUR FULL STOP
    0x2235=>0x248C, # DIGIT FIVE FULL STOP
    0x2236=>0x248D, # DIGIT SIX FULL STOP
    0x2237=>0x248E, # DIGIT SEVEN FULL STOP
    0x2238=>0x248F, # DIGIT EIGHT FULL STOP
    0x2239=>0x2490, # DIGIT NINE FULL STOP
    0x223A=>0x2491, # NUMBER TEN FULL STOP
    0x223B=>0x2492, # NUMBER ELEVEN FULL STOP
    0x223C=>0x2493, # NUMBER TWELVE FULL STOP
    0x223D=>0x2494, # NUMBER THIRTEEN FULL STOP
    0x223E=>0x2495, # NUMBER FOURTEEN FULL STOP
    0x223F=>0x2496, # NUMBER FIFTEEN FULL STOP
    0x2240=>0x2497, # NUMBER SIXTEEN FULL STOP
    0x2241=>0x2498, # NUMBER SEVENTEEN FULL STOP
    0x2242=>0x2499, # NUMBER EIGHTEEN FULL STOP
    0x2243=>0x249A, # NUMBER NINETEEN FULL STOP
    0x2244=>0x249B, # NUMBER TWENTY FULL STOP
    0x2245=>0x2474, # PARENTHESIZED DIGIT ONE
    0x2246=>0x2475, # PARENTHESIZED DIGIT TWO
    0x2247=>0x2476, # PARENTHESIZED DIGIT THREE
    0x2248=>0x2477, # PARENTHESIZED DIGIT FOUR
    0x2249=>0x2478, # PARENTHESIZED DIGIT FIVE
    0x224A=>0x2479, # PARENTHESIZED DIGIT SIX
    0x224B=>0x247A, # PARENTHESIZED DIGIT SEVEN
    0x224C=>0x247B, # PARENTHESIZED DIGIT EIGHT
    0x224D=>0x247C, # PARENTHESIZED DIGIT NINE
    0x224E=>0x247D, # PARENTHESIZED NUMBER TEN
    0x224F=>0x247E, # PARENTHESIZED NUMBER ELEVEN
    0x2250=>0x247F, # PARENTHESIZED NUMBER TWELVE
    0x2251=>0x2480, # PARENTHESIZED NUMBER THIRTEEN
    0x2252=>0x2481, # PARENTHESIZED NUMBER FOURTEEN
    0x2253=>0x2482, # PARENTHESIZED NUMBER FIFTEEN
    0x2254=>0x2483, # PARENTHESIZED NUMBER SIXTEEN
    0x2255=>0x2484, # PARENTHESIZED NUMBER SEVENTEEN
    0x2256=>0x2485, # PARENTHESIZED NUMBER EIGHTEEN
    0x2257=>0x2486, # PARENTHESIZED NUMBER NINETEEN
    0x2258=>0x2487, # PARENTHESIZED NUMBER TWENTY
    0x2259=>0x2460, # CIRCLED DIGIT ONE
    0x225A=>0x2461, # CIRCLED DIGIT TWO
    0x225B=>0x2462, # CIRCLED DIGIT THREE
    0x225C=>0x2463, # CIRCLED DIGIT FOUR
    0x225D=>0x2464, # CIRCLED DIGIT FIVE
    0x225E=>0x2465, # CIRCLED DIGIT SIX
    0x225F=>0x2466, # CIRCLED DIGIT SEVEN
    0x2260=>0x2467, # CIRCLED DIGIT EIGHT
    0x2261=>0x2468, # CIRCLED DIGIT NINE
    0x2262=>0x2469, # CIRCLED NUMBER TEN
    0x2265=>0x3220, # PARENTHESIZED IDEOGRAPH ONE
    0x2266=>0x3221, # PARENTHESIZED IDEOGRAPH TWO
    0x2267=>0x3222, # PARENTHESIZED IDEOGRAPH THREE
    0x2268=>0x3223, # PARENTHESIZED IDEOGRAPH FOUR
    0x2269=>0x3224, # PARENTHESIZED IDEOGRAPH FIVE
    0x226A=>0x3225, # PARENTHESIZED IDEOGRAPH SIX
    0x226B=>0x3226, # PARENTHESIZED IDEOGRAPH SEVEN
    0x226C=>0x3227, # PARENTHESIZED IDEOGRAPH EIGHT
    0x226D=>0x3228, # PARENTHESIZED IDEOGRAPH NINE
    0x226E=>0x3229, # PARENTHESIZED IDEOGRAPH TEN
    0x2271=>0x2160, # ROMAN NUMERAL ONE
    0x2272=>0x2161, # ROMAN NUMERAL TWO
    0x2273=>0x2162, # ROMAN NUMERAL THREE
    0x2274=>0x2163, # ROMAN NUMERAL FOUR
    0x2275=>0x2164, # ROMAN NUMERAL FIVE
    0x2276=>0x2165, # ROMAN NUMERAL SIX
    0x2277=>0x2166, # ROMAN NUMERAL SEVEN
    0x2278=>0x2167, # ROMAN NUMERAL EIGHT
    0x2279=>0x2168, # ROMAN NUMERAL NINE
    0x227A=>0x2169, # ROMAN NUMERAL TEN
    0x227B=>0x216A, # ROMAN NUMERAL ELEVEN
    0x227C=>0x216B, # ROMAN NUMERAL TWELVE
    0x2321=>0xFF01, # FULLWIDTH EXCLAMATION MARK
    0x2322=>0xFF02, # FULLWIDTH QUOTATION MARK
    0x2323=>0xFF03, # FULLWIDTH NUMBER SIGN
    0x2324=>0xFFE5, # FULLWIDTH YEN SIGN
    0x2325=>0xFF05, # FULLWIDTH PERCENT SIGN
    0x2326=>0xFF06, # FULLWIDTH AMPERSAND
    0x2327=>0xFF07, # FULLWIDTH APOSTROPHE
    0x2328=>0xFF08, # FULLWIDTH LEFT PARENTHESIS
    0x2329=>0xFF09, # FULLWIDTH RIGHT PARENTHESIS
    0x232A=>0xFF0A, # FULLWIDTH ASTERISK
    0x232B=>0xFF0B, # FULLWIDTH PLUS SIGN
    0x232C=>0xFF0C, # FULLWIDTH COMMA
    0x232D=>0xFF0D, # FULLWIDTH HYPHEN-MINUS
    0x232E=>0xFF0E, # FULLWIDTH FULL STOP
    0x232F=>0xFF0F, # FULLWIDTH SOLIDUS
    0x2330=>0xFF10, # FULLWIDTH DIGIT ZERO
    0x2331=>0xFF11, # FULLWIDTH DIGIT ONE
    0x2332=>0xFF12, # FULLWIDTH DIGIT TWO
    0x2333=>0xFF13, # FULLWIDTH DIGIT THREE
    0x2334=>0xFF14, # FULLWIDTH DIGIT FOUR
    0x2335=>0xFF15, # FULLWIDTH DIGIT FIVE
    0x2336=>0xFF16, # FULLWIDTH DIGIT SIX
    0x2337=>0xFF17, # FULLWIDTH DIGIT SEVEN
    0x2338=>0xFF18, # FULLWIDTH DIGIT EIGHT
    0x2339=>0xFF19, # FULLWIDTH DIGIT NINE
    0x233A=>0xFF1A, # FULLWIDTH COLON
    0x233B=>0xFF1B, # FULLWIDTH SEMICOLON
    0x233C=>0xFF1C, # FULLWIDTH LESS-THAN SIGN
    0x233D=>0xFF1D, # FULLWIDTH EQUALS SIGN
    0x233E=>0xFF1E, # FULLWIDTH GREATER-THAN SIGN
    0x233F=>0xFF1F, # FULLWIDTH QUESTION MARK
    0x2340=>0xFF20, # FULLWIDTH COMMERCIAL AT
    0x2341=>0xFF21, # FULLWIDTH LATIN CAPITAL LETTER A
    0x2342=>0xFF22, # FULLWIDTH LATIN CAPITAL LETTER B
    0x2343=>0xFF23, # FULLWIDTH LATIN CAPITAL LETTER C
    0x2344=>0xFF24, # FULLWIDTH LATIN CAPITAL LETTER D
    0x2345=>0xFF25, # FULLWIDTH LATIN CAPITAL LETTER E
    0x2346=>0xFF26, # FULLWIDTH LATIN CAPITAL LETTER F
    0x2347=>0xFF27, # FULLWIDTH LATIN CAPITAL LETTER G
    0x2348=>0xFF28, # FULLWIDTH LATIN CAPITAL LETTER H
    0x2349=>0xFF29, # FULLWIDTH LATIN CAPITAL LETTER I
    0x234A=>0xFF2A, # FULLWIDTH LATIN CAPITAL LETTER J
    0x234B=>0xFF2B, # FULLWIDTH LATIN CAPITAL LETTER K
    0x234C=>0xFF2C, # FULLWIDTH LATIN CAPITAL LETTER L
    0x234D=>0xFF2D, # FULLWIDTH LATIN CAPITAL LETTER M
    0x234E=>0xFF2E, # FULLWIDTH LATIN CAPITAL LETTER N
    0x234F=>0xFF2F, # FULLWIDTH LATIN CAPITAL LETTER O
    0x2350=>0xFF30, # FULLWIDTH LATIN CAPITAL LETTER P
    0x2351=>0xFF31, # FULLWIDTH LATIN CAPITAL LETTER Q
    0x2352=>0xFF32, # FULLWIDTH LATIN CAPITAL LETTER R
    0x2353=>0xFF33, # FULLWIDTH LATIN CAPITAL LETTER S
    0x2354=>0xFF34, # FULLWIDTH LATIN CAPITAL LETTER T
    0x2355=>0xFF35, # FULLWIDTH LATIN CAPITAL LETTER U
    0x2356=>0xFF36, # FULLWIDTH LATIN CAPITAL LETTER V
    0x2357=>0xFF37, # FULLWIDTH LATIN CAPITAL LETTER W
    0x2358=>0xFF38, # FULLWIDTH LATIN CAPITAL LETTER X
    0x2359=>0xFF39, # FULLWIDTH LATIN CAPITAL LETTER Y
    0x235A=>0xFF3A, # FULLWIDTH LATIN CAPITAL LETTER Z
    0x235B=>0xFF3B, # FULLWIDTH LEFT SQUARE BRACKET
    0x235C=>0xFF3C, # FULLWIDTH REVERSE SOLIDUS
    0x235D=>0xFF3D, # FULLWIDTH RIGHT SQUARE BRACKET
    0x235E=>0xFF3E, # FULLWIDTH CIRCUMFLEX ACCENT
    0x235F=>0xFF3F, # FULLWIDTH LOW LINE
    0x2360=>0xFF40, # FULLWIDTH GRAVE ACCENT
    0x2361=>0xFF41, # FULLWIDTH LATIN SMALL LETTER A
    0x2362=>0xFF42, # FULLWIDTH LATIN SMALL LETTER B
    0x2363=>0xFF43, # FULLWIDTH LATIN SMALL LETTER C
    0x2364=>0xFF44, # FULLWIDTH LATIN SMALL LETTER D
    0x2365=>0xFF45, # FULLWIDTH LATIN SMALL LETTER E
    0x2366=>0xFF46, # FULLWIDTH LATIN SMALL LETTER F
    0x2367=>0xFF47, # FULLWIDTH LATIN SMALL LETTER G
    0x2368=>0xFF48, # FULLWIDTH LATIN SMALL LETTER H
    0x2369=>0xFF49, # FULLWIDTH LATIN SMALL LETTER I
    0x236A=>0xFF4A, # FULLWIDTH LATIN SMALL LETTER J
    0x236B=>0xFF4B, # FULLWIDTH LATIN SMALL LETTER K
    0x236C=>0xFF4C, # FULLWIDTH LATIN SMALL LETTER L
    0x236D=>0xFF4D, # FULLWIDTH LATIN SMALL LETTER M
    0x236E=>0xFF4E, # FULLWIDTH LATIN SMALL LETTER N
    0x236F=>0xFF4F, # FULLWIDTH LATIN SMALL LETTER O
    0x2370=>0xFF50, # FULLWIDTH LATIN SMALL LETTER P
    0x2371=>0xFF51, # FULLWIDTH LATIN SMALL LETTER Q
    0x2372=>0xFF52, # FULLWIDTH LATIN SMALL LETTER R
    0x2373=>0xFF53, # FULLWIDTH LATIN SMALL LETTER S
    0x2374=>0xFF54, # FULLWIDTH LATIN SMALL LETTER T
    0x2375=>0xFF55, # FULLWIDTH LATIN SMALL LETTER U
    0x2376=>0xFF56, # FULLWIDTH LATIN SMALL LETTER V
    0x2377=>0xFF57, # FULLWIDTH LATIN SMALL LETTER W
    0x2378=>0xFF58, # FULLWIDTH LATIN SMALL LETTER X
    0x2379=>0xFF59, # FULLWIDTH LATIN SMALL LETTER Y
    0x237A=>0xFF5A, # FULLWIDTH LATIN SMALL LETTER Z
    0x237B=>0xFF5B, # FULLWIDTH LEFT CURLY BRACKET
    0x237C=>0xFF5C, # FULLWIDTH VERTICAL LINE
    0x237D=>0xFF5D, # FULLWIDTH RIGHT CURLY BRACKET
    0x237E=>0xFFE3, # FULLWIDTH MACRON
    0x2421=>0x3041, # HIRAGANA LETTER SMALL A
    0x2422=>0x3042, # HIRAGANA LETTER A
    0x2423=>0x3043, # HIRAGANA LETTER SMALL I
    0x2424=>0x3044, # HIRAGANA LETTER I
    0x2425=>0x3045, # HIRAGANA LETTER SMALL U
    0x2426=>0x3046, # HIRAGANA LETTER U
    0x2427=>0x3047, # HIRAGANA LETTER SMALL E
    0x2428=>0x3048, # HIRAGANA LETTER E
    0x2429=>0x3049, # HIRAGANA LETTER SMALL O
    0x242A=>0x304A, # HIRAGANA LETTER O
    0x242B=>0x304B, # HIRAGANA LETTER KA
    0x242C=>0x304C, # HIRAGANA LETTER GA
    0x242D=>0x304D, # HIRAGANA LETTER KI
    0x242E=>0x304E, # HIRAGANA LETTER GI
    0x242F=>0x304F, # HIRAGANA LETTER KU
    0x2430=>0x3050, # HIRAGANA LETTER GU
    0x2431=>0x3051, # HIRAGANA LETTER KE
    0x2432=>0x3052, # HIRAGANA LETTER GE
    0x2433=>0x3053, # HIRAGANA LETTER KO
    0x2434=>0x3054, # HIRAGANA LETTER GO
    0x2435=>0x3055, # HIRAGANA LETTER SA
    0x2436=>0x3056, # HIRAGANA LETTER ZA
    0x2437=>0x3057, # HIRAGANA LETTER SI
    0x2438=>0x3058, # HIRAGANA LETTER ZI
    0x2439=>0x3059, # HIRAGANA LETTER SU
    0x243A=>0x305A, # HIRAGANA LETTER ZU
    0x243B=>0x305B, # HIRAGANA LETTER SE
    0x243C=>0x305C, # HIRAGANA LETTER ZE
    0x243D=>0x305D, # HIRAGANA LETTER SO
    0x243E=>0x305E, # HIRAGANA LETTER ZO
    0x243F=>0x305F, # HIRAGANA LETTER TA
    0x2440=>0x3060, # HIRAGANA LETTER DA
    0x2441=>0x3061, # HIRAGANA LETTER TI
    0x2442=>0x3062, # HIRAGANA LETTER DI
    0x2443=>0x3063, # HIRAGANA LETTER SMALL TU
    0x2444=>0x3064, # HIRAGANA LETTER TU
    0x2445=>0x3065, # HIRAGANA LETTER DU
    0x2446=>0x3066, # HIRAGANA LETTER TE
    0x2447=>0x3067, # HIRAGANA LETTER DE
    0x2448=>0x3068, # HIRAGANA LETTER TO
    0x2449=>0x3069, # HIRAGANA LETTER DO
    0x244A=>0x306A, # HIRAGANA LETTER NA
    0x244B=>0x306B, # HIRAGANA LETTER NI
    0x244C=>0x306C, # HIRAGANA LETTER NU
    0x244D=>0x306D, # HIRAGANA LETTER NE
    0x244E=>0x306E, # HIRAGANA LETTER NO
    0x244F=>0x306F, # HIRAGANA LETTER HA
    0x2450=>0x3070, # HIRAGANA LETTER BA
    0x2451=>0x3071, # HIRAGANA LETTER PA
    0x2452=>0x3072, # HIRAGANA LETTER HI
    0x2453=>0x3073, # HIRAGANA LETTER BI
    0x2454=>0x3074, # HIRAGANA LETTER PI
    0x2455=>0x3075, # HIRAGANA LETTER HU
    0x2456=>0x3076, # HIRAGANA LETTER BU
    0x2457=>0x3077, # HIRAGANA LETTER PU
    0x2458=>0x3078, # HIRAGANA LETTER HE
    0x2459=>0x3079, # HIRAGANA LETTER BE
    0x245A=>0x307A, # HIRAGANA LETTER PE
    0x245B=>0x307B, # HIRAGANA LETTER HO
    0x245C=>0x307C, # HIRAGANA LETTER BO
    0x245D=>0x307D, # HIRAGANA LETTER PO
    0x245E=>0x307E, # HIRAGANA LETTER MA
    0x245F=>0x307F, # HIRAGANA LETTER MI
    0x2460=>0x3080, # HIRAGANA LETTER MU
    0x2461=>0x3081, # HIRAGANA LETTER ME
    0x2462=>0x3082, # HIRAGANA LETTER MO
    0x2463=>0x3083, # HIRAGANA LETTER SMALL YA
    0x2464=>0x3084, # HIRAGANA LETTER YA
    0x2465=>0x3085, # HIRAGANA LETTER SMALL YU
    0x2466=>0x3086, # HIRAGANA LETTER YU
    0x2467=>0x3087, # HIRAGANA LETTER SMALL YO
    0x2468=>0x3088, # HIRAGANA LETTER YO
    0x2469=>0x3089, # HIRAGANA LETTER RA
    0x246A=>0x308A, # HIRAGANA LETTER RI
    0x246B=>0x308B, # HIRAGANA LETTER RU
    0x246C=>0x308C, # HIRAGANA LETTER RE
    0x246D=>0x308D, # HIRAGANA LETTER RO
    0x246E=>0x308E, # HIRAGANA LETTER SMALL WA
    0x246F=>0x308F, # HIRAGANA LETTER WA
    0x2470=>0x3090, # HIRAGANA LETTER WI
    0x2471=>0x3091, # HIRAGANA LETTER WE
    0x2472=>0x3092, # HIRAGANA LETTER WO
    0x2473=>0x3093, # HIRAGANA LETTER N
    0x2521=>0x30A1, # KATAKANA LETTER SMALL A
    0x2522=>0x30A2, # KATAKANA LETTER A
    0x2523=>0x30A3, # KATAKANA LETTER SMALL I
    0x2524=>0x30A4, # KATAKANA LETTER I
    0x2525=>0x30A5, # KATAKANA LETTER SMALL U
    0x2526=>0x30A6, # KATAKANA LETTER U
    0x2527=>0x30A7, # KATAKANA LETTER SMALL E
    0x2528=>0x30A8, # KATAKANA LETTER E
    0x2529=>0x30A9, # KATAKANA LETTER SMALL O
    0x252A=>0x30AA, # KATAKANA LETTER O
    0x252B=>0x30AB, # KATAKANA LETTER KA
    0x252C=>0x30AC, # KATAKANA LETTER GA
    0x252D=>0x30AD, # KATAKANA LETTER KI
    0x252E=>0x30AE, # KATAKANA LETTER GI
    0x252F=>0x30AF, # KATAKANA LETTER KU
    0x2530=>0x30B0, # KATAKANA LETTER GU
    0x2531=>0x30B1, # KATAKANA LETTER KE
    0x2532=>0x30B2, # KATAKANA LETTER GE
    0x2533=>0x30B3, # KATAKANA LETTER KO
    0x2534=>0x30B4, # KATAKANA LETTER GO
    0x2535=>0x30B5, # KATAKANA LETTER SA
    0x2536=>0x30B6, # KATAKANA LETTER ZA
    0x2537=>0x30B7, # KATAKANA LETTER SI
    0x2538=>0x30B8, # KATAKANA LETTER ZI
    0x2539=>0x30B9, # KATAKANA LETTER SU
    0x253A=>0x30BA, # KATAKANA LETTER ZU
    0x253B=>0x30BB, # KATAKANA LETTER SE
    0x253C=>0x30BC, # KATAKANA LETTER ZE
    0x253D=>0x30BD, # KATAKANA LETTER SO
    0x253E=>0x30BE, # KATAKANA LETTER ZO
    0x253F=>0x30BF, # KATAKANA LETTER TA
    0x2540=>0x30C0, # KATAKANA LETTER DA
    0x2541=>0x30C1, # KATAKANA LETTER TI
    0x2542=>0x30C2, # KATAKANA LETTER DI
    0x2543=>0x30C3, # KATAKANA LETTER SMALL TU
    0x2544=>0x30C4, # KATAKANA LETTER TU
    0x2545=>0x30C5, # KATAKANA LETTER DU
    0x2546=>0x30C6, # KATAKANA LETTER TE
    0x2547=>0x30C7, # KATAKANA LETTER DE
    0x2548=>0x30C8, # KATAKANA LETTER TO
    0x2549=>0x30C9, # KATAKANA LETTER DO
    0x254A=>0x30CA, # KATAKANA LETTER NA
    0x254B=>0x30CB, # KATAKANA LETTER NI
    0x254C=>0x30CC, # KATAKANA LETTER NU
    0x254D=>0x30CD, # KATAKANA LETTER NE
    0x254E=>0x30CE, # KATAKANA LETTER NO
    0x254F=>0x30CF, # KATAKANA LETTER HA
    0x2550=>0x30D0, # KATAKANA LETTER BA
    0x2551=>0x30D1, # KATAKANA LETTER PA
    0x2552=>0x30D2, # KATAKANA LETTER HI
    0x2553=>0x30D3, # KATAKANA LETTER BI
    0x2554=>0x30D4, # KATAKANA LETTER PI
    0x2555=>0x30D5, # KATAKANA LETTER HU
    0x2556=>0x30D6, # KATAKANA LETTER BU
    0x2557=>0x30D7, # KATAKANA LETTER PU
    0x2558=>0x30D8, # KATAKANA LETTER HE
    0x2559=>0x30D9, # KATAKANA LETTER BE
    0x255A=>0x30DA, # KATAKANA LETTER PE
    0x255B=>0x30DB, # KATAKANA LETTER HO
    0x255C=>0x30DC, # KATAKANA LETTER BO
    0x255D=>0x30DD, # KATAKANA LETTER PO
    0x255E=>0x30DE, # KATAKANA LETTER MA
    0x255F=>0x30DF, # KATAKANA LETTER MI
    0x2560=>0x30E0, # KATAKANA LETTER MU
    0x2561=>0x30E1, # KATAKANA LETTER ME
    0x2562=>0x30E2, # KATAKANA LETTER MO
    0x2563=>0x30E3, # KATAKANA LETTER SMALL YA
    0x2564=>0x30E4, # KATAKANA LETTER YA
    0x2565=>0x30E5, # KATAKANA LETTER SMALL YU
    0x2566=>0x30E6, # KATAKANA LETTER YU
    0x2567=>0x30E7, # KATAKANA LETTER SMALL YO
    0x2568=>0x30E8, # KATAKANA LETTER YO
    0x2569=>0x30E9, # KATAKANA LETTER RA
    0x256A=>0x30EA, # KATAKANA LETTER RI
    0x256B=>0x30EB, # KATAKANA LETTER RU
    0x256C=>0x30EC, # KATAKANA LETTER RE
    0x256D=>0x30ED, # KATAKANA LETTER RO
    0x256E=>0x30EE, # KATAKANA LETTER SMALL WA
    0x256F=>0x30EF, # KATAKANA LETTER WA
    0x2570=>0x30F0, # KATAKANA LETTER WI
    0x2571=>0x30F1, # KATAKANA LETTER WE
    0x2572=>0x30F2, # KATAKANA LETTER WO
    0x2573=>0x30F3, # KATAKANA LETTER N
    0x2574=>0x30F4, # KATAKANA LETTER VU
    0x2575=>0x30F5, # KATAKANA LETTER SMALL KA
    0x2576=>0x30F6, # KATAKANA LETTER SMALL KE
    0x2621=>0x0391, # GREEK CAPITAL LETTER ALPHA
    0x2622=>0x0392, # GREEK CAPITAL LETTER BETA
    0x2623=>0x0393, # GREEK CAPITAL LETTER GAMMA
    0x2624=>0x0394, # GREEK CAPITAL LETTER DELTA
    0x2625=>0x0395, # GREEK CAPITAL LETTER EPSILON
    0x2626=>0x0396, # GREEK CAPITAL LETTER ZETA
    0x2627=>0x0397, # GREEK CAPITAL LETTER ETA
    0x2628=>0x0398, # GREEK CAPITAL LETTER THETA
    0x2629=>0x0399, # GREEK CAPITAL LETTER IOTA
    0x262A=>0x039A, # GREEK CAPITAL LETTER KAPPA
    0x262B=>0x039B, # GREEK CAPITAL LETTER LAMDA
    0x262C=>0x039C, # GREEK CAPITAL LETTER MU
    0x262D=>0x039D, # GREEK CAPITAL LETTER NU
    0x262E=>0x039E, # GREEK CAPITAL LETTER XI
    0x262F=>0x039F, # GREEK CAPITAL LETTER OMICRON
    0x2630=>0x03A0, # GREEK CAPITAL LETTER PI
    0x2631=>0x03A1, # GREEK CAPITAL LETTER RHO
    0x2632=>0x03A3, # GREEK CAPITAL LETTER SIGMA
    0x2633=>0x03A4, # GREEK CAPITAL LETTER TAU
    0x2634=>0x03A5, # GREEK CAPITAL LETTER UPSILON
    0x2635=>0x03A6, # GREEK CAPITAL LETTER PHI
    0x2636=>0x03A7, # GREEK CAPITAL LETTER CHI
    0x2637=>0x03A8, # GREEK CAPITAL LETTER PSI
    0x2638=>0x03A9, # GREEK CAPITAL LETTER OMEGA
    0x2641=>0x03B1, # GREEK SMALL LETTER ALPHA
    0x2642=>0x03B2, # GREEK SMALL LETTER BETA
    0x2643=>0x03B3, # GREEK SMALL LETTER GAMMA
    0x2644=>0x03B4, # GREEK SMALL LETTER DELTA
    0x2645=>0x03B5, # GREEK SMALL LETTER EPSILON
    0x2646=>0x03B6, # GREEK SMALL LETTER ZETA
    0x2647=>0x03B7, # GREEK SMALL LETTER ETA
    0x2648=>0x03B8, # GREEK SMALL LETTER THETA
    0x2649=>0x03B9, # GREEK SMALL LETTER IOTA
    0x264A=>0x03BA, # GREEK SMALL LETTER KAPPA
    0x264B=>0x03BB, # GREEK SMALL LETTER LAMDA
    0x264C=>0x03BC, # GREEK SMALL LETTER MU
    0x264D=>0x03BD, # GREEK SMALL LETTER NU
    0x264E=>0x03BE, # GREEK SMALL LETTER XI
    0x264F=>0x03BF, # GREEK SMALL LETTER OMICRON
    0x2650=>0x03C0, # GREEK SMALL LETTER PI
    0x2651=>0x03C1, # GREEK SMALL LETTER RHO
    0x2652=>0x03C3, # GREEK SMALL LETTER SIGMA
    0x2653=>0x03C4, # GREEK SMALL LETTER TAU
    0x2654=>0x03C5, # GREEK SMALL LETTER UPSILON
    0x2655=>0x03C6, # GREEK SMALL LETTER PHI
    0x2656=>0x03C7, # GREEK SMALL LETTER CHI
    0x2657=>0x03C8, # GREEK SMALL LETTER PSI
    0x2658=>0x03C9, # GREEK SMALL LETTER OMEGA
    0x2721=>0x0410, # CYRILLIC CAPITAL LETTER A
    0x2722=>0x0411, # CYRILLIC CAPITAL LETTER BE
    0x2723=>0x0412, # CYRILLIC CAPITAL LETTER VE
    0x2724=>0x0413, # CYRILLIC CAPITAL LETTER GHE
    0x2725=>0x0414, # CYRILLIC CAPITAL LETTER DE
    0x2726=>0x0415, # CYRILLIC CAPITAL LETTER IE
    0x2727=>0x0401, # CYRILLIC CAPITAL LETTER IO
    0x2728=>0x0416, # CYRILLIC CAPITAL LETTER ZHE
    0x2729=>0x0417, # CYRILLIC CAPITAL LETTER ZE
    0x272A=>0x0418, # CYRILLIC CAPITAL LETTER I
    0x272B=>0x0419, # CYRILLIC CAPITAL LETTER SHORT I
    0x272C=>0x041A, # CYRILLIC CAPITAL LETTER KA
    0x272D=>0x041B, # CYRILLIC CAPITAL LETTER EL
    0x272E=>0x041C, # CYRILLIC CAPITAL LETTER EM
    0x272F=>0x041D, # CYRILLIC CAPITAL LETTER EN
    0x2730=>0x041E, # CYRILLIC CAPITAL LETTER O
    0x2731=>0x041F, # CYRILLIC CAPITAL LETTER PE
    0x2732=>0x0420, # CYRILLIC CAPITAL LETTER ER
    0x2733=>0x0421, # CYRILLIC CAPITAL LETTER ES
    0x2734=>0x0422, # CYRILLIC CAPITAL LETTER TE
    0x2735=>0x0423, # CYRILLIC CAPITAL LETTER U
    0x2736=>0x0424, # CYRILLIC CAPITAL LETTER EF
    0x2737=>0x0425, # CYRILLIC CAPITAL LETTER HA
    0x2738=>0x0426, # CYRILLIC CAPITAL LETTER TSE
    0x2739=>0x0427, # CYRILLIC CAPITAL LETTER CHE
    0x273A=>0x0428, # CYRILLIC CAPITAL LETTER SHA
    0x273B=>0x0429, # CYRILLIC CAPITAL LETTER SHCHA
    0x273C=>0x042A, # CYRILLIC CAPITAL LETTER HARD SIGN
    0x273D=>0x042B, # CYRILLIC CAPITAL LETTER YERU
    0x273E=>0x042C, # CYRILLIC CAPITAL LETTER SOFT SIGN
    0x273F=>0x042D, # CYRILLIC CAPITAL LETTER E
    0x2740=>0x042E, # CYRILLIC CAPITAL LETTER YU
    0x2741=>0x042F, # CYRILLIC CAPITAL LETTER YA
    0x2751=>0x0430, # CYRILLIC SMALL LETTER A
    0x2752=>0x0431, # CYRILLIC SMALL LETTER BE
    0x2753=>0x0432, # CYRILLIC SMALL LETTER VE
    0x2754=>0x0433, # CYRILLIC SMALL LETTER GHE
    0x2755=>0x0434, # CYRILLIC SMALL LETTER DE
    0x2756=>0x0435, # CYRILLIC SMALL LETTER IE
    0x2757=>0x0451, # CYRILLIC SMALL LETTER IO
    0x2758=>0x0436, # CYRILLIC SMALL LETTER ZHE
    0x2759=>0x0437, # CYRILLIC SMALL LETTER ZE
    0x275A=>0x0438, # CYRILLIC SMALL LETTER I
    0x275B=>0x0439, # CYRILLIC SMALL LETTER SHORT I
    0x275C=>0x043A, # CYRILLIC SMALL LETTER KA
    0x275D=>0x043B, # CYRILLIC SMALL LETTER EL
    0x275E=>0x043C, # CYRILLIC SMALL LETTER EM
    0x275F=>0x043D, # CYRILLIC SMALL LETTER EN
    0x2760=>0x043E, # CYRILLIC SMALL LETTER O
    0x2761=>0x043F, # CYRILLIC SMALL LETTER PE
    0x2762=>0x0440, # CYRILLIC SMALL LETTER ER
    0x2763=>0x0441, # CYRILLIC SMALL LETTER ES
    0x2764=>0x0442, # CYRILLIC SMALL LETTER TE
    0x2765=>0x0443, # CYRILLIC SMALL LETTER U
    0x2766=>0x0444, # CYRILLIC SMALL LETTER EF
    0x2767=>0x0445, # CYRILLIC SMALL LETTER HA
    0x2768=>0x0446, # CYRILLIC SMALL LETTER TSE
    0x2769=>0x0447, # CYRILLIC SMALL LETTER CHE
    0x276A=>0x0448, # CYRILLIC SMALL LETTER SHA
    0x276B=>0x0449, # CYRILLIC SMALL LETTER SHCHA
    0x276C=>0x044A, # CYRILLIC SMALL LETTER HARD SIGN
    0x276D=>0x044B, # CYRILLIC SMALL LETTER YERU
    0x276E=>0x044C, # CYRILLIC SMALL LETTER SOFT SIGN
    0x276F=>0x044D, # CYRILLIC SMALL LETTER E
    0x2770=>0x044E, # CYRILLIC SMALL LETTER YU
    0x2771=>0x044F, # CYRILLIC SMALL LETTER YA
    0x2821=>0x0101, # LATIN SMALL LETTER A WITH MACRON
    0x2822=>0x00E1, # LATIN SMALL LETTER A WITH ACUTE
    0x2823=>0x01CE, # LATIN SMALL LETTER A WITH CARON
    0x2824=>0x00E0, # LATIN SMALL LETTER A WITH GRAVE
    0x2825=>0x0113, # LATIN SMALL LETTER E WITH MACRON
    0x2826=>0x00E9, # LATIN SMALL LETTER E WITH ACUTE
    0x2827=>0x011B, # LATIN SMALL LETTER E WITH CARON
    0x2828=>0x00E8, # LATIN SMALL LETTER E WITH GRAVE
    0x2829=>0x012B, # LATIN SMALL LETTER I WITH MACRON
    0x282A=>0x00ED, # LATIN SMALL LETTER I WITH ACUTE
    0x282B=>0x01D0, # LATIN SMALL LETTER I WITH CARON
    0x282C=>0x00EC, # LATIN SMALL LETTER I WITH GRAVE
    0x282D=>0x014D, # LATIN SMALL LETTER O WITH MACRON
    0x282E=>0x00F3, # LATIN SMALL LETTER O WITH ACUTE
    0x282F=>0x01D2, # LATIN SMALL LETTER O WITH CARON
    0x2830=>0x00F2, # LATIN SMALL LETTER O WITH GRAVE
    0x2831=>0x016B, # LATIN SMALL LETTER U WITH MACRON
    0x2832=>0x00FA, # LATIN SMALL LETTER U WITH ACUTE
    0x2833=>0x01D4, # LATIN SMALL LETTER U WITH CARON
    0x2834=>0x00F9, # LATIN SMALL LETTER U WITH GRAVE
    0x2835=>0x01D6, # LATIN SMALL LETTER U WITH DIAERESIS AND MACRON
    0x2836=>0x01D8, # LATIN SMALL LETTER U WITH DIAERESIS AND ACUTE
    0x2837=>0x01DA, # LATIN SMALL LETTER U WITH DIAERESIS AND CARON
    0x2838=>0x01DC, # LATIN SMALL LETTER U WITH DIAERESIS AND GRAVE
    0x2839=>0x00FC, # LATIN SMALL LETTER U WITH DIAERESIS
    0x283A=>0x00EA, # LATIN SMALL LETTER E WITH CIRCUMFLEX
    0x2845=>0x3105, # BOPOMOFO LETTER B
    0x2846=>0x3106, # BOPOMOFO LETTER P
    0x2847=>0x3107, # BOPOMOFO LETTER M
    0x2848=>0x3108, # BOPOMOFO LETTER F
    0x2849=>0x3109, # BOPOMOFO LETTER D
    0x284A=>0x310A, # BOPOMOFO LETTER T
    0x284B=>0x310B, # BOPOMOFO LETTER N
    0x284C=>0x310C, # BOPOMOFO LETTER L
    0x284D=>0x310D, # BOPOMOFO LETTER G
    0x284E=>0x310E, # BOPOMOFO LETTER K
    0x284F=>0x310F, # BOPOMOFO LETTER H
    0x2850=>0x3110, # BOPOMOFO LETTER J
    0x2851=>0x3111, # BOPOMOFO LETTER Q
    0x2852=>0x3112, # BOPOMOFO LETTER X
    0x2853=>0x3113, # BOPOMOFO LETTER ZH
    0x2854=>0x3114, # BOPOMOFO LETTER CH
    0x2855=>0x3115, # BOPOMOFO LETTER SH
    0x2856=>0x3116, # BOPOMOFO LETTER R
    0x2857=>0x3117, # BOPOMOFO LETTER Z
    0x2858=>0x3118, # BOPOMOFO LETTER C
    0x2859=>0x3119, # BOPOMOFO LETTER S
    0x285A=>0x311A, # BOPOMOFO LETTER A
    0x285B=>0x311B, # BOPOMOFO LETTER O
    0x285C=>0x311C, # BOPOMOFO LETTER E
    0x285D=>0x311D, # BOPOMOFO LETTER EH
    0x285E=>0x311E, # BOPOMOFO LETTER AI
    0x285F=>0x311F, # BOPOMOFO LETTER EI
    0x2860=>0x3120, # BOPOMOFO LETTER AU
    0x2861=>0x3121, # BOPOMOFO LETTER OU
    0x2862=>0x3122, # BOPOMOFO LETTER AN
    0x2863=>0x3123, # BOPOMOFO LETTER EN
    0x2864=>0x3124, # BOPOMOFO LETTER ANG
    0x2865=>0x3125, # BOPOMOFO LETTER ENG
    0x2866=>0x3126, # BOPOMOFO LETTER ER
    0x2867=>0x3127, # BOPOMOFO LETTER I
    0x2868=>0x3128, # BOPOMOFO LETTER U
    0x2869=>0x3129, # BOPOMOFO LETTER IU
    0x2924=>0x2500, # BOX DRAWINGS LIGHT HORIZONTAL
    0x2925=>0x2501, # BOX DRAWINGS HEAVY HORIZONTAL
    0x2926=>0x2502, # BOX DRAWINGS LIGHT VERTICAL
    0x2927=>0x2503, # BOX DRAWINGS HEAVY VERTICAL
    0x2928=>0x2504, # BOX DRAWINGS LIGHT TRIPLE DASH HORIZONTAL
    0x2929=>0x2505, # BOX DRAWINGS HEAVY TRIPLE DASH HORIZONTAL
    0x292A=>0x2506, # BOX DRAWINGS LIGHT TRIPLE DASH VERTICAL
    0x292B=>0x2507, # BOX DRAWINGS HEAVY TRIPLE DASH VERTICAL
    0x292C=>0x2508, # BOX DRAWINGS LIGHT QUADRUPLE DASH HORIZONTAL
    0x292D=>0x2509, # BOX DRAWINGS HEAVY QUADRUPLE DASH HORIZONTAL
    0x292E=>0x250A, # BOX DRAWINGS LIGHT QUADRUPLE DASH VERTICAL
    0x292F=>0x250B, # BOX DRAWINGS HEAVY QUADRUPLE DASH VERTICAL
    0x2930=>0x250C, # BOX DRAWINGS LIGHT DOWN AND RIGHT
    0x2931=>0x250D, # BOX DRAWINGS DOWN LIGHT AND RIGHT HEAVY
    0x2932=>0x250E, # BOX DRAWINGS DOWN HEAVY AND RIGHT LIGHT
    0x2933=>0x250F, # BOX DRAWINGS HEAVY DOWN AND RIGHT
    0x2934=>0x2510, # BOX DRAWINGS LIGHT DOWN AND LEFT
    0x2935=>0x2511, # BOX DRAWINGS DOWN LIGHT AND LEFT HEAVY
    0x2936=>0x2512, # BOX DRAWINGS DOWN HEAVY AND LEFT LIGHT
    0x2937=>0x2513, # BOX DRAWINGS HEAVY DOWN AND LEFT
    0x2938=>0x2514, # BOX DRAWINGS LIGHT UP AND RIGHT
    0x2939=>0x2515, # BOX DRAWINGS UP LIGHT AND RIGHT HEAVY
    0x293A=>0x2516, # BOX DRAWINGS UP HEAVY AND RIGHT LIGHT
    0x293B=>0x2517, # BOX DRAWINGS HEAVY UP AND RIGHT
    0x293C=>0x2518, # BOX DRAWINGS LIGHT UP AND LEFT
    0x293D=>0x2519, # BOX DRAWINGS UP LIGHT AND LEFT HEAVY
    0x293E=>0x251A, # BOX DRAWINGS UP HEAVY AND LEFT LIGHT
    0x293F=>0x251B, # BOX DRAWINGS HEAVY UP AND LEFT
    0x2940=>0x251C, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
    0x2941=>0x251D, # BOX DRAWINGS VERTICAL LIGHT AND RIGHT HEAVY
    0x2942=>0x251E, # BOX DRAWINGS UP HEAVY AND RIGHT DOWN LIGHT
    0x2943=>0x251F, # BOX DRAWINGS DOWN HEAVY AND RIGHT UP LIGHT
    0x2944=>0x2520, # BOX DRAWINGS VERTICAL HEAVY AND RIGHT LIGHT
    0x2945=>0x2521, # BOX DRAWINGS DOWN LIGHT AND RIGHT UP HEAVY
    0x2946=>0x2522, # BOX DRAWINGS UP LIGHT AND RIGHT DOWN HEAVY
    0x2947=>0x2523, # BOX DRAWINGS HEAVY VERTICAL AND RIGHT
    0x2948=>0x2524, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
    0x2949=>0x2525, # BOX DRAWINGS VERTICAL LIGHT AND LEFT HEAVY
    0x294A=>0x2526, # BOX DRAWINGS UP HEAVY AND LEFT DOWN LIGHT
    0x294B=>0x2527, # BOX DRAWINGS DOWN HEAVY AND LEFT UP LIGHT
    0x294C=>0x2528, # BOX DRAWINGS VERTICAL HEAVY AND LEFT LIGHT
    0x294D=>0x2529, # BOX DRAWINGS DOWN LIGHT AND LEFT UP HEAVY
    0x294E=>0x252A, # BOX DRAWINGS UP LIGHT AND LEFT DOWN HEAVY
    0x294F=>0x252B, # BOX DRAWINGS HEAVY VERTICAL AND LEFT
    0x2950=>0x252C, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
    0x2951=>0x252D, # BOX DRAWINGS LEFT HEAVY AND RIGHT DOWN LIGHT
    0x2952=>0x252E, # BOX DRAWINGS RIGHT HEAVY AND LEFT DOWN LIGHT
    0x2953=>0x252F, # BOX DRAWINGS DOWN LIGHT AND HORIZONTAL HEAVY
    0x2954=>0x2530, # BOX DRAWINGS DOWN HEAVY AND HORIZONTAL LIGHT
    0x2955=>0x2531, # BOX DRAWINGS RIGHT LIGHT AND LEFT DOWN HEAVY
    0x2956=>0x2532, # BOX DRAWINGS LEFT LIGHT AND RIGHT DOWN HEAVY
    0x2957=>0x2533, # BOX DRAWINGS HEAVY DOWN AND HORIZONTAL
    0x2958=>0x2534, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
    0x2959=>0x2535, # BOX DRAWINGS LEFT HEAVY AND RIGHT UP LIGHT
    0x295A=>0x2536, # BOX DRAWINGS RIGHT HEAVY AND LEFT UP LIGHT
    0x295B=>0x2537, # BOX DRAWINGS UP LIGHT AND HORIZONTAL HEAVY
    0x295C=>0x2538, # BOX DRAWINGS UP HEAVY AND HORIZONTAL LIGHT
    0x295D=>0x2539, # BOX DRAWINGS RIGHT LIGHT AND LEFT UP HEAVY
    0x295E=>0x253A, # BOX DRAWINGS LEFT LIGHT AND RIGHT UP HEAVY
    0x295F=>0x253B, # BOX DRAWINGS HEAVY UP AND HORIZONTAL
    0x2960=>0x253C, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
    0x2961=>0x253D, # BOX DRAWINGS LEFT HEAVY AND RIGHT VERTICAL LIGHT
    0x2962=>0x253E, # BOX DRAWINGS RIGHT HEAVY AND LEFT VERTICAL LIGHT
    0x2963=>0x253F, # BOX DRAWINGS VERTICAL LIGHT AND HORIZONTAL HEAVY
    0x2964=>0x2540, # BOX DRAWINGS UP HEAVY AND DOWN HORIZONTAL LIGHT
    0x2965=>0x2541, # BOX DRAWINGS DOWN HEAVY AND UP HORIZONTAL LIGHT
    0x2966=>0x2542, # BOX DRAWINGS VERTICAL HEAVY AND HORIZONTAL LIGHT
    0x2967=>0x2543, # BOX DRAWINGS LEFT UP HEAVY AND RIGHT DOWN LIGHT
    0x2968=>0x2544, # BOX DRAWINGS RIGHT UP HEAVY AND LEFT DOWN LIGHT
    0x2969=>0x2545, # BOX DRAWINGS LEFT DOWN HEAVY AND RIGHT UP LIGHT
    0x296A=>0x2546, # BOX DRAWINGS RIGHT DOWN HEAVY AND LEFT UP LIGHT
    0x296B=>0x2547, # BOX DRAWINGS DOWN LIGHT AND UP HORIZONTAL HEAVY
    0x296C=>0x2548, # BOX DRAWINGS UP LIGHT AND DOWN HORIZONTAL HEAVY
    0x296D=>0x2549, # BOX DRAWINGS RIGHT LIGHT AND LEFT VERTICAL HEAVY
    0x296E=>0x254A, # BOX DRAWINGS LEFT LIGHT AND RIGHT VERTICAL HEAVY
    0x296F=>0x254B, # BOX DRAWINGS HEAVY VERTICAL AND HORIZONTAL
    0x3021=>0x554A, # <CJK>
    0x3022=>0x963F, # <CJK>
    0x3023=>0x57C3, # <CJK>
    0x3024=>0x6328, # <CJK>
    0x3025=>0x54CE, # <CJK>
    0x3026=>0x5509, # <CJK>
    0x3027=>0x54C0, # <CJK>
    0x3028=>0x7691, # <CJK>
    0x3029=>0x764C, # <CJK>
    0x302A=>0x853C, # <CJK>
    0x302B=>0x77EE, # <CJK>
    0x302C=>0x827E, # <CJK>
    0x302D=>0x788D, # <CJK>
    0x302E=>0x7231, # <CJK>
    0x302F=>0x9698, # <CJK>
    0x3030=>0x978D, # <CJK>
    0x3031=>0x6C28, # <CJK>
    0x3032=>0x5B89, # <CJK>
    0x3033=>0x4FFA, # <CJK>
    0x3034=>0x6309, # <CJK>
    0x3035=>0x6697, # <CJK>
    0x3036=>0x5CB8, # <CJK>
    0x3037=>0x80FA, # <CJK>
    0x3038=>0x6848, # <CJK>
    0x3039=>0x80AE, # <CJK>
    0x303A=>0x6602, # <CJK>
    0x303B=>0x76CE, # <CJK>
    0x303C=>0x51F9, # <CJK>
    0x303D=>0x6556, # <CJK>
    0x303E=>0x71AC, # <CJK>
    0x303F=>0x7FF1, # <CJK>
    0x3040=>0x8884, # <CJK>
    0x3041=>0x50B2, # <CJK>
    0x3042=>0x5965, # <CJK>
    0x3043=>0x61CA, # <CJK>
    0x3044=>0x6FB3, # <CJK>
    0x3045=>0x82AD, # <CJK>
    0x3046=>0x634C, # <CJK>
    0x3047=>0x6252, # <CJK>
    0x3048=>0x53ED, # <CJK>
    0x3049=>0x5427, # <CJK>
    0x304A=>0x7B06, # <CJK>
    0x304B=>0x516B, # <CJK>
    0x304C=>0x75A4, # <CJK>
    0x304D=>0x5DF4, # <CJK>
    0x304E=>0x62D4, # <CJK>
    0x304F=>0x8DCB, # <CJK>
    0x3050=>0x9776, # <CJK>
    0x3051=>0x628A, # <CJK>
    0x3052=>0x8019, # <CJK>
    0x3053=>0x575D, # <CJK>
    0x3054=>0x9738, # <CJK>
    0x3055=>0x7F62, # <CJK>
    0x3056=>0x7238, # <CJK>
    0x3057=>0x767D, # <CJK>
    0x3058=>0x67CF, # <CJK>
    0x3059=>0x767E, # <CJK>
    0x305A=>0x6446, # <CJK>
    0x305B=>0x4F70, # <CJK>
    0x305C=>0x8D25, # <CJK>
    0x305D=>0x62DC, # <CJK>
    0x305E=>0x7A17, # <CJK>
    0x305F=>0x6591, # <CJK>
    0x3060=>0x73ED, # <CJK>
    0x3061=>0x642C, # <CJK>
    0x3062=>0x6273, # <CJK>
    0x3063=>0x822C, # <CJK>
    0x3064=>0x9881, # <CJK>
    0x3065=>0x677F, # <CJK>
    0x3066=>0x7248, # <CJK>
    0x3067=>0x626E, # <CJK>
    0x3068=>0x62CC, # <CJK>
    0x3069=>0x4F34, # <CJK>
    0x306A=>0x74E3, # <CJK>
    0x306B=>0x534A, # <CJK>
    0x306C=>0x529E, # <CJK>
    0x306D=>0x7ECA, # <CJK>
    0x306E=>0x90A6, # <CJK>
    0x306F=>0x5E2E, # <CJK>
    0x3070=>0x6886, # <CJK>
    0x3071=>0x699C, # <CJK>
    0x3072=>0x8180, # <CJK>
    0x3073=>0x7ED1, # <CJK>
    0x3074=>0x68D2, # <CJK>
    0x3075=>0x78C5, # <CJK>
    0x3076=>0x868C, # <CJK>
    0x3077=>0x9551, # <CJK>
    0x3078=>0x508D, # <CJK>
    0x3079=>0x8C24, # <CJK>
    0x307A=>0x82DE, # <CJK>
    0x307B=>0x80DE, # <CJK>
    0x307C=>0x5305, # <CJK>
    0x307D=>0x8912, # <CJK>
    0x307E=>0x5265, # <CJK>
    0x3121=>0x8584, # <CJK>
    0x3122=>0x96F9, # <CJK>
    0x3123=>0x4FDD, # <CJK>
    0x3124=>0x5821, # <CJK>
    0x3125=>0x9971, # <CJK>
    0x3126=>0x5B9D, # <CJK>
    0x3127=>0x62B1, # <CJK>
    0x3128=>0x62A5, # <CJK>
    0x3129=>0x66B4, # <CJK>
    0x312A=>0x8C79, # <CJK>
    0x312B=>0x9C8D, # <CJK>
    0x312C=>0x7206, # <CJK>
    0x312D=>0x676F, # <CJK>
    0x312E=>0x7891, # <CJK>
    0x312F=>0x60B2, # <CJK>
    0x3130=>0x5351, # <CJK>
    0x3131=>0x5317, # <CJK>
    0x3132=>0x8F88, # <CJK>
    0x3133=>0x80CC, # <CJK>
    0x3134=>0x8D1D, # <CJK>
    0x3135=>0x94A1, # <CJK>
    0x3136=>0x500D, # <CJK>
    0x3137=>0x72C8, # <CJK>
    0x3138=>0x5907, # <CJK>
    0x3139=>0x60EB, # <CJK>
    0x313A=>0x7119, # <CJK>
    0x313B=>0x88AB, # <CJK>
    0x313C=>0x5954, # <CJK>
    0x313D=>0x82EF, # <CJK>
    0x313E=>0x672C, # <CJK>
    0x313F=>0x7B28, # <CJK>
    0x3140=>0x5D29, # <CJK>
    0x3141=>0x7EF7, # <CJK>
    0x3142=>0x752D, # <CJK>
    0x3143=>0x6CF5, # <CJK>
    0x3144=>0x8E66, # <CJK>
    0x3145=>0x8FF8, # <CJK>
    0x3146=>0x903C, # <CJK>
    0x3147=>0x9F3B, # <CJK>
    0x3148=>0x6BD4, # <CJK>
    0x3149=>0x9119, # <CJK>
    0x314A=>0x7B14, # <CJK>
    0x314B=>0x5F7C, # <CJK>
    0x314C=>0x78A7, # <CJK>
    0x314D=>0x84D6, # <CJK>
    0x314E=>0x853D, # <CJK>
    0x314F=>0x6BD5, # <CJK>
    0x3150=>0x6BD9, # <CJK>
    0x3151=>0x6BD6, # <CJK>
    0x3152=>0x5E01, # <CJK>
    0x3153=>0x5E87, # <CJK>
    0x3154=>0x75F9, # <CJK>
    0x3155=>0x95ED, # <CJK>
    0x3156=>0x655D, # <CJK>
    0x3157=>0x5F0A, # <CJK>
    0x3158=>0x5FC5, # <CJK>
    0x3159=>0x8F9F, # <CJK>
    0x315A=>0x58C1, # <CJK>
    0x315B=>0x81C2, # <CJK>
    0x315C=>0x907F, # <CJK>
    0x315D=>0x965B, # <CJK>
    0x315E=>0x97AD, # <CJK>
    0x315F=>0x8FB9, # <CJK>
    0x3160=>0x7F16, # <CJK>
    0x3161=>0x8D2C, # <CJK>
    0x3162=>0x6241, # <CJK>
    0x3163=>0x4FBF, # <CJK>
    0x3164=>0x53D8, # <CJK>
    0x3165=>0x535E, # <CJK>
    0x3166=>0x8FA8, # <CJK>
    0x3167=>0x8FA9, # <CJK>
    0x3168=>0x8FAB, # <CJK>
    0x3169=>0x904D, # <CJK>
    0x316A=>0x6807, # <CJK>
    // etc., continuing to the end
    I found the file on another site because the link the programmer provided was broken. Does the gb2312.txt file look correct?

    If these are not the right methods, does anyone have an idea how to solve my problem?
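One way to sanity-check a few entries of the table without trusting the downloaded file is to compare them against what mbstring produces. This is a minimal sketch, assuming the mbstring extension is loaded and PHP 7.2 or later (for `mb_ord`); `gb2312CodeToUnicode` is a hypothetical helper name, not part of the original script. It relies on the fact that EUC-CN, the usual byte encoding of GB2312, sets the high bit of each byte of the two-byte code used as the array key.

```php
<?php
// Hypothetical helper: convert a GB2312 row/cell code (the array keys in the
// table, e.g. 0x3021) into the Unicode code point that mbstring assigns to it.
function gb2312CodeToUnicode(int $code): int
{
    // EUC-CN is the GB2312 code with 0x80 added to each of its two bytes.
    $bytes = chr(($code >> 8) | 0x80) . chr(($code & 0xFF) | 0x80);

    // mb_convert_encoding(string, to_encoding, from_encoding)
    $utf8 = mb_convert_encoding($bytes, 'UTF-8', 'EUC-CN');

    return mb_ord($utf8, 'UTF-8');
}

// Compare a few table entries with the values listed in the file.
foreach ([0x3021 => 0x554A, 0x2621 => 0x0391, 0x2721 => 0x0410] as $gb => $expected) {
    printf("0x%04X => 0x%04X (%s)\n", $gb, gb2312CodeToUnicode($gb),
        gb2312CodeToUnicode($gb) === $expected ? 'matches' : 'differs');
}
```

If the spot checks match, the built-in converter can probably replace the hand-made table altogether.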

    Regards,

    Christophe

  2. #2
    Moderator
    sabotage
    Male
    Joined: July 2005
    Posts: 29,208
    Unicode support was planned for PHP 6.
    But what are you trying to do with these functions?
    Don't forget to consult the PHP FAQs and the PHP courses and tutorials

  3. #3
    Confirmed member
    My site is actually a forum with users. So in my database I have a table for the users. Their username can be written in Chinese, and so can the messages in the forum.

    My problem arises when a user logs in, or for any query in which I pass a variable whose content is Chinese.

    When the user is created, their Chinese name is stored in the database as Unicode numeric character references (&#23015; for one character, for example). But when the user tries to log in, the POST variable contains the Chinese characters themselves (after digging further, it seems that in the POST data a Chinese character appears in a form such as %E7%90%32). The query sent to the database uses those characters, and since they differ from what is stored, the login fails.

    So I am trying to convert these characters to the format stored in the database, which appears to be Unicode.
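The mismatch described above can be reproduced in a few lines. This is only a sketch, assuming the mbstring extension is loaded and that the page and form are served as UTF-8; `referencesToUtf8` and `utf8ToReferences` are hypothetical helper names. It shows that the stored form (&#23015;) and the posted form (raw UTF-8 bytes) are the same character written two ways, and that either one can be converted to the other before comparing.

```php
<?php
// Decode numeric character references (the form stored in the database)
// into a plain UTF-8 string.
function referencesToUtf8(string $stored): string
{
    return html_entity_decode($stored, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// Encode every non-ASCII character of a UTF-8 string as a decimal
// numeric character reference.
function utf8ToReferences(string $utf8): string
{
    // convmap: [first code point, last code point, offset, mask]
    $convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($utf8, $convmap, 'UTF-8');
}

// U+59E7 (decimal 23015) as a browser would percent-encode it in a UTF-8 form.
$posted = rawurldecode('%E5%A7%A7');

var_dump(utf8ToReferences($posted) === '&#23015;');
var_dump(referencesToUtf8('&#23015;') === $posted);
```

Converting at comparison time works, but storing the UTF-8 text directly avoids the conversion entirely.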

    Christophe

  4. #4
    Confirmed member
    I have just realized that this is not GB2312 (at least I think so): the Chinese characters I am getting are UTF-8. So I think the problem changes completely. I am a bit lost with character encodings.

  5. #5
    Expert member
    Thes32
    Male
    PHP, .Net, T-SQL developer
    Joined: December 2006
    Posts: 2,379
    Hi,

    Have you tried using the mb_* functions and iconv?
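For reference, a minimal sketch of what those two families look like when converting GB2312 bytes to UTF-8. It assumes the mbstring extension is loaded; the iconv call is guarded because the encoding names it accepts depend on the underlying iconv library. The sample bytes are the EUC-CN form of the table entry 0x3021.

```php
<?php
// Two bytes as they would arrive from a GB2312 (EUC-CN) source.
$gb = "\xB0\xA1";

// mbstring: mb_convert_encoding(string, to_encoding, from_encoding)
$viaMb = mb_convert_encoding($gb, 'UTF-8', 'EUC-CN');

// iconv: iconv(from_encoding, to_encoding, string)
// Assumption: the local iconv library knows the name "GB2312".
$viaIconv = function_exists('iconv') ? iconv('GB2312', 'UTF-8', $gb) : null;

// mb_check_encoding() helps decide whether a conversion is needed at all.
var_dump(mb_check_encoding($gb, 'UTF-8'));    // the raw GB2312 bytes
var_dump(mb_check_encoding($viaMb, 'UTF-8')); // the converted string
echo bin2hex($viaMb), "\n";
```

Note the different argument order: mbstring takes the target encoding before the source, iconv the other way round.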

  6. #6
    Moderator
    sabotage
    You can do everything in UTF-8.
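A rough sketch of what "everything in UTF-8" could mean in practice, assuming MySQL accessed through PDO and the mbstring extension. The helper names, the `forum` database, and the `users`/`username` table and column are hypothetical. The idea is that the page, the database connection, and the stored columns all use the same encoding, so the login query compares like with like.

```php
<?php
// Hypothetical helper: build a PDO MySQL DSN whose connection charset is
// UTF-8 (utf8mb4 is MySQL's name for full UTF-8).
function buildUtf8Dsn(string $host, string $dbName): string
{
    return "mysql:host={$host};dbname={$dbName};charset=utf8mb4";
}

// Hypothetical helper: reject input that is not valid UTF-8 before it
// reaches a query.
function isValidUtf8(string $input): bool
{
    return mb_check_encoding($input, 'UTF-8');
}

// Typical use (commented out because it needs a real database):
// header('Content-Type: text/html; charset=utf-8');
// $pdo  = new PDO(buildUtf8Dsn('localhost', 'forum'), $user, $password);
// $stmt = $pdo->prepare('SELECT id FROM users WHERE username = ?');
// if (isValidUtf8($_POST['username'])) {
//     $stmt->execute([$_POST['username']]);
// }
```

With this setup the columns would use a UTF-8 collation instead of gb2312_chinese_ci, and no conversion function is needed in the script.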

This thread is resolved.
