Guide to iconv in PHP

(PHP 4 >= 4.0.5, PHP 5, PHP 7, PHP 8)

iconv - Convert a string from one character encoding to another

Description

iconv(string $from_encoding, string $to_encoding, string $string): string|false

Parameters

from_encoding

The current encoding used to interpret string.

to_encoding

The desired encoding of the result.

If the string //TRANSLIT is appended to to_encoding, then transliteration is activated. This means that when a character can't be represented in the target charset, it may be approximated through one or several similarly looking characters. If the string //IGNORE is appended, characters that cannot be represented in the target charset are silently discarded. Otherwise, E_NOTICE is generated and the function will return false.

Caution

If and how //TRANSLIT works exactly depends on the system's iconv() implementation (cf. ICONV_IMPL). Some implementations are known to ignore //TRANSLIT, so the conversion is likely to fail for characters which are illegal for the to_encoding.
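
As a quick way to see which implementation is in play, the following sketch prints the ICONV_IMPL and ICONV_VERSION constants provided by the iconv extension. The values differ from system to system, so no particular output is implied.

```php
<?php
// Report which iconv implementation this PHP build is linked against.
// Values such as "glibc" or "libiconv" are common, but this varies by platform.
echo 'Implementation: ', ICONV_IMPL, PHP_EOL;
echo 'Version       : ', ICONV_VERSION, PHP_EOL;
```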

string

The string to be converted.

Return Values

Returns the converted string, or false on failure.

Examples

Example #1 iconv() example
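
The code for this example did not survive on this page. A sketch consistent with the output listed further down might look like the following. The ISO-8859-1 target encoding is an assumption, chosen because it has no Euro sign, and the file is assumed to be saved as UTF-8.

```php
<?php
$text = "This is the Euro symbol '€'.";  // assumes a UTF-8 source file

echo 'Original : ', $text, PHP_EOL;
echo 'TRANSLIT : ', iconv("UTF-8", "ISO-8859-1//TRANSLIT", $text), PHP_EOL;  // may approximate the symbol
echo 'IGNORE   : ', iconv("UTF-8", "ISO-8859-1//IGNORE", $text), PHP_EOL;    // may drop the symbol
echo 'Plain    : ', iconv("UTF-8", "ISO-8859-1", $text), PHP_EOL;            // typically fails with a notice
```

What each line actually prints depends on the installed iconv implementation.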

The above example will output something similar to:

Original : This is the Euro symbol '€'.
TRANSLIT : This is the Euro symbol 'EUR'.
IGNORE   : This is the Euro symbol ''.
Plain    :
Notice: iconv(): Detected an illegal character in input string in .\iconv-example.php on line 7

Notes

Note:

The character encodings and options available depend on the installed implementation of iconv. If the argument to from_encoding or to_encoding is not supported on the current system, false will be returned.
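
Because failure is signalled only through the return value, it can help to wrap the call and compare strictly against false. The sketch below is one possible wrapper; the function name convertEncoding and the charset name NO-SUCH-CHARSET are hypothetical and used only for illustration.

```php
<?php
// Hypothetical helper: turns iconv()'s false return into an exception.
function convertEncoding(string $text, string $from, string $to): string
{
    // @ silences the notice/warning; the failure is detected via the return value.
    $result = @iconv($from, $to, $text);
    if ($result === false) {
        throw new RuntimeException("iconv failed converting from $from to $to");
    }
    return $result;
}

echo convertEncoding('abc', 'UTF-8', 'ASCII'), PHP_EOL;

try {
    // NO-SUCH-CHARSET is a made-up name, assumed to be unsupported everywhere.
    convertEncoding('abc', 'UTF-8', 'NO-SUCH-CHARSET');
} catch (RuntimeException $e) {
    echo 'Error: ', $e->getMessage(), PHP_EOL;
}
```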

See Also

  • mb_convert_encoding() - Convert a string from one character encoding to another
  • UConverter::transcode() - Convert a string from one character encoding to another

orrd101 at gmail dot com

10 years ago

The "//ignore" option doesn't work with recent versions of the iconv library.  So if you're having trouble with that option, you aren't alone. 

That means you can't currently use this function to filter invalid characters.  Instead it silently fails and returns an empty string (or you'll get a notice, but only if you have E_NOTICE enabled).

This has been a known bug with a known solution since at least 2009, but no one seems willing to fix it (PHP must pass the -c option to iconv).  It's still broken as of the latest release, 5.4.3.

//bugs.php.net/bug.php?id=48147
//bugs.php.net/bug.php?id=52211
//bugs.php.net/bug.php?id=61484

[UPDATE 15-JUN-2012]
Here's a workaround...

  ini_set('mbstring.substitute_character', "none");
  $text = mb_convert_encoding($text, 'UTF-8', 'UTF-8');

That will strip invalid characters from UTF-8 strings (so that you can insert them into a database, etc.).  Instead of "none" you can also use the value 32 if you want it to insert spaces in place of the invalid characters.
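
To make the effect concrete, here is a small sketch that runs the same idea on a string containing one invalid byte. It assumes the mbstring extension is loaded and uses mb_substitute_character() rather than ini_set(); the sample input is made up for illustration.

```php
<?php
// Requires the mbstring extension.
$dirty = "abc\xFFdef";  // the byte 0xFF can never appear in valid UTF-8

// Drop invalid bytes instead of replacing them with a substitute character.
mb_substitute_character('none');
$clean = mb_convert_encoding($dirty, 'UTF-8', 'UTF-8');

var_dump($clean);
```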

Ritchie

15 years ago

Please note that iconv('UTF-8', 'ASCII//TRANSLIT', ...) doesn't work properly when the locale category LC_CTYPE is set to C or POSIX. You must choose another locale; otherwise all non-ASCII characters will be replaced with question marks. This is true at least with glibc 2.5.

Example:
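
The example from this note is missing on this page. A rough sketch of the comparison it describes could look like this; the sample text and the en_US.UTF-8 locale name are assumptions, and the results depend on the iconv implementation and the locales installed.

```php
<?php
$text = "Žluťoučký kůň";  // sample text; assumes a UTF-8 source file

// Transliterate under the C locale.
setlocale(LC_CTYPE, 'C');
$underC = iconv('UTF-8', 'ASCII//TRANSLIT', $text);

// Transliterate under a UTF-8 locale, if that locale is installed.
$utf8Locale = setlocale(LC_CTYPE, 'en_US.UTF-8');  // assumed locale name
$underUtf8 = $utf8Locale === false
    ? null
    : iconv('UTF-8', 'ASCII//TRANSLIT', $text);

echo 'C locale     : ', var_export($underC, true), PHP_EOL;
echo 'UTF-8 locale : ', var_export($underUtf8, true), PHP_EOL;
```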

daniel dot rhodes at warpasylum dot co dot uk

11 years ago

Interestingly, setting different target locales results in different, yet appropriate, transliterations. For example:
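
The original example is not preserved here. As an illustration only, the sketch below transliterates the same German sentence under two locales and prints whatever each one produces. The locale names and the sample sentence are assumptions, and locales that are not installed are skipped.

```php
<?php
$sentence = 'Weiß, Göbel und Götz';         // sample text; assumes a UTF-8 source file
$locales  = ['en_GB.UTF-8', 'de_DE.UTF-8']; // assumed names; check `locale -a`
$results  = [];

foreach ($locales as $locale) {
    if (setlocale(LC_CTYPE, $locale) === false) {
        echo $locale, ': not installed, skipped', PHP_EOL;
        continue;
    }
    // The transliteration may differ per locale, or fail, depending on the implementation.
    $results[$locale] = iconv('UTF-8', 'ASCII//TRANSLIT', $sentence);
    echo $locale, ': ', var_export($results[$locale], true), PHP_EOL;
}
```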

annuaireehtp at gmail dot com

12 years ago

To test different combinations of conversions between charsets (when you don't know the source charset or which destination charset is suitable), here is an example:
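
The note's code is missing on this page. A minimal sketch of the idea, assuming a $tab array of candidate charsets and loop variables $i and $j as the surrounding text suggests, could be written as follows. The sample string and the list of charsets are made up for illustration.

```php
<?php
$myString = "caf\xE9";  // hypothetical input of unknown encoding (ISO-8859-1 bytes here)
$tab = ['UTF-8', 'ASCII', 'Windows-1252', 'ISO-8859-15', 'ISO-8859-1'];
$results = [];

foreach ($tab as $i) {
    foreach ($tab as $j) {
        // @ hides notices from combinations that cannot convert this input.
        $converted = @iconv($i, $j, $myString);
        $results["$i$j"] = $converted;
        echo " $i$j ", $converted === false ? '(failed)' : $converted, PHP_EOL;
    }
}
```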



After displaying the results, use the $i$j combination that displays correctly.
NB: you can add other charsets to $tab to test other cases.

Daniel Klein

2 years ago

If you want to convert to a Unicode encoding without the byte order mark (BOM), add the endianness to the encoding, e.g. instead of "UTF-16", which will add a BOM to the start of the string, use "UTF-16BE", which will convert the string without adding a BOM.

i.e.
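
The snippet that followed is missing here. The sketch below shows the difference by dumping the resulting bytes as hex; the single-character input is just an illustration, and the exact bytes produced for plain "UTF-16" depend on the iconv implementation.

```php
<?php
$text = 'A';

$generic      = iconv('UTF-8', 'UTF-16', $text);    // byte order and BOM chosen by the implementation
$bigEndian    = iconv('UTF-8', 'UTF-16BE', $text);  // explicit big-endian, no BOM
$littleEndian = iconv('UTF-8', 'UTF-16LE', $text);  // explicit little-endian, no BOM

echo 'UTF-16   : ', bin2hex($generic), PHP_EOL;
echo 'UTF-16BE : ', bin2hex($bigEndian), PHP_EOL;    // 0041
echo 'UTF-16LE : ', bin2hex($littleEndian), PHP_EOL; // 4100
```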



... workaround suggested here and elsewhere will also break when encountering illegal characters, at least dropping a useful note ("htmlentities(): Invalid multibyte sequence in argument in...")

I have found a lot of hints, suggestions and alternative methods (it's scary and in my opinion no good sign how many ways PHP natively provides to convert the encoding of strings), but none of them really worked, except for this one:

zhawari at hotmail dot com

17 years ago

Here is how to convert UCS-2 numbers to UTF-8 numbers in hex:
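
The author's function is not preserved on this page. As a substitute, the sketch below performs the same conversion by letting iconv() do the work; this is not the original code, and the function name ucs2HexToUtf8Hex is hypothetical. It assumes the input is big-endian UCS-2 written as hex digits.

```php
<?php
// Hypothetical helper: hex-encoded UCS-2 (big-endian) in, hex-encoded UTF-8 out.
function ucs2HexToUtf8Hex(string $hex): string
{
    if ($hex === '' || strlen($hex) % 4 !== 0 || !ctype_xdigit($hex)) {
        throw new InvalidArgumentException('Expected groups of four hex digits');
    }
    $bytes = hex2bin($hex);                      // raw UCS-2BE bytes
    $utf8  = iconv('UCS-2BE', 'UTF-8', $bytes);  // re-encode as UTF-8
    if ($utf8 === false) {
        throw new RuntimeException('iconv could not convert the input');
    }
    return strtoupper(bin2hex($utf8));
}

echo ucs2HexToUtf8Hex('06450631062D'), PHP_EOL;
```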



Input:
06450631062D
Output:
D985D8B1D8AD

regards,
Ziyad

Leigh Morresi

14 years ago

If you are getting question marks in your iconv output when transliterating, be sure to call setlocale() with a locale your system supports.

Some PHP CMSs default the locale to 'C', which can be a problem.

Use the "locale" command to list the locales available on your system:

$ locale -a
C
en_AU.utf8
POSIX
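
On the PHP side, setlocale() also accepts an array of candidates and applies the first one the system supports. The short sketch below uses that form; the candidate names are assumptions and should be replaced with entries from your own `locale -a` output.

```php
<?php
// Candidate locale names are assumptions; use names reported by `locale -a`.
$candidates = ['en_AU.utf8', 'en_US.UTF-8', 'C.UTF-8'];

// setlocale() returns the locale it applied, or false if none was available.
$chosen = setlocale(LC_CTYPE, $candidates);

if ($chosen === false) {
    echo 'None of the candidate locales is installed.', PHP_EOL;
} else {
    echo 'Using locale: ', $chosen, PHP_EOL;
}
```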
