Guide: PHP utf8_decode question mark

I have a simple PHP script that searches a MySQL database and outputs the result to the user. I used to use ISO-8859-1 as my charset, but was advised to switch to UTF-8, and I am having trouble moving from the old charset to the new one.

To clarify some things, I have:

Created a database and table encoded in UTF-8 with collation utf8_unicode_ci.
Encoded my PHP-file in UTF-8.
Set meta charset to UTF-8.
Set all text mime-types to UTF-8 through create-mime.assign.pl in Lighty (Lighttpd).

Now, the problem arises when I retrieve data from the database with characters like ö, ü etc. If I just do echo "ö"; without retrieving it from the database, it works fine. I guess there must be something wrong with the database then?

I've tried the following, and they've solved my problem:

Set meta charset to ISO-8859-1 (which, for some strange reason, works, but breaks the echo'd "ö").
Wrapped the output in utf8_decode().
Called mysql_set_charset('utf8'); after mysql_select_db().

I know that I've found multiple solutions, but I just don't know why it won't work without them. And is it bad practice to use utf8_decode() on output, or the mysql_set_charset() function?
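One common cause of this symptom is a connection charset left at latin1, in which case the server converts UTF-8 column data to ISO-8859-1 bytes before PHP receives it. The sketch below only illustrates that byte-level difference; it assumes the mbstring extension is loaded, uses mb_convert_encoding() as a stand-in for the server-side conversion, and is_utf8() is a hypothetical helper name.

```php
<?php
// Assumption: mbstring is loaded; mb_convert_encoding() stands in for what
// a latin1 connection does to UTF-8 column data before PHP receives it.

// "ö" as typed in a UTF-8 source file: two bytes, C3 B6.
$fromSource = "\xC3\xB6";

// The same character as a latin1 connection would deliver it: one byte, F6.
$fromLatin1Connection = mb_convert_encoding($fromSource, 'ISO-8859-1', 'UTF-8');

// Hypothetical helper: reports whether a byte string is valid UTF-8.
function is_utf8(string $bytes): bool
{
    return mb_check_encoding($bytes, 'UTF-8');
}

echo bin2hex($fromSource), ' ', var_export(is_utf8($fromSource), true), "\n";
echo bin2hex($fromLatin1Connection), ' ', var_export(is_utf8($fromLatin1Connection), true), "\n";
```

A lone F6 byte is not valid UTF-8, so a page declared as UTF-8 cannot render it, while the echo'd literal from the UTF-8 source file renders fine. Declaring the connection charset as UTF-8 keeps the bytes at C3 B6 end to end.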

(PHP 4, PHP 5, PHP 7, PHP 8)

utf8_decode — Converts a string from UTF-8 to ISO-8859-1, replacing invalid or unrepresentable characters

Warning

This function has been DEPRECATED as of PHP 8.2.0. Relying on this function is highly discouraged.

Description

utf8_decode(string $string): string

Note:
Many web pages marked as using the ISO-8859-1 character encoding actually use the similar Windows-1252 encoding, and web browsers will interpret ISO-8859-1 web pages as Windows-1252. Windows-1252 features additional printable characters, such as the Euro sign (€) and curly quotes (“ ”), instead of certain ISO-8859-1 control characters. This function will not convert such Windows-1252 characters correctly. Use a different function if Windows-1252 conversion is required.

Parameters

string

A UTF-8 encoded string.

Return Values

Returns the ISO-8859-1 translation of string.

Changelog

Version	Description

8.2.0	This function has been deprecated.
7.2.0	This function has been moved from the XML extension to the core of PHP. In previous versions, it was only available if the XML extension was installed.

Examples

Example #1 Basic examples
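A minimal sketch of the behaviour described above, written with mb_convert_encoding() as a non-deprecated stand-in for utf8_decode(). It assumes the mbstring extension is loaded; to_latin1() is a hypothetical wrapper name, not part of PHP.

```php
<?php
// Assumption: the mbstring extension is loaded. to_latin1() is a hypothetical
// wrapper name, not part of PHP.
function to_latin1(string $utf8): string
{
    // Characters with no ISO-8859-1 equivalent come back as '?' under
    // mbstring's default substitute character.
    return mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
}

// "Zoë": the two-byte UTF-8 sequence C3 AB becomes the single byte EB.
echo bin2hex(to_latin1("Zo\xC3\xAB")), "\n";   // 5a6feb

// "Ł" (U+0141, bytes C5 81) does not exist in ISO-8859-1.
echo to_latin1("\xC5\x81"), "\n";              // ?
```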

The iconv() C library fails if it's told a string is UTF-8 and it isn't; the PHP one doesn't, it just returns the conversion up to the point of failure, so you have to compare the result to the input to find out if the conversion succeeded.

deceze at gmail dot com

11 years ago

Please note that utf8_decode simply converts a string encoded in UTF-8 to ISO-8859-1. A more appropriate name for it would be utf8_to_iso88591. If your text is already encoded in ISO-8859-1, you do not need this function. If you don't want to use ISO-8859-1, you do not need this function.

Note that UTF-8 can represent many more characters than ISO-8859-1. Trying to convert a UTF-8 string that contains characters that can't be represented in ISO-8859-1 to ISO-8859-1 will garble your text and/or cause characters to go missing. Trying to convert text that is not encoded in UTF-8 using this function will most likely garble the text.

If you need to convert any text from any encoding to any other encoding, look at iconv() instead.

info at vanylla dot it

13 years ago

IMPORTANT: when converting UTF-8 data that contains the EURO sign, DON'T USE the utf8_decode function.

utf8_decode converts the data into the ISO-8859-1 charset. But the ISO-8859-1 charset does not contain the EURO sign, therefore the EURO sign will be converted into a question mark character '?'

In order to properly convert UTF-8 data with the EURO sign you must use:

iconv("UTF-8", "CP1252", $data)
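The difference between the two target encodings can be seen at the byte level. This sketch assumes the mbstring extension is loaded and uses mb_convert_encoding() rather than the iconv() call from the note; to_windows_1252() is a hypothetical helper name.

```php
<?php
// Assumption: mbstring is loaded. The helper name is hypothetical.
function to_windows_1252(string $utf8): string
{
    return mb_convert_encoding($utf8, 'Windows-1252', 'UTF-8');
}

$euro = "\xE2\x82\xAC"; // "€" in UTF-8

// ISO-8859-1 has no euro sign, so the character is replaced.
echo mb_convert_encoding($euro, 'ISO-8859-1', 'UTF-8'), "\n"; // ?

// Windows-1252 places the euro sign at byte 0x80.
echo bin2hex(to_windows_1252($euro)), "\n";                    // 80
```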

gabriel arobase gabsoftware dot com

11 years ago

If you want to retrieve some UTF-8 data from your database, you don't need utf8_decode().

Simply run the following query before any SELECT:

$result = mysql_query("SET NAMES utf8");
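The mysql_* functions were removed in PHP 7.0, so on current PHP the same idea is usually expressed when the connection is opened. A rough sketch with PDO follows; mysql_dsn() is a hypothetical helper, and the host and database names are placeholders.

```php
<?php
// Hypothetical helper: builds a PDO MySQL DSN that declares the connection
// charset up front. Host and database names below are placeholders.
function mysql_dsn(string $host, string $dbname, string $charset = 'utf8mb4'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbname, $charset);
}

$dsn = mysql_dsn('localhost', 'example_db');
echo $dsn, "\n"; // mysql:host=localhost;dbname=example_db;charset=utf8mb4

// With real credentials the connection would be opened like this:
// $pdo = new PDO($dsn, $user, $password);
// mysqli offers the same through $mysqli->set_charset('utf8mb4').
```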

christoffer

9 years ago

The preferred way to use this on an array would be with the built-in PHP function "array_map()", for example: $array = array_map("utf8_decode", $array);
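The same array_map() pattern works with a non-deprecated converter. A small sketch, assuming the mbstring extension is loaded; latin1_array() is a hypothetical name.

```php
<?php
// Assumption: mbstring is loaded; the function name is hypothetical.
function latin1_array(array $strings): array
{
    return array_map(
        static fn (string $s): string => mb_convert_encoding($s, 'ISO-8859-1', 'UTF-8'),
        $strings
    );
}

// Keys are preserved because array_map() receives a single array.
$converted = latin1_array(['name' => "Zo\xC3\xAB", 'city' => 'Oslo']);
echo bin2hex($converted['name']), "\n"; // 5a6feb
```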

lukasz dot mlodzik at gmail dot com

14 years ago

Update to MARC13's utf2iso() function. I'm using it to handle AJAX POST calls. Despite using http.setRequestHeader('Content-Type', 'application/x-www-form-urlencoded'; charset='utf-8'); it still encodes Polish letters using UTF-16.

This is only for Polish letters:

Everything goes smoothly, but it doesn't change '%u00D3','Ó' and '%u00F3','ó'. I have no idea what to do about that.

Remember! The file must be saved in UTF-8 encoding.

Aleksandr

5 years ago

In addition to the note by yannikh at gmeil dot com, here is another way to decode strings with non-Latin chars from a Unix console, such as

C=RU, L=\xD0\x9C\xD0\xBE\xD1\x81\xD0\xBA\xD0\xB2\xD0\xB0,
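One possible way to do this, offered as a sketch rather than the note's own listing: replace each literal \xNN escape with the byte it names. decode_hex_escapes() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: turns literal "\xNN" escape text into the raw byte NN.
function decode_hex_escapes(string $text): string
{
    return preg_replace_callback(
        '/\\\\x([0-9A-Fa-f]{2})/',
        static fn (array $m): string => chr(hexdec($m[1])),
        $text
    );
}

// Single quotes keep the backslashes literal, as they arrive from the console.
$raw = 'C=RU, L=\xD0\x9C\xD0\xBE\xD1\x81\xD0\xBA\xD0\xB2\xD0\xB0,';
echo decode_hex_escapes($raw), "\n";
```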

The code above will output: C=RU, L=Москва,

sashott at gmail dot com

7 years ago

utf8_decode alone was not enough for me when fetching page content from another site. The problem appears with alphabets other than standard Latin. For example, some chars (corresponding to HTML codes „ , and others) are converted to "?" or "xA0" (hex value). You need to do some conversion before running utf8_decode, and you cannot simply replace them, because they can be part of the 2-byte code for a char (UTF-8 uses 2 bytes). The following is for the Cyrillic alphabet, but other alphabets should be very similar.

function convertMethod($text){
    // Problem is that utf8_decode converts HTML chars for „ and others to ? or to \xA0.
    // And you cannot replace them, because they are in some char bytes and you would break Cyrillic (or other alphabet) chars.
    $problem_enc=array(
        'euro', 'sbquo', 'bdquo', 'hellip', 'dagger', 'Dagger', 'permil',
        'lsaquo', 'lsquo', 'rsquo', 'ldquo', 'rdquo', 'bull', 'ndash',
        'mdash', 'trade', 'rsquo', 'brvbar', 'copy', 'laquo', 'reg',
        'plusmn', 'micro', 'para', 'middot', 'raquo', 'nbsp'
    );
    $text=mb_convert_encoding($text,'HTML-ENTITIES','UTF-8');
    $text=preg_replace('#(?


kode68
6 years ago

Updated answer from okx dot oliver dot koenig at gmail dot com for PHP 5.6, since the /e modifier is deprecated
// This finally helped me to do the job, thanks to Blackbit, had to modify deprecated ereg:
// original comment: "Squirrelmail contains a nice function in the sources to convert unicode to entities:"
function charset_decode_utf_8($string)
    {
        /* Only do the slow convert if there are 8-bit characters */
        if ( !preg_match("/[\200-\237]/", $string) && !preg_match("/[\241-\377]/", $string) )
               return $string;
        // Closures are used here because create_function() was removed in PHP 8.0.
        // decode three byte unicode characters
          $string = preg_replace_callback("/([\340-\357])([\200-\277])([\200-\277])/",
                    function ($matches) {
                        return '&#'.((ord($matches[1])-224)*4096+(ord($matches[2])-128)*64+(ord($matches[3])-128)).';';
                    },
                    $string);
        // decode two byte unicode characters
          $string = preg_replace_callback("/([\300-\337])([\200-\277])/",
                    function ($matches) {
                        return '&#'.((ord($matches[1])-192)*64+(ord($matches[2])-128)).';';
                    },
                    $string);

        return $string;
    }
Enjoy

visus at portsonline dot net
15 years ago

The following code helped me with mixed (UTF8+ISO-8859-1(x)) encodings. In this case, I have template files made and maintained by designers who do not care about encoding, and MySQL data in utf8_binary_ci encoded tables.

php-net at ---NOSPAM---lc dot yi dot org
16 years ago

I've just created this code snippet to improve the user-customizable emails sent by one of my websites.
The goal was to use UTF-8 (Unicode) so that non-English users have all the Unicode benefits, BUT also make life seamless for English users (or specifically, English MS-Outlook users). The niggle: Outlook prior to 2003 (?) does not properly detect Unicode emails. When "smart quotes" from MS Word were pasted into a rich text area and saved in Unicode, then sent by email to an Outlook user, more often than not these characters were wrongly rendered as "greek".
So, the following code snippet replaces a few strategic characters with HTML entities which Outlook XP (and possibly earlier) will render as expected. (Code based on bits of code from previous posts on this and the htmlentities page.)

rasmus at flajm dot se
17 years ago

If you don't have the multibyte extension installed, here's a function to decode UTF-16 encoded strings. It supports both BOM-less and BOM'ed strings (big- and little-endian byte order).
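A sketch of what such a decoder could look like, not the note's own code. utf16_to_utf8() and its $defaultBigEndian parameter are hypothetical names; the input is assumed to have an even number of bytes, and unpaired surrogates are not rejected.

```php
<?php
// Hypothetical helper: converts UTF-16 to UTF-8 using only core functions.
// Assumes an even byte count; unpaired surrogates are passed through as-is.
function utf16_to_utf8(string $str, bool $defaultBigEndian = true): string
{
    $bigEndian = $defaultBigEndian;
    // A byte-order mark, if present, decides the byte order and is dropped.
    if (strncmp($str, "\xFE\xFF", 2) === 0) {
        $bigEndian = true;
        $str = substr($str, 2);
    } elseif (strncmp($str, "\xFF\xFE", 2) === 0) {
        $bigEndian = false;
        $str = substr($str, 2);
    }
    if ($str === '') {
        return '';
    }

    // 'n' = unsigned 16-bit big-endian, 'v' = unsigned 16-bit little-endian.
    $units = array_values(unpack($bigEndian ? 'n*' : 'v*', $str));
    $count = count($units);
    $out = '';

    for ($i = 0; $i < $count; $i++) {
        $cp = $units[$i];
        // Combine a high/low surrogate pair into one code point above U+FFFF.
        if ($cp >= 0xD800 && $cp <= 0xDBFF && $i + 1 < $count
            && $units[$i + 1] >= 0xDC00 && $units[$i + 1] <= 0xDFFF) {
            $cp = 0x10000 + (($cp - 0xD800) << 10) + ($units[++$i] - 0xDC00);
        }
        // Emit the code point as 1 to 4 UTF-8 bytes.
        if ($cp < 0x80) {
            $out .= chr($cp);
        } elseif ($cp < 0x800) {
            $out .= chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
        } elseif ($cp < 0x10000) {
            $out .= chr(0xE0 | ($cp >> 12))
                  . chr(0x80 | (($cp >> 6) & 0x3F))
                  . chr(0x80 | ($cp & 0x3F));
        } else {
            $out .= chr(0xF0 | ($cp >> 18))
                  . chr(0x80 | (($cp >> 12) & 0x3F))
                  . chr(0x80 | (($cp >> 6) & 0x3F))
                  . chr(0x80 | ($cp & 0x3F));
        }
    }
    return $out;
}
```

For example, utf16_to_utf8("\xFF\xFE\x5A\x00\xEB\x00") reads the little-endian BOM and yields the UTF-8 bytes for "Zë". Where mbstring is available, mb_convert_encoding($str, 'UTF-8', 'UTF-16') covers the same ground.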

thierry.bo # netcourrier point com
17 years ago

In response to fhoech (22-Sep-2005 11:55), I just ran a simultaneous test on the file UTF-8-test.txt using your regexp, the 'j dot dittmer' (20-Sep-2005 06:30) regexp (message #56962), the `php-note-2005` (17-Feb-2005 08:57) regexp from his message on the `mb-detect-encoding` page (//us3.php.net/manual/en/function.mb-detect-encoding.php#50087), which uses a regexp from the W3C (//w3.org/International/questions/qa-forms-utf-8.html), and the PHP mb_detect_encoding function.
Here is a summary of the results:
201 lines are valid UTF8 strings using phpnote regexp
203 lines are valid UTF8 strings using j.dittmer regexp
200 lines are valid UTF8 strings using fhoech regexp
239 lines are valid UTF8 strings using mb_detect_encoding
Here are the lines with differences (left to right: phpnote, j.dittmer and fhoech):
Line #70 : NOT UTF8|IS UTF8!|IS UTF8! :2.1.1 1 byte (U-00000000): ""
Line #79 : NOT UTF8|IS UTF8!|IS UTF8! :2.2.1 1 byte (U-0000007F): ""
Line #81 : IS UTF8!|IS UTF8!|NOT UTF8 :2.2.3 3 bytes (U-0000FFFF): "" |
Line #267 : IS UTF8!|IS UTF8!|NOT UTF8 :5.3.1 U+FFFE = ef bf be = "" |
Line #268 : IS UTF8!|IS UTF8!|NOT UTF8 :5.3.2 U+FFFF = ef bf bf = "" |
Interestingly, you said that your regexp corrected the j.dittmer regexp that failed on section 5.3, but in my test I get the opposite result?!
I ran this test on Windows XP with PHP 4.3.11dev. Maybe these differences come from the operating system or the PHP version.
For mb_detect_encoding I used the command:
mb_detect_encoding($line, 'UTF-8, ISO-8859-1, ASCII');

punchivan at gmail dot com
14 years ago

EY! The bug is not in the function 'utf8_decode'. The bug is in the function 'mb_detect_encoding'. If you pass a word with a special char at the end, like 'accentué', you get a wrong result (UTF-8), but if you put another char at the end, like 'accentuée', you get it right. So you should always append an ISO-8859-1 character to your string for this check. My advice is to use a blank space.
I've tried it and it works!
function ISO_convert($array)
{
    $array_temp = array();
    foreach($array as $name => $value)
    {
        if(is_array($value))
          $array_temp[(mb_detect_encoding($name." ",'UTF-8,ISO-8859-1') == 'UTF-8' ? utf8_decode($name) : $name )] = ISO_convert($value);
        else
          $array_temp[(mb_detect_encoding($name." ",'UTF-8,ISO-8859-1') == 'UTF-8' ? utf8_decode($name) : $name )] = (mb_detect_encoding($value." ",'UTF-8,ISO-8859-1') == 'UTF-8' ? utf8_decode($value) : $value );
    }
    return $array_temp;
}


luka8088 at gmail dot com
15 years ago

simple UTF-8 to HTML conversion:
function utf8_to_html ($data)
    {
    // preg_replace_callback() replaces the /e modifier, which was removed in PHP 7.0.
    return preg_replace_callback("/[\\xC0-\\xF7]{1,1}[\\x80-\\xBF]+/", function ($m) { return _utf8_to_html($m[0]); }, $data);
    }
function _utf8_to_html ($data)
    {
    $ret = 0;
    // $data[0] replaces the $data{0} offset syntax, which was removed in PHP 8.0.
    foreach((str_split(strrev(chr((ord($data[0]) % 252 % 248 % 240 % 224 % 192) + 128) . substr($data, 1)))) as $k => $v)
        $ret += (ord($v) % 128) * pow(64, $k);
    return "&#$ret;";
    }
Example:
echo utf8_to_html("a b č ć ž こ に ち わ ()[]{}!#$?*");
Output:
a b &#269; &#263; &#382; &#12371; &#12395; &#12385; &#12431; ()[]{}!#$?*

j dot dittmer at portrix dot net
17 years ago

The regex in the last comment has some typos. This is a
syntactically valid one; I don't know if it's correct, though.
You have to concatenate the expression into one long line.
^(
[\x00-\x7f]|
[\xc2-\xdf][\x80-\xbf]|
[\xe0][\xa0-\xbf][\x80-\xbf]|
[\xe1-\xec][\x80-\xbf]{2}|
[\xed][\x80-\x9f][\x80-\xbf]|
[\xee-\xef][\x80-\xbf]{2}|
[\xf0][\x90-\xbf][\x80-\xbf]{2}|
[\xf1-\xf3][\x80-\xbf]{3}|
[\xf4][\x80-\x8f][\x80-\xbf]{2}
)*$
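As a sketch of how the expression might be used from PHP, the function below concatenates the alternatives into one pattern and anchors it with \z so a trailing newline is not skipped. looks_like_utf8() is a hypothetical name; the byte ranges are taken from the note as given.

```php
<?php
// Hypothetical wrapper around the expression from the note, concatenated into
// one pattern. Without the /u modifier PCRE matches byte by byte, which is
// what a validity check needs.
function looks_like_utf8(string $bytes): bool
{
    $pattern = '/^(?:'
        . '[\x00-\x7f]|'
        . '[\xc2-\xdf][\x80-\xbf]|'
        . '\xe0[\xa0-\xbf][\x80-\xbf]|'
        . '[\xe1-\xec][\x80-\xbf]{2}|'
        . '\xed[\x80-\x9f][\x80-\xbf]|'
        . '[\xee-\xef][\x80-\xbf]{2}|'
        . '\xf0[\x90-\xbf][\x80-\xbf]{2}|'
        . '[\xf1-\xf3][\x80-\xbf]{3}|'
        . '\xf4[\x80-\x8f][\x80-\xbf]{2}'
        . ')*\z/';
    // preg_match() returns false when an input exceeds PCRE limits;
    // that case is treated as "not valid" here.
    return preg_match($pattern, $bytes) === 1;
}
```

Where the mbstring extension is available, mb_check_encoding($bytes, 'UTF-8') performs the same kind of check without a regular expression.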

haugas at gmail dot com
14 years ago

If you don't know exactly how many times your string is encoded, you can use this function:
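One way to approach this, as a heuristic sketch rather than the note's own function: peel off one layer at a time and stop as soon as a step would lose data or no longer leave valid UTF-8. It assumes mbstring is loaded and that each extra layer came from treating UTF-8 bytes as ISO-8859-1; undo_repeated_utf8() and $maxRounds are hypothetical names. Text that legitimately contains sequences such as "Ã¶" would be collapsed too.

```php
<?php
// Hypothetical helper; assumes mbstring is loaded and that each extra layer
// was added by re-encoding the bytes as if they were ISO-8859-1.
function undo_repeated_utf8(string $s, int $maxRounds = 5): string
{
    for ($i = 0; $i < $maxRounds; $i++) {
        if (!mb_check_encoding($s, 'UTF-8')) {
            break; // not UTF-8 at all, nothing to peel off
        }
        $decoded = mb_convert_encoding($s, 'ISO-8859-1', 'UTF-8');
        // Accept the step only if it is lossless, changes something, and
        // still leaves valid UTF-8 behind.
        $lossless = mb_convert_encoding($decoded, 'UTF-8', 'ISO-8859-1') === $s;
        if ($decoded === $s || !$lossless || !mb_check_encoding($decoded, 'UTF-8')) {
            break;
        }
        $s = $decoded;
    }
    return $s;
}
```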

Ajgor
15 years ago

small upgrade for Polish decoding:
// The replacement literals must be ISO-8859-2 bytes, so save this file as ISO-8859-2.
function utf82iso88592($text) {
 $text = str_replace("\xC4\x85", 'ą', $text);
 $text = str_replace("\xC4\x84", 'Ą', $text);
 $text = str_replace("\xC4\x87", 'ć', $text);
 $text = str_replace("\xC4\x86", 'Ć', $text);
 $text = str_replace("\xC4\x99", 'ę', $text);
 $text = str_replace("\xC4\x98", 'Ę', $text);
 $text = str_replace("\xC5\x82", 'ł', $text);
 $text = str_replace("\xC5\x81", 'Ł', $text);
 $text = str_replace("\xC3\xB3", 'ó', $text);
 $text = str_replace("\xC3\x93", 'Ó', $text);
 $text = str_replace("\xC5\x9B", 'ś', $text);
 $text = str_replace("\xC5\x9A", 'Ś', $text);
 $text = str_replace("\xC5\xBC", 'ż', $text);
 $text = str_replace("\xC5\xBB", 'Ż', $text);
 // C5 BA and C5 B9 are ź and Ź, not ż and Ż.
 $text = str_replace("\xC5\xBA", 'ź', $text);
 $text = str_replace("\xC5\xB9", 'Ź', $text);
 $text = str_replace("\xc5\x84", 'ń', $text);
 $text = str_replace("\xc5\x83", 'Ń', $text);
return $text;
} // utf82iso88592
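Where the mbstring extension is available, the whole table can be replaced by a single conversion call. A brief sketch; utf8_to_iso88592() is a hypothetical wrapper name.

```php
<?php
// Assumption: mbstring is loaded. The wrapper name is hypothetical.
function utf8_to_iso88592(string $text): string
{
    return mb_convert_encoding($text, 'ISO-8859-2', 'UTF-8');
}

// "łódź" in UTF-8 becomes the four ISO-8859-2 bytes B3 F3 64 BC.
echo bin2hex(utf8_to_iso88592("\xC5\x82\xC3\xB3d\xC5\xBA")), "\n"; // b3f364bc
```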

yannikh at gmeil dot com
16 years ago

I had to tackle a very interesting problem:
I wanted to replace all \xXX in a text by its letters. Unfortunately XX were ASCII and not utf8. I solved my problem that way:


				
					

                 