Guide: what is htmlentities()?

I have seen a lot of conflicting answers about this. Many people like to point out that PHP functions alone will not protect you from XSS.

Main contents

  • Definition and Usage
  • Parameter Values
  • Technical Details
  • More Examples
  • What does htmlspecialchars() return?
  • What's the difference between htmlentities() and htmlspecialchars()?
  • Does htmlspecialchars() prevent XSS?
  • What is the use of htmlentities() in PHP?

Exactly which XSS attacks can get through htmlspecialchars, and which can get through htmlentities?

I understand the difference between the functions, but not the different levels of XSS protection each one leaves you with. Could anyone explain?

asked Sep 2, 2010 at 1:30

htmlspecialchars() will NOT protect you against UTF-7 XSS exploits, which still plague Internet Explorer, even IE 9: http://securethoughts.com/2009/05/exploiting-ie8-utf-7-xss-vulnerability-using-local-redirection/

For instance:
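The following minimal sketch shows why escaping alone does not help here. It assumes an illustrative UTF-7 payload. In UTF-7, +ADw- decodes to < and +AD4- decodes to >. The payload therefore contains none of the characters htmlspecialchars() converts, and it passes through unchanged. A browser that guesses UTF-7 for the page could then decode it into a script tag.

```php
<?php
// Illustrative UTF-7 payload: "+ADw-" encodes "<" and "+AD4-" encodes ">".
$payload = '+ADw-script+AD4-alert(1)+ADw-/script+AD4-';

// The payload contains no &, ", ', < or >, so htmlspecialchars() has nothing to convert.
$escaped = htmlspecialchars($payload, ENT_QUOTES, 'UTF-8');

var_dump($escaped === $payload); // bool(true)
```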

You should always use htmlentities and only rarely use htmlspecialchars when sanitizing user input. Also, you should always strip tags first. And for really important and secure sites, you should NEVER trust strip_tags(). Use HTMLPurifier for PHP.
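A minimal sketch of the order this answer recommends: strip tags first, then convert with htmlentities(). The function name clean_for_html is hypothetical, and UTF-8 is an assumed charset. For high-security sites, the answer itself recommends HTMLPurifier rather than strip_tags().

```php
<?php
// Hypothetical helper following the answer's advice:
// remove tags with strip_tags(), then encode what is left with htmlentities().
function clean_for_html(string $input): string
{
    $withoutTags = strip_tags($input);
    // ENT_QUOTES encodes both double and single quotes; UTF-8 is assumed.
    return htmlentities($withoutTags, ENT_QUOTES, 'UTF-8');
}

echo clean_for_html('<b>Tom & "Jerry"</b>'), "\n"; // Tom &amp; &quot;Jerry&quot;
```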

answered Sep 2, 2010 at 1:47

Theodore R. Smith

If PHP's header() function is used to set the charset

header('Content-Type: text/html; charset=utf-8');

then htmlspecialchars and htmlentities should both be safe for HTML output, because XSS can then no longer be achieved using UTF-7 encodings.
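A minimal sketch of that setup, assuming a UTF-8 page. The charset passed to the escaping function matches the one the header declares, so the two cannot disagree. The helper name e() is hypothetical.

```php
<?php
// Declare the page's encoding before any output is sent,
// so the browser does not have to guess it.
header('Content-Type: text/html; charset=utf-8');

// Hypothetical helper: escape with the same charset that the header declares.
function e(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

echo '<p>' . e('<script>alert(1)</script>') . '</p>', "\n";
// <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
```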

Please note that these functions should not be used to output values into JavaScript or CSS. An attacker could enter characters that break out of the JavaScript or CSS context and put your site at risk. Please see the XSS Prevention Cheat Sheet on how to handle these situations appropriately.
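One common approach for JavaScript contexts is json_encode() with the JSON_HEX_* flags. It is swapped in here as an illustration and is not taken from this answer. These flags turn <, >, &, ' and " into \u00XX escapes, so the value cannot close the script block or the string literal. The variable name is hypothetical, and the cheat sheet covers the full set of rules.

```php
<?php
// Hypothetical untrusted value that tries to close the <script> block.
$name = '</script><script>alert(1)</script>';

// JSON_HEX_* flags replace <, >, &, ' and " with \u00XX escapes,
// so the encoded literal cannot break out of the surrounding <script>.
$literal = json_encode($name, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);

echo '<script>var name = ' . $literal . ';</script>', "\n";
```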

SherylHohman

answered Jan 1, 2014 at 17:50

SilverlightFox

I'm not sure whether you have found the answer you were looking for, but I am also looking for an HTML cleaner. I am building an application that should accept HTML code, possibly even JavaScript or other languages, and store it in a MySQL database without causing problems or allowing XSS. I've found HTML Purifier, and it appears to be the most developed and still-maintained tool for cleaning up user-submitted content in a PHP system. The page linked is their comparison page, which explains why theirs or another tool could be useful. Hope this helps!

answered Dec 19, 2012 at 15:32

You can't sanitize every type of XSS with htmlspecialchars. It can help protect against XSS inside HTML element content or inside quoted HTML attributes.

Each type of XSS has to be sanitized with its own method:


  1. User input placed inside HTML:

Attack vector:

This type of XSS can be prevented with htmlspecialchars, because the attacker needs < and > to create a new HTML tag.

Solution:
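A minimal sketch of the solution for this case. It assumes the input is printed between <div> tags, and render_div is a hypothetical helper name.

```php
<?php
// Hypothetical helper: print user input as element content.
function render_div(string $input): string
{
    // < and > become &lt; and &gt;, so the input cannot open a new tag.
    return '<div>' . htmlspecialchars($input, ENT_QUOTES, 'UTF-8') . '</div>';
}

echo render_div('<script>alert(1)</script>'), "\n";
// <div>&lt;script&gt;alert(1)&lt;/script&gt;</div>
```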


  2. User input placed inside a single-quoted attribute:

    
    Link

    
    

Attack vector: javascript:alert(1), javascript://alert(1)

As the htmlspecialchars documentation shows, the function converts only a handful of HTML special characters. These vectors contain none of them, so the function will not stop them. To prevent such attacks, you need to validate the input as a URL.

Solution:

 
    
    
    Link
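A minimal sketch of validating the input as a URL before it goes into the href attribute. The function name safe_href and the http/https allow-list are assumptions. FILTER_VALIDATE_URL checks URL syntax rather than safety, so the scheme is checked separately.

```php
<?php
// Hypothetical helper: allow only absolute http(s) URLs in an href attribute.
function safe_href(string $url): ?string
{
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return null; // not a syntactically valid absolute URL
    }
    // FILTER_VALIDATE_URL does not limit the scheme, so restrict it explicitly.
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    if ($scheme !== 'http' && $scheme !== 'https') {
        return null;
    }
    // Still escape the URL for the attribute context.
    return htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
}

var_dump(safe_href('javascript:alert(1)')); // NULL
```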

    
    


  3. User input placed inside a JavaScript <script> tag without any quotes:

Attack vector: 1;alert(1)

In some cases, we can simply quote the input and prevent the attack by sanitizing it with htmlspecialchars. If the input must be an integer, however, we can prevent XSS through input validation.

Solution:
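A minimal sketch of the input-validation solution, assuming the value is meant to be an integer. The function name script_with_id is hypothetical. filter_var() with FILTER_VALIDATE_INT returns false for input such as 1;alert(1), so nothing unvalidated reaches the script.

```php
<?php
// Hypothetical helper: only an integer may be placed, unquoted, inside <script>.
function script_with_id(string $rawId): ?string
{
    $id = filter_var($rawId, FILTER_VALIDATE_INT);
    if ($id === false) {
        return null; // reject anything that is not an integer, e.g. "1;alert(1)"
    }
    // $id is now a PHP int, so only digits (and possibly a minus sign) are output.
    return '<script>var id = ' . $id . ';</script>';
}

var_dump(script_with_id('42'));         // string(29) "<script>var id = 42;</script>"
var_dump(script_with_id('1;alert(1)')); // NULL
```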



Always quote variables placed inside an HTML attribute, and sanitize them properly.
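A short sketch of why the quotes matter, using a hypothetical input value. htmlspecialchars() does not touch spaces or =. In an unquoted attribute, the input can therefore start a new attribute. In a quoted attribute, it stays inside the value.

```php
<?php
$input = 'x onmouseover=alert(1)'; // hypothetical attacker-controlled value
$safe = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// Unquoted: the space ends the value and "onmouseover=alert(1)" becomes a new attribute.
$unquoted = '<input value=' . $safe . '>';

// Quoted: the whole string stays inside the value attribute.
$quoted = '<input value="' . $safe . '">';

echo $unquoted, "\n", $quoted, "\n";
```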

Example

Convert the predefined characters "<" (less than) and ">" (greater than) to HTML entities:

<?php
$str = "This is some <b>bold</b> text.";
echo htmlspecialchars($str);
?>

The HTML output of the code above will be (View Source):

This is some &lt;b&gt;bold&lt;/b&gt; text.

The browser output of the code above will be:

This is some <b>bold</b> text.

Definition and Usage

The htmlspecialchars() function converts some predefined characters to HTML entities.

The predefined characters are:

  • & (ampersand) becomes &amp;
  • " (double quote) becomes &quot;
  • ' (single quote) becomes &#039;
  • < (less than) becomes &lt;
  • > (greater than) becomes &gt;

Tip: To convert special HTML entities back to characters, use the htmlspecialchars_decode() function.
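A small round-trip sketch of the conversions listed above. ENT_QUOTES is passed explicitly so that the single quote is converted regardless of the PHP version's default flags.

```php
<?php
$original = '& " \' < >';

// ENT_QUOTES makes sure the single quote is converted as well.
$encoded = htmlspecialchars($original, ENT_QUOTES, 'UTF-8');
echo $encoded, "\n"; // &amp; &quot; &#039; &lt; &gt;

// htmlspecialchars_decode() reverses the conversion.
$decoded = htmlspecialchars_decode($encoded, ENT_QUOTES);
var_dump($decoded === $original); // bool(true)
```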


Syntax

htmlspecialchars(string,flags,character-set,double_encode)

Parameter Values

Parameter     Description
string Required. Specifies the string to convert
flags Optional. Specifies how to handle quotes, invalid encoding and the used document type.

The available quote styles are:

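  • ENT_COMPAT - Encodes only double quotes (the default before PHP 8.1)
  • ENT_QUOTES - Encodes double and single quotes (part of the default, together with ENT_SUBSTITUTE and ENT_HTML401, as of PHP 8.1)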
  • ENT_NOQUOTES - Does not encode any quotes

Invalid encoding:

  • ENT_IGNORE - Ignores invalid encoding instead of having the function return an empty string. Should be avoided, as it may have security implications.
  • ENT_SUBSTITUTE - Replaces invalid encoding for a specified character set with a Unicode Replacement Character U+FFFD (UTF-8) or &#xFFFD; instead of returning an empty string.
  • ENT_DISALLOWED - Replaces code points that are invalid in the specified doctype with a Unicode Replacement Character U+FFFD (UTF-8) or &#xFFFD;

Additional flags for specifying the used doctype:

  • ENT_HTML401 - Default. Handle code as HTML 4.01
  • ENT_HTML5 - Handle code as HTML 5
  • ENT_XML1 - Handle code as XML 1
  • ENT_XHTML - Handle code as XHTML
character-set Optional. A string that specifies which character-set to use.

Allowed values are:

  • UTF-8 - Default. ASCII compatible multi-byte 8-bit Unicode
  • ISO-8859-1 - Western European
  • ISO-8859-15 - Western European (adds the Euro sign + French and Finnish letters missing in ISO-8859-1)
  • cp866 - DOS-specific Cyrillic charset
  • cp1251 - Windows-specific Cyrillic charset
  • cp1252 - Windows specific charset for Western European
  • KOI8-R - Russian
  • BIG5 - Traditional Chinese, mainly used in Taiwan
  • GB2312 - Simplified Chinese, national standard character set
  • BIG5-HKSCS - Big5 with Hong Kong extensions
  • Shift_JIS - Japanese
  • EUC-JP - Japanese
  • MacRoman - Character-set that was used by Mac OS

Note: Before PHP 5.4, unrecognized character sets are ignored and replaced by ISO-8859-1. As of PHP 5.4, they are ignored and replaced by UTF-8.

double_encode Optional. A boolean value that specifies whether to encode existing HTML entities or not.
  • TRUE - Default. Will convert everything
  • FALSE - Will not encode existing HTML entities
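A short sketch of how two of the parameters above change the output, using made-up strings. The doctype flag decides which entity a single quote becomes. double_encode decides whether an existing entity such as &amp; is encoded again.

```php
<?php
$quote = "It's";

// The apostrophe entity depends on the doctype flag.
echo htmlspecialchars($quote, ENT_QUOTES | ENT_HTML401, 'UTF-8'), "\n"; // It&#039;s
echo htmlspecialchars($quote, ENT_QUOTES | ENT_HTML5, 'UTF-8'), "\n";   // It&apos;s

$text = 'Fish &amp; chips <3';

// double_encode = true (the default) re-encodes the existing &amp;.
echo htmlspecialchars($text, ENT_QUOTES, 'UTF-8', true), "\n";  // Fish &amp;amp; chips &lt;3
// double_encode = false leaves existing entities alone.
echo htmlspecialchars($text, ENT_QUOTES, 'UTF-8', false), "\n"; // Fish &amp; chips &lt;3
```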


Technical Details

Return Value: Returns the converted string.

If the string contains invalid encoding, it will return an empty string, unless either the ENT_IGNORE or ENT_SUBSTITUTE flag is set (ENT_SUBSTITUTE is part of the default flags as of PHP 8.1).

PHP Version: 4+
Changelog: PHP 8.1 - Changed the default flags from ENT_COMPAT to ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401.
PHP 5.6 - Changed the default value for the character-set parameter to the value of the default charset (in configuration).
PHP 5.4 - Changed the default value for the character-set parameter to UTF-8.
PHP 5.4 - Added ENT_SUBSTITUTE, ENT_DISALLOWED, ENT_HTML401, ENT_HTML5, ENT_XML1 and ENT_XHTML.
PHP 5.3 - Added the ENT_IGNORE constant.
PHP 5.2.3 - Added the double_encode parameter.
PHP 4.1 - Added the character-set parameter.
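A small sketch of the invalid-encoding behavior described above, using a made-up string with a byte (0xFF) that can never appear in valid UTF-8. The flags are passed explicitly, so the result does not depend on the version's defaults.

```php
<?php
$invalid = "abc\xFF"; // 0xFF can never appear in valid UTF-8

// Without ENT_IGNORE or ENT_SUBSTITUTE, the whole result is an empty string.
var_dump(htmlspecialchars($invalid, ENT_QUOTES, 'UTF-8')); // string(0) ""

// ENT_SUBSTITUTE replaces the bad byte with U+FFFD instead.
$substituted = htmlspecialchars($invalid, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
var_dump($substituted === "abc\u{FFFD}"); // bool(true)
```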

More Examples

Example

Convert some predefined characters to HTML entities:

<?php
$str = "Jane & 'Tarzan'";
echo htmlspecialchars($str, ENT_COMPAT); // Will only convert double quotes
echo "<br>";
echo htmlspecialchars($str, ENT_QUOTES); // Converts double and single quotes
echo "<br>";
echo htmlspecialchars($str, ENT_NOQUOTES); // Does not convert any quotes
?>

The HTML output of the code above will be (View Source):

Jane &amp; 'Tarzan'<br>
Jane &amp; &#039;Tarzan&#039;<br>
Jane &amp; 'Tarzan'

The browser output of the code above will be:

Jane & 'Tarzan'
Jane & 'Tarzan'
Jane & 'Tarzan'

Example

Convert double quotes to HTML entities:

<?php
$str = 'I love "PHP".';
echo htmlspecialchars($str, ENT_QUOTES); // Converts double and single quotes
?>

The HTML output of the code above will be (View Source):

I love &quot;PHP&quot;.

The browser output of the code above will be:

I love "PHP".


What does htmlspecialchars() return?

The htmlspecialchars() function returns the converted string.

What's the difference between htmlentities() and htmlspecialchars()?

The only difference between these functions is that htmlspecialchars() converts only the special characters (&, ", ', < and >) to HTML entities, whereas htmlentities() converts all characters that have an HTML entity equivalent.
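A short sketch of that difference, using a made-up string containing é (written as \u{E9} so the source encoding does not matter).

```php
<?php
$text = "Caf\u{E9} <b>"; // "Café <b>"

// htmlspecialchars() only touches &, ", ', < and >.
echo htmlspecialchars($text, ENT_QUOTES, 'UTF-8'), "\n"; // Café &lt;b&gt;

// htmlentities() also converts characters that have a named entity, such as é.
echo htmlentities($text, ENT_QUOTES, 'UTF-8'), "\n"; // Caf&eacute; &lt;b&gt;
```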

Does htmlspecialchars() prevent XSS?

The htmlspecialchars() function converts special characters to HTML entities, a process also known as HTML escaping. For most web apps this is one of the most popular ways to prevent XSS. However, it only covers HTML element content and quoted attributes, not JavaScript, CSS, or URL contexts.

What is the use of htmlentities() in PHP?

The htmlentities() function converts characters to HTML entities.

Tip: To convert HTML entities back to characters, use the html_entity_decode() function.

Tip: Use the get_html_translation_table() function to return the translation table used by htmlentities().
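A short sketch of both tips, using made-up strings. get_html_translation_table() with HTML_ENTITIES returns the table htmlentities() uses, keyed by UTF-8 characters here. html_entity_decode() reverses the conversion.

```php
<?php
// The table htmlentities() uses, with UTF-8 characters as keys.
$table = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML401, 'UTF-8');
echo $table["\u{E9}"], "\n"; // &eacute;
echo $table['&'], "\n";      // &amp;

// html_entity_decode() turns the entities back into characters.
$original = "Caf\u{E9} & \"tea\"";
$encoded = htmlentities($original, ENT_QUOTES, 'UTF-8');
$decoded = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');
var_dump($decoded === $original); // bool(true)
```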