In this article, we will see how to remove HTML tags from data in PHP. PHP provides the built-in strip_tags() function, which strips HTML, XML, and PHP tags from a string. It accepts two parameters and returns the given string with all NULL bytes, HTML tags, and PHP tags removed.
Syntax:
strip_tags(string, allowed_tags)
Parameter Values:
- string: It is a required parameter that specifies the string to check.
- allowed_tags: It is an optional parameter that specifies the tags that should be kept in the returned result.
Return Value: It returns a string with all HTML tags removed except for the allowed tags.
Example 1: In this example, we pass a string containing HTML tags to strip_tags() and check whether the returned string still contains any HTML tags. With no allowed_tags argument, the function strips every HTML tag from the string.
Example 2: In this example, we pass the allowed_tags parameter along with the string to strip_tags(), so that a few tags can be kept while the rest are stripped from the input string. Here allowed_tags names a single tag, so the function leaves that tag in the string and strips the others, such as the italic tag.
Hypertext Markup Language (HTML) is the markup language that web pages on the internet use. In HTML, elements describe how a browser should display the content on a web page. Tags mark where each element starts and ends.
With Formatter, you can use the Text extract function to remove any HTML tags from your data. This is useful if an app needs plain text instead of HTML.
After removing HTML tags from your data, you can use the plain text in further actions in your Zap.
Instantly remove HTML tags from a string of content with this online tool. Enter all of the code for a web page, or just part of one, and the tool will automatically remove all the HTML elements, leaving just the text content you want.
This JavaScript-based tool will also extract the text of the HTML button element and the title tag alongside regular text content.
If you need to remove HTML tags, give it a whirl. It works pretty darn well at stripping out those unwanted HTML elements.
How to Remove HTML Tags from Text
This is just a bit of a technical note about removing HTML elements using JavaScript code, so if you're not into the technical details, just skip this part and use the HTML stripper tool above.
Generally it's preferable to use an approach that leverages the DOM in a graceful way to find and remove the HTML content over an approach that just uses Regular Expressions to find and remove HTML tags.
Because you will encounter malformed HTML, the regex approach can fail in spectacular ways, so here I tried to leverage the JavaScript innerText property to get the job done in a more dependable way.
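To make the failure mode concrete, here is a minimal sketch of a naive regex stripper tripping over input that a browser would handle fine. The helper name stripTagsNaive and the sample strings are made up for illustration and are not taken from the tool described here.

```javascript
// Naive approach: delete anything that looks like <...>.
// stripTagsNaive is a hypothetical helper, not the tool's own code.
function stripTagsNaive(html) {
  return html.replace(/<[^>]*>/g, "");
}

// Simple, well-formed markup works as expected.
const clean = stripTagsNaive("<p>Hello <b>world</b></p>");

// A ">" inside an attribute value ends the match early,
// so part of the attribute leaks into the text output.
const leaky = stripTagsNaive('<a title="a > b">link</a>');

console.log(JSON.stringify(clean)); // "Hello world"
console.log(JSON.stringify(leaky)); // " b\">link"
```

The second result shows why a pattern that only looks for angle brackets cannot be trusted on arbitrary markup.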
The Problem with Using InnerText
Using the JavaScript innerText property to remove HTML tags unfortunately doesn't work exactly how I wanted it to, so I had to sweeten the deal with some regular expressions to get the text output I wanted.
The big problem, for me, with using innerText to remove HTML tags was that it would remove the script tags themselves but leave the contents between the opening and closing script tags in the text output. It did the same for style tags in those instances where a page carries some on-page style rules.
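One way to work around this is to cut script and style blocks out of the markup, contents included, before the text is extracted. The sketch below is an assumption about how that could look, not the regex the tool actually uses; the function name removeScriptAndStyle is hypothetical, and the pattern only handles blocks that have a matching closing tag.

```javascript
// Remove <script>...</script> and <style>...</style> blocks, including
// everything between the opening and closing tags.
// Hypothetical helper; assumes each block has a matching closing tag.
function removeScriptAndStyle(html) {
  // \1 refers back to whichever tag name (script or style) was opened.
  // [\s\S]*? matches across line breaks and stops at the first close tag.
  return html.replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, "");
}

const page =
  "<style>p { color: red; }</style>" +
  "<p>Hi</p>" +
  '<script type="text/javascript">alert("x");</script>';

console.log(removeScriptAndStyle(page)); // <p>Hi</p>
```

The remaining markup can then be handed to innerText or a tag-stripping step without the script and style contents showing up in the result.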
Although optional, I also added a regex that gets rid of excess line breaks, which makes the output a bit more readable.
Anyway, if none of these are deal breakers for you, then I would say just use the innerText property to remove HTML tags from your web content. Otherwise you'll need to use some regex to remove the HTML tags.
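A line-break cleanup of that kind can be sketched as follows. The choice to keep at most one blank line between blocks of text is an assumption, as is the function name collapseBlankLines; the tool's own regex may behave differently.

```javascript
// Collapse runs of blank lines so that at most one empty line separates
// blocks of text. Hypothetical helper; the "one blank line" rule is an
// assumption made for this sketch.
function collapseBlankLines(text) {
  return text
    .replace(/\r\n?/g, "\n")      // normalise CRLF and lone CR to LF
    .replace(/\n\s*\n/g, "\n\n")  // squeeze whitespace-only gaps to one blank line
    .trim();                      // drop leading and trailing whitespace
}

const messy = "Title\n\n\n\nFirst paragraph\r\n\r\n\r\nSecond paragraph\n";

console.log(JSON.stringify(collapseBlankLines(messy)));
// "Title\n\nFirst paragraph\n\nSecond paragraph"
```

Single line breaks are left alone, so text that was already compact passes through unchanged apart from the trimming.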