Is there any easy way to remove all HTML tags or ANYTHING HTML related from a string?
For example:
string title = " Hulk Hogan's Celebrity Championship Wrestling [Proj # 206010] [Reality Series, ]"
The above should really be:
"Hulk Hogan's Celebrity Championship Wrestling [Proj # 206010] [Reality Series]"
asked Aug 9, 2013 at 19:12
You can parse the string with Html Agility Pack and read the InnerText of the document node.
HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(@" Hulk Hogan's Celebrity Championship Wrestling [Proj # 206010] [Reality Series, ]");
string result = htmlDoc.DocumentNode.InnerText;
answered Aug 9, 2013 at 19:21
ssilas777
Run the code below on your string to get the text without the HTML parts.
string title = " Hulk Hogan's Celebrity Championship Wrestling [Proj # 206010] [Reality Series, ]".Replace("&nbsp;", string.Empty);
string s = Regex.Replace(title, "<.*?>", String.Empty);
answered Aug 9, 2013 at 20:50
Vinay
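Stripping tags still leaves character entities such as &nbsp; or &amp; in the text. A minimal sketch of one way to handle both, assuming the standard WebUtility.HtmlDecode method from System.Net is available; the class name HtmlText and the method name ToPlainText are made up for illustration and do not come from any answer in this thread:

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlText
{
    // Hypothetical helper: strips tags, decodes entities, then tidies whitespace.
    public static string ToPlainText(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }

        // 1. Drop anything that looks like a tag, e.g. <b> or </font>.
        string withoutTags = Regex.Replace(html, "<.*?>", string.Empty);

        // 2. Turn entities such as &nbsp; and &amp; into the characters they stand for.
        string decoded = WebUtility.HtmlDecode(withoutTags);

        // 3. &nbsp; decodes to U+00A0 (non-breaking space); map it to a plain space,
        //    then collapse runs of whitespace and trim the ends.
        string spaced = decoded.Replace('\u00A0', ' ');
        return Regex.Replace(spaced, @"\s+", " ").Trim();
    }
}
```

For example, HtmlText.ToPlainText("<b>Hulk&nbsp;Hogan</b> &amp; <i>friends</i>") returns "Hulk Hogan & friends". Decoding after stripping means an encoded "&lt;b&gt;" survives as the literal text "<b>" instead of being treated as a tag.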
I built a small function to remove HTML tags.
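// Requires System.Collections.Generic, System.Linq, System.Text and System.Text.RegularExpressions.
public static string RemoveHtmlTags(string text)
{
    // Positions of every '<' and every '>' in the input.
    List<int> openTagIndexes = Regex.Matches(text, "<").Cast<Match>().Select(m => m.Index).ToList();
    List<int> closeTagIndexes = Regex.Matches(text, ">").Cast<Match>().Select(m => m.Index).ToList();
    if (closeTagIndexes.Count > 0)
    {
        StringBuilder sb = new StringBuilder();
        int previousIndex = 0;
        foreach (int closeTagIndex in closeTagIndexes)
        {
            var openTagsSubset = openTagIndexes.Where(x => x >= previousIndex && x < closeTagIndex);
            if (openTagsSubset.Count() > 0 && closeTagIndex - openTagsSubset.Max() > 1)
            {
                // A non-empty tag ends here: keep only the text before its '<'.
                sb.Append(text.Substring(previousIndex, openTagsSubset.Max() - previousIndex));
            }
            else
            {
                // No real tag ends here: keep the text, including the '>' itself.
                sb.Append(text.Substring(previousIndex, closeTagIndex - previousIndex + 1));
            }
            previousIndex = closeTagIndex + 1;
        }
        if (closeTagIndexes.Max() < text.Length)
        {
            // Append whatever follows the last '>'.
            sb.Append(text.Substring(closeTagIndexes.Max() + 1));
        }
        return sb.ToString();
    }
    else
    {
        return text;
    }
}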
answered Jul 6 at 16:09
public static string StripHTML(string input)
{
    if (input == null)
    {
        return string.Empty;
    }
    // Remove anything that looks like a tag.
    return Regex.Replace(input, "<.*?>", String.Empty);
}
Shunya
answered Jul 27 at 6:25
How to remove specific HTML tags from string in C#?
By using Regex: public static string RemoveHTMLTags(string html) { return Regex.Replace(html, "<.*?>", string.Empty); } ...
By using a compiled Regex for better performance: ...
By using a char array for faster performance across several HTML files.
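The last two approaches in the list above are only named, not shown. A rough sketch of what they might look like, assuming the same "<.*?>" pattern used elsewhere in this thread; the class and method names here are hypothetical and this is not the code the original snippet elided:

```csharp
using System.Text.RegularExpressions;

public static class HtmlTagStripper
{
    // Built once and reused; RegexOptions.Compiled trades startup cost for faster matching.
    private static readonly Regex TagPattern = new Regex("<.*?>", RegexOptions.Compiled);

    public static string StripWithCompiledRegex(string html)
    {
        return TagPattern.Replace(html, string.Empty);
    }

    // Single pass over the characters: copy everything that is not between '<' and '>'.
    public static string StripWithCharArray(string html)
    {
        char[] buffer = new char[html.Length];
        int length = 0;
        bool insideTag = false;

        foreach (char c in html)
        {
            if (c == '<')
            {
                insideTag = true;
            }
            else if (c == '>')
            {
                insideTag = false;
            }
            else if (!insideTag)
            {
                buffer[length++] = c;
            }
        }

        return new string(buffer, 0, length);
    }
}
```

Both methods turn "<p>Hello <b>world</b></p>" into "Hello world". They differ on malformed input: the char-array version drops a stray '>' and discards everything after an unclosed '<', while the regex version leaves both in place. Neither decodes entities such as &nbsp;.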
How do you remove HTML from text?
Removing HTML tags from text (using Find and Replace in Microsoft Word):
Press Ctrl+H. ...
Click the More button, if it is available. ...
Make sure the Use Wildcards check box is selected.
In the Find What box, enter the following: \[[!