Remove HTML tags from a string in C#

Is there any easy way to remove all HTML tags or ANYTHING HTML related from a string?

For example:

string title = "<b> Hulk Hogan's Celebrity Championship Wrestling &nbsp;&nbsp;&nbsp;<font color=\"#228b22\">[Proj # 206010]</font></b> &nbsp;&nbsp;&nbsp; [Reality Series, &nbsp;]";

The above should really be:

"Hulk Hogan's Celebrity Championship Wrestling [Proj # 206010] [Reality Series]"

asked Aug 9, 2013 at 19:12

3

You can parse the string with the Html Agility Pack and read the document node's InnerText.

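    // Requires the HtmlAgilityPack NuGet package (using HtmlAgilityPack;).
    HtmlDocument htmlDoc = new HtmlDocument();
    htmlDoc.LoadHtml(@"<b> Hulk Hogan's Celebrity Championship Wrestling &nbsp;&nbsp;&nbsp;<font color=""#228b22"">[Proj # 206010]</font></b> &nbsp;&nbsp;&nbsp; [Reality Series, &nbsp;]");
    string result = htmlDoc.DocumentNode.InnerText;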
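
InnerText removes the tags but, as far as I know, leaves entities such as &nbsp; as written. A minimal sketch of one way to finish the cleanup, assuming the HtmlAgilityPack package is referenced; the class and method names (HtmlTextExtractor, ToPlainText) are hypothetical, not part of the library:

```csharp
using System.Text.RegularExpressions;
using HtmlAgilityPack; // third-party NuGet package, assumed to be installed

public static class HtmlTextExtractor // hypothetical helper name
{
    public static string ToPlainText(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // InnerText drops the tags; DeEntitize turns &nbsp;, &amp;, ... into characters.
        string text = HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);

        // Collapse whitespace runs (in .NET, \s also matches the no-break space).
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

Calling HtmlTextExtractor.ToPlainText(title) on the string from the question should leave only the visible text, with single spaces between words.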

answered Aug 9, 2013 at 19:21

ssilas777


2

Run the code below on your string to get the text without the HTML.

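    // Requires: using System.Text.RegularExpressions;
    string title = "<b> Hulk Hogan's Celebrity Championship Wrestling &nbsp;&nbsp;&nbsp;<font color=\"#228b22\">[Proj # 206010]</font></b> &nbsp;&nbsp;&nbsp; [Reality Series, &nbsp;]".Replace("&nbsp;", string.Empty);
    string s = Regex.Replace(title, "<.*?>", String.Empty);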
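
Deleting "&nbsp;" outright can glue two words together when the entity was the only separator between them. A minimal sketch of an alternative that uses only the standard library: strip the tags, decode the entities, then collapse whitespace. The names HtmlCleaner and StripHtml are hypothetical, and a regex like this is only an approximation of real HTML parsing (comments, scripts, or a literal '<' in text can trip it up):

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlCleaner // hypothetical helper name
{
    public static string StripHtml(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return string.Empty;
        }

        // 1. Remove anything that looks like a tag: '<', then the shortest run up to '>'.
        string withoutTags = Regex.Replace(input, "<.*?>", string.Empty, RegexOptions.Singleline);

        // 2. Decode entities (&nbsp; becomes U+00A0, &amp; becomes '&', ...).
        string decoded = WebUtility.HtmlDecode(withoutTags);

        // 3. Collapse whitespace runs, including no-break spaces, and trim the ends.
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }
}
```

Usage would be string clean = HtmlCleaner.StripHtml(title);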

answered Aug 9, 2013 at 20:50

Vinay


0

I built a small function to remove HTML tags.

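    // Requires: using System.Collections.Generic; using System.Linq;
    //           using System.Text; using System.Text.RegularExpressions;
    public static string RemoveHtmlTags(string text)
    {
        // Positions of every '<' and every '>' in the input.
        List<int> openTagIndexes = Regex.Matches(text, "<").Cast<Match>().Select(m => m.Index).ToList();
        List<int> closeTagIndexes = Regex.Matches(text, ">").Cast<Match>().Select(m => m.Index).ToList();
        if (closeTagIndexes.Count > 0)
        {
            StringBuilder sb = new StringBuilder();
            int previousIndex = 0;
            foreach (int closeTagIndex in closeTagIndexes)
            {
                // '<' characters between the previous '>' and this one.
                var openTagsSubset = openTagIndexes.Where(x => x >= previousIndex && x < closeTagIndex);
                if (openTagsSubset.Count() > 0 && closeTagIndex - openTagsSubset.Max() > 1)
                {
                    // A non-empty tag: keep only the text before its '<'.
                    sb.Append(text.Substring(previousIndex, openTagsSubset.Max() - previousIndex));
                }
                else
                {
                    // A lone '>' or an empty "<>": keep the text unchanged.
                    sb.Append(text.Substring(previousIndex, closeTagIndex - previousIndex + 1));
                }
                previousIndex = closeTagIndex + 1;
            }
            if (closeTagIndexes.Max() < text.Length)
            {
                // Append whatever follows the last '>'.
                sb.Append(text.Substring(closeTagIndexes.Max() + 1));
            }
            return sb.ToString();
        }
        else
        {
            return text;
        }
    }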

answered Jul 6 at 16:09

    public static string StripHTML(string input)
    {
        if (input == null)
        {
            return string.Empty;
        }
        // Lazily match everything from '<' to the next '>' and drop it.
        return Regex.Replace(input, "<.*?>", String.Empty);
    }

Shunya


answered Jul 27 at 6:25

1

How to remove specific HTML tags from a string in C#?

By using Regex: public static string RemoveHTMLTags(string html) { return Regex.Replace(html, "<.*?>", string.Empty); } ...
By using a compiled Regex for better performance: ...
By using a char array for faster performance across several HTML files: ...
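
The code for the compiled-regex and char-array variants listed above is cut off in the source. A minimal sketch of what such variants might look like, under my own assumptions; TagStripper, WithCompiledRegex, and WithCharArray are hypothetical names, and no performance figures are claimed here:

```csharp
using System.Text.RegularExpressions;

public static class TagStripper // hypothetical helper name
{
    // Built once and reused; RegexOptions.Compiled trades startup cost for faster matching.
    private static readonly Regex TagPattern =
        new Regex("<.*?>", RegexOptions.Compiled | RegexOptions.Singleline);

    public static string WithCompiledRegex(string html)
    {
        return html == null ? string.Empty : TagPattern.Replace(html, string.Empty);
    }

    public static string WithCharArray(string html)
    {
        if (html == null)
        {
            return string.Empty;
        }

        char[] buffer = new char[html.Length];
        int length = 0;
        bool insideTag = false;

        foreach (char c in html)
        {
            if (c == '<')
            {
                insideTag = true;   // start skipping
            }
            else if (c == '>')
            {
                insideTag = false;  // stop skipping; the '>' itself is dropped
            }
            else if (!insideTag)
            {
                buffer[length++] = c; // copy text that sits outside any tag
            }
        }

        return new string(buffer, 0, length);
    }
}
```

Note that the char-array scan treats every '<' as the start of a tag and drops every '>', so text containing literal angle brackets (for example "a < b") loses content; it fits best when the input is known to be well-formed markup.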

