How do you replace html in python?

From this HTML code:

Name is a fine man.

I'm trying to replace "Name" using the following code:

target = soup.find_all(text="Name")
for v in target:
    v.replace_with('Id')

The output I would like to have is:

Id is a fine man.

When I print the result, I get an empty list:

print target
[]

Why doesn't it find "Name"?

Thanks!

asked Jul 4, 2015 at 11:33

Diego


The text node in your HTML contains other text besides "Name", so an exact match finds nothing. Relax the search criteria to a "contains" match instead, for example by using a regular expression. Then replace each matched text node with its original text in which "Name" is swapped for "Id", using the plain string replace() method, for example:

from bs4 import BeautifulSoup
import re

html = """

Name is a fine man.

"""
soup = BeautifulSoup(html)
target = soup.find_all(text=re.compile(r'Name'))
for v in target:
    v.replace_with(v.replace('Name','Id'))
print soup

Output:

Id is a fine man.
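
For Python 3 with a recent Beautiful Soup 4, the same approach might look like the sketch below. The <p> wrapper is an assumption, since the question's original markup is not shown here. string= is the newer spelling of the text= argument, and html.parser is passed explicitly so that the parser choice does not depend on what is installed.

```python
import re
from bs4 import BeautifulSoup

# Assumed markup: the original tags are not shown, so a <p> stands in for them.
html = "<p>Name is a fine man.</p>"
soup = BeautifulSoup(html, "html.parser")

# An exact match finds nothing, because the text node is the whole sentence.
exact = soup.find_all(string="Name")

# A regular expression matches any text node that contains "Name".
for node in soup.find_all(string=re.compile("Name")):
    node.replace_with(node.replace("Name", "Id"))

result = str(soup)
print(exact)   # []
print(result)  # <p>Id is a fine man.</p>
```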

answered Jul 4, 2015 at 12:02


har07


It returns an empty list because a plain string passed as text must match the whole text of a node, so use a regular expression instead.

From the official docs: BeautifulSoup - Search text

text is an argument that lets you search for NavigableString objects instead of Tags. Its value can be a string, a regular expression, a list or dictionary, True or None, or a callable that takes a NavigableString object as its argument:

soup.findAll(text="one")
# [u'one']
soup.findAll(text=re.compile("paragraph"))
# [u'This is paragraph ', u'This is paragraph ']
soup.findAll(text=lambda(x): len(x) < 12)
# [u'Page title', u'one', u'.', u'two', u'.']

P.S.: This has already been discussed here and here.

answered Jul 4, 2015 at 12:38

devautor


The following 29 Python code examples are related to "replace html".

Example 1

def replaceHTMLCodes(txt):

    txt = re.sub("(&#[0-9]+)([^;^0-9]+)", "\\1;\\2", txt)
    txt = HTMLParser.HTMLParser().unescape(txt)
    txt = txt.replace("&quot;", "\"")
    txt = txt.replace("&amp;", "&")
    txt = txt.replace("&#38;", "&")
    txt = txt.replace("&nbsp;", "")
    return txt 

Example 2

def replaceHTMLCodes(txt):
    # Fix missing ; in &#;
    txt = re.sub("(&#[0-9]+)([^;^0-9]+)", "\\1;\\2", txt)

    txt = HTMLParser.HTMLParser().unescape(txt)
    txt = txt.replace("&amp;", "&")
    return txt 

Example 3

def replace_html_codes(txt):
    txt = re.sub("(&#[0-9]+)([^;^0-9]+)", "\\1;\\2", txt)
    txt = HTMLParser.HTMLParser().unescape(txt)
    txt = txt.replace("&quot;", "\"")
    txt = txt.replace("&amp;", "&")
    return txt 

Example 4

def replaceHTMLEntity(t):
    """Helper parser action to replace common HTML entities with their special characters"""
    return _htmlEntityMap.get(t.entity)


# Chad Vernon -- Add Python 2.x support for Maya
# opAssoc = types.SimpleNamespace() 

Example 5

def replaceHTMLEntity(t):
    """Helper parser action to replace common HTML entities with their special characters"""
    return _htmlEntityMap.get(t.entity)

# it's easy to get these comment structures wrong - they're very common, so may as well make them available 

Example 6

def replace_html(s):
    s = s.replace('&quot;', '"')
    s = s.replace('&amp;', '&')
    s = s.replace('&lt;', '<')
    s = s.replace('&gt;', '>')
    s = s.replace('&nbsp;', ' ')
    s = s.replace("&ldquo;", "")
    s = s.replace("&rdquo;", "")
    s = s.replace("&mdash;", "")
    s = s.replace("\xa0", " ")
    return s

Example 7

def replaceHTMLCodes(txt):
    txt = re.sub("(&#[0-9]+)([^;^0-9]+)", "\\1;\\2", txt)
    txt = HTMLParser.HTMLParser().unescape(txt)
    txt = txt.replace("&quot;", "\"")
    txt = txt.replace("&amp;", "&")
    return txt 

Example 8

def replaceHTMLCodes(txt):
    txt = re.sub("(&#[0-9]+)([^;^0-9]+)", "\\1;\\2", txt)
    txt = HTMLParser.HTMLParser().unescape(txt)
    txt = txt.replace("&quot;", "\"")
    txt = txt.replace("&amp;", "&")
    return txt 

Example 9

def replaceHTMLCodes(txt):
    log(repr(txt), 5)

    # Fix missing ; in &#;
    txt = re.sub("(&#[0-9]+)([^;^0-9]+)", "\\1;\\2", makeUTF8(txt))

    txt = HTMLParser.HTMLParser().unescape(txt)
    txt = txt.replace("&amp;", "&")
    log(repr(txt), 5)
    return txt 

Example 10

def replaceHTMLEntity(t):
    """Helper parser action to replace common HTML entities with their special characters"""
    return _htmlEntityMap.get(t.entity)

# it's easy to get these comment structures wrong - they're very common, so may as well make them available 

Example 11

def replace_html_char_entity(htmlstr):
	pass



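## Restore common HTML character entities.
# Replace the special character entities in HTML with their normal characters.
# You can add new entities to CHAR_ENTITIES to handle more HTML character entities.
# @param htmlstr The HTML string.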

Example 12

def replaceHTMLCodes(txt):
    txt = re.sub("(&#[0-9]+)([^;^0-9]+)", "\\1;\\2", txt)
    txt = HTMLParser.HTMLParser().unescape(txt)
    txt = txt.replace("&quot;", "\"")
    txt = txt.replace("&amp;", "&")
    txt = txt.strip()
    return txt 

Example 13

def replaceHTMLEntity(t):
    """Helper parser action to replace common HTML entities with their special characters"""
    return _htmlEntityMap.get(t.entity)

# it's easy to get these comment structures wrong - they're very common, so may as well make them available 

Example 14

def replaceSmartHTMLEntities(self, s):
        """
        Replaces well known "smart" HTML entities with ASCII characters (mainly aimed at smartquotes)
        """
        ENTITIES = {
            "180":  "'", # spacing acute
            "8211": "-", # endash
            "8212": "--", # emdash
            "8216": "'", # left single quote
            "8217": "'", # right single quote
            "8218": ",", # single low quote (comma)
            "8220": "\"", # left double quotes
            "8221": "\"", # right double quotes
            "8222": ",,", # double low quote (comma comma)
            "8226": "*", # bullet
            "8230": "...", # ellipsis
            "8242": "'", # prime (stopwatch)
            "8243": "\"", # double prime,
            "10003": "/", # check
            "10004": "/", # heavy check
            "10005": "x", # multiplication x
            "10006": "x", # heavy multiplication x
            "10007": "x", # ballot x
            "10008": "x"  # heavy ballot x
        }
        for k, v in ENTITIES.items():
            s = s.replace("&#" + k + ";", v)
        return s 

Example 15

def replace_html(s):
    s = s.replace('&quot;','"')
    s = s.replace('&amp;','&')
    s = s.replace('&lt;','<')
    s = s.replace('&gt;','>')
    s = s.replace('&nbsp;',' ')
    s = s.replace("&ldquo;", "“")
    s = s.replace("&rdquo;", "”")
    s = s.replace("&mdash;","")
    s = s.replace("\xa0", " ")
    return s

Example 16

def replace_html(s):
    s = s.replace('&quot;','"')
    s = s.replace('&amp;','&')
    s = s.replace('&lt;','<')
    s = s.replace('&gt;','>')
    s = s.replace('&nbsp;',' ')
    s = s.replace("&ldquo;", "")
    s = s.replace("&rdquo;", "")
    s = s.replace("&mdash;","")
    s = s.replace("\xa0", " ")
    return s

Example 17

def replace_html_content():
    for html_path in templates_dir.glob('**/*.html'):
        with html_path.open('r') as html_file:
            index_content = html_file.read()

        index_content = re.sub('<title>.*</title>', '<title>{{ title }}</title>', index_content)
        index_content = re.sub('src="\\.(/dist)', 'src="{{ url_prefix }}', index_content)
        index_content = re.sub('href="\\.(/dist)', 'href="{{ url_prefix }}', index_content)
        index_content = re.sub('src="\\.', 'src="{{ url_prefix }}', index_content)
        index_content = re.sub('href="\\.', 'href="{{ url_prefix }}', index_content)
        index_content = re.sub('https://petstore.swagger.io/v[1-9]/swagger.json',
                               '{{ config_url }}', index_content)

        with html_path.open('w') as html_file:
            html_file.write(index_content) 

Example 18

def replace_html(s):
    s = s.replace('&quot;','"')
    s = s.replace('&amp;','&')
    s = s.replace('&lt;','<')
    s = s.replace('&gt;','>')
    s = s.replace('&nbsp;',' ')
    s = s.replace("&ldquo;", "“")
    s = s.replace("&rdquo;", "”")
    s = s.replace("&mdash;","")
    s = s.replace("\xa0", " ")
    return s

Example 19

def replaceHTMLEntity(t):
    """Helper parser action to replace common HTML entities with their special characters"""
    return _htmlEntityMap.get(t.entity)

# it's easy to get these comment structures wrong - they're very common, so may as well make them available 

Example 20

def replaceHTMLEntity(t):
    """Helper parser action to replace common HTML entities with their special characters"""
    return _htmlEntityMap.get(t.entity)

# it's easy to get these comment structures wrong - they're very common, so may as well make them available 

Example 21

def replace_html_chars(to_be_replaced):
    return to_be_replaced.replace("\n", "")\
                         .replace("{", "{{")\
                         .replace("}", "}}")\
                         .replace("”","\"")\
                         .replace("“","\"") 

Example 22

def replace_html(data):
    if isinstance(data, dict):
        return dict([(replace_html(k), replace_html(v)) for k, v in data.items()])
    elif isinstance(data, list):
        return [replace_html(l) for l in data]
    elif isinstance(data, str) or isinstance(data, unicode):
        return _replace_str_html(data)
    else:
        return data 

Example 23

def replaceHTMLCodes(txt):
    log(repr(txt), 5)

    # Fix missing ; in &#;
    txt = re.sub("(&#[0-9]+)([^;^0-9]+)", "\\1;\\2", makeUTF8(txt))

    txt = HTMLParser.HTMLParser().unescape(txt)
    txt = txt.replace("&amp;", "&")
    log(repr(txt), 5)
    return txt 

Example 24

def replaceHTMLEntity(t):
    return t.entity in _htmlEntityMap and _htmlEntityMap[t.entity] or None


# it's easy to get these comment structures wrong - they're very common, so may as well make them available 

Example 25

def replaceHTMLEntity(t):
    """Helper parser action to replace common HTML entities with their special characters"""
    return _htmlEntityMap.get(t.entity)

# it's easy to get these comment structures wrong - they're very common, so may as well make them available 

Example 26

def replace_html_main(self, template, html_main):
        if not isinstance(template, Template):
            template = self.app.template(template)

        if template:
            if html_main:
                html_main = template.replace(self.html_main_key, html_main)
            else:
                html_main = template

        return Template(html_main)

    # Template rendering

Example 27

def replaceHTMLEntity(t):
    """Helper parser action to replace common HTML entities with their special characters"""
    return _htmlEntityMap.get(t.entity)

# it's easy to get these comment structures wrong - they're very common, so may as well make them available 

Example 28

def replaceHTML(self, text, matches):
        if matches:
            for match in matches:
                text = text.replace(HTML_REPLACER, match, 1)
        return text 

Example 29

def replaceHTMLCodes(text):
	text = re.sub("(&#[0-9]+)([^;^0-9]+)", "\\1;\\2", text)
	text = HTMLParser().unescape(text)
	text = text.replace("&quot;", "\"")
	text = text.replace("&amp;", "&")
	text = text.replace("%2B", "+")
	text = text.replace("\/", "/")
	text = text.replace("\\", "")
	text = text.strip()
	return text 
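
Several of the examples above call HTMLParser.HTMLParser().unescape, which is the Python 2 spelling; on Python 3 the module is html.parser, and the unescape method was removed in Python 3.9. A rough Python 3 equivalent, as a sketch, can rely on html.unescape from the standard library. The function name below is illustrative and is not taken from any of the quoted projects.

```python
import html
import re


def replace_html_codes(txt):
    """Decode HTML entities in txt and strip surrounding whitespace."""
    # Add a missing ";" after numeric references such as "&#39" before decoding.
    txt = re.sub(r"(&#[0-9]+)([^;^0-9]+)", r"\1;\2", txt)
    # html.unescape handles named and numeric entities (&amp;, &quot;, &#39;, ...).
    return html.unescape(txt).strip()


print(replace_html_codes(" Tom &amp; Jerry &quot;live&quot; "))
print(replace_html_codes("it&#39s"))
```

Because html.unescape already decodes &quot; and &amp;, the extra replace() calls seen in the examples are only needed for input that was escaped twice.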

How do you replace text in HTML in Python?

If the text and the replacement string are simple, use str.replace().

How do you change the HTML element in Python?

Take the HTML document.
Find every occurrence of the 'img' tag.
Take each tag's 'src' attribute.
Pass the URL found to processing.
Change the 'src' attribute to the new one.
Do all of this with Python 2.7.
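
The steps above could be sketched with Beautiful Soup roughly as follows. The helper name rewrite_img_sources, the process_url callback and the cdn.example.com prefix are hypothetical, and the code targets Python 3 rather than 2.7.

```python
from bs4 import BeautifulSoup


def rewrite_img_sources(html_text, process_url):
    """Return html_text with every <img> src passed through process_url."""
    soup = BeautifulSoup(html_text, "html.parser")
    # src=True skips <img> tags that have no src attribute at all.
    for img in soup.find_all("img", src=True):
        img["src"] = process_url(img["src"])
    return str(soup)


# Hypothetical processing step: prefix each URL with a CDN host.
html_text = '<p><img src="a.png"/><img alt="no source"/></p>'
result = rewrite_img_sources(html_text, lambda url: "https://cdn.example.com/" + url)
print(result)
```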

How do you replace a code in Python?

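Syntax of replace():
Syntax: string.replace(old, new, count)
Parameters: old is the substring to search for, new is the replacement, and count (optional) limits how many occurrences are replaced.
Return value: a copy of the string in which occurrences of old are replaced with new (all of them, or only the first count if count is given).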
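
As a small illustration of the syntax above, the sample sentence below is made up for the example:

```python
text = "Name met Name and Name."

# Without count, every occurrence is replaced.
replaced_all = text.replace("Name", "Id")

# With count=1, only the first occurrence is replaced.
replaced_first = text.replace("Name", "Id", 1)

print(replaced_all)    # Id met Id and Id.
print(replaced_first)  # Id met Name and Name.
print(text)            # the original string is unchanged
```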

How do I find and replace a pattern in Python?

To replace text that matches a pattern, use re.sub() from Python's standard re module (import re first). It searches the string for the pattern and returns a new string in which each match is replaced by the given replacement.
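
A minimal sketch of re.sub(), using a made-up sentence; the word-boundary pattern and the IGNORECASE flag are choices for this example only:

```python
import re

text = "Name is a fine man. name agrees."

# \b restricts the match to whole words; IGNORECASE also matches "name".
everywhere = re.sub(r"\bname\b", "Id", text, flags=re.IGNORECASE)

# count=1 stops after the first match.
first_only = re.sub(r"\bname\b", "Id", text, count=1, flags=re.IGNORECASE)

print(everywhere)  # Id is a fine man. Id agrees.
print(first_only)  # Id is a fine man. name agrees.
```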