Guide: removing HTML tags with BeautifulSoup

The expected result is:

Signal et Communication
Ingénierie Réseaux et Télécommunications

Here is the source code:

#!/usr/bin/env python3
from bs4 import BeautifulSoup

text = '''
'''
soup = BeautifulSoup(text)
print(soup.get_text())
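
As a sketch of the same idea with concrete input: the markup below is an assumption (the original HTML is not reproduced here), with two script tags placed around the two expected lines, and the helper name strip_scripts_and_tags is hypothetical. It removes script and style elements explicitly before calling get_text(), so the result should not depend on how a given bs4 version treats script contents.

```python
from bs4 import BeautifulSoup

# Assumed sample markup; the real page is not reproduced here.
SAMPLE_HTML = """
<td>
  <script type="text/javascript">var a = 1;</script>
  <a href="#">Signal et Communication</a>
  <br/>
  <script type="text/javascript">var b = 2;</script>
  <a href="#">Ingénierie Réseaux et Télécommunications</a>
</td>
"""


def strip_scripts_and_tags(html):
    """Return the visible text of `html`, one line per non-empty text node."""
    soup = BeautifulSoup(html, "html.parser")
    # Remove script/style elements together with their contents.
    for tag in soup(["script", "style"]):
        tag.decompose()
    # strip=True trims each text node and drops whitespace-only ones.
    return soup.get_text("\n", strip=True)


if __name__ == "__main__":
    print(strip_scripts_and_tags(SAMPLE_HTML))
```

Passing "\n" as the separator keeps each link text on its own line, which matches the two-line expected result.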

answered Jul 20, 2015 at 16:37

SparkAndShine

You can use the decompose method in bs4:

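import bs4

soup = bs4.BeautifulSoup('<a href="http://example.com/">I linked to <i>example.com</i></a>', 'html.parser')

# Copy the children into a list first, since decompose() modifies the tree.
for child in list(soup.find('a').children):
    if isinstance(child, bs4.element.Tag):
        child.decompose()

print(soup)

Out: <a href="http://example.com/">I linked to </a>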

answered Oct 17, 2013 at 22:37

danblack

The following code returns the contents as plain text instead of HTML.

Here 'html_text' is the HTML string you want to extract the text from:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html_text, 'lxml')
text = soup.get_text()
print(text)
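
Since the description talks about passing a string to a function, here is a minimal sketch of such a wrapper. The name html_to_text and the sample input are assumptions for illustration. It uses the built-in html.parser so that it runs without lxml installed; the two parsers can differ on malformed markup.

```python
from bs4 import BeautifulSoup


def html_to_text(html_text):
    """Parse an HTML string and return its text with all tags removed."""
    soup = BeautifulSoup(html_text, "html.parser")
    return soup.get_text()


if __name__ == "__main__":
    # Hypothetical input, not taken from the original answer.
    print(html_to_text("<p>Hello <b>world</b>!</p>"))
```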

answered May 18, 2020 at 8:53

This looks like the simplest way to do it.

The line below joins together all the text nodes within the current element:

''.join(htmlelement.find_all(string=True))
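
To illustrate the join, here is a small sketch with assumed markup (the div and its id are made up). find_all(string=True) collects every text node under the element in document order, and for a plain element like this one the joined result should match get_text().

```python
from bs4 import BeautifulSoup

# Assumed markup for illustration only.
soup = BeautifulSoup(
    '<div id="box">one <b>two</b> <i>three</i></div>', "html.parser"
)
htmlelement = soup.find("div", id="box")

# Every text node below the element, including whitespace-only ones.
parts = htmlelement.find_all(string=True)
joined = "".join(parts)

print(parts)
print(joined)
```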

answered Apr 25, 2013 at 4:46

Daniele B

Here is code that fetches a page and prints the text found at the URL:

import bs4
import requests

URL = ''
page = requests.get(URL)
text = bs4.BeautifulSoup(page.content, 'html.parser').get_text()
print(text)

answered Mar 10, 2020 at 15:08

I am crawling data from a website. This website has code like this:


    Tag b: 
    Hello 
     world!

This is what I tried:

new_data = data.find("span",{"class":"demo-span"})
print(new_data.get_text())

Expected output:

Hello world!

But the actual output is:

Tag b: Hello world!

asked Jun 12, 2018 at 7:32

You can use decompose() to delete a tag.

from bs4 import BeautifulSoup

html = '''

    Tag b: 
    Hello 
     world!
'''

soup = BeautifulSoup(html, 'html.parser')

new_data = soup.find("span", {"class": "demo-span"})
new_data.b.decompose()
print(new_data.get_text(' ', strip=True))
# Hello world!
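
A self-contained sketch of the same approach follows. The markup is an assumption: only the span class and the b child are implied by the selectors used here, and the nested a element around "world!" is made up to show why the separator argument is useful. It prints the text before and after the b tag is removed.

```python
from bs4 import BeautifulSoup

# Assumed markup: only the span class and the <b> child are implied by the
# selectors; the rest is made up for illustration.
html = """
<span class="demo-span">
    <b>Tag b: </b>
    Hello
    <a href="#">world!</a>
</span>
"""

soup = BeautifulSoup(html, "html.parser")
span = soup.find("span", class_="demo-span")

before = span.get_text(" ", strip=True)

# decompose() removes the <b> tag and its text from the tree in place.
if span.b is not None:
    span.b.decompose()

after = span.get_text(" ", strip=True)

print(before)
print(after)
```

The guard on span.b avoids an AttributeError when the element has no b child.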

answered Jun 12, 2018 at 8:00

Keyur Potdar

How can I simply strip all tags from an element I find in BeautifulSoup?

Hugo

asked Apr 25, 2013 at 4:26

Daniele B

With BeautifulStoneSoup gone in bs4, it's even simpler in Python 3:

from bs4 import BeautifulSoup

# html is the HTML string to strip; naming the parser avoids a bs4 warning
soup = BeautifulSoup(html, 'html.parser')
text = soup.get_text()
print(text)

Hugo

answered Jan 27, 2015 at 2:47

answered Apr 29, 2014 at 0:40

Bobby

Use get_text(): it returns all the text in a document or beneath a tag as a single Unicode string.

For instance, remove all the different script tags from the following text:

Signal et Communication
Ingénierie Réseaux et Télécommunications