Guide: removing HTML tags with BeautifulSoup

Question

The expected result is:

Signal et Communication
Ingénierie Réseaux et Télécommunications

Here is the source code:

#!/usr/bin/env python3
from bs4 import BeautifulSoup

text = '''


'''
soup = BeautifulSoup(text, 'html.parser')  # name the parser explicitly to avoid the bs4 warning

print(soup.get_text())
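
As a runnable illustration of get_text(), here is a minimal sketch. The markup is a hypothetical stand-in (the HTML of the snippet above did not survive), and the href values are placeholders. The separator and strip arguments are optional; they are used here so that each link's text lands on its own line.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two links separated by a line break inside a table cell.
markup = (
    '<td><a href="#">Signal et Communication</a>'
    '<br/><a href="#">Ingénierie Réseaux et Télécommunications</a></td>'
)

soup = BeautifulSoup(markup, "html.parser")

# Join the text fragments with a newline and trim whitespace around each one.
text = soup.get_text(separator="\n", strip=True)
print(text)
```

With this input the two link texts are printed on separate lines, with no tags left.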

answered Jul 20, 2015 at 16:37

SparkAndShine

You can use the decompose method in bs4:

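import bs4

soup = bs4.BeautifulSoup('<a href="http://example.com/">I linked to <i>example.com</i></a>', 'html.parser')

# Copy the children into a list first, because decompose() modifies the tree
for child in list(soup.find('a').children):
    if isinstance(child, bs4.element.Tag):
        child.decompose()

print(soup)

Out: <a href="http://example.com/">I linked to </a>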
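
The same decompose() call can also be applied to every tag of a given name. This is only a sketch with made-up markup; the tag name "b" is an arbitrary choice for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration.
markup = "<p>Read the <b>bold</b> and <i>italic</i> parts, then <b>stop</b>.</p>"
soup = BeautifulSoup(markup, "html.parser")

# find_all() returns a list-like result, so removing tags inside the loop is safe.
for bold in soup.find_all("b"):
    bold.decompose()  # removes the tag together with its contents

remaining = soup.get_text()
print(soup)
```

Note that decompose() drops the tag and everything inside it, so the words "bold" and "stop" disappear while the <i> tag and its text stay.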

answered Oct 17, 2013 at 22:37

danblack

To get the contents as plain text instead of HTML, pass the markup string (html_text below) to BeautifulSoup and call get_text(). The 'lxml' parser requires the lxml package to be installed.

from bs4 import BeautifulSoup

soup = BeautifulSoup(html_text, 'lxml')
text = soup.get_text()
print(text)

answered May 18, 2020 at 8:53

It looks like this is the way to do it, and it is as simple as that.

This line joins all the text parts within the current element:

''.join(htmlelement.find_all(string=True))
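
A small sketch of joining an element's text nodes, assuming bs4 4.4.0 or later (where the string= argument is available). The markup and variable names are hypothetical. For an element without comments, the join gives the same result as get_text().

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration.
markup = "<div>Hello <b>big</b> <i>world</i>!</div>"
element = BeautifulSoup(markup, "html.parser").div

# find_all(string=True) collects every text node under the element.
joined = "".join(element.find_all(string=True))

# get_text() performs the same concatenation in one call.
same = element.get_text()

# stripped_strings yields each non-empty fragment with whitespace trimmed.
words = list(element.stripped_strings)

print(joined)
print(words)
```

Calling find() instead of find_all() would return only the first text node, so the join would not see the rest.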

answered Apr 25, 2013 at 4:46

Daniele B

Here is the source code. It fetches a page and extracts exactly the text found at the URL:

import bs4
import requests

URL = ''
page = requests.get(URL)
text = bs4.BeautifulSoup(page.content, 'html.parser').get_text()
print(text)

answered Mar 10, 2020 at 15:08

I am crawling data from a website. This website has code like this:

<span class="demo-span">
    <b>Tag b: </b>
    Hello 
     world!
</span>

This is what I tried:

new_data = data.find("span",{"class":"demo-span"})
print(new_data.get_text())

Expected output:

Hello world!

But the actual output is:

Tag b: Hello world!

asked Jun 12, 2018 at 7:32

You can use decompose() to delete a tag.

from bs4 import BeautifulSoup

html = '''

    Tag b: 
    Hello 
     world!
'''

soup = BeautifulSoup(html, 'html.parser')

new_data = soup.find("span", {"class": "demo-span"})
new_data.b.decompose()
print(new_data.get_text(' ', strip=True))
# Hello world!
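
If you would rather not modify the tree, one alternative is to read only the strings that are direct children of the span. This is a sketch under assumptions: the markup is a reconstruction (the original tags were lost, so the exact structure is a guess based on the code above), and the whitespace normalization at the end is an addition for tidy output.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Assumed markup: a span whose label sits in a <b> tag, followed by plain text.
html = """
<span class="demo-span">
    <b>Tag b: </b>
    Hello
     world!
</span>
"""

soup = BeautifulSoup(html, "html.parser")
span = soup.find("span", class_="demo-span")

# Keep only text nodes that sit directly under the span, skipping child tags.
direct = [child for child in span.children if isinstance(child, NavigableString)]

# Collapse runs of whitespace (including newlines) into single spaces.
text = " ".join("".join(direct).split())
print(text)
```

Unlike decompose(), this leaves the <b> tag in place, which helps if you still need the label later.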

answered Jun 12, 2018 at 8:00

Keyur Potdar

How can I simply strip all tags from an element I find in BeautifulSoup?

Hugo

asked Apr 25, 2013 at 4:26

Daniele B

With BeautifulStoneSoup gone in bs4, it's even simpler in Python 3:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')  # html is the markup string to clean
text = soup.get_text()
print(text)
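
Since the question is about an element you have already found, here is a minimal sketch of calling get_text() on a single tag rather than on the whole document. The markup and the id "intro" are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two paragraphs; only the first one is wanted.
html = (
    '<div><p id="intro">Some <b>bold</b> and <a href="#">linked</a> text.</p>'
    "<p>Other</p></div>"
)

soup = BeautifulSoup(html, "html.parser")
element = soup.find("p", id="intro")

# get_text() on a tag returns only the text beneath that tag.
text = element.get_text()
print(text)
```

The nested <b> and <a> tags are dropped, their text is kept, and the second paragraph is not included.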

Hugo

answered Jan 27, 2015 at 2:47

answered Apr 29, 2014 at 0:40

Bobby

Use get_text(). It returns all the text in a document or beneath a tag as a single Unicode string.

For instance, to remove all the different tags from the following text:

Signal et Communication
Ingénierie Réseaux et Télécommunications
