I am trying to remove words from a string if they match a list.
x = "How I Met Your Mother 7x17 [HDTV-LOL] [VTV] - Mon, 20 Feb 2012"
tags = ['HDTV', 'LOL', 'VTV', 'x264', 'DIMENSION', 'XviD', '720P', 'IMMERSE']
print(x)
for tag in tags:
    if tag in x:
        print(x.replace(tag, ''))
It produces this output:
How I Met Your Mother 7x17 [HDTV-LOL] [VTV] - Mon, 20 Feb 2012
How I Met Your Mother 7x17 [-LOL] [VTV] - Mon, 20 Feb 2012
How I Met Your Mother 7x17 [HDTV-] [VTV] - Mon, 20 Feb 2012
How I Met Your Mother 7x17 [HDTV-LOL] [] - Mon, 20 Feb 2012
I want it to remove all the words matching the list.
asked Feb 22, 2012 at 14:01
You are not keeping the result of x.replace(). Try the following instead:
for tag in tags:
    x = x.replace(tag, '')
print(x)
Note that your approach matches any substring, and not just full words. For example, it would remove the LOL in RUN LOLA RUN.
One way to address this would be to enclose each tag in a pair of r'\b' strings, and look for the resulting regular expression. The r'\b' would only match at word boundaries:
import re
for tag in tags:
    x = re.sub(r'\b' + tag + r'\b', '', x)
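As a minimal sketch of the same word-boundary idea, the tags can also be combined into a single compiled pattern. The names tag_pattern and strip_tags are hypothetical, and re.escape is added here as a precaution in case a tag ever contains regex metacharacters:

```python
import re

tags = ['HDTV', 'LOL', 'VTV', 'x264', 'DIMENSION', 'XviD', '720P', 'IMMERSE']

# One alternation wrapped in word boundaries: \b(?:HDTV|LOL|...)\b
# re.escape guards against tags that contain regex metacharacters.
tag_pattern = re.compile(r'\b(?:' + '|'.join(re.escape(t) for t in tags) + r')\b')

def strip_tags(text):
    """Remove every whole-word occurrence of a tag from text (hypothetical helper)."""
    return tag_pattern.sub('', text)

print(strip_tags("How I Met Your Mother 7x17 [HDTV-LOL] [VTV] - Mon, 20 Feb 2012"))
# How I Met Your Mother 7x17 [-] [] - Mon, 20 Feb 2012
print(strip_tags("RUN LOLA RUN"))
# RUN LOLA RUN
```

The second call leaves the title alone because LOL is followed by a letter there, so no word boundary is found after it.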
answered Feb 22, 2012 at 14:03
NPE
The method str.replace() does not change the string in place; strings are immutable in Python. You have to bind x to the new string returned by replace() in each iteration:
for tag in tags:
    x = x.replace(tag, "")
Note that the if statement is redundant; str.replace() won't do anything if it doesn't find a match.
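To illustrate, here is a small sketch that wraps the loop in a function (remove_tags is a hypothetical name) and calls it with a tag that does not occur in the string:

```python
def remove_tags(text, tags):
    """Return text with every occurrence of each tag removed (hypothetical helper)."""
    for tag in tags:
        # No membership test needed: replace() returns an equal string
        # when the tag is absent.
        text = text.replace(tag, "")
    return text

x = "How I Met Your Mother 7x17 [HDTV-LOL] [VTV] - Mon, 20 Feb 2012"

print(remove_tags(x, ['x264']) == x)           # True
print(remove_tags(x, ['HDTV', 'LOL', 'VTV']))  # How I Met Your Mother 7x17 [-] [] - Mon, 20 Feb 2012
```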
answered Feb 22, 2012 at 14:04
Sven Marnach
Using your variables tags and x, you can use this:
from functools import reduce  # required on Python 3, where reduce is no longer a builtin
output = reduce(lambda a, b: a.replace(b, ''), tags, x)
returns:
'How I Met Your Mother 7x17 [-] [] - Mon, 20 Feb 2012'
answered Feb 22, 2012 at 14:11
eumiro
(1) x.replace(tag, '') does not modify x, but rather returns a new string with the replacement.
(2) Why are you printing on each iteration?
The simplest modification you could do would be:
for tag in tags:
    x = x.replace(tag, '')
answered Feb 22, 2012 at 14:06
Marcin
Introduction
Replacing all or n occurrences of a substring in a given string is a fairly common problem of string manipulation and text processing in general. Luckily, most of these tasks are made easy in Python by its vast array of built-in functions, including this one.
Let's say we have a string that contains the following sentence:
The brown-eyed man drives a brown car.
Our goal is to replace the word "brown" with the word "blue":
The blue-eyed man drives a blue car.
In this article, we'll be using the replace() function as well as the sub() and subn() functions with patterns to replace all occurrences of a substring in a string.
replace()
The simplest way to do this is by using the built-in string method replace():
string.replace(oldStr, newStr, count)
The first two parameters are required, while the third one is optional. oldStr is the substring we want to replace with newStr. What's worth noting is that the method returns a new string with the transformation applied, without affecting the original one.
Let's give it a try:
string_a = "The brown-eyed man drives a brown car."
string_b = string_a.replace("brown", "blue")
print(string_a)
print(string_b)
We've performed the operation on string_a, stored the result in string_b and printed them both.
This code results in:
The brown-eyed man drives a brown car.
The blue-eyed man drives a blue car.
Again, the string in memory that string_a is pointing to remains unchanged. Strings in Python are immutable, which simply means you can't change a string. However, you can re-assign the reference variable to a new value.
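A quick way to see immutability in action is to try changing a single character. This is only an illustrative sketch; the mutated flag is a hypothetical name used to record what happened:

```python
string_a = "The brown-eyed man drives a brown car."

try:
    # Item assignment is not supported on str objects.
    string_a[4] = "B"
    mutated = True
except TypeError:
    mutated = False

print(mutated)   # False
print(string_a)  # The brown-eyed man drives a brown car.
```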
To seemingly perform this operation in-place, we can simply re-assign string_a to itself after the operation:
string_a = string_a.replace("brown", "blue")
print(string_a)
Here, the new string generated by the replace() method is assigned to the string_a variable.
Replace n Occurrences of a Substring
Now, what if we don't wish to change all occurrences of a substring? What if we want to replace the first n?
That's where the third parameter of the replace() function comes in. It represents the maximum number of occurrences that are going to be replaced, counting from the start of the string. The following code only replaces the first occurrence of the word "brown" with the word "blue":
string_a = "The brown-eyed man drives a brown car."
string_a = string_a.replace("brown", "blue", 1)
print(string_a)
And this prints:
The blue-eyed man drives a brown car.
By default, the third parameter is set to change all occurrences.
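As a short sketch of how different count values behave (the text variable is just sample data for this illustration), a count of 0 replaces nothing, while -1 behaves like omitting the argument:

```python
text = "brown brown brown brown"

first_two = text.replace("brown", "blue", 2)
none_replaced = text.replace("brown", "blue", 0)
all_explicit = text.replace("brown", "blue", -1)
all_default = text.replace("brown", "blue")

print(first_two)      # blue blue brown brown
print(none_replaced)  # brown brown brown brown
print(all_explicit)   # blue blue blue blue
print(all_default)    # blue blue blue blue
```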
Substring Occurrences with Regular Expressions
To escalate the problem even further, let's say we want to not only replace all occurrences of a certain substring, but replace all substrings that fit a certain pattern. Even this can be done with a one-liner, using regular expressions and the standard library's re module.
Regular expressions are a complex topic with a wide range of use in computer science, so we won't go too much in-depth in this article but if you need a quick start you can check out our guide on Regular Expressions in Python.
In its essence, a regular expression defines a pattern. For example, let's say we have a text about people who own cats and dogs, and we want to replace both terms with the word "pet". First, we need to define a pattern that matches both terms, such as (cat|dog).
Using the sub() Function
With the pattern sorted out, we're going to use the re.sub() function, which has the following syntax:
re.sub(pattern, repl, string, count, flags)
The first argument is the pattern we're searching for (a string or a Pattern object), repl is what we're going to insert (can be a string or a function; if it is a string, any backslash escapes in it are processed) and string is the string we're searching in.
The optional arguments are count and flags, which indicate how many occurrences need to be replaced and the flags used to process the regular expression, respectively.
If the pattern doesn't match any substring, the original string will be returned unchanged:
import re
string_a = re.sub(r'(cat|dog)', 'pet', "Mark owns a dog and Mary owns a cat.")
print(string_a)
This code prints:
Mark owns a pet and Mary owns a pet.
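Since repl can also be a function, here is a minimal sketch of that variant. The replacements mapping and the pick_replacement name are hypothetical and exist only for this illustration; the function receives a match object and returns the text to insert:

```python
import re

# Hypothetical mapping used only for this illustration.
replacements = {"cat": "feline", "dog": "canine"}

def pick_replacement(match):
    # match.group(0) is the exact text the pattern matched.
    return replacements[match.group(0)]

string_a = re.sub(r'(cat|dog)', pick_replacement, "Mark owns a dog and Mary owns a cat.")
print(string_a)  # Mark owns a canine and Mary owns a feline.
```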
Case-Insensitive Pattern Matching
To perform case-insensitive pattern matching, for example, we'll set the flags parameter to re.IGNORECASE:
import re
string_a = re.sub(r'(cats|dogs)', "Pets", "DoGs are a man's best friend", flags=re.IGNORECASE)
print(string_a)
Now any case combination of "dogs" will also match. When matching the pattern against multiple strings, we can define a Pattern object to avoid repeating the pattern in several places. Pattern objects also have a sub() method with the syntax:
Pattern.sub(repl, string, count)
Using Pattern Objects
Let's define a Pattern for cats and dogs and check a couple of sentences:
import re
pattern = re.compile(r'(Cats|Dogs)')
string_a = pattern.sub("Pets", "Dogs are a man's best friend.")
string_b = pattern.sub("Animals", "Cats enjoy sleeping.")
print(string_a)
print(string_b)
Which gives us the output:
Pets are a man's best friend.
Animals enjoy sleeping.
The subn() Function
There's also a subn() function with the syntax:
re.subn(pattern, repl, string, count, flags)
The subn() function returns a tuple containing the new string and the number of substitutions made:
import re
string_a = re.subn(r'(cats|dogs)', 'Pets', "DoGs are a mans best friend", flags=re.IGNORECASE)
print(string_a)
The tuple looks like:
('Pets are a mans best friend', 1)
A Pattern object contains a similar subn() method:
Pattern.subn(repl, string, count)
And it's used in a very similar way:
import re
pattern = re.compile(r'(Cats|Dogs)')
string_a = pattern.subn("Pets", "Dogs are a man's best friend.")
string_b = pattern.subn("Animals", "Cats enjoy sleeping.")
print(string_a)
print(string_b)
This results in:
("Pets are a man's best friend.", 1)
('Animals enjoy sleeping.', 1)
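The count in the returned tuple is handy for checking whether anything was replaced at all. The sketch below unpacks the tuple directly; describe_substitution and the sample sentences are hypothetical and used only for illustration:

```python
import re

pattern = re.compile(r'(Cats|Dogs)')

def describe_substitution(sentence):
    """Replace matches with "Pets" and report how many were made (hypothetical helper)."""
    new_sentence, n_subs = pattern.subn("Pets", sentence)
    if n_subs:
        return f"{n_subs} replacement(s): {new_sentence}"
    return f"no match: {sentence}"

print(describe_substitution("Dogs chase Cats."))      # 2 replacement(s): Pets chase Pets.
print(describe_substitution("Birds enjoy singing."))  # no match: Birds enjoy singing.
```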
Conclusion
Python offers easy and simple functions for string handling. The easiest way to replace all occurrences of a given substring in a string is to use the replace() method.
If needed, the standard library's re module provides a more diverse toolset that can be used for more niche problems like finding patterns and case-insensitive searches.