How do I remove duplicate words from a string in Python?

Consider the following example:

string1 = "calvin klein design dress calvin klein"

How can I remove the second occurrences of "calvin" and "klein"?

The result should look like

string2 = "calvin klein design dress"

Only the repeated occurrences should be removed, and the order of the words should not change.

Martin Thoma

asked Oct 17, 2011 at 13:08

string1 = "calvin klein design dress calvin klein"
words = string1.split()
print(" ".join(sorted(set(words), key=words.index)))

This sorts the set of all the (unique) words in your string by the word's index in the original list of words.
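As a small illustration of why this preserves order: `list.index` returns the position of a word's first occurrence, so sorting the unique words by that position puts them back in their original sequence. The helper name `dedupe_by_first_index` is only an assumption made for this sketch. Each `index` lookup scans the list, so this is probably best suited to short strings.

```python
def dedupe_by_first_index(text):
    """Remove repeated words, keeping each word's first occurrence in place."""
    words = text.split()
    # words.index(w) is the position of the first occurrence of w,
    # so sorting the unique words by it restores the original order.
    return " ".join(sorted(set(words), key=words.index))


words = "calvin klein design dress calvin klein".split()
# Map each unique word to the position where it first appears.
first_positions = {w: words.index(w) for w in set(words)}
result = dedupe_by_first_index("calvin klein design dress calvin klein")
print(result)
```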

answered Oct 17, 2011 at 13:40

Markus

def unique_list(l):
    ulist = []
    # keep only the first occurrence of each item
    for x in l:
        if x not in ulist:
            ulist.append(x)
    return ulist

a = "calvin klein design dress calvin klein"
a = ' '.join(unique_list(a.split()))

answered Oct 17, 2011 at 13:12

spicavigo

In Python 2.7+, you could use collections.OrderedDict for this:

from collections import OrderedDict
s = "calvin klein design dress calvin klein"
print(' '.join(OrderedDict((w, w) for w in s.split()).keys()))
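For readers on Python 3.7 or later, where plain dicts preserve insertion order, the same idea can be sketched without `OrderedDict`. This is a minimal variant under that version assumption, not part of the original answer:

```python
s = "calvin klein design dress calvin klein"

# dict.fromkeys keeps one key per distinct word, in first-seen order
# (insertion order is guaranteed for dicts from Python 3.7 on).
deduped = " ".join(dict.fromkeys(s.split()))
print(deduped)
```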

answered Oct 17, 2011 at 13:21

NPE

Cut and paste from the itertools recipes

from itertools import filterfalse  # named ifilterfalse in Python 2

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element

I really wish they could go ahead and make a module out of those recipes soon. I'd very much like to be able to do from itertools_recipes import unique_everseen instead of using cut-and-paste every time I need something.

Use like this:

def unique_words(string, ignore_case=False):
    key = None
    if ignore_case:
        key = str.lower
    return " ".join(unique_everseen(string.split(), key=key))

string2 = unique_words(string1)

answered Oct 17, 2011 at 13:22

string2 = ' '.join(set(string1.split()))

Explanation:

.split() - a method that splits a string into a list (without arguments, it splits on whitespace)
set() - an unordered collection type that excludes duplicates
'separator'.join(list) - joins the items of the list into one string, with 'separator' between the elements

Because sets are unordered, this approach does not guarantee that the words keep their original order.
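A short sketch of what the set-based one-liner does and does not guarantee: the duplicates are gone, but the word order of the output is arbitrary, so only the sorted contents can be relied on.

```python
string1 = "calvin klein design dress calvin klein"

unordered = " ".join(set(string1.split()))
print(unordered)  # the word order here is not guaranteed

# Only the *contents* are predictable: every word appears exactly once.
words_once = sorted(unordered.split())
print(words_once)
```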

answered Nov 9, 2018 at 8:33

string = 'calvin klein design dress calvin klein'

def uniquify(string):
    output = []
    seen = set()
    for word in string.split():
        if word not in seen:
            output.append(word)
            seen.add(word)
    return ' '.join(output)

print(uniquify(string))

answered Oct 17, 2011 at 13:27

ekhumoro

You can use a set to keep track of already processed words.

words = set()
result = ''
for word in string1.split():
    if word not in words:
        result = result + word + ' '
        words.add(word)
print(result.rstrip())  # drop the trailing space left by the loop

answered Oct 17, 2011 at 13:10

Pablo Santa Cruz

Several answers are pretty close to this but haven't quite ended up where I did:

def uniques(your_string):
    seen = set()
    # seen.add() returns None, so "seen.add(i) or i" records i and yields it
    return ' '.join(seen.add(i) or i for i in your_string.split() if i not in seen)

Of course, if you want it a tiny bit cleaner or faster, we can refactor a bit:

def uniques(your_string):
    words = your_string.split()

    seen = set()
    seen_add = seen.add

    def add(x):
        seen_add(x)
        return x

    return ' '.join(add(i) for i in words if i not in seen)

I think the second version is about as performant as you can get in a small amount of code. (More code could be used to do all the work in a single scan across the input string, but for most workloads this should be sufficient.)
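To check a performance claim like this on your own data, a rough `timeit` harness along these lines could be used. The function names, the synthetic `sample` input, and the run count are all assumptions chosen for illustration; timings depend on the machine and the input, so no results are claimed here.

```python
import timeit


def uniques_generator(your_string):
    # set-based single pass, as in the first version above
    seen = set()
    return " ".join(seen.add(w) or w for w in your_string.split() if w not in seen)


def uniques_sorted(your_string):
    # sorted(set(...), key=index) approach, for comparison
    words = your_string.split()
    return " ".join(sorted(set(words), key=words.index))


# Hypothetical input: 5000 words drawn from 500 distinct values.
sample = " ".join("word%d" % (i % 500) for i in range(5000))

for func in (uniques_generator, uniques_sorted):
    elapsed = timeit.timeit(lambda: func(sample), number=20)
    print("%s: %.4fs for 20 runs" % (func.__name__, elapsed))
```

Both functions should return the same string for any input, which is worth asserting before comparing their timings.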

answered Oct 17, 2011 at 22:13

Chris Phillips

You can use NumPy's unique function. Import NumPy first; it is conventional to alias the import as np:

import numpy as np

To remove duplicates from an array:

no_duplicates_array = np.unique(your_array)

In your case, if you want the result as a string:

no_duplicates_string = ' '.join(np.unique(your_string.split()))

Note that np.unique returns the unique values in sorted order, so the original word order is not preserved.

answered Jun 8, 2020 at 12:04

Sulman Malik

The following works perfectly:

s = "the sky is blue very blue"
s = s.lower()
slist = s.split()
print(" ".join(sorted(set(slist), key=slist.index)))

answered Apr 17, 2016 at 16:38

Question: Remove the duplicates in a string

from collections import OrderedDict

a = "Gina Gini Gini Protijayi"

aa = OrderedDict.fromkeys(a.split())
print(' '.join(aa))
# output => Gina Gini Protijayi

answered Jun 16, 2018 at 23:44

You can remove duplicate or repeated words from a text file or string using the following code:

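from collections import Counter

import nltk
from nltk.tokenize import word_tokenize

# all_words (an iterable of text lines) and lemmatize_sentence are assumed
# to be defined elsewhere; they are not part of this snippet.
new_data = []
for lines in all_words:
    line = ''.join(lines.lower())
    new_data1 = ' '.join(lemmatize_sentence(line))
    new_data2 = word_tokenize(new_data1)
    new_data3 = nltk.pos_tag(new_data2)

    # below code is for removal of repeated words
    for i in range(0, len(new_data3)):
        # join each (word, tag) pair into a single string
        new_data3[i] = "".join(new_data3[i])
    UniqW = Counter(new_data3)
    new_data5 = " ".join(UniqW.keys())
    print(new_data5)

    new_data.append(new_data5)

print(new_data)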

P.S. Adjust the indentation as required. Hope this helps!

answered Jun 25, 2018 at 7:22

Without using the split function (will help in interviews):

def unique_words2(a):
    words = []
    spaces = ' '
    length = len(a)
    i = 0
    while i < length:
        if a[i] not in spaces:
            word_start = i
            while i < length and a[i] not in spaces:
                i += 1
            words.append(a[word_start:i])
        i += 1
    words_stack = []
    for val in words:  #
        if val not in words_stack:  # We can replace these three lines with this one -> [words_stack.append(val) for val in words if val not in words_stack]
            words_stack.append(val)  #
    print(' '.join(words_stack))  # or return, your choice


unique_words2('calvin klein design dress calvin klein')

Taazar

answered Mar 6, 2020 at 12:06

# initializing list
listA = ['xy-xy', 'pq-qr', 'xp-xp-xp', 'dd-ee']

print("Given list : ", listA)

# using set() and split()
res = [set(sub.split('-')) for sub in listA]

# Result
print("List after duplicate removal :", res)

Peter Csala

answered Oct 17, 2021 at 12:39

You can do this by building the set of the words in the string; a set is a mathematical object that contains no repeated elements by definition. Sorting that set by each word's first index in the original list restores the original order, and it then suffices to join the words back into a string:

def remove_duplicate_words(string):
    x = string.split()
    x = sorted(set(x), key=x.index)
    return ' '.join(x)

answered Nov 9, 2018 at 8:28

What is the easiest way to remove duplicates in Python?

5 Ways to Remove Duplicates from a List in Python:
Method 1: Naïve method.
Method 2: Using a list comprehension.
Method 3: Using set().
Method 4: Using a list comprehension + enumerate().
Method 5: Using collections.OrderedDict.fromkeys().

How do I remove duplicates from a string in Python?

Given a string S, the task is to remove all the duplicates in the given string.
Sort the elements.
Now, in a loop, remove duplicates by comparing the current character with the previous character.
Remove extra characters at the end of the resultant string.
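The sort-and-compare steps above can be sketched in Python roughly as follows. The function name `remove_duplicate_chars_sorted` is an assumption for illustration. Because the characters are sorted first, the result does not keep their original order, and since the result is built as a new list, there are no leftover characters to trim at the end.

```python
def remove_duplicate_chars_sorted(s):
    """Drop repeated characters by sorting and comparing neighbours."""
    result = []
    for ch in sorted(s):
        # After sorting, duplicates are adjacent, so comparing with the
        # previously kept character is enough to detect a repeat.
        if not result or result[-1] != ch:
            result.append(ch)
    return "".join(result)


print(remove_duplicate_chars_sorted("banana"))
```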

How do you find duplicate words in a string in Python?

Python:
string = "big black bug bit a big black dog on his big black nose"
# Converts the string into lowercase.
string = string.lower()
# Split the string into words using built-in function.
words = string.split(" ")
print("Duplicate words in a given string : ")
for i in range(0, len(words)):
    count = 1

How do I remove a duplicate character from a string?

Use the following steps to remove duplicates:
In the first step, convert the string into a character array.
Calculate the size of the array.
Call the removeDuplicates() method, passing the character array and the length.
Traverse all the characters present in the character array.
