
The Way of the Serpent

Pythonic Tips & Tricks: Identifying and Indexing Duplicates

How to Identify and Index Duplicates in a List

Tonichi Edeza

Feb 21·5 min read

Photo by Wolfgang Hasselmann on Unsplash

Detecting and indexing duplicates is a fundamental skill every data scientist should have. When dealing with any dataset, it is important to identify and locate values that are identical. In this article, we shall take a look at several techniques you can use.

Let's get started!

So let's say you were given the following question.

Create a function that checks whether a list has duplicate values. If it does, list them all. If the list has no duplicates, return "No Duplicates".

It is a fairly simple question, so let us see how to code it.

```python
def duplicates(example_list):
    # A set keeps only unique values, so a shorter set means duplicates exist
    return len(example_list) != len(set(example_list))

result_1 = duplicates(['1','2','3','5','5','6','6'])
result_2 = duplicates(['1','2','3','5','6'])
print(result_1)
print(result_2)
```
Output 1

The code above checks whether the input list contains any duplicates. It returns True if the list does have duplicates and False if it does not. However, this only answers half of our question.
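The length comparison always builds a set from the whole list before answering. A minimal alternative sketch, assuming the elements are hashable, stops scanning as soon as a value turns up a second time; `has_duplicates` is a hypothetical name, not one of the article's functions.

```python
def has_duplicates(items):
    """Hypothetical helper: return True as soon as a value appears twice.

    Assumes every element is hashable, since they are stored in a set.
    """
    seen = set()
    for item in items:
        if item in seen:
            return True  # early exit: no need to scan the rest of the list
        seen.add(item)
    return False


print(has_duplicates(['1', '2', '3', '5', '5', '6', '6']))  # True
print(has_duplicates(['1', '2', '3', '5', '6']))            # False
```

On a list with no duplicates both approaches still examine every element, so the early exit only pays off when a repeat appears early.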

Let us find a way for it to return the duplicated values. To do this, we can use an if-else statement combined with a list comprehension.

```python
def duplicates_1(example_list):
    if len(example_list) == len(set(example_list)):
        return 'No Duplicates'
    else:
        # Keep each element that occurs more than once; set() then
        # collapses the repeats so every duplicated value appears once
        return set([i for i in example_list
                    if example_list.count(i) > 1])

result_1 = duplicates_1(['1','2','3','5','5','6','6'])
result_2 = duplicates_1(['1','2','3','5','6'])
print(result_1)
print(result_2)
```
Output 2

Excellent! Our function does exactly what the problem requires. But what if we wanted to extract more data from the list?
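One caveat before extending it: `example_list.count(i)` rescans the entire list for every element, so the comprehension does roughly quadratic work as the list grows. A minimal alternative sketch, assuming hashable elements, counts everything in a single pass with the standard library's `collections.Counter`; `duplicates_1_counter` is a hypothetical name.

```python
from collections import Counter


def duplicates_1_counter(example_list):
    # Hypothetical alternative to duplicates_1; assumes hashable elements.
    # One pass over the list builds a value -> occurrence-count mapping
    counts = Counter(example_list)
    # Keep only the values that occur more than once
    repeated = {value for value, count in counts.items() if count > 1}
    return repeated if repeated else 'No Duplicates'


print(duplicates_1_counter(['1', '2', '3', '5', '5', '6', '6']))  # {'5', '6'} in some order
print(duplicates_1_counter(['1', '2', '3', '5', '6']))            # No Duplicates
```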

Create a function that checks whether a list has duplicate values. If it does, list them all along with their corresponding frequencies. Return the results as a list of tuples ordered from most frequent to least frequent. If the list has no duplicates, return "No Duplicates".

This is slightly more complicated than you may think. The code below gets us halfway there but has a noticeable flaw.

```python
def duplicates_2(example_list):
    if len(example_list) == len(set(example_list)):
        return 'No Duplicates'
    else:
        # One (value, count) tuple per occurrence, most frequent first
        return sorted([(i, example_list.count(i))
                       for i in example_list if
                       example_list.count(i) > 1],
                      key=lambda x: x[1], reverse=True)

result_1 = duplicates_2(['1','2','3','5',
                         '5','5','5','6','6','6','7','7','7','7','7'])
result_2 = duplicates_2(['1','2','3','5','6'])
print(result_1)
print(result_2)
```
Output 3

We can see that the function returns a tuple for every occurrence of a duplicated item, when in fact we only need one tuple per value. Let us try to address this by converting the list into a set.

Output 4

We can see that although we are able to generate the set, the ordering has been lost, since sets do not preserve order. To get around this, we recode our function so that the set is created before sorting, as below.

```python
def duplicates_2(example_list):
    if len(example_list) == len(set(example_list)):
        return 'No Duplicates'
    else:
        # Deduplicate the (value, count) tuples with set() first, then sort
        return sorted(set([(i, example_list.count(i))
                           for i in example_list if
                           example_list.count(i) > 1]),
                      key=lambda x: x[1], reverse=True)

result_1 = duplicates_2(['1','2','3','5','5','5','5','6','6','6','7','7','7','7','7'])
result_2 = duplicates_2(['1','2','3','5','6'])
print(result_1)
print(result_2)
```
Output 5

We see that by simply changing the order of operations (deduplicating with a set first, then sorting), we arrive at our intended list. Let's try to increase the complexity of the function one more time.
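Before doing so, note that the standard library already ranks values by frequency: `collections.Counter.most_common()` returns `(value, count)` pairs from most to least common, and values with equal counts keep the order in which they were first encountered. A minimal sketch built on it (the name `duplicates_2_counter` is hypothetical) needs no set conversion, so its output order is also predictable.

```python
from collections import Counter


def duplicates_2_counter(example_list):
    # Hypothetical alternative to duplicates_2; assumes hashable elements.
    # most_common() with no argument returns every (value, count) pair,
    # sorted from most to least frequent
    ranked = Counter(example_list).most_common()
    # Drop the values that occur only once
    repeated = [(value, count) for value, count in ranked if count > 1]
    return repeated if repeated else 'No Duplicates'


print(duplicates_2_counter(['1', '2', '3', '5', '5', '5', '5',
                            '6', '6', '6', '7', '7', '7', '7', '7']))
# [('7', 5), ('5', 4), ('6', 3)]
print(duplicates_2_counter(['1', '2', '3', '5', '6']))  # No Duplicates
```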

Create a function that checks whether a list has duplicate values. If it does, list them all along with their corresponding frequencies and the indexes of each duplicated value.

Return the results as a list of tuples ordered from most frequent to least frequent. If the list has no duplicates, return "No Duplicates".

Again, this is slightly more difficult than you may think.

```python
def duplicates_3(example_list):
    if len(example_list) == len(set(example_list)):
        return 'No Duplicates'
    else:
        # (value, count, indexes) for every occurrence of a duplicated value
        final_list = sorted([(i, example_list.count(i),
                              [n for n, j in enumerate(example_list) if j == i])
                             for i in example_list if
                             example_list.count(i) > 1],
                            key=lambda x: x[1], reverse=True)

        return final_list

result_1 = duplicates_3(['1','2','3','5','5','5','5',
                         '6','6','6','7','7','7','7','7'])
result_2 = duplicates_3(['1','2','3','5','6'])
print(result_1)
print(result_2)
```
Output 6

Again, we see that the function does what we want; the only issue is that it returns a tuple for every occurrence of a duplicated value, when in fact we only need one per value. The intuitive fix is to simply convert the output into a set. However, we encounter this issue.

Output 7
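Converting a list of these tuples to a set raises a `TypeError`: a tuple is hashable only if every item inside it is hashable, and each of our tuples carries a list of indexes. The small sketch below reproduces the error with hypothetical sample rows. As an aside (not the route this article takes), storing the indexes as a tuple instead of a list would make the rows hashable again.

```python
# Hypothetical sample rows shaped like duplicates_3 output:
# (value, count, list_of_indexes)
rows = [('5', 2, [3, 4]), ('5', 2, [3, 4])]

try:
    set(rows)
except TypeError as err:
    # Hashing a tuple hashes every item inside it, and a list has no hash
    error_message = str(err)
    print(error_message)  # contains "unhashable type: 'list'"

# Aside: with the indexes stored as a tuple, every item is hashable
hashable_rows = [(value, count, tuple(indexes)) for value, count, indexes in rows]
print(set(hashable_rows))  # {('5', 2, (3, 4))}
```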

Given the nature of the error, we have to find another way to build the function. The code below restructures it significantly.

```python
def duplicates_3(example_list):
    if len(example_list) == len(set(example_list)):
        return 'No Duplicates'
    else:
        # Each duplicated value, listed once
        filtered_elements = set([i for i in example_list if
                                 example_list.count(i) > 1])
        # How often each of those values occurs
        filtered_counts = [example_list.count(i) for i in
                           filtered_elements]
        # Every index at which each of those values occurs
        filtered_indexes = []
        for i in filtered_elements:
            a = [n for n, j in enumerate(example_list) if j == i]
            filtered_indexes.append(a)
        # The set is not modified between the loops, so both visit its
        # values in the same order and zip() keeps each row aligned
        final_list = list(zip(filtered_elements, filtered_counts,
                              filtered_indexes))
        return sorted(final_list, key=lambda x: x[1],
                      reverse=True)

result_1 = duplicates_3(['1','2','3','5','5','5','5',
                         '6','6','6','7','7','7','7','7'])
result_2 = duplicates_3(['1','2','3','5','5','6','6','6','7','7','7','7','7'])
result_3 = duplicates_3(['1','2','3','5','6'])
print(result_1)
print(result_2)
print(result_3)
```
Output 8

Excellent! Our function now returns the data we need.
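The final version still calls `count()` for every element while building `filtered_elements`, and scans the list again for each duplicated value to collect its indexes. A minimal alternative sketch, assuming hashable elements, records every index in a single pass with `collections.defaultdict` and derives each count from the length of its index list; `duplicates_3_single_pass` is a hypothetical name. Because dictionaries keep insertion order and `sorted()` is stable, values with equal counts come out in the order they first appear, rather than in set order.

```python
from collections import defaultdict


def duplicates_3_single_pass(example_list):
    # Hypothetical alternative to duplicates_3; assumes hashable elements.
    # Map each value to every index where it occurs, in one pass
    positions = defaultdict(list)
    for index, value in enumerate(example_list):
        positions[value].append(index)
    # Keep only values seen more than once, as (value, count, indexes)
    rows = [(value, len(indexes), indexes)
            for value, indexes in positions.items() if len(indexes) > 1]
    if not rows:
        return 'No Duplicates'
    # sorted() stays stable with reverse=True, so ties keep first-seen order
    return sorted(rows, key=lambda row: row[1], reverse=True)


print(duplicates_3_single_pass(['1', '2', '3', '5', '5', '6', '6', '6',
                                '7', '7', '7', '7', '7']))
# [('7', 5, [8, 9, 10, 11, 12]), ('6', 3, [5, 6, 7]), ('5', 2, [3, 4])]
print(duplicates_3_single_pass(['1', '2', '3', '5', '6']))  # No Duplicates
```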

In Conclusion

Though seemingly simple, locating and indexing duplicates in a list can be quite tricky. We have seen that unexpected errors can arise if we do not structure the function correctly. I hope this article has given you a better idea of how to tackle problems that require identifying duplicates. Though the examples we worked with seem trivial, they showcase the kinds of challenges you will encounter in your data science journey. In future articles, we shall learn how to apply these functions to actual data.

