Guide: How does Python handle duplicate rows?

The pandas.DataFrame.duplicated() method is used to find duplicate rows in a DataFrame. It returns a Boolean Series that identifies whether each row is a duplicate or unique.

In this article, you will learn how to use this method to identify the duplicate rows in a DataFrame. You will also get to know a few practical tips for using this method.

Creating a DataFrame

# Create a DataFrame
import pandas as pd
data_df = {'Name': ['Arpit', 'Riya', 'Priyanka', 'Aman', 'Arpit', 'Rohan', 'Riya', 'Sakshi'],

           'Employment Type': ['Full-time Employee', 'Part-time Employee', 'Intern', 'Intern',
                               'Full-time Employee', 'Part-time Employee', 'Part-time Employee', 'Full-time Employee'],

           'Department': ['Administration', 'Marketing', 'Technical', 'Marketing',
                          'Administration', 'Technical', 'Marketing', 'Administration']}

df = pd.DataFrame(data_df)
df

Also read: Creating and loading DataFrames.

pandas.DataFrame.duplicated()

Syntax: pandas.DataFrame.duplicated(subset=None, keep='first')
Purpose: To identify duplicate rows in a DataFrame.
Parameters:
- subset: (default: None). It is used to specify the particular columns in which duplicate values are to be searched.
- keep: 'first' or 'last' or False (default: 'first'). It is used to specify which instance of the repeated rows is to be identified as a unique row.
Returns: A Boolean Series where the value True indicates that the row at the corresponding index is a duplicate and False indicates that the row is unique.
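To make the return value concrete, here is a minimal sketch on a tiny frame. The frame `toy` and its columns `a` and `b` are hypothetical and exist only for this illustration; the call itself uses the default arguments listed above.

```python
import pandas as pd

# Hypothetical three-row frame, used only for illustration
toy = pd.DataFrame({'a': [1, 1, 2], 'b': ['x', 'x', 'y']})

# Defaults: subset=None (compare all columns), keep='first'
mask = toy.duplicated()

print(type(mask).__name__)  # Series
print(mask.dtype)           # bool
print(mask.tolist())        # [False, True, False]
```

The Series shares its index with the DataFrame, which is what makes it usable as a row filter later on.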

Using the DataFrame.duplicated() function

When you use the DataFrame.duplicated() function directly, the default values are passed to the parameters to search for duplicate rows in the DataFrame.

# Use the DataFrame.duplicated() method to return a series of boolean values
bool_series = df.duplicated()
bool_series

0    False
1    False
2    False
3    False
4     True
5    False
6     True
7    False
dtype: bool
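Because True counts as 1 when summed, the Boolean Series can also answer "how many duplicates are there?" without filtering anything. The sketch below rebuilds the article's sample data so that it runs on its own; the variable names `n_duplicates` and `has_duplicates` are illustrative only.

```python
import pandas as pd

# Same sample data as the article's DataFrame
df = pd.DataFrame({
    'Name': ['Arpit', 'Riya', 'Priyanka', 'Aman', 'Arpit', 'Rohan', 'Riya', 'Sakshi'],
    'Employment Type': ['Full-time Employee', 'Part-time Employee', 'Intern', 'Intern',
                        'Full-time Employee', 'Part-time Employee', 'Part-time Employee',
                        'Full-time Employee'],
    'Department': ['Administration', 'Marketing', 'Technical', 'Marketing',
                   'Administration', 'Technical', 'Marketing', 'Administration'],
})

bool_series = df.duplicated()

# Summing a Boolean Series counts the True values
n_duplicates = int(bool_series.sum())
# any() reports whether at least one row is flagged
has_duplicates = bool(bool_series.any())

print(n_duplicates)    # 2
print(has_duplicates)  # True
```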

Using the keep parameter

You can use the keep parameter to specify which instance of the repeated rows should be considered unique; the remaining instances are considered duplicates.

Setting keep as 'first'

The default value of the keep parameter is 'first'. This means the method considers the first instance of a row to be unique and the remaining instances to be duplicates.

Let's try to remove the duplicate rows.

# Use the keep parameter to consider only the first instance of a duplicate row to be unique
bool_series = df.duplicated(keep='first')
print('Boolean series:')
print(bool_series)
print('\n')
print('DataFrame after keeping only the first instance of the duplicate rows:')

# The `~` sign is used for negation. It changes the boolean value True to False and False to True.

df[~bool_series]

Boolean series:
0    False
1    False
2    False
3    False
4     True
5    False
6     True
7    False
dtype: bool


DataFrame after keeping only the first instance of the duplicate rows:

As you can see, the fifth and the seventh rows have been identified as duplicates. The fifth row is a duplicate of the first row and the seventh row is a duplicate of the second row. Hence, they have been removed from the DataFrame.
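The filtering step above can be cross-checked against pandas' own DataFrame.drop_duplicates() method, which is not covered in this article but performs the same removal in one call. The following is a sketch under the assumption that the sample data is unchanged; `filtered` and `dropped` are illustrative names.

```python
import pandas as pd

# Same sample data as the article's DataFrame
df = pd.DataFrame({
    'Name': ['Arpit', 'Riya', 'Priyanka', 'Aman', 'Arpit', 'Rohan', 'Riya', 'Sakshi'],
    'Employment Type': ['Full-time Employee', 'Part-time Employee', 'Intern', 'Intern',
                        'Full-time Employee', 'Part-time Employee', 'Part-time Employee',
                        'Full-time Employee'],
    'Department': ['Administration', 'Marketing', 'Technical', 'Marketing',
                   'Administration', 'Technical', 'Marketing', 'Administration'],
})

# Filter with the negated mask, as in the article
filtered = df[~df.duplicated(keep='first')]
# One-call alternative provided by pandas
dropped = df.drop_duplicates(keep='first')

print(filtered.index.tolist())   # [0, 1, 2, 3, 5, 7]
print(filtered.equals(dropped))  # True
```

Both results keep the original index labels, so positions 4 and 6 are simply missing rather than renumbered.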

Setting keep as 'last'

When you set the value of this parameter to 'last', the method considers the last instance of a row to be unique and the remaining instances to be duplicates.

Let's try to remove the duplicate rows.

# Use the keep parameter to consider only the last instance of a duplicate row to be unique
bool_series = df.duplicated(keep='last')
print('Boolean series:')
print(bool_series)
print('\n')
print('DataFrame after keeping only the last instance of the duplicate rows:')

# The `~` sign is used for negation. It changes the boolean value True to False and False to True
df[~bool_series]

Boolean series:
0     True
1     True
2    False
3    False
4    False
5    False
6    False
7    False
dtype: bool


DataFrame after keeping only the last instance of the duplicate rows:

Here, the first and the second rows have been identified as duplicates, while the fifth and the seventh rows have been considered to be unique.
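A short side-by-side sketch may help show what actually changes between 'first' and 'last': the surviving rows hold the same values, but they come from different positions in the DataFrame. The names `kept_first` and `kept_last` are illustrative only.

```python
import pandas as pd

# Same sample data as the article's DataFrame
df = pd.DataFrame({
    'Name': ['Arpit', 'Riya', 'Priyanka', 'Aman', 'Arpit', 'Rohan', 'Riya', 'Sakshi'],
    'Employment Type': ['Full-time Employee', 'Part-time Employee', 'Intern', 'Intern',
                        'Full-time Employee', 'Part-time Employee', 'Part-time Employee',
                        'Full-time Employee'],
    'Department': ['Administration', 'Marketing', 'Technical', 'Marketing',
                   'Administration', 'Technical', 'Marketing', 'Administration'],
})

# Index labels that survive under each setting
kept_first = df[~df.duplicated(keep='first')].index.tolist()
kept_last = df[~df.duplicated(keep='last')].index.tolist()

print(kept_first)  # [0, 1, 2, 3, 5, 7]
print(kept_last)   # [2, 3, 4, 5, 6, 7]
```

The choice matters when row order carries meaning, for example when later rows represent more recent records.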

Setting keep as False

If you set the value of keep to the Boolean value False, the method will consider all instances of a row to be duplicates.

# Use the keep parameter to consider all instances of a row to be duplicates
bool_series = df.duplicated(keep=False)
print('Boolean series:')
print(bool_series)
print('\n')
print('DataFrame after removing all the instances of the duplicate rows:')
# The `~` sign is used for negation. It changes the boolean value True to False and False to True
df[~bool_series]

Boolean series:
0     True
1     True
2    False
3    False
4     True
5    False
6     True
7    False
dtype: bool


DataFrame after removing all the instances of the duplicate rows:
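Besides removing rows, keep=False is handy for inspecting every copy of a repeated row side by side. The sketch below is one way to do that, assuming the same sample data; sorting by the Name column is a choice made here for readability, not something the method requires.

```python
import pandas as pd

# Same sample data as the article's DataFrame
df = pd.DataFrame({
    'Name': ['Arpit', 'Riya', 'Priyanka', 'Aman', 'Arpit', 'Rohan', 'Riya', 'Sakshi'],
    'Employment Type': ['Full-time Employee', 'Part-time Employee', 'Intern', 'Intern',
                        'Full-time Employee', 'Part-time Employee', 'Part-time Employee',
                        'Full-time Employee'],
    'Department': ['Administration', 'Marketing', 'Technical', 'Marketing',
                   'Administration', 'Technical', 'Marketing', 'Administration'],
})

mask = df.duplicated(keep=False)

# Every copy of a repeated row, grouped together by sorting
all_copies = df[mask].sort_values('Name')
print(all_copies['Name'].tolist())     # ['Arpit', 'Arpit', 'Riya', 'Riya']

# Rows that occur exactly once
only_singletons = df[~mask]
print(only_singletons.index.tolist())  # [2, 3, 5, 7]
```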

Using the subset parameter

The subset parameter is used to specify the columns in which duplicates are to be searched. After you have specified the columns, the method will search for duplicate rows by comparing the values of only the specified columns between the rows.

This is extremely useful because you may be interested in searching for duplicate values in only a few columns.

# Use the subset parameter to search for duplicate values only in the Name column of the DataFrame

bool_series = df.duplicated(subset='Name')

print('Boolean series:')
print(bool_series)
print('\n')
print('DataFrame after removing duplicates found in the Name column:')
df[~bool_series]
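The subset parameter also accepts a list of column names. As a sketch of how that changes the result on the same sample data, the example below compares a single-column subset with a two-column one; the variable names `by_name` and `by_type_and_dept` are illustrative only.

```python
import pandas as pd

# Same sample data as the article's DataFrame
df = pd.DataFrame({
    'Name': ['Arpit', 'Riya', 'Priyanka', 'Aman', 'Arpit', 'Rohan', 'Riya', 'Sakshi'],
    'Employment Type': ['Full-time Employee', 'Part-time Employee', 'Intern', 'Intern',
                        'Full-time Employee', 'Part-time Employee', 'Part-time Employee',
                        'Full-time Employee'],
    'Department': ['Administration', 'Marketing', 'Technical', 'Marketing',
                   'Administration', 'Technical', 'Marketing', 'Administration'],
})

# Compare only the Name column
by_name = df.duplicated(subset='Name')
print(by_name.tolist())
# [False, False, False, False, True, False, True, False]

# Compare the combination of two columns, ignoring Name
by_type_and_dept = df.duplicated(subset=['Employment Type', 'Department'])
print(by_type_and_dept.tolist())
# [False, False, False, False, True, False, True, True]
```

The last row (Sakshi) is flagged only in the second case: her name is unique, but her employment type and department repeat the combination seen in the first row.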

Practical Tips

1. If you do not use the subset parameter, all the values in the rows need to be the same for the rows to be identified as duplicates. You can also pass multiple columns to the subset parameter. However, remember that all the values of the specified columns must be the same in the rows for them to be considered duplicates.
2. Although this method returns a Series which only identifies the duplicate rows in a DataFrame, you can use this Series to subset the DataFrame so that it contains only unique rows.

Q1: The False value for the keep parameter is used to remove all the duplicate rows from the DataFrame. True or False?

Answer: False. The False value identifies all the instances of a row as duplicates, but it does not remove them.

Q2: How are the duplicate rows identified when multiple columns are passed to the subset parameter?

Answer: When multiple columns are passed to the subset parameter, the method considers a row to be a duplicate only if the values of all the specified columns in that row match the values of the specified columns in another row.

Q3: Write the code to remove all the instances of duplicate rows from the DataFrame df apart from the first instance of the rows.

Answer: df[~df.duplicated(keep='first')]

Q4: Write the code to remove all the instances of duplicate rows from the DataFrame df apart from the last instance of the rows.

Answer: df[~df.duplicated(keep='last')]

Q5: Write the code to search for duplicate values in the columns col_1 and col_2 in the DataFrame df.

Answer: df.duplicated(subset=['col_1', 'col_2'])
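The three coding questions can be tried out together on a small frame. The data below is hypothetical: only the column names col_1 and col_2 come from the question, while col_3 and all the values are made up for this sketch.

```python
import pandas as pd

# Hypothetical data; col_3 is added so full-row and subset checks differ
df = pd.DataFrame({
    'col_1': [1, 1, 2, 1],
    'col_2': ['a', 'a', 'b', 'a'],
    'col_3': [10, 20, 30, 10],
})

# Keep the first instance of each fully repeated row
first_kept = df[~df.duplicated(keep='first')]
print(first_kept.index.tolist())  # [0, 1, 2]

# Keep the last instance of each fully repeated row
last_kept = df[~df.duplicated(keep='last')]
print(last_kept.index.tolist())   # [1, 2, 3]

# Search for duplicates using only col_1 and col_2
mask = df.duplicated(subset=['col_1', 'col_2'])
print(mask.tolist())              # [False, True, False, True]
```

Row 1 is a duplicate only when col_3 is ignored, which is why it survives the full-row checks but is flagged by the subset check.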

This article was contributed by Shreyansh B and Shri Varsheni.
