Hướng dẫn dùng duplicated python
Return DataFrame with duplicate rows removed. Considering certain columns is optional. Indexes, including time indexes are ignored. Only consider certain columns for identifying duplicates, by default use all of the columns. Determines which duplicates (if any) to keep. - Whether to drop duplicates in place or to return a copy. If True, the resulting axis will be labeled 0, 1, …, n - 1. New
in version 1.0.0. DataFrame with duplicates removed or None if Examples Consider dataset containing ramen rating. >>> df = pd.DataFrame({ ... 'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'], ... 'style': ['cup', 'cup', 'cup', 'pack', 'pack'], ... 'rating': [4, 4, 3.5, 15, 5] ... }) >>> df brand style rating 0 Yum Yum cup 4.0 1 Yum Yum cup 4.0 2 Indomie cup 3.5 3 Indomie pack 15.0 4 Indomie pack 5.0 By default, it removes duplicate rows based on all columns. >>> df.drop_duplicates() brand style rating 0 Yum Yum cup 4.0 2 Indomie cup 3.5 3 Indomie pack 15.0 4 Indomie pack 5.0 To remove duplicates on specific column(s), use >>> df.drop_duplicates(subset=['brand']) brand style rating 0 Yum Yum cup 4.0 2 Indomie cup 3.5 To remove duplicates and keep last occurrences, use >>> df.drop_duplicates(subset=['brand', 'style'], keep='last') brand style rating 1 Yum Yum cup 4.0 2 Indomie cup 3.5 4 Indomie pack 5.0 Return boolean Series denoting duplicate rows. Considering certain columns is optional. Parameterssubsetcolumn label or sequence of labels, optionalOnly consider certain columns for identifying duplicates, by default use all of the columns. keep{‘first’, ‘last’, False}, default ‘first’Determines which duplicates (if any) to mark.
Boolean series for each duplicated rows. Examples Consider dataset containing ramen rating. >>> df = pd.DataFrame({ ... 'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'], ... 'style': ['cup', 'cup', 'cup', 'pack', 'pack'], ... 'rating': [4, 4, 3.5, 15, 5] ... }) >>> df brand style rating 0 Yum Yum cup 4.0 1 Yum Yum cup 4.0 2 Indomie cup 3.5 3 Indomie pack 15.0 4 Indomie pack 5.0 By default, for each set of duplicated values, the first occurrence is set on False and all others on True. >>> df.duplicated() 0 False 1 True 2 False 3 False 4 False dtype: bool By using ‘last’, the last occurrence of each set of duplicated values is set on False and all others on True. >>> df.duplicated(keep='last') 0 True 1 False 2 False 3 False 4 False dtype: bool By setting >>> df.duplicated(keep=False) 0 True 1 True 2 False 3 False 4 False dtype: bool To find duplicates on specific column(s), use >>> df.duplicated(subset=['brand']) 0 False 1 True 2 False 3 True 4 True dtype: bool |