Guide: the most_common function in Python

Given a data set, we can find the k most frequent words in it.

Contents

  • How do I find the most frequent words in a Python file?
  • How do I find the most frequent words in a text file?
  • How do I find the most common data in Python?
  • How do you find duplicate words in a text file in Python?

A solution to this problem already exists as "Find the k most frequent words from a file". In Python, however, we can solve it very efficiently with the help of some high-performance modules.

To do this, we'll use collections, a module that provides specialized, high-performance container datatypes. From this module we will use the Counter class.
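As a quick illustration of what Counter does, here is a minimal sketch. The sample string is only for demonstration and is not part of the article's data set.

```python
from collections import Counter

# Counter accepts any iterable of hashable items and tallies them.
letter_counts = Counter("abracadabra")

# most_common(k) returns the k highest counts as (item, count) tuples.
top_three = letter_counts.most_common(3)
print(top_three)  # [('a', 5), ('b', 2), ('r', 2)]
```

Items that were never seen simply have a count of 0, so looking up a missing key does not raise an error.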

Examples :

Input : "John is the son of John second. 
         Second son of John second is William second."
Output : [('John', 3), ('second', 3), ('is', 2), ('son', 2)]

Explanation :
1. With punctuation removed, the string is split into a list like this :
    ['John', 'is', 'the', 'son', 'of', 'John', 
     'second', 'Second', 'son', 'of', 'John', 
     'second', 'is', 'William', 'second']
2. Now 'most_common(4)' returns the four most 
   frequent words and their counts as a list of tuples. 
   'Second' and 'second' are counted separately because 
   the comparison is case-sensitive, and words with equal 
   counts appear in the order they were first encountered.


Input : "geeks for geeks is for geeks. By geeks
         and for the geeks."
Output : [('geeks', 5), ('for', 3)]

Explanation :
most_common(2) returns the two most frequent words and their counts.


Approach :

  1. Import the Counter class from the collections module.
  2. Split the string into a list of words using split().
  3. Pass the list to the Counter constructor to create a Counter instance.
  4. Call most_common() on the Counter to get a list of the most frequent words and their counts.

Below is a Python implementation of the above approach :

from collections import Counter

# Sample text; adjacent string literals inside the parentheses are joined.
data_set = (
    "Welcome to the world of Geeks "
    "This portal has been created to provide well written well "
    "thought and well explained solutions for selected questions "
    "If you like Geeks for Geeks and would like to contribute "
    "here is your chance You can write article and mail your article "
    " to contribute at geeksforgeeks org See your article appearing on "
    "the Geeks for Geeks main page and help thousands of other Geeks. "
)

# split() with no arguments splits on any run of whitespace.
split_it = data_set.split()

# Count every word; a separate name avoids shadowing the Counter class.
word_counts = Counter(split_it)

# The four most frequent words with their counts.
most_occur = word_counts.most_common(4)

print(most_occur)

Output :

[('Geeks', 5), ('to', 4), ('and', 4), ('well', 3)]
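The example inputs earlier contain periods, which a plain split() leaves attached to words ("geeks." is not the same token as "geeks"). A minimal sketch of one way to strip punctuation before counting, assuming words consist of ASCII letters and apostrophes only. The helper name top_k_words is hypothetical and not part of the original article.

```python
import re
from collections import Counter


def top_k_words(text, k):
    """Return the k most frequent words in text as (word, count) tuples."""
    # Keep runs of letters and apostrophes, so "second." becomes "second".
    # Case is preserved, so "Second" and "second" are counted separately.
    words = re.findall(r"[A-Za-z']+", text)
    return Counter(words).most_common(k)


sample = "geeks for geeks is for geeks. By geeks and for the geeks."
print(top_k_words(sample, 2))  # [('geeks', 5), ('for', 3)]
```

To count case-insensitively, one option is to call text.lower() before extracting the words.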


    Python provides built-in functions for creating, writing, and reading files. Two types of files can be handled in Python: normal text files and binary files (written in binary language, 0s and 1s).

    • Text files: In this type of file, each line of text is terminated with a special character called EOL (End of Line), which is the newline character ('\n') in Python by default.
    • Binary files: In this type of file, there is no terminator for a line, and the data is stored after converting it into machine-understandable binary language.
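To make the difference concrete, here is a small sketch that reads the same file in text mode and in binary mode. The temporary file and its contents are made up for the demonstration.

```python
import os
import tempfile

# Create a small throwaway file (hypothetical content for this demo).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first line\nsecond line\n")
    path = tmp.name

# Text mode ("r"): data is decoded to str and can be read line by line.
with open(path, "r") as text_file:
    text_lines = [line.rstrip("\n") for line in text_file]

# Binary mode ("rb"): data comes back as raw bytes, with no decoding.
with open(path, "rb") as binary_file:
    raw_bytes = binary_file.read()

os.remove(path)
print(text_lines)       # ['first line', 'second line']
print(type(raw_bytes))  # <class 'bytes'>
```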

    Here we are operating on the .txt file in Python. Through this program, we will find the most repeated word in a file.

    Approach:

    • We will take the content of the file as input.
    • We will save each word in a list after removing spaces and punctuation from the input string.
    • Find the frequency of each word.
    • Print the word which has a maximum frequency.

    Input File:

    Below is the implementation of the above approach:

    Python3

    frequent_word = ""
    frequency = 0
    words = []

    # Read the file line by line; "with" closes it automatically.
    with open("gfg.txt", "r") as file:
        for line in file:
            # Lowercase, drop commas and periods, then split on whitespace.
            line_word = line.lower().replace(',', '').replace('.', '').split()
            for w in line_word:
                words.append(w)

    # Compare each word with every later word to count its occurrences.
    for i in range(0, len(words)):
        count = 1
        for j in range(i + 1, len(words)):
            if words[i] == words[j]:
                count = count + 1
        # Keep the word with the highest count seen so far.
        if count > frequency:
            frequency = count
            frequent_word = words[i]

    print("Most repeated word: " + frequent_word)
    print("Frequency: " + str(frequency))

    Output:

    Most repeated word: well
    Frequency: 3
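The nested loops above compare every pair of words, which becomes slow on large files. A sketch of the same task using collections.Counter, which tallies the words in a single pass. The function name most_repeated_word and the demo file contents are hypothetical.

```python
import os
import tempfile
from collections import Counter


def most_repeated_word(path):
    """Return (word, count) for the most frequent word in the file at path."""
    counts = Counter()
    with open(path, "r") as file:
        for line in file:
            # Same clean-up as the program above: lowercase, drop , and .
            cleaned = line.lower().replace(",", "").replace(".", "")
            counts.update(cleaned.split())
    if not counts:
        return None  # empty file: no words at all
    return counts.most_common(1)[0]


# Demo on a temporary file with made-up content.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Well written, well thought.\nAnd well explained.\n")
    demo_path = tmp.name

result = most_repeated_word(demo_path)
os.remove(demo_path)
print(result)  # ('well', 3)
```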

    How do I find the most frequent words in a Python file?

    Approach :

    1. Import the Counter class from the collections module.
    2. Split the string into a list of words using split().
    3. Pass the list to the Counter constructor to create a Counter instance.
    4. Call most_common() on the Counter to get a list of the most frequent words and their counts.

    How do I find the most frequent words in a text file?

    This can be done by opening the file in read mode using a file object. Read the file line by line, split each line into words, and store them in a list. Iterate through the list to find the frequency of each word, comparing each frequency with maxcount, the highest count seen so far.
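A minimal sketch of that procedure with a plain dictionary. The function name most_frequent_word is hypothetical, and io.StringIO stands in for an opened text file so the example runs without any file on disk.

```python
import io


def most_frequent_word(lines):
    """Return (word, maxcount) from an iterable of text lines."""
    freq = {}
    for line in lines:
        for word in line.split():
            freq[word] = freq.get(word, 0) + 1

    best_word, maxcount = None, 0
    for word, count in freq.items():
        # Strict ">" keeps the first word seen when counts tie.
        if count > maxcount:
            best_word, maxcount = word, count
    return best_word, maxcount


# io.StringIO stands in for an opened text file in this demo.
fake_file = io.StringIO("red blue red\ngreen blue red\n")
print(most_frequent_word(fake_file))  # ('red', 3)
```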

    How do I find the most common data in Python?

    Use the max() function of FreqDist() to find the most common element of a list in Python. To do this, first import the nltk library.
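If installing nltk is not an option, the standard library covers the same need. The sketch below deliberately swaps in collections.Counter and statistics.mode in place of FreqDist, and the sample list is made up.

```python
from collections import Counter
from statistics import mode

items = ["apple", "pear", "apple", "fig", "pear", "apple"]

# most_common(1) returns a one-element list of (item, count).
item, count = Counter(items).most_common(1)[0]
print(item, count)  # apple 3

# statistics.mode returns just the most common item.
print(mode(items))  # apple
```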

    How do you find duplicate words in a text file in Python?

    In this post, we will learn how to find the duplicate words in a file in Python.

    1. Open the file in read mode.
    2. Initialize two empty sets. ...
    3. Iterate through the lines of the file with a loop.
    4. For each line, get the list of words by using split.
    5. Iterate through the words of each line by using a loop.
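The steps above leave out how the two sets are used. One plausible reading, offered here as an assumption, is that one set holds every word seen so far and the other holds words seen more than once. The function name find_duplicate_words is hypothetical, and io.StringIO stands in for a file opened in read mode.

```python
import io


def find_duplicate_words(lines):
    """Return the set of words that occur more than once in the lines."""
    seen = set()        # every word met so far
    duplicates = set()  # words met at least twice
    for line in lines:
        for word in line.split():
            if word in seen:
                duplicates.add(word)
            else:
                seen.add(word)
    return duplicates


fake_file = io.StringIO("one two three\ntwo three four\n")
print(sorted(find_duplicate_words(fake_file)))  # ['three', 'two']
```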
