Find the most frequent words from a file in Python

Python provides built-in functions for creating, writing, and reading files. Two types of files can be handled in Python: normal text files and binary files (written in binary language, 0s and 1s).

Text files: In this type of file, each line of text is terminated with a special character called EOL (End of Line), which is the newline character ('\n') in Python by default.
Binary files: In this type of file, there is no terminator for a line, and the data is stored after converting it into machine-understandable binary language.
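
To make the difference concrete, here is a minimal sketch that writes a small file and reads it back in both modes. The file name sample.txt and its contents are made up for illustration and are not part of the original article.

```python
import os
import tempfile

# Hypothetical sample file, created in a temporary directory for illustration
path = os.path.join(tempfile.mkdtemp(), "sample.txt")

# newline="\n" keeps the line endings identical on every platform
with open(path, "w", encoding="utf-8", newline="\n") as f:
    f.write("first line\nsecond line\n")

# Text mode ("r"): data is decoded to str and can be split into lines
with open(path, "r", encoding="utf-8") as f:
    text_lines = f.readlines()

# Binary mode ("rb"): data comes back as raw bytes, with no decoding
with open(path, "rb") as f:
    raw = f.read()

print(text_lines)  # a list of str, each ending in '\n'
print(type(raw))   # <class 'bytes'>
```

In text mode the lines arrive as strings ending in the newline character, while binary mode returns the same data as an undecoded bytes object.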

Here we operate on a .txt file in Python. This program finds the most repeated word in the file.

Approach:

We will take the content of the file as input.
We will save each word in a list after removing spaces and punctuation from the input string.
Find the frequency of each word.
Print the word which has a maximum frequency.

Input File: gfg.txt

Below is the implementation of the above approach:

Python3

file = open("gfg.txt", "r")
frequent_word = ""
frequency = 0
words = []

# Collect every word, lowercased and stripped of commas and periods.
# split() with no argument also discards the trailing newline.
for line in file:
    line_word = line.lower().replace(',', '').replace('.', '').split()
    for w in line_word:
        words.append(w)

# Count the occurrences of each word and keep the most frequent one
for i in range(0, len(words)):
    count = 1
    for j in range(i + 1, len(words)):
        if words[i] == words[j]:
            count = count + 1
    if count > frequency:
        frequency = count
        frequent_word = words[i]

print("Most repeated word: " + frequent_word)
print("Frequency: " + str(frequency))
file.close()

Output:

Most repeated word: well
Frequency: 3
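
The implementation above strips only commas and periods, and it compares every word with every other word. As a rough sketch of one alternative, the function below removes all ASCII punctuation with str.translate and counts the words in a dictionary in a single pass. The function name most_repeated_word and the sample sentence are hypothetical and not part of the original program.

```python
import string


def most_repeated_word(text):
    """Return (word, frequency) for the most repeated word in text.

    Ties are resolved in favor of the word that appears first.
    """
    # Map every ASCII punctuation character to None so translate() drops it
    table = str.maketrans("", "", string.punctuation)
    words = text.lower().translate(table).split()

    # Count each word in one pass
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1

    frequent_word, frequency = "", 0
    for w, c in counts.items():  # dicts preserve insertion order
        if c > frequency:
            frequent_word, frequency = w, c
    return frequent_word, frequency


sample = "Well, well... all is well that ends well!"
print(most_repeated_word(sample))
```

To use it on a file, the text could be obtained with open("gfg.txt").read() and passed to the function.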

Hello Python learners! In this session, we will learn how to find the most frequent words in a text read from a file, rather than in a string typed into the program. To follow along, we need to be familiar with files and the operations on them, so let's start with files.

Handling files in python

Data is often stored in files in an organized way. There are many kinds of files: text files, music files, videos, and various word processor and presentation documents are the ones we are most familiar with.

Text files contain only characters, whereas all the other file formats include formatting information that is specific to that format. Operations performed on the data in files include read and write operations. To perform any operation, the program must first open the file. The syntax to open a file is given below:

with open(«filename», «mode») as «variable»:
    «block»

Though there are several ways of opening a file, I prefer this one because the file is closed automatically when the block ends, so we need not write a close statement.
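
A small sketch can confirm that the file really is closed once the with block ends. The file demo.txt is a hypothetical file created only for this demonstration.

```python
import os
import tempfile

# Hypothetical file created only for this demonstration
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as f:
    f.write("hello\n")

with open(path, "r") as f:
    inside = f.closed          # False: the file is open inside the block
    first_line = f.readline()

after = f.closed               # True: leaving the block closed the file

print(inside, after)
```

The variable f still exists after the block, but any further read on it would raise a ValueError because the file is closed.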

For more on files, see the article on handling files.

Reading a file:

There are several techniques for reading files. One way is to read the entire contents of the file into a single string. There are also iterative techniques, in which one line of text is read in each iteration. We can also read every line of text and store them all in a list. The syntax for each technique is given below:

# to read the entire contents of the file into a single string
with open('file1.txt', 'r') as f:
    contents = f.read()
# to read each line and store them as a list
with open('file1.txt', 'r') as f:
    lines = f.readlines()
# for the iterative method of reading text in files
with open('planets.txt', 'r') as f:
    for line in f:
        print(len(line))

Since our job is just to read the contents of the file and then find the most frequent word, we have no need for the write operation. If you want to learn about it, see the article on text files in Python.

Now let’s get into our job of finding the most frequent words from a text read from a file.

Most frequent words in a text file with Python

First, create a text file and save it in the same directory as your Python program. When you give only a file name, Python looks for the file relative to the current working directory, which is the program's directory when you run the program from there. Make sure you have created and saved the file in the proper directory.

The algorithm we are going to follow is quite simple. First we open the file and read its contents. Then we count how many times each word is repeated and store that number in a variable called count. We compare count with the maximum count, which is initialized to zero at the beginning. If count is less than the maximum count, we ignore the word. If it is equal, we add the word to a list. Otherwise, if it is greater, we clear the list, place this word in it, and update the maximum count.

Let us start by initializing the variables and opening the file:

fname = input("enter file name")
count = 0             # count of a specific word
maxcount = 0          # maximum among the counts of all words
l = []                # list to store the words with maximum count
with open(fname, 'r') as f:

We have opened the file as f, and we will use f whenever we have to refer to the file.

Now we have to read the contents. As discussed previously, there are several techniques for that, and we should pick the one best suited to our task. Since we are concerned with the words of the file, it is simplest to read the entire contents and then split the string into a list of words using the split method.

Reading contents:

with open(fname, 'r') as f:
    contents = f.read()
    words = contents.split()

Finding the most frequent word:

Now that we have all the words in a list, we will implement the algorithm discussed earlier:

for i in range(len(words)):
    count = 0                          # reset the count for each word
    for j in range(len(words)):
        if words[i] == words[j]:       # finding count of each word
            count += 1
    if count == maxcount:              # comparing with maximum count
        if words[i] not in l:          # avoid listing the same word twice
            l.append(words[i])
    elif count > maxcount:             # if count greater than maxcount
        l.clear()
        l.append(words[i])
        maxcount = count
print(l)                               # printing contents of l

Now the most frequent words are in the list l, which is printed at the end.

Output:

Suppose you have a text file with contents like this:

Hi, friends this program is found in codespeedy.
This program works perfectly

Then your output will be

['program']

Hope you like this session guys.

How do I find the most frequent words in a file?

This can be done by opening the file in read mode using a file object. Read the file line by line. Split one line at a time and store the words in a list. Iterate through the list, find the frequency of each word, and compare the frequency with maxcount.
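
The steps in this answer can be sketched roughly as follows. The function name most_frequent_in_lines is hypothetical, and the sketch swaps the nested comparison for a dictionary of counts, which is one possible way to track frequencies rather than the method the answer prescribes.

```python
import io


def most_frequent_in_lines(lines):
    """Return (words, maxcount) for an iterable of text lines."""
    counts = {}
    for line in lines:                 # read one line at a time
        for word in line.split():      # split the line into words
            counts[word] = counts.get(word, 0) + 1

    maxcount = 0
    best = []
    for word, count in counts.items():
        if count > maxcount:           # new maximum: start a fresh list
            maxcount = count
            best = [word]
        elif count == maxcount:        # tie: keep this word as well
            best.append(word)
    return best, maxcount


# io.StringIO stands in for an open file so the sketch runs without a file on disk
fake_file = io.StringIO("one two two\nthree two one\n")
print(most_frequent_in_lines(fake_file))
```

Because an open file is itself an iterable of lines, the same function should also accept the f from a with open(...) as f block.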

How do I print the most repeated words in Python?

Method #3: Using list() and Counter(). Append all words to an empty list and calculate the frequency of all words using the Counter() function. Find the max count and print that key.
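
As a minimal sketch of this method, the function below builds a Counter from the list of words and returns every word tied for the highest count. The function name most_common_words is hypothetical; the sample text is the two-line file shown earlier in this article.

```python
from collections import Counter


def most_common_words(text):
    """Return (words, count): every word tied for the highest count."""
    counts = Counter(text.split())     # word -> number of occurrences
    if not counts:
        return [], 0
    top = max(counts.values())         # the max count
    return [w for w, c in counts.items() if c == top], top


sample = ("Hi, friends this program is found in codespeedy.\n"
          "This program works perfectly")
print(most_common_words(sample))
```

When only a single word is needed, Counter also offers most_common(1), which returns a list holding one (word, count) pair.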

How do I find the most common data in Python?

Use the max() method of FreqDist() to find the most common element of a list in Python. For this, you first import the nltk library.