This code snippet doesn't read from a file, which makes it easy to test and study. The main difference in your case is that you must open the file and read it line by line, as you did in your example.
example_file = """
This is a text file example
Let's see how many time example is typed.
"""
result = {}
words = example_file.split()
for word in words:
    # a word not yet in the dictionary defaults to 0, so its first count is 1
    result[word] = result.get(word, 0) + 1
for word, occurrence in result.items():
    print("word:%s; occurrence:%s" % (word, occurrence))
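To apply the same counting to a real file, a minimal sketch might look like the following. The function name `count_words` and the temporary sample file are assumptions made for illustration; the file is read line by line and each line is split on whitespace.

```python
import os
import tempfile


def count_words(path):
    """Count how often each whitespace-separated word occurs in a text file."""
    result = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:  # read the file one line at a time
            for word in line.split():
                result[word] = result.get(word, 0) + 1
    return result


# Write a small sample to a temporary file so the sketch runs on its own.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("This is a text file example\n")
    tmp.write("Let's see how many time example is typed.\n")
sample_path = tmp.name

counts = count_words(sample_path)
os.remove(sample_path)

for word, occurrence in counts.items():
    print("word:%s; occurrence:%s" % (word, occurrence))
```

Note that punctuation is not stripped here, so "typed." and "typed" would be counted as different words.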
UPDATE:
As suggested by @khachik, a better solution is to use collections.Counter.
>>> # Find the ten most common words in Hamlet
>>> import re
>>> from collections import Counter
>>> words = re.findall(r'\w+', open('hamlet.txt').read().lower())
>>> Counter(words).most_common(10)
[('the', 1143), ('and', 966), ('to', 762), ('of', 669), ('i', 631),
 ('you', 554), ('a', 546), ('my', 514), ('hamlet', 471), ('in', 451)]
Python provides built-in functions for creating, writing, and reading files. It can handle two types of files: normal text files and binary files (written in binary language, 0s and 1s).
- Text files: Each line of text is terminated with a special character called EOL (End of Line), which is the newline character ('\n') in Python by default.
- Binary files: There is no line terminator, and the data is stored after being converted into machine-understandable binary language.
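The difference between the two modes can be seen by reading the same file both ways. This is only a minimal sketch; the temporary file and its contents are made up for illustration.

```python
import os
import tempfile

# Create a small sample file (hypothetical content) to read back in both modes.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"first line\nsecond line\n")
path = tmp.name

# Text mode ("r"): bytes are decoded to str, and iteration yields one line at a time.
with open(path, "r", encoding="utf-8") as text_file:
    text_lines = list(text_file)

# Binary mode ("rb"): the raw bytes are returned without any decoding.
with open(path, "rb") as binary_file:
    raw = binary_file.read()

os.remove(path)

print(text_lines)  # a list of str objects, each ending in "\n"
print(raw)         # a single bytes object
```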
Here we operate on a .txt file in Python. This program finds the most repeated word in a file.
Approach:
- We will take the content of the file as input.
- We will save each word in a list after removing spaces and punctuation from the input string.
- Find the frequency of each word.
- Print the word which has a maximum frequency.
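The steps above can be sketched on an in-memory string as follows. The function name `most_repeated_word` and the sample sentence are assumptions for illustration. This variant counts words in a single pass with a dictionary, removes only ASCII punctuation, and expects the text to contain at least one word.

```python
import string


def most_repeated_word(text):
    """Return (word, frequency) for the most frequent word in text."""
    # Lower-case the text, strip ASCII punctuation, then split on whitespace.
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    words = cleaned.split()

    # Count each word in a single pass.
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1

    # Pick the word with the highest count; ties go to the word seen first.
    best = max(counts, key=counts.get)
    return best, counts[best]


# Hypothetical sample text, not taken from any real input file.
sample = "Well begun is half done. Well, well, said the owl."
word, frequency = most_repeated_word(sample)
print("Most repeated word: " + word)
print("Frequency: " + str(frequency))
```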
Input File:
Below is the implementation of the above approach:
Python3
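# Open the file in read mode; the with block closes it automatically
with open("gfg.txt", "r") as file:
    frequent_word = ""
    frequency = 0
    words = []

    # Traverse the file line by line
    for line in file:
        # Lower-case the line, remove commas and periods, and split on
        # whitespace; split() also drops the trailing newline
        line_word = line.lower().replace(',', '').replace('.', '').split()

        # Add the words of this line to the list
        for w in line_word:
            words.append(w)

# Find the word that occurs most often
for i in range(0, len(words)):
    count = 1

    # Count the later occurrences of words[i]
    for j in range(i + 1, len(words)):
        if words[i] == words[j]:
            count = count + 1

    # Keep this word if its count exceeds the highest frequency so far
    if count > frequency:
        frequency = count
        frequent_word = words[i]

print("Most repeated word: " + frequent_word)
print("Frequency: " + str(frequency))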
Output:
Most repeated word: well
Frequency: 3