I have a folder full of files that have no extension. How can I check their file types? I want to detect each file's type and rename the file accordingly. Let's assume a function filetype(x) returns a file type such as png. I want to do this:
    files = os.listdir(".")
    for f in files:
        os.rename(f, f + "." + filetype(f))
How do I do this?
martineau
asked Jun 7, 2012 at 18:06
There are Python libraries that can recognize files based on their content (usually a header / magic number) and that don't rely on the file name or extension.
If you're addressing many different file types, you can use python-magic. That's just a Python binding for the well-established magic library. It has a good reputation, and (small endorsement) in the limited use I've made of it, it has been solid.
There are also libraries for more specialized file types. For example, the Python standard library has the imghdr module (deprecated since Python 3.11 and removed in 3.13) that does the same thing just for image file types.
If you need dependency-free (pure Python) file type checking, see filetype.
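To make the "header / magic number" idea concrete, here is a minimal standard-library sketch. The names SIGNATURES, extension_for_header and sniff_extension are made up for this illustration, and the table only covers four well-known formats; the libraries above recognize far more.

```python
# Illustrative sketch only: SIGNATURES, extension_for_header and
# sniff_extension are hypothetical names, and the table is deliberately tiny.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", ".png"),
    (b"\xff\xd8\xff", ".jpg"),
    (b"GIF87a", ".gif"),
    (b"GIF89a", ".gif"),
    (b"%PDF-", ".pdf"),
]


def extension_for_header(header):
    """Return the extension whose signature prefixes `header`, or None."""
    for magic_bytes, extension in SIGNATURES:
        if header.startswith(magic_bytes):
            return extension
    return None


def sniff_extension(path):
    """Guess an extension from the first bytes of the file at `path`."""
    with open(path, "rb") as fh:
        header = fh.read(16)  # the longest signature above is 8 bytes
    return extension_for_header(header)
```

With something like this, the loop from the question would call os.rename(f, f + ext) only for files where sniff_extension(f) returns a value, and leave unrecognized files untouched.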
phoenix
answered Jun 7, 2012 at 18:43
Chris Johnson
The Python Magic library provides the functionality you need.
You can install the library with pip install python-magic and use it as follows:
    >>> import magic
    >>> magic.from_file('iceland.jpg')
    'JPEG image data, JFIF standard 1.01'
    >>> magic.from_file('iceland.jpg', mime=True)
    'image/jpeg'
    >>> magic.from_file('greenland.png')
    'PNG image data, 600 x 1000, 8-bit colormap, non-interlaced'
    >>> magic.from_file('greenland.png', mime=True)
    'image/png'
The Python code in this case calls libmagic under the hood, which is the same library used by the *NIX file command. Thus, this does the same thing as the subprocess/shell-based answers, but without that overhead.
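For the renaming task in the question, the MIME type returned by magic.from_file(..., mime=True) can be mapped to an extension with the standard library's mimetypes module. The following is only a sketch under some assumptions: the function name, the detect parameter and the dry_run flag are invented for this example, and python-magic is assumed to be installed when no detect callable is passed.

```python
import mimetypes
import os


def rename_with_detected_extension(folder, detect=None, dry_run=True):
    """Append an extension guessed from each file's content.

    `detect` (a hypothetical hook) maps a path to a MIME type string; by
    default it uses python-magic's magic.from_file(path, mime=True).
    Returns a list of (old_name, new_name) pairs.
    """
    if detect is None:
        import magic  # python-magic, imported lazily so it is only needed here

        def detect(path):
            return magic.from_file(path, mime=True)

    renamed = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        # Skip directories and files that already have an extension.
        if not os.path.isfile(path) or os.path.splitext(name)[1]:
            continue
        extension = mimetypes.guess_extension(detect(path))
        if extension is None:
            continue  # MIME type unknown to mimetypes: leave the file alone
        new_name = name + extension
        if not dry_run:
            os.rename(path, os.path.join(folder, new_name))
        renamed.append((name, new_name))
    return renamed
```

Calling rename_with_detected_extension(".") first only reports what would be renamed; passing dry_run=False performs the renames. Defaulting to a dry run is a design choice of this sketch, since a wrong guess is annoying to undo across a whole folder.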
answered Jun 26, 2014 at 14:51
Richard
On Unix and Linux there is the file command to guess file types. There's even a Windows port.
From the man page:
File tests each argument in an attempt to classify it. There are three sets of tests, performed in this order: filesystem tests, magic number tests, and language tests. The first test that succeeds causes the file type to be printed.
You would need to run the file command with the subprocess module and then parse the results to figure out an extension.
edit: Ignore my answer. Use Chris Johnson's answer instead.
answered Jun 7, 2012 at 18:12
Steven Rumbalski
In the case of images, you can use the imghdr module.
    >>> import imghdr
    >>> imghdr.what('8e5d7e9d873e2a9db0e31f9dfc11cf47')  # You can pass a file name or a file object as first param. See doc for optional 2nd param.
    'png'
Python 2 imghdr doc
Python 3 imghdr doc
phoenix
answered Oct 7, 2014 at 16:00
Lewis Diamond
    import subprocess as sub
    # Pass the command as a list; a single string only works with shell=True
    p = sub.Popen(['file', 'yourfile.txt'], stdout=sub.PIPE, stderr=sub.PIPE)
    output, errors = p.communicate()
    print(output)
As Steven pointed out, subprocess is the way to go. You can capture the command output as shown above, as this post said.
answered Jun 7, 2012 at 18:25
xvatar
You can also install the official file binding for Python, a library called file-magic (it does not use ctypes, like python-magic).
It's available on PyPI as file-magic and on Debian as python-magic. For me this library is the best to use since it's available on PyPI and on Debian (and probably other distributions), making the process of deploying your software easier. I've blogged about how to use it, also.
answered Aug 5, 2016 at 0:43
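With the newer subprocess library, you can now use the following code (*nix-only solution):
    import subprocess
    import shlex
    filename = 'your_file'
    # Quote the name so that paths containing spaces survive shlex.split
    cmd = shlex.split('file --mime-type {0}'.format(shlex.quote(filename)))
    # text=True (Python 3.7+) returns str instead of bytes
    result = subprocess.check_output(cmd, text=True)
    mime_type = result.split()[-1]
    print(mime_type)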
answered Jun 6, 2014 at 3:14
berniey
You can also use this code (pure Python, checking the first 3 bytes of the file header):
    import os

    def get_mime_type(pathfile):  # wrapper name is illustrative
        # MEDIA_ROOT is assumed to be defined elsewhere (e.g. in your settings)
        full_path = os.path.join(MEDIA_ROOT, pathfile)
        try:
            with open(full_path, "rb") as fh:
                header_byte = fh.read(3).hex()  # only the first 3 bytes are needed
        except IOError:
            return "Incorrect Request :( !!!"
        if header_byte == '474946':
            return "image/gif"
        elif header_byte == '89504e':
            return "image/png"
        elif header_byte == 'ffd8ff':
            return "image/jpeg"
        else:
            return "binary file"
No package installation (or version updates) required.
answered Jul 6, 2019 at 10:36
evergreen
This only works on Linux, but using the "sh" Python module you can simply call any shell command:
https://pypi.org/project/sh/
    pip install sh
    import sh
    sh.file("/root/file")
Output: /root/file: ASCII text
answered Feb 2, 2019 at 18:45
Lelouch
This code lists all files of a given type in a given folder, recursively:
    import magic
    import glob
    from os.path import isfile

    ROOT_DIR = 'backup'
    WANTED_EXTENSION = 'sqlite'

    for filename in glob.iglob(ROOT_DIR + '/**', recursive=True):
        if isfile(filename):
            # from_file(..., mime=True) returns a MIME type string
            extension = magic.from_file(filename, mime=True)
            if WANTED_EXTENSION in extension:
                print(filename)
https://gist.github.com/izmcm/6a5d6fa8d4ec65fd9851a1c06c8946ac
answered Aug 30, 2021 at 21:28