如何在python中找到文件的mime类型？

Question 1

Let’s say you want to save a bunch of files somewhere, for instance in BLOBs. Let’s say you want to dish these files out via a web page and have the client automatically open the correct application/viewer.

Assumption: The browser figures out which application/viewer to use by the mime-type (content-type?) header in the HTTP response.

Based on that assumption, in addition to the bytes of the file, you also want to save the MIME type.

How would you find the MIME type of a file? I’m currently on a Mac, but this should also work on Windows.

Does the browser add this information when posting the file to the web page?

Is there a neat python library for finding this information? A WebService or (even better) a downloadable database?

Question 2

The python-magic method suggested by toivotuo is outdated. Python-magic’s current trunk is at Github and based on the readme there, finding the MIME-type, is done like this.

# For MIME types
import magic
mime = magic.Magic(mime=True)
mime.from_file("testdata/test.pdf") # 'application/pdf'

Question 3

The mimetypes module in the standard library will determine/guess the MIME type from a file extension.

If users are uploading files the HTTP post will contain the MIME type of the file alongside the data. For example, Django makes this data available as an attribute of the UploadedFile object.

Question 4

More reliable way than to use the mimetypes library would be to use the python-magic package.

import magic
m = magic.open(magic.MAGIC_MIME)
m.load()
m.file("/tmp/document.pdf")

This would be equivalent to using file(1).

On Django one could also make sure that the MIME type matches that of UploadedFile.content_type.

Question 5

This seems to be very easy

>>> from mimetypes import MimeTypes
>>> import urllib 
>>> mime = MimeTypes()
>>> url = urllib.pathname2url('Upload.xml')
>>> mime_type = mime.guess_type(url)
>>> print mime_type
('application/xml', None)

Please refer Old Post

Update – In python 3+ version, it’s more convenient now:

import mimetypes
print(mimetypes.guess_type("sample.html"))

Question 6

There are 3 different libraries that wraps libmagic.

2 of them are available on pypi (so pip install will work):

filemagic
python-magic

And another, similar to python-magic is available directly in the latest libmagic sources, and it is the one you probably have in your linux distribution.

In Debian the package python-magic is about this one and it is used as toivotuo said and it is not obsoleted as Simon Zimmermann said (IMHO).

It seems to me another take (by the original author of libmagic).

Too bad is not available directly on pypi.

Question 7

in python 2.6:

mime = subprocess.Popen("/usr/bin/file --mime PATH", shell=True, \
    stdout=subprocess.PIPE).communicate()[0]

Question 8

2017 Update

No need to go to github, it is on PyPi under a different name:

pip3 install --user python-magic
# or:
sudo apt install python3-magic  # Ubuntu distro package

The code can be simplified as well:

>>> import magic

>>> magic.from_file('/tmp/img_3304.jpg', mime=True)
'image/jpeg'

Question 9

Python bindings to libmagic

All the different answers on this topic are very confusing, so I’m hoping to give a bit more clarity with this overview of the different bindings of libmagic. Previously mammadori gave a short answer listing the available option.

libmagic

module name: magic
pypi: file-magic
source: https://github.com/file/file/tree/master/python

When determining a files mime-type, the tool of choice is simply called file and its back-end is called libmagic. (See the Project home page.) The project is developed in a private cvs-repository, but there is a read-only git mirror on github.

Now this tool, which you will need if you want to use any of the libmagic bindings with python, already comes with its own python bindings called file-magic. There is not much dedicated documentation for them, but you can always have a look at the man page of the c-library: man libmagic. The basic usage is described in the readme file:

import magic

detected = magic.detect_from_filename('magic.py')
print 'Detected MIME type: {}'.format(detected.mime_type)
print 'Detected encoding: {}'.format(detected.encoding)
print 'Detected file type name: {}'.format(detected.name)

Apart from this, you can also use the library by creating a Magic object using magic.open(flags) as shown in the example file.

Both toivotuo and ewr2san use these file-magic bindings included in the file tool. They mistakenly assume, they are using the python-magic package. This seems to indicate, that if both file and python-magic are installed, the python module magic refers to the former one.

python-magic

module name: magic
pypi: python-magic
source: https://github.com/ahupp/python-magic

This is the library that Simon Zimmermann talks about in his answer and which is also employed by Claude COULOMBE as well as Gringo Suave.

filemagic

module name: magic
pypi: filemagic
source: https://github.com/aliles/filemagic

Note: This project was last updated in 2013!

Due to being based on the same c-api, this library has some similarity with file-magic included in libmagic. It is only mentioned by mammadori and no other answer employs it.

Question 10

@toivotuo ‘s method worked best and most reliably for me under python3. My goal was to identify gzipped files which do not have a reliable .gz extension. I installed python3-magic.

import magic

filename = "./datasets/test"

def file_mime_type(filename):
    m = magic.open(magic.MAGIC_MIME)
    m.load()
    return(m.file(filename))

print(file_mime_type(filename))

for a gzipped file it returns: application/gzip; charset=binary

for an unzipped txt file (iostat data): text/plain; charset=us-ascii

for a tar file: application/x-tar; charset=binary

for a bz2 file: application/x-bzip2; charset=binary

and last but not least for me a .zip file: application/zip; charset=binary

Question 11

You didn’t state what web server you were using, but Apache has a nice little module called Mime Magic which it uses to determine the type of a file when told to do so. It reads some of the file’s content and tries to figure out what type it is based on the characters found. And as Dave Webb Mentioned the MimeTypes Module under python will work, provided an extension is handy.

Alternatively, if you are sitting on a UNIX box you can use sys.popen('file -i ' + fileName, mode='r') to grab the MIME type. Windows should have an equivalent command, but I’m unsure as to what it is.

Question 12

python 3 ref: https://docs.python.org/3.2/library/mimetypes.html

mimetypes.guess_type(url, strict=True) Guess the type of a file based on its filename or URL, given by url. The return value is a tuple (type, encoding) where type is None if the type can’t be guessed (missing or unknown suffix) or a string of the form ‘type/subtype’, usable for a MIME content-type header.

encoding is None for no encoding or the name of the program used to encode (e.g. compress or gzip). The encoding is suitable for use as a Content-Encoding header, not as a Content-Transfer-Encoding header. The mappings are table driven. Encoding suffixes are case sensitive; type suffixes are first tried case sensitively, then case insensitively.

The optional strict argument is a flag specifying whether the list of known MIME types is limited to only the official types registered with IANA. When strict is True (the default), only the IANA types are supported; when strict is False, some additional non-standard but commonly used MIME types are also recognized.

import mimetypes
print(mimetypes.guess_type("sample.html"))

Question 13

In Python 3.x and webapp with url to the file which couldn’t have an extension or a fake extension. You should install python-magic, using

pip3 install python-magic

For Mac OS X, you should also install libmagic using

brew install libmagic

Code snippet

import urllib
import magic
from urllib.request import urlopen

url = "http://...url to the file ..."
request = urllib.request.Request(url)
response = urlopen(request)
mime_type = magic.from_buffer(response.readline())
print(mime_type)

alternatively you could put a size into the read

import urllib
import magic
from urllib.request import urlopen

url = "http://...url to the file ..."
request = urllib.request.Request(url)
response = urlopen(request)
mime_type = magic.from_buffer(response.read(128))
print(mime_type)

Question 14

I try mimetypes library first. If it’s not working, I use python-magic libary instead.

import mimetypes
def guess_type(filename, buffer=None):
mimetype, encoding = mimetypes.guess_type(filename)
if mimetype is None:
    try:
        import magic
        if buffer:
            mimetype = magic.from_buffer(buffer, mime=True)
        else:
            mimetype = magic.from_file(filename, mime=True)
    except ImportError:
        pass
return mimetype

Question 15

The mimetypes module just recognise an file type based on file extension. If you will try to recover a file type of a file without extension, the mimetypes will not works.

Question 16

I ‘ve tried a lot of examples but with Django mutagen plays nicely.

Example checking if files is mp3

from mutagen.mp3 import MP3, HeaderNotFoundError  

try:
    audio = MP3(file)
except HeaderNotFoundError:
    raise ValidationError('This file should be mp3')

The downside is that your ability to check file types is limited, but it’s a great way if you want not only check for file type but also to access additional information.

Question 17

This may be old already, but why not use UploadedFile.content_type directly from Django? Is not the same?(https://docs.djangoproject.com/en/1.11/ref/files/uploads/#django.core.files.uploadedfile.UploadedFile.content_type)

Question 18

For byte Array type data you can use magic.from_buffer(_byte_array,mime=True)

Question 19

you can use imghdr Python module.

如何在python中找到文件的mime类型？

问题：如何在python中找到文件的mime类型？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

Python绑定到libmagic

魔力

Python魔术

魔术师

Python bindings to libmagic

libmagic

python-magic

filemagic

回答 8

回答 9

回答 10

回答 11

回答 12

回答 13

回答 14

回答 15

回答 16

回答 17

相关文章

排行榜展示

文章展示