Question: How do I find the MIME type of a file in Python?
Let’s say you want to save a bunch of files somewhere, for instance in BLOBs. Let’s say you want to dish these files out via a web page and have the client automatically open the correct application/viewer.
Assumption: The browser figures out which application/viewer to use by the mime-type (content-type?) header in the HTTP response.
Based on that assumption, in addition to the bytes of the file, you also want to save the MIME type.
How would you find the MIME type of a file? I’m currently on a Mac, but this should also work on Windows.
Does the browser add this information when posting the file to the web page?
Is there a neat python library for finding this information? A WebService or (even better) a downloadable database?
Answer 0
The python-magic method suggested by toivotuo is outdated. Python-magic's current trunk is on GitHub, and according to the readme there, finding the MIME type is done like this:
# For MIME types
import magic
mime = magic.Magic(mime=True)
mime.from_file("testdata/test.pdf") # 'application/pdf'
Answer 1
The mimetypes module in the standard library will determine/guess the MIME type from a file extension.
If users are uploading files, the HTTP POST will contain the MIME type of the file alongside the data. For example, Django makes this value available as the content_type attribute of the UploadedFile object.
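A minimal sketch of the extension-based approach, assuming you want a usable Content-Type even when the extension is unknown. The helper name `mime_for_download` and the `application/octet-stream` fallback are choices made for this example, not part of the mimetypes API.

```python
import mimetypes


def mime_for_download(filename: str) -> str:
    """Guess a Content-Type for serving `filename`, based only on its extension.

    Falls back to the generic binary type when mimetypes cannot guess
    (hypothetical helper; the fallback value is an assumption).
    """
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type or "application/octet-stream"


for name in ["report.pdf", "REPORT.PDF", "page.html", "no_extension"]:
    # Type suffixes are retried case-insensitively, so ".PDF" still resolves.
    print(name, "->", mime_for_download(name))
```

Because only the name is inspected, this is fast and works on Windows and macOS alike, but it trusts whatever extension the file happens to have.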
Answer 2
A more reliable way than using the mimetypes library is to use the python-magic package.
import magic
m = magic.open(magic.MAGIC_MIME)
m.load()
m.file("/tmp/document.pdf")
This would be equivalent to using file(1).
On Django, one could also make sure that the detected MIME type matches UploadedFile.content_type.
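libmagic in MAGIC_MIME mode returns a type plus parameters (for example `application/pdf; charset=binary`), while a browser-supplied content type may differ in case or parameters, so comparing raw strings can report false mismatches. A minimal sketch, assuming you only care about the `type/subtype` part; `base_mime_type` and `types_match` are hypothetical helper names, and the parsing relies on the standard library's `email.message.Message`.

```python
from email.message import Message


def base_mime_type(value: str) -> str:
    """Return the lower-cased 'type/subtype' part of a Content-Type-style string."""
    msg = Message()
    msg["Content-Type"] = value
    # get_content_type() drops parameters such as charset and lower-cases the result.
    return msg.get_content_type()


def types_match(declared: str, detected: str) -> bool:
    """Compare two MIME strings, ignoring parameters such as charset."""
    return base_mime_type(declared) == base_mime_type(detected)


# e.g. UploadedFile.content_type vs. the string returned by m.file(...)
print(types_match("application/PDF", "application/pdf; charset=binary"))
print(types_match("image/png", "application/pdf; charset=binary"))
```

One caveat: for a malformed value without a slash, `get_content_type()` returns `text/plain`, so validate the declared value separately if that matters.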
Answer 3
This seems to be very easy (Python 2 syntax):
>>> from mimetypes import MimeTypes
>>> import urllib
>>> mime = MimeTypes()
>>> url = urllib.pathname2url('Upload.xml')
>>> mime_type = mime.guess_type(url)
>>> print mime_type
('application/xml', None)
Please refer to the Old Post.
Update: per @Garret's comment, in Python 3+ it's more convenient now:
import mimetypes
print(mimetypes.guess_type("sample.html"))
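In Python 3 the urllib.pathname2url step is unnecessary: mimetypes.guess_type accepts plain filenames, URLs and (since Python 3.8) path-like objects. A short sketch; the file names and the example.com URL are made up for illustration.

```python
import mimetypes
from pathlib import Path

# A plain filename, a URL and a pathlib.Path can all be passed directly.
from_name = mimetypes.guess_type("sample.html")
from_url = mimetypes.guess_type("https://example.com/files/report.pdf")
from_path = mimetypes.guess_type(Path("docs") / "sample.html")

print(from_name)
print(from_url)
print(from_path)
```

Python 3.13 also added mimetypes.guess_file_type() specifically for filesystem paths; guess_type still works for them.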
Answer 4
There are 3 different libraries that wrap libmagic.
2 of them are available on PyPI (so pip install will work):
And another one, similar to python-magic, is available directly in the latest libmagic sources; it is the one you probably have in your Linux distribution.
In Debian, the python-magic package is this one; it is used the way toivotuo said, and it is not obsolete as Simon Zimmermann said (IMHO).
It seems to me to be another take (by the original author of libmagic).
Too bad it is not available directly on PyPI.
Answer 5
In Python 2.6:
import subprocess
path = "/tmp/document.pdf"  # example path: the file to inspect
mime = subprocess.Popen(["/usr/bin/file", "--mime", path],
                        stdout=subprocess.PIPE).communicate()[0]
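On Python 3.7+ the same idea can be written with subprocess.run and an argument list, which avoids the shell and its quoting problems. A sketch, assuming a libmagic-based `file` binary that supports `--brief` and `--mime-type` (as on typical Linux and macOS systems); `mime_via_file_command` is a hypothetical name, and it returns None where `file` is not on PATH, as on a default Windows install.

```python
import shutil
import subprocess
from typing import Optional


def mime_via_file_command(path: str) -> Optional[str]:
    """Return the MIME type reported by file(1), or None if file(1) is unavailable."""
    exe = shutil.which("file")
    if exe is None:
        return None
    # --brief drops the "path: " prefix; --mime-type omits the "; charset=..." part.
    result = subprocess.run(
        [exe, "--brief", "--mime-type", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    import sys

    for arg in sys.argv[1:]:
        print(arg, mime_via_file_command(arg))
```

Passing the path as a separate list element means file names with spaces or shell metacharacters need no escaping.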
Answer 6
2017 Update
No need to go to GitHub; it is on PyPI under a different name:
pip3 install --user python-magic
# or:
sudo apt install python3-magic # Ubuntu distro package
The code can be simplified as well:
>>> import magic
>>> magic.from_file('/tmp/img_3304.jpg', mime=True)
'image/jpeg'
Answer 7
Python bindings to libmagic
All the different answers on this topic are very confusing, so I'm hoping to give a bit more clarity with this overview of the different bindings of libmagic. Previously, mammadori gave a short answer listing the available options.
libmagic
When determining a file's MIME type, the tool of choice is simply called file, and its back-end is called libmagic. (See the Project home page.) The project is developed in a private CVS repository, but there is a read-only git mirror on GitHub.
Now this tool, which you will need if you want to use any of the libmagic bindings with Python, already comes with its own Python bindings called file-magic. There is not much dedicated documentation for them, but you can always have a look at the man page of the C library: man libmagic. The basic usage is described in the readme file:
import magic
detected = magic.detect_from_filename('magic.py')
print('Detected MIME type: {}'.format(detected.mime_type))
print('Detected encoding: {}'.format(detected.encoding))
print('Detected file type name: {}'.format(detected.name))
Apart from this, you can also use the library by creating a Magic object using magic.open(flags), as shown in the example file.
Both toivotuo and ewr2san use these file-magic bindings included in the file tool. They mistakenly assume they are using the python-magic package. This seems to indicate that if both file and python-magic are installed, the Python module magic refers to the former one.
python-magic
This is the library that Simon Zimmermann talks about in his answer and which is also employed by Claude COULOMBE as well as Gringo Suave.
filemagic
Note: This project was last updated in 2013!
Due to being based on the same C API, this library has some similarity with file-magic included in libmagic. It is only mentioned by mammadori, and no other answer employs it.
Answer 8
@toivotuo's method worked best and most reliably for me under Python 3. My goal was to identify gzipped files that do not have a reliable .gz extension. I installed python3-magic.
import magic
filename = "./datasets/test"
def file_mime_type(filename):
    m = magic.open(magic.MAGIC_MIME)
    m.load()
    return m.file(filename)
print(file_mime_type(filename))
for a gzipped file it returns:
application/gzip; charset=binary
for an unzipped txt file (iostat data):
text/plain; charset=us-ascii
for a tar file:
application/x-tar; charset=binary
for a bz2 file:
application/x-bzip2; charset=binary
and last but not least for me a .zip file:
application/zip; charset=binary
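If the goal is only to recognise a handful of compressed formats regardless of extension, checking the leading "magic" bytes yourself is a dependency-free alternative. A minimal sketch, not equivalent to libmagic: it covers only the four formats listed above, and the signature table and the `sniff_archive_type` name are assumptions of this example.

```python
from typing import Optional

# (offset, signature bytes, MIME type) for the formats listed above.
# Order matters: a .tar.gz starts with the gzip signature, so it reports gzip.
_SIGNATURES = [
    (0, b"\x1f\x8b", "application/gzip"),
    (0, b"BZh", "application/x-bzip2"),
    (0, b"PK\x03\x04", "application/zip"),
    (257, b"ustar", "application/x-tar"),  # POSIX/GNU tar header magic
]


def sniff_archive_type(path: str) -> Optional[str]:
    """Return a MIME type for a few known archive formats, or None if unrecognised."""
    with open(path, "rb") as fh:
        head = fh.read(512)  # enough to cover the tar magic at offset 257
    for offset, sig, mime in _SIGNATURES:
        if head[offset:offset + len(sig)] == sig:
            return mime
    return None
```

Unlike libmagic, this never reports a charset and returns None for anything outside the table, so plain text files would need a separate check.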
Answer 9
You didn't state what web server you were using, but Apache has a nice little module called Mime Magic, which it uses to determine the type of a file when told to do so. It reads some of the file's content and tries to figure out what type it is based on the characters found. And as Dave Webb mentioned, the mimetypes module in Python will work, provided an extension is available.
Alternatively, if you are sitting on a UNIX box, you can use os.popen('file -i ' + fileName, mode='r').read() to grab the MIME type. Windows should have an equivalent command, but I'm unsure what it is.
Answer 10
Python 3 reference: https://docs.python.org/3.2/library/mimetypes.html
mimetypes.guess_type(url, strict=True)
Guess the type of a file based on its filename or URL, given by url. The return value is a tuple (type, encoding) where type is None if the type can't be guessed (missing or unknown suffix) or a string of the form 'type/subtype', usable for a MIME content-type header.
encoding is None for no encoding or the name of the program used to encode (e.g. compress or gzip). The encoding is suitable for use as a Content-Encoding header, not as a Content-Transfer-Encoding header. The mappings are table driven. Encoding suffixes are case sensitive; type suffixes are first tried case sensitively, then case insensitively.
The optional strict argument is a flag specifying whether the list of known MIME types is limited to only the official types registered with IANA. When strict is True (the default), only the IANA types are supported; when strict is False, some additional non-standard but commonly used MIME types are also recognized.
import mimetypes
print(mimetypes.guess_type("sample.html"))
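Since the docs say the returned encoding is suitable for a Content-Encoding header, the tuple maps directly onto the HTTP headers the question is about. A minimal sketch; `headers_for` is a hypothetical helper and the `application/octet-stream` fallback is an assumption of this example.

```python
import mimetypes
from typing import Dict


def headers_for(filename: str) -> Dict[str, str]:
    """Build Content-Type (and, if applicable, Content-Encoding) headers from a filename."""
    mime_type, encoding = mimetypes.guess_type(filename)
    headers = {"Content-Type": mime_type or "application/octet-stream"}
    if encoding is not None:
        # e.g. "gzip" for *.gz: this describes the compression, not the payload type.
        headers["Content-Encoding"] = encoding
    return headers


print(headers_for("sample.html"))
print(headers_for("archive.tar.gz"))
```

Be aware that with Content-Encoding: gzip, clients may decompress the body transparently; for plain downloads of .gz files, some servers instead send application/gzip without Content-Encoding.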
Answer 11
In Python 3.x, for a web app where the file comes from a URL that may have no extension or a fake one, you should install python-magic using
pip3 install python-magic
For Mac OS X, you should also install libmagic using
brew install libmagic
Code snippet
import urllib.request
import magic
from urllib.request import urlopen
url = "http://...url to the file ..."
request = urllib.request.Request(url)
response = urlopen(request)
# mime=True returns e.g. 'application/pdf' instead of a human-readable description
mime_type = magic.from_buffer(response.readline(), mime=True)
print(mime_type)
Alternatively, you could pass a size to read():
import urllib.request
import magic
from urllib.request import urlopen
url = "http://...url to the file ..."
request = urllib.request.Request(url)
response = urlopen(request)
mime_type = magic.from_buffer(response.read(128), mime=True)
print(mime_type)
Answer 12
I try the mimetypes library first. If it can't guess a type, I use the python-magic library instead.
import mimetypes
def guess_type(filename, buffer=None):
    mimetype, encoding = mimetypes.guess_type(filename)
    if mimetype is None:
        try:
            import magic
            if buffer:
                mimetype = magic.from_buffer(buffer, mime=True)
            else:
                mimetype = magic.from_file(filename, mime=True)
        except ImportError:
            pass
    return mimetype
Answer 13
The mimetypes module only recognises a file type based on its file extension. If you try to recover the type of a file without an extension, mimetypes will not work.
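A quick way to see this: mimetypes never opens the file, so a missing extension gives None and a misleading extension is simply believed. A short demonstration; the file names are made up, and the PNG signature bytes are written only for illustration.

```python
import mimetypes
import os
import tempfile

png_header = b"\x89PNG\r\n\x1a\n"  # PNG signature bytes
with tempfile.TemporaryDirectory() as tmp:
    # PNG content saved under a misleading .pdf name.
    fake_pdf = os.path.join(tmp, "picture.pdf")
    with open(fake_pdf, "wb") as fh:
        fh.write(png_header)
    # A file with no extension at all.
    no_ext = os.path.join(tmp, "README")
    with open(no_ext, "wb") as fh:
        fh.write(b"plain text\n")

    fake_pdf_guess = mimetypes.guess_type(fake_pdf)  # judged by ".pdf" only
    no_ext_guess = mimetypes.guess_type(no_ext)      # no suffix to go on

print(fake_pdf_guess)
print(no_ext_guess)
```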
Answer 14
I've tried a lot of examples, but with Django, mutagen plays nicely.
Example checking whether a file is an MP3:
from django.core.exceptions import ValidationError
from mutagen.mp3 import MP3, HeaderNotFoundError
try:
    audio = MP3(file)  # file: the uploaded file to check
except HeaderNotFoundError:
    raise ValidationError('This file should be mp3')
The downside is that your ability to check file types is limited, but it's a great approach if you want not only to check the file type but also to access additional information.
Answer 15
Answer 16
For byte-array data you can use
magic.from_buffer(_byte_array, mime=True)
Answer 17
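You can use the imghdr Python module. Note that it only recognises image formats, returns names such as 'png' rather than MIME types, and was deprecated in Python 3.11 and removed in Python 3.13.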
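Where imghdr is unavailable, the same idea can be sketched by checking the leading signature bytes directly and returning MIME types. The `image_mime_type` name and its small signature table (PNG, JPEG, GIF, WebP) are assumptions of this example, and it recognises far fewer formats than imghdr or libmagic.

```python
from typing import Optional


def image_mime_type(data: bytes) -> Optional[str]:
    """Return an image MIME type from the first bytes of a file, or None if unrecognised."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "image/gif"
    # WebP is a RIFF container with "WEBP" at bytes 8..11.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return None


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            print(path, image_mime_type(fh.read(32)))
```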