Tag Archives: file

Flask raises TemplateNotFound error even though template file exists

Question: Flask raises TemplateNotFound error even though template file exists

I am trying to render the file home.html. The file exists in my project, but I keep getting jinja2.exceptions.TemplateNotFound: home.html when I try to render it. Why can’t Flask find my template?

from flask import Flask, render_template

app = Flask(__name__)

@app.route('/')
def home():
    return render_template('home.html')
/myproject
    app.py
    home.html

Answer 0

You must create your template files in the correct location: in the templates subdirectory next to your python module.

The error indicates that there is no home.html file in the templates/ directory. Make sure you created that directory in the same directory as your python module, and that you did in fact put a home.html file in that subdirectory. If your app is a package, the templates folder should be created inside the package.

myproject/
    app.py
    templates/
        home.html
myproject/
    mypackage/
        __init__.py
        templates/
            home.html

Alternatively, if you named your templates folder something other than templates and don’t want to rename it to the default, you can tell Flask to use that other directory.

app = Flask(__name__, template_folder='template')  # still relative to module

You can ask Flask to explain how it tried to find a given template, by setting the EXPLAIN_TEMPLATE_LOADING option to True. For every template loaded, you’ll get a report logged to the Flask app.logger, at level INFO.
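
For example, a minimal way to switch this on (EXPLAIN_TEMPLATE_LOADING is a standard Flask config key):

app.config['EXPLAIN_TEMPLATE_LOADING'] = True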

This is what it looks like when a search is successful; in this example the foo/bar.html template extends the base.html template, so there are two searches:

[2019-06-15 16:03:39,197] INFO in debughelpers: Locating template "foo/bar.html":
    1: trying loader of application "flaskpackagename"
       class: jinja2.loaders.FileSystemLoader
       encoding: 'utf-8'
       followlinks: False
       searchpath:
         - /.../project/flaskpackagename/templates
       -> found ('/.../project/flaskpackagename/templates/foo/bar.html')
[2019-06-15 16:03:39,203] INFO in debughelpers: Locating template "base.html":
    1: trying loader of application "flaskpackagename"
       class: jinja2.loaders.FileSystemLoader
       encoding: 'utf-8'
       followlinks: False
       searchpath:
         - /.../project/flaskpackagename/templates
       -> found ('/.../project/flaskpackagename/templates/base.html')

Blueprints can register their own template directories too, but this is not a requirement if you are only using blueprints to make it easier to split a larger project into logical units. The main Flask app's template directory is always searched first, even when using additional paths per blueprint.
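
As an illustrative sketch (the 'admin' blueprint name is hypothetical), a blueprint declares its own template folder when it is constructed:

from flask import Blueprint

# template_folder is resolved relative to the blueprint's import location
admin_bp = Blueprint('admin', __name__, template_folder='templates')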


Answer 1

I think Flask uses the directory templates by default. So, supposing this is your hello.py, your code should be:

from flask import Flask,render_template

app=Flask(__name__,template_folder='template')


@app.route("/")
def home():
    return render_template('home.html')

@app.route("/about/")
def about():
    return render_template('about.html')

if __name__=="__main__":
    app.run(debug=True)

And your workspace structure would look like:

project/
    hello.py        
    template/
         home.html
         about.html    
    static/
           js/
             main.js
           css/
               main.css

You also have to create two HTML files, named home.html and about.html, and put them in the template folder.


Answer 2

(Please note that the accepted answer above regarding the file/project structure is absolutely correct.)

Also..

In addition to properly setting up the project file structure, we have to tell flask to look in the appropriate level of the directory hierarchy.

For example:

    app = Flask(__name__, template_folder='../templates')
    app = Flask(__name__, template_folder='../templates', static_folder='../static')

Starting the path with ../ moves up one directory and starts there.

Starting with ../../ moves up two directories and starts there (and so on…).

Hope this helps


Answer 3

I don’t know why, but I had to use the following folder structure instead. I put “templates” one level up.

project/
    app/
        hello.py
        static/
            main.css
    templates/
        home.html
    venv/

This probably indicates a misconfiguration elsewhere, but I couldn’t figure out what that was and this worked.


Answer 4

Check that:

  1. the template file has the right name
  2. the template file is in a subdirectory called templates
  3. the name you pass to render_template is relative to the template directory (index.html would be directly in the templates directory, auth/login.html would be under the auth directory in the templates directory; see the sketch after this list)
  4. you either do not have a subdirectory with the same name as your app, or the templates directory is inside that subdir.
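
For point 3, a minimal sketch (reusing the app from the question); a template in a subdirectory is referred to by its path relative to templates/:

@app.route('/login')
def login():
    # loads templates/auth/login.html
    return render_template('auth/login.html')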

If that doesn’t work, turn on debugging (app.debug = True) which might help figure out what’s wrong.


Answer 5

If you run your code from an installed package, make sure template files are present in directory <python root>/lib/site-packages/your-package/templates.


Some details:

In my case I was trying to run the examples of the flask_simple_ui project, and jinja would always say

jinja2.exceptions.TemplateNotFound: form.html

The trick was that the sample program imported the installed package flask_simple_ui, and jinja, when used from inside that package, uses the package path as the root directory for its lookup, in my case ...python/lib/site-packages/flask_simple_ui, instead of os.getcwd() as one would expect.

As bad luck would have it, setup.py had a bug and didn't copy any html files, including the missing form.html. Once I fixed setup.py, the problem with TemplateNotFound vanished.

I hope it helps someone.


Answer 6

I had the same error; it turns out the only thing I did wrong was to name my templates folder 'template', without the 's'. After changing that, it worked fine. I don't know why the name matters, but it does.


Answer 7

When the render_template() function is used, it tries to find the template in the folder called templates, and it throws the jinja2.exceptions.TemplateNotFound error when:

  1. the html file does not exist, or
  2. the templates folder does not exist

To solve the problem:

create a folder named templates in the same directory as the python file, and place the html file you created inside that templates folder.


Answer 8

You need to put all your .html files in the templates folder next to your python module. And if there are any images that you use in your html files, then you need to put those files in the folder named static,

in the following structure:

project/
    hello.py
    static/
        image.jpg
        style.css
    templates/
        homepage.html
    virtual/
        filename.json

Answer 9

Another alternative is to set root_path, which fixes the problem for both the templates and the static folders.

import sys
from pathlib import Path
from flask import Flask

# sys.frozen is set when running from a bundled executable (e.g. PyInstaller)
root_path = Path(sys.executable).parent if getattr(sys, 'frozen', False) else Path(__file__).parent
app = Flask(__name__.split('.')[0], root_path=root_path)

If you render templates directly via Jinja2, then you write:

import jinja2

ENV = jinja2.Environment(loader=jinja2.FileSystemLoader(str(root_path / 'templates')))
template = ENV.get_template(your_template_name)

How do I read a (static) file from inside a Python package?

Question: How do I read a (static) file from inside a Python package?

Could you tell me how can I read a file that is inside my Python package?

My situation

A package that I load has a number of templates (text files used as strings) that I want to load from within the program. But how do I specify the path to such file?

Imagine I want to read a file from:

package\templates\temp_file

Some kind of path manipulation? Package base path tracking?


Answer 0

[added 2016-06-15: apparently this doesn’t work in all situations. please refer to the other answers]


import os, mypackage
template = os.path.join(mypackage.__path__[0], 'templates', 'temp_file')

Answer 1

TL;DR: Use the standard library's importlib.resources module, as explained in method 2 below.

The traditional pkg_resources from setuptools is not recommended anymore, because the new method:

  • is significantly more performant;
  • is safer, since the use of packages (instead of path strings) raises compile-time errors;
  • is more intuitive, because you don't have to “join” paths;
  • is faster when developing, since you don't need an extra dependency (setuptools) but rely on Python's standard library alone.

I list the traditional method first, to explain the differences from the new method when porting existing code (porting is also explained here).



Let’s assume your templates are located in a folder nested inside your module’s package:

  <your-package>
    +--<module-asking-the-file>
    +--templates/
          +--temp_file                         <-- We want this file.

Note 1: For sure, we should NOT fiddle with the __file__ attribute (e.g. code will break when served from a zip).

Note 2: If you are building this package, remember to declare your data files as package_data or data_files in your setup.py.

1) Using pkg_resources from setuptools (slow)

You may use the pkg_resources package from the setuptools distribution, but that comes with a cost, performance-wise:

import pkg_resources

# Could be any dot-separated package/module name or a "Requirement"
resource_package = __name__
resource_path = '/'.join(('templates', 'temp_file'))  # Do not use os.path.join()
template = pkg_resources.resource_string(resource_package, resource_path)
# or for a file-like stream:
template = pkg_resources.resource_stream(resource_package, resource_path)

Tips:

  • This will read data even if your distribution is zipped, so you may set zip_safe=True in your setup.py, and/or use the long-awaited zipapp packer from python-3.5 to create self-contained distributions.

  • Remember to add setuptools into your run-time requirements (e.g. in install_requires).

… and notice that according to the Setuptools/pkg_resources docs, you should not use os.path.join:

Basic Resource Access

Note that resource names must be /-separated paths and cannot be absolute (i.e. no leading /) or contain relative names like “..“. Do not use os.path routines to manipulate resource paths, as they are not filesystem paths.

2) Python >= 3.7, or using the backported importlib_resources library

Use the standard library's importlib.resources module, which is more efficient than the setuptools approach above:

try:
    import importlib.resources as pkg_resources
except ImportError:
    # Try backported to PY<37 `importlib_resources`.
    import importlib_resources as pkg_resources

from . import templates  # relative-import the *package* containing the templates

template = pkg_resources.read_text(templates, 'temp_file')
# or for a file-like stream:
template = pkg_resources.open_text(templates, 'temp_file')

Attention:

Regarding the function read_text(package, resource):

  • The package can be either a string or a module.
  • The resource is NOT a path anymore, but just the filename of the resource to open, within an existing package; it may not contain path separators and it may not have sub-resources (i.e. it cannot be a directory).

For the example asked in the question, we must now:

  • make the <your_package>/templates/ into a proper package, by creating an empty __init__.py file in it,
  • so now we can use a simple (possibly relative) import statement (no more parsing package/module names),
  • and simply ask for resource_name = "temp_file" (no path).

Tips:

  • To access a file inside the current module, set the package argument to __package__, e.g. pkg_resources.read_text(__package__, 'temp_file') (thanks to @ben-mares).
  • Things become interesting when an actual filename is asked for with path(), since context managers are now used for temporarily created files (read this); see the sketch after this list.
  • Add the backported library, conditionally for older Pythons, with install_requires=["importlib_resources ; python_version<'3.7'"] (check this if you package your project with setuptools<36.2.1).
  • Remember to remove the setuptools library from your runtime requirements, if you migrated from the traditional method.
  • Remember to customize setup.py or MANIFEST to include any static files.
  • You may also set zip_safe=True in your setup.py.
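
A small sketch of the path() tip above, reusing the pkg_resources alias and the templates package from the earlier example:

# path() yields a real filesystem path; for zipped packages this may be a
# temporary copy that only exists inside the with-block
with pkg_resources.path(templates, 'temp_file') as p:
    print(p.read_text())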

Answer 2

A packaging prelude:

Before you can even worry about reading resource files, the first step is to make sure that the data files are getting packaged into your distribution in the first place – it is easy to read them directly from the source tree, but the important part is making sure these resource files are accessible from code within an installed package.

Structure your project like this, putting data files into a subdirectory within the package:

.
├── package
│   ├── __init__.py
│   ├── templates
│   │   └── temp_file
│   ├── mymodule1.py
│   └── mymodule2.py
├── README.rst
├── MANIFEST.in
└── setup.py

You should pass include_package_data=True in the setup() call. The manifest file is only needed if you want to use setuptools/distutils and build source distributions. To make sure the templates/temp_file gets packaged for this example project structure, add a line like this into the manifest file:

recursive-include package *
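
For reference, the corresponding setup() call might look like this minimal sketch (the project name is assumed):

from setuptools import setup, find_packages

setup(
    name='package',
    packages=find_packages(),
    include_package_data=True,  # pick up the files matched by MANIFEST.in
)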

Historical cruft note: Using a manifest file is not needed for modern build backends such as flit, poetry, which will include the package data files by default. So, if you’re using pyproject.toml and you don’t have a setup.py file then you can ignore all the stuff about MANIFEST.in.

Now, with packaging out of the way, onto the reading part…

Recommendation:

Use standard library pkgutil APIs. It’s going to look like this in library code:

# within package/mymodule1.py, for example
import pkgutil

data = pkgutil.get_data(__name__, "templates/temp_file")

It works in zips. It works on Python 2 and Python 3. It doesn’t require third-party dependencies. I’m not really aware of any downsides (if you are, then please comment on the answer).

Bad ways to avoid:

Bad way #1: using relative paths from a source file

This is currently the accepted answer. At best, it looks something like this:

from pathlib import Path

resource_path = Path(__file__).parent / "templates"
data = resource_path.joinpath("temp_file").read_bytes()

What’s wrong with that? The assumption that you have files and subdirectories available is not correct. This approach doesn’t work if executing code which is packed in a zip or a wheel, and it may be entirely out of the user’s control whether or not your package gets extracted to a filesystem at all.

Bad way #2: using pkg_resources APIs

This is described in the top-voted answer. It looks something like this:

from pkg_resources import resource_string

data = resource_string(__name__, "templates/temp_file")

What’s wrong with that? It adds a runtime dependency on setuptools, which should preferably be an install time dependency only. Importing and using pkg_resources can become really slow, as the code builds up a working set of all installed packages, even though you were only interested in your own package resources. That’s not a big deal at install time (since installation is once-off), but it’s ugly at runtime.

Bad way #3: using importlib.resources APIs

This is currently the recommendation in the top-voted answer. It’s a recent standard library addition (new in Python 3.7). It looks like this:

from importlib.resources import read_binary

data = read_binary("package.templates", "temp_file")

What’s wrong with that? Well, unfortunately, it doesn’t work…yet. This is still an incomplete API, using importlib.resources will require you to add an empty file templates/__init__.py in order that the data files will reside within a sub-package rather than in a subdirectory. It will also expose the package/templates subdirectory as an importable package.templates sub-package in its own right. If that’s not a big deal and it doesn’t bother you, then you can go ahead and add the __init__.py file there and use the import system to access resources. However, while you’re at it you may as well make it into a my_resources.py file instead, and just define some bytes or string variables in the module, then import them in Python code. It’s the import system doing the heavy lifting here either way.

Honorable mention: using newer importlib_resources APIs

This has not been mentioned in any other answers yet, but importlib_resources is more than a simple backport of the Python 3.7+ importlib.resources code. It has traversable APIs which you can use like this:

import importlib_resources

my_resources = importlib_resources.files("package")
data = (my_resources / "templates" / "temp_file").read_bytes()

This works on Python 2 and 3, it works in zips, and it doesn't require spurious __init__.py files to be added in resource subdirectories. The only downside vs pkgutil that I can see is that these new APIs haven't arrived in the stdlib yet, so there is still a third-party dependency. Newer APIs from importlib_resources should arrive in the stdlib importlib.resources in Python 3.9.

Example project:

I've created an example project on github and uploaded it to PyPI; it demonstrates all five approaches discussed above. Try it out with:

$ pip install resources-example
$ resources-example

See https://github.com/wimglenn/resources-example for more info.


Answer 3

In case you have this structure

lidtk
├── bin
│   └── lidtk
├── lidtk
│   ├── analysis
│   │   ├── char_distribution.py
│   │   └── create_cm.py
│   ├── classifiers
│   │   ├── char_dist_metric_train_test.py
│   │   ├── char_features.py
│   │   ├── cld2
│   │   │   ├── cld2_preds.txt
│   │   │   └── cld2wili.py
│   │   ├── get_cld2.py
│   │   ├── text_cat
│   │   │   ├── __init__.py
│   │   │   ├── README.md   <---------- say you want to get this
│   │   │   └── textcat_ngram.py
│   │   └── tfidf_features.py
│   ├── data
│   │   ├── __init__.py
│   │   ├── create_ml_dataset.py
│   │   ├── download_documents.py
│   │   ├── language_utils.py
│   │   ├── pickle_to_txt.py
│   │   └── wili.py
│   ├── __init__.py
│   ├── get_predictions.py
│   ├── languages.csv
│   └── utils.py
├── README.md
├── setup.cfg
└── setup.py

you need this code:

import pkg_resources

# __name__ in case you're within the package
# - otherwise it would be 'lidtk' in this example as it is the package name
path = 'classifiers/text_cat/README.md'  # always use slash
filepath = pkg_resources.resource_filename(__name__, path)

The strange “always use slash” part comes from setuptools APIs

Also notice that if you use paths, you must use a forward slash (/) as the path separator, even if you are on Windows. Setuptools automatically converts slashes to appropriate platform-specific separators at build time

In case you wonder where the documentation is:


Answer 4

The section “10.8. Reading Datafiles Within a Package” of Python Cookbook, Third Edition by David Beazley and Brian K. Jones gives the answer.

I'll just quote it here:

Suppose you have a package with files organized as follows:

mypackage/
    __init__.py
    somedata.dat
    spam.py

Now suppose the file spam.py wants to read the contents of the file somedata.dat. To do it, use the following code:

import pkgutil
data = pkgutil.get_data(__package__, 'somedata.dat')

The resulting variable data will be a byte string containing the raw contents of the file.

The first argument to get_data() is a string containing the package name. You can either supply it directly or use a special variable, such as __package__. The second argument is the relative name of the file within the package. If necessary, you can navigate into different directories using standard Unix filename conventions as long as the final directory is still located within the package.

In this way, the package can be installed as a directory, a .zip, or an .egg.


Answer 5

Every python module in your package has a __file__ attribute

You can use it as:

import os
import mypackage

templates_dir = os.path.join(os.path.dirname(mypackage.__file__), 'templates')
template_file = os.path.join(templates_dir, 'template.txt')

For egg resources see: http://peak.telecommunity.com/DevCenter/PythonEggs#accessing-package-resources


Answer 6

Assuming you are using an egg file that is not extracted:

I “solved” this in a recent project by using a post-install script that extracts my templates from the egg (zip file) to the proper directory in the filesystem. It was the quickest, most reliable solution I found, since working with __path__[0] can go wrong sometimes (I don't recall the name, but I came across at least one library that added something in front of that list!).

Also, egg files are usually extracted on the fly to a temporary location called the “egg cache”. You can change that location using an environment variable, either before starting your script or even later, e.g.

os.environ['PYTHON_EGG_CACHE'] = path

However there is pkg_resources that might do the job properly.


How do I detect whether a file is binary (non-text) in Python?

Question: How do I detect whether a file is binary (non-text) in Python?

How can I tell if a file is binary (non-text) in Python?

I am searching through a large set of files in Python, and keep getting matches in binary files. This makes the output look incredibly messy.

I know I could use grep -I, but I am doing more with the data than what grep allows for.

In the past, I would have just searched for characters greater than 0x7f, but utf8 and the like, make that impossible on modern systems. Ideally, the solution would be fast.


Answer 0

You can also use the mimetypes module:

import mimetypes
...
mime = mimetypes.guess_type(file)

It's fairly easy to compile a list of binary mime types. For example, Apache distributes a mime.types file that you could parse into a pair of sets, binary and text, and then check which set the guessed mime type falls into.
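
A rough sketch of that idea, assuming a local copy of Apache's mime.types (the path is hypothetical):

text_mimes, binary_mimes = set(), set()
with open('mime.types') as f:
    for line in f:
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        mime = line.split()[0]  # first field is the mime type
        (text_mimes if mime.startswith('text/') else binary_mimes).add(mime)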


Answer 1

Yet another method based on file(1) behavior:

>>> textchars = bytearray({7,8,9,10,12,13,27} | set(range(0x20, 0x100)) - {0x7f})
>>> is_binary_string = lambda bytes: bool(bytes.translate(None, textchars))

Example:

>>> is_binary_string(open('/usr/bin/python', 'rb').read(1024))
True
>>> is_binary_string(open('/usr/bin/dh_python3', 'rb').read(1024))
False

Answer 2

If you're using python3 with utf-8, it is straightforward: just open the file in text mode and stop processing if you get a UnicodeDecodeError. Python 3 uses unicode when handling files in text mode (and bytearray in binary mode); if your encoding can't decode an arbitrary file, it's quite likely that you will get a UnicodeDecodeError.

Example:

try:
    with open(filename, "r") as f:
        for l in f:
             process_line(l)
except UnicodeDecodeError:
    pass # Fond non-text data

Answer 3

If it helps, many binary types begin with magic numbers. Here is a list of file signatures.
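
A minimal sketch of checking a few well-known signatures (this table is a tiny, hand-picked subset of the public lists):

MAGIC_NUMBERS = {
    b'\x89PNG\r\n\x1a\n': 'png',
    b'%PDF-': 'pdf',
    b'PK\x03\x04': 'zip',
}

def sniff_type(filename):
    with open(filename, 'rb') as f:
        head = f.read(8)
    for signature, kind in MAGIC_NUMBERS.items():
        if head.startswith(signature):
            return kind
    return None  # unknown: not necessarily text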


Answer 4

Try this:

def is_binary(filename):
    """Return true if the given filename is binary.
    @raise EnvironmentError: if the file does not exist or cannot be accessed.
    @attention: found @ http://bytes.com/topic/python/answers/21222-determine-file-type-binary-text on 6/08/2010
    @author: Trent Mick <TrentM@ActiveState.com>
    @author: Jorge Orpinel <jorge@orpinel.com>"""
    fin = open(filename, 'rb')
    try:
        CHUNKSIZE = 1024
        while 1:
            chunk = fin.read(CHUNKSIZE)
            if '\0' in chunk: # found null byte (Python 2 str; on Python 3 test b'\0' instead)
                return True
            if len(chunk) < CHUNKSIZE:
                break # done
    # A-wooo! Look, Python doesn't need the "except:". Wow... how clever it is.
    finally:
        fin.close()

    return False

Answer 5

Here’s a suggestion that uses the Unix file command:

import re
import subprocess

def istext(path):
    # decode() so the regex matches on Python 3, where stdout is bytes
    return (re.search(r':.* text',
                      subprocess.Popen(["file", '-L', path],
                                       stdout=subprocess.PIPE).stdout.read().decode())
            is not None)

Example usage:

>>> istext('/etc/motd') 
True
>>> istext('/vmlinuz') 
False
>>> open('/tmp/japanese').read()
'\xe3\x81\x93\xe3\x82\x8c\xe3\x81\xaf\xe3\x80\x81\xe3\x81\xbf\xe3\x81\x9a\xe3\x81\x8c\xe3\x82\x81\xe5\xba\xa7\xe3\x81\xae\xe6\x99\x82\xe4\xbb\xa3\xe3\x81\xae\xe5\xb9\x95\xe9\x96\x8b\xe3\x81\x91\xe3\x80\x82\n'
>>> istext('/tmp/japanese') # works on UTF-8
True

It has the downsides of not being portable to Windows (unless you have something like the file command there), and having to spawn an external process for each file, which might not be palatable.


Answer 6

Use the binaryornot library (GitHub).

It is very simple and based on the code found in this stackoverflow question.

You can actually write this in 2 lines of code; however, this package saves you from having to write and thoroughly test those 2 lines of code against all sorts of weird file types, cross-platform.


Answer 7

Usually you have to guess.

You can look at the extensions as one clue, if the files have them.

You can also recognise known binary formats, and ignore those.

Otherwise see what proportion of non-printable ASCII bytes you have and take a guess from that.

You can also try decoding from UTF-8 and see if that produces sensible output.
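
A rough sketch combining the last two heuristics (the 30% threshold is an arbitrary choice, not a standard):

def looks_like_text(data: bytes) -> bool:
    try:
        data.decode('utf-8')  # clean UTF-8 is very likely text
        return True
    except UnicodeDecodeError:
        pass
    if not data:
        return True
    # fall back to the proportion of printable ASCII bytes
    printable = sum(32 <= b < 127 or b in (9, 10, 13) for b in data)
    return printable / len(data) >= 0.70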


Answer 8

A shorter solution, with a UTF-16 warning:

def is_binary(filename):
    """ 
    Return true if the given filename appears to be binary.
    File is considered to be binary if it contains a NULL byte.
    FIXME: This approach incorrectly reports UTF-16 as binary.
    """
    with open(filename, 'rb') as f:
        for block in f:
            if b'\0' in block:
                return True
    return False

Answer 9

We can use Python itself to check whether a file is binary, because opening a binary file in text mode fails:

def is_binary(file_name):
    try:
        with open(file_name, 'tr') as check_file:  # try open file in text mode
            check_file.read()
            return False
    except UnicodeDecodeError:  # if decoding fails, the file is non-text (binary)
        return True

Answer 10

If you're not on Windows, you can use Python Magic to determine the filetype. Then you can check whether it is a text mime type.
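
A minimal sketch of that idea, assuming the python-magic package and a hypothetical file path:

import magic  # the python-magic package, which wraps libmagic

mime = magic.from_file('some_file', mime=True)  # e.g. 'text/plain'
is_text = mime.startswith('text/')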


Answer 11

Here’s a function that first checks if the file starts with a BOM and if not looks for a zero byte within the initial 8192 bytes:

import codecs


#: BOMs to indicate that a file is a text file even if it contains zero bytes.
_TEXT_BOMS = (
    codecs.BOM_UTF16_BE,
    codecs.BOM_UTF16_LE,
    codecs.BOM_UTF32_BE,
    codecs.BOM_UTF32_LE,
    codecs.BOM_UTF8,
)


def is_binary_file(source_path):
    with open(source_path, 'rb') as source_file:
        initial_bytes = source_file.read(8192)
    return not any(initial_bytes.startswith(bom) for bom in _TEXT_BOMS) \
           and b'\0' in initial_bytes

Technically the check for the UTF-8 BOM is unnecessary, because UTF-8 text should not contain zero bytes for all practical purposes. But as it is a very common encoding, it's quicker to check for the BOM at the beginning instead of scanning all 8192 bytes for a 0.


Answer 12

Try using the currently maintained python-magic, which is not the same module as in @Kami Kisiel's answer. It supports all platforms including Windows; however, you will need the libmagic binary files. This is explained in the README.

Unlike the mimetypes module, it doesn’t use the file’s extension and instead inspects the contents of the file.

>>> import magic
>>> magic.from_file("testdata/test.pdf", mime=True)
'application/pdf'
>>> magic.from_file("testdata/test.pdf")
'PDF document, version 1.2'
>>> magic.from_buffer(open("testdata/test.pdf", "rb").read(1024))
'PDF document, version 1.2'

Answer 13

I came here looking for exactly the same thing: a comprehensive solution provided by the standard library to detect binary or text. After reviewing the options people suggested, the nix file command looks to be the best choice (I'm only developing for linux boxen). Some others posted solutions using file, but they are unnecessarily complicated in my opinion, so here's what I came up with:

import shlex
import subprocess

def test_file_isbinary(filename):
    cmd = shlex.split("file -b -e soft '{}'".format(filename))
    # check_output returns bytes, so compare against bytes prefixes
    if subprocess.check_output(cmd)[:4] in {b'ASCI', b'UTF-'}:
        return False
    return True

It should go without saying, but the code that calls this function should make sure it can read the file before testing it; otherwise the file will be mistakenly detected as binary.


Answer 14

I guess that the best solution is to use the guess_type function. It holds a list of several mimetypes, and you can also include your own types. Here is the script I wrote to solve my problem:

from os import listdir
from mimetypes import guess_type
from mimetypes import add_type

def __init__(self):
        self.__addMimeTypes()

def __addMimeTypes(self):
        add_type("text/plain",".properties")

def __listDir(self,path):
        try:
            return listdir(path)
        except IOError:
            print ("The directory {0} could not be accessed".format(path))

def getTextFiles(self, path):
        asciiFiles = []
        for files in self.__listDir(path):
            if guess_type(files)[0].split("/")[0] == "text":
                asciiFiles.append(files)
        try:
            return asciiFiles
        except NameError:
            print ("No text files in directory: {0}".format(path))
        finally:
            del asciiFiles

It is inside a class, as you can see from the structure of the code, but you can pretty much change it to fit your own application. It's quite simple to use: the method getTextFiles returns a list object with all the text files that reside in the directory you pass in the path variable.


Answer 15

On *NIX:

If you have access to the file shell-command, shlex can help make the subprocess module more usable:

from os.path import realpath
from subprocess import check_output
from shlex import split

filepath = realpath('rel/or/abs/path/to/file')
# decode the bytes output and lowercase it, rather than the command string
assert 'ascii' in check_output(split('file {}'.format(filepath))).decode().lower()

Or, you could also stick that in a for-loop to get output for all files in the current dir using:

import os
for afile in [x for x in os.listdir('.') if os.path.isfile(x)]:
    assert 'ascii' in check_output(split('file {}'.format(afile))).decode().lower()

or for all subdirs:

for curdir, dirnames, filelist in os.walk('.'):
    for afile in filelist:
        # join with curdir so the file command gets a valid relative path
        path = os.path.join(curdir, afile)
        assert 'ascii' in check_output(split('file {}'.format(path))).decode().lower()

Answer 16

Most programs consider a file to be binary (that is, any file that is not “line-oriented”) if it contains a NULL character.

Here is perl’s version of pp_fttext() (pp_sys.c) implemented in Python:

import sys
PY3 = sys.version_info[0] == 3

# A function that takes an integer in the 8-bit range and returns
# a single-character byte object in py3 / a single-character string
# in py2.
#
int2byte = (lambda x: bytes((x,))) if PY3 else chr

_text_characters = (
        b''.join(int2byte(i) for i in range(32, 127)) +
        b'\n\r\t\f\b')

def istextfile(fileobj, blocksize=512):
    """ Uses heuristics to guess whether the given file is text or binary,
        by reading a single block of bytes from the file.
        If more than 30% of the chars in the block are non-text, or there
        are NUL ('\x00') bytes in the block, assume this is a binary file.
    """
    block = fileobj.read(blocksize)
    if b'\x00' in block:
        # Files with null bytes are binary
        return False
    elif not block:
        # An empty file is considered a valid text file
        return True

    # Use translate's 'deletechars' argument to efficiently remove all
    # occurrences of _text_characters from the block
    nontext = block.translate(None, _text_characters)
    return float(len(nontext)) / len(block) <= 0.30

Note also that this code was written to run on both Python 2 and Python 3 without changes.

Source: Perl’s “guess if file is text or binary” implemented in Python


Answer 17

Are you on unix? If so, then try:

isBinary = os.system("file -b " + name + " | grep text > /dev/null")

The shell return values are inverted (0 is OK), so if it finds “text” it will return 0, which in Python is a falsy value.


Answer 18

A simpler way is to check whether the file contains the NULL character (\x00) using the in operator, for instance:

b'\x00' in open("foo.bar", 'rb').read()

See below the complete example:

#!/usr/bin/env python3
import argparse
if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('file', nargs=1)
    args = parser.parse_args()
    with open(args.file[0], 'rb') as f:
        if b'\x00' in f.read():
            print('The file is binary!')
        else:
            print('The file is not binary!')

Sample usage:

$ ./is_binary.py /etc/hosts
The file is not binary!
$ ./is_binary.py `which which`
The file is binary!

Answer 19

from binaryornot.check import is_binary
is_binary('filename')

Documentation


os.walk without digging into directories below

Question: os.walk without digging into directories below

How do I limit os.walk to only return files in the directory I provide it?

def _dir_list(self, dir_name, whitelist):
    outputList = []
    for root, dirs, files in os.walk(dir_name):
        for f in files:
            if os.path.splitext(f)[1] in whitelist:
                outputList.append(os.path.join(root, f))
            else:
                self._email_to_("ignore")
    return outputList

Answer 0

Use the walklevel function.

import os

def walklevel(some_dir, level=1):
    some_dir = some_dir.rstrip(os.path.sep)
    assert os.path.isdir(some_dir)
    num_sep = some_dir.count(os.path.sep)
    for root, dirs, files in os.walk(some_dir):
        yield root, dirs, files
        num_sep_this = root.count(os.path.sep)
        if num_sep + level <= num_sep_this:
            del dirs[:]

It works just like os.walk, but you can pass it a level parameter that indicates how deep the recursion will go.
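
For example, a usage sketch that stays at most one level below the starting directory:

for root, dirs, files in walklevel('some_dir', level=1):
    print(root, files)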


Answer 1

Don’t use os.walk.

Example:

import os

root = "C:\\"
for item in os.listdir(root):
    if os.path.isfile(os.path.join(root, item)):
        print item

Answer 2

I think the solution is actually very simple: use break to only do the first iteration of the for loop (there must be a more elegant way).

for root, dirs, files in os.walk(dir_name):
    for f in files:
        ...
        ...
    break
...

The first time you call os.walk, it returns a tuple for the current directory; on the next iteration of the loop it returns the contents of the next directory.

Take the original script and just add a break:

def _dir_list(self, dir_name, whitelist):
    outputList = []
    for root, dirs, files in os.walk(dir_name):
        for f in files:
            if os.path.splitext(f)[1] in whitelist:
                outputList.append(os.path.join(root, f))
            else:
                self._email_to_("ignore")
        break
    return outputList

Answer 3

The suggestion to use listdir is a good one. The direct answer to your question in Python 2 is root, dirs, files = os.walk(dir_name).next().

The equivalent Python 3 syntax is root, dirs, files = next(os.walk(dir_name))


回答 4

您可以使用os.listdir(),它会返回给定目录中的名称列表(包括文件和目录)。如果需要区分文件和目录,请对每个名称调用os.stat()。

You could use os.listdir() which returns a list of names (for both files and directories) in a given directory. If you need to distinguish between files and directories, call os.stat() on each name.
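A minimal sketch of that suggestion, using os.stat with the stat module to tell files from directories (root_dir is a placeholder path):

import os
import stat

root_dir = '/some/dir'  # placeholder path
for name in os.listdir(root_dir):
    mode = os.stat(os.path.join(root_dir, name)).st_mode
    if stat.S_ISDIR(mode):
        print('directory:', name)
    else:
        print('file:', name)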


回答 5

如果您的需求不仅仅是顶层目录(例如,忽略VCS目录等),还可以修改目录列表以防止os.walk在目录中递归。

即:

def _dir_list(self, dir_name, whitelist):
    outputList = []
    for root, dirs, files in os.walk(dir_name):
        dirs[:] = [d for d in dirs if is_good(d)]
        for f in files:
            do_stuff()

注意-请小心更改列表,而不是重新绑定它。显然,os.walk不了解外部重新绑定。

If you have more complex requirements than just the top directory (eg ignore VCS dirs etc), you can also modify the list of directories to prevent os.walk recursing through them.

ie:

def _dir_list(self, dir_name, whitelist):
    outputList = []
    for root, dirs, files in os.walk(dir_name):
        dirs[:] = [d for d in dirs if is_good(d)]
        for f in files:
            do_stuff()

Note – be careful to mutate the list, rather than just rebind it. Obviously os.walk doesn’t know about the external rebinding.
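For instance, a possible is_good that skips typical VCS directories (a sketch; the directory names are just examples):

import os

SKIP_DIRS = {'.git', '.svn', '.hg'}  # example names to prune

def is_good(d):
    return d not in SKIP_DIRS

for root, dirs, files in os.walk('.'):
    dirs[:] = [d for d in dirs if is_good(d)]  # mutate in place, don't rebind
    for f in files:
        print(os.path.join(root, f))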


回答 6

for path, dirs, files in os.walk('.'):
    print path, dirs, files
    del dirs[:] # go only one level deep

回答 7

与listdir的想法相同,但更简短:

[f for f in os.listdir(root_dir) if os.path.isfile(os.path.join(root_dir, f))]

The same idea with listdir, but shorter:

[f for f in os.listdir(root_dir) if os.path.isfile(os.path.join(root_dir, f))]

回答 8

也来贡献我的两便士。

baselevel = len(rootdir.split("\\"))
for subdirs, dirs, files in os.walk(rootdir):
    curlevel = len(subdirs.split("\\"))
    if curlevel <= baselevel + 1:
        [do stuff]

Felt like throwing my 2 pence in.

baselevel = len(rootdir.split("\\"))
for subdirs, dirs, files in os.walk(rootdir):
    curlevel = len(subdirs.split("\\"))
    if curlevel <= baselevel + 1:
        [do stuff]

回答 9

在Python 3中,我能够做到这一点:

import os
dir = "/path/to/files/"

#List all files immediately under this folder:
print ( next( os.walk(dir) )[2] )

#List all folders immediately under this folder:
print ( next( os.walk(dir) )[1] )

In Python 3, I was able to do this:

import os
dir = "/path/to/files/"

#List all files immediately under this folder:
print ( next( os.walk(dir) )[2] )

#List all folders immediately under this folder:
print ( next( os.walk(dir) )[1] )

回答 10

从Python 3.5开始,您可以使用os.scandir代替os.listdir。您得到的不再是字符串,而是DirEntry对象的迭代器。引用文档:

使用scandir()而不是listdir()可以大大提高还需要文件类型或文件属性信息的代码的性能,因为如果操作系统在扫描目录时提供了这些信息,DirEntry对象就会将其公开。所有DirEntry方法都可能执行系统调用,但is_dir()和is_file()通常只在符号链接的情况下才需要系统调用;DirEntry.stat()在Unix上始终需要一次系统调用,而在Windows上只有符号链接才需要。

您可以通过DirEntry.name访问对象的名称,它相当于os.listdir输出中的条目。

Since Python 3.5 you can use os.scandir instead of os.listdir. Instead of strings you get an iterator of DirEntry objects in return. From the docs:

Using scandir() instead of listdir() can significantly increase the performance of code that also needs file type or file attribute information, because DirEntry objects expose this information if the operating system provides it when scanning a directory. All DirEntry methods may perform a system call, but is_dir() and is_file() usually only require a system call for symbolic links; DirEntry.stat() always requires a system call on Unix but only requires one for symbolic links on Windows.

You can access the name of the object via DirEntry.name, which is then equivalent to an entry in the output of os.listdir.
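A short sketch of the scandir variant for this question, listing only the files directly inside a directory ('/some/dir' is a placeholder; the context-manager form needs Python 3.6+):

import os

with os.scandir('/some/dir') as it:
    for entry in it:
        if entry.is_file():
            print(entry.name)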


回答 11

您还可以执行以下操作:

for path, subdirs, files in os.walk(dir_name):
    for name in files:
        if path == ".": #this will filter the files in the current directory
             #code here

You could also do the following:

for path, subdirs, files in os.walk(dir_name):
    for name in files:
        if path == ".": #this will filter the files in the current directory
             #code here

回答 12

这就是我解决的方法

if recursive:
    items = os.walk(target_directory)
else:
    items = [next(os.walk(target_directory))]

...

This is how I solved it

if recursive:
    items = os.walk(target_directory)
else:
    items = [next(os.walk(target_directory))]

...

回答 13

使用listdir时有一个陷阱。os.path.isdir(identifier)必须是绝对路径。要选择子目录,请执行以下操作:

for dirname in os.listdir(rootdir):
  if os.path.isdir(os.path.join(rootdir, dirname)):
     print("I got a subdirectory: %s" % dirname)

替代方法是更改​​目录,以在没有os.path.join()的情况下进行测试。

There is a catch when using listdir. The os.path.isdir(identifier) must be an absolute path. To pick subdirectories you do:

for dirname in os.listdir(rootdir):
  if os.path.isdir(os.path.join(rootdir, dirname)):
     print("I got a subdirectory: %s" % dirname)

The alternative is to change to the directory to do the testing without the os.path.join().


回答 14

您可以使用此代码段

level = 1  # set the desired depth before the loop
for root, dirs, files in os.walk(directory):
    if level > 0:
        pass  # do some stuff
    else:
        break
    level -= 1

You can use this snippet

level = 1  # set the desired depth before the loop
for root, dirs, files in os.walk(directory):
    if level > 0:
        pass  # do some stuff
    else:
        break
    level -= 1

回答 15

创建一个排除列表,使用fnmatch跳过目录结构并执行此过程

import fnmatch
import os

excludes = [r'a\*\b', r'c\d\e']  # raw strings keep the backslashes literal
for root, directories, files in os.walk('Start_Folder'):
    if not any(fnmatch.fnmatch(root, pattern) for pattern in excludes):
        # ....
        # do the process
        # ....
        pass

与“包含”相同:

if any(fnmatch.fnmatch(root, pattern) for pattern in includes):

create a list of excludes, use fnmatch to skip the directory structure and do the process

import fnmatch
import os

excludes = [r'a\*\b', r'c\d\e']  # raw strings keep the backslashes literal
for root, directories, files in os.walk('Start_Folder'):
    if not any(fnmatch.fnmatch(root, pattern) for pattern in excludes):
        # ....
        # do the process
        # ....
        pass

same as for ‘includes’:

if any(fnmatch.fnmatch(root, pattern) for pattern in includes):

回答 16

为什么不简单地把range、os.walk和zip结合起来用呢?这不是最佳解决方案,但也行得通。

例如这样:

# your part before
for count, (root, dirs, files) in zip(range(0, 1), os.walk(dir_name)):
    # logic stuff
# your later part

适用于python 3。

另外:用break其实更简单。(参见@Pieter的答案)

Why not simply use range and os.walk combined with zip? It is not the best solution, but it would work too.

For example like this:

# your part before
for count, (root, dirs, files) in zip(range(0, 1), os.walk(dir_name)):
    # logic stuff
# your later part

Works for me on python 3.

Also: a plain break is simpler, by the way. (Look at the answer from @Pieter.)


回答 17

对亚历克斯答案的一点小改动,改用__next__():

print(next(os.walk('d:/'))[2]) 或 print(os.walk('d:/').__next__()[2])

其中[2]对应其他答案中提到的root, dirs, files三元组里的files部分。

A slight change to Alex’s answer, but using __next__():

print(next(os.walk('d:/'))[2]) or print(os.walk('d:/').__next__()[2])

with [2] being the files element of the root, dirs, files tuple mentioned in other answers


回答 18

os.walk每找到一个目录,root的值都会变化。我的解决办法是检查root是否等于传入的目录。

def _dir_list(self, dir_name, whitelist):
    outputList = []
    for root, dirs, files in os.walk(dir_name):
        if root == dir_name: # only the top-level folder satisfies this
            for f in files:
                if os.path.splitext(f)[1] in whitelist:
                    outputList.append(os.path.join(root, f))
                else:
                    self._email_to_("ignore")
    return outputList

The root folder changes for every directory os.walk finds. I solved that by checking if root == dir_name.

def _dir_list(self, dir_name, whitelist):
    outputList = []
    for root, dirs, files in os.walk(dir_name):
        if root == dir_name: # only the top-level folder satisfies this
            for f in files:
                if os.path.splitext(f)[1] in whitelist:
                    outputList.append(os.path.join(root, f))
                else:
                    self._email_to_("ignore")
    return outputList

回答 19

import os

def listFiles(self, dir_name):
    names = []
    for root, directory, files in os.walk(dir_name):
        if root == dir_name:
            for name in files:
                names.append(name)
    return names

加载和解析具有多个JSON对象的JSON文件

问题:加载和解析具有多个JSON对象的JSON文件

我正在尝试在Python中加载和解析JSON文件。但是我在尝试加载文件时遇到了麻烦:

import json
json_data = open('file')
data = json.load(json_data)

输出:

ValueError: Extra data: line 2 column 1 - line 225116 column 1 (char 232 - 160128774)

我查看了Python文档中的18.2节(json:JSON编码器和解码器),但通读这份看起来很糟糕的文档实在令人沮丧。

前几行(用随机条目匿名):

{"votes": {"funny": 2, "useful": 5, "cool": 1}, "user_id": "harveydennis", "name": "Jasmine Graham", "url": "http://example.org/user_details?userid=harveydennis", "average_stars": 3.5, "review_count": 12, "type": "user"}
{"votes": {"funny": 1, "useful": 2, "cool": 4}, "user_id": "njohnson", "name": "Zachary Ballard", "url": "https://www.example.com/user_details?userid=njohnson", "average_stars": 3.5, "review_count": 12, "type": "user"}
{"votes": {"funny": 1, "useful": 0, "cool": 4}, "user_id": "david06", "name": "Jonathan George", "url": "https://example.com/user_details?userid=david06", "average_stars": 3.5, "review_count": 12, "type": "user"}
{"votes": {"funny": 6, "useful": 5, "cool": 0}, "user_id": "santiagoerika", "name": "Amanda Taylor", "url": "https://www.example.com/user_details?userid=santiagoerika", "average_stars": 3.5, "review_count": 12, "type": "user"}
{"votes": {"funny": 1, "useful": 8, "cool": 2}, "user_id": "rodriguezdennis", "name": "Jennifer Roach", "url": "http://www.example.com/user_details?userid=rodriguezdennis", "average_stars": 3.5, "review_count": 12, "type": "user"}

I am trying to load and parse a JSON file in Python. But I’m stuck trying to load the file:

import json
json_data = open('file')
data = json.load(json_data)

Yields:

ValueError: Extra data: line 2 column 1 - line 225116 column 1 (char 232 - 160128774)

I looked at 18.2. json — JSON encoder and decoder in the Python documentation, but it’s pretty discouraging to read through this horrible-looking documentation.

First few lines (anonymized with randomized entries):

{"votes": {"funny": 2, "useful": 5, "cool": 1}, "user_id": "harveydennis", "name": "Jasmine Graham", "url": "http://example.org/user_details?userid=harveydennis", "average_stars": 3.5, "review_count": 12, "type": "user"}
{"votes": {"funny": 1, "useful": 2, "cool": 4}, "user_id": "njohnson", "name": "Zachary Ballard", "url": "https://www.example.com/user_details?userid=njohnson", "average_stars": 3.5, "review_count": 12, "type": "user"}
{"votes": {"funny": 1, "useful": 0, "cool": 4}, "user_id": "david06", "name": "Jonathan George", "url": "https://example.com/user_details?userid=david06", "average_stars": 3.5, "review_count": 12, "type": "user"}
{"votes": {"funny": 6, "useful": 5, "cool": 0}, "user_id": "santiagoerika", "name": "Amanda Taylor", "url": "https://www.example.com/user_details?userid=santiagoerika", "average_stars": 3.5, "review_count": 12, "type": "user"}
{"votes": {"funny": 1, "useful": 8, "cool": 2}, "user_id": "rodriguezdennis", "name": "Jennifer Roach", "url": "http://www.example.com/user_details?userid=rodriguezdennis", "average_stars": 3.5, "review_count": 12, "type": "user"}

回答 0

您有一个JSON Lines格式的文本文件。您需要逐行解析文件:

import json

data = []
with open('file') as f:
    for line in f:
        data.append(json.loads(line))

每一行都是有效的JSON,但从整体上看,它并不是一个有效的JSON值,因为没有顶级的列表或对象定义。

请注意,由于该文件每行包含JSON,因此您无需费力地尝试一次性分析所有内容或找出流JSON解析器。现在,您可以选择在继续进行下一行之前分别处理每一行,从而节省了进程中的内存。如果文件很大,您可能不想将每个结果附加到一个列表中,然后再处理所有内容。

如果您的文件中包含由分隔符隔开的多个独立JSON对象,请参考《如何使用json模块一次读取一个JSON对象?》,用缓冲方法逐个解析对象。

You have a JSON Lines format text file. You need to parse your file line by line:

import json

data = []
with open('file') as f:
    for line in f:
        data.append(json.loads(line))

Each line contains valid JSON, but as a whole, it is not a valid JSON value as there is no top-level list or object definition.

Note that because the file contains JSON per line, you are saved the headaches of trying to parse it all in one go or to figure out a streaming JSON parser. You can now opt to process each line separately before moving on to the next, saving memory in the process. You probably don’t want to append each result to one list and then process everything if your file is really big.

If you have a file containing individual JSON objects with delimiters in-between, use How do I use the ‘json’ module to read in one JSON object at a time? to parse out individual objects using a buffered method.
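If memory is a concern, a generator is one way to stream the records instead of collecting them in a list (a sketch; iter_records is just an illustrative helper name):

import json

def iter_records(path):
    # yield one parsed JSON object per line instead of building a list
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

for record in iter_records('file'):
    print(record['user_id'])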


回答 1

对于偶然看到这个问题的人:python的jsonlines库(比这个问题晚出现得多)可以优雅地处理每行一个JSON文档的文件。参见 https://jsonlines.readthedocs.io/

for those stumbling upon this question: the python jsonlines library (much younger than this question) elegantly handles files with one json document per line. see https://jsonlines.readthedocs.io/
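A minimal sketch with that library (assuming the third-party jsonlines package is installed):

import jsonlines  # pip install jsonlines

with jsonlines.open('file') as reader:
    for obj in reader:  # each obj is one parsed JSON document
        print(obj['user_id'])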


回答 2

这个文件的格式不正确。每行有一个JSON对象,但它们没有被包含在更大的数据结构(即数组)中。您要么重新格式化它,使其以[开头、以]结尾并在每行末尾加上逗号;要么将其逐行解析为单独的字典。

That is ill-formatted. You have one JSON object per line, but they are not contained in a larger data structure (ie an array). You’ll either need to reformat it so that it begins with [ and ends with ] with a comma at the end of each line, or parse it line by line as separate dictionaries.


确定目录是否可写

问题:确定目录是否可写

在Python中,判断执行脚本的用户对某个目录是否可写,最好的方法是什么?由于这可能涉及os模块,我应该说明我是在*nix环境下运行的。

What would be the best way in Python to determine whether a directory is writeable for the user executing the script? Since this will likely involve using the os module I should mention I’m running it under a *nix environment.


回答 0

尽管Christophe建议的是更Python化的解决方案,但os模块确实具有os.access函数来检查访问:

os.access('/path/to/folder', os.W_OK) #W_OK用于写入,R_OK用于读取,等等。

Although what Christophe suggested is a more Pythonic solution, the os module does have the os.access function to check access:

os.access('/path/to/folder', os.W_OK) # W_OK is for writing, R_OK for reading, etc.


回答 1

提出这个建议似乎很奇怪,但是一个常见的Python习惯用法是

寻求宽恕比获得许可要容易

遵循这一习语,人们可能会说:

尝试写入有问题的目录,如果没有权限,则捕获错误。

It may seem strange to suggest this, but a common Python idiom is

It’s easier to ask for forgiveness than for permission

Following that idiom, one might say:

Try writing to the directory in question, and catch the error if you don’t have the permission to do so.


回答 2

我使用tempfile模块的解决方案:

import tempfile
import errno

def isWritable(path):
    try:
        testfile = tempfile.TemporaryFile(dir = path)
        testfile.close()
    except OSError as e:
        if e.errno == errno.EACCES:  # 13
            return False
        e.filename = path
        raise
    return True

更新:在Windows上再次测试代码后,我发现在那里使用tempfile确实存在问题,请参见issue22107:tempfile模块错误地解释了Windows上的拒绝访问错误。对于不可写的目录,代码会挂起几秒钟,最后抛出IOError: [Errno 17] No usable temporary file name found。也许这是user2171842正在观察的内容?不幸的是,该问题暂时无法解决,因此要解决此问题,还必须捕获该错误:

    except (OSError, IOError) as e:
        if e.errno == errno.EACCES or e.errno == errno.EEXIST:  # 13, 17

那时在这些情况下当然仍然存在延迟。

My solution using the tempfile module:

import tempfile
import errno

def isWritable(path):
    try:
        testfile = tempfile.TemporaryFile(dir = path)
        testfile.close()
    except OSError as e:
        if e.errno == errno.EACCES:  # 13
            return False
        e.filename = path
        raise
    return True

Update: After testing the code again on Windows I see that there is indeed an issue when using tempfile there, see issue22107: tempfile module misinterprets access denied error on Windows. In the case of a non-writable directory, the code hangs for several seconds and finally throws an IOError: [Errno 17] No usable temporary file name found. Maybe this is what user2171842 was observing? Unfortunately the issue is not resolved for now so to handle this, the error needs to be catched as well:

    except (OSError, IOError) as e:
        if e.errno == errno.EACCES or e.errno == errno.EEXIST:  # 13, 17

The delay is of course still present in these cases then.


回答 3

在帮人找示例时偶然发现了这个帖子。Google搜索的第一条结果,恭喜!

这个帖子里大家都在谈论Python式的做法,却没有简单的代码示例?给后来偶然看到的人,示例在这里:

import sys

filepath = 'C:\\path\\to\\your\\file.txt'

try:
    filehandle = open( filepath, 'w' )
except IOError:
    sys.exit( 'Unable to write to file ' + filepath )

filehandle.write("I am writing this text to the file\n")

这会尝试打开一个文件句柄进行写入,如果指定的文件无法写入,则带着错误退出。这样更容易阅读,也比对文件路径或目录做预检查好得多,因为它避免了竞态条件:即从运行预检查到实际尝试写入文件之间,文件变得不可写的情况。

Stumbled across this thread searching for examples for someone. First result on Google, congrats!

People talk about the Pythonic way of doing it in this thread, but no simple code examples? Here you go, for anyone else who stumbles in:

import sys

filepath = 'C:\\path\\to\\your\\file.txt'

try:
    filehandle = open( filepath, 'w' )
except IOError:
    sys.exit( 'Unable to write to file ' + filepath )

filehandle.write("I am writing this text to the file\n")

This attempts to open a filehandle for writing, and exits with an error if the file specified cannot be written to: This is far easier to read, and is a much better way of doing it rather than doing prechecks on the file path or the directory, as it avoids race conditions; cases where the file becomes unwriteable between the time you run the precheck, and when you actually attempt to write to the file.


回答 4

如果您只关心文件权限,os.access(path, os.W_OK)就能满足要求。相反,如果您想知道能否写入该目录,可以open()一个测试文件尝试写入(该文件事先不应存在),捕获并检查可能出现的IOError,事后清理掉该测试文件。

更一般地说,为避免TOCTOU攻击(只有当脚本以提升的特权运行时才是问题,例如suid或cgi),您不应该真正信任这些提前的测试,而应该先降低权限,再执行open()并预期捕获IOError。

If you only care about the file perms, os.access(path, os.W_OK) should do what you ask for. If you instead want to know whether you can write to the directory, open() a test file for writing (it shouldn’t exist beforehand), catch and examine any IOError, and clean up the test file afterwards.

More generally, to avoid TOCTOU attacks (only a problem if your script runs with elevated privileges — suid or cgi or so), you shouldn’t really trust these ahead-of-time tests, but drop privs, do the open(), and expect the IOError.


回答 5

检查模式位:

import os
import stat

def isWritable(name):
  uid = os.geteuid()
  gid = os.getegid()
  s = os.stat(name)  # was os.stat(dirname); 'dirname' was undefined
  mode = s[stat.ST_MODE]
  return (
     ((s[stat.ST_UID] == uid) and (mode & stat.S_IWUSR)) or
     ((s[stat.ST_GID] == gid) and (mode & stat.S_IWGRP)) or
     (mode & stat.S_IWOTH)
     )

Check the mode bits:

import os
import stat

def isWritable(name):
  uid = os.geteuid()
  gid = os.getegid()
  s = os.stat(name)  # was os.stat(dirname); 'dirname' was undefined
  mode = s[stat.ST_MODE]
  return (
     ((s[stat.ST_UID] == uid) and (mode & stat.S_IWUSR)) or
     ((s[stat.ST_GID] == gid) and (mode & stat.S_IWGRP)) or
     (mode & stat.S_IWOTH)
     )

回答 6

这是我根据ChristopheD的答案创建的:

import os

def isWritable(directory):
    try:
        tmp_prefix = "write_tester";
        count = 0
        filename = os.path.join(directory, tmp_prefix)
        while(os.path.exists(filename)):
            filename = "{}.{}".format(os.path.join(directory, tmp_prefix),count)
            count = count + 1
        f = open(filename,"w")
        f.close()
        os.remove(filename)
        return True
    except Exception as e:
        #print "{}".format(e)
        return False

directory = "c:\\"
if (isWritable(directory)):
    print "directory is writable"
else:
    print "directory is not writable"

Here is something I created based on ChristopheD’s answer:

import os

def isWritable(directory):
    try:
        tmp_prefix = "write_tester";
        count = 0
        filename = os.path.join(directory, tmp_prefix)
        while(os.path.exists(filename)):
            filename = "{}.{}".format(os.path.join(directory, tmp_prefix),count)
            count = count + 1
        f = open(filename,"w")
        f.close()
        os.remove(filename)
        return True
    except Exception as e:
        #print "{}".format(e)
        return False

directory = "c:\\"
if (isWritable(directory)):
    print "directory is writable"
else:
    print "directory is not writable"

回答 7

 if os.access(path_to_folder, os.W_OK) is not True:
            print("Folder not writable")
 else :
            print("Folder writable")

有关os.access的更多信息可以在这里找到。

 if os.access(path_to_folder, os.W_OK) is not True:
            print("Folder not writable")
 else :
            print("Folder writable")

More info about access can be found here.


回答 8

通过argparse添加参数时,我遇到了同样的需求。内置的type=FileType('w')对我不起作用,因为我要找的是目录。最后我写了自己的方法来解决这个问题,下面是包含argparse片段的结果。

#! /usr/bin/env python
import os
import argparse

def writable_dir(dir):
    if os.access(dir, os.W_OK) and os.path.isdir(dir):
        return os.path.abspath(dir)
    else:
        raise argparse.ArgumentTypeError(dir + " is not writable or does not exist.")

parser = argparse.ArgumentParser()
parser.add_argument("-d","--dir", type=writable_dir(), default='/tmp/',
    help="Directory to use. Default: /tmp")
opts = parser.parse_args()

结果如下:

$ python dir-test.py -h
usage: dir-test.py [-h] [-d DIR]

optional arguments:
  -h, --help         show this help message and exit
  -d DIR, --dir DIR  Directory to use. Default: /tmp

$ python dir-test.py -d /not/real
usage: dir-test.py [-h] [-d DIR]
dir-test.py: error: argument -d/--dir: /not/real is not writable or does not exist.

$ python dir-test.py -d ~

回过头来,在最后添加了print opts.dir,一切似乎都可以正常运行了。

I ran into this same need while adding an argument via argparse. The built in type=FileType('w') wouldn’t work for me as I was looking for a directory. I ended up writing my own method to solve my problem. Here is the result with argparse snippet.

#! /usr/bin/env python
import os
import argparse

def writable_dir(dir):
    if os.access(dir, os.W_OK) and os.path.isdir(dir):
        return os.path.abspath(dir)
    else:
        raise argparse.ArgumentTypeError(dir + " is not writable or does not exist.")

parser = argparse.ArgumentParser()
parser.add_argument("-d","--dir", type=writable_dir(), default='/tmp/',
    help="Directory to use. Default: /tmp")
opts = parser.parse_args()

That results in the following:

$ python dir-test.py -h
usage: dir-test.py [-h] [-d DIR]

optional arguments:
  -h, --help         show this help message and exit
  -d DIR, --dir DIR  Directory to use. Default: /tmp

$ python dir-test.py -d /not/real
usage: dir-test.py [-h] [-d DIR]
dir-test.py: error: argument -d/--dir: /not/real is not writable or does not exist.

$ python dir-test.py -d ~

I went back and added print opts.dir to the end, and everything appears to be functioning as desired.


回答 9

如果您需要检查其他用户的权限(是的,我知道这与问题相矛盾,但可能对某人有用),则可以通过pwd模块和目录的模式位来进行检查。

免责声明:在Windows上不起作用,因为Windows不使用POSIX权限模型(pwd模块在那里也不可用),也就是说,这只是针对*nix系统的解决方案。

请注意,目录必须设置所有3位-读,写和eXecute。
好的,R并不是绝对必须的,但没有它您就无法列出目录中的条目(也就必须事先知道它们的名字)。另一方面,执行位是绝对需要的:没有它,用户无法读取文件的inode;因此即使有W,没有X也无法创建或修改文件。此链接上有更详细的说明。

最后,这些模式位可以在stat模块中找到,其描述见inode(7)手册页。

示例代码如何检查:

import pwd
import stat
import os

def check_user_dir(user, directory):
    dir_stat = os.stat(directory)

    user_id, group_id = pwd.getpwnam(user).pw_uid, pwd.getpwnam(user).pw_gid
    directory_mode = dir_stat[stat.ST_MODE]

    # use directory_mode as mask 
    if user_id == dir_stat[stat.ST_UID] and stat.S_IRWXU & directory_mode == stat.S_IRWXU:     # owner and has RWX
        return True
    elif group_id == dir_stat[stat.ST_GID] and stat.S_IRWXG & directory_mode == stat.S_IRWXG:  # in group & it has RWX
        return True
    elif stat.S_IRWXO & directory_mode == stat.S_IRWXO:                                        # everyone has RWX
        return True

    # no permissions
    return False

If you need to check the permission of another user (yes, I realize this contradicts the question, but may come in handy for someone), you can do it through the pwd module, and the directory’s mode bits.

Disclaimer – does not work on Windows, as it doesn’t use the POSIX permissions model (and the pwd module is not available there), e.g. – solution only for *nix systems.

Note that a directory has to have all the 3 bits set – Read, Write and eXecute.
Ok, R is not an absolute must, but w/o it you cannot list the entries in the directory (so you have to know their names). Execute on the other hand is absolutely needed – w/o it the user cannot read the file’s inodes; so even having W, without X files cannot be created or modified. More detailed explanation at this link.

Finally, the modes are available in the stat module, their descriptions are in inode(7) man.

Sample code how to check:

import pwd
import stat
import os

def check_user_dir(user, directory):
    dir_stat = os.stat(directory)

    user_id, group_id = pwd.getpwnam(user).pw_uid, pwd.getpwnam(user).pw_gid
    directory_mode = dir_stat[stat.ST_MODE]

    # use directory_mode as mask 
    if user_id == dir_stat[stat.ST_UID] and stat.S_IRWXU & directory_mode == stat.S_IRWXU:     # owner and has RWX
        return True
    elif group_id == dir_stat[stat.ST_GID] and stat.S_IRWXG & directory_mode == stat.S_IRWXG:  # in group & it has RWX
        return True
    elif stat.S_IRWXO & directory_mode == stat.S_IRWXO:                                        # everyone has RWX
        return True

    # no permissions
    return False
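A possible call (the user name and path are placeholders):

print(check_user_dir('nobody', '/tmp'))  # True if 'nobody' has rwx on /tmp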

Python逐行写入CSV

问题:Python逐行写入CSV

我有通过http请求访问的数据,并由服务器以逗号分隔的格式发送回去,我有以下代码:

site= 'www.example.com'
hdr = {'User-Agent': 'Mozilla/5.0'}
req = urllib2.Request(site,headers=hdr)
page = urllib2.urlopen(req)
soup = BeautifulSoup(page)
soup = soup.get_text()
text=str(soup)

文本内容如下:

april,2,5,7
may,3,5,8
june,4,7,3
july,5,6,9

如何将这些数据保存到CSV文件中。我知道我可以按照以下步骤做一些事情,逐行进行迭代:

import StringIO
s = StringIO.StringIO(text)
for line in s:

但是我不确定现在如何正确地将每一行写入CSV

编辑—>感谢您提供的反馈,该解决方案非常简单,可以在下面看到。

解:

import StringIO
s = StringIO.StringIO(text)
with open('fileName.csv', 'w') as f:
    for line in s:
        f.write(line)

I have data which is being accessed via http request and is sent back by the server in a comma separated format, I have the following code :

site= 'www.example.com'
hdr = {'User-Agent': 'Mozilla/5.0'}
req = urllib2.Request(site,headers=hdr)
page = urllib2.urlopen(req)
soup = BeautifulSoup(page)
soup = soup.get_text()
text=str(soup)

The content of text is as follows:

april,2,5,7
may,3,5,8
june,4,7,3
july,5,6,9

How can I save this data into a CSV file. I know I can do something along the lines of the following to iterate line by line:

import StringIO
s = StringIO.StringIO(text)
for line in s:

But i’m unsure how to now properly write each line to CSV

EDIT—> Thanks for the feedback as suggested the solution was rather simple and can be seen below.

Solution:

import StringIO
s = StringIO.StringIO(text)
with open('fileName.csv', 'w') as f:
    for line in s:
        f.write(line)

回答 0

一般方式:

##text=List of strings to be written to file
with open('csvfile.csv','wb') as file:
    for line in text:
        file.write(line)
        file.write('\n')

要么

使用CSV编写器:

import csv
with open(<path to output_csv>, "wb") as csv_file:
        writer = csv.writer(csv_file, delimiter=',')
        for line in data:
            writer.writerow(line)

要么

最简单的方法:

f = open('csvfile.csv','w')
f.write('hi there\n') #Give your csv text here.
## Python will convert \n to os.linesep
f.close()

General way:

##text=List of strings to be written to file
with open('csvfile.csv','wb') as file:
    for line in text:
        file.write(line)
        file.write('\n')

OR

Using CSV writer :

import csv
with open(<path to output_csv>, "wb") as csv_file:
        writer = csv.writer(csv_file, delimiter=',')
        for line in data:
            writer.writerow(line)

OR

Simplest way:

f = open('csvfile.csv','w')
f.write('hi there\n') #Give your csv text here.
## Python will convert \n to os.linesep
f.close()

回答 1

您可以像写入任何普通文件一样直接写入文件。

with open('csvfile.csv','wb') as file:
    for l in text:
        file.write(l)
        file.write('\n')

万一它是一个列表的列表,您可以直接使用内置的csv模块

import csv

with open("csvfile.csv", "wb") as file:
    writer = csv.writer(file)
    writer.writerows(text)

You could just write to the file as you would write any normal file.

with open('csvfile.csv','wb') as file:
    for l in text:
        file.write(l)
        file.write('\n')

If just in case, it is a list of lists, you could directly use built-in csv module

import csv

with open("csvfile.csv", "wb") as file:
    writer = csv.writer(file)
    writer.writerows(text)

回答 2

我只需将每一行写入文件,因为它已经是CSV格式:

write_file = "output.csv"
with open(write_file, "w") as output:
    for line in text:
        output.write(line + '\n')

不过我一时想不起来如何写带换行符的行了:p

此外,你可能想看看这个关于write()、writelines()和'\n'的答案。

I would simply write each line to a file, since it’s already in a CSV format:

write_file = "output.csv"
with open(write_file, "w") as output:
    for line in text:
        output.write(line + '\n')

I can’t recall how to write lines with line-breaks at the moment, though :p

Also, you might like to take a look at this answer about write(), writelines(), and '\n'.
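For reference, a small sketch of the difference: writelines() adds no separators itself, so include the newlines yourself.

lines = ['april,2,5,7', 'may,3,5,8']

with open('output.csv', 'w') as f:
    f.writelines(line + '\n' for line in lines)  # writelines() does not append '\n'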


回答 3

为了补充前面的答案,我快速写了一个用于写CSV文件的类。如果需要处理多个文件,它能让您更轻松地管理和关闭打开的文件,并获得一致、简洁的代码。

import csv
import os

class CSVWriter():

    filename = None
    fp = None
    writer = None

    def __init__(self, filename):
        self.filename = filename
        self.fp = open(self.filename, 'w', encoding='utf8')
        self.writer = csv.writer(self.fp, delimiter=';', quotechar='"', quoting=csv.QUOTE_ALL, lineterminator='\n')

    def close(self):
        self.fp.close()

    def write(self, elems):
        self.writer.writerow(elems)

    def size(self):
        return os.path.getsize(self.filename)

    def fname(self):
        return self.filename

用法示例:

mycsv = CSVWriter('/tmp/test.csv')
mycsv.write((12,'green','apples'))
mycsv.write((7,'yellow','bananas'))
mycsv.close()
print("Written %d bytes to %s" % (mycsv.size(), mycsv.fname()))

玩得开心

To complement the previous answers, I whipped up a quick class to write to CSV files. It makes it easier to manage and close open files and achieve consistency and cleaner code if you have to deal with multiple files.

import csv
import os

class CSVWriter():

    filename = None
    fp = None
    writer = None

    def __init__(self, filename):
        self.filename = filename
        self.fp = open(self.filename, 'w', encoding='utf8')
        self.writer = csv.writer(self.fp, delimiter=';', quotechar='"', quoting=csv.QUOTE_ALL, lineterminator='\n')

    def close(self):
        self.fp.close()

    def write(self, elems):
        self.writer.writerow(elems)

    def size(self):
        return os.path.getsize(self.filename)

    def fname(self):
        return self.filename

Example usage:

mycsv = CSVWriter('/tmp/test.csv')
mycsv.write((12,'green','apples'))
mycsv.write((7,'yellow','bananas'))
mycsv.close()
print("Written %d bytes to %s" % (mycsv.size(), mycsv.fname()))

Have fun


回答 4

那这个呢:

with open("your_csv_file.csv", "w") as f:
    f.write("\n".join(text))

str.join()返回一个字符串,该字符串是可迭代的字符串的串联。元素之间的分隔符是提供此方法的字符串。

What about this:

with open("your_csv_file.csv", "w") as f:
    f.write("\n".join(text))

str.join() Return a string which is the concatenation of the strings in iterable. The separator between elements is the string providing this method.


将列表的Python列表写入csv文件

问题:将列表的Python列表写入csv文件

我有一长串以下形式的清单-

a = [[1.2,'abc',3],[1.2,'werew',4],........,[1.4,'qew',2]]

即列表中的值是不同的类型-浮点数,整数,字符串。如何将其写入csv文件,以便输出的csv文件看起来像

1.2,abc,3
1.2,werew,4
.
.
.
1.4,qew,2

I have a long list of lists of the following form —

a = [[1.2,'abc',3],[1.2,'werew',4],........,[1.4,'qew',2]]

i.e. the values in the list are of different types — float,int, strings.How do I write it into a csv file so that my output csv file looks like

1.2,abc,3
1.2,werew,4
.
.
.
1.4,qew,2

回答 0

Python的内置CSV模块可以轻松处理此问题:

import csv

with open("output.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerows(a)

这里假设您的列表如问题中那样定义为a。您可以通过csv.writer()的各种可选参数来调整输出CSV的确切格式,详见上面链接的库参考页。

Python 3更新

import csv

with open("out.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(a)

Python’s built-in CSV module can handle this easily:

import csv

with open("output.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerows(a)

This assumes your list is defined as a, as it is in your question. You can tweak the exact format of the output CSV via the various optional parameters to csv.writer() as documented in the library reference page linked above.

Update for Python 3

import csv

with open("out.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(a)

回答 1

您可以使用pandas

In [1]: import pandas as pd

In [2]: a = [[1.2,'abc',3],[1.2,'werew',4],[1.4,'qew',2]]

In [3]: my_df = pd.DataFrame(a)

In [4]: my_df.to_csv('my_csv.csv', index=False, header=False)

You could use pandas:

In [1]: import pandas as pd

In [2]: a = [[1.2,'abc',3],[1.2,'werew',4],[1.4,'qew',2]]

In [3]: my_df = pd.DataFrame(a)

In [4]: my_df.to_csv('my_csv.csv', index=False, header=False)

回答 2

import csv
with open(file_path, 'a') as outcsv:   
    #configure writer to write standard csv file
    writer = csv.writer(outcsv, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL, lineterminator='\n')
    writer.writerow(['number', 'text', 'number'])
    for item in list:
        #Write item to outcsv
        writer.writerow([item[0], item[1], item[2]])

官方文档:http://docs.python.org/2/library/csv.html

import csv
with open(file_path, 'a') as outcsv:   
    #configure writer to write standard csv file
    writer = csv.writer(outcsv, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL, lineterminator='\n')
    writer.writerow(['number', 'text', 'number'])
    for item in list:
        #Write item to outcsv
        writer.writerow([item[0], item[1], item[2]])

official docs: http://docs.python.org/2/library/csv.html


回答 3

如果出于某种原因您想手动完成(不使用csv、pandas、numpy等模块):

with open('myfile.csv','w') as f:
    for sublist in mylist:
        for item in sublist:
            f.write(str(item) + ',')  # str() so non-string values don't raise TypeError
        f.write('\n')

当然,自己造轮子可能容易出错且效率低下……这通常正是这些模块存在的原因。但有时自己写一遍能帮助您理解它们的工作原理,有时也确实更简单。

If for whatever reason you wanted to do it manually (without using a module like csv,pandas,numpy etc.):

with open('myfile.csv','w') as f:
    for sublist in mylist:
        for item in sublist:
            f.write(str(item) + ',')  # str() so non-string values don't raise TypeError
        f.write('\n')

Of course, rolling your own version can be error-prone and inefficient … that’s usually why there’s a module for that. But sometimes writing your own can help you understand how they work, and sometimes it’s just easier.


回答 4

在我那个非常大的列表上,使用csv.writer花了很长时间。我决定改用pandas,它更快,也更容易控制和理解:

 import pandas

 yourlist = [[...],...,[...]]
 pd = pandas.DataFrame(yourlist)
 pd.to_csv("mylist.csv")

好处是,您可以修改一些东西来生成更好的csv文件:

 yourlist = [[...],...,[...]]
 columns = ["abcd","bcde","cdef"] #a csv with 3 columns
 index = [i[0] for i in yourlist] #first element of every list in yourlist
 not_index_list = [i[1:] for i in yourlist]
 pd = pandas.DataFrame(not_index_list, columns = columns, index = index)

 #Now you have a csv with columns and index:
 pd.to_csv("mylist.csv")

Using csv.writer in my very large list took quite a time. I decided to use pandas, it was faster and more easy to control and understand:

 import pandas

 yourlist = [[...],...,[...]]
 pd = pandas.DataFrame(yourlist)
 pd.to_csv("mylist.csv")

The good part you can change somethings to make a better csv file:

 yourlist = [[...],...,[...]]
 columns = ["abcd","bcde","cdef"] #a csv with 3 columns
 index = [i[0] for i in yourlist] #first element of every list in yourlist
 not_index_list = [i[1:] for i in yourlist]
 pd = pandas.DataFrame(not_index_list, columns = columns, index = index)

 #Now you have a csv with columns and index:
 pd.to_csv("mylist.csv")

回答 5

Ambers的解决方案也适用于numpy数组:

from pylab import *
import csv

array_=arange(0,10,1)
list_=[array_,array_*2,array_*3]
with open("output.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerows(list_)

Ambers’s solution also works well for numpy arrays:

from pylab import *
import csv

array_=arange(0,10,1)
list_=[array_,array_*2,array_*3]
with open("output.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerows(list_)

回答 6

如果您不想为此导入csv模块,则可以仅使用Python内置组件将列表列表写入csv文件

with open("output.csv", "w") as f:
    for row in a:
        f.write("%s\n" % ','.join(str(col) for col in row))

If you don’t want to import csv module for that, you can write a list of lists to a csv file using only Python built-ins

with open("output.csv", "w") as f:
    for row in a:
        f.write("%s\n" % ','.join(str(col) for col in row))

回答 7

确保在创建writer时指定lineterminator='\n';否则,当数据来源是其他csv文件时,每行数据之后可能会在文件中写入多余的空行……

这是我的解决方案:

import csv

with open('csvfile', 'a') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter='    ', quotechar='|', quoting=csv.QUOTE_MINIMAL, lineterminator='\n')
    for i in range(0, len(data)):  # kept inside the with-block so the file is still open
        spamwriter.writerow(data[i])

Make sure to indicate lineterminator='\n' when creating the writer; otherwise, an extra empty line might be written into the file after each data line when the data comes from another csv file…

Here is my solution:

import csv

with open('csvfile', 'a') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter='    ', quotechar='|', quoting=csv.QUOTE_MINIMAL, lineterminator='\n')
    for i in range(0, len(data)):  # kept inside the with-block so the file is still open
        spamwriter.writerow(data[i])

回答 8

把列表的列表用pickle模块转储再恢复怎么样?非常方便。

>>> import pickle
>>> 
>>> mylist = [1, 'foo', 'bar', {1, 2, 3}, [ [1,4,2,6], [3,6,0,10]]]
>>> with open('mylist', 'wb') as f:
...     pickle.dump(mylist, f) 


>>> with open('mylist', 'rb') as f:
...      mylist = pickle.load(f)
>>> mylist
[1, 'foo', 'bar', {1, 2, 3}, [[1, 4, 2, 6], [3, 6, 0, 10]]]
>>> 

How about dumping the list of lists into a pickle and restoring it with the pickle module? It’s quite convenient.

>>> import pickle
>>> 
>>> mylist = [1, 'foo', 'bar', {1, 2, 3}, [ [1,4,2,6], [3,6,0,10]]]
>>> with open('mylist', 'wb') as f:
...     pickle.dump(mylist, f) 


>>> with open('mylist', 'rb') as f:
...      mylist = pickle.load(f)
>>> mylist
[1, 'foo', 'bar', {1, 2, 3}, [[1, 4, 2, 6], [3, 6, 0, 10]]]
>>> 

回答 9

当我按照那些示例给csv.writer函数传入newline参数时,出现了错误消息。下面的代码对我有用。

 with open(strFileName, "w") as f:
    writer = csv.writer(f, delimiter=',',  quoting=csv.QUOTE_MINIMAL)
    writer.writerows(result)

I got an error message when following the examples with a newline parameter in the csv.writer function. The following code worked for me.

 with open(strFileName, "w") as f:
    writer = csv.writer(f, delimiter=',',  quoting=csv.QUOTE_MINIMAL)
    writer.writerows(result)

基本的HTTP文件下载并保存到python中的磁盘上?

问题:基本的HTTP文件下载并保存到python中的磁盘上?

我是Python新手,已经浏览了本网站上的问答来寻找答案。但我是初学者,有些解决方案我很难理解。我需要一个非常基础的解决方案。

有人可以向我解释一下“通过http下载文件”和“在Windows中保存到磁盘”的简单解决方案吗?

我也不知道如何使用shutil和os模块。

我要下载的文件不到500 MB,是一个.gz存档文件。如果有人可以解释如何提取存档并利用其中的文件,那就太好了!

这是部分解决方案,是我根据各种答案写的:

import requests
import os
import shutil

global dump

def download_file():
    global dump
    url = "http://randomsite.com/file.gz"
    file = requests.get(url, stream=True)
    dump = file.raw

def save_file():
    global dump
    location = os.path.abspath("D:\folder\file.gz")
    with open("file.gz", 'wb') as location:
        shutil.copyfileobj(dump, location)
    del dump

有人可以指出错误(初学者水平)并解释执行此操作的更简单方法吗?

谢谢!

I’m new to Python and I’ve been going through the Q&A on this site, for an answer to my question. However, I’m a beginner and I find it difficult to understand some of the solutions. I need a very basic solution.

Could someone please explain a simple solution to ‘Downloading a file through http’ and ‘Saving it to disk, in Windows’, to me?

I’m not sure how to use shutil and os modules, either.

The file I want to download is under 500 MB and is a .gz archive file. If someone can explain how to extract the archive and utilise the files in it also, that would be great!

Here’s a partial solution, that I wrote from various answers combined:

import requests
import os
import shutil

global dump

def download_file():
    global dump
    url = "http://randomsite.com/file.gz"
    file = requests.get(url, stream=True)
    dump = file.raw

def save_file():
    global dump
    location = os.path.abspath("D:\folder\file.gz")
    with open("file.gz", 'wb') as location:
        shutil.copyfileobj(dump, location)
    del dump

Could someone point out errors (beginner level) and explain any easier methods to do this?

Thanks!


回答 0

一种下载文件的干净方法是:

import urllib

testfile = urllib.URLopener()
testfile.retrieve("http://randomsite.com/file.gz", "file.gz")

这会从网站下载文件并将其命名为file.gz。这是我最喜欢的解决方案之一,出自《通过urllib和python下载图片》。

本示例使用该urllib库,它将直接从源中检索文件。

A clean way to download a file is:

import urllib

testfile = urllib.URLopener()
testfile.retrieve("http://randomsite.com/file.gz", "file.gz")

This downloads a file from a website and names it file.gz. This is one of my favorite solutions, from Downloading a picture via urllib and python.

This example uses the urllib library, and it will directly retrieve the file from a source.


回答 1

正如这里所提到的:

import urllib
urllib.urlretrieve ("http://randomsite.com/file.gz", "file.gz")

EDIT:如果您仍想使用requests,请查看这个问题或另一个问题。

As mentioned here:

import urllib
urllib.urlretrieve ("http://randomsite.com/file.gz", "file.gz")

EDIT: If you still want to use requests, take a look at this question or this one.
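For completeness, a streaming download with requests might look like this (a sketch; the URL is the placeholder from the question):

import requests

url = "http://randomsite.com/file.gz"
r = requests.get(url, stream=True)
r.raise_for_status()  # fail early on HTTP errors
with open("file.gz", "wb") as f:
    for chunk in r.iter_content(chunk_size=8192):
        f.write(chunk)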


回答 2

我用wget

这是一个简单好用的库,想要个例子的话:

import wget

file_url = 'http://johndoe.com/download.zip'

file_name = wget.download(file_url)

wget模块同时支持Python 2和Python 3。

I use wget.

A simple and good library, if you want an example:

import wget

file_url = 'http://johndoe.com/download.zip'

file_name = wget.download(file_url)

The wget module supports both Python 2 and Python 3.


回答 3

使用wget、urllib和requests的四种方法。

#!/usr/bin/python
import requests
from StringIO import StringIO
from PIL import Image
import profile as profile
import urllib
import wget


url = 'https://tinypng.com/images/social/website.jpg'

def testRequest():
    image_name = 'test1.jpg'
    r = requests.get(url, stream=True)
    with open(image_name, 'wb') as f:
        for chunk in r.iter_content():
            f.write(chunk)

def testRequest2():
    image_name = 'test2.jpg'
    r = requests.get(url)
    i = Image.open(StringIO(r.content))
    i.save(image_name)

def testUrllib():
    image_name = 'test3.jpg'
    testfile = urllib.URLopener()
    testfile.retrieve(url, image_name)

def testwget():
    image_name = 'test4.jpg'
    wget.download(url, image_name)

if __name__ == '__main__':
    profile.run('testRequest()')
    profile.run('testRequest2()')
    profile.run('testUrllib()')
    profile.run('testwget()')

testRequest-在20.236秒内调用4469882函数(4469842基本调用)

testRequest2-8580个函数调用(8574个基本调用)在0.072秒内

testUrllib-在0.036秒内调用3810个函数(调用3775个原始函数)

testwget-在0.020秒内调用3489函数

Four methods using wget, urllib and request.

#!/usr/bin/python
import requests
from StringIO import StringIO
from PIL import Image
import profile as profile
import urllib
import wget


url = 'https://tinypng.com/images/social/website.jpg'

def testRequest():
    image_name = 'test1.jpg'
    r = requests.get(url, stream=True)
    with open(image_name, 'wb') as f:
        for chunk in r.iter_content():
            f.write(chunk)

def testRequest2():
    image_name = 'test2.jpg'
    r = requests.get(url)
    i = Image.open(StringIO(r.content))
    i.save(image_name)

def testUrllib():
    image_name = 'test3.jpg'
    testfile = urllib.URLopener()
    testfile.retrieve(url, image_name)

def testwget():
    image_name = 'test4.jpg'
    wget.download(url, image_name)

if __name__ == '__main__':
    profile.run('testRequest()')
    profile.run('testRequest2()')
    profile.run('testUrllib()')
    profile.run('testwget()')

testRequest – 4469882 function calls (4469842 primitive calls) in 20.236 seconds

testRequest2 – 8580 function calls (8574 primitive calls) in 0.072 seconds

testUrllib – 3810 function calls (3775 primitive calls) in 0.036 seconds

testwget – 3489 function calls in 0.020 seconds


回答 4

对于Python3 +, URLopener已弃用。使用时会出现如下错误:

url_opener = urllib.URLopener() AttributeError: module 'urllib' has no attribute 'URLopener'

因此,请尝试:

import urllib.request 
urllib.request.urlretrieve(url, filename)

For Python3+ URLopener is deprecated. And when used you will get error as below:

url_opener = urllib.URLopener() AttributeError: module ‘urllib’ has no attribute ‘URLopener’

So, try:

import urllib.request 
urllib.request.urlretrieve(url, filename)

回答 5

另类的Windows解决方案

import subprocess

subprocess.run("powershell Invoke-WebRequest {} -OutFile {}".format(your_url, filename), shell=True)

Exotic Windows Solution

import subprocess

subprocess.run("powershell Invoke-WebRequest {} -OutFile {}".format(your_url, filename), shell=True)

回答 6

我开始沿着这条路走,因为ESXi的wget没有使用SSL编译,我想将OVA从供应商的网站直接下载到位于世界另一端的ESXi主机上。

我必须通过编辑规则来禁用防火墙(懒惰)/启用https(正确)

创建了python脚本:

import ssl
import shutil
import tempfile
import urllib.request
context = ssl._create_unverified_context()

dlurl='https://somesite/path/whatever'
with urllib.request.urlopen(dlurl, context=context) as response:  # was 'durl', a typo
    with open("file.ova", 'wb') as tmp_file:
        shutil.copyfileobj(response, tmp_file)

ESXi的库被精简过,但开源的weasel安装程序似乎用urllib来处理https……这启发了我走这条路。

I started down this path because ESXi’s wget is not compiled with SSL and I wanted to download an OVA from a vendor’s website directly onto the ESXi host which is on the other side of the world.

I had to disable the firewall(lazy)/enable https out by editing the rules(proper)

created the python script:

import ssl
import shutil
import tempfile
import urllib.request
context = ssl._create_unverified_context()

dlurl='https://somesite/path/whatever'
with urllib.request.urlopen(dlurl, context=context) as response:  # was 'durl', a typo
    with open("file.ova", 'wb') as tmp_file:
        shutil.copyfileobj(response, tmp_file)

ESXi libraries are kind of pared down, but the open source weasel installer seemed to use urllib for https… so it inspired me to go down this path.


回答 7

另一种保存文件的干净方法是:

import urllib

# Python 2 spelling; urllib has no 'retrieve' function.
# On Python 3 use urllib.request.urlretrieve instead.
urllib.urlretrieve("your url goes here", "output.csv")

Another clean way to save the file is this:

import urllib

# Python 2 spelling; urllib has no 'retrieve' function.
# On Python 3 use urllib.request.urlretrieve instead.
urllib.urlretrieve("your url goes here", "output.csv")

在python中读取文件的前N行

问题:在python中读取文件的前N行

我们有一个很大的原始数据文件,希望将其裁剪到指定的大小。我在.NET C#方面比较有经验,但想用Python来做这件事,一方面可以简化,另一方面也出于兴趣。

我该如何在Python中获取文本文件的前N行?所使用的操作系统会对实现有影响吗?

We have a large raw data file that we would like to trim to a specified size. I am experienced in .net c#, however would like to do this in python to simplify things and out of interest.

How would I go about getting the first N lines of a text file in python? Will the OS being used have any effect on the implementation?


回答 0

Python 2

with open("datafile") as myfile:
    head = [next(myfile) for x in xrange(N)]
print head

Python 3

with open("datafile") as myfile:
    head = [next(myfile) for x in range(N)]
print(head)

这是另一种方式(Python 2和3)

from itertools import islice
with open("datafile") as myfile:
    head = list(islice(myfile, N))
print(head)

Python 2

with open("datafile") as myfile:
    head = [next(myfile) for x in xrange(N)]
print head

Python 3

with open("datafile") as myfile:
    head = [next(myfile) for x in range(N)]
print(head)

Here’s another way (both Python 2 & 3)

from itertools import islice
with open("datafile") as myfile:
    head = list(islice(myfile, N))
print(head)

回答 1

N = 10
with open("file.txt", "r") as file:  # read mode; a file opened with "a" cannot be read
    for i in range(N):
        line = next(file).strip()
        print(line)

回答 2

如果您想快速读取前几行而又不太关心性能,可以使用.readlines(),它返回列表对象,然后对列表进行切片。

例如前5行:

with open("pathofmyfileandfileandname") as myfile:
    firstNlines=myfile.readlines()[0:5] #put here the interval you want

注意:整个文件都会被读入,因此从性能角度看并非最佳;但它易于使用、写起来快、容易记住,如果您只是想做一次性的计算,它非常方便。

print firstNlines

与其他答案相比,它的一个优点是可以轻松选择行的范围,例如用[10:30]跳过前10行,用[:-10]去掉最后10行,或用[::2]只取偶数行。

If you want to read the first lines quickly and you don’t care about performance you can use .readlines() which returns list object and then slice the list.

E.g. for the first 5 lines:

with open("pathofmyfileandfileandname") as myfile:
    firstNlines=myfile.readlines()[0:5] #put here the interval you want

Note: the whole file is read, so it is not the best from a performance point of view, but it is easy to use, fast to write and easy to remember; if you just want to perform some one-time calculation it is very convenient

print firstNlines

One advantage compared to the other answers is the possibility to select a range of lines easily, e.g. skipping the first 10 lines with [10:30], dropping the last 10 with [:-10], or taking only even lines with [::2].


回答 3

我的做法是用pandas读取N行。我认为性能不是最好的,但以N=1000为例:

import pandas as pd
yourfile = pd.read_csv('path/to/your/file.csv',nrows=1000)

What I do is to call the N lines using pandas. I think the performance is not the best, but for example if N=1000:

import pandas as pd
yourfile = pd.read_csv('path/to/your/file.csv',nrows=1000)

回答 4

文件对象并没有提供读取指定行数的专门方法。

我想最简单的方法如下:

lines =[]
with open(file_name) as f:
    lines.extend(f.readline() for i in xrange(N))

There is no specific method to read number of lines exposed by file object.

I guess the easiest way would be following:

lines =[]
with open(file_name) as f:
    lines.extend(f.readline() for i in xrange(N))

回答 5

基于gnibbler最高投票的答案(09年11月20日,0:27):此类将head()和tail()方法添加到文件对象。

class File(file):
    def head(self, lines_2find=1):
        self.seek(0)                            #Rewind file
        return [self.next() for x in xrange(lines_2find)]

    def tail(self, lines_2find=1):  
        self.seek(0, 2)                         #go to end of file
        bytes_in_file = self.tell()             
        lines_found, total_bytes_scanned = 0, 0
        while (lines_2find+1 > lines_found and
               bytes_in_file > total_bytes_scanned): 
            byte_block = min(1024, bytes_in_file-total_bytes_scanned)
            self.seek(-(byte_block+total_bytes_scanned), 2)
            total_bytes_scanned += byte_block
            lines_found += self.read(1024).count('\n')
        self.seek(-total_bytes_scanned, 2)
        line_list = list(self.readlines())
        return line_list[-lines_2find:]

用法:

f = File('path/to/file', 'r')
f.head(3)
f.tail(3)

Based on gnibbler’s top voted answer (Nov 20 ’09 at 0:27): this class adds head() and tail() methods to the file object.

class File(file):
    def head(self, lines_2find=1):
        self.seek(0)                            #Rewind file
        return [self.next() for x in xrange(lines_2find)]

    def tail(self, lines_2find=1):  
        self.seek(0, 2)                         #go to end of file
        bytes_in_file = self.tell()             
        lines_found, total_bytes_scanned = 0, 0
        while (lines_2find+1 > lines_found and
               bytes_in_file > total_bytes_scanned): 
            byte_block = min(1024, bytes_in_file-total_bytes_scanned)
            self.seek(-(byte_block+total_bytes_scanned), 2)
            total_bytes_scanned += byte_block
            lines_found += self.read(1024).count('\n')
        self.seek(-total_bytes_scanned, 2)
        line_list = list(self.readlines())
        return line_list[-lines_2find:]

Usage:

f = File('path/to/file', 'r')
f.head(3)
f.tail(3)

回答 6

做到这一点的两种最直观的方法是:

  1. 逐行迭代文件,在第N行之后break。

  2. 调用next()方法N次来逐行读取文件。(这本质上只是最高票答案的另一种写法。)

这是代码:

# Method 1:
with open("fileName", "r") as f:
    counter = 0
    for line in f:
        print line
        counter += 1
        if counter == N: break

# Method 2:
with open("fileName", "r") as f:
    for i in xrange(N):
        line = f.next()
        print line

总而言之,只要不用readlines()或enumerate把整个文件读入内存,您就有很多选择。

The two most intuitive ways of doing this would be:

  1. Iterate on the file line-by-line, and break after N lines.

  2. Iterate on the file line-by-line using the next() method N times. (This is essentially just a different syntax for what the top answer does.)

Here is the code:

# Method 1:
with open("fileName", "r") as f:
    counter = 0
    for line in f:
        print line
        counter += 1
        if counter == N: break

# Method 2:
with open("fileName", "r") as f:
    for i in xrange(N):
        line = f.next()
        print line

The bottom line is, as long as you don’t use readlines() or otherwise read the whole file into memory, you have plenty of options.


回答 7

我自己最方便的方法:

LINE_COUNT = 3
print [s for (i, s) in enumerate(open('test.txt')) if i < LINE_COUNT]

基于列表推导式的解决方案:open()函数支持迭代接口,enumerate()包装open()并返回(索引, 行)元组,然后我们检查索引是否在可接受的范围内(i < LINE_COUNT),并简单地打印结果。

享受Python。;)

The most convenient way, in my opinion:

LINE_COUNT = 3
print [s for (i, s) in enumerate(open('test.txt')) if i < LINE_COUNT]

A solution based on a list comprehension: the open() function supports an iteration interface, enumerate() wraps open() and returns (index, item) tuples, and we then check that we’re inside the accepted range (i < LINE_COUNT) and simply print the result.

Enjoy the Python. ;)


回答 8

对于前5行,只需执行以下操作:

N=5
with open("data_file", "r") as file:
    for i in range(N):
       print file.next()

For first 5 lines, simply do:

N=5
with open("data_file", "r") as file:
    for i in range(N):
       print file.next()

回答 9

如果您想要一段显然可行(无需在手册里查深奥的内容)、不需要import和try/except、并且能在相当多的Python 2.x版本(2.2至2.6)上运行的代码:

def headn(file_name, n):
    """Like *x head -N command"""
    result = []
    nlines = 0
    assert n >= 1
    for line in open(file_name):
        result.append(line)
        nlines += 1
        if nlines >= n:
            break
    return result

if __name__ == "__main__":
    import sys
    rval = headn(sys.argv[1], int(sys.argv[2]))
    print rval
    print len(rval)

If you want something that obviously (without looking up esoteric stuff in manuals) works without imports and try/except and works on a fair range of Python 2.x versions (2.2 to 2.6):

def headn(file_name, n):
    """Like *x head -N command"""
    result = []
    nlines = 0
    assert n >= 1
    for line in open(file_name):
        result.append(line)
        nlines += 1
        if nlines >= n:
            break
    return result

if __name__ == "__main__":
    import sys
    rval = headn(sys.argv[1], int(sys.argv[2]))
    print rval
    print len(rval)

回答 10

如果文件很大,并且假设您希望输出为numpy数组,则使用np.genfromtxt将冻结您的计算机。根据我的经验,这要好得多:

import numpy as np

def load_big_file(fname, maxrows):
    '''only works for well-formed text file of space-separated doubles'''
    rows = []  # unknown number of lines, so use list

    with open(fname) as f:
        j = 0
        for line in f:
            if j == maxrows:
                break
            else:
                line = [float(s) for s in line.split()]
                rows.append(np.array(line, dtype=np.double))
                j += 1
    return np.vstack(rows)  # convert list of vectors to array

If you have a really big file, and assuming you want the output to be a numpy array, using np.genfromtxt will freeze your computer. This is so much better in my experience:

import numpy as np

def load_big_file(fname, maxrows):
    '''only works for well-formed text file of space-separated doubles'''
    rows = []  # unknown number of lines, so use list

    with open(fname) as f:
        j = 0
        for line in f:
            if j == maxrows:
                break
            else:
                line = [float(s) for s in line.split()]
                rows.append(np.array(line, dtype=np.double))
                j += 1
    return np.vstack(rows)  # convert list of vectors to array

回答 11

从Python 2.6开始,您可以在IO基本类中利用更复杂的功能。因此,上面评分最高的答案可以重写为:

    with open("datafile") as myfile:
       head = myfile.readlines(N)
    print head

(您不必担心文件少于N行,因为不会引发StopIteration异常。不过要注意,readlines()的参数是以字节为单位的大小提示,而不是精确的行数,因此实际得到的行数可能多于或少于N。)

Starting at Python 2.6, you can take advantage of more sophisticated functions in the IO base class. So the top rated answer above can be rewritten as:

    with open("datafile") as myfile:
       head = myfile.readlines(N)
    print head

(You don’t have to worry about your file having less than N lines since no StopIteration exception is thrown. Note, though, that the argument to readlines() is a size hint in bytes, not an exact line count, so you may get more or fewer than N lines.)


回答 12

这对我有用

f = open("history_export.csv", "r")
line= 5
for x in range(line):
    a = f.readline()
    print(a)

This worked for me

f = open("history_export.csv", "r")
line= 5
for x in range(line):
    a = f.readline()
    print(a)

回答 13

这适用于Python 2和3:

from itertools import islice

with open('/tmp/filename.txt') as inf:
    for line in islice(inf, N, N+M):
        print(line)

This works for Python 2 & 3:

from itertools import islice

with open('/tmp/filename.txt') as inf:
    for line in islice(inf, N, N+M):
        print(line)

回答 14


fname = input("Enter file name: ")
num_lines = 0

with open(fname, 'r') as f: #lines count
    for line in f:
        num_lines += 1

num_lines_input = int(input("Enter line numbers: "))

if num_lines_input <= num_lines:
    f = open(fname, "r")
    for x in range(num_lines_input):
        a = f.readline()
        print(a)

else:
    f = open(fname, "r")
    for x in range(num_lines):  # the file only has num_lines lines
        a = f.readline()
        print(a)
    print("Don't have", num_lines_input, "lines; printed as many as possible")


print("Total lines in the text", num_lines)



回答 15

#!/usr/bin/python

import subprocess

p = subprocess.Popen(["tail", "-n 3", "passlist"], stdout=subprocess.PIPE)

output, err = p.communicate()

print  output

这种方法对我有用

#!/usr/bin/python

import subprocess

p = subprocess.Popen(["tail", "-n 3", "passlist"], stdout=subprocess.PIPE)

output, err = p.communicate()

print  output

This Method Worked for me