Accessing data in a package subdirectory

Question: Accessing data in a package subdirectory

I am writing a Python package with modules that need to open data files in a ./data/ subdirectory. Right now I have the paths to the files hardcoded into my classes and functions. I would like to write more robust code that can access the subdirectory regardless of where it is installed on the user’s system.

I’ve tried a variety of methods, but so far I have had no luck. It seems that most of the “current directory” commands return the directory of the system’s Python interpreter, and not the directory of the module.

This seems like it ought to be a trivial, common problem. Yet I can’t seem to figure it out. Part of the problem is that my data files are not .py files, so I can’t use import functions and the like.

Any suggestions?

Right now my package directory looks like:

/
    __init__.py
    module1.py
    module2.py
    data/
        data.txt

I am trying to access data.txt from module*.py!


Answer 0

You can use __file__ to get the path to the package, like this:

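import os

# Directory that contains this module (and therefore the data/ subdirectory).
this_dir, this_filename = os.path.split(os.path.abspath(__file__))
DATA_PATH = os.path.join(this_dir, "data", "data.txt")
with open(DATA_PATH) as f:
    print(f.read())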
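On Python 3 the same idea can be written with pathlib. The sketch below makes some assumptions. `data_file` is a hypothetical helper name. The temporary directory only simulates an installed package laid out as in the question. Inside module1.py you would call `data_file(__file__, "data.txt")`.

```python
import tempfile
from pathlib import Path


def data_file(module_file, *parts):
    """Return the path of a file under the data/ directory next to module_file.

    Pass the calling module's __file__; resolve() makes the result absolute,
    so it does not depend on the current working directory.
    """
    return Path(module_file).resolve().parent.joinpath("data", *parts)


# Demonstration: build a throwaway package layout matching the question,
# then look up data/data.txt the way module1.py would.
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "mypkg"
    (pkg / "data").mkdir(parents=True)
    (pkg / "data" / "data.txt").write_text("hello\n", encoding="utf-8")
    fake_module_file = pkg / "module1.py"  # stands in for module1.__file__
    fake_module_file.write_text("")

    content = data_file(str(fake_module_file), "data.txt").read_text(encoding="utf-8")

print(content, end="")
```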

Answer 1

The standard way to do this is with setuptools packages and pkg_resources.

You can lay out your package in a hierarchy like the one in the question, and configure the package’s setup file to point it to your data resources, as described here:

http://docs.python.org/distutils/setupscript.html#installing-package-data

You can then locate and use those files at run time with pkg_resources, as described here:

http://peak.telecommunity.com/DevCenter/PkgResources#basic-resource-access

import pkg_resources

# '<package name>' is the importable name of your package; paths are '/'-separated.
DATA_PATH = pkg_resources.resource_filename('<package name>', 'data/')
DB_FILE = pkg_resources.resource_filename('<package name>', 'data/sqlite.db')
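The setup-file configuration referred to above might look like the following. This sketch assumes a setuptools-based setup.py. It also assumes a hypothetical package named mypkg, with the question's modules and data/ directory placed inside a mypkg/ directory.

```python
# setup.py -- sketch; "mypkg" is a hypothetical package name.
from setuptools import setup

setup(
    name="mypkg",
    version="0.1.0",
    packages=["mypkg"],
    # Ship every file in mypkg/data/ with the installed package so that
    # it can be found at run time (e.g. via pkg_resources).
    package_data={"mypkg": ["data/*"]},
)
```

With that in place, `pkg_resources.resource_filename('mypkg', 'data/data.txt')` should resolve to the installed copy of the file.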

Answer 2

Here is a solution that works today. Definitely use this API rather than reinventing all those wheels.

If you need a true filesystem filename, use resource_filename; zipped eggs will be extracted to a cache directory:

from pkg_resources import resource_filename, Requirement

path_to_vik_logo = resource_filename(Requirement.parse("enb.portals"), "enb/portals/reports/VIK_logo.png")
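On Python 3.9+, the standard library’s importlib.resources offers an equivalent to resource_filename. `files()` locates the resource, and `as_file()` yields a real filesystem path, extracting to a temporary file if the package lives inside a zip archive. This is a sketch under those assumptions. `demo_pkg_c` and its `reports/logo.png` are throwaway stand-ins created only for the demonstration, not the enb.portals package above.

```python
import importlib
import sys
import tempfile
from importlib.resources import as_file, files
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Throwaway package: demo_pkg_c/__init__.py and demo_pkg_c/reports/logo.png
    pkg = Path(tmp) / "demo_pkg_c"
    (pkg / "reports").mkdir(parents=True)
    (pkg / "__init__.py").write_text("")
    (pkg / "reports" / "logo.png").write_bytes(b"\x89PNG fake image bytes")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()  # make the new package visible to the import system
    try:
        resource = files("demo_pkg_c") / "reports" / "logo.png"
        # as_file() guarantees a real file on disk for the duration of the block;
        # for a package inside a zip archive it extracts to a temporary file.
        with as_file(resource) as path:
            is_real_file = path.is_file()
            size = path.stat().st_size
    finally:
        sys.path.remove(tmp)

print(is_real_file, size)
```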

If a stream is enough, resource_stream returns a readable file-like object for the specified resource; it may be an actual file, a StringIO, or some similar object. The stream is in “binary mode”, in the sense that whatever bytes are in the resource will be read as-is.

from pkg_resources import resource_stream, Requirement

vik_logo_as_stream = resource_stream(Requirement.parse("enb.portals"), "enb/portals/reports/VIK_logo.png")
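On Python 3.9+, a rough stdlib counterpart of resource_stream is to open the traversable returned by `importlib.resources.files()` in binary mode. `read_bytes()` returns the whole content at once. This is a sketch with a throwaway `demo_pkg_d` package standing in for a real one.

```python
import importlib
import sys
import tempfile
from importlib.resources import files
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Throwaway package: demo_pkg_d/__init__.py and demo_pkg_d/reports/logo.png
    pkg = Path(tmp) / "demo_pkg_d"
    (pkg / "reports").mkdir(parents=True)
    (pkg / "__init__.py").write_text("")
    (pkg / "reports" / "logo.png").write_bytes(b"\x89PNG fake image bytes")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        resource = files("demo_pkg_d") / "reports" / "logo.png"
        # Binary mode: bytes come back exactly as stored, like resource_stream().
        with resource.open("rb") as stream:
            header = stream.read(4)
        whole = resource.read_bytes()
    finally:
        sys.path.remove(tmp)

print(header, len(whole))
```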

See: Package Discovery and Resource Access using pkg_resources (setuptools documentation).


Answer 3

There is often no point in posting an answer that details code that does not work as is, but I believe this is an exception. Python 3.7 added importlib.resources, which is supposed to replace pkg_resources. It works for accessing files that sit directly inside a package, i.e. resource names without slashes. Given a layout like

foo/
    __init__.py
    module1.py
    module2.py
    data/
        data.txt
    data2.txt

you could, for example, access data2.txt inside package foo with

importlib.resources.open_binary('foo', 'data2.txt')

but it fails with an exception for

>>> importlib.resources.open_binary('foo', 'data/data.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.7/importlib/resources.py", line 87, in open_binary
    resource = _normalize_path(resource)
  File "/usr/lib/python3.7/importlib/resources.py", line 61, in _normalize_path
    raise ValueError('{!r} must be only a file name'.format(path))
ValueError: 'data/data.txt' must be only a file name

This cannot be fixed except by placing an __init__.py in data/ and then using it as a package:

importlib.resources.open_binary('foo.data', 'data.txt')

The reason for this behaviour is “it is by design”, but the design might change.
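The design did change. Python 3.9 added `importlib.resources.files()`, which returns a traversable object. A resource in a plain data/ subdirectory, with no `__init__.py` there, can then be reached by joining path components. The sketch below rebuilds the foo layout above in a temporary directory. The package is named `demo_foo` here, an assumption made only to avoid clashing with anything installed.

```python
import importlib
import sys
import tempfile
from importlib.resources import files
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Recreate the layout: demo_foo/__init__.py, demo_foo/data2.txt, demo_foo/data/data.txt
    pkg = Path(tmp) / "demo_foo"
    (pkg / "data").mkdir(parents=True)  # note: no __init__.py inside data/
    (pkg / "__init__.py").write_text("")
    (pkg / "data2.txt").write_bytes(b"top-level resource\n")
    (pkg / "data" / "data.txt").write_bytes(b"nested resource\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        top = (files("demo_foo") / "data2.txt").read_bytes()
        # Joining path components walks into data/, which open_binary() rejected.
        nested = (files("demo_foo") / "data" / "data.txt").read_bytes()
    finally:
        sys.path.remove(tmp)

print(top, nested)
```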


Answer 4

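You need a name for your whole package, and your directory tree doesn’t show that detail. For me, this worked:

import pkg_resources

# __name__ is the dotted name of the current module (e.g. mypkg.module1);
# the resource path is resolved relative to that module's directory.
print(pkg_resources.resource_filename(__name__, 'data/data.txt'))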

Notably, setuptools does not appear to resolve files by matching names against packaged data files, so you’re going to have to include the data/ prefix pretty much no matter what. You could build the name with os.path.join('data', 'data.txt'), but pkg_resources resource names are '/'-separated on every platform, so hard-coded Unix-style separators are the right choice; I have found no compatibility problems with them.
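To see why the separator matters, note that os.path.join uses the platform’s separator. pkg_resources resource names, by contrast, are documented as '/'-separated. The ntpath and posixpath modules let you compare both behaviours on any OS. Here is a small illustration:

```python
import ntpath
import posixpath

# os.path is ntpath on Windows and posixpath elsewhere, so os.path.join's
# separator depends on the platform. Both modules can be imported on any OS.
windows_style = ntpath.join("data", "data.txt")  # what os.path.join gives on Windows
posix_style = posixpath.join("data", "data.txt")  # always uses "/"

# pkg_resources resource names use "/" on every platform, so build them with
# posixpath.join or simply write the literal "data/data.txt".
print(repr(windows_style), repr(posix_style))
```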


Answer 5

I think I hunted down an answer.

I made a module, data_path.py, containing the following, and I import it into my other modules:

import os

# Absolute path of the data/ directory that sits next to data_path.py.
data_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'data')

And then I open all my files with

open(os.path.join(data_path, 'filename'), 'r')  # 'r' stands in for whatever mode you need
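The same pattern can be wired up end to end. This sketch writes a throwaway package, `demo_pkg_e`. The package contains the data_path.py module above plus a hypothetical `open_data()` convenience function. It also contains a module1.py that reads data/data.txt through a relative import. Only the temporary-directory scaffolding is extra; the lookup itself is the `__file__`-based approach described above.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# data_path.py as described above, plus a hypothetical open_data() helper.
DATA_PATH_PY = textwrap.dedent('''
    import os

    # Absolute path of the data/ directory that sits next to this file.
    data_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "data")


    def open_data(filename, mode="r", **kwargs):
        # Open a file from the package's data/ directory.
        return open(os.path.join(data_path, filename), mode, **kwargs)
''')

# A sibling module that uses the helper via a relative import.
MODULE1_PY = textwrap.dedent('''
    from .data_path import open_data


    def read_data():
        # Reads data/data.txt regardless of the current working directory.
        with open_data("data.txt", encoding="utf-8") as f:
            return f.read()
''')

with tempfile.TemporaryDirectory() as tmp:
    # Throwaway package: demo_pkg_e/{__init__.py, data_path.py, module1.py, data/data.txt}
    pkg = Path(tmp) / "demo_pkg_e"
    (pkg / "data").mkdir(parents=True)
    (pkg / "__init__.py").write_text("")
    (pkg / "data_path.py").write_text(DATA_PATH_PY)
    (pkg / "module1.py").write_text(MODULE1_PY)
    (pkg / "data" / "data.txt").write_text("hello from data.txt\n", encoding="utf-8")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        module1 = importlib.import_module("demo_pkg_e.module1")
        content = module1.read_data()
    finally:
        sys.path.remove(tmp)

print(content, end="")
```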