Python 实用宝典

Question 1

Could you explain to me what the difference is between calling

python -m mymod1 mymod2.py args

and

python mymod1.py mymod2.py args

It seems in both cases mymod1.py is called and sys.argv is

['mymod1.py', 'mymod2.py', 'args']

So what is the -m switch for?

Question 2

The first line of the Rationale section of PEP 338 says:

Python 2.4 adds the command line switch -m to allow modules to be located using the Python module namespace for execution as scripts. The motivating examples were standard library modules such as pdb and profile, and the Python 2.4 implementation is fine for this limited purpose.

So you can specify any module in Python’s search path this way, not just files in the current directory. You’re correct that python mymod1.py mymod2.py args has exactly the same effect. The first line of the Scope of this proposal section states:

In Python 2.4, a module located using -m is executed just as if its filename had been provided on the command line.

With -m more is possible, like working with modules which are part of a package, etc. That’s what the rest of PEP 338 is about. Read it for more info.

Question 3

It’s worth mentioning this only works if the package has a file __main__.py Otherwise, this package can not be executed directly.

python -m some_package some_arguments

The python interpreter will looking for a __main__.py file in the package path to execute. It’s equivalent to:

python path_to_package/__main__.py somearguments

It will execute the content after:

if __name__ == "__main__":

Question 4

Despite this question having been asked and answered several times (e.g., here, here, here, and here) in my opinion no existing answer fully or concisely captures all the implications of the -m flag. Therefore, the following will attempt to improve on what has come before.

Introduction (TLDR)

The -m flag does a lot of things, not all of which will be needed all the time. In short it can be used to: (1) execute python code from the command line via modulename rather than filename (2) add a directory to sys.path for use in import resolution and (3) execute python code that contains relative imports from the command line.

Preliminaries

To explain the -m flag we first need to explain a little terminology.

Python’s primary organizational unit is known as a module. Module’s come in one of two flavors: code modules and package modules. A code module is any file that contains python executable code. A package module is a directory that contains other modules (either code modules or package modules). The most common type of code modules are *.py files while the most common type of package modules are directories containing an __init__.py file.

Python allows modules to be uniquely identified in two distinct ways: modulename and filename. In general, modules are identified by modulename in Python code (e.g., import <modulename>) and by filename on the command line (e.g., python <filename>). All python interpreters are able to convert modulenames to filenames by following the same few, well-defined rules. These rules hinge on the sys.path variable. By altering this variable one can change how Python resolves modulenames into filenames (for more on how this is done see PEP 302).

All modules (both code and package) can be executed (i.e., code associated with the module will be evaluated by the Python interpreter). Depending on the execution method (and module type) what code gets evaluated, and when, can change quite a bit. For example, if one executes a package module via python <filename> then <filename>/__init__.py will be evaluated followed by <filename>/__main__.py. On the other hand, if one executes that same package module via import <modulename> then only the package’s __init__.py will be executed.

Historical Development of `-m`

The -m flag was first introduced in Python 2.4.1. Initially its only purpose was to provide an alternative means of identifying the python module to execute from the command line. That is, if we knew both the <filename> and <modulename> for a module then the following two commands were equivalent: python <filename> <args> and python -m <modulename> <args>. One constraint with this iteration, according to PEP 338, was that -m only worked with top level modulenames (i.e., modules that could be found directly on sys.path without any intervening package modules).

With the completion of PEP 338 the -m feature was extended to support <modulename> representations beyond the top level. This meant names such as http.server were now fully supported. This extension also meant that each parent package in modulename was now evaluated (i.e., all parent package __init__.py files were evaluated) in addition to the module referenced by the modulename itself.

The final major feature enhancement for -m came with PEP 366. With this upgrade -m gained the ability to support not only absolute imports but also explicit relative imports when executing modules. This was achieved by changing -m so that it set the __package__ variable to the parent module of the given modulename (in addition to everything else it already did).

Use Cases

There are two notable use cases for the -m flag:

To execute modules from the command line for which one may not know their filename. This use case takes advantage of the fact that the Python interpreter knows how to convert modulenames to filenames. This is particularly advantageous when one wants to run stdlib modules or 3rd-party module from the command line. For example, very few people know the filename for the http.server module but most people do know its modulename so we can execute it from the command line using python -m http.server.
To execute a local package containing absolute or relative imports without needing to install it. This use case is detailed in PEP 338 and leverages the fact that the current working directory is added to sys.path rather than the module’s directory. This use case is very similar to using pip install -e . to install a package in develop/edit mode.

Shortcomings

With all the enhancements made to -m over the years it still has one major shortcoming — it can only execute modules written in Python (i.e., *.py). For example, if -m is used to execute a C compiled code module the following error will be produced, No code object available for <modulename> (see here for more details).

Detailed Comparisons

Effects of module execution via import statement (i.e., import <modulename>):

sys.path is not modified in any way
__name__ is set to the absolute form of <modulename>
__package__ is set to the immediate parent package in <modulename>
__init__.py is evaluated for all packages (including its own for package modules)
__main__.py is not evaluated for package modules; the code is evaluated for code modules

Effects of module execution via command line (i.e., python <filename>):

sys.path is modified to include the final directory in <filename>
__name__ is set to '__main__'
__package__ is set to None
__init__.py is not evaluated for any package (including its own for package modules)
__main__.py is evaluated for package modules; the code is evaluated for code modules.

Effects of module execution via command line with the -m flag (i.e., python -m <modulename>):

sys.path is modified to include the current directory
__name__ is set to '__main__'
__package__ is set to the immediate parent package in <modulename>
__init__.py is evaluated for all packages (including its own for package modules)
__main__.py is evaluated for package modules; the code is evaluated for code modules

Conclusion

The -m flag is, at its simplest, a means to execute python scripts from the command line by using modulenames rather than filenames. The real power of -m, however, is in its ability to combine the power of import statements (e.g., support for explicit relative imports and automatic package __init__ evaluation) with the convenience of the command line.

Question 5

I am writing a python package with modules that need to open data files in a ./data/ subdirectory. Right now I have the paths to the files hardcoded into my classes and functions. I would like to write more robust code that can access the subdirectory regardless of where it is installed on the user’s system.

I’ve tried a variety of methods, but so far I have had no luck. It seems that most of the “current directory” commands return the directory of the system’s python interpreter, and not the directory of the module.

This seems like it ought to be a trivial, common problem. Yet I can’t seem to figure it out. Part of the problem is that my data files are not .py files, so I can’t use import functions and the like.

Any suggestions?

Right now my package directory looks like:

/
__init__.py
module1.py
module2.py
data/   
   data.txt

I am trying to access data.txt from module*.py!

Question 6

You can use __file__ to get the path to the package, like this:

import os
this_dir, this_filename = os.path.split(__file__)
DATA_PATH = os.path.join(this_dir, "data", "data.txt")
print open(DATA_PATH).read()

Question 7

The standard way to do this is with setuptools packages and pkg_resources.

You can lay out your package according to the following hierarchy, and configure the package setup file to point it your data resources, as per this link:

http://docs.python.org/distutils/setupscript.html#installing-package-data

You can then re-find and use those files using pkg_resources, as per this link:

http://peak.telecommunity.com/DevCenter/PkgResources#basic-resource-access

import pkg_resources

DATA_PATH = pkg_resources.resource_filename('<package name>', 'data/')
DB_FILE = pkg_resources.resource_filename('<package name>', 'data/sqlite.db')

Question 8

To provide a solution working today. Definitely use this API to not reinvent all those wheels.

A true filesystem filename is needed. Zipped eggs will be extracted to a cache directory:

from pkg_resources import resource_filename, Requirement

path_to_vik_logo = resource_filename(Requirement.parse("enb.portals"), "enb/portals/reports/VIK_logo.png")

Return a readable file-like object for the specified resource; it may be an actual file, a StringIO, or some similar object. The stream is in “binary mode”, in the sense that whatever bytes are in the resource will be read as-is.

from pkg_resources import resource_stream, Requirement

vik_logo_as_stream = resource_stream(Requirement.parse("enb.portals"), "enb/portals/reports/VIK_logo.png")

Package Discovery and Resource Access using pkg_resources

Question 9

There is often not point in making an answer that details code that does not work as is, but I believe this to be an exception. Python 3.7 added importlib.resources that is supposed to replace pkg_resources. It would work for accessing files within packages that do not have slashes in their names, i.e.

foo/
    __init__.py
    module1.py
    module2.py
    data/   
       data.txt
    data2.txt

i.e. you could access data2.txt inside package foo with for example

importlib.resources.open_binary('foo', 'data2.txt')

but it would fail with an exception for

>>> importlib.resources.open_binary('foo', 'data/data.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.7/importlib/resources.py", line 87, in open_binary
    resource = _normalize_path(resource)
  File "/usr/lib/python3.7/importlib/resources.py", line 61, in _normalize_path
    raise ValueError('{!r} must be only a file name'.format(path))
ValueError: 'data/data2.txt' must be only a file name

This cannot be fixed except by placing __init__.py in data and then using it as a package:

importlib.resources.open_binary('foo.data', 'data.txt')

The reason for this behaviour is “it is by design”; but the design might change…

Question 10

You need a name for your whole module, you’re given directory tree doesn’t list that detail, for me this worked:

import pkg_resources
print(    
    pkg_resources.resource_filename(__name__, 'data/data.txt')
)

Notibly setuptools does not appear to resolve files based on a name match with packed data files, soo you’re gunna have to include the data/ prefix pretty much no matter what. You can use os.path.join('data', 'data.txt) if you need alternate directory separators, Generally I find no compatibility problems with hard-coded unix style directory separators though.

Question 11

I think I hunted down an answer.

I make a module data_path.py, which I import into my other modules containing:

data_path = os.path.join(os.path.dirname(__file__),'data')

And then I open all my files with

open(os.path.join(data_path,'filename'), <param>)

Question 12

What’s a good way to check if a package is installed while within a Python script? I know it’s easy from the interpreter, but I need to do it within a script.

I guess I could check if there’s a directory on the system that’s created during the installation, but I feel like there’s a better way. I’m trying to make sure the Skype4Py package is installed, and if not I’ll install it.

My ideas for accomplishing the check

check for a directory in the typical install path
try to import the package and if an exception is throw, then install package

Question 13

If you mean a python script, just do something like this:

Python 3.3+ use sys.modules and find_spec:

import importlib.util
import sys

# For illustrative purposes.
name = 'itertools'

if name in sys.modules:
    print(f"{name!r} already in sys.modules")
elif (spec := importlib.util.find_spec(name)) is not None:
    # If you choose to perform the actual import ...
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    spec.loader.exec_module(module)
    print(f"{name!r} has been imported")
else:
    print(f"can't find the {name!r} module")

Python 3:

try:
    import mymodule
except ImportError as e:
    pass  # module doesn't exist, deal with it.

Python 2:

try:
    import mymodule
except ImportError, e:
    pass  # module doesn't exist, deal with it.

Question 14

Updated answer

A better way of doing this is:

import subprocess
import sys

reqs = subprocess.check_output([sys.executable, '-m', 'pip', 'freeze'])
installed_packages = [r.decode().split('==')[0] for r in reqs.split()]

The result:

print(installed_packages)

[
    "Django",
    "six",
    "requests",
]

Check if requests is installed:

if 'requests' in installed_packages:
    # Do something

Why this way? Sometimes you have app name collisions. Importing from the app namespace doesn’t give you the full picture of what’s installed on the system.

Note, that proposed solution works:

When using pip to install from PyPI or from any other alternative source (like pip install http://some.site/package-name.zip or any other archive type).
When installing manually using python setup.py install.
When installing from system repositories, like sudo apt install python-requests.

Cases when it might not work:

When installing in development mode, like python setup.py develop.
When installing in development mode, like pip install -e /path/to/package/source/.

Old answer

A better way of doing this is:

import pip
installed_packages = pip.get_installed_distributions()

For pip>=10.x use:

from pip._internal.utils.misc import get_installed_distributions

Why this way? Sometimes you have app name collisions. Importing from the app namespace doesn’t give you the full picture of what’s installed on the system.

As a result, you get a list of pkg_resources.Distribution objects. See the following as an example:

print installed_packages
[
    "Django 1.6.4 (/path-to-your-env/lib/python2.7/site-packages)",
    "six 1.6.1 (/path-to-your-env/lib/python2.7/site-packages)",
    "requests 2.5.0 (/path-to-your-env/lib/python2.7/site-packages)",
]

Make a list of it:

flat_installed_packages = [package.project_name for package in installed_packages]

[
    "Django",
    "six",
    "requests",
]

Check if requests is installed:

if 'requests' in flat_installed_packages:
    # Do something

Question 15

As of Python 3.3, you can use the find_spec() method

import importlib.util

# For illustrative purposes.
package_name = 'pandas'

spec = importlib.util.find_spec(package_name)
if spec is None:
    print(package_name +" is not installed")

Question 16

If you want to have the check from the terminal, you can run

pip3 show package_name

and if nothing is returned, the package is not installed.

If perhaps you want to automate this check, so that for example you can install it if missing, you can have the following in your bash script:

pip3 show package_name 1>/dev/null #pip for Python 2
if [ $? == 0 ]; then
   echo "Installed" #Replace with your actions
else
   echo "Not Installed" #Replace with your actions, 'pip3 install --upgrade package_name' ?
fi

Question 17

As an extension of this answer:

For Python 2.*, pip show <package_name> will perform the same task.

For example pip show numpy will return the following or alike:

Name: numpy
Version: 1.11.1
Summary: NumPy: array processing for numbers, strings, records, and objects.
Home-page: http://www.numpy.org
Author: NumPy Developers
Author-email: numpy-discussion@scipy.org
License: BSD
Location: /home/***/anaconda2/lib/python2.7/site-packages
Requires: 
Required-by: smop, pandas, tables, spectrum, seaborn, patsy, odo, numpy-stl, numba, nfft, netCDF4, MDAnalysis, matplotlib, h5py, GridDataFormats, dynd, datashape, Bottleneck, blaze, astropy

Question 18

You can use the pkg_resources module from setuptools. For example:

import pkg_resources

package_name = 'cool_package'
try:
    cool_package_dist_info = pkg_resources.get_distribution(package_name)
except pkg_resources.DistributionNotFound:
    print('{} not installed'.format(package_name))
else:
    print(cool_package_dist_info)

Note that there is a difference between python module and a python package. A package can contain multiple modules and module’s names might not match the package name.

Question 19

Open your command prompt type

pip3 list

Question 20

I’d like to add some thoughts/findings of mine to this topic. I’m writing a script that checks all requirements for a custom made program. There are many checks with python modules too.

There’s a little issue with the

try:
   import ..
except:
   ..

solution. In my case one of the python modules called python-nmap, but you import it with import nmap and as you see the names mismatch. Therefore the test with the above solution returns a False result, and it also imports the module on hit, but maybe no need to use a lot of memory for a simple test/check.

I also found that

import pip
installed_packages = pip.get_installed_distributions()

installed_packages will have only the packages has been installed with pip. On my system pip freeze returns over 40 python modules, while installed_packages has only 1, the one I installed manually (python-nmap).

Another solution below that I know it may not relevant to the question, but I think it’s a good practice to keep the test function separate from the one that performs the install it might be useful for some.

The solution that worked for me. It based on this answer How to check if a python module exists without importing it

from imp import find_module

def checkPythonmod(mod):
    try:
        op = find_module(mod)
        return True
    except ImportError:
        return False

NOTE: this solution can’t find the module by the name python-nmap too, I have to use nmap instead (easy to live with) but in this case the module won’t be loaded to the memory whatsoever.

Question 21

If you’d like your script to install missing packages and continue, you could do something like this (on example of ‘krbV’ module in ‘python-krbV’ package):

import pip
import sys

for m, pkg in [('krbV', 'python-krbV')]:
    try:
        setattr(sys.modules[__name__], m, __import__(m))
    except ImportError:
        pip.main(['install', pkg])
        setattr(sys.modules[__name__], m, __import__(m))

Question 22

A quick way is to use python command line tool. Simply type import <your module name> You see an error if module is missing.

$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
>>> import sys
>>> import jocker
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named jocker
$

Question 23

Hmmm … the closest I saw to a convenient answer was using the command line to try the import. But I prefer to even avoid that.

How about ‘pip freeze | grep pkgname’? I tried it and it works well. It also shows you the version it has and whether it is installed under version control (install) or editable (develop).

Question 24

You can use this:

class myError(exception):
 pass # Or do some thing like this.
try:
 import mymodule
except ImportError as e:
 raise myError("error was occurred")

Question 25

In the Terminal type

pip show some_package_name

Example

pip show matplotlib

Question 26

I would like to comment to @ice.nicer reply but I cannot, so … My observations is that packages with dashes are saved with underscores, not only with dots as pointed out by @dwich comment

For example, you do pip3 install sphinx-rtd-theme, but:

importlib.util.find_spec(sphinx_rtd_theme) returns an Object
importlib.util.find_spec(sphinx-rtd-theme) returns None
importlib.util.find_spec(sphinx.rtd.theme) raises ModuleNotFoundError

Moreover, some names are totally changed. For example, you do pip3 install pyyaml but it is saved simply as yaml

I am using python3.8

Question 27

if pip3 list | grep -sE '^some_command\s+[0-9]' >/dev/null
  # installed ...
else
  # not installed ...
fi

Question 28

Go option #2. If ImportError is thrown, then the package is not installed (or not in sys.path).

Question 29

当我执行“ pip install -e …”以从git repo安装时，我必须指定＃egg = somename或pip抱怨。例如：

pip install -e git://github.com/hiidef/oauth2app.git#egg=oauth2app

这个“蛋”字符串的意义是什么？

Question 30

When I do a “pip install -e …” to install from a git repo, I have to specify #egg=somename or pip complains. For example:

pip install -e git://github.com/hiidef/oauth2app.git#egg=oauth2app

What’s the significance of this “egg” string?

Question 31

每点安装-h“ egg”字符串是在安装过程中检出的目录

Question 32

per pip install -h the “egg” string is the directory that gets checked out as part of the install

Question 33

您必须包含＃egg = Package，这样pip才能知道该URL的期望值。参见https://pip.pypa.io/en/stable/reference/pip_install/#vcs-support

更多鸡蛋

Question 34

You have to include #egg=Package so pip knows what to expect at that URL. See https://pip.pypa.io/en/stable/reference/pip_install/#vcs-support

Execution of Python code with -m option or not

Use the -m flag.

The results are pretty much the same when you have a script, but when you develop a package, without the -m flag, there’s no way to get the imports to work correctly if you want to run a subpackage or module in the package as the main entry point to your program (and believe me, I’ve tried.)

The docs

Like the docs on the -m flag say:

Search sys.path for the named module and execute its contents as the __main__ module.

and

As with the -c option, the current directory will be added to the start of sys.path.

so

python -m pdb

is roughly equivalent to

python /usr/lib/python3.5/pdb.py

(assuming you don’t have a package or script in your current directory called pdb.py)

Explanation:

Behavior is made “deliberately similar to” scripts.

Many standard library modules contain code that is invoked on their execution as a script. An example is the timeit module:

Some python code is intended to be run as a module: (I think this example is better than the commandline option doc example)

$ python -m timeit '"-".join(str(n) for n in range(100))'
10000 loops, best of 3: 40.3 usec per loop
$ python -m timeit '"-".join([str(n) for n in range(100)])'
10000 loops, best of 3: 33.4 usec per loop
$ python -m timeit '"-".join(map(str, range(100)))'
10000 loops, best of 3: 25.2 usec per loop

And from the release note highlights for Python 2.4:

The -m command line option – python -m modulename will find a module in the standard library, and invoke it. For example, python -m pdb is equivalent to python /usr/lib/python2.4/pdb.py

Follow-up Question

Also, David Beazley’s Python Essential Reference explains it as “The -m option runs a library module as a script which executes inside the __main__ module prior to the execution of the main script”.

It means any module you can lookup with an import statement can be run as the entry point of the program – if it has a code block, usually near the end, with if __name__ == '__main__':.

`-m` without adding the current directory to the path:

A comment here elsewhere says:

That the -m option also adds the current directory to sys.path, is obviously a security issue (see: preload attack). This behavior is similar to library search order in Windows (before it had been hardened recently). It’s a pity that Python does not follow the trend and does not offer a simple way to disable adding . to sys.path

Well, this demonstrates the possible issue – (in windows remove the quotes):

echo "import sys; print(sys.version)" > pdb.py

python -m pdb
3.5.2 |Anaconda 4.1.1 (64-bit)| (default, Jul  5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]

Use the -I flag to lock this down for production environments (new in version 3.4):

python -Im pdb
usage: pdb.py [-c command] ... pyfile [arg] ...
etc...

from the docs:

-I

Run Python in isolated mode. This also implies -E and -s. In isolated mode sys.path contains neither the script’s directory nor the user’s site-packages directory. All PYTHON* environment variables are ignored, too. Further restrictions may be imposed to prevent the user from injecting malicious code.

What does `package` do?

It enables explicit relative imports, not particularly germane to this question, though – see this answer here: What’s the purpose of the “__package__” attribute in Python?

Question 42

The main reason to run a module (or package) as a script with -m is to simplify deployment, especially on Windows. You can install scripts in the same place in the Python library where modules normally go – instead of polluting PATH or global executable directories such as ~/.local (the per-user scripts directory is ridiculously hard to find in Windows).

Then you just type -m and Python finds the script automagically. For example, python -m pip will find the correct pip for the same instance of Python interpreter which executes it. Without -m, if user has several Python versions installed, which one would be the “global” pip?

If user prefers “classic” entry points for command-line scripts, these can be easily added as small scripts somewhere in PATH, or pip can create these at install time with entry_points parameter in setup.py.

So just check for __name__ == '__main__' and ignore other non-reliable implementation details.

Question 43

Could you tell me how can I read a file that is inside my Python package?

My situation

A package that I load has a number of templates (text files used as strings) that I want to load from within the program. But how do I specify the path to such file?

Imagine I want to read a file from:

package\templates\temp_file

Some kind of path manipulation? Package base path tracking?

Question 44

[added 2016-06-15: apparently this doesn’t work in all situations. please refer to the other answers]


import os, mypackage
template = os.path.join(mypackage.__path__[0], 'templates', 'temp_file')

Question 45

TLDR; Use standard-library’s `importlib.resources` module as explained in the method no 2, below.

The traditional pkg_resources from setuptools is not recommended anymore because the new method:

it is significantly more performant;
is is safer since the use of packages (instead of path-stings) raises compile-time errors;
it is more intuitive because you don’t have to “join” paths;
it is faster when developing since you don’t need an extra dependency (setuptools), but rely on Python’s standard-library alone.

I kept the traditional listed first, to explain the differences with the new method when porting existing code (porting also explained here).

Let’s assume your templates are located in a folder nested inside your module’s package:

  <your-package>
    +--<module-asking-the-file>
    +--templates/
          +--temp_file                         <-- We want this file.

Note 1: For sure, we should NOT fiddle with the __file__ attribute (e.g. code will break when served from a zip).

Note 2: If you are building this package, remember to declatre your data files as package_data or data_files in your setup.py.

1) Using `pkg_resources` from `setuptools`(slow)

You may use pkg_resources package from setuptools distribution, but that comes with a cost, performance-wise:

import pkg_resources

# Could be any dot-separated package/module name or a "Requirement"
resource_package = __name__
resource_path = '/'.join(('templates', 'temp_file'))  # Do not use os.path.join()
template = pkg_resources.resource_string(resource_package, resource_path)
# or for a file-like stream:
template = pkg_resources.resource_stream(resource_package, resource_path)

Tips:

This will read data even if your distribution is zipped, so you may set zip_safe=True in your setup.py, and/or use the long-awaited zipapp packer from python-3.5 to create self-contained distributions.

Remember to add setuptools into your run-time requirements (e.g. in install_requires`).

… and notice that according to the Setuptools/pkg_resources docs, you should not use os.path.join:

Basic Resource Access
Note that resource names must be /-separated paths and cannot be absolute (i.e. no leading /) or contain relative names like “..“. Do not use os.path routines to manipulate resource paths, as they are not filesystem paths.

2) Python >= 3.7, or using the backported `importlib_resources` library

Use the standard library’s importlib.resources module which is more efficient than setuptools, above:

try:
    import importlib.resources as pkg_resources
except ImportError:
    # Try backported to PY<37 `importlib_resources`.
    import importlib_resources as pkg_resources

from . import templates  # relative-import the *package* containing the templates

template = pkg_resources.read_text(templates, 'temp_file')
# or for a file-like stream:
template = pkg_resources.open_text(templates, 'temp_file')

Attention:

Regarding the function read_text(package, resource):

The package can be either a string or a module.

The resource is NOT a path anymore, but just the filename of the resource to open, within an existing package; it may not contain path separators and it may not have sub-resources (i.e. it cannot be a directory).

For the example asked in the question, we must now:

make the <your_package>/templates/ into a proper package, by creating an empty __init__.py file in it,
so now we can use a simple (possibly relative) import statement (no more parsing package/module names),
and simply ask for resource_name = "temp_file" (no path).

Tips:

To access a file inside the current module, set the package argument to __package__, e.g. pkg_resources.read_text(__package__, 'temp_file') (thanks to @ben-mares).

Things become interesting when an actual filename is asked with path(), since now context-managers are used for temporarily-created files (read this).

Add the backported library, conditionally for older Pythons, with install_requires=[" importlib_resources ; python_version<'3.7'"] (check this if you package your project with setuptools<36.2.1).

Remember to remove setuptools library from your runtime-requirements, if you migrated from the traditional method.

Remember to customize setup.py or MANIFEST to include any static files.

You may also set zip_safe=True in your setup.py.

Question 46

A packaging prelude:

Before you can even worry about reading resource files, the first step is to make sure that the data files are getting packaged into your distribution in the first place – it is easy to read them directly from the source tree, but the important part is making sure these resource files are accessible from code within an installed package.

Structure your project like this, putting data files into a subdirectory within the package:

.
├── package
│   ├── __init__.py
│   ├── templates
│   │   └── temp_file
│   ├── mymodule1.py
│   └── mymodule2.py
├── README.rst
├── MANIFEST.in
└── setup.py

You should pass include_package_data=True in the setup() call. The manifest file is only needed if you want to use setuptools/distutils and build source distributions. To make sure the templates/temp_file gets packaged for this example project structure, add a line like this into the manifest file:

recursive-include package *

Historical cruft note: Using a manifest file is not needed for modern build backends such as flit, poetry, which will include the package data files by default. So, if you’re using pyproject.toml and you don’t have a setup.py file then you can ignore all the stuff about MANIFEST.in.

Now, with packaging out of the way, onto the reading part…

Recommendation:

Use standard library pkgutil APIs. It’s going to look like this in library code:

# within package/mymodule1.py, for example
import pkgutil

data = pkgutil.get_data(__name__, "templates/temp_file")

It works in zips. It works on Python 2 and Python 3. It doesn’t require third-party dependencies. I’m not really aware of any downsides (if you are, then please comment on the answer).

Bad ways to avoid:

Bad way #1: using relative paths from a source file

This is currently the accepted answer. At best, it looks something like this:

from pathlib import Path

resource_path = Path(__file__).parent / "templates"
data = resource_path.joinpath("temp_file").read_bytes()

What’s wrong with that? The assumption that you have files and subdirectories available is not correct. This approach doesn’t work if executing code which is packed in a zip or a wheel, and it may be entirely out of the user’s control whether or not your package gets extracted to a filesystem at all.

Bad way #2: using pkg_resources APIs

This is described in the top-voted answer. It looks something like this:

from pkg_resources import resource_string

data = resource_string(__name__, "templates/temp_file")

What’s wrong with that? It adds a runtime dependency on setuptools, which should preferably be an install time dependency only. Importing and using pkg_resources can become really slow, as the code builds up a working set of all installed packages, even though you were only interested in your own package resources. That’s not a big deal at install time (since installation is once-off), but it’s ugly at runtime.

Bad way #3: using importlib.resources APIs

This is currently the recommendation in the top-voted answer. It’s a recent standard library addition (new in Python 3.7). It looks like this:

from importlib.resources import read_binary

data = read_binary("package.templates", "temp_file")

What’s wrong with that? Well, unfortunately, it doesn’t work…yet. This is still an incomplete API, using importlib.resources will require you to add an empty file templates/__init__.py in order that the data files will reside within a sub-package rather than in a subdirectory. It will also expose the package/templates subdirectory as an importable package.templates sub-package in its own right. If that’s not a big deal and it doesn’t bother you, then you can go ahead and add the __init__.py file there and use the import system to access resources. However, while you’re at it you may as well make it into a my_resources.py file instead, and just define some bytes or string variables in the module, then import them in Python code. It’s the import system doing the heavy lifting here either way.

Honorable mention: using newer importlib_resources APIs

This has not been mentioned in any other answers yet, but importlib_resources is more than a simple backport of the Python 3.7+ importlib.resources code. It has traversable APIs which you can use like this:

import importlib_resources

my_resources = importlib_resources.files("package")
data = (my_resources / "templates" / "temp_file").read_bytes()

This works on Python 2 and 3, it works in zips, and it doesn’t require spurious __init__.py files to be added in resource subdirectories. The only downside vs pkgutil that I can see is that these new APIs haven’t yet arrived in stdlib, so there is still a third-party dependency. Newer APIs from importlib_resources should arrive to stdlib importlib.resources in Python 3.9.

Example project:

I’ve created an example project on github and uploaded on PyPI, which demonstrates all five approaches discussed above. Try it out with:

$ pip install resources-example
$ resources-example

See https://github.com/wimglenn/resources-example for more info.

Question 47

In case you have this structure

lidtk
├── bin
│   └── lidtk
├── lidtk
│   ├── analysis
│   │   ├── char_distribution.py
│   │   └── create_cm.py
│   ├── classifiers
│   │   ├── char_dist_metric_train_test.py
│   │   ├── char_features.py
│   │   ├── cld2
│   │   │   ├── cld2_preds.txt
│   │   │   └── cld2wili.py
│   │   ├── get_cld2.py
│   │   ├── text_cat
│   │   │   ├── __init__.py
│   │   │   ├── README.md   <---------- say you want to get this
│   │   │   └── textcat_ngram.py
│   │   └── tfidf_features.py
│   ├── data
│   │   ├── __init__.py
│   │   ├── create_ml_dataset.py
│   │   ├── download_documents.py
│   │   ├── language_utils.py
│   │   ├── pickle_to_txt.py
│   │   └── wili.py
│   ├── __init__.py
│   ├── get_predictions.py
│   ├── languages.csv
│   └── utils.py
├── README.md
├── setup.cfg
└── setup.py

you need this code:

import pkg_resources

# __name__ in case you're within the package
# - otherwise it would be 'lidtk' in this example as it is the package name
path = 'classifiers/text_cat/README.md'  # always use slash
filepath = pkg_resources.resource_filename(__name__, path)

The strange “always use slash” part comes from setuptools APIs

Also notice that if you use paths, you must use a forward slash (/) as the path separator, even if you are on Windows. Setuptools automatically converts slashes to appropriate platform-specific separators at build time

In case you wonder where the documentation is:

Question 48

The content in “10.8. Reading Datafiles Within a Package” of Python Cookbook, Third Edition by David Beazley and Brian K. Jones giving the answers.

I’ll just get it to here:

Suppose you have a package with files organized as follows:

mypackage/
    __init__.py
    somedata.dat
    spam.py

Now suppose the file spam.py wants to read the contents of the file somedata.dat. To do it, use the following code:

import pkgutil
data = pkgutil.get_data(__package__, 'somedata.dat')

The resulting variable data will be a byte string containing the raw contents of the file.

The first argument to get_data() is a string containing the package name. You can either supply it directly or use a special variable, such as __package__. The second argument is the relative name of the file within the package. If necessary, you can navigate into different directories using standard Unix filename conventions as long as the final directory is still located within the package.

In this way, the package can installed as directory, .zip or .egg.

Question 49

Every python module in your package has a __file__ attribute

You can use it as:

import os 
from mypackage

templates_dir = os.path.join(os.path.dirname(mypackage.__file__), 'templates')
template_file = os.path.join(templates_dir, 'template.txt')

For egg resources see: http://peak.telecommunity.com/DevCenter/PythonEggs#accessing-package-resources

Question 50

assuming you are using an egg file; not extracted:

I “solved” this in a recent project, by using a postinstall script, that extracts my templates from the egg (zip file) to the proper directory in the filesystem. It was the quickest, most reliable solution I found, since working with __path__[0] can go wrong sometimes (i don’t recall the name, but i cam across at least one library, that added something in front of that list!).

Also egg files are usually extracted on the fly to a temporary location called the “egg cache”. You can change that location using an environment variable, either before starting your script or even later, eg.

os.environ['PYTHON_EGG_CACHE'] = path

However there is pkg_resources that might do the job properly.

Question 51

Is there a straightforward way to list the names of all modules in a package, without using __all__?

For example, given this package:

/testpkg
/testpkg/__init__.py
/testpkg/modulea.py
/testpkg/moduleb.py

I’m wondering if there is a standard or built-in way to do something like this:

>>> package_contents("testpkg")
['modulea', 'moduleb']

The manual approach would be to iterate through the module search paths in order to find the package’s directory. One could then list all the files in that directory, filter out the uniquely-named py/pyc/pyo files, strip the extensions, and return that list. But this seems like a fair amount of work for something the module import mechanism is already doing internally. Is that functionality exposed anywhere?

Question 52

Maybe this will do what you’re looking for?

import imp
import os
MODULE_EXTENSIONS = ('.py', '.pyc', '.pyo')

def package_contents(package_name):
    file, pathname, description = imp.find_module(package_name)
    if file:
        raise ImportError('Not a package: %r', package_name)
    # Use a set because some may be both source and compiled.
    return set([os.path.splitext(module)[0]
        for module in os.listdir(pathname)
        if module.endswith(MODULE_EXTENSIONS)])

Question 53

Using python2.3 and above, you could also use the pkgutil module:

>>> import pkgutil
>>> [name for _, name, _ in pkgutil.iter_modules(['testpkg'])]
['modulea', 'moduleb']

EDIT: Note that the parameter is not a list of modules, but a list of paths, so you might want to do something like this:

>>> import os.path, pkgutil
>>> import testpkg
>>> pkgpath = os.path.dirname(testpkg.__file__)
>>> print [name for _, name, _ in pkgutil.iter_modules([pkgpath])]

Question 54

import module
help(module)

Question 55

~~Don’t know if I’m overlooking something, or if the answers are just out-dated but;~~

As stated by user815423426 this only works for live objects and the listed modules are only modules that were imported before.

Listing modules in a package seems really easy using inspect:

>>> import inspect, testpkg
>>> inspect.getmembers(testpkg, inspect.ismodule)
['modulea', 'moduleb']

Question 56

This is a recursive version that works with python 3.6 and above:

import importlib.util
from pathlib import Path
import os
MODULE_EXTENSIONS = '.py'

def package_contents(package_name):
    spec = importlib.util.find_spec(package_name)
    if spec is None:
        return set()

    pathname = Path(spec.origin).parent
    ret = set()
    with os.scandir(pathname) as entries:
        for entry in entries:
            if entry.name.startswith('__'):
                continue
            current = '.'.join((package_name, entry.name.partition('.')[0]))
            if entry.is_file():
                if entry.name.endswith(MODULE_EXTENSIONS):
                    ret.add(current)
            elif entry.is_dir():
                ret.add(current)
                ret |= package_contents(current)


    return ret

Question 57

Based on cdleary’s example, here’s a recursive version listing path for all submodules:

import imp, os

def iter_submodules(package):
    file, pathname, description = imp.find_module(package)
    for dirpath, _, filenames in os.walk(pathname):
        for  filename in filenames:
            if os.path.splitext(filename)[1] == ".py":
                yield os.path.join(dirpath, filename)

Question 58

This should list the modules:

help("modules")

Question 59

If you would like to view an inforamtion about your package outside of the python code (from a command prompt) you can use pydoc for it.

# get a full list of packages that you have installed on you machine
$ python -m pydoc modules

# get information about a specific package
$ python -m pydoc <your package>

You will have the same result as pydoc but inside of interpreter using help

>>> import <my package>
>>> help(<my package>)

Question 60

def package_contents(package_name):
  package = __import__(package_name)
  return [module_name for module_name in dir(package) if not module_name.startswith("__")]

Question 61

print dir(module)

Question 62

I’m having a hard time understanding how module importing works in Python (I’ve never done it in any other language before either).

Let’s say I have:

myapp/__init__.py
myapp/myapp/myapp.py
myapp/myapp/SomeObject.py
myapp/tests/TestCase.py

Now I’m trying to get something like this:

myapp.py
===================
from myapp import SomeObject
# stuff ...

TestCase.py
===================
from myapp import SomeObject
# some tests on SomeObject

However, I’m definitely doing something wrong as Python can’t see that myapp is a module:

ImportError: No module named myapp

Question 63

In your particular case it looks like you’re trying to import SomeObject from the myapp.py and TestCase.py scripts. From myapp.py, do

import SomeObject

since it is in the same folder. For TestCase.py, do

from ..myapp import SomeObject

However, this will work only if you are importing TestCase from the package. If you want to directly run python TestCase.py, you would have to mess with your path. This can be done within Python:

import sys
sys.path.append("..")
from myapp import SomeObject

though that is generally not recommended.

In general, if you want other people to use your Python package, you should use distutils to create a setup script. That way, anyone can install your package easily using a command like python setup.py install and it will be available everywhere on their machine. If you’re serious about the package, you could even add it to the Python Package Index, PyPI.

Question 64

The function import looks for files into your PYTHONPATH env. variable and your local directory. So you can either put all your files in the same directory, or export the path typing into a terminal::

export PYTHONPATH="$PYTHONPATH:/path_to_myapp/myapp/myapp/"

问题：-m开关的作用是什么？

回答 0

回答 1

回答 2

简介（TLDR）

初赛

的历史发展 -m

用例

缺点

详细比较

结论

Introduction (TLDR)

Preliminaries

Historical Development of -m

Use Cases

Shortcomings

Detailed Comparisons

Conclusion

问题：访问包子目录中的数据

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

问题：检查是否安装了Python软件包

回答 0

Python 3.3+使用sys.modules和find_spec：

Python 3：

Python 2：

Python 3.3+ use sys.modules and find_spec:

Python 3:

Python 2:

回答 1

更新的答案

旧答案

Updated answer

Old answer

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

回答 10

回答 11

回答 12

回答 13

回答 14

回答 15

问题：为什么从git repo进行pip安装时＃egg = foo

回答 0

回答 1

回答 2

回答 3

问题：是否使用-m选项执行Python代码

回答 0

回答 1

是否使用-m选项执行Python代码

该文档

说明：

后续问题

-m 而不将当前目录添加到路径：

怎么__package__办？

Execution of Python code with -m option or not

The docs

Explanation:

Follow-up Question

-m without adding the current directory to the path:

What does __package__ do?

回答 2

问题：如何从Python包内部读取（静态）文件？

我的情况

My situation

回答 0

回答 1

TLDR；使用标准库的importlib.resources模块，如下面方法2中所述。

1）使用pkg_resources从setuptools（慢）

的历史发展 `-m`

Historical Development of `-m`

`-m` 而不将当前目录添加到路径：

怎么`package`办？

`-m` without adding the current directory to the path:

What does `package` do?

TLDR；使用标准库的`importlib.resources`模块，如下面方法2中所述。

1）使用`pkg_resources`从`setuptools`（慢）

2）Python> = 3.7，或使用反向移植的`importlib_resources`库

TLDR; Use standard-library’s `importlib.resources` module as explained in the method no 2, below.

1) Using `pkg_resources` from `setuptools`(slow)

2) Python >= 3.7, or using the backported `importlib_resources` library

问题：Python 3.3+中的软件包不需要init.py吗