Python 实用宝典

Question 1

I know how to set PYTHONPATH in my /etc/profile and in my environment variables.

But what if I want to set it from within a script? Do I need import os or import sys? How do I do it?

Question 2

You don’t set PYTHONPATH, you add entries to sys.path. It’s a list of directories that should be searched for Python packages, so you can just append your directories to that list.

sys.path.append('/path/to/whatever')

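In fact, the PYTHONPATH portion of sys.path is produced at startup by splitting the value of PYTHONPATH on the path separator character (: on Linux-like systems, ; on Windows); the script directory and the installation defaults are added as well.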

You can also add directories using site.addsitedir, and that method will also take into account .pth files existing within the directories you pass. (That would not be the case with directories you specify in PYTHONPATH.)
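As a rough illustration of the difference, the sketch below builds a throwaway directory containing a .pth file and passes it to site.addsitedir. The names base, extra and demo.pth are made up for the example; only sys.path and site.addsitedir come from the answer.

```python
import os
import site
import sys
import tempfile

# Hypothetical layout: base/ holds demo.pth, which points at base/extra_lib.
base = tempfile.mkdtemp()
extra = os.path.join(base, "extra_lib")
os.mkdir(extra)
with open(os.path.join(base, "demo.pth"), "w") as fh:
    fh.write(extra + "\n")

# addsitedir appends base itself and also processes the .pth files inside it.
site.addsitedir(base)

print(base in sys.path)   # the directory that was passed in
print(extra in sys.path)  # the directory listed in demo.pth
```

A plain sys.path.append(base) would add only base and leave demo.pth unread.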

Question 3

You can get and set environment variables via os.environ:

import os
user_home = os.environ["HOME"]

os.environ["PYTHONPATH"] = "..."

But since your interpreter is already running, this will have no effect on its import path (only child processes started afterwards inherit the new value). You’re better off using

import sys
sys.path.append("...")

which is the list that your PYTHONPATH is transformed into at interpreter startup.
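A small experiment can make this visible. The sketch below assumes a temporary directory (demo_dir, a made-up name) as the path to add, and that a child interpreter can be started with sys.executable; it compares what the running interpreter and a child interpreter each see.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical directory standing in for a real library location.
demo_dir = os.path.realpath(tempfile.mkdtemp())

# Setting the variable now is too late for the interpreter that is already running.
os.environ["PYTHONPATH"] = demo_dir
seen_here = demo_dir in sys.path

# A child interpreter reads PYTHONPATH during its own startup.
check = (
    "import os, sys; target = sys.argv[1]; "
    "print(any(os.path.realpath(p) == target for p in sys.path))"
)
child = subprocess.run(
    [sys.executable, "-c", check, demo_dir],
    capture_output=True, text=True, check=True,
)
seen_in_child = child.stdout.strip() == "True"

# Appending to sys.path is what changes the current process.
sys.path.append(demo_dir)
seen_after_append = demo_dir in sys.path

print(seen_here, seen_in_child, seen_after_append)
```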

Question 4

If you call sys.path.append('dir/to/path') without checking whether the directory has already been added, you can end up with a long list of duplicates in sys.path. To avoid that, I recommend this:

import sys
import os # only needed if you use os.getcwd() for the current directory

path = '/dir/path' # Or os.getcwd() for this directory
if path not in sys.path:
    sys.path.append(path)

Question 5

PYTHONPATH ends up in sys.path, which you can modify at runtime.

import sys
sys.path += ["whatever"]

Question 6

You can set PYTHONPATH with os.environ['PYTHONPATH'] = '/some/path'; then you need to call os.system('python') to start a new Python shell, which inherits the variable, for the newly added path to take effect.

Question 7

In Linux this works too:

import sys
sys.path.extend(["/path/to/dotpy/file/"])

Question 8

My Code:

import nltk.data
tokenizer = nltk.data.load('nltk:tokenizers/punkt/english.pickle')

ERROR Message:

[ec2-user@ip-172-31-31-31 sentiment]$ python mapper_local_v1.0.py
Traceback (most recent call last):
File "mapper_local_v1.0.py", line 16, in <module>

    tokenizer = nltk.data.load('nltk:tokenizers/punkt/english.pickle')

File "/usr/lib/python2.6/site-packages/nltk/data.py", line 774, in load

    opened_resource = _open(resource_url)

File "/usr/lib/python2.6/site-packages/nltk/data.py", line 888, in _open

    return find(path_, path + ['']).open()

File "/usr/lib/python2.6/site-packages/nltk/data.py", line 618, in find

    raise LookupError(resource_not_found)

LookupError:

Resource u'tokenizers/punkt/english.pickle' not found.  Please
use the NLTK Downloader to obtain the resource:

    >>>nltk.download()

Searched in:
- '/home/ec2-user/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- u''

I’m trying to run this program on a Unix machine.

As per the error message, I logged into the Python shell on my Unix machine and used the commands below:

import nltk
nltk.download()

Then I downloaded everything available using the d (download) and l (list) options, but the problem still persists.

I tried my best to find a solution on the internet, but I only found the same steps I already described above.

Question 9

To add to alvas’ answer, you can download only the punkt corpus:

nltk.download('punkt')

Downloading all sounds like overkill to me. Unless that’s what you want.

Question 10

If you’re looking to only download the punkt model:

import nltk
nltk.download('punkt')

If you’re unsure which data/model you need, you can install the popular datasets, models and taggers from NLTK:

import nltk
nltk.download('popular')

With the above command, there is no need to use the GUI to download the datasets.

Question 11

I got the solution:

import nltk
nltk.download()

once the NLTK Downloader starts

d) Download l) List u) Update c) Config h) Help q) Quit

Downloader> d

Download which package (l=list; x=cancel)? Identifier> punkt

Question 12

From the shell you can execute:

sudo python -m nltk.downloader punkt

If you want to install the popular NLTK corpora/models:

sudo python -m nltk.downloader popular

If you want to install all NLTK corpora/models:

sudo python -m nltk.downloader all

To list the resources you have downloaded:

python -c 'import os; import nltk; print(os.listdir(nltk.data.find("corpora")))'
python -c 'import os; import nltk; print(os.listdir(nltk.data.find("tokenizers")))'

Question 13

import nltk
nltk.download('punkt')

Open the Python prompt and run the above statements.

The sent_tokenize function uses an instance of PunktSentenceTokenizer from the nltk.tokenize.punkt module. This instance has already been trained and works well for many European languages. So it knows what punctuation and characters mark the end of a sentence and the beginning of a new sentence.

Question 14

The same thing happened to me recently, you just need to download the “punkt” package and it should work.

When you execute “list” (l) after having “downloaded all the available things”, is everything marked like the following line?:

[*] punkt............... Punkt Tokenizer Models

If you see this line with the star, it means you have it, and nltk should be able to load it.

Question 15

Go to python console by typing

$ python

in your terminal. Then import nltk and type the following 2 commands in your Python shell to install the respective packages:

>>> import nltk
>>> nltk.download('punkt')
>>> nltk.download('averaged_perceptron_tagger')

This solved the issue for me.

Question 16

My issue was that I called nltk.download('all') as the root user, but the process that eventually used nltk was another user who didn’t have access to /root/nltk_data where the content was downloaded.

So I simply recursively copied everything from the download location to one of the paths where NLTK was looking to find it like this:

cp -R /root/nltk_data/ /home/ubuntu/nltk_data

Question 17

Execute the following code:
```
import nltk
nltk.download()
```
After this, NLTK downloader will pop out.
Select All packages.
Download punkt.

Question 18

I was getting an error despite importing the following,

import nltk
nltk.download()

but for google colab this solved my issue.

   !python3 -c "import nltk; nltk.download('all')"

Question 19

A plain nltk.download() will not solve this issue. I tried the steps below and they worked for me:

In the nltk folder, create a tokenizers folder and copy your punkt folder into that tokenizers folder.

This will work! The punkt folder needs to sit directly inside the tokenizers folder, as described above.

Question 20

You need to rearrange your folders. Move your tokenizers folder into the nltk_data folder. It doesn’t work if your nltk_data folder contains a corpora folder which in turn contains the tokenizers folder.

Question 21

For me none of the above worked, so I just downloaded all the files by hand from the web site http://www.nltk.org/nltk_data/ and put them, also by hand, in a folder “tokenizers” inside the “nltk_data” folder. Not a pretty solution, but still a solution.

Question 22

After adding this line of code, the issue will be fixed:

nltk.download('punkt')

Question 23

I faced the same issue. After downloading everything, the ‘punkt’ error was still there. I searched for the package on my Windows machine at C:\Users\vaibhav\AppData\Roaming\nltk_data\tokenizers and could see ‘punkt.zip’ there. I realized that somehow the zip had not been extracted into C:\Users\vaibhav\AppData\Roaming\nltk_data\tokenizers\punkt. Once I extracted the zip, it worked like music.
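To check for this situation without opening each folder by hand, something like the following sketch could be used. It relies only on the standard library; punkt_status is a made-up helper, and the demo directories are temporary stand-ins for the "Searched in:" paths from the error message.

```python
import os
import tempfile

def punkt_status(search_dirs):
    """Return (directory, status) pairs for each candidate nltk_data directory.

    status is 'extracted' if tokenizers/punkt is a folder, 'zip only' if just
    tokenizers/punkt.zip exists, and 'missing' otherwise.
    """
    report = []
    for base in search_dirs:
        tokenizers = os.path.join(base, "tokenizers")
        if os.path.isdir(os.path.join(tokenizers, "punkt")):
            status = "extracted"
        elif os.path.isfile(os.path.join(tokenizers, "punkt.zip")):
            status = "zip only"
        else:
            status = "missing"
        report.append((base, status))
    return report

# Demo with three hypothetical nltk_data directories.
root = tempfile.mkdtemp()
extracted_dir = os.path.join(root, "extracted_data")
zipped_dir = os.path.join(root, "zipped_data")
empty_dir = os.path.join(root, "empty_data")
os.makedirs(os.path.join(extracted_dir, "tokenizers", "punkt"))
os.makedirs(os.path.join(zipped_dir, "tokenizers"))
open(os.path.join(zipped_dir, "tokenizers", "punkt.zip"), "w").close()

report = punkt_status([extracted_dir, zipped_dir, empty_dir])
for directory, status in report:
    print(status, directory)
```

In practice you would pass the directories listed under "Searched in:" instead of the temporary ones.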

Question 24

Just make sure you are using Jupyter Notebook and in a notebook, do the following:

import nltk

nltk.download()

Then one popup window will appear (showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml) From that you have to download everything.

Then rerun your code.

Question 25

For me it got solved by using “nltk:”

http://www.nltk.org/howto/data.html

Failed loading english.pickle with nltk.data.load

sent_tokenizer=nltk.data.load('nltk:tokenizers/punkt/english.pickle')

Question 26

What file do I edit, and how? I created a virtual environment.

Question 27

EDIT #2

The right answer is @arogachev’s one.

If you want to change the PYTHONPATH used in a virtualenv, you can add the following line to your virtualenv’s bin/activate file:

export PYTHONPATH="/the/path/you/want"

This way, the new PYTHONPATH will be set each time you use this virtualenv.

EDIT: (to answer @RamRachum’s comment)

To have it restored to its original value on deactivate, you could add

export OLD_PYTHONPATH="$PYTHONPATH"

before the previously mentioned line, and add the following line to your bin/postdeactivate script.

export PYTHONPATH="$OLD_PYTHONPATH"

Question 28

The comment by @s29 should be an answer:

One way to add a directory to the virtual environment is to install virtualenvwrapper (which is useful for many things) and then do

mkvirtualenv myenv
workon myenv
add2virtualenv . #for current directory
add2virtualenv ~/my/path

If you want to remove these paths, edit the file myenvhomedir/lib/python2.7/site-packages/_virtualenv_path_extensions.pth

Documentation on virtualenvwrapper can be found at http://virtualenvwrapper.readthedocs.org/en/latest/

Specific documentation on this feature can be found at http://virtualenvwrapper.readthedocs.org/en/latest/command_ref.html?highlight=add2virtualenv

Question 29

You can create a .pth file that contains the directory to search for, and place it in the site-packages directory. E.g.:

cd $(python -c "import sysconfig; print(sysconfig.get_paths()['purelib'])")  # site-packages; distutils was removed in Python 3.12
echo /some/library/path > some-library.pth

The effect is the same as adding /some/library/path to sys.path, and it remains local to the virtualenv setup.

Question 30

Initialize your virtualenv

cd venv

source bin/activate

Just set or change your Python path by entering the following command:

export PYTHONPATH='/home/django/srmvenv/lib/python3.4'

To check the Python path, start python and enter:

   python

      >>> import sys

      >>> sys.path

Question 31

I modified my activate script to source the file .virtualenvrc, if it exists in the current directory, and to save/restore PYTHONPATH on activate/deactivate.

You can find the patched activate script here. It’s a drop-in replacement for the activate script created by virtualenv 1.11.6.

Then I added something like this to my .virtualenvrc:

export PYTHONPATH="${PYTHONPATH:+$PYTHONPATH:}/some/library/path"

Question 32

It’s already answered here -> Is my virtual environment (python) causing my PYTHONPATH to break?

UNIX/LINUX

Add “export PYTHONPATH=/usr/local/lib/python2.0” to your ~/.bashrc file and source it by typing “source ~/.bashrc” or “. ~/.bashrc”.

WINDOWS XP

1) Go to the Control Panel
2) Double-click System
3) Go to the Advanced tab
4) Click on Environment Variables

In the System Variables window, check if you have a variable named PYTHONPATH. If you have one already, check that it points to the right directories. If you don’t have one already, click the New button and create it.

PYTHON CODE

Alternatively, you can also do the following in your code:

import sys
sys.path.append("/home/me/mypy")

Question 33

I’m trying to create a daemon in python. I’ve found the following question, which has some good resources in it which I am currently following, but I’m curious as to why a double fork is necessary. I’ve scratched around google and found plenty of resources declaring that one is necessary, but not why.

Some mention that it is to prevent the daemon from acquiring a controlling terminal. How would it do this without the second fork? What are the repercussions?

Question 34

Looking at the code referenced in the question, the justification is:

Fork a second child and exit immediately to prevent zombies. This causes the second child process to be orphaned, making the init process responsible for its cleanup. And, since the first child is a session leader without a controlling terminal, it’s possible for it to acquire one by opening a terminal in the future (System V- based systems). This second fork guarantees that the child is no longer a session leader, preventing the daemon from ever acquiring a controlling terminal.

So it is to ensure that the daemon is re-parented onto init (just in case the process kicking off the daemon is long lived), and removes any chance of the daemon reacquiring a controlling tty. So if neither of these cases apply, then one fork should be sufficient. “Unix Network Programming – Stevens” has a good section on this.

Question 35

I was trying to understand the double fork and stumbled upon this question here. After a lot of research this is what I figured out. Hopefully it will help clarify things better for anyone who has the same question.

In Unix every process belongs to a group which in turn belongs to a session. Here is the hierarchy…

Session (SID) → Process Group (PGID) → Process (PID)

The first process in the process group becomes the process group leader and the first process in the session becomes the session leader. Every session can have one TTY associated with it. Only a session leader can take control of a TTY. For a process to be truly daemonized (run in the background), we should ensure that the session leader has exited, so that there is no possibility of the session ever taking control of the TTY.

I ran Sander Marechal’s python example daemon program from this site on my Ubuntu. Here are the results with my comments.

1. `Parent`    = PID: 28084, PGID: 28084, SID: 28046
2. `Fork#1`    = PID: 28085, PGID: 28084, SID: 28046
3. `Decouple#1`= PID: 28085, PGID: 28085, SID: 28085
4. `Fork#2`    = PID: 28086, PGID: 28085, SID: 28085

Note that the process is the session leader after Decouple#1, because its PID == SID. It could still take control of a TTY.

Note that Fork#2 is no longer the session leader, because PID != SID. This process can never take control of a TTY. Truly daemonized.

I personally find the term fork-twice confusing. A better idiom might be fork-decouple-fork.

Additional links of interest:

Unix processes – http://www.win.tue.nl/~aeb/linux/lk/lk-10.html

Question 36

Strictly speaking, the double-fork has nothing to do with re-parenting the daemon as a child of init. All that is necessary to re-parent the child is that the parent must exit. This can be done with only a single fork. Also, doing a double-fork by itself doesn’t re-parent the daemon process to init; the daemon’s parent must exit. In other words, the parent always exits when forking a proper daemon so that the daemon process is re-parented to init.

So why the double fork? POSIX.1-2008 Section 11.1.3, “The Controlling Terminal“, has the answer (emphasis added):

The controlling terminal for a session is allocated by the session leader in an implementation-defined manner. If a session leader has no controlling terminal, and opens a terminal device file that is not already associated with a session without using the O_NOCTTY option (see open()), it is implementation-defined whether the terminal becomes the controlling terminal of the session leader. If a process which is not a session leader opens a terminal file, or the O_NOCTTY option is used on open(), then that terminal shall not become the controlling terminal of the calling process.

This tells us that if a daemon process does something like this …

int fd = open("/dev/console", O_RDWR);

… then the daemon process might acquire /dev/console as its controlling terminal, depending on whether the daemon process is a session leader, and depending on the system implementation. The program can guarantee that the above call will not acquire a controlling terminal if the program first ensures that it is not a session leader.

Normally, when launching a daemon, setsid is called (from the child process after calling fork) to dissociate the daemon from its controlling terminal. However, calling setsid also means that the calling process will be the session leader of the new session, which leaves open the possibility that the daemon could reacquire a controlling terminal. The double-fork technique ensures that the daemon process is not the session leader, which then guarantees that a call to open, as in the example above, will not result in the daemon process reacquiring a controlling terminal.

The double-fork technique is a bit paranoid. It may not be necessary if you know that the daemon will never open a terminal device file. Also, on some systems it may not be necessary even if the daemon does open a terminal device file, since that behavior is implementation-defined. However, one thing that is not implementation-defined is that only a session leader can allocate the controlling terminal. If a process isn’t a session leader, it can’t allocate a controlling terminal. Therefore, if you want to be paranoid and be certain that the daemon process cannot inadvertently acquire a controlling terminal, regardless of any implementation-defined specifics, then the double-fork technique is essential.

Question 37

Taken from Bad CTK:

“On some flavors of Unix, you are forced to do a double-fork on startup, in order to go into daemon mode. This is because single forking isn’t guaranteed to detach from the controlling terminal.”

Question 38

According to “Advanced Programming in the Unix Environment”, by Stevens and Rago, the second fork is more of a recommendation, and it is done to guarantee that the daemon does not acquire a controlling terminal on System V-based systems.

Question 39

One reason is that the parent process can immediately waitpid() for the child, and then forget about it. When the grandchild then dies, its parent is init, which will wait() for it, taking it out of the zombie state.

The result is that the parent process doesn’t need to be aware of the forked children, and it also makes it possible to fork long running processes from libs etc.

Question 40

The daemon() call has the parent call _exit() if it succeeds. The original motivation may have been to allow the parent to do some extra work while the child is daemonizing.

It may also be based on a mistaken belief that it’s necessary in order to ensure the daemon has no parent process and is reparented to init – but this will happen anyway once the parent dies in the single fork case.

So I suppose it all just boils down to tradition in the end – a single fork is sufficient as long as the parent dies in short order anyway.

Question 41

A decent discussion of it appears to be at http://www.developerweb.net/forum/showthread.php?t=3025

Quoting mlampkin from there:

…think of the setsid( ) call as the “new” way to do thing (disassociate from the terminal) and the [second] fork( ) call after it as redundancy to deal with the SVr4…

Question 42

It might be easier to understand in this way:

The first fork and setsid will create a new session (but the process ID == session ID).
The second fork makes sure the process ID != session ID.
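The sequence can be tried out directly. The sketch below assumes a POSIX system where os.fork and os.setsid are available; double_fork_report is a made-up name. The grandchild reports its own PID and SID through a pipe so the original process can compare them. It only shows the fork, setsid, fork steps and is not a complete daemon.

```python
import os

def double_fork_report():
    """Fork, setsid, fork again; return (first_child_pid, grandchild_pid, grandchild_sid)."""
    read_fd, write_fd = os.pipe()
    first = os.fork()
    if first == 0:
        # First child: becomes leader of a new session (its PID == SID).
        os.close(read_fd)
        os.setsid()
        if os.fork() == 0:
            # Grandchild: belongs to the new session but does not lead it.
            message = "%d %d" % (os.getpid(), os.getsid(0))
            os.write(write_fd, message.encode())
            os._exit(0)
        # The session leader exits right away, orphaning the grandchild.
        os._exit(0)
    # Original process: reap the first child, then read the grandchild's report.
    os.close(write_fd)
    os.waitpid(first, 0)
    with os.fdopen(read_fd) as pipe:
        pid_text, sid_text = pipe.read().split()
    return first, int(pid_text), int(sid_text)

first_child, grandchild_pid, grandchild_sid = double_fork_report()
print("session leader was the first child:", grandchild_sid == first_child)
print("grandchild is not the session leader:", grandchild_pid != grandchild_sid)
```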

Question 43

How can I make any use of PYTHONPATH? When I try to run a script in the path the file is not found. When I cd to the directory holding the script the script runs. So what good is the PYTHONPATH?

$ echo $PYTHONPATH
:/home/randy/lib/python

$ tree -L 1 '/home/randy/lib/python' 
/home/randy/lib/python
├── gbmx_html.py
├── gbmx.py
├── __init__.py
├── __pycache__
├── scripts
└── yesno.py

$ python gbmx.py -h
python: can't open file 'gbmx.py': [Errno 2] No such file or directory

$ cd '/home/randy/lib/python'

After cd to the file directory it runs ..

$ python gbmx.py -h
usage: gbmx.py [-h] [-b]

Why can I not make any use of the PYTHONPATH?

Question 44

I think you’re a little confused. PYTHONPATH sets the search path for importing python modules, not for executing them like you’re trying.

PYTHONPATH Augment the default search path for module files. The format is the same as the shell’s PATH: one or more directory pathnames separated by os.pathsep (e.g. colons on Unix or semicolons on Windows). Non-existent directories are silently ignored.

In addition to normal directories, individual PYTHONPATH entries may refer to zipfiles containing pure Python modules (in either source or compiled form). Extension modules cannot be imported from zipfiles.

The default search path is installation dependent, but generally begins with prefix/lib/pythonversion (see PYTHONHOME above). It is always appended to PYTHONPATH.

An additional directory will be inserted in the search path in front of PYTHONPATH as described above under Interface options. The search path can be manipulated from within a Python program as the variable sys.path.

http://docs.python.org/2/using/cmdline.html#envvar-PYTHONPATH

What you’re looking for is PATH.

export PATH=$PATH:/home/randy/lib/python

However, to run your python script as a program, you also need to set a shebang for Python in the first line. Something like this should work:

#!/usr/bin/env python

And give execution privileges to it:

chmod +x /home/randy/lib/python/gbmx.py

Then you should be able to simply run gbmx.py from anywhere.

Question 45

You’re confusing PATH and PYTHONPATH. You need to do this:

export PATH=$PATH:/home/randy/lib/python

PYTHONPATH is used by the python interpreter to determine which modules to load.

PATH is used by the shell to determine which executables to run.

Question 46

PYTHONPATH only affects import statements, not the top-level Python interpreter’s lookup of python files given as arguments.

Needing PYTHONPATH to be set is not a great idea – as with anything dependent on environment variables, replicating things consistently across different machines gets tricky. Better is to use Python ‘packages’ which can be installed (using ‘pip’, or distutils) in system-dependent paths which Python already knows about.

Have a read of https://the-hitchhikers-guide-to-packaging.readthedocs.org/en/latest/ – ‘The Hitchhiker’s Guide to Packaging’, and also http://docs.python.org/3/tutorial/modules.html – which explains PYTHONPATH and packages at a lower level.

Question 47

I think you’re mixed up between PATH and PYTHONPATH. All you have to do to run a ‘script’ is have its parent directory appended to your PATH variable. You can test this by running

which myscript.py

Also, if myscript.py depends on custom modules, their parent directories must also be added to the PYTHONPATH variable. Unfortunately, because the designers of python were clearly on drugs, testing your imports in the repl with the following will not guarantee that your PYTHONPATH is set properly for use in a script. This part of python programming is magic and can’t be answered appropriately on stackoverflow.

$python
Python 2.7.8 blahblahblah
...
>from mymodule.submodule import ClassName
>test = ClassName()
>^D
$myscript_that_needs_mymodule.submodule.py
Traceback (most recent call last):
  File "myscript_that_needs_mymodule.submodule.py", line 5, in <module>
    from mymodule.submodule import ClassName
  File "/path/to/myscript_that_needs_mymodule.submodule.py", line 5, in <module>
    from mymodule.submodule import ClassName
ImportError: No module named submodule

Question 48

With PYTHONPATH set as in your example, you should be able to do

python -m gbmx

The -m option makes Python search for your module in the paths where it usually searches for modules, including what you added to PYTHONPATH. When you run the interpreter as python gbmx.py, it looks for that particular file and PYTHONPATH does not apply.
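The difference can be reproduced with a throwaway module. In the sketch below, lib_dir stands in for /home/randy/lib/python and hello_mod is a made-up module name; both are assumptions for illustration. The same interpreter is started twice from an unrelated directory, once with a file name and once with -m.

```python
import os
import subprocess
import sys
import tempfile

lib_dir = tempfile.mkdtemp()    # stands in for the directory on PYTHONPATH
work_dir = tempfile.mkdtemp()   # an unrelated current directory
with open(os.path.join(lib_dir, "hello_mod.py"), "w") as fh:
    fh.write("print('hello from hello_mod')\n")

# Environment for the child interpreters, with PYTHONPATH pointing at lib_dir.
env = dict(os.environ, PYTHONPATH=lib_dir)

# Given a file name, the interpreter only looks relative to the current directory.
by_file = subprocess.run(
    [sys.executable, "hello_mod.py"],
    cwd=work_dir, env=env, capture_output=True, text=True,
)

# With -m, the module is located through the import search path.
by_module = subprocess.run(
    [sys.executable, "-m", "hello_mod"],
    cwd=work_dir, env=env, capture_output=True, text=True,
)

print("run by file name failed:", by_file.returncode != 0)
print("run with -m printed:", by_module.stdout.strip())
```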

Question 49

I have written some code in Python which uses / to create a particular file in a folder. If I use the code on Windows it will not work. Is there a way to make the code work on both Windows and Linux?

In Python I am using this code:

pathfile=os.path.dirname(templateFile)
rootTree.write(''+pathfile+'/output/log.txt')

If I run this code on, say, a Windows machine, it will not work.

How do I use “/” (directory separator) in both Linux and Windows?

Question 50

Use os.path.join(). Example: os.path.join(pathfile,"output","log.txt").

In your code that would be: rootTree.write(os.path.join(pathfile,"output","log.txt"))

Question 51

Use:

import os
print(os.sep)

to see what the separator is on the current OS.
In your code you can use:

import os
path = os.path.join('folder_name', 'file_name')

Question 52

You can use os.sep:

>>> import os
>>> os.sep
'/'

Question 53

os.path.normpath(pathname) should also be mentioned, as it converts / path separators into \ separators on Windows. It also collapses redundant uplevel references, i.e., A/B and A/foo/../B and A/./B all become A/B. And if you are on Windows, these all become A\B.
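Both behaviours can be checked on any machine, because the standard library exposes the Unix and Windows implementations of os.path as posixpath and ntpath. A short sketch using the three example paths from this answer:

```python
import ntpath      # the Windows implementation of os.path
import posixpath   # the Unix implementation of os.path

samples = ["A/B", "A/foo/../B", "A/./B"]

# Normalize each sample with the Unix rules, then with the Windows rules.
posix_results = [posixpath.normpath(p) for p in samples]
windows_results = [ntpath.normpath(p) for p in samples]

print(posix_results)    # every entry is 'A/B'
print(windows_results)  # every entry is 'A\\B' (a single backslash)
```

os.path.normpath itself simply picks whichever of the two matches the platform the script is running on.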

Question 54

If you are fortunate enough to be running Python 3.4+, you can use pathlib:

from pathlib import Path

path = Path(dir, subdir, filename)  # returns a path of the system's path flavour

or, equivalently,

path = Path(dir) / subdir / filename
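To see what each flavour produces without switching machines, pathlib also offers PurePosixPath and PureWindowsPath, which only manipulate the path text. The component names below are made up for the example.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

parts = ("templates", "output", "log.txt")  # hypothetical path components

# Pure paths never touch the file system, so both flavours work on any OS.
posix_text = str(PurePosixPath(*parts))
windows_text = str(PureWindowsPath(parts[0]) / parts[1] / parts[2])

# Path picks the flavour of the system the script is running on.
native = Path(*parts)

print(posix_text)    # templates/output/log.txt
print(windows_text)  # templates\output\log.txt
print(native.name)   # log.txt
```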

Question 55

Some useful links that will help you:

Question 56

Do an import os and then use os.sep

Question 57

You can use os.sep:

 import os
 pathfile=os.path.dirname(templateFile)
 directory = str(pathfile)+os.sep+'output'+os.sep+'log.txt'
 rootTree.write(directory)

Question 58

Don’t build directory and file names yourself; use Python’s included libraries.

In this case the relevant one is os.path, especially join, which creates a new pathname from a directory and a file name (or another directory), and split, which separates the filename from a full path.

Your example would be

pathfile=os.path.dirname(templateFile)
p = os.path.join(pathfile, 'output')
p = os.path.join( p, 'log.txt')
rootTree.write(p)

Question titles covered above:

In a Python script, how do I set PYTHONPATH?
Resource u'tokenizers/punkt/english.pickle' not found
How do I set PYTHONPATH in an already-created virtualenv?
What is the reason for performing a double fork when creating a daemon?
How do I use a Python script from the command line without cd-ing into its directory? Is it PYTHONPATH?
How do I use “/” (directory separator) in Python on both Linux and Windows?