Why do people write the #!/usr/bin/env python shebang on the first line of a Python script?

Question: Why do people write the #!/usr/bin/env python shebang on the first line of a Python script?

It seems to me like the files run the same without that line.


Answer 0

If you have several versions of Python installed, /usr/bin/env will ensure the interpreter used is the first one on your environment’s $PATH. The alternative would be to hardcode something like #!/usr/bin/python; that’s ok, but less flexible.

In Unix, an executable file that’s meant to be interpreted can indicate what interpreter to use by having a #! at the start of the first line, followed by the interpreter (and any flags it may need).

If you’re talking about other platforms, of course, this rule does not apply (but that “shebang line” does no harm, and will help if you ever copy that script to a platform with a Unix base, such as Linux, Mac, etc).
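
To see which interpreter a lookup like this would pick, here is a minimal sketch. It assumes that shutil.which, which walks the PATH entries in order, is a fair stand-in for the search env performs; the helper name resolve_like_env is made up for illustration.

```python
import os
import shutil


def resolve_like_env(command, path=None):
    """Return the first executable called `command` on PATH, or None.

    `path` is an optional PATH-style string; by default the real PATH is used.
    """
    # shutil.which checks each PATH directory in order and returns the
    # first match, which is the lookup order env relies on as well.
    return shutil.which(command, path=path)


if __name__ == "__main__":
    for name in ("python", "python3"):
        print(name, "->", resolve_like_env(name))
```

Reordering the directories in PATH changes the result, which is exactly the flexibility (and the risk) of the env form.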


Answer 1

That is called the shebang line. As the Wikipedia entry explains:

In computing, a shebang (also called a hashbang, hashpling, pound bang, or crunchbang) refers to the characters “#!” when they are the first two characters in an interpreter directive as the first line of a text file. In a Unix-like operating system, the program loader takes the presence of these two characters as an indication that the file is a script, and tries to execute that script using the interpreter specified by the rest of the first line in the file.

See also the Unix FAQ entry.

Even on Windows, where the shebang line does not determine the interpreter to be run, you can pass options to the interpreter by specifying them on the shebang line. I find it useful to keep a generic shebang line in one-off scripts (such as the ones I write when answering questions on SO), so I can quickly test them on both Windows and ArchLinux.

The env utility allows you to invoke a command on the path:

The first remaining argument specifies the program name to invoke; it is searched for according to the PATH environment variable. Any remaining arguments are passed as arguments to that program.


Answer 2

Expanding a bit on the other answers, here’s a little example of how your command line scripts can get into trouble by incautious use of /usr/bin/env shebang lines:

$ /usr/local/bin/python -V
Python 2.6.4
$ /usr/bin/python -V
Python 2.5.1
$ cat my_script.py 
#!/usr/bin/env python
import json
print "hello, json"
$ PATH=/usr/local/bin:/usr/bin
$ ./my_script.py 
hello, json
$ PATH=/usr/bin:/usr/local/bin
$ ./my_script.py 
Traceback (most recent call last):
  File "./my_script.py", line 2, in <module>
    import json
ImportError: No module named json

The json module doesn’t exist in Python 2.5.

One way to guard against that kind of problem is to use the versioned python command names that are typically installed with most Pythons:

$ cat my_script.py 
#!/usr/bin/env python2.6
import json
print "hello, json"

If you just need to distinguish between Python 2.x and Python 3.x, recent releases of Python 3 also provide a python3 name:

$ cat my_script.py 
#!/usr/bin/env python3
import json
print("hello, json")
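
As a complement to versioned command names, a script can also check the interpreter version itself and stop with a clear message. This is only a sketch; the MIN_VERSION value and the check_version helper are hypothetical and not part of the answer above.

```python
import sys

MIN_VERSION = (3, 0)  # hypothetical minimum; adjust for your script


def check_version(current=None, minimum=MIN_VERSION):
    """Return True when `current` (default: the running interpreter) is new enough."""
    if current is None:
        current = tuple(sys.version_info[:2])
    # Tuples compare element by element, so (2, 6) >= (2, 5) is True.
    return current >= minimum


if not check_version():
    sys.exit("This script needs Python %d.%d or newer" % MIN_VERSION)
```

A check like this turns a confusing ImportError deep in the script into an explicit message at startup.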

Answer 3

In order to run the python script, we need to tell the shell three things:

  1. That the file is a script
  2. Which interpreter we want to execute the script
  3. The path of said interpreter

The shebang #! accomplishes (1.). The shebang begins with a # because the # character is a comment marker in many scripting languages. The contents of the shebang line are therefore automatically ignored by the interpreter.

The env command accomplishes (2.) and (3.). To quote “grawity,”

A common use of the env command is to launch interpreters, by making use of the fact that env will search $PATH for the command it is told to launch. Since the shebang line requires an absolute path to be specified, and since the location of various interpreters (perl, bash, python) may vary a lot, it is common to use:

#!/usr/bin/env perl  instead of trying to guess whether it is /bin/perl, /usr/bin/perl, /usr/local/bin/perl, /usr/local/pkg/perl, /fileserver/usr/bin/perl, or /home/MrDaniel/usr/bin/perl on the user’s system…

On the other hand, env is almost always in /usr/bin/env. (Except in cases when it isn’t; some systems might use /bin/env, but that’s a fairly rare occasion and only happens on non-Linux systems.)


Answer 4

Perhaps your question is in this sense:

If you want to use: $ python myscript.py

You don’t need that line at all. The system will call python, and the Python interpreter will then run your script.

But if you intend to use: $ ./myscript.py

That is, calling it directly like a normal program or bash script, then you need to write that line to tell the system which program to use to run it (and also make the file executable with chmod 755).
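
The execute permission can also be set from Python. The sketch below assumes a POSIX system and adds the execute bits for user, group and others, roughly what chmod +x does; the function name make_executable is made up for illustration.

```python
import os
import stat


def make_executable(path):
    """Add the execute bits to `path`, keeping its other permission bits."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```

After this call, a script that starts with a shebang line can be run as ./myscript.py.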


Answer 5

The exec system call of the Linux kernel understands shebangs (#!) natively

When you run this in bash:

./something

on Linux, this calls the exec system call with the path ./something.

This line of the kernel gets called on the file passed to exec: https://github.com/torvalds/linux/blob/v4.8/fs/binfmt_script.c#L25

if ((bprm->buf[0] != '#') || (bprm->buf[1] != '!'))

It reads the very first bytes of the file, and compares them to #!.

If the file does start with #!, the Linux kernel parses the rest of the line and makes another exec call, running /usr/bin/env with python and the path of the current file as its arguments:

/usr/bin/env python /path/to/script.py

and this works for any scripting language that uses # as a comment character.
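
The rewriting step can be imitated in user space. This is a simplified sketch, not the kernel's code: it assumes that everything after the interpreter path is passed on as one single argument (which is how Linux is generally understood to behave), and the names parse_shebang and build_argv are invented for the example.

```python
def parse_shebang(first_line):
    """Split a shebang line into (interpreter, optional_argument).

    Returns None when the line is not a usable shebang line.
    """
    if not first_line.startswith("#!"):
        return None
    rest = first_line[2:].strip()
    if not rest:
        return None
    # The interpreter path ends at the first whitespace; whatever follows
    # is kept together as ONE argument (assumed Linux behaviour).
    parts = rest.split(None, 1)
    interpreter = parts[0]
    argument = parts[1] if len(parts) > 1 else None
    return interpreter, argument


def build_argv(script_path, first_line):
    """Return the argument list the interpreter would roughly be started with."""
    parsed = parse_shebang(first_line)
    if parsed is None:
        return None
    interpreter, argument = parsed
    argv = [interpreter]
    if argument is not None:
        argv.append(argument)
    argv.append(script_path)
    return argv


print(build_argv("/path/to/script.py", "#!/usr/bin/env python\n"))
```

The single-argument assumption is also why putting several options after env in a shebang line tends not to work portably.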

And yes, you can make an infinite loop with:

printf '#!/a\n' | sudo tee /a
sudo chmod +x /a
/a

Bash recognizes the error:

-bash: /a: /a: bad interpreter: Too many levels of symbolic links

#! just happens to be human readable, but that is not required.

If the file started with different bytes, then the exec system call would use a different handler. The other most important built-in handler is for ELF executable files: https://github.com/torvalds/linux/blob/v4.8/fs/binfmt_elf.c#L1305 which checks for bytes 7f 45 4c 46 (which also happens to be human readable for .ELF). Let’s confirm that by reading the 4 first bytes of /bin/ls, which is an ELF executable:

head -c 4 "$(which ls)" | hd 

output:

00000000  7f 45 4c 46                                       |.ELF|
00000004                                                                 

So when the kernel sees those bytes, it takes the ELF file, puts it into memory correctly, and starts a new process with it. See also: How does kernel get an executable binary file running under linux?
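
The same magic-byte check is easy to reproduce in Python. The sketch below only looks at the two signatures mentioned in this answer; the function name detect_handler and its return labels are made up for illustration.

```python
def detect_handler(path):
    """Guess which exec handler would apply, based on the file's first bytes."""
    with open(path, "rb") as f:
        head = f.read(4)
    if head[:2] == b"#!":
        return "script"   # handled via the shebang line
    if head == b"\x7fELF":
        return "elf"      # native ELF executable
    return "unknown"
```

On a typical Linux system, calling detect_handler on the path of the ls binary should report an ELF file, while a Python script with a shebang line is reported as a script.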

Finally, you can add your own shebang handlers with the binfmt_misc mechanism. For example, you can add a custom handler for .jar files. This mechanism even supports handlers by file extension. Another application is to transparently run executables of a different architecture with QEMU.

I don’t think POSIX specifies shebangs, however: https://unix.stackexchange.com/a/346214/32558 , although it does mention them in the rationale sections, in the form “if executable scripts are supported by the system something may happen”. macOS and FreeBSD also seem to implement them.

PATH search motivation

Likely, one big motivation for the existence of shebangs is the fact that in Linux, we often want to run commands from PATH just as:

basename-of-command

instead of:

/full/path/to/basename-of-command

But then, without the shebang mechanism, how would Linux know how to launch each type of file?

Hardcoding the extension in commands:

 basename-of-command.py

or implementing PATH search on every interpreter:

python basename-of-command

would be a possibility, but this has the major problem that everything breaks if we ever decide to refactor the command into another language.

Shebangs solve this problem beautifully.


Answer 6

Technically, in Python, this is just a comment line.

This line is only used if you run the .py script from the shell (from the command line). It is known as the “shebang”, and it is used in various situations, not just with Python scripts.

Here, it instructs the shell to start a specific version of Python to take care of the rest of the file.


Answer 7

The main reason to do this is to make the script portable across operating system environments.

For example under mingw, python scripts use :

#!/c/python3k/python 

and under GNU/Linux distribution it is either:

#!/usr/local/bin/python 

or

#!/usr/bin/python

and under the best commercial Unix sw/hw system of all (OS/X), it is:

#!/Applications/MacPython 2.5/python

or on FreeBSD:

#!/usr/local/bin/python

However, all these differences can be avoided, and the script made portable across all of these systems, by using:

#!/usr/bin/env python

Answer 8

It probably makes sense to emphasize one thing that most answers have missed, which may get in the way of understanding. When you type python in a terminal you don’t normally provide a full path. Instead, the executable is looked up in the PATH environment variable. In turn, when you want to execute a Python program directly, as /path/to/app.py, you must tell the shell what interpreter to use (via the hashbang, which the other contributors explain above).

The hashbang expects a full path to an interpreter. Thus, to run your Python program directly you have to provide the full path to the Python binary, which varies significantly, especially when virtualenv is in use. To address portability, the trick with /usr/bin/env is used. The latter was originally intended to alter the environment and run a command in it. When no alteration is given, it runs the command in the current environment, which effectively results in the same PATH lookup, and that does the trick.

Source from unix stackexchange


Answer 9

This is a shell convention that tells the shell which program can execute the script.

#!/usr/bin/env python

resolves to a path to the Python binary.


Answer 10

It’s the recommended way, as proposed in the documentation:

2.2.2. Executable Python Scripts

On BSD’ish Unix systems, Python scripts can be made directly executable, like shell scripts, by putting the line

#! /usr/bin/env python3.2

from http://docs.python.org/py3k/tutorial/interpreter.html#executable-python-scripts


Answer 11

You can see this behaviour for yourself using virtualenv.

Here is test.py

#! /usr/bin/env python
import sys
print(sys.version)

Create virtual environments

virtualenv test2.6 -p /usr/bin/python2.6
virtualenv test2.7 -p /usr/bin/python2.7

Activate each environment in turn, then check the differences:

echo $PATH
./test.py
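
To make the difference between environments easier to read, test.py could report a little more than sys.version. A minimal sketch follows; the interpreter_info helper is hypothetical, and the virtual-environment check assumes either the standard venv behaviour (sys.base_prefix differs from sys.prefix) or the older virtualenv behaviour (sys.real_prefix is set).

```python
import sys


def interpreter_info():
    """Collect details that show which interpreter env actually selected."""
    base = getattr(sys, "base_prefix", sys.prefix)
    return {
        "executable": sys.executable,
        "version": "%d.%d" % sys.version_info[:2],
        # Older virtualenv releases set sys.real_prefix; venv and newer
        # virtualenv releases make sys.prefix differ from sys.base_prefix.
        "in_virtual_env": hasattr(sys, "real_prefix") or sys.prefix != base,
    }


for key, value in sorted(interpreter_info().items()):
    print(key, "=", value)
```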

Answer 12

It just specifies what interpreter you want to use. To understand this, create a file through terminal by doing touch test.py, then type into that file the following:

#!/usr/bin/env python3
print "test"

and do chmod +x test.py to make your script executable. After this when you do ./test.py you should get an error saying:

  File "./test.py", line 2
    print "test"
               ^
SyntaxError: Missing parentheses in call to 'print'

because python3 doesn’t support print as a statement.

Now go ahead and change the first line of your code to:

#!/usr/bin/env python2

and it’ll work, printing test to stdout, because python2 supports the print statement. So now you’ve seen how to switch between script interpreters.
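
If a script has to run under whichever interpreter env happens to find, one option is to stick to syntax that both major versions accept. The sketch below relies on the fact that print with a single parenthesised argument is valid in Python 2 and Python 3 alike; the describe_interpreter helper is made up for illustration.

```python
import sys


def describe_interpreter():
    """Return a short label such as "python3" for the running interpreter."""
    return "python%d" % sys.version_info[0]


# One parenthesised argument parses as a statement in Python 2
# and as a function call in Python 3.
print("test from " + describe_interpreter())
```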


Answer 13

It seems to me like the files run the same without that line.

If so, then perhaps you’re running the Python program on Windows? Windows doesn’t use that line—instead, it uses the file-name extension to run the program associated with the file extension.

However in 2011, a “Python launcher” was developed which (to some degree) mimics this Linux behaviour for Windows. This is limited just to choosing which Python interpreter is run — e.g. to select between Python 2 and Python 3 on a system where both are installed. The launcher is optionally installed as py.exe by Python installation, and can be associated with .py files so that the launcher will check that line and in turn launch the specified Python interpreter version.


Answer 14

This is meant more as historical information than a “real” answer.

Remember that back in the day you had LOTS of Unix-like operating systems whose designers all had their own notion of where to put stuff, and sometimes didn’t include Python, Perl, Bash, or lots of other GNU/Open Source stuff at all.

This was even true of different Linux distributions. On Linux, pre-FHS[1], you might have python in /usr/bin/ or /usr/local/bin/. Or it might not have been installed, so you built your own and put it in ~/bin.

Solaris was the worst I ever worked on, partly because of the transition from Berkeley Unix to System V. You could wind up with stuff in /usr/, /usr/local/, /usr/ucb, /opt/ etc. This could make for some really long paths. I have memories of the stuff from Sunfreeware.com installing each package in its own directory, but I can’t recall if it symlinked the binaries into /usr/bin or not.

Oh, and sometimes /usr/bin was on an NFS server[2].

So the env utility was developed to work around this.

Then you could write #!/bin/env interpreter and, as long as the path was proper, things had a reasonable chance of running. Of course, reasonable meant (for Python and Perl) that you had also set the appropriate environment variables. For bash/ksh/zsh it just worked.

This was important because people were passing around scripts (shell scripts, but also Perl and Python) and if you’d hard-coded /usr/bin/python on your Red Hat Linux workstation it was going to break badly on an SGI… well, no, I think IRIX put python in the right spot. But on a Sparc station it might not run at all.

I miss my sparc station. But not a lot. Ok, now you’ve got me trolling around on E-Bay. Bastages.

[1] File-system Hierarchy Standard. https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard

[2] Yes, and sometimes people still do stuff like that. And no, I did not wear either a turnip OR an onion on my belt.


Answer 15

If you’re running your script in a virtual environment, say venv, then executing which python while working on venv will display the path to the Python interpreter:

~/Envs/venv/bin/python

Note that the name of the virtual environment is embedded in the path to the Python interpreter. Therefore, hardcoding this path in your script will cause two problems:

  • If you upload the script to a repository, you’re forcing other users to have the same virtual environment name, and that is only if they identify the problem in the first place.
  • You won’t be able to run the script across multiple virtual environments even if you had all required packages in other virtual environments.

Therefore, to add to Jonathan‘s answer, the ideal shebang is #!/usr/bin/env python, not just for portability across OSes but for portability across virtual environments as well!


Answer 16

Considering the portability issues between python2 and python3, you should always specify either version unless your program is compatible with both.

Some distributions have been shipping python symlinked to python3 for a while now, so do not rely on python being python2.

This is emphasized by PEP 394:

In order to tolerate differences across platforms, all new code that needs to invoke the Python interpreter should not specify python, but rather should specify either python2 or python3 (or the more specific python2.x and python3.x versions; see the Migration Notes). This distinction should be made in shebangs, when invoking from a shell script, when invoking via the system() call, or when invoking in any other context.


Answer 17

It tells the system which version of Python to run the program with when you have multiple versions of Python installed.


Answer 18

It allows you to select the executable that you wish to use, which is very handy if you have multiple Python installs with different modules in each and want to choose between them. For example:

#!/bin/sh
#
# Choose the python we need. Explanation:
# a) '''\' translates to \ in shell, and starts a python multi-line string
# b) "" strings are treated as string concat by python, shell ignores them
# c) "true" command ignores its arguments
# d) exit before the ending ''' so the shell reads no further
# e) reset __doc__ so the docstring ignores the multiline comment code
#
"true" '''\'
PREFERRED_PYTHON=/Library/Frameworks/Python.framework/Versions/2.7/bin/python
ALTERNATIVE_PYTHON=/Library/Frameworks/Python.framework/Versions/3.6/bin/python3
FALLBACK_PYTHON=python3

if [ -x $PREFERRED_PYTHON ]; then
    echo Using preferred python $PREFERRED_PYTHON
    exec $PREFERRED_PYTHON "$0" "$@"
elif [ -x $ALTERNATIVE_PYTHON ]; then
    echo Using alternative python $ALTERNATIVE_PYTHON
    exec $ALTERNATIVE_PYTHON "$0" "$@"
else
    echo Using fallback python $FALLBACK_PYTHON
    exec $FALLBACK_PYTHON "$0" "$@"
fi
exit 127
'''

__doc__ = """What this file does"""
print(__doc__)
import platform
print(platform.python_version())

Answer 19

This tells the system where to find the Python interpreter:

#! /usr/bin/env python

Find all files in a directory with extension .txt in Python

Question: Find all files in a directory with extension .txt in Python

How can I find all the files in a directory having the extension .txt in python?


Answer 0

You can use glob:

import glob, os
os.chdir("/mydir")
for file in glob.glob("*.txt"):
    print(file)

or simply os.listdir:

import os
for file in os.listdir("/mydir"):
    if file.endswith(".txt"):
        print(os.path.join("/mydir", file))

or if you want to traverse directory, use os.walk:

import os
for root, dirs, files in os.walk("/mydir"):
    for file in files:
        if file.endswith(".txt"):
            print(os.path.join(root, file))
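
The os.listdir and os.walk variants can be folded into one reusable helper. This is a sketch rather than a canonical recipe; the function name find_by_extension and its parameters are made up for illustration.

```python
import os


def find_by_extension(directory, extension=".txt", recursive=False):
    """Return sorted paths under `directory` whose names end with `extension`."""
    matches = []
    for root, dirs, files in os.walk(directory):
        for name in files:
            if name.endswith(extension):
                matches.append(os.path.join(root, name))
        if not recursive:
            # Emptying dirs in place stops os.walk from descending further.
            dirs[:] = []
    return sorted(matches)
```

Calling find_by_extension("/mydir") lists only the top level, while recursive=True walks the whole tree.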

Answer 1

Use glob.

>>> import glob
>>> glob.glob('./*.txt')
['./outline.txt', './pip-log.txt', './test.txt', './testingvim.txt']

Answer 2

Something like this should do the job:

import os

directory = "."  # placeholder: the directory to search
for root, dirs, files in os.walk(directory):
    for file in files:
        if file.endswith('.txt'):
            print(file)

Answer 3

Something like this will work:

>>> import os
>>> path = '/usr/share/cups/charmaps'
>>> text_files = [f for f in os.listdir(path) if f.endswith('.txt')]
>>> text_files
['euc-cn.txt', 'euc-jp.txt', 'euc-kr.txt', 'euc-tw.txt', ... 'windows-950.txt']

Answer 4

You can simply use pathlib's glob [1]:

import pathlib

list(pathlib.Path('your_directory').glob('*.txt'))

or in a loop:

for txt_file in pathlib.Path('your_directory').glob('*.txt'):
    print(txt_file)  # do something with "txt_file"

If you want it to be recursive you can use .glob('**/*.txt').


[1] The pathlib module was added to the standard library in Python 3.4. You can install back-ports of the module even on older Python versions (e.g. using conda or pip): pathlib and pathlib2.
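
A pattern such as *.txt does not match TEST.TXT on case-sensitive file systems. If that matters, one option is to compare the suffix yourself. The following is a minimal sketch; the helper name txt_files is made up, and it uses rglob("*") for the recursive case.

```python
from pathlib import Path


def txt_files(directory, recursive=False):
    """Yield Path objects for regular files whose suffix is .txt, ignoring case."""
    base = Path(directory)
    candidates = base.rglob("*") if recursive else base.iterdir()
    for entry in candidates:
        if entry.is_file() and entry.suffix.lower() == ".txt":
            yield entry
```

Because this is a generator, wrap the call in list() or sorted() if you need all results at once.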


Answer 5

import os

path = 'mypath/path' 
files = os.listdir(path)

files_txt = [i for i in files if i.endswith('.txt')]

Answer 6

I like os.walk():

import os

directory = '.'  # placeholder: the directory to search (avoids passing the built-in dir)
for root, dirs, files in os.walk(directory):
    for f in files:
        if os.path.splitext(f)[1] == '.txt':
            fullpath = os.path.join(root, f)
            print(fullpath)

Or with generators:

import os

directory = '.'  # placeholder: the directory to search
fileiter = (os.path.join(root, f)
    for root, _, files in os.walk(directory)
    for f in files)
txtfileiter = (f for f in fileiter if os.path.splitext(f)[1] == '.txt')
for txt in txtfileiter:
    print(txt)

Answer 7

Here are some more variations on the same idea that produce slightly different results:

glob.iglob()

import glob
for f in glob.iglob("/mydir/*/*.txt"): # generator, search immediate subdirectories
    print(f)

glob.glob1()

print(glob.glob1("/mydir", "*.tx?"))  # undocumented helper: literal_directory, basename_pattern

fnmatch.filter()

import fnmatch, os
print(fnmatch.filter(os.listdir("/mydir"), "*.tx?")) # include dot-files
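
One of the "slightly different results" is the treatment of dot-files: glob skips names starting with a dot unless the pattern itself starts with one, whereas fnmatch.filter applied to os.listdir does not. The sketch below puts the two side by side; the helper name compare_matchers is made up for illustration.

```python
import fnmatch
import glob
import os


def compare_matchers(directory, pattern="*.txt"):
    """Return (glob_names, fnmatch_names) for `pattern` inside `directory`."""
    # glob.escape protects any special characters in the directory name.
    full_pattern = os.path.join(glob.escape(directory), pattern)
    glob_names = sorted(os.path.basename(p) for p in glob.glob(full_pattern))
    fnmatch_names = sorted(fnmatch.filter(os.listdir(directory), pattern))
    return glob_names, fnmatch_names
```

In a directory holding notes.txt and .hidden.txt, only the second list contains the hidden file.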

Answer 8

path.py is another alternative: https://github.com/jaraco/path.py

from path import path
p = path('/path/to/the/directory')
for f in p.files(pattern='*.txt'):
    print(f)

Answer 9

Python v3.5+

A fast method using os.scandir in a recursive function. It searches a folder and its sub-folders for all files with a specified extension.

import os

def findFilesInFolder(path, pathList, extension, subFolders = True):
    """  Recursive function to find all files of an extension type in a folder (and optionally in all subfolders too)

    path:        Base directory to find files
    pathList:    A list that stores all paths
    extension:   File extension to find
    subFolders:  Bool.  If True, find files in all subfolders under path. If False, only searches files in the specified folder
    """

    try:   # Trapping a OSError:  File permissions problem I believe
        for entry in os.scandir(path):
            if entry.is_file() and entry.path.endswith(extension):
                pathList.append(entry.path)
            elif entry.is_dir() and subFolders:   # if its a directory, then repeat process as a nested function
                pathList = findFilesInFolder(entry.path, pathList, extension, subFolders)
    except OSError:
        print('Cannot access ' + path +'. Probably a permissions error')

    return pathList

dir_name = r'J:\myDirectory'
extension = ".txt"

pathList = []
pathList = findFilesInFolder(dir_name, pathList, extension, True)

Update April 2019

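If you are searching a directory that contains tens of thousands of files, appending to a list becomes inefficient. Yielding the results is a better solution. I have also included a function to convert the output to a Pandas DataFrame.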

import os
import re
import pandas as pd
import numpy as np


def findFilesInFolderYield(path,  extension, containsTxt='', subFolders = True, excludeText = ''):
    """  Recursive function to find all files of an extension type in a folder (and optionally in all subfolders too)

    path:               Base directory to find files
    extension:          File extension to find.  e.g. 'txt'.  Regular expression. Or  'ls\d' to match ls1, ls2, ls3 etc
    containsTxt:        List of Strings, only finds file if it contains this text.  Ignore if '' (or blank)
    subFolders:         Bool.  If True, find files in all subfolders under path. If False, only searches files in the specified folder
    excludeText:        Text string.  Ignore if ''. Will exclude if text string is in path.
    """
    if type(containsTxt) == str: # if a string and not in a list
        containsTxt = [containsTxt]

    myregexobj = re.compile(r'\.' + extension + '$')    # Makes sure the file extension is at the end and is preceded by a .

    try:   # Trapping OSError (e.g. a permissions problem) and FileNotFoundError
        for entry in os.scandir(path):
            if entry.is_file() and myregexobj.search(entry.path):

                bools = [True for txt in containsTxt if txt in entry.path and (excludeText == '' or excludeText not in entry.path)]

                if len(bools)== len(containsTxt):
                    yield entry.stat().st_size, entry.stat().st_atime_ns, entry.stat().st_mtime_ns, entry.stat().st_ctime_ns, entry.path

            elif entry.is_dir() and subFolders:   # if it's a directory, repeat the process recursively
                yield from findFilesInFolderYield(entry.path,  extension, containsTxt, subFolders, excludeText)
    except FileNotFoundError as fnf:   # checked first: FileNotFoundError is a subclass of OSError
        print(path +' not found ', fnf)
    except OSError as ose:
        print('Cannot access ' + path +'. Probably a permissions error ', ose)

def findFilesInFolderYieldandGetDf(path,  extension, containsTxt, subFolders = True, excludeText = ''):
    """  Converts returned data from findFilesInFolderYield and creates and Pandas Dataframe.
    Recursive function to find all files of an extension type in a folder (and optionally in all subfolders too)

    path:               Base directory to find files
    extension:          File extension to find.  e.g. 'txt'.  Regular expression. Or  'ls\d' to match ls1, ls2, ls3 etc
    containsTxt:        List of Strings, only finds file if it contains this text.  Ignore if '' (or blank)
    subFolders:         Bool.  If True, find files in all subfolders under path. If False, only searches files in the specified folder
    excludeText:        Text string.  Ignore if ''. Will exclude if text string is in path.
    """

    fileSizes, accessTimes, modificationTimes, creationTimes , paths  = zip(*findFilesInFolderYield(path,  extension, containsTxt, subFolders, excludeText))
    df = pd.DataFrame({
            'FLS_File_Size':fileSizes,
            'FLS_File_Access_Date':accessTimes,
            'FLS_File_Modification_Date':np.array(modificationTimes).astype('timedelta64[ns]'),
            'FLS_File_Creation_Date':creationTimes,
            'FLS_File_PathName':paths,
                  })

    df['FLS_File_Modification_Date'] = pd.to_datetime(df['FLS_File_Modification_Date'],infer_datetime_format=True)
    df['FLS_File_Creation_Date'] = pd.to_datetime(df['FLS_File_Creation_Date'],infer_datetime_format=True)
    df['FLS_File_Access_Date'] = pd.to_datetime(df['FLS_File_Access_Date'],infer_datetime_format=True)

    return df

ext =   'txt'  # regular expression 
containsTxt=[]
path = r'C:\myFolder'  # raw string so the backslash is not treated as an escape
df = findFilesInFolderYieldandGetDf(path,  ext, containsTxt, subFolders = True)

Python v3.5+

Fast method using os.scandir in a recursive function. Searches for all files with a specified extension in folder and sub-folders.

import os

def findFilesInFolder(path, pathList, extension, subFolders = True):
    """  Recursive function to find all files of an extension type in a folder (and optionally in all subfolders too)

    path:        Base directory to find files
    pathList:    A list that stores all paths
    extension:   File extension to find
    subFolders:  Bool.  If True, find files in all subfolders under path. If False, only searches files in the specified folder
    """

    try:   # Trapping a OSError:  File permissions problem I believe
        for entry in os.scandir(path):
            if entry.is_file() and entry.path.endswith(extension):
                pathList.append(entry.path)
            elif entry.is_dir() and subFolders:   # if its a directory, then repeat process as a nested function
                pathList = findFilesInFolder(entry.path, pathList, extension, subFolders)
    except OSError:
        print('Cannot access ' + path +'. Probably a permissions error')

    return pathList

dir_name = r'J:\myDirectory'
extension = ".txt"

pathList = []
pathList = findFilesInFolder(dir_name, pathList, extension, True)

Update April 2019

If you are searching directories that contain tens of thousands of files, appending to a list becomes inefficient. Yielding the results is a better solution. I have also included a function that converts the output to a Pandas DataFrame.

import os
import re
import pandas as pd
import numpy as np


def findFilesInFolderYield(path,  extension, containsTxt='', subFolders = True, excludeText = ''):
    """  Recursive function to find all files of an extension type in a folder (and optionally in all subfolders too)

    path:               Base directory to find files
    extension:          File extension to find.  e.g. 'txt'.  Regular expression. Or  'ls\d' to match ls1, ls2, ls3 etc
    containsTxt:        List of Strings, only finds file if it contains this text.  Ignore if '' (or blank)
    subFolders:         Bool.  If True, find files in all subfolders under path. If False, only searches files in the specified folder
    excludeText:        Text string.  Ignore if ''. Will exclude if text string is in path.
    """
    if type(containsTxt) == str: # if a string and not in a list
        containsTxt = [containsTxt]

    myregexobj = re.compile(r'\.' + extension + '$')    # Makes sure the file extension is at the end and is preceded by a .

    try:   # Trapping OSError (e.g. a permissions problem) and FileNotFoundError
        for entry in os.scandir(path):
            if entry.is_file() and myregexobj.search(entry.path):

                bools = [True for txt in containsTxt if txt in entry.path and (excludeText == '' or excludeText not in entry.path)]

                if len(bools)== len(containsTxt):
                    yield entry.stat().st_size, entry.stat().st_atime_ns, entry.stat().st_mtime_ns, entry.stat().st_ctime_ns, entry.path

            elif entry.is_dir() and subFolders:   # if it's a directory, repeat the process recursively
                yield from findFilesInFolderYield(entry.path,  extension, containsTxt, subFolders, excludeText)
    except FileNotFoundError as fnf:   # checked first: FileNotFoundError is a subclass of OSError
        print(path +' not found ', fnf)
    except OSError as ose:
        print('Cannot access ' + path +'. Probably a permissions error ', ose)

def findFilesInFolderYieldandGetDf(path,  extension, containsTxt, subFolders = True, excludeText = ''):
    """  Converts returned data from findFilesInFolderYield and creates and Pandas Dataframe.
    Recursive function to find all files of an extension type in a folder (and optionally in all subfolders too)

    path:               Base directory to find files
    extension:          File extension to find.  e.g. 'txt'.  Regular expression. Or  'ls\d' to match ls1, ls2, ls3 etc
    containsTxt:        List of Strings, only finds file if it contains this text.  Ignore if '' (or blank)
    subFolders:         Bool.  If True, find files in all subfolders under path. If False, only searches files in the specified folder
    excludeText:        Text string.  Ignore if ''. Will exclude if text string is in path.
    """

    fileSizes, accessTimes, modificationTimes, creationTimes , paths  = zip(*findFilesInFolderYield(path,  extension, containsTxt, subFolders, excludeText))
    df = pd.DataFrame({
            'FLS_File_Size':fileSizes,
            'FLS_File_Access_Date':accessTimes,
            'FLS_File_Modification_Date':np.array(modificationTimes).astype('timedelta64[ns]'),
            'FLS_File_Creation_Date':creationTimes,
            'FLS_File_PathName':paths,
                  })

    df['FLS_File_Modification_Date'] = pd.to_datetime(df['FLS_File_Modification_Date'],infer_datetime_format=True)
    df['FLS_File_Creation_Date'] = pd.to_datetime(df['FLS_File_Creation_Date'],infer_datetime_format=True)
    df['FLS_File_Access_Date'] = pd.to_datetime(df['FLS_File_Access_Date'],infer_datetime_format=True)

    return df

ext =   'txt'  # regular expression 
containsTxt=[]
path = r'C:\myFolder'  # raw string so the backslash is not treated as an escape
df = findFilesInFolderYieldandGetDf(path,  ext, containsTxt, subFolders = True)
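
The "yield instead of append" idea above can be reduced to a much smaller sketch. The function name `iter_files_with_extension` and its `recursive` parameter are illustrative assumptions, not part of the original answer; it matches a plain suffix rather than a regular expression and leaves out the pandas step.

```python
import os


def iter_files_with_extension(path, extension, recursive=True):
    """Lazily yield paths under `path` whose file names end with `extension`.

    Hypothetical helper for illustration; matching is case-sensitive.
    """
    try:
        for entry in os.scandir(path):
            if entry.is_file() and entry.name.endswith(extension):
                yield entry.path
            elif recursive and entry.is_dir(follow_symlinks=False):
                # Delegate to the same generator for each subdirectory.
                yield from iter_files_with_extension(entry.path, extension, recursive)
    except OSError as exc:
        # Covers permission problems and missing directories alike.
        print('Cannot access ' + path + ':', exc)
```

Because nothing is collected up front, a caller can stop early (for example after the first match) without scanning the rest of the tree.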

回答 10

Python具有执行此操作的所有工具:

import os

the_dir = 'the_dir_that_want_to_search_in'
all_txt_files = filter(lambda x: x.endswith('.txt'), os.listdir(the_dir))

Python has all the tools to do this:

import os

the_dir = 'the_dir_that_want_to_search_in'
all_txt_files = filter(lambda x: x.endswith('.txt'), os.listdir(the_dir))

回答 11

要以Python方式将“ dataPath”文件夹中的所有“ .txt”文件名作为列表获取:

from os import listdir
from os.path import isfile, join
path = "/dataPath/"
onlyTxtFiles = [f for f in listdir(path) if isfile(join(path, f)) and  f.endswith(".txt")]
print onlyTxtFiles

To get all ‘.txt’ file names inside ‘dataPath’ folder as a list in a Pythonic way:

from os import listdir
from os.path import isfile, join
path = "/dataPath/"
onlyTxtFiles = [f for f in listdir(path) if isfile(join(path, f)) and  f.endswith(".txt")]
print onlyTxtFiles

回答 12

试试这个,这将递归找到所有文件:

import glob, os
os.chdir("H:\\wallpaper")# use whatever directory you want

#double\\ no single \

for file in glob.glob("**/*.txt", recursive = True):
    print(file)

Try this; it will find all your files recursively:

import glob, os
os.chdir("H:\\wallpaper")  # use whatever directory you want

# use double backslashes (\\) in the path, not a single \

for file in glob.glob("**/*.txt", recursive = True):
    print(file)

回答 13

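import os
import sys

# Two arguments are expected: the directory and the suffix to match
if len(sys.argv) != 3:
    print('no params')
    sys.exit(1)

directory = sys.argv[1]
mask = sys.argv[2]

files = os.listdir(directory)

# A list comprehension prints as a list on both Python 2 and Python 3
res = [name for name in files if name.endswith(mask)]

print(res)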

回答 14

我做了一个测试(Python 3.6.4,W7x64),看哪个解决方案对于一个文件夹(没有子目录)最快,以获得具有特定扩展名的文件的完整文件路径列表。

要长话短说,这个任务os.listdir()是最快的是1.7倍的速度作为下一个最好的:os.walk()(!休息后),2.7倍一样快pathlib,3.2倍的速度比os.scandir()和3.3倍的速度比glob
请记住,当您需要递归结果时,这些结果将改变。如果您复制/粘贴以下一种方法,请添加.lower(),否则在搜索.ext时找不到.EXT。

import os
import pathlib
import timeit
import glob

def a():
    path = pathlib.Path().cwd()
    list_sqlite_files = [str(f) for f in path.glob("*.sqlite")]

def b(): 
    path = os.getcwd()
    list_sqlite_files = [f.path for f in os.scandir(path) if os.path.splitext(f)[1] == ".sqlite"]

def c():
    path = os.getcwd()
    list_sqlite_files = [os.path.join(path, f) for f in os.listdir(path) if f.endswith(".sqlite")]

def d():
    path = os.getcwd()
    os.chdir(path)
    list_sqlite_files = [os.path.join(path, f) for f in glob.glob("*.sqlite")]

def e():
    path = os.getcwd()
    list_sqlite_files = [os.path.join(path, f) for f in glob.glob1(str(path), "*.sqlite")]

def f():
    path = os.getcwd()
    list_sqlite_files = []
    for root, dirs, files in os.walk(path):
        for file in files:
            if file.endswith(".sqlite"):
                list_sqlite_files.append( os.path.join(root, file) )
        break



print(timeit.timeit(a, number=1000))
print(timeit.timeit(b, number=1000))
print(timeit.timeit(c, number=1000))
print(timeit.timeit(d, number=1000))
print(timeit.timeit(e, number=1000))
print(timeit.timeit(f, number=1000))

结果:

# Python 3.6.4
0.431
0.515
0.161
0.548
0.537
0.274

I did a test (Python 3.6.4, W7x64) to see which solution is the fastest for one folder, no subdirectories, to get a list of complete file paths for files with a specific extension.

In short, for this task os.listdir() is the fastest: it is 1.7x as fast as the next best, os.walk() (with a break!), 2.7x as fast as pathlib, 3.2x as fast as os.scandir() and 3.3x as fast as glob.
Please keep in mind that these results will change when you need recursive results. If you copy/paste one of the methods below, add a .lower() call; otherwise .EXT will not be found when searching for .ext.

import os
import pathlib
import timeit
import glob

def a():
    path = pathlib.Path().cwd()
    list_sqlite_files = [str(f) for f in path.glob("*.sqlite")]

def b(): 
    path = os.getcwd()
    list_sqlite_files = [f.path for f in os.scandir(path) if os.path.splitext(f)[1] == ".sqlite"]

def c():
    path = os.getcwd()
    list_sqlite_files = [os.path.join(path, f) for f in os.listdir(path) if f.endswith(".sqlite")]

def d():
    path = os.getcwd()
    os.chdir(path)
    list_sqlite_files = [os.path.join(path, f) for f in glob.glob("*.sqlite")]

def e():
    path = os.getcwd()
    list_sqlite_files = [os.path.join(path, f) for f in glob.glob1(str(path), "*.sqlite")]

def f():
    path = os.getcwd()
    list_sqlite_files = []
    for root, dirs, files in os.walk(path):
        for file in files:
            if file.endswith(".sqlite"):
                list_sqlite_files.append( os.path.join(root, file) )
        break



print(timeit.timeit(a, number=1000))
print(timeit.timeit(b, number=1000))
print(timeit.timeit(c, number=1000))
print(timeit.timeit(d, number=1000))
print(timeit.timeit(e, number=1000))
print(timeit.timeit(f, number=1000))

Results:

# Python 3.6.4
0.431
0.515
0.161
0.548
0.537
0.274
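
The `.lower()` advice above could look roughly like this when applied to the `os.listdir()` variant. The function name `list_files_ignoring_case` is an assumption made for this sketch; like method `c()` above, it does not check whether an entry is a file or a directory.

```python
import os


def list_files_ignoring_case(folder, extension):
    """Return full paths of entries in `folder` whose name ends with
    `extension`, comparing in lower case so .EXT also matches .ext."""
    wanted = extension.lower()
    return [
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.lower().endswith(wanted)
    ]
```

Lowering both sides of the comparison keeps the original file names intact in the result.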

回答 15

此代码使我的生活更简单。

import os
fnames = ([file for root, dirs, files in os.walk('.')  # '.' is a placeholder for the directory to search
    for file in files
    if file.endswith('.txt') #or file.endswith('.png') or file.endswith('.pdf')
    ])
for fname in fnames: print(fname)

This code makes my life simpler.

import os
fnames = ([file for root, dirs, files in os.walk('.')  # '.' is a placeholder for the directory to search
    for file in files
    if file.endswith('.txt') #or file.endswith('.png') or file.endswith('.pdf')
    ])
for fname in fnames: print(fname)

回答 16

使用fnmatch:https : //docs.python.org/2/library/fnmatch.html

import fnmatch
import os

for file in os.listdir('.'):
    if fnmatch.fnmatch(file, '*.txt'):
        print file

Use fnmatch: https://docs.python.org/2/library/fnmatch.html

import fnmatch
import os

for file in os.listdir('.'):
    if fnmatch.fnmatch(file, '*.txt'):
        print file

回答 17

为了从同一目录中名为“ data”的文件夹中获取“ .txt”文件名的数组,我通常使用以下简单代码行:

import os
fileNames = [fileName for fileName in os.listdir("data") if fileName.endswith(".txt")]

To get an array of “.txt” file names from a folder called “data” in the same directory I usually use this simple line of code:

import os
fileNames = [fileName for fileName in os.listdir("data") if fileName.endswith(".txt")]

回答 18

我建议您使用fnmatch和upper方法。这样,您可以找到以下任意一项:

  1. Name.txt;
  2. Name.TXT;
  3. Name.Txt

import fnmatch
import os

for file in os.listdir("/Users/Johnny/Desktop/MyTXTfolder"):
    if fnmatch.fnmatch(file.upper(), '*.TXT'):
        print(file)

I suggest using fnmatch together with the upper method. This way you can find any of the following:

  1. Name.txt
  2. Name.TXT
  3. Name.Txt

import fnmatch
import os

for file in os.listdir("/Users/Johnny/Desktop/MyTXTfolder"):
    if fnmatch.fnmatch(file.upper(), '*.TXT'):
        print(file)

回答 19

这是一个 extend()

import glob
import os

path = '.'  # directory to search; '.' is a placeholder
types = ('*.jpg', '*.png')
images_list = []
for pattern in types:
    images_list.extend(glob.glob(os.path.join(path, pattern)))

Here’s one with extend()

import glob
import os

path = '.'  # directory to search; '.' is a placeholder
types = ('*.jpg', '*.png')
images_list = []
for pattern in types:
    images_list.extend(glob.glob(os.path.join(path, pattern)))

回答 20

带有子目录的功能解决方案:

from fnmatch import filter
from functools import partial
from itertools import chain
from os import path, walk

print(*chain(*(map(partial(path.join, root), filter(filenames, "*.txt")) for root, _, filenames in walk("mydir"))))

Functional solution with sub-directories:

from fnmatch import filter
from functools import partial
from itertools import chain
from os import path, walk

print(*chain(*(map(partial(path.join, root), filter(filenames, "*.txt")) for root, _, filenames in walk("mydir"))))

回答 21

如果文件夹包含很多文件或内存是一个限制,请考虑使用生成器:

import os

def yield_files_with_extensions(folder_path, file_extension):
   for _, _, files in os.walk(folder_path):
       for file in files:
           if file.endswith(file_extension):
               yield file

选项A:重复

for f in yield_files_with_extensions('.', '.txt'): 
    print(f)

选项B:全部获取

files = [f for f in yield_files_with_extensions('.', '.txt')]

If the folder contains a lot of files or memory is a constraint, consider using generators:

import os

def yield_files_with_extensions(folder_path, file_extension):
   for _, _, files in os.walk(folder_path):
       for file in files:
           if file.endswith(file_extension):
               yield file

Option A: Iterate

for f in yield_files_with_extensions('.', '.txt'): 
    print(f)

Option B: Get all

files = [f for f in yield_files_with_extensions('.', '.txt')]
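
One thing to be aware of: the generator above yields bare file names, so a match found in a subfolder loses its directory. A possible variation, sketched here under the assumed name `yield_paths_with_extension`, joins each name with the folder that `os.walk` reports for it.

```python
import os


def yield_paths_with_extension(folder_path, file_extension):
    """Yield the path of every matching file below folder_path, one at a time."""
    for root, _, files in os.walk(folder_path):
        for name in files:
            if name.endswith(file_extension):
                # root is the directory currently being walked
                yield os.path.join(root, name)
```

It is used the same way as the original: iterate over it directly, or wrap it in `list(...)` when all results are needed at once.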

回答 22

可复制的解决方案,类似于ghostdog之一:

def get_all_filepaths(root_path, ext):
    """
    Search all files which have a given extension within root_path.

    This ignores the case of the extension and searches subdirectories, too.

    Parameters
    ----------
    root_path : str
    ext : str

    Returns
    -------
    list of str

    Examples
    --------
    >>> get_all_filepaths('/run', '.lock')
    ['/run/unattended-upgrades.lock',
     '/run/mlocate.daily.lock',
     '/run/xtables.lock',
     '/run/mysqld/mysqld.sock.lock',
     '/run/postgresql/.s.PGSQL.5432.lock',
     '/run/network/.ifstate.lock',
     '/run/lock/asound.state.lock']
    """
    import os
    all_files = []
    for root, dirs, files in os.walk(root_path):
        for filename in files:
            if filename.lower().endswith(ext):
                all_files.append(os.path.join(root, filename))
    return all_files

A copy-pastable solution similar to the one of ghostdog:

def get_all_filepaths(root_path, ext):
    """
    Search all files which have a given extension within root_path.

    This ignores the case of the extension and searches subdirectories, too.

    Parameters
    ----------
    root_path : str
    ext : str

    Returns
    -------
    list of str

    Examples
    --------
    >>> get_all_filepaths('/run', '.lock')
    ['/run/unattended-upgrades.lock',
     '/run/mlocate.daily.lock',
     '/run/xtables.lock',
     '/run/mysqld/mysqld.sock.lock',
     '/run/postgresql/.s.PGSQL.5432.lock',
     '/run/network/.ifstate.lock',
     '/run/lock/asound.state.lock']
    """
    import os
    all_files = []
    for root, dirs, files in os.walk(root_path):
        for filename in files:
            if filename.lower().endswith(ext):
                all_files.append(os.path.join(root, filename))
    return all_files

回答 23

使用Python OS模块查找具有特定扩展名的文件。

简单的例子在这里:

import os

# This is the path where you want to search
path = r'd:'  

# this is extension you want to detect
extension = '.txt'   # this can be : .jpg  .png  .xls  .log .....

for root, dirs_list, files_list in os.walk(path):
    for file_name in files_list:
        if os.path.splitext(file_name)[-1] == extension:
            file_name_path = os.path.join(root, file_name)
            print file_name
            print file_name_path   # This is the full path of the filter file

Use Python's os module to find files with a specific extension.

Here is a simple example:

import os

# This is the path where you want to search
path = r'd:'  

# this is extension you want to detect
extension = '.txt'   # this can be : .jpg  .png  .xls  .log .....

for root, dirs_list, files_list in os.walk(path):
    for file_name in files_list:
        if os.path.splitext(file_name)[-1] == extension:
            file_name_path = os.path.join(root, file_name)
            print file_name
            print file_name_path   # This is the full path of the filter file

回答 24

许多用户回答了os.walk答案,其中包括所有文件,还包括所有目录和子目录及其文件。

import os


def files_in_dir(path, extension=''):
    """
       Generator: yields all of the files in <path> ending with
       <extension>

       \param   path       Absolute or relative path to inspect,
       \param   extension  [optional] Only yield files matching this,

       \yield              [filenames]
    """


    for _, dirs, files in os.walk(path):
        dirs[:] = []  # do not recurse directories.
        yield from [f for f in files if f.endswith(extension)]

# Example: print all the .py files in './python'
for filename in files_in_dir('./python', '.py'):  # a plain suffix, since endswith() does not expand '*'
    print("-", filename)

或者在不需要生成器的情况下停下来:

path, ext = "./python", ".py"
for _, _, dirfiles in os.walk(path):
    matches = (f for f in dirfiles if f.endswith(ext))
    break

for filename in matches:
    print("-", filename)

如果要将匹配用于其他内容,则可能要使其成为列表而不是生成器表达式:

    matches = [f for f in dirfiles if f.endswith(ext)]

Many users have replied with os.walk answers, which includes all files but also all directories and subdirectories and their files.

import os


def files_in_dir(path, extension=''):
    """
       Generator: yields all of the files in <path> ending with
       <extension>

       \param   path       Absolute or relative path to inspect,
       \param   extension  [optional] Only yield files matching this,

       \yield              [filenames]
    """


    for _, dirs, files in os.walk(path):
        dirs[:] = []  # do not recurse directories.
        yield from [f for f in files if f.endswith(extension)]

# Example: print all the .py files in './python'
for filename in files_in_dir('./python', '.py'):  # a plain suffix, since endswith() does not expand '*'
    print("-", filename)

Or for a one off where you don’t need a generator:

path, ext = "./python", ".py"
for _, _, dirfiles in os.walk(path):
    matches = (f for f in dirfiles if f.endswith(ext))
    break

for filename in matches:
    print("-", filename)

If you are going to use matches for something else, you may want to make it a list rather than a generator expression:

    matches = [f for f in dirfiles if f.endswith(ext)]
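
The "stop after the first directory" idea can also be written without a loop, by asking `os.walk` for its first result only. This is a sketch; the name `files_in_top_level` is an assumption made for illustration.

```python
import os


def files_in_top_level(path, extension=''):
    """Return the names of files directly inside `path` that end with `extension`."""
    # os.walk yields (dirpath, dirnames, filenames) tuples; the first one
    # describes `path` itself. The default covers a missing directory,
    # for which os.walk yields nothing.
    _, _, filenames = next(os.walk(path), (path, [], []))
    return [name for name in filenames if name.endswith(extension)]
```

Since only the first tuple is consumed, subdirectories are never visited.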

回答 25

使用forloop的简单方法:

import os

dir = ["e","x","e"]

p = os.listdir('E:')  #path

for n in range(len(p)):
   name = p[n]
   myfile = [name[-3],name[-2],name[-1]]  #for .txt
   if myfile == dir :
      print(name)
   else:
      print("nops")

虽然这可以使它更加笼统。

A simple method using a for loop:

import os

ext = ["e","x","e"]   # the last three characters to match, here "exe"

p = os.listdir('E:')  #path

for n in range(len(p)):
   name = p[n]
   myfile = list(name[-3:])  # last three characters of the file name
   if myfile == ext :
      print(name)
   else:
      print("nops")

This could be made more general, though.


如何获取字符的ASCII值

问题:如何获取字符的ASCII值

如何在Python中获取字符的ASCII值?int

How do I get the ASCII value of a character as an int in Python?


回答 0

这里

函数ord()将获取char的int值。如果您想在玩完数字后再转换回去,可以使用chr()函数来解决。

>>> ord('a')
97
>>> chr(97)
'a'
>>> chr(ord('a') + 3)
'd'
>>>

在Python 2中,还有一个unichr函数,返回其序数为参数的Unicode字符unichr

>>> unichr(97)
u'a'
>>> unichr(1234)
u'\u04d2'

在Python 3中,您可以使用chr代替unichr


ord()-Python 3.6.5rc1文档

ord()-Python 2.7.14文档

From here:

The ord() function returns the int value of a character. If you want to convert back after working with the number, the chr() function does the trick.

>>> ord('a')
97
>>> chr(97)
'a'
>>> chr(ord('a') + 3)
'd'
>>>

In Python 2, there is also the unichr function, returning the Unicode character whose ordinal is the unichr argument:

>>> unichr(97)
u'a'
>>> unichr(1234)
u'\u04d2'

In Python 3 you can use chr instead of unichr.


ord() – Python 3.6.5rc1 documentation

ord() – Python 2.7.14 documentation


回答 1

请注意,这ord()本身并不能提供ASCII值;它以任何编码形式为您提供字符的数值。因此,ord('ä')如果您使用的是Latin-1,则结果可以为228,如果您使用的TypeError是UTF-8 ,则其结果可以为a 。如果您传递一个unicode,它甚至可以返回Unicode代码点:

>>> ord(u'あ')
12354

Note that ord() doesn’t give you the ASCII value per se; it gives you the numeric value of the character in whatever encoding it’s in. Therefore the result of ord('ä') can be 228 if you’re using Latin-1, or it can raise a TypeError if you’re using UTF-8. It can even return the Unicode codepoint instead if you pass it a unicode:

>>> ord(u'あ')
12354
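
On Python 3, where every str is Unicode, ord() always returns the code point, and ASCII is simply the range 0 to 127. If a strictly ASCII value is required, a small guard can make that explicit. The helper name `ascii_value` is an assumption made for this sketch.

```python
def ascii_value(ch):
    """Return the ASCII code of a single character, or raise ValueError."""
    code = ord(ch)  # on Python 3 this is the Unicode code point
    if code > 127:
        raise ValueError('%r is not an ASCII character' % ch)
    return code


print(ascii_value('a'))   # 97
print(ord('\u00e4'))      # 228: the letter a with diaeresis, outside ASCII
```

Whether to raise or to return a sentinel for non-ASCII input is a design choice; raising keeps bad data from passing through silently.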

回答 2

您正在寻找:

ord()

You are looking for:

ord()

回答 3

公认的答案是正确的,但是如果您需要一次将一大堆ASCII字符转换为它们的ASCII代码,则可以采用一种更聪明/更有效的方法。而不是做:

for ch in mystr:
    code = ord(ch)

或稍快:

for code in map(ord, mystr):

您将转换为直接对代码进行迭代的Python本机类​​型。在Python 3上,这很简单:

for code in mystr.encode('ascii'):

在Python 2.6 / 2.7上,它涉及的只是一点点,因为它没有Py3样式的bytes对象(bytes是的别名str,它是按字符迭代的),但是它们确实有bytearray

# If mystr is definitely str, not unicode
for code in bytearray(mystr):

# If mystr could be either str or unicode
for code in bytearray(mystr, 'ascii'):

编码为按序本机迭代的类型意味着转换要快得多。在Py2.7和Py3.5的本地测试中,使用进行迭代str以获取其ASCII码map(ord, mystr)开始关闭需要大约两倍的时间为len10 str比使用bytearray(mystr)上的Py2或mystr.encode('ascii')在PY3,并作为str变长,乘数支付map(ord, mystr)上升至〜6.5x-7x。

唯一的缺点是转换是一次完成的,因此您的第一个结果可能会花费更长的时间,而真正巨大的str临时结果会成比例地增加bytes /bytearray,但是除非迫使您进入页面崩溃状态,否则这无关紧要。

The accepted answer is correct, but there is a more clever/efficient way to do this if you need to convert a whole bunch of ASCII characters to their ASCII codes at once. Instead of doing:

for ch in mystr:
    code = ord(ch)

or the slightly faster:

for code in map(ord, mystr):

you convert to Python native types that iterate the codes directly. On Python 3, it’s trivial:

for code in mystr.encode('ascii'):

and on Python 2.6/2.7, it’s only slightly more involved because it doesn’t have a Py3 style bytes object (bytes is an alias for str, which iterates by character), but they do have bytearray:

# If mystr is definitely str, not unicode
for code in bytearray(mystr):

# If mystr could be either str or unicode
for code in bytearray(mystr, 'ascii'):

Encoding as a type that natively iterates by ordinal means the conversion goes much faster; in local tests on both Py2.7 and Py3.5, iterating a str to get its ASCII codes using map(ord, mystr) starts off taking about twice as long for a len 10 str than using bytearray(mystr) on Py2 or mystr.encode('ascii') on Py3, and as the str gets longer, the multiplier paid for map(ord, mystr) rises to ~6.5x-7x.

The only downside is that the conversion is all at once, so your first result might take a little longer, and a truly enormous str would have a proportionately large temporary bytes/bytearray, but unless this forces you into page thrashing, this isn’t likely to matter.
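
As a quick check of the Python 3 approach described above, the following sketch compares the two routes on a short string. The sample value is arbitrary.

```python
mystr = 'qwerty'

codes_via_ord = [ord(ch) for ch in mystr]
codes_via_bytes = list(mystr.encode('ascii'))  # iterating bytes yields ints on Python 3

print(codes_via_bytes)                   # [113, 119, 101, 114, 116, 121]
print(codes_via_ord == codes_via_bytes)  # True
```

Note that `encode('ascii')` raises `UnicodeEncodeError` when the string contains a non-ASCII character, whereas `ord` would return its code point.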


回答 4

要获取字符的ASCII码,您可以使用 ord()函数。

这是示例代码:

value = input("Your value here: ")
list=[ord(ch) for ch in value]
print(list)

输出:

Your value here: qwerty
[113, 119, 101, 114, 116, 121]

To get the ASCII code of a character, you can use the ord() function.

Here is some example code:

value = input("Your value here: ")
codes = [ord(ch) for ch in value]  # named codes so the built-in list is not shadowed
print(codes)

Output:

Your value here: qwerty
[113, 119, 101, 114, 116, 121]

如何在Python中反转列表?

问题:如何在Python中反转列表?

如何在Python中执行以下操作?

array = [0, 10, 20, 40]
for (i = array.length() - 1; i >= 0; i--)

我需要一个数组的元素,但是要从头到尾。

How can I do the following in Python?

array = [0, 10, 20, 40]
for (i = array.length() - 1; i >= 0; i--)

I need to have the elements of an array, but from the end to the beginning.


回答 0

您可以通过以下方式使用该reversed函数:

>>> array=[0,10,20,40]
>>> for i in reversed(array):
...     print(i)

请注意,reversed(...)它不会返回列表。您可以使用来获得反向列表list(reversed(array))

You can make use of the reversed function for this as:

>>> array=[0,10,20,40]
>>> for i in reversed(array):
...     print(i)

Note that reversed(...) does not return a list. You can get a reversed list using list(reversed(array)).
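
A short sketch of what "does not return a list" means in practice: the object returned by reversed() is a one-shot iterator, and the original list is left alone.

```python
array = [0, 10, 20, 40]

rev_iter = reversed(array)          # a lazy iterator, not a list
print(isinstance(rev_iter, list))   # False
print(list(rev_iter))               # [40, 20, 10, 0]
print(list(rev_iter))               # [] because the iterator is now exhausted
print(array)                        # [0, 10, 20, 40], the original is untouched
```

If the reversed sequence is needed more than once, store `list(reversed(array))` rather than the iterator.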


回答 1

>>> L = [0,10,20,40]
>>> L[::-1]
[40, 20, 10, 0]

扩展片语法在Python 新增功能条目中得到了很好的解释2.3.5

根据注释中的特殊要求,这是最新的slice文档

>>> L = [0,10,20,40]
>>> L[::-1]
[40, 20, 10, 0]

Extended slice syntax is explained well in the Python What’s new Entry for release 2.3.5

By special request in a comment this is the most current slice documentation.


回答 2

>>> L = [0,10,20,40]
>>> L.reverse()
>>> L
[40, 20, 10, 0]

要么

>>> L[::-1]
[40, 20, 10, 0]
>>> L = [0,10,20,40]
>>> L.reverse()
>>> L
[40, 20, 10, 0]

Or

>>> L[::-1]
[40, 20, 10, 0]

回答 3

这是复制列表:

L = [0,10,20,40]
p = L[::-1]  #  Here p will be having reversed list

这是就地反转列表:

L.reverse() # Here L will be reversed in-place (no new list made)

This is to duplicate the list:

L = [0,10,20,40]
p = L[::-1]  #  Here p will be having reversed list

This is to reverse the list in-place:

L.reverse() # Here L will be reversed in-place (no new list made)
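
The difference between the two matters most when another name refers to the same list. In this sketch the name `alias` is introduced only for illustration.

```python
L = [0, 10, 20, 40]
alias = L          # a second name for the same list object
p = L[::-1]        # a new list; L is unchanged at this point

L.reverse()        # mutates the object that both L and alias refer to

print(p)           # [40, 20, 10, 0]
print(alias)       # [40, 20, 10, 0]
print(p is L)      # False
print(alias is L)  # True
```

So slicing is the safer choice when other code still expects the original order, and reverse() is the choice when every holder of the list should see the change.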

回答 4

我认为在Python中反转列表的最好方法是:

a = [1,2,3,4]
a = a[::-1]
print(a)  # prints [4, 3, 2, 1]

该工作已完成,现在您有一个反向列表。

I think that the best way to reverse a list in Python is to do:

a = [1,2,3,4]
a = a[::-1]
print(a)  # prints [4, 3, 2, 1]

The job is done, and now you have a reversed list.


回答 5

要反转相同列表,请使用:

array.reverse()

要将反向列表分配给其他列表,请使用:

newArray = array[::-1] 

For reversing the same list use:

array.reverse()

To assign reversed list into some other list use:

newArray = array[::-1] 

回答 6

使用切片,例如array = array [::-1]是一个巧妙的技巧,非常具有Python风格,但是对于新手来说可能有些晦涩。使用reverse()方法是日常编码的好方法,因为它易于阅读。

但是,如果像面试问题中那样需要在适当的位置反转列表,则可能无法使用此类内置方法。面试官将着眼于您如何解决问题,而不是深入了解Python知识,这需要一种算法方法。下面的示例使用经典交换,可能是实现此目的的一种方法:-

def reverse_in_place(lst):      # Declare a function
    size = len(lst)             # Get the length of the sequence
    hiindex = size - 1
    its = size/2                # Number of iterations required
    for i in xrange(0, its):    # i is the low index pointer
        temp = lst[hiindex]     # Perform a classic swap
        lst[hiindex] = lst[i]
        lst[i] = temp
        hiindex -= 1            # Decrement the high index pointer
    print "Done!"

# Now test it!!
array = [2, 5, 8, 9, 12, 19, 25, 27, 32, 60, 65, 1, 7, 24, 124, 654]

print array                    # Print the original sequence
reverse_in_place(array)        # Call the function passing the list
print array                    # Print reversed list


The result:
[2, 5, 8, 9, 12, 19, 25, 27, 32, 60, 65, 1, 7, 24, 124, 654]
Done!
[654, 124, 24, 7, 1, 65, 60, 32, 27, 25, 19, 12, 9, 8, 5, 2]

请注意,这不适用于元组或字符串序列,因为字符串和元组是不可变的,即,您无法写入它们来更改元素。

Using slicing, e.g. array = array[::-1], is a neat trick and very Pythonic, but a little obscure for newbies maybe. Using the reverse() method is a good way to go in day to day coding because it is easily readable.

However, if you need to reverse a list in place, as in an interview question, you will likely not be able to use built-in methods like these. The interviewer will be looking at how you approach the problem rather than at the depth of your Python knowledge, so an algorithmic approach is required. The following example, using a classic swap, might be one way to do it:

def reverse_in_place(lst):      # Declare a function
    size = len(lst)             # Get the length of the sequence
    hiindex = size - 1
    its = size/2                # Number of iterations required
    for i in xrange(0, its):    # i is the low index pointer
        temp = lst[hiindex]     # Perform a classic swap
        lst[hiindex] = lst[i]
        lst[i] = temp
        hiindex -= 1            # Decrement the high index pointer
    print "Done!"

# Now test it!!
array = [2, 5, 8, 9, 12, 19, 25, 27, 32, 60, 65, 1, 7, 24, 124, 654]

print array                    # Print the original sequence
reverse_in_place(array)        # Call the function passing the list
print array                    # Print reversed list


The result:
[2, 5, 8, 9, 12, 19, 25, 27, 32, 60, 65, 1, 7, 24, 124, 654]
Done!
[654, 124, 24, 7, 1, 65, 60, 32, 27, 25, 19, 12, 9, 8, 5, 2]

Note that this will not work on Tuples or string sequences, because strings and tuples are immutable, i.e., you cannot write into them to change elements.
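
The example above is written for Python 2 (xrange and the print statement). A Python 3 sketch of the same swap idea, using two indices that move toward each other and tuple assignment in place of a temporary variable, might look like this:

```python
def reverse_in_place(lst):
    """Reverse a mutable sequence in place by swapping from both ends."""
    lo, hi = 0, len(lst) - 1
    while lo < hi:
        lst[lo], lst[hi] = lst[hi], lst[lo]  # swap without a temp variable
        lo += 1
        hi -= 1


array = [2, 5, 8, 9, 12]
reverse_in_place(array)
print(array)  # [12, 9, 8, 5, 2]
```

For a list with an odd number of elements the middle element stays where it is, which is why the loop stops as soon as the two indices meet.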


回答 7

I find (contrary to some other suggestions) that l.reverse() is by far the fastest way to reverse a long list in Python 3 and 2. I’d be interested to know if others can replicate these timings.

l[::-1] is probably slower because it copies the list prior to reversing it. Adding the list() call around the iterator made by reversed(l) must add some overhead. Of course if you want a copy of the list or an iterator then use those respective methods, but if you want to just reverse the list then l.reverse() seems to be the fastest way.

Functions

def rev_list1(l):
    return l[::-1]

def rev_list2(l):
    return list(reversed(l))

def rev_list3(l):
    l.reverse()
    return l

List

l = list(range(1000000))

Python 3.5 timings

timeit(lambda: rev_list1(l), number=1000)
# 6.48
timeit(lambda: rev_list2(l), number=1000)
# 7.13
timeit(lambda: rev_list3(l), number=1000)
# 0.44

Python 2.7 timings

timeit(lambda: rev_list1(l), number=1000)
# 6.76
timeit(lambda: rev_list2(l), number=1000)
# 9.18
timeit(lambda: rev_list3(l), number=1000)
# 0.46

Answer 8

for x in array[::-1]:
    print(x)  # do stuff with x

Answer 9

With reversed and list:

>>> list1 = [1,2,3]
>>> reversed_list = list(reversed(list1))
>>> reversed_list
[3, 2, 1]

Answer 10

array = [0, 10, 20, 40]
for e in reversed(array):
  print(e)

Answer 11

Using reversed(array) would be the likely best route.

>>> array = [1, 2, 3, 4]
>>> for item in reversed(array):
...     print(item)

Should you need to understand how you could implement this without using the built-in reversed:

def reverse(a):
    midpoint = len(a) // 2                 # integer division, so this also runs on Python 3
    for i in range(midpoint):
        otherside = len(a) - i - 1         # index mirrored around the middle
        a[i], a[otherside] = a[otherside], a[i]
    return a

This takes O(N) time. Swapping by index avoids a.index(item) lookups, which would cost O(N) each and fail on lists with duplicate values.


Answer 12

If you want to store the elements of reversed list in some other variable, then you can use revArray = array[::-1] or revArray = list(reversed(array)).

But the first variant is slightly faster:

import time

z = list(range(1000000))  # list() so that slicing copies the data on Python 3 as well
startTimeTic = time.time()
y = z[::-1]
print("Time: %s s" % (time.time() - startTimeTic))

f = list(range(1000000))
startTimeTic = time.time()
g = list(reversed(f))
print("Time: %s s" % (time.time() - startTimeTic))

Output:

Time: 0.00489711761475 s
Time: 0.00609302520752 s

Answer 13

ORGANIZING VALUES:

In Python, the order of a list can also be manipulated with sort, which arranges the values in numerical/alphabetical order:

Temporarily:

print(sorted(my_list))

Permanently:

my_list.sort()
print(my_list)

You can sort in descending order with the flag “reverse=True”:

print(sorted(my_list, reverse=True))

or

my_list.sort(reverse=True)
print(my_list)

WITHOUT ORGANIZING

Maybe you do not want to sort values, but only reverse the values. Then we can do it like this:

print(list(reversed(my_list)))

**Note: in Python 2, numbers sort before strings in a mixed list. In Python 3, comparing numbers with strings raises a TypeError, so a list to be sorted should hold mutually comparable values.


Answer 14

Summary of Methods with Explanation and Timing Results

There are several good answers, but they are spread out, and most don’t indicate the fundamental differences between the approaches.

Overall, it is better to use built-in functions/methods to reverse, as with just about any task. In this case, they are roughly 2 to 8 times faster on short lists (10 items), and up to ~300+ times faster on long lists, compared to a manually created means of indexing. This makes sense, as they are created, scrutinized, and optimized by experts. They are also less prone to defects and more likely to handle edge and corner cases.

Also consider whether you want to:

  1. Reverse an existing list in-place
    • Best solution is the `object.reverse()` method
  2. Create an iterator of the reverse of the list (because you are going to feed it to a for-loop, a generator, etc.)
    • Best solution is `reversed(object)`, which creates the iterator
  3. Or create a complete copy that is in the reverse order
    • Best solution is using slices with a -1 step size: `object[::-1]`
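The three cases above can be sketched side by side. This is only an illustration with made-up variable names, not part of the timing script:

```python
data = [1, 2, 3, 4]

# 1. Reverse in place: the list itself changes, and reverse() returns None.
in_place = list(data)          # work on a copy so `data` stays intact for the demo
result = in_place.reverse()

# 2. Reverse iterator: nothing is copied; items are produced on demand.
rev_iter = reversed(data)
from_iter = [x for x in rev_iter]

# 3. Reversed copy: a new list is built; the original is untouched.
rev_copy = data[::-1]

print(in_place, result)   # [4, 3, 2, 1] None
print(from_iter)          # [4, 3, 2, 1]
print(rev_copy, data)     # [4, 3, 2, 1] [1, 2, 3, 4]
```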

Test Script

Here is the start of my test script for the methods covered. Put all the code snippets in this answer together to make a script that will run all the different ways of reversing a list and time each one (output shown in the last section).

from timeit import timeit
from copy import copy

def time_str_ms(t):
    return '{0:8.2f} ms'.format(t * 1000)

Method 1: Reverse in place with obj.reverse()

If the goal is just to reverse the order of the items in an existing list, without looping over them or getting a copy to work with, use the <list>.reverse() method. Run this directly on a list object, and the order of all items will be reversed.

Note that the following function reverses the original list it is given, even though it also returns the reversed list. The returned value is the same list object, not a copy. Typically, you wouldn’t make a function for this, but I did so to use the timing code at the end.

We’ll test the performance of this two ways – first just reversing a list in-place (changes the original list), and then copying the list and reversing it afterward.

def rev_in_place(mylist):
    mylist.reverse()
    return mylist

def rev_copy_reverse(mylist):
    a = copy(mylist)
    a.reverse()
    return a

Method 2: Reverse a list using slices obj[::-1]

The built-in index slicing method allows you to make a copy of part of any indexed object.

  • It does not affect the original object
  • It builds a full list, not an iterator

The generic syntax is: <object>[first_index:last_index:step]. To exploit slicing to create a simple reversed list, use: <list>[::-1]. When leaving an option empty, it sets them to defaults of the first and last element of the object (reversed if the step size is negative).

Indexing allows one to use negative numbers, which count from the end of the object’s index backwards (e.g. -2 is the second to last item). When the step size is negative, it will start with the last item and index backward by that amount. There is some start-end logic associated with this that has been optimized.
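A few concrete slices may make the start/stop/step defaults easier to see. The list name letters is just an example:

```python
letters = ['a', 'b', 'c', 'd', 'e']

print(letters[::-1])      # ['e', 'd', 'c', 'b', 'a']  full reversed copy
print(letters[::-2])      # ['e', 'c', 'a']            every second item, backward
print(letters[3:0:-1])    # ['d', 'c', 'b']            from index 3 down to, but not including, 0
print(letters[-2])        # d                          second-to-last item

# slice.indices() shows the concrete start/stop/step Python fills in
print(slice(None, None, -1).indices(len(letters)))   # (4, -1, -1)
```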

def rev_slice(mylist):
    a = mylist[::-1]
    return a

Method 3: Reverse a list with the reversed(obj) iterator function

There is a reversed(indexed_object) function:

  • This creates a reverse index iterator, not a list. Great if you are feeding it to a loop for better performance on large lists
  • This does not affect the original object; the iterator reads from the original list rather than copying it

Test with both a raw iterator, and creating a list from the iterator.
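One practical consequence of getting an iterator rather than a list is that it can only be consumed once. A small sketch, with illustrative variable names:

```python
items = [1, 2, 3]
it = reversed(items)

first_pass = list(it)    # consumes the iterator
second_pass = list(it)   # nothing left to yield

print(first_pass)    # [3, 2, 1]
print(second_pass)   # []
print(items)         # [1, 2, 3]  the original list is unchanged
```

If the reversed items are needed more than once, build a list from the iterator or use a slice.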

def reversed_iterator(mylist):
    a = reversed(mylist)
    return a

def reversed_with_list(mylist):
    a = list(reversed(mylist))
    return a

Method 4: Reverse list with Custom/Manual indexing

As the timing will show, creating your own methods of indexing is a bad idea. Use the built-in methods unless you need to do something really custom.

That said, there is not a huge penalty with smaller list sizes, but when you scale up the penalty becomes tremendous. I’m sure my code below could be optimized, but I’ll stick with the built-in methods.

def rev_manual_pos_gen(mylist):
    max_index = len(mylist) - 1
    return [ mylist[max_index - index] for index in range(len(mylist)) ]

def rev_manual_neg_gen(mylist):
    ## index is 0 to 9, but we need -1 to -10
    return [ mylist[-index-1] for index in range(len(mylist)) ]

def rev_manual_index_loop(mylist):
    a = []
    reverse_index = len(mylist) - 1
    for index in range(len(mylist)):
        a.append(mylist[reverse_index - index])
    return a

def rev_manual_loop(mylist):
    a = []
    reverse_index = len(mylist)
    for index, _ in enumerate(mylist):
        reverse_index -= 1
        a.append(mylist[reverse_index])
    return a

Timing each method

Following is the rest of the script to time each method of reversing. It shows that reversing in place with obj.reverse() and creating the reversed(obj) iterator are always the fastest, while using slices is the fastest way to create a copy.

It also shows that you should not write your own way of doing it unless you have to!

loops_to_test = 100000
number_of_items = 10
list_to_reverse = list(range(number_of_items))
if number_of_items < 15:
    print("a: {}".format(list_to_reverse))
print('Loops: {:,}'.format(loops_to_test))
# List of the functions we want to test with the timer, in print order
fcns = [rev_in_place, reversed_iterator, rev_slice, rev_copy_reverse,
        reversed_with_list, rev_manual_pos_gen, rev_manual_neg_gen,
        rev_manual_index_loop, rev_manual_loop]
max_name_string = max([ len(fcn.__name__) for fcn in fcns ])
for fcn in fcns:
    a = copy(list_to_reverse) # copy to start fresh each loop
    out_str = ' | out = {}'.format(fcn(a)) if number_of_items < 15 else ''
    # Time in ms for the given # of loops on this fcn
    time_str = time_str_ms(timeit(lambda: fcn(a), number=loops_to_test))
    # Get the output string for this function
    fcn_str = '{}(a):'.format(fcn.__name__)
    # Add the correct string length to accommodate the maximum fcn name
    format_str = '{{fx:{}s}} {{time}}{{rev}}'.format(max_name_string + 4)
    print(format_str.format(fx=fcn_str, time=time_str, rev=out_str))

Timing Results

The results show that the built-in method best suited to a given task scales best. In other words, as the object element count increases, the built-in methods have far superior performance.

You are also better off using the built-in method that directly achieves what you need than stringing things together. For example, slicing is best if you need a copy of the reversed list: it’s faster than creating a list from the reversed() function, and faster than making a copy of the list and then doing an in-place obj.reverse(). If an in-place reversal or an iterator is really all you need, those are faster still (roughly 4 to 7 times faster than slicing in the 1000-item run below). Meanwhile, custom, manual methods can take orders of magnitude longer, especially with very large lists.

For scaling, with a 1000 item list, the reversed(<list>) function call takes ~30 ms to set up the iterator, reversing in-place takes just ~55 ms, using the slice method takes ~210 ms to create a copy of the full reversed list, but the quickest manual method I made took ~8400 ms!

With 2 items in the list:

a: [0, 1]
Loops: 100,000
rev_in_place(a):             24.70 ms | out = [1, 0]
reversed_iterator(a):        30.48 ms | out = <list_reverseiterator object at 0x0000020242580408>
rev_slice(a):                31.65 ms | out = [1, 0]
rev_copy_reverse(a):         63.42 ms | out = [1, 0]
reversed_with_list(a):       48.65 ms | out = [1, 0]
rev_manual_pos_gen(a):       98.94 ms | out = [1, 0]
rev_manual_neg_gen(a):       88.11 ms | out = [1, 0]
rev_manual_index_loop(a):    87.23 ms | out = [1, 0]
rev_manual_loop(a):          79.24 ms | out = [1, 0]

With 10 items in the list:

rev_in_place(a):             23.39 ms | out = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
reversed_iterator(a):        30.23 ms | out = <list_reverseiterator object at 0x00000290A3CB0388>
rev_slice(a):                36.01 ms | out = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
rev_copy_reverse(a):         64.67 ms | out = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
reversed_with_list(a):       50.77 ms | out = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
rev_manual_pos_gen(a):      162.83 ms | out = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
rev_manual_neg_gen(a):      167.43 ms | out = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
rev_manual_index_loop(a):   152.04 ms | out = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
rev_manual_loop(a):         183.01 ms | out = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

And with 1000 items in the list:

rev_in_place(a):             56.37 ms
reversed_iterator(a):        30.47 ms
rev_slice(a):               211.42 ms
rev_copy_reverse(a):        295.74 ms
reversed_with_list(a):      418.45 ms
rev_manual_pos_gen(a):     8410.01 ms
rev_manual_neg_gen(a):    11054.84 ms
rev_manual_index_loop(a): 10543.11 ms
rev_manual_loop(a):       15472.66 ms

Answer 15

Using some logic

Using some old school logic to practice for interviews.

Swapping numbers front to back. Using two pointers index[0] and index[last]

def reverse(array):
    n = array
    first = 0
    last = len(array) - 1
    while first < last:
      holder = n[first]
      n[first] = n[last]
      n[last] = holder
      first += 1
      last -= 1
    return n

input -> [-1, 1, 2, 3, 4, 5, 6]
output -> [6, 5, 4, 3, 2, 1, -1]

Answer 16

You can also use the bitwise complement of the array index to step through the array in reverse:

>>> array = [0, 10, 20, 40]
>>> [array[~i] for i, _ in enumerate(array)]
[40, 20, 10, 0]

Whatever you do, don’t do it this way.
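For anyone wondering why the complement trick works at all: for Python integers, ~i equals -(i + 1), so ~0 is -1, ~1 is -2, and so on, which happen to be the negative indices counted from the end. A short check, with illustrative variable names:

```python
array = [0, 10, 20, 40]

for i in range(len(array)):
    # For integers, ~i equals -(i + 1), so ~0 == -1, ~1 == -2, ...
    assert ~i == -(i + 1)

via_complement = [array[~i] for i in range(len(array))]
via_negative = [array[-(i + 1)] for i in range(len(array))]
print(via_complement)   # [40, 20, 10, 0]
```

Both comprehensions produce the same list; the explicit negative index is the easier one to read.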


Answer 17

Use a list comprehension:

[array[n] for n in range(len(array)-1, -1, -1)]

Answer 18

Another solution would be to use numpy.flip for this:

import numpy as np
array = [0, 10, 20, 40]
np.flip(array).tolist()  # tolist() converts the NumPy integers back to Python ints
[40, 20, 10, 0]

Answer 19

Strictly speaking, the question is not how to return a list in reverse but rather how to reverse a list, using a list named array as the example.

To reverse a list named "array", use array.reverse().

The slice method described above can also be used: array = array[::-1] rebinds the name array to a new, reversed list. Note that this is not truly in place, since any other reference to the original list still sees the old order; to overwrite the contents of the existing list, use array[:] = array[::-1].
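The difference between rebinding a name and changing a list in place only shows up when a second name refers to the same list. A small sketch, where alias is an illustrative name:

```python
array = [1, 2, 3]
alias = array            # second name for the same list object

array = array[::-1]      # rebinds `array` to a new list
print(array, alias)      # [3, 2, 1] [1, 2, 3]

array = alias            # point back at the original object
array[:] = array[::-1]   # slice assignment replaces the contents in place
print(array, alias)      # [3, 2, 1] [3, 2, 1]
```

array.reverse() behaves like the slice assignment here: every name that refers to the list sees the new order.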


Answer 20

def reverse(text):
    output = []
    for i in range(len(text)-1, -1, -1):
        output.append(text[i])
    return output

Answer 21

Using a minimal number of built-in functions, assuming an interview setting:

array = [1, 2, 3, 4, 5, 6,7, 8]
inverse = [] #create container for inverse array
length = len(array)  #to iterate later, returns 8 
counter = length - 1  #because the 8th element is on position 7 (as python starts from 0)

for i in range(length): 
   inverse.append(array[counter])
   counter -= 1
print(inverse)

Answer 22

The most direct translation of your requirement into Python is this for statement:

for i in range(len(array) - 1, -1, -1):
   print(i, array[i])

This is rather cryptic but may be useful.


Answer 23

def reverse(my_list):
  L = len(my_list)
  for i in range(L // 2):  # integer division; L / 2 is a float on Python 3
    my_list[i], my_list[L - i - 1] = my_list[L - i - 1], my_list[i]
  return my_list

Answer 24

You could always treat the list like a stack, popping the elements off the top of the stack from the back end of the list. That way you take advantage of the last-in, first-out behavior of a stack. Of course, you are consuming the first list. I like this method because it’s pretty intuitive: you see one list being consumed from the back end while the other is being built from the front end.

>>> l = [1, 2, 3, 4, 5, 6]; nl = []
>>> while l:
...     nl.append(l.pop())
...
>>> print(nl)
[6, 5, 4, 3, 2, 1]

Answer 25

list_data = [1, 2, 3, 4, 5]
l = len(list_data)
i = l + 1
rev_data = []
while l > 0:
  j = i - l                       # j counts 1, 2, 3, ... as l shrinks
  l -= 1
  rev_data.append(list_data[-j])  # take items from the end: -1, -2, ...
print("After Rev:- %s" % rev_data)

Answer 26

Use:

print(list(reversed(list_name)))  # reversed() returns an iterator, so wrap it in list() to print the items

Answer 27

Here’s a way to lazily evaluate the reverse using a generator:

def reverse(seq):
    for x in range(len(seq) - 1, -1, -1): # Iterate from the last index down to 0, stepping by -1.
        yield seq[x] # Yield the next value from the generator

Now iterate through like this:

for x in reverse([1, 2, 3]):
    print(x)

If you need a list:

l = list(reverse([1, 2, 3]))

Answer 28

There are 3 methods to get the reversed list:

  1. Slicing Method 1: reversed_array = array[-1::-1]

  2. Slicing Method 2: reversed_array2 = array[::-1]

  3. Using the built-in list method: array.reverse()

The third method reverses the list object in place and returns None, so its result should not be assigned to a variable. No copy of the pristine data is kept. This is a good approach if you don’t need the old version, but it is not a solution if you want both the pristine and the reversed version.


Answer 29

>>> from functools import reduce  # reduce is not a builtin on Python 3
>>> l = [1, 2, 3, 4, 5]
>>> print(reduce(lambda acc, x: [x] + acc, l, []))
[5, 4, 3, 2, 1]

What IDE to use for Python? [closed]

Question: What IDE to use for Python? [closed]

What IDEs (“GUIs/editors”) do others use for Python coding?


Answer 0

Results

Spreadsheet version

spreadsheet screenshot

Alternatively, in plain text: (also available as a screenshot)

                         Bracket Matching -.  .- Line Numbering
                          Smart Indent -.  |  |  .- UML Editing / Viewing
         Source Control Integration -.  |  |  |  |  .- Code Folding
                    Error Markup -.  |  |  |  |  |  |  .- Code Templates
  Integrated Python Debugging -.  |  |  |  |  |  |  |  |  .- Unit Testing
    Multi-Language Support -.  |  |  |  |  |  |  |  |  |  |  .- GUI Designer (Qt, Eric, etc)
   Auto Code Completion -.  |  |  |  |  |  |  |  |  |  |  |  |  .- Integrated DB Support
     Commercial/Free -.  |  |  |  |  |  |  |  |  |  |  |  |  |  |  .- Refactoring
   Cross Platform -.  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |     
                  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
Atom              |Y |F |Y |Y*|Y |Y |Y |Y |Y |Y |  |Y |Y |  |  |  |  |*many plugins
Editra            |Y |F |Y |Y |  |  |Y |Y |Y |Y |  |Y |  |  |  |  |  |
Emacs             |Y |F |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |  |  |  |
Eric Ide          |Y |F |Y |  |Y |Y |  |Y |  |Y |  |Y |  |Y |  |  |  |
Geany             |Y |F |Y*|Y |  |  |  |Y |Y |Y |  |Y |  |  |  |  |  |*very limited
Gedit             |Y |F |Y¹|Y |  |  |  |Y |Y |Y |  |  |Y²|  |  |  |  |¹with plugin; ²sort of
Idle              |Y |F |Y |  |Y |  |  |Y |Y |  |  |  |  |  |  |  |  |
IntelliJ          |Y |CF|Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |
JEdit             |Y |F |  |Y |  |  |  |  |Y |Y |  |Y |  |  |  |  |  |
KDevelop          |Y |F |Y*|Y |  |  |Y |Y |Y |Y |  |Y |  |  |  |  |  |*no type inference
Komodo            |Y |CF|Y |Y |Y |Y |Y |Y |Y |Y |  |Y |Y |Y |  |Y |  |
NetBeans*         |Y |F |Y |Y |Y |  |Y |Y |Y |Y |Y |Y |Y |Y |  |  |Y |*pre-v7.0
Notepad++         |W |F |Y |Y |  |Y*|Y*|Y*|Y |Y |  |Y |Y*|  |  |  |  |*with plugin
Pfaide            |W |C |Y |Y |  |  |  |Y |Y |Y |  |Y |Y |  |  |  |  |
PIDA              |LW|F |Y |Y |  |  |  |Y |Y |Y |  |Y |  |  |  |  |  |VIM based
PTVS              |W |F |Y |Y |Y |Y |Y |Y |Y |Y |  |Y |  |  |Y*|  |Y |*WPF based
PyCharm           |Y |CF|Y |Y*|Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |*JavaScript
PyDev (Eclipse)   |Y |F |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |  |  |  |
PyScripter        |W |F |Y |  |Y |Y |  |Y |Y |Y |  |Y |Y |Y |  |  |  |
PythonWin         |W |F |Y |  |Y |  |  |Y |Y |  |  |Y |  |  |  |  |  |
SciTE             |Y |F¹|  |Y |  |Y |  |Y |Y |Y |  |Y |Y |  |  |  |  |¹Mac version is
ScriptDev         |W |C |Y |Y |Y |Y |  |Y |Y |Y |  |Y |Y |  |  |  |  |    commercial
Spyder            |Y |F |Y |  |Y |Y |  |Y |Y |Y |  |  |  |  |  |  |  |
Sublime Text      |Y |CF|Y |Y |  |Y |Y |Y |Y |Y |  |Y |Y |Y*|  |  |  |extensible w/Python,
TextMate          |M |F |  |Y |  |  |Y |Y |Y |Y |  |Y |Y |  |  |  |  |    *PythonTestRunner
UliPad            |Y |F |Y |Y |Y |  |  |Y |Y |  |  |  |Y |Y |  |  |  |
Vim               |Y |F |Y |Y |Y |Y |Y |Y |Y |Y |  |Y |Y |Y |  |  |  |
Visual Studio     |W |CF|Y |Y |Y |Y |Y |Y |Y |Y |? |Y |? |? |Y |? |Y |
Visual Studio Code|Y |F |Y |Y |Y |Y |Y |Y |Y |Y |? |Y |? |? |? |? |Y |uses plugins
WingIde           |Y |C |Y |Y*|Y |Y |Y |Y |Y |Y |  |Y |Y |Y |  |  |  |*support for C
Zeus              |W |C |  |  |  |  |Y |Y |Y |Y |  |Y |Y |  |  |  |  |
                  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
   Cross Platform -'  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |     
     Commercial/Free -'  |  |  |  |  |  |  |  |  |  |  |  |  |  |  '- Refactoring
   Auto Code Completion -'  |  |  |  |  |  |  |  |  |  |  |  |  '- Integrated DB Support
    Multi-Language Support -'  |  |  |  |  |  |  |  |  |  |  '- GUI Designer (Qt, Eric, etc)
  Integrated Python Debugging -'  |  |  |  |  |  |  |  |  '- Unit Testing
                    Error Markup -'  |  |  |  |  |  |  '- Code Templates
         Source Control Integration -'  |  |  |  |  '- Code Folding
                          Smart Indent -'  |  |  '- UML Editing / Viewing
                         Bracket Matching -'  '- Line Numbering

Acronyms used:

 L  - Linux
 W  - Windows
 M  - Mac
 C  - Commercial
 F  - Free
 CF - Commercial with Free limited edition
 ?  - To be confirmed

I don’t mention basics like syntax highlighting as I expect these by default.


This is just a dry list reflecting your feedback and comments; I am not advocating any of these tools. I will keep updating this list as you keep posting your answers.

PS. Can you help me to add features of the above editors to the list (like auto-complete, debugging, etc.)?

We have a comprehensive wiki page for this question https://wiki.python.org/moin/IntegratedDevelopmentEnvironments

Submit edits to the spreadsheet


列表和元组之间有什么区别?

问题:列表和元组之间有什么区别?

有什么不同?

元组/列表的优点/缺点是什么?

What’s the difference?

What are the advantages / disadvantages of tuples / lists?


回答 0

除了元组是不可变的之外,还有语义上的区别应指导它们的使用。元组是异构数据结构(即,它们的条目具有不同的含义),而列表是同类序列。元组具有结构,列表具有顺序。

使用这种区别可以使代码更加明确和易于理解。

一个示例是成对的页和行号,以成对参考书中的位置,例如:

my_location = (42, 11)  # page number, line number

然后,您可以将其用作字典中的键来存储有关位置的注释。另一方面,列表可用于存储多个位置。自然地,人们可能想在列表中添加或删除位置,因此列表是可变的很有意义。另一方面,从现有位置添加或删除项目没有意义-因此,元组是不可变的。

在某些情况下,您可能想更改现有位置元组中的项目,例如在页面的各行中进行迭代时。但是元组不变性迫使您为每个新值创建一个新的位置元组。从表面上看,这似乎很不方便,但是使用这样的不可变数据是值类型和函数编程技术的基石,可以具有很多优点。

关于此问题,有一些有趣的文章,例如“ Python元组不仅仅是常量列表”“了解Python中的元组与列表”。官方Python文档也提到了这一点

“组是不可变的,并且通常包含一个异类序列…”。

在像Haskell这样的静态类型语言中,元组中的值通常具有不同的类型,并且元组的长度必须固定。在列表中,所有值都具有相同的类型,并且长度不是固定的。因此区别非常明显。

最后,在Python中有一个namedtuple,这很有意义,因为一个元组已经被认为具有结构。这强调了元组是类和实例的轻量级替代方案的思想。

Apart from tuples being immutable there is also a semantic distinction that should guide their usage. Tuples are heterogeneous data structures (i.e., their entries have different meanings), while lists are homogeneous sequences. Tuples have structure, lists have order.

Using this distinction makes code more explicit and understandable.

One example would be pairs of page and line number to reference locations in a book, e.g.:

my_location = (42, 11)  # page number, line number

You can then use this as a key in a dictionary to store notes on locations. A list on the other hand could be used to store multiple locations. Naturally one might want to add or remove locations from the list, so it makes sense that lists are mutable. On the other hand it doesn’t make sense to add or remove items from an existing location – hence tuples are immutable.

There might be situations where you want to change items within an existing location tuple, for example when iterating through the lines of a page. But tuple immutability forces you to create a new location tuple for each new value. This seems inconvenient on the face of it, but using immutable data like this is a cornerstone of value types and functional programming techniques, which can have substantial advantages.

There are some interesting articles on this issue, e.g. “Python Tuples are Not Just Constant Lists” or “Understanding tuples vs. lists in Python”. The official Python documentation also mentions this:

“Tuples are immutable, and usually contain an heterogeneous sequence …”.

In a statically typed language like Haskell the values in a tuple generally have different types and the length of the tuple must be fixed. In a list the values all have the same type and the length is not fixed. So the difference is very obvious.

Finally there is the namedtuple in Python, which makes sense because a tuple is already supposed to have structure. This underlines the idea that tuples are a light-weight alternative to classes and instances.
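
As a small illustration of the points above, the sketch below uses a location tuple as a dictionary key, keeps several locations in a list, and then expresses the same structure as a namedtuple. The names `notes`, `locations`, `Location`, and `bookmark`, and the note text, are made up for this example.

```python
from collections import namedtuple

# A (page, line) pair has structure: each position means something different.
my_location = (42, 11)  # page number, line number

# Tuples are hashable (when their items are), so they can be dictionary keys.
notes = {my_location: "check this quotation"}

# A list holds several locations and can grow or shrink over time.
locations = [my_location]
locations.append((43, 2))

# namedtuple gives the same structure readable field names.
Location = namedtuple("Location", ["page", "line"])
bookmark = Location(page=42, line=11)

print(notes[(42, 11)])              # look up by an equal tuple
print(bookmark.page, bookmark.line)
print(bookmark == my_location)      # a namedtuple still compares like a plain tuple
```

Because a namedtuple is still a tuple, it can replace the plain pair wherever the pair was used, including as a dictionary key.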


回答 1

列表和元组之间的区别

  1. 文字

    someTuple = (1,2)
    someList  = [1,2] 
  2. 尺寸

    a = tuple(range(1000))
    b = list(range(1000))
    
    a.__sizeof__() # 8024
    b.__sizeof__() # 9088

    由于元组操作的大小较小,因此它变得更快一些,但是在您拥有大量元素之前,不必多说。

  3. 允许的操作

    b    = [1,2]   
    b[0] = 3       # [3, 2]
    
    a    = (1,2)
    a[0] = 3       # Error

    这也意味着您不能删除元素或对元组进行排序。但是,您可以在列表和元组中都添加一个新元素,唯一的区别是,由于元组是不可变的,因此您并不是真正在添加元素,而是要创建一个新的元组,因此id将会改变

    a     = (1,2)
    b     = [1,2]  
    
    id(a)          # 140230916716520
    id(b)          # 748527696
    
    a   += (3,)    # (1, 2, 3)
    b   += [3]     # [1, 2, 3]
    
    id(a)          # 140230916878160
    id(b)          # 748527696
  4. 用法

    由于列表是可变的,因此不能用作字典中的键,而可以使用元组。

    a    = (1,2)
    b    = [1,2] 
    
    c = {a: 1}     # OK
    c = {b: 1}     # Error

Difference between list and tuple

  1. Literal

    someTuple = (1,2)
    someList  = [1,2] 
    
  2. Size

    a = tuple(range(1000))
    b = list(range(1000))
    
    a.__sizeof__() # 8024
    b.__sizeof__() # 9088
    

    Because a tuple is smaller, operations on it are slightly faster, but the difference is hardly worth mentioning until you have a huge number of elements.

  3. Permitted operations

    b    = [1,2]   
    b[0] = 3       # [3, 2]
    
    a    = (1,2)
    a[0] = 3       # Error
    

    That also means that you can’t delete an element from a tuple or sort it in place. However, you can add a new element to both a list and a tuple; the only difference is that, since the tuple is immutable, you are not really adding an element but creating a new tuple, so its id will change.

    a     = (1,2)
    b     = [1,2]  
    
    id(a)          # 140230916716520
    id(b)          # 748527696
    
    a   += (3,)    # (1, 2, 3)
    b   += [3]     # [1, 2, 3]
    
    id(a)          # 140230916878160
    id(b)          # 748527696
    
  4. Usage

    As a list is mutable, it can’t be used as a key in a dictionary, whereas a tuple can be used.

    a    = (1,2)
    b    = [1,2] 
    
    c = {a: 1}     # OK
    c = {b: 1}     # Error
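
The size and sorting points in this answer can be checked with a short sketch. It assumes CPython, since `sys.getsizeof` reports implementation-specific numbers; the exact byte counts quoted above will likely differ on your interpreter, so the code compares sizes instead of hard-coding them. The variable names are illustrative only.

```python
import sys

as_tuple = tuple(range(1000))
as_list = list(range(1000))

# Exact byte counts depend on the interpreter version and platform,
# so compare the sizes rather than relying on specific numbers.
print(sys.getsizeof(as_tuple), sys.getsizeof(as_list))

# A tuple has no in-place sort(), but sorted() accepts any iterable
# and returns a new list, leaving the tuple untouched.
unsorted = (3, 1, 2)
result = sorted(unsorted)
print(result, unsorted)
```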
    

回答 2

如果您去散散步,您可以随时在 (x,y)元组中。

如果要记录您的旅程,可以每隔几秒钟将您的位置附加到一个列表中。

但您无法做到这一点。

If you went for a walk, you could note your coordinates at any instant in an (x,y) tuple.

If you wanted to record your journey, you could append your location every few seconds to a list.

But you couldn’t do it the other way around.
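
A minimal sketch of the walk analogy, with made-up coordinates: each position is a fixed-size tuple, while the journey is a list that grows as you move.

```python
# One instant: a fixed (x, y) pair.
position = (2.5, 4.0)

# A journey: a growing sequence of such positions.
journey = []
journey.append(position)
journey.append((3.0, 4.5))
journey.append((3.5, 5.0))

# Unpacking works naturally on each fixed-size tuple.
for x, y in journey:
    print(f"x={x}, y={y}")
```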


回答 3

关键区别在于元组是不可变的。这意味着一旦创建元组,就无法更改其值。

因此,如果您需要更改值,请使用列表。

对元组的好处:

  1. 性能略有改善。
  2. 由于元组是不可变的,因此可以将其用作字典中的键。
  3. 如果您无法更改它,那么其他任何人也不能更改它,也就是说,您无需担心任何API函数等。无需询问即可更改元组。

The key difference is that tuples are immutable. This means that you cannot change the values in a tuple once you have created it.

So if you’re going to need to change the values, use a list.

Benefits to tuples:

  1. Slight performance improvement.
  2. As a tuple is immutable it can be used as a key in a dictionary.
  3. If you can’t change it neither can anyone else, which is to say you don’t need to worry about any API functions etc. changing your tuple without being asked.

回答 4

列表是可变的;元组不是。

来自docs.python.org/2/tutorial/datastructures.html

元组是不可变的,通常包含一个异类元素序列,这些元素可以通过拆包(请参阅本节后面的内容)或索引(甚至在命名元组的情况下通过属性)进行访问。列表是可变的,并且它们的元素通常是同类的,并且可以通过遍历列表来访问。

Lists are mutable; tuples are not.

From docs.python.org/2/tutorial/datastructures.html

Tuples are immutable, and usually contain an heterogeneous sequence of elements that are accessed via unpacking (see later in this section) or indexing (or even by attribute in the case of namedtuples). Lists are mutable, and their elements are usually homogeneous and are accessed by iterating over the list.


回答 5

被提及的差异主要语义:人们期待一个元组和列表来表示不同的信息。但这远远超出了指导原则。有些库实际上根据传递的内容而有所不同。以NumPy为例(从我要求更多示例的另一篇文章中复制):

>>> import numpy as np
>>> a = np.arange(9).reshape(3,3)
>>> a
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> idx = (1,1)
>>> a[idx]
4
>>> idx = [1,1]
>>> a[idx]
array([[3, 4, 5],
       [3, 4, 5]])

关键是,虽然NumPy可能不是标准库的一部分,但它是一个主要的 Python库,在NumPy列表和元组中是完全不同的东西。

It’s been mentioned that the difference is largely semantic: people expect a tuple and list to represent different information. But this goes further than a guideline; some libraries actually behave differently based on what they are passed. Take NumPy for example (copied from another post where I ask for more examples):

>>> import numpy as np
>>> a = np.arange(9).reshape(3,3)
>>> a
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> idx = (1,1)
>>> a[idx]
4
>>> idx = [1,1]
>>> a[idx]
array([[3, 4, 5],
       [3, 4, 5]])

The point is, while NumPy may not be part of the standard library, it’s a major Python library, and within NumPy lists and tuples are completely different things.


回答 6

列表用于循环,元组用于结构,即"%s %s" %tuple

列表通常是同质的,元组通常是异类的。

列表用于可变长度,元组用于固定长度。

Lists are for looping, tuples are for structures i.e. "%s %s" %tuple.

Lists are usually homogeneous, tuples are usually heterogeneous.

Lists are for variable length, tuples are for fixed length.
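
The three rules of thumb above can be seen side by side in a short sketch. The names and sample values are invented for illustration.

```python
# A tuple as a fixed-length record: each slot has its own meaning.
person = ("Ada", "Lovelace")
greeting = "%s %s" % person

# A list as a variable-length collection of like items.
names = ["Ada Lovelace", "Alan Turing"]
names.append("Grace Hopper")
for name in names:
    print(name)

print(greeting)
```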


回答 7

这是Python列表的示例:

my_list = [0,1,2,3,4]
top_rock_list = ["Bohemian Rhapsody","Kashmir","Sweet Emotion", "Fortunate Son"]

这是Python元组的示例:

my_tuple = (a,b,c,d,e)
celebrity_tuple = ("John", "Wayne", 90210, "Actor", "Male", "Dead")

Python列表和元组的相似之处在于它们都是值的有序集合。除了使用括号“ […,…]”创建列表的浅层差异以及使用括号“(…,…)”创建的元组之外,它们之间的核心技术“用Python语法进行硬编码”之间的差异是特定元组的元素是不可变的,而列表是可变的(…因此,只有元组是可哈希的,并且可以用作字典/哈希键!)。这就导致了它们的使用方式或不使用方式的差异(通过语法先验地实现)以及人们选择使用它们的方式上的差异(鼓励作为“最佳实践”,后验,这就是智能程序员所做的事情)。 人们赋予元素顺序。

对于元组,“顺序”仅表示存储信息的特定“结构”。在第一个字段中找到的值可以很容易地切换到第二个字段,因为每个值都提供跨两个不同维度或比例的值。它们为不同类型的问题提供答案,并且通常采用以下形式:对于给定的对象/对象,其属性是什么?对象/对象保持不变,属性不同。

对于列表,“顺序”表示顺序或方向。第二个元素必须位于第一个元素之后,因为它基于特定且通用的比例或维度位于第二位。这些元素是一个整体,并且通常针对一个给定属性的形式单个问题提供答案,对于给定的属性,这些对象/对象如何比较?属性保持不变,对象/主题不同。

有无数流行文化的人和不符合这些差异的程序员的例子,有无数人可能在主菜上使用色叉。一天结束后,一切都很好,通常都可以完成工作。

总结一些更好的细节

相似之处:

  1. 重复项 -元组和列表都允许重复项
  2. 索引,选择和切片 -元组和列表都使用括号内的整数值进行索引。因此,如果要给定列表或元组的前三个值,语法将是相同的:

    >>> my_list[0:3]
    [0,1,2]
    >>> my_tuple[0:3]
    [a,b,c]
  3. 比较和排序 -两个元组或两个列表都通过它们的第一个元素进行比较,如果有平局,则通过第二个元素进行比较,依此类推。在较早的元素显示出不同之后,不再关注后续元素。

    >>> [0,2,0,0,0,0]>[0,0,0,0,0,500]
    True
    >>> (0,2,0,0,0,0)>(0,0,0,0,0,500)
    True

区别: -先验,根据定义

  1. 语法 -列表使用[],元组使用()

  2. 可变性 -给定列表中的元素是可变的,给定元组中的元素不是可变的。

    # Lists are mutable:
    >>> top_rock_list
    ['Bohemian Rhapsody', 'Kashmir', 'Sweet Emotion', 'Fortunate Son']
    >>> top_rock_list[1]
    'Kashmir'
    >>> top_rock_list[1] = "Stairway to Heaven"
    >>> top_rock_list
    ['Bohemian Rhapsody', 'Stairway to Heaven', 'Sweet Emotion', 'Fortunate Son']
    
    # Tuples are NOT mutable:       
    >>> celebrity_tuple
    ('John', 'Wayne', 90210, 'Actor', 'Male', 'Dead')
    >>> celebrity_tuple[5]
    'Dead'
    >>> celebrity_tuple[5]="Alive"
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    TypeError: 'tuple' object does not support item assignment
  3. 哈希表(字典) -由于哈希表(字典)要求其键是可哈希的,因此是不可变的,因此只有元组可以用作字典键,而不能用作列表。

    #Lists CAN'T act as keys for hashtables(dictionaries)
    >>> my_dict = {[a,b,c]:"some value"}
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    TypeError: unhashable type: 'list'
    
    #Tuples CAN act as keys for hashtables(dictionaries)
    >>> my_dict = {("John","Wayne"): 90210}
    >>> my_dict
    {('John', 'Wayne'): 90210}

差异-后验用法

  1. 元素的均质性与异质性-通常,列表对象是同质的,而元组对象是异质的。也就是说,列表用于相同类型的对象/对象(例如所有总统候选人,所有歌曲或所有跑步者),而虽然不是强制的,但元组更多地用于异构对象。

  2. 循环与结构-尽管两者都允许循环(对于my_list中的x,…),但实际上对于列表而言才有意义。元组更适合于结构化和呈现信息(驻留在%s中的%s%s是%s,当前是%s%(“ John”,“ Wayne”,90210,“ Actor”,“ Dead”))

This is an example of Python lists:

my_list = [0,1,2,3,4]
top_rock_list = ["Bohemian Rhapsody","Kashmir","Sweet Emotion", "Fortunate Son"]

This is an example of Python tuple:

my_tuple = ("a", "b", "c", "d", "e")
celebrity_tuple = ("John", "Wayne", 90210, "Actor", "Male", "Dead")

Python lists and tuples are similar in that both are ordered collections of values. Besides the shallow difference that lists are created using brackets “[ … , … ]” and tuples using parentheses “( … , … )”, the core technical difference between them, hard-coded into the language, is that the elements of a particular tuple are immutable whereas lists are mutable (…so only tuples are hashable and can be used as dictionary/hash keys!). This gives rise to differences in how they can or can’t be used (enforced a priori by the language) and differences in how people choose to use them (encouraged as ‘best practices’ a posteriori; this is what smart programmers do). The main a posteriori difference between when tuples are used and when lists are used lies in the meaning people give to the order of elements.

For tuples, ‘order’ signifies nothing more than just a specific ‘structure’ for holding information. What values are found in the first field can easily be switched into the second field as each provides values across two different dimensions or scales. They provide answers to different types of questions and are typically of the form: for a given object/subject, what are its attributes? The object/subject stays constant, the attributes differ.

For lists, ‘order’ signifies a sequence or a directionality. The second element MUST come after the first element because it’s positioned in the 2nd place based on a particular and common scale or dimension. The elements are taken as a whole and mostly provide answers to a single question typically of the form, for a given attribute, how do these objects/subjects compare? The attribute stays constant, the object/subject differs.

There are countless examples of people in popular culture and programmers who don’t conform to these differences and there are countless people who might use a salad fork for their main course. At the end of the day, it’s fine and both can usually get the job done.

To summarize some of the finer details

Similarities:

  1. Duplicates – Both tuples and lists allow for duplicates
  2. Indexing, Selecting, & Slicing – Both tuples and lists index using integer values found within brackets. So, if you want the first 3 values of a given list or tuple, the syntax would be the same:

    >>> my_list[0:3]
    [0, 1, 2]
    >>> my_tuple[0:3]
    ('a', 'b', 'c')
    
  3. Comparing & Sorting – Two tuples or two lists are both compared by their first element, and if there is a tie, then by the second element, and so on. No further attention is paid to subsequent elements after earlier elements show a difference.

    >>> [0,2,0,0,0,0]>[0,0,0,0,0,500]
    True
    >>> (0,2,0,0,0,0)>(0,0,0,0,0,500)
    True
    

Differences: – A priori, by definition

  1. Syntax – Lists use [], tuples use ()

  2. Mutability – Elements in a given list are mutable, elements in a given tuple are NOT mutable.

    # Lists are mutable:
    >>> top_rock_list
    ['Bohemian Rhapsody', 'Kashmir', 'Sweet Emotion', 'Fortunate Son']
    >>> top_rock_list[1]
    'Kashmir'
    >>> top_rock_list[1] = "Stairway to Heaven"
    >>> top_rock_list
    ['Bohemian Rhapsody', 'Stairway to Heaven', 'Sweet Emotion', 'Fortunate Son']
    
    # Tuples are NOT mutable:       
    >>> celebrity_tuple
    ('John', 'Wayne', 90210, 'Actor', 'Male', 'Dead')
    >>> celebrity_tuple[5]
    'Dead'
    >>> celebrity_tuple[5]="Alive"
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    TypeError: 'tuple' object does not support item assignment
    
  3. Hashtables (Dictionaries) – Dictionary keys must be hashable. Lists are mutable and not hashable, so only tuples (whose elements are themselves hashable) can act as dictionary keys, not lists.

    #Lists CAN'T act as keys for hashtables(dictionaries)
    >>> my_dict = {["a","b","c"]:"some value"}
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    TypeError: unhashable type: 'list'
    
    #Tuples CAN act as keys for hashtables(dictionaries)
    >>> my_dict = {("John","Wayne"): 90210}
    >>> my_dict
    {('John', 'Wayne'): 90210}
    

Differences – A posteriori, in usage

  1. Homogeneity vs. Heterogeneity of Elements – Generally, list objects are homogeneous and tuple objects are heterogeneous. That is, lists are used for objects/subjects of the same type (like all presidential candidates, or all songs, or all runners), whereas tuples, although the language does not enforce it, are more for heterogeneous objects.

  2. Looping vs. Structures – Although both allow for looping (for x in my_list…), it only really makes sense to do it for a list. Tuples are more appropriate for structuring and presenting information ("%s %s residing in %s is an %s and presently %s" % ("John", "Wayne", 90210, "Actor", "Dead"))


回答 8

list的值可以随时更改,但是元组的值不能更改。

优点和缺点取决于使用。如果您拥有从未更改过的数据,则必须使用元组,否则list是最佳选择。

The values of a list can be changed at any time, but the values of a tuple can’t be changed.

The advantages and disadvantages depend on the use. If you have data that you never want to change, you should use a tuple; otherwise a list is the best option.


回答 9

列表和元组之间的区别

元组和列表在Python中似乎都是相似的序列类型。

  1. 文字语法

    我们使用括号()构造元组和方括号[ ]以获取新列表。另外,我们可以使用适当类型的调用来获取所需的结构-元组或列表。

    someTuple = (4,6)
    someList  = [2,6] 
  2. 变异性

    元组是不可变的,而列表是可变的。这是以下几点的基础。

  3. 内存使用情况

    由于可变性,您需要更多的内存用于列表,而更少的内存用于元组。

  4. 延伸

    您可以将新元素添加到元组和列表中,唯一的区别是将更改元组的ID(即,我们将有一个新的对象)。

  5. 散列

    元组可散列,而列表则不可。这意味着您可以将元组用作字典中的键。该列表不能用作字典中的键,而可以使用元组

    tup      = (1,2)
    list_    = [1,2] 
    
    c = {tup   : 1}     # ok
    c = {list_ : 1}     # error
  6. 语义学

    这一点是关于最佳实践的。您应该将元组用作异构数据结构,而列表则是同质序列。

Difference between list and tuple

Tuples and lists are both seemingly similar sequence types in Python.

  1. Literal syntax

    We use parentheses () to construct tuples and square brackets [ ] to get a new list. We can also call the appropriate type to get the required structure: tuple() or list().

    someTuple = (4,6)
    someList  = [2,6] 
    
  2. Mutability

    Tuples are immutable, while lists are mutable. This point is the basis for the following ones.

  3. Memory usage

    Because lists are mutable, they need extra memory to allow for growth; tuples need less.

  4. Extending

    You can add a new element to both tuples and lists with the only difference that the id of the tuple will be changed (i.e., we’ll have a new object).

  5. Hashing

    Tuples are hashable and lists are not. This means that you can use a tuple as a key in a dictionary, whereas a list can’t be used as a key.

    tup      = (1,2)
    list_    = [1,2] 
    
    c = {tup   : 1}     # ok
    c = {list_ : 1}     # error
    
  6. Semantics

    This point is more about best practice. You should use tuples as heterogeneous data structures, while lists are homogeneous sequences.


回答 10

列表旨在为同质序列,而元组为异构数据结构。

Lists are intended to be homogeneous sequences, while tuples are heterogeneous data structures.


回答 11

正如人们已经在这里回答的那样tuples,虽然lists可变但可变是不变的,但是使用元组有一个重要方面,我们必须记住

如果中tuple包含一个listdictionary内部,则即使它们tuple本身是不可变的,也可以更改它们。

例如,假设我们有一个元组,其中包含一个列表和一个字典,如下所示

my_tuple = (10,20,30,[40,50],{ 'a' : 10})

我们可以将列表的内容更改为

my_tuple[3][0] = 400
my_tuple[3][1] = 500

这使得新的元组看起来像

(10, 20, 30, [400, 500], {'a': 10})

我们也可以将元组中的字典更改为

my_tuple[4]['a'] = 500

这将使整个元组看起来像

(10, 20, 30, [400, 500], {'a': 500})

这是因为 listdictionary是对象,而这些对象并没有改变,而是其指向的内容。

因此,这些tuple遗物毫无exceptions地保持不变

People have already answered here that tuples are immutable while lists are mutable, but there is one important aspect of using tuples that we must remember:

If a tuple contains a list or a dictionary, those can be changed even though the tuple itself is immutable.

For example, let’s assume we have a tuple which contains a list and a dictionary:

my_tuple = (10,20,30,[40,50],{ 'a' : 10})

We can change the contents of the list:

my_tuple[3][0] = 400
my_tuple[3][1] = 500

which makes the tuple look like

(10, 20, 30, [400, 500], {'a': 10})

We can also change the dictionary inside the tuple:

my_tuple[4]['a'] = 500

which makes the overall tuple look like

(10, 20, 30, [400, 500], {'a': 500})

This happens because the tuple stores references to the list and dictionary objects. Those references do not change; only the contents of the objects they point to change.

So the tuple itself remains immutable, without exception.
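
The behaviour described in this answer can be verified with a short sketch. It also shows a related consequence: a tuple that holds a mutable object such as a list is not hashable, so it cannot be used as a dictionary key. The helper variable names are illustrative.

```python
my_tuple = (10, 20, 30, [40, 50], {"a": 10})
inner_list_id = id(my_tuple[3])

# Mutating the list in place is allowed: the tuple still refers to the same list object.
my_tuple[3][0] = 400
same_object = id(my_tuple[3]) == inner_list_id

# Rebinding a slot of the tuple itself is not allowed.
try:
    my_tuple[3] = [1, 2]
    rebind_failed = False
except TypeError:
    rebind_failed = True

# A tuple holding a list is not hashable, so it cannot be a dictionary key.
try:
    hash(my_tuple)
    hashable = True
except TypeError:
    hashable = False

print(same_object, rebind_failed, hashable)
```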


回答 12

PEP 484 -类型提示说,该类型的元素tuple可以单独输入; 这样你可以说Tuple[str, int, float]; 但是list,随着List键入类可以采取仅一种类型的参数:List[str],这提示了2的差异确实是,前者是异质的,而后者本质上是均匀的。

另外,标准库通常使用元组作为C会返回a的标准函数的返回值struct

PEP 484 (Type Hints) says that the elements of a tuple can be typed individually, so you can write Tuple[str, int, float]. A list, with the List typing class, takes only one type parameter: List[str]. This hints that the real difference between the two is that the former is heterogeneous, whereas the latter is intrinsically homogeneous.

Also, the standard library mostly uses a tuple as the return value of standard functions where C would return a struct.
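
A minimal sketch of both points, using the `typing` aliases named above. The aliases `Record` and `Names` and the functions `describe` and `shout` are hypothetical names chosen for this example; `time.gmtime` is used as one instance of a standard function whose struct-like result is also a tuple.

```python
import time
from typing import List, Tuple

# Each position of the tuple gets its own type.
Record = Tuple[str, int, float]
# A list takes one element type for all items.
Names = List[str]


def describe(record: Record) -> str:
    """Unpack a fixed-shape record into its three differently typed fields."""
    name, count, ratio = record
    return f"{name}: {count} ({ratio:.1f})"


def shout(names: Names) -> Names:
    """Apply the same operation to every item of a homogeneous list."""
    return [name.upper() for name in names]


# time.gmtime returns a struct-like result that is also a tuple.
epoch = time.gmtime(0)

print(describe(("spam", 3, 0.5)))
print(shout(["a", "b"]))
print(epoch.tm_year, isinstance(epoch, tuple))
```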


回答 13

正如人们已经提到的差异一样,我将写有关元组的原因。

为什么首选元组?

小元组的分配优化

为了减少内存碎片并加快分配速度,Python重用了旧的元组。如果不再需要一个元组,并且元组少于20个,而不是将其永久删除,Python会将其移至空闲列表。

一个空闲列表分为20组,其中每个组代表长度为n的0至20之间的元组列表。每个组最多可以存储2000个元组。第一个(零)组仅包含一个元素,代表一个空的元组。

>>> a = (1,2,3)
>>> id(a)
4427578104
>>> del a
>>> b = (1,2,4)
>>> id(b)
4427578104

在上面的示例中,我们可以看到a和b具有相同的ID。那是因为我们立即占领了一个在空闲列表中的被破坏的元组。

列表分配优化

由于可以修改列表,因此Python不会使用与元组相同的优化。但是,Python列表也有一个空闲列表,但仅用于空对象。如果GC删除或收集了一个空列表,则以后可以重复使用。

>>> a = []
>>> id(a)
4465566792
>>> del a
>>> b = []
>>> id(b)
4465566792

资料来源:https : //rushter.com/blog/python-lists-and-tuples/

为什么元组比列表高效?-> https://stackoverflow.com/a/22140115

Since people have already mentioned the differences, I will write about why to use tuples.

Why are tuples preferred?

Allocation optimization for small tuples

To reduce memory fragmentation and speed up allocations, Python reuses old tuples. If a tuple is no longer needed and has fewer than 20 items, then instead of deleting it permanently Python moves it to a free list.

The free list is divided into 20 groups, where each group holds tuples of a given length n between 0 and 20. Each group can store up to 2,000 tuples. The first (zero) group contains only 1 element and represents the empty tuple.

>>> a = (1,2,3)
>>> id(a)
4427578104
>>> del a
>>> b = (1,2,4)
>>> id(b)
4427578104

In the example above we can see that a and b have the same id. That is because b immediately reused the memory of the destroyed tuple a, which was on the free list.

Allocation optimization for lists

Since lists can be modified, Python does not use the same optimization as in tuples. However, Python lists also have a free list, but it is used only for empty objects. If an empty list is deleted or collected by GC, it can be reused later.

>>> a = []
>>> id(a)
4465566792
>>> del a
>>> b = []
>>> id(b)
4465566792

Source: https://rushter.com/blog/python-lists-and-tuples/

Why are tuples more efficient than lists? -> https://stackoverflow.com/a/22140115


回答 14

5.3文档中的方向引文元组和序列

尽管元组看起来类似于列表,但是它们通常用于不同的情况和不同的目的。元组是不可变的,并且通常包含异类元素序列,这些元素可以通过拆包(请参阅本节后面的内容)或索引(甚至在namedtuple的情况下通过属性)进行访问。列表是可变的,并且它们的元素通常是同质的,可以通过迭代列表来访问。

A direct quotation from the documentation, section 5.3, Tuples and Sequences:

Though tuples may seem similar to lists, they are often used in different situations and for different purposes. Tuples are immutable, and usually contain a heterogeneous sequence of elements that are accessed via unpacking (see later in this section) or indexing (or even by attribute in the case of namedtuples). Lists are mutable, and their elements are usually homogeneous and are accessed by iterating over the list.


回答 15

首先,它们都是Python中的非标量对象(也称为复合对象)。

  • 元组,元素的有序序列(可以包含任何对象,而不会出现别名问题)
    • 不可变的(元组,整数,浮点数,str)
    • 使用串联+(当然会创建全新的元组)
    • 索引编制
    • 切片
    • 单例(3,) # -> (3)而不是(3) # -> 3
  • 列表(其他语言的数组),值的有序序列
    • 可变的
    • 辛格尔顿 [3]
    • 克隆 new_array = origin_array[:]
    • 列表理解[x**2 for x in range(1,7)]给您 [1,4,9,16,25,36](不可读)

使用列表可能还会导致混淆错误(指向同一对象的两个不同路径)。

First of all, both are non-scalar objects (also known as compound objects) in Python.

  • Tuples, ordered sequence of elements (which can contain any object with no aliasing issue)
    • Immutable (tuple, int, float, str)
    • Concatenation using + (brand new tuple will be created of course)
    • Indexing
    • Slicing
    • Singleton: (3,) is a one-element tuple, whereas (3) is just the integer 3
  • List (Array in other languages), ordered sequence of values
    • Mutable
    • Singleton [3]
    • Cloning new_array = origin_array[:]
    • List comprehension [x**2 for x in range(1,7)] gives you [1,4,9,16,25,36] (Not readable)

Using a list may also cause an aliasing bug (two distinct paths pointing to the same object).
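
The aliasing issue and the slice-based cloning mentioned above can be sketched as follows. The variable names are illustrative only.

```python
# Two names bound to the same list: a change through one is visible through the other.
original = [1, 2, 3]
alias = original
alias.append(4)

# A slice copy creates a separate list, so the original is unaffected.
clone = original[:]
clone.append(5)

# "Changing" a tuple always builds a new object, so the other name keeps the old value.
point = (1, 2)
other = point
other += (3,)

print(original, clone, point, other)
```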


回答 16

列表是可变的,元组是不可变的。只要考虑这个例子。

a = ["1", "2", "ra", "sa"]    #list
b = ("1", "2", "ra", "sa")    #tuple

现在更改list和tuple的索引值。

a[2] = 1000
print a     #output : ['1', '2', 1000, 'sa']
b[2] = 1000
print b     #output : TypeError: 'tuple' object does not support item assignment.

因此证明了以下代码对元组无效,因为我们试图更新一个元组,这是不允许的。

Lists are mutable and tuples are immutable. Just consider this example.

a = ["1", "2", "ra", "sa"]    #list
b = ("1", "2", "ra", "sa")    #tuple

Now change index values of list and tuple.

a[2] = 1000
print(a)     #output : ['1', '2', 1000, 'sa']
b[2] = 1000  #raises TypeError: 'tuple' object does not support item assignment
print(b)     #never reached

This shows that the assignment above is invalid for a tuple: we attempted to update a tuple, which is not allowed.


回答 17

列表是可变的,元组是不可变的。可变项和不可变项之间的主要区别是在尝试附加项目时的内存使用情况。

创建变量时,会将一些固定内存分配给该变量。如果是列表,则分配的内存将大于实际使用的内存。例如,如果当前内存分配为100字节,则当您要追加第101个字节时,可能会另外分配100个字节(在这种情况下,总共为200个字节)。

但是,如果您知道不经常添加新元素,则应使用元组。元组精确分配所需的内存大小,从而节省了内存,尤其是在使用大容量内存块时。

Lists are mutable and tuples are immutable. One practical difference between the two is memory usage when you append items.

When you create a variable, some fixed memory is assigned to it. For a list, more memory is allocated than is actually used. E.g. if the current allocation is 100 bytes, then when you want to append the 101st byte, perhaps another 100 bytes will be allocated (200 bytes in total in this case).

However, if you know that you will not frequently add new elements, you should use tuples. A tuple is allocated exactly the memory it needs, and hence saves memory, especially when you use large blocks of memory.
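
The over-allocation described above can be observed on CPython with `sys.getsizeof`. The sketch below is an illustration under that assumption: the growth pattern is an implementation detail that varies between versions, so the code only checks that the list's size changes less often than once per append, and that a tuple grows by exactly one reference per element.

```python
import struct
import sys

pointer_size = struct.calcsize("P")  # bytes per object reference on this build

# Record the list's size after each append; it grows in occasional jumps.
items = []
sizes = []
for i in range(100):
    items.append(i)
    sizes.append(sys.getsizeof(items))
distinct_sizes = sorted(set(sizes))

# A tuple's size grows by exactly one reference per element.
empty = sys.getsizeof(())
hundred = sys.getsizeof(tuple(range(100)))

print(len(distinct_sizes), "distinct list sizes over 100 appends")
print((hundred - empty) // pointer_size, "references in the 100-item tuple")
```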


venv,pyvenv,pyenv,virtualenv,virtualenvwrapper,pipenv等有什么区别?

问题:venv,pyvenv,pyenv,virtualenv,virtualenvwrapper,pipenv等有什么区别?

Python 3.3在其标准库中包含了新软件包venv。它有什么作用?与似乎与regex匹配的所有其他软件包(py)?(v|virtual|pip)?env有何不同?

Python 3.3 includes in its standard library the new package venv. What does it do, and how does it differ from all the other packages that seem to match the regex (py)?(v|virtual|pip)?env?


回答 0

PyPI软件包不在标准库中:

  • virtualenv是一个非常流行的工具,可为Python库创建隔离的Python环境。如果您不熟悉此工具,我强烈建议您学习它,因为它是非常有用的工具,在本答案的其余部分中,我将对其进行比较。

    它的工作方式是在目录(例如:)中安装一堆文件env/,然后修改PATH环境变量以在其之前添加自定义bin目录(例如:)env/bin/。在完全相同的副本pythonpython3二进制文件放在这个目录中,但是Python编程寻找相对于其路径优先库,环境中的目录。它不是Python标准库的一部分,但是受到PyPA(Python包装管理局)的正式认可。激活后,您可以使用在虚拟环境中安装软件包pip

  • pyenv用于隔离Python版本。例如,您可能想针对Python 2.7、3.6、3.7和3.8测试代码,因此需要一种在它们之间切换的方法。一旦被激活,它的前缀PATH与环境变量~/.pyenv/shims,那里有专用的文件相匹配的Python命令(pythonpip)。这些不是Python附带命令的副本。它们是特殊的脚本,它们可以根据PYENV_VERSION环境变量,.python-version文件或~/.pyenv/version文件即时确定要运行哪个版本的Python 。pyenv使用命令,还可以简化下载和安装多个Python版本的过程pyenv install

  • pyenv-virtualenv是一个插件pyenv由同一作者的pyenv,允许你使用pyenvvirtualenv在同一时间方便。但是,如果您使用的是Python 3.3或更高版本,请pyenv-virtualenv尝试运行python -m venv它(如果有),而不是virtualenv。如果您不希望使用便利功能,则可以在不使用的情况下一起使用virtualenv和。pyenvpyenv-virtualenv

  • virtualenvwrappervirtualenv(参见docs)的一组扩展。它为您提供诸如mkvirtualenv,的命令,lssitepackages尤其是workon在不同virtualenv目录之间切换时。如果您需要多个virtualenv目录,此工具特别有用。

  • pyenv-virtualenvwrapperpyenv与作者相同的插件pyenv,可以方便地集成virtualenvwrapperpyenv

  • pipenv旨在结合Pipfilepipvirtualenv为在命令行一个命令。该virtualenv目录通常放置在中~/.local/share/virtualenvs/XXXXXX是项目目录路径的哈希值。这与不同virtualenv,后者的目录通常位于当前工作目录中。pipenv是指在开发Python应用程序(而不是库)时使用。还有的替代品pipenv,例如poetry,我将不在此处列出,因为该问题仅与名称相似的软件包有关。

标准库:

  • pyvenv是Python 3附带的脚本,但由于存在问题(更不用说混乱的名称了)而在Python 3.6中不推荐使用。在Python 3.6及更高版本中,确切的等效项是python3 -m venv

  • venv是Python 3附带的软件包,您可以使用它运行python3 -m venv(尽管出于某些原因,某些发行版将其分成了单独的发行版软件包,例如python3-venv在Ubuntu / Debian上)。它的作用与相同virtualenv,但仅具有部分功能(请参见此处的比较)。virtualenv继续比受欢迎venv,尤其是因为前者同时支持Python 2和3。

给初学者的建议:

这是我对初学者的个人建议:首先学习virtualenvpip,这些工具可在各种情况下与Python 2和3一起使用,并在需要时选择其他工具。

PyPI packages not in the standard library:

  • virtualenv is a very popular tool that creates isolated Python environments for Python libraries. If you’re not familiar with this tool, I highly recommend learning it, as it is a very useful tool, and I’ll be making comparisons to it for the rest of this answer.

    It works by installing a bunch of files in a directory (eg: env/), and then modifying the PATH environment variable to prefix it with a custom bin directory (eg: env/bin/). An exact copy of the python or python3 binary is placed in this directory, but Python is programmed to look for libraries relative to its path first, in the environment directory. It’s not part of Python’s standard library, but is officially blessed by the PyPA (Python Packaging Authority). Once activated, you can install packages in the virtual environment using pip.

  • pyenv is used to isolate Python versions. For example, you may want to test your code against Python 2.7, 3.6, 3.7 and 3.8, so you’ll need a way to switch between them. Once activated, it prefixes the PATH environment variable with ~/.pyenv/shims, where there are special files matching the Python commands (python, pip). These are not copies of the Python-shipped commands; they are special scripts that decide on the fly which version of Python to run based on the PYENV_VERSION environment variable, or the .python-version file, or the ~/.pyenv/version file. pyenv also makes the process of downloading and installing multiple Python versions easier, using the command pyenv install.

  • pyenv-virtualenv is a plugin for pyenv by the same author as pyenv, to allow you to use pyenv and virtualenv at the same time conveniently. However, if you’re using Python 3.3 or later, pyenv-virtualenv will try to run python -m venv if it is available, instead of virtualenv. You can use virtualenv and pyenv together without pyenv-virtualenv, if you don’t want the convenience features.

  • virtualenvwrapper is a set of extensions to virtualenv (see docs). It gives you commands like mkvirtualenv, lssitepackages, and especially workon for switching between different virtualenv directories. This tool is especially useful if you want multiple virtualenv directories.

  • pyenv-virtualenvwrapper is a plugin for pyenv by the same author as pyenv, to conveniently integrate virtualenvwrapper into pyenv.

  • pipenv aims to combine Pipfile, pip and virtualenv into one command on the command-line. The virtualenv directory typically gets placed in ~/.local/share/virtualenvs/XXX, with XXX being a hash of the path of the project directory. This is different from virtualenv, where the directory is typically in the current working directory. pipenv is meant to be used when developing Python applications (as opposed to libraries). There are alternatives to pipenv, such as poetry, which I won’t list here since this question is only about the packages that are similarly named.

Standard library:

  • pyvenv is a script shipped with Python 3 but deprecated in Python 3.6 as it had problems (not to mention the confusing name). In Python 3.6+, the exact equivalent is python3 -m venv.

  • venv is a package shipped with Python 3, which you can run using python3 -m venv (although for some reason some distros separate it out into a separate distro package, such as python3-venv on Ubuntu/Debian). It serves the same purpose as virtualenv, but only has a subset of its features (see a comparison here). virtualenv continues to be more popular than venv, especially since the former supports both Python 2 and 3.
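
As a rough illustration of what python3 -m venv does, the standard-library venv module can also be driven from Python. This is only a minimal sketch: the helper name make_env and the directory name demo-env are made up for the example, and with_pip=False is chosen here just to keep the run fast and offline.

```python
import tempfile
import venv
from pathlib import Path


def make_env(target):
    """Create a bare virtual environment at `target` (hypothetical helper)."""
    # with_pip=False skips bootstrapping pip, so nothing is downloaded.
    venv.create(target, with_pip=False)
    return Path(target)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = make_env(Path(tmp) / "demo-env")
        # pyvenv.cfg is the marker file written into every venv directory.
        print((env_dir / "pyvenv.cfg").is_file())
```

On distros that split venv into a separate package, creating an environment with pip may need that package installed first; the pip-less form above should not depend on it.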

Recommendation for beginners:

This is my personal recommendation for beginners: start by learning virtualenv and pip, tools which work with both Python 2 and 3 and in a variety of situations, and pick up other tools once you start needing them.


Answer 1

I would avoid virtualenv on Python 3.3+ and instead use venv, which ships with the standard library. To create a new virtual environment you would type:

$ python3 -m venv <MYVENV>  

virtualenv tries to copy the Python binary into the virtual environment’s bin directory. However, it does not update the library file links embedded in that binary, so if you build Python from source into a non-system directory with relative path names, the Python binary breaks. Since this is how you make a copy-distributable Python, it is a big flaw. BTW, to inspect embedded library file links on OS X, use otool. For example, from within your virtual environment, type:

$ otool -L bin/python
python:
    @executable_path/../Python (compatibility version 3.4.0, current version 3.4.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1238.0.0)

Consequently I would avoid virtualenvwrapper and pipenv. pyvenv is deprecated. pyenv seems to be used often where virtualenv is used but I would stay away from it also since I think venv also does what pyenv is built for.

venv creates virtual environments in the shell that are fresh and sandboxed, with user-installable libraries, and it’s multi-python safe. Fresh, because virtual environments start with only the standard libraries that ship with Python; you have to install any other libraries all over again with pip install while the virtual environment is active. Sandboxed, because none of these new library installs are visible outside the virtual environment, so you can delete the whole environment and start again without worrying about impacting your base Python install. User-installable libraries, because the virtual environment’s target folder is created without sudo in some directory you already own, so you won’t need sudo permissions to install libraries into it. Finally, it is multi-python safe, since when a virtual environment is activated, the shell only sees the Python version (3.4, 3.5, etc.) that was used to build that virtual environment.

pyenv is similar to venv in that it lets you manage multiple Python environments. However, with pyenv you can’t conveniently roll back library installs to some start state, and you will likely need admin privileges at some point to update libraries. So I think it is also best to use venv.

In the last couple of years I have found many problems in build systems (emacs packages, python standalone application builders, installers…) that ultimately come down to issues with virtualenv. I think python will be a better platform when we eliminate this additional option and only use venv.
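
To see the "sandboxed" and "multi-python safe" points from inside Python, one can compare the interpreter's prefixes. This is a small sketch rather than part of the original answer; the function name in_virtual_env is an illustrative choice, and it relies on the behaviour of venv-style environments, where sys.prefix and sys.base_prefix differ.

```python
import sys


def in_virtual_env():
    """Return True when the running interpreter belongs to a venv-style environment."""
    # In a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the interpreter it was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("prefix:     ", sys.prefix)
    print("base prefix:", sys.base_prefix)
    print("inside a virtual environment:", in_virtual_env())
```

Run once with the environment activated and once without; only the activated run should report True.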


Answer 2

I went down the pipenv rabbit hole (it’s a deep and dark hole indeed…), and since the last answer is over two years old, I felt it would be useful to update the discussion with the latest developments I’ve found on the topic of Python virtual environments.

DISCLAIMER:

This answer is NOT about continuing the raging debate over the merits of pipenv versus venv as virtual environment solutions; I make no endorsement of either. It’s about PyPA endorsing conflicting standards, and how future development of virtualenv promises to remove the need to make an either/or choice between them at all. I focused on these two tools precisely because they are the ones anointed by PyPA.

venv

As the OP notes, venv is a tool for virtualizing environments. It is NOT a third-party solution, but a native tool. PyPA endorses venv for creating VIRTUAL ENVIRONMENTS: “Changed in version 3.5: The use of venv is now recommended for creating virtual environments”.

pipenv

pipenv, like venv, can be used to create virtual environments, but it additionally rolls in package management and vulnerability checking functionality. Instead of using requirements.txt, pipenv delivers package management via Pipfile. As PyPA endorses pipenv for PACKAGE MANAGEMENT, that would seem to imply that Pipfile is meant to supplant requirements.txt.

HOWEVER: pipenv uses virtualenv as its tool for creating virtual environments, NOT venv, which is endorsed by PyPA as the go-to tool for creating virtual environments.

Conflicting Standards:

So if settling on a virtual environment solution wasn’t difficult enough, we now have PyPA endorsing two different tools which use different virtual environment solutions. The raging GitHub debate on venv vs virtualenv which highlights this conflict can be found here.

Conflict Resolution:

The GitHub debate referenced in the above link has steered virtualenv development in the direction of accommodating venv in future releases:

prefer built-in venv: if the target python has venv we’ll create the environment using that (and then perform subsequent operations on that to facilitate other guarantees we offer)

Conclusion:

So it looks like there will be some future convergence between the two rival virtual environment solutions, but as of now pipenv, which uses virtualenv, varies materially from venv.

Given the problems pipenv solves and the fact that PyPA has given its blessing, it appears to have a bright future. And if virtualenv delivers on its proposed development objectives, choosing a virtual environment solution should no longer be a case of either pipenv OR venv.


Answer 3

April 2020 Update

I was searching for the same thing when I came across this post. I think the question of which tool to use is quite confusing and difficult for new Python users like me. This is taken directly from the PyPA website regarding pipenv:

While this tutorial covers the pipenv project as a tool that focuses primarily on the needs of Python application development rather than Python library development, the project itself is currently working through several process and maintenance issues that are preventing bug fixes and new features from being published (with the entirety of 2019 passing without a new release). This means that in the near term, pipenv still suffers from several quirks and performance problems without a clear timeline for resolution of those issues.

While this remains the case, project maintainers are likely to want to investigate Other Tools for Application Dependency Management for use instead of, or together with, pipenv.

Assuming the April 2020 pipenv release goes ahead as planned, and the release after that also remains on track, then this caveat on the tutorial will be removed. If those releases don’t remain on track, then the tutorial itself will be removed, and replaced with a discussion page on the available dependency management options.


Get a list from pandas DataFrame column headers

Question: Get a list from pandas DataFrame column headers

I want to get a list of the column headers from a pandas DataFrame. The DataFrame will come from user input so I won’t know how many columns there will be or what they will be called.

For example, if I’m given a DataFrame like this:

>>> my_dataframe
    y  gdp  cap
0   1    2    5
1   2    3    9
2   8    7    2
3   3    4    7
4   6    7    7
5   4    8    3
6   8    2    8
7   9    9   10
8   6    6    4
9  10   10    7

I would want to get a list like this:

>>> header_list
['y', 'gdp', 'cap']

Answer 0

You can get the values as a list by doing:

list(my_dataframe.columns.values)

You can also simply use (as shown in Ed Chum’s answer):

list(my_dataframe)

Answer 1

There is a built-in method, which is the most performant:

my_dataframe.columns.values.tolist()

.columns returns an Index, .columns.values returns an array and this has a helper function .tolist to return a list.

If performance is not as important to you, Index objects define a .tolist() method that you can call directly:

my_dataframe.columns.tolist()

The difference in performance is obvious:

%timeit df.columns.tolist()
16.7 µs ± 317 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit df.columns.values.tolist()
1.24 µs ± 12.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

For those who hate typing, you can just call list on df, like so:

list(df)

Answer 2

Did some quick tests, and perhaps unsurprisingly the built-in version using dataframe.columns.values.tolist() is the fastest:

In [1]: %timeit [column for column in df]
1000 loops, best of 3: 81.6 µs per loop

In [2]: %timeit df.columns.values.tolist()
10000 loops, best of 3: 16.1 µs per loop

In [3]: %timeit list(df)
10000 loops, best of 3: 44.9 µs per loop

In [4]: %timeit list(df.columns.values)
10000 loops, best of 3: 38.4 µs per loop

(I still really like the list(dataframe) though, so thanks EdChum!)


Answer 3

It gets even simpler (as of pandas 0.16.0):

df.columns.tolist()

will give you the column names in a nice list.


Answer 4

>>> list(my_dataframe)
['y', 'gdp', 'cap']

To list the columns of a dataframe while in debugger mode, use a list comprehension:

>>> [c for c in my_dataframe]
['y', 'gdp', 'cap']

By the way, you can get a sorted list simply by using sorted:

>>> sorted(my_dataframe)
['cap', 'gdp', 'y']

Answer 5

Surprised I haven’t seen this posted so far, so I’ll just leave this here.

Extended Iterable Unpacking (python3.5+): [*df] and Friends

Unpacking generalizations (PEP 448) have been introduced with Python 3.5. So, the following operations are all possible.

df = pd.DataFrame('x', columns=['A', 'B', 'C'], index=range(5))
df

   A  B  C
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x
4  x  x  x 

If you want a list….

[*df]
# ['A', 'B', 'C']

Or, if you want a set,

{*df}
# {'A', 'B', 'C'}

Or, if you want a tuple,

*df,  # Please note the trailing comma
# ('A', 'B', 'C')

Or, if you want to store the result somewhere,

*cols, = df  # A wild comma appears, again
cols
# ['A', 'B', 'C']

… if you’re the kind of person who converts coffee to typing sounds, well, this is going to consume your coffee more efficiently ;)
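
These unpacking forms are not specific to pandas: they work on any iterable, and a DataFrame iterates over its column labels much as a dict iterates over its keys. The sketch below uses a plain dict named table as a stand-in for df (an assumption made purely so the example runs without pandas installed).

```python
# A plain dict stands in for the DataFrame; iterating it yields the keys,
# just as iterating a DataFrame yields the column labels.
table = {"A": [1, 2], "B": [3, 4], "C": [5, 6]}

as_list = [*table]       # list of labels, in insertion order
as_set = {*table}        # set of labels (unordered)
as_tuple = (*table,)     # tuple of labels; the trailing comma is required
*cols, = table           # starred assignment always produces a list

print(as_list)
print(as_tuple)
print(cols)
```

Swapping table for a real DataFrame should behave the same way, since only iteration is involved.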

P.S.: if performance is important, you will want to ditch the solutions above in favour of

df.columns.to_numpy().tolist()
# ['A', 'B', 'C']

This is similar to Ed Chum’s answer, but updated for v0.24 where .to_numpy() is preferred to the use of .values. See this answer (by me) for more information.

Visual Check
Since I’ve seen this discussed in other answers, you can utilise iterable unpacking (no need for explicit loops).

print(*df)
A B C

print(*df, sep='\n')
A
B
C

Critique of Other Methods

Don’t use an explicit for loop for an operation that can be done in a single line (List comprehensions are okay).

Next, using sorted(df) does not preserve the original order of the columns. For that, you should use list(df) instead.

Next, list(df.columns) and list(df.columns.values) are poor suggestions (as of the current version, v0.24). Both Index (returned by df.columns) and NumPy arrays (returned by df.columns.values) define a .tolist() method, which is faster and more idiomatic.

Lastly, listification i.e., list(df) should only be used as a concise alternative to the aforementioned methods for python <= 3.4 where extended unpacking is not available.


Answer 6

That’s available as my_dataframe.columns.


Answer 7

It’s interesting, but df.columns.values.tolist() is almost 3 times faster than df.columns.tolist(), even though I thought they were the same:

In [97]: %timeit df.columns.values.tolist()
100000 loops, best of 3: 2.97 µs per loop

In [98]: %timeit df.columns.tolist()
10000 loops, best of 3: 9.67 µs per loop

Answer 8

A DataFrame follows the dict-like convention of iterating over the “keys” of the objects.

my_dataframe.keys()

Create a list of keys/columns – object method to_list() and pythonic way

my_dataframe.keys().to_list()
list(my_dataframe.keys())

Basic iteration on a DataFrame returns column labels

[column for column in my_dataframe]

Do not convert a DataFrame into a list, just to get the column labels. Do not stop thinking while looking for convenient code samples.

xlarge = pd.DataFrame(np.arange(100000000).reshape(10000,10000))
list(xlarge) #compute time and memory consumption depend on dataframe size - O(N)
list(xlarge.keys()) #constant time operation - O(1)

Answer 9

In the Notebook

For data exploration in the IPython notebook, my preferred way is this:

sorted(df)

Which will produce an easy to read alphabetically ordered list.

In a code repository

In code I find it more explicit to do

df.columns

Because it tells others reading your code what you are doing.


Answer 10

%%timeit
final_df.columns.values.tolist()
948 ns ± 19.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%%timeit
list(final_df.columns)
14.2 µs ± 79.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%%timeit
list(final_df.columns.values)
1.88 µs ± 11.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%%timeit
final_df.columns.tolist()
12.3 µs ± 27.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%%timeit
list(final_df.head(1).columns)
163 µs ± 20.6 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Answer 11

As answered by Simeon Visser, you could do:

list(my_dataframe.columns.values) 

or

list(my_dataframe) # for less typing.

But I think the sweet spot is:

list(my_dataframe.columns)

It is explicit, at the same time not unnecessarily long.


Answer 12

For a quick, neat, visual check, try this:

for col in df.columns:
    print(col)

Answer 13

This gives us the names of columns in a list:

list(my_dataframe.columns)

Another function called tolist() can be used too:

my_dataframe.columns.tolist()

Answer 14

I feel the question deserves additional explanation.

As @fixxxer noted, the answer depends on the pandas version you are using in your project, which you can check with pd.__version__.

If, like me, you are for some reason using a version of pandas older than 0.16.0 (on Debian jessie I use 0.14.1), then you need to use:

df.keys().tolist() because there is no df.columns method implemented yet.

The advantage of this keys method is that it also works in newer versions of pandas, so it’s more universal.


Answer 15

n = []
for i in my_dataframe.columns:
    n.append(i)
print(n)

Answer 16

Even though the solution provided above is nice, I would also expect something like frame.column_names() to exist as a function in pandas. Since it does not, the following syntax may be a nice alternative. Calling the “tolist” function somehow preserves the feeling that you are using pandas in a proper way:

frame.columns.tolist() 

Answer 17

If the DataFrame happens to have an Index or MultiIndex and you want those included as column names too:

names = list(filter(None, df.index.names + df.columns.values.tolist()))

It avoids calling reset_index() which has an unnecessary performance hit for such a simple operation.

I’ve run into needing this more often because I’m shuttling data from databases where the dataframe index maps to a primary/unique key, but is really just another “column” to me. It would probably make sense for pandas to have a built-in method for something like this (totally possible I’ve missed it).


Answer 18

This solution lists all the columns of your object my_dataframe:

print(list(my_dataframe))

How to get the line count of a large file cheaply in Python?

Question: How to get the line count of a large file cheaply in Python?

I need to get a line count of a large file (hundreds of thousands of lines) in python. What is the most efficient way both memory- and time-wise?

At the moment I do:

def file_len(fname):
    with open(fname) as f:
        for i, l in enumerate(f):
            pass
    return i + 1

Is it possible to do any better?


Answer 0

You can’t get any better than that.

After all, any solution will have to read the entire file, figure out how many \n you have, and return that result.

Do you have a better way of doing that without reading the entire file? Not sure… The best solution will always be I/O-bound; the best you can do is make sure you don’t use unnecessary memory, but it looks like you have that covered.


Answer 1

One line, probably pretty fast:

num_lines = sum(1 for line in open('myfile.txt'))
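
The one-liner leaves closing the file to the garbage collector. A variant that closes it explicitly might look like the sketch below; the function name count_lines is an illustrative choice, and opening in binary mode is an assumption made here to avoid decoding the text.

```python
def count_lines(path):
    """Count lines by iterating over the file lazily, one line at a time."""
    # The with-block closes the file even if iteration raises.
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)
```

Unlike the enumerate-based function in the question, this returns 0 for an empty file instead of failing on an unassigned loop variable.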

Answer 2

I believe that a memory-mapped file will be the fastest solution. I tried four functions: the function posted by the OP (opcount); a simple iteration over the lines in the file (simplecount); readline with a memory-mapped file (mmap) (mapcount); and the buffer read solution offered by Mykola Kharechko (bufcount).

I ran each function five times, and calculated the average run-time for a 1.2 million-line text file.

Windows XP, Python 2.5, 2GB RAM, 2 GHz AMD processor

Here are my results:

mapcount : 0.465599966049
simplecount : 0.756399965286
bufcount : 0.546800041199
opcount : 0.718600034714

Edit: numbers for Python 2.6:

mapcount : 0.471799945831
simplecount : 0.634400033951
bufcount : 0.468800067902
opcount : 0.602999973297

So the buffer read strategy seems to be the fastest for Windows/Python 2.6

Here is the code:

from __future__ import with_statement
import time
import mmap
import random
from collections import defaultdict

def mapcount(filename):
    f = open(filename, "r+")
    buf = mmap.mmap(f.fileno(), 0)
    lines = 0
    readline = buf.readline
    while readline():
        lines += 1
    return lines

def simplecount(filename):
    lines = 0
    for line in open(filename):
        lines += 1
    return lines

def bufcount(filename):
    f = open(filename)                  
    lines = 0
    buf_size = 1024 * 1024
    read_f = f.read # loop optimization

    buf = read_f(buf_size)
    while buf:
        lines += buf.count('\n')
        buf = read_f(buf_size)

    return lines

def opcount(fname):
    with open(fname) as f:
        for i, l in enumerate(f):
            pass
    return i + 1


counts = defaultdict(list)

for i in range(5):
    for func in [mapcount, simplecount, bufcount, opcount]:
        start_time = time.time()
        assert func("big_file.txt") == 1209138
        counts[func].append(time.time() - start_time)

for key, vals in counts.items():
    # Parenthesised form works as both a Python 2 statement and a Python 3 call.
    print("%s : %s" % (key.__name__, sum(vals) / float(len(vals))))
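
The benchmark above targets Python 2.5/2.6. For readers on Python 3, a memory-mapped count could be sketched roughly as follows. The name mmap_line_count is made up for this example, and the read-only mapping (instead of the "r+" mode used above) is a change introduced here, not something the answer measured.

```python
import mmap


def mmap_line_count(path):
    """Count lines through a read-only memory map of the file."""
    with open(path, "rb") as handle:
        # Length 0 maps the whole file; mmap raises ValueError for an empty file.
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            lines = 0
            # readline() returns b"" once the end of the map is reached.
            while mapped.readline():
                lines += 1
            return lines
```

No timing claim is made for this variant; it would need to be measured on the target machine just like the functions above.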

Answer 3

I had to post this on a similar question until my reputation score jumped a bit (thanks to whoever bumped me!).

All of these solutions ignore one way to make this run considerably faster, namely by using the unbuffered (raw) interface, using bytearrays, and doing your own buffering. (This only applies in Python 3. In Python 2, the raw interface may or may not be used by default, but in Python 3, you’ll default into Unicode.)

Using a modified version of the timing tool, I believe the following code is faster (and marginally more pythonic) than any of the solutions offered:

def rawcount(filename):
    f = open(filename, 'rb')
    lines = 0
    buf_size = 1024 * 1024
    read_f = f.raw.read

    buf = read_f(buf_size)
    while buf:
        lines += buf.count(b'\n')
        buf = read_f(buf_size)

    return lines

Using a separate generator function, this runs a smidge faster:

def _make_gen(reader):
    b = reader(1024 * 1024)
    while b:
        yield b
        b = reader(1024*1024)

def rawgencount(filename):
    f = open(filename, 'rb')
    f_gen = _make_gen(f.raw.read)
    return sum( buf.count(b'\n') for buf in f_gen )

This can be done completely with generator expressions in-line using itertools, but it gets pretty weird looking:

from itertools import (takewhile,repeat)

def rawincount(filename):
    f = open(filename, 'rb')
    bufgen = takewhile(lambda x: x, (f.raw.read(1024*1024) for _ in repeat(None)))
    return sum( buf.count(b'\n') for buf in bufgen )

Here are my timings:

function      average, s  min, s   ratio
rawincount        0.0043  0.0041   1.00
rawgencount       0.0044  0.0042   1.01
rawcount          0.0048  0.0045   1.09
bufcount          0.008   0.0068   1.64
wccount           0.01    0.0097   2.35
itercount         0.014   0.014    3.41
opcount           0.02    0.02     4.83
kylecount         0.021   0.021    5.05
simplecount       0.022   0.022    5.25
mapcount          0.037   0.031    7.46
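
The timing tool itself is not shown in this answer. A minimal sketch of how a table like the one above could be produced, assuming a hypothetical helper named time_counters built on the standard timeit module (the helper name, the repeat counts, and the choice to compute the ratio from the minimum time are illustrative assumptions, not the original tool):

```python
import timeit


def time_counters(counters, filename, repeat=5, number=10):
    """Time each line-counting function; return (name, avg, min, ratio) rows.

    `counters` maps a label to a function that takes a filename.
    Hypothetical helper, not the timing tool used for the table above.
    """
    results = []
    for name, func in counters.items():
        # Each entry of `runs` is the total time for `number` calls.
        runs = timeit.repeat(lambda: func(filename), repeat=repeat, number=number)
        per_call = [t / number for t in runs]
        results.append((name, sum(per_call) / len(per_call), min(per_call)))

    results.sort(key=lambda row: row[2])  # fastest (by minimum time) first
    best = results[0][2]
    return [
        (name, avg, low, low / best if best else float('nan'))
        for name, avg, low in results
    ]
```

Calling time_counters({'rawcount': rawcount, 'rawgencount': rawgencount}, 'big.txt') would then return one row per function, with the fastest function at ratio 1.0.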

Answer 4

You could execute a subprocess and run wc -l filename

import subprocess

def file_len(fname):
    p = subprocess.Popen(['wc', '-l', fname], stdout=subprocess.PIPE, 
                                              stderr=subprocess.PIPE)
    result, err = p.communicate()
    if p.returncode != 0:
        raise IOError(err)
    return int(result.strip().split()[0])

Answer 5

Here is a Python program that uses the multiprocessing library to distribute the line counting across cores. In my test on an 8-core 64-bit Windows server, counting a 20-million-line file dropped from 26 seconds to 7 seconds. Note: not using memory mapping makes things much slower.

import multiprocessing, sys, time, os, mmap
import logging, logging.handlers

def init_logger(pid):
    console_format = 'P{0} %(levelname)s %(message)s'.format(pid)
    logger = logging.getLogger()  # New logger at root level
    logger.setLevel( logging.INFO )
    logger.handlers.append( logging.StreamHandler() )
    logger.handlers[0].setFormatter( logging.Formatter( console_format, '%d/%m/%y %H:%M:%S' ) )

def getFileLineCount( queues, pid, processes, file1 ):
    init_logger(pid)
    logging.info( 'start' )

    physical_file = open(file1, "r")
    #  mmap.mmap(fileno, length[, tagname[, access[, offset]]]

    m1 = mmap.mmap( physical_file.fileno(), 0, access=mmap.ACCESS_READ )

    #work out file size to divide up line counting

    fSize = os.stat(file1).st_size
    chunk = (fSize / processes) + 1

    lines = 0

    #get where I start and stop
    _seedStart = chunk * (pid)
    _seekEnd = chunk * (pid+1)
    seekStart = int(_seedStart)
    seekEnd = int(_seekEnd)

    if seekEnd < int(_seekEnd + 1):
        seekEnd += 1

    if _seedStart < int(seekStart + 1):
        seekStart += 1

    if seekEnd > fSize:
        seekEnd = fSize

    #find where to start
    if pid > 0:
        m1.seek( seekStart )
        #read next line
        l1 = m1.readline()  # need to use readline with memory mapped files
        seekStart = m1.tell()

    #tell previous rank my seek start to make their seek end

    if pid > 0:
        queues[pid-1].put( seekStart )
    if pid < processes-1:
        seekEnd = queues[pid].get()

    m1.seek( seekStart )
    l1 = m1.readline()

    while len(l1) > 0:
        lines += 1
        l1 = m1.readline()
        if m1.tell() > seekEnd or len(l1) == 0:
            break

    logging.info( 'done' )
    # add up the results
    if pid == 0:
        for p in range(1,processes):
            lines += queues[0].get()
        queues[0].put(lines) # the total lines counted
    else:
        queues[0].put(lines)

    m1.close()
    physical_file.close()

if __name__ == '__main__':
    init_logger( 'main' )
    if len(sys.argv) > 1:
        file_name = sys.argv[1]
    else:
        logging.fatal( 'parameters required: file-name [processes]' )
        exit()

    t = time.time()
    processes = multiprocessing.cpu_count()
    if len(sys.argv) > 2:
        processes = int(sys.argv[2])
    queues=[] # a queue for each process
    for pid in range(processes):
        queues.append( multiprocessing.Queue() )
    jobs=[]
    prev_pipe = 0
    for pid in range(processes):
        p = multiprocessing.Process( target = getFileLineCount, args=(queues, pid, processes, file_name,) )
        p.start()
        jobs.append(p)

    jobs[0].join() #wait for counting to finish
    lines = queues[0].get()

    logging.info( 'finished {} Lines:{}'.format( time.time() - t, lines ) )

Answer 6

A one-line solution that calls wc, similar to the wc-based answer above, using the more modern subprocess.check_output function:

import subprocess

def line_count(filename):
    return int(subprocess.check_output(['wc', '-l', filename]).split()[0])
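
wc is not available everywhere (a stock Windows install, for example). A minimal sketch of one way to handle that, assuming a hypothetical wrapper named line_count_portable that falls back to counting newline bytes in Python when wc is not on the PATH:

```python
import shutil
import subprocess


def line_count_portable(filename):
    """Count newline characters, using wc when available (hypothetical helper)."""
    wc = shutil.which('wc')
    if wc is not None:
        # wc -l prints "<count> <filename>"; the first field is the count.
        out = subprocess.check_output([wc, '-l', filename])
        return int(out.split()[0])

    # Fallback: count b'\n' in fixed-size binary chunks.
    count = 0
    with open(filename, 'rb') as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b''):
            count += chunk.count(b'\n')
    return count
```

Both branches count newline characters, so a final line without a trailing newline is not included in either case.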

Answer 7

I would use Python’s file object method readlines, as follows:

with open(input_file) as foo:
    lines = len(foo.readlines())

This opens the file, creates a list of lines in the file, counts the length of the list, saves that to a variable and closes the file again.


Answer 8

def file_len(full_path):
    """Count the number of lines in a file."""
    with open(full_path) as f:
        return sum(1 for line in f)

Answer 9

Here is what I use, seems pretty clean:

import subprocess

def count_file_lines(file_path):
    """
    Counts the number of lines in a file using wc utility.
    :param file_path: path to file
    :return: int, no of lines
    """
    num = subprocess.check_output(['wc', '-l', file_path])
    num = num.split()  # no argument: works on bytes (Python 3) and skips leading padding
    return int(num[0])

UPDATE: This is marginally faster than using pure python but at the cost of memory usage. Subprocess will fork a new process with the same memory footprint as the parent process while it executes your command.


Answer 10

This is the fastest thing I have found using pure python. You can use whatever amount of memory you want by setting buffer, though 2**16 appears to be a sweet spot on my computer.

from functools import partial

buffer = 2**16
with open(myfile) as f:
    print(sum(x.count('\n') for x in iter(partial(f.read, buffer), '')))

I found the answer in "Why is reading lines from stdin much slower in C++ than Python?" and tweaked it just a tiny bit. It's a very good read for understanding how to count lines quickly, though wc -l is still about 75% faster than anything else.


Answer 11

I got a small (4-8%) improvement with this version which re-uses a constant buffer so it should avoid any memory or GC overhead:

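lines = 0
buffer = bytearray(2048)
with open(filename, 'rb') as f:  # binary mode: readinto() fills a bytes buffer
    n = f.readinto(buffer)
    while n > 0:
        # count only the n bytes just read; the rest of the buffer is stale
        lines += buffer.count(b'\n', 0, n)
        n = f.readinto(buffer)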

You can play around with the buffer size and maybe see a little improvement.


Answer 12

Kyle’s answer

num_lines = sum(1 for line in open('my_file.txt'))

is probably best; an alternative is

num_lines =  len(open('my_file.txt').read().splitlines())

Here is a performance comparison of the two:

In [20]: timeit sum(1 for line in open('Charts.ipynb'))
100000 loops, best of 3: 9.79 µs per loop

In [21]: timeit len(open('Charts.ipynb').read().splitlines())
100000 loops, best of 3: 12 µs per loop

Answer 13

One line solution:

import os
os.system("wc -l  filename")  

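My snippet (note that os.system only prints wc's output to the terminal; its return value is the exit status, not the count):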

>>> os.system('wc -l *.txt')

0 bar.txt
1000 command.txt
3 test_file.txt
1003 total

Answer 14

Just to complete the above methods I tried a variant with the fileinput module:

import fileinput as fi   
def filecount(fname):
        for line in fi.input(fname):
            pass
        return fi.lineno()

I then ran all of the methods above on a 60-million-line file:

mapcount : 6.1331050396
simplecount : 4.588793993
opcount : 4.42918205261
filecount : 43.2780818939
bufcount : 0.170812129974

It is a bit of a surprise to me that fileinput is that slow and scales far worse than all the other methods.


Answer 15

For me, this variant is the fastest:

#!/usr/bin/env python

def main():
    f = open('filename')                  
    lines = 0
    buf_size = 1024 * 1024
    read_f = f.read # loop optimization

    buf = read_f(buf_size)
    while buf:
        lines += buf.count('\n')
        buf = read_f(buf_size)

    print(lines)

if __name__ == '__main__':
    main()

Reasons: buffered reads are faster than reading line by line, and str.count is also very fast.


Answer 16

This code is shorter and clearer. It’s probably the best way:

num_lines = open('yourfile.ext').read().count('\n')

Answer 17

I have modified the buffer case like this:

def CountLines(filename):
    f = open(filename)
    try:
        lines = 1
        buf_size = 1024 * 1024
        read_f = f.read # loop optimization
        buf = read_f(buf_size)

        # Empty file
        if not buf:
            return 0

        while buf:
            lines += buf.count('\n')
            buf = read_f(buf_size)

        return lines
    finally:
        f.close()

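Now empty files and a last line without a trailing \n are also handled. Note that because the count starts at 1, a file whose last line does end in \n is reported one line too high.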
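
Starting the count at 1 over-counts files whose last line does end in a newline. One way to handle both cases is to remember the last byte that was read. A minimal sketch, using a hypothetical function name count_lines; the 1 MiB buffer size is carried over from the code above, while opening the file in binary mode is an assumption:

```python
def count_lines(filename, buf_size=1024 * 1024):
    """Count lines, including a final line that lacks a trailing newline."""
    lines = 0
    last_byte = b''
    with open(filename, 'rb') as f:
        buf = f.read(buf_size)
        while buf:
            lines += buf.count(b'\n')
            last_byte = buf[-1:]  # slicing keeps this a bytes object
            buf = f.read(buf_size)

    # A non-empty file that does not end in a newline has one more line.
    if last_byte and last_byte != b'\n':
        lines += 1
    return lines
```

With this version an empty file gives 0, and a file containing "a\nb" or "a\nb\n" gives 2 in both cases.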


Answer 18

What about this?

import itertools

def file_len(fname):
    counts = itertools.count()
    with open(fname) as f:
        for _ in f:
            next(counts)
    return next(counts)

Answer 19

count = max(enumerate(open(filename), 1))[0]  # start at 1 so the last index is the line count


Answer 20

print(open('file.txt', 'r').read().count("\n") + 1)

Answer 21

def line_count(path):
    count = 0
    with open(path) as lines:
        for count, l in enumerate(lines, start=1):
            pass
    return count

Answer 22

If one wants to get the line count cheaply in Python in Linux, I recommend this method:

import os
print(os.popen("wc -l file_path").readline().split()[0])

Here file_path stands for the path to your file; it can be either an absolute or a relative path. Hope this helps.


Answer 23

How about this?

import fileinput
import sys

counter=0
for line in fileinput.input([sys.argv[1]]):
    counter+=1

fileinput.close()
print(counter)

Answer 24

How about this one-liner:

file_length = len(open('myfile.txt','r').read().split('\n'))

Timed as follows, it takes 0.003 sec on a 3900-line file:

def c():
  import time
  s = time.time()
  file_length = len(open('myfile.txt','r').read().split('\n'))
  print(time.time() - s)

Answer 25

def count_text_file_lines(path):
    with open(path, 'rt') as file:
        line_count = sum(1 for _line in file)
    return line_count

Answer 26

Simple method:

1)

>>> f = len(open("myfile.txt").readlines())
>>> f

430

2)

>>> f = open("myfile.txt").read().count('\n')
>>> f
430
>>>

3)

num_lines = len(list(open('myfile.txt')))

Answer 27

The result of opening a file is an iterator, which can be converted to a sequence, which has a length:

with open(filename) as f:
   return len(list(f))

This is more concise than your explicit loop, and avoids the enumerate.


Answer 28

You can use the subprocess module to call wc in the following way:

import subprocess
Number_lines = int( (subprocess.Popen( 'wc -l {0}'.format( Filename ), shell=True, stdout=subprocess.PIPE).stdout).readlines()[0].split()[0] )

where Filename is the absolute path of the file.


Answer 29

If the file can fit into memory, then

with open(fname, 'rb') as f:  # binary mode, so splitting on b'\n' works
    count = len(f.read().split(b'\n')) - 1

How do I parse XML in Python?

Question: How do I parse XML in Python?

I have many rows in a database that contains XML and I’m trying to write a Python script to count instances of a particular node attribute.

My tree looks like:

<foo>
   <bar>
      <type foobar="1"/>
      <type foobar="2"/>
   </bar>
</foo>

How can I access the attributes "1" and "2" in the XML using Python?


Answer 0

I suggest ElementTree. There are other compatible implementations of the same API, such as lxml, and cElementTree in the Python standard library itself; but, in this context, what they chiefly add is even more speed — the ease of programming part depends on the API, which ElementTree defines.

First build an Element instance root from the XML, e.g. with the XML function, or by parsing a file with something like:

import xml.etree.ElementTree as ET
root = ET.parse('thefile.xml').getroot()

Or any of the many other ways shown at ElementTree. Then do something like:

for type_tag in root.findall('bar/type'):
    value = type_tag.get('foobar')
    print(value)

And similar, usually pretty simple, code patterns.
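
Since the question is about counting occurrences of an attribute, the loop above can be combined with collections.Counter. A minimal sketch, assuming the XML arrives as a string (as it would from a database row) and using a hypothetical helper name count_attribute:

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_attribute(xml_text, path='bar/type', attribute='foobar'):
    """Tally the values of `attribute` on all elements matching `path`."""
    root = ET.fromstring(xml_text)  # parse from a string, not a file
    return Counter(
        element.get(attribute)  # None if the attribute is absent
        for element in root.findall(path)
    )


sample = """<foo>
   <bar>
      <type foobar="1"/>
      <type foobar="2"/>
   </bar>
</foo>"""

print(count_attribute(sample))  # Counter({'1': 1, '2': 1})
```

The path and attribute defaults come from the sample XML in the question; for other documents they would need to be adjusted.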


Answer 1

minidom is the quickest and pretty straightforward.

XML:

<data>
    <items>
        <item name="item1"></item>
        <item name="item2"></item>
        <item name="item3"></item>
        <item name="item4"></item>
    </items>
</data>

Python:

from xml.dom import minidom
xmldoc = minidom.parse('items.xml')
itemlist = xmldoc.getElementsByTagName('item')
print(len(itemlist))
print(itemlist[0].attributes['name'].value)
for s in itemlist:
    print(s.attributes['name'].value)

Output:

4
item1
item1
item2
item3
item4

Answer 2

You can use BeautifulSoup:

from bs4 import BeautifulSoup

x="""<foo>
   <bar>
      <type foobar="1"/>
      <type foobar="2"/>
   </bar>
</foo>"""

y=BeautifulSoup(x)
>>> y.foo.bar.type["foobar"]
u'1'

>>> y.foo.bar.findAll("type")
[<type foobar="1"></type>, <type foobar="2"></type>]

>>> y.foo.bar.findAll("type")[0]["foobar"]
u'1'
>>> y.foo.bar.findAll("type")[1]["foobar"]
u'2'

Answer 3

There are many options out there. cElementTree looks excellent if speed and memory usage are an issue. It has very little overhead compared to simply reading in the file using readlines.

The relevant metrics can be found in the table below, copied from the cElementTree website:

library                         time    space
xml.dom.minidom (Python 2.1)    6.3 s   80000K
gnosis.objectify                2.0 s   22000k
xml.dom.minidom (Python 2.4)    1.4 s   53000k
ElementTree 1.2                 1.6 s   14500k  
ElementTree 1.2.4/1.3           1.1 s   14500k  
cDomlette (C extension)         0.540 s 20500k
PyRXPU (C extension)            0.175 s 10850k
libxml2 (C extension)           0.098 s 16000k
readlines (read as utf-8)       0.093 s 8850k
cElementTree (C extension)  --> 0.047 s 4900K <--
readlines (read as ascii)       0.032 s 5050k   

As pointed out by @jfs, cElementTree comes bundled with Python:

  • Python 2: from xml.etree import cElementTree as ElementTree.
  • Python 3: from xml.etree import ElementTree (the accelerated C version is used automatically).
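
If memory is the main concern, ElementTree also offers incremental parsing. A minimal sketch using xml.etree.ElementTree.iterparse to tally attribute values while discarding element contents as it goes; the function name count_foobar is hypothetical, and the tag and attribute defaults are taken from the sample XML in the question:

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_foobar(source, tag='type', attribute='foobar'):
    """Stream through `source` (a filename or file object) and tally values."""
    counts = Counter()
    # 'end' events fire once an element has been read completely.
    for _event, element in ET.iterparse(source, events=('end',)):
        if element.tag == tag:
            counts[element.get(attribute)] += 1
        element.clear()  # drop text, attributes and children we no longer need
    return counts


sample = b'<foo><bar><type foobar="1"/><type foobar="2"/></bar></foo>'
print(count_foobar(io.BytesIO(sample)))  # Counter({'1': 1, '2': 1})
```

For small documents the plain parse-and-findall approach is simpler; this variant is only worth the extra code when the files are large.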

Answer 4

I suggest xmltodict for simplicity.

It parses your XML to an OrderedDict:

>>> e = """<foo>
...          <bar>
...              <type foobar="1"/>
...              <type foobar="2"/>
...          </bar>
...      </foo>"""

>>> import xmltodict
>>> result = xmltodict.parse(e)
>>> result

OrderedDict([(u'foo', OrderedDict([(u'bar', OrderedDict([(u'type', [OrderedDict([(u'@foobar', u'1')]), OrderedDict([(u'@foobar', u'2')])])]))]))])

>>> result['foo']

OrderedDict([(u'bar', OrderedDict([(u'type', [OrderedDict([(u'@foobar', u'1')]), OrderedDict([(u'@foobar', u'2')])])]))])

>>> result['foo']['bar']

OrderedDict([(u'type', [OrderedDict([(u'@foobar', u'1')]), OrderedDict([(u'@foobar', u'2')])])])

Answer 5

lxml.objectify is really simple.

Taking your sample text:

from lxml import objectify
from collections import defaultdict

count = defaultdict(int)

root = objectify.fromstring(text)

for item in root.bar.type:
    count[item.attrib.get("foobar")] += 1

print(dict(count))

Output:

{'1': 1, '2': 1}

Answer 6

Python has an interface to the expat XML parser.

xml.parsers.expat

It’s a non-validating parser, so XML that is well-formed but does not conform to its DTD will not be caught (malformed XML still raises an error). But if you know your file is correct, then this is pretty good: you’ll probably get the exact info you want, and you can discard the rest on the fly.

from xml.parsers import expat

stringofxml = """<foo>
    <bar>
        <type arg="value" />
        <type arg="value" />
        <type arg="value" />
    </bar>
    <bar>
        <type arg="value" />
    </bar>
</foo>"""
count = 0
def start(name, attr):
    global count
    if name == 'type':
        count += 1

p = expat.ParserCreate()
p.StartElementHandler = start
p.Parse(stringofxml)

print(count)  # prints 4

Answer 7

I might suggest declxml.

Full disclosure: I wrote this library because I was looking for a way to convert between XML and Python data structures without needing to write dozens of lines of imperative parsing/serialization code with ElementTree.

With declxml, you use processors to declaratively define the structure of your XML document and how to map between XML and Python data structures. Processors are used for both serialization and parsing, as well as for a basic level of validation.

Parsing into Python data structures is straightforward:

import declxml as xml

xml_string = """
<foo>
   <bar>
      <type foobar="1"/>
      <type foobar="2"/>
   </bar>
</foo>
"""

processor = xml.dictionary('foo', [
    xml.dictionary('bar', [
        xml.array(xml.integer('type', attribute='foobar'))
    ])
])

xml.parse_from_string(processor, xml_string)

Which produces the output:

{'bar': {'foobar': [1, 2]}}

You can also use the same processor to serialize data to XML

data = {'bar': {
    'foobar': [7, 3, 21, 16, 11]
}}

xml.serialize_to_string(processor, data, indent='    ')

Which produces the following output

<?xml version="1.0" ?>
<foo>
    <bar>
        <type foobar="7"/>
        <type foobar="3"/>
        <type foobar="21"/>
        <type foobar="16"/>
        <type foobar="11"/>
    </bar>
</foo>

If you want to work with objects instead of dictionaries, you can define processors to transform data to and from objects as well.

import declxml as xml

class Bar:

    def __init__(self):
        self.foobars = []

    def __repr__(self):
        return 'Bar(foobars={})'.format(self.foobars)


xml_string = """
<foo>
   <bar>
      <type foobar="1"/>
      <type foobar="2"/>
   </bar>
</foo>
"""

processor = xml.dictionary('foo', [
    xml.user_object('bar', Bar, [
        xml.array(xml.integer('type', attribute='foobar'), alias='foobars')
    ])
])

xml.parse_from_string(processor, xml_string)

Which produces the following output

{'bar': Bar(foobars=[1, 2])}

Answer 8

Just to add another possibility, you can use untangle, a simple xml-to-python-object library. Here is an example:

Installation:

pip install untangle

Usage:

Your XML file (a little bit changed):

<foo>
   <bar name="bar_name">
      <type foobar="1"/>
   </bar>
</foo>

Accessing the attributes with untangle:

import untangle

obj = untangle.parse('/path_to_xml_file/file.xml')

print(obj.foo.bar['name'])
print(obj.foo.bar.type['foobar'])

The output will be:

bar_name
1

More information about untangle can be found in "untangle".

Also, if you are curious, you can find a list of tools for working with XML and Python in "Python and XML". You will also see that the most common ones were mentioned by previous answers.


Answer 9

Here is some very simple but effective code using cElementTree.

import sys

try:
    # The C accelerator alias; it was removed in Python 3.9
    import xml.etree.cElementTree as ET
except ImportError:
    # Modern Python: ElementTree uses the C implementation automatically
    import xml.etree.ElementTree as ET

def exit_err(message):
    # Print the message to stderr and stop with a non-zero exit code
    sys.exit(message)

def find_in_tree(tree, node):
    found = tree.find(node)
    if found is None:
        print("No %s in file" % node)
        found = []
    return found

# Parse an XML file (specify the path)
def_file = "xml_file_name.xml"
try:
    dom = ET.parse(def_file)
    root = dom.getroot()
except (OSError, ET.ParseError):
    exit_err("Unable to open and parse input definition file: " + def_file)

# Find the first 'myNode' element; iterating over it yields its child nodes
fwdefs = find_in_tree(root, "myNode")

This is from “python xml parse”.
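To see how a lookup helper like this behaves for both present and missing nodes without needing a file on disk, here is a minimal sketch that uses only the standard library. The sample document and the names sample_xml and find_or_empty are made up for illustration and are not part of the original answer.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, parsed from a string instead of a file
sample_xml = "<root><myNode><child id='a'/><child id='b'/></myNode></root>"
root = ET.fromstring(sample_xml)

def find_or_empty(tree, node):
    """Return the first element matching node, or an empty list if there is none."""
    found = tree.find(node)
    # Compare with None explicitly: truth-testing an element depends on
    # whether it has children, so "if not found" would be unreliable
    return [] if found is None else found

# Iterating over an element yields its direct children
child_ids = [child.get("id") for child in find_or_empty(root, "myNode")]
missing = list(find_or_empty(root, "noSuchNode"))

print(child_ids)
print(missing)
```

Returning an empty list for a missing node lets the caller loop over the result in both cases without a separate check.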


Answer 10

XML:

<foo>
   <bar>
      <type foobar="1"/>
      <type foobar="2"/>
   </bar>
</foo>

Python code:

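# xml.etree.cElementTree was removed in Python 3.9; ElementTree is the replacement
import xml.etree.ElementTree as ET

tree = ET.parse("foo.xml")
root = tree.getroot()
print(root.tag)

# Print every attribute value of each <type> element under <bar>
for elem in root.findall("./bar/type"):
    for value in elem.attrib.values():
        print(value)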

Output:

foo
1
2

Answer 11

xml.etree.ElementTree vs. lxml

These are some advantages of the two most commonly used libraries that I would have liked to know before choosing between them.

xml.etree.ElementTree:

  1. Part of the standard library: no need to install any module.

lxml:

  1. Easy XML declarations: for instance, do you need to add standalone="no"?
  2. Pretty printing: you can get nicely indented XML without extra code.
  3. Objectify functionality: it lets you work with XML as if it were a normal Python object hierarchy (.node).
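For comparison, here is a rough sketch of what the standard-library side looks like when writing a document with a declaration and indentation. It assumes Python 3.9 or newer, where ET.indent is available; the element names are borrowed from the question's sample. As far as I know, ElementTree has no option for writing standalone="no", which is one of the lxml conveniences listed above.

```python
import xml.etree.ElementTree as ET

# Build the sample document from the question in memory
root = ET.Element("foo")
bar = ET.SubElement(root, "bar")
ET.SubElement(bar, "type", foobar="1")
ET.SubElement(bar, "type", foobar="2")

# ET.indent (Python 3.9+) inserts indentation whitespace in place;
# this is the extra step that lxml's pretty_print option saves you
ET.indent(root, space="   ")

# xml_declaration=True emits the <?xml ...?> header
xml_bytes = ET.tostring(root, encoding="utf-8", xml_declaration=True)
xml_text = xml_bytes.decode("utf-8")
print(xml_text)
```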

Answer 12

import xml.etree.ElementTree as ET
data = '''<foo>
           <bar>
               <type foobar="1"/>
               <type foobar="2"/>
          </bar>
       </foo>'''
root = ET.fromstring(data)  # fromstring returns the root element, <foo>
for item in root.findall('bar/type'):
    print(item.get('foobar'))

This prints the value of the foobar attribute of each type element.


Answer 13

I find Python’s xml.dom and xml.dom.minidom quite easy to use. Keep in mind that DOM isn’t well suited to large amounts of XML, because the whole document is held in memory, but if your input is fairly small it will work fine.
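As a minimal sketch of the minidom approach, here is how the sample document from the question could be read. The variable names are my own; parseString, getElementsByTagName and getAttribute are part of the standard xml.dom.minidom API.

```python
from xml.dom import minidom

data = """<foo>
   <bar>
      <type foobar="1"/>
      <type foobar="2"/>
   </bar>
</foo>"""

# parseString builds the whole DOM tree in memory
dom = minidom.parseString(data)

# getElementsByTagName searches all descendants in document order
values = [node.getAttribute("foobar") for node in dom.getElementsByTagName("type")]
print(values)
```

For a file on disk, minidom.parse(path) would take the place of parseString.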


Answer 14

There’s no need to use a library-specific API if you use python-benedict. Just initialize a new instance from your XML and work with it like any dictionary, since it is a dict subclass.

Installation is easy: pip install python-benedict

from benedict import benedict as bdict

# the data source can be a URL, a file path or a data string (as in this example)
data_source = """
<foo>
   <bar>
      <type foobar="1"/>
      <type foobar="2"/>
   </bar>
</foo>"""

data = bdict.from_xml(data_source)
# keypath notation is supported; the repeated <type> elements are parsed into a list
t_list = data['foo.bar.type']
for t in t_list:
   print(t['@foobar'])

It supports and normalizes I/O operations with many formats: Base64, CSV, JSON, TOML, XML, YAML and query-string.

It is well tested and open-source on GitHub.


Answer 15

# If the XML is available as a byte string, as shown below:
from lxml import etree

# Sample XML as a byte string that uses the namespace http://xmlns.abc.com
message = b'<?xml version="1.0" encoding="UTF-8"?>\r\n<pa:Process xmlns:pa="http://xmlns.abc.com">\r\n\t<pa:firsttag>SAMPLE</pa:firsttag></pa:Process>\r\n'

print('************message conversion and parsing starts*************')

message = message.decode('utf-8')
# Strip the XML declaration: lxml rejects str input that carries an encoding declaration
message = message.replace('<?xml version="1.0" encoding="UTF-8"?>\r\n', '')
message = message.replace('pa:Process>\r\n', 'pa:Process>')  # drop the trailing line break
print(message)

print('******Parsing starts*************')
parser = etree.XMLParser(remove_blank_text=True)  # discards ignorable whitespace; namespaces are kept
root = etree.fromstring(message, parser)  # the XML is parsed here
print('******Parsing completed************')

result = {}
for child in root:  # iterate over the root's children and store the values in a dictionary
    print(child.tag, child.text)
    print('****Deriving from xml tree*****')
    if child.tag == "{http://xmlns.abc.com}firsttag":  # tags keep their {namespace} prefix
        result["FIRST_TAG"] = child.text
        print(result)


### output
'''************message conversion and parsing starts*************
<pa:Process xmlns:pa="http://xmlns.abc.com">

    <pa:firsttag>SAMPLE</pa:firsttag></pa:Process>
******Parsing starts*************
******Parsing completed************
{http://xmlns.abc.com}firsttag SAMPLE
****Deriving from xml tree*****
{'FIRST_TAG': 'SAMPLE'}'''
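If installing lxml is not an option, roughly the same lookup can be sketched with the standard library. This is not the approach used in the answer above: it swaps in xml.etree.ElementTree and looks the tag up through a prefix-to-URI mapping instead of comparing the full {namespace}tag string. The name ns is my own choice.

```python
import xml.etree.ElementTree as ET

message = (b'<?xml version="1.0" encoding="UTF-8"?>\r\n'
           b'<pa:Process xmlns:pa="http://xmlns.abc.com">\r\n'
           b'\t<pa:firsttag>SAMPLE</pa:firsttag></pa:Process>\r\n')

# Bytes input may keep its XML declaration; the parser reads the encoding from it
root = ET.fromstring(message)

# Map a prefix of our choosing to the namespace URI used in the document
ns = {"pa": "http://xmlns.abc.com"}
first = root.find("pa:firsttag", ns)

result = {"FIRST_TAG": first.text}
print(root.tag)
print(result)
```

The prefix in the mapping does not have to match the one in the document; only the URI matters.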

Answer 16

If the source is an XML file like this sample:

<pa:Process xmlns:pa="http://sssss">
    <pa:firsttag>SAMPLE</pa:firsttag>
</pa:Process>

you can try the following code:

from lxml import etree

metadata = 'C:\\Users\\PROCS.xml'  # sample XML file; its contents are shown above
parser = etree.XMLParser(remove_blank_text=True)  # discards ignorable whitespace between tags
tree = etree.parse(metadata, parser)  # parse PROCS.xml
root = tree.getroot()  # the root element, pa:Process

# Strip the namespace from every tag: '{http://sssss}firsttag' becomes 'firsttag'
for elem in root.iter():
    if not hasattr(elem.tag, 'find'): continue  # skip comments and processing instructions
    i = elem.tag.find('}')
    if i >= 0:
        elem.tag = elem.tag[i+1:]

result = {}  # collected values
for elem in tree.iter():  # walk the whole tree
    if elem.tag == "firsttag":  # store the text of the matching tag in the dictionary
        result["FIRST_TAG"] = str(elem.text)
        print(result)

The output would be:

{'FIRST_TAG': 'SAMPLE'}
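The namespace-stripping step can also be sketched without modifying the tree and without lxml. The example below is an illustration under my own assumptions: it uses the standard library's xml.etree.ElementTree, parses the sample from a string instead of PROCS.xml, and introduces a hypothetical helper named local_name.

```python
import xml.etree.ElementTree as ET

sample = """<pa:Process xmlns:pa="http://sssss">
    <pa:firsttag>SAMPLE</pa:firsttag>
</pa:Process>"""

def local_name(tag):
    """Drop a leading '{namespace}' part from a tag name, if present."""
    return tag.rsplit("}", 1)[-1]

root = ET.fromstring(sample)

# Map each element's local name to its text, leaving the tree untouched
texts = {local_name(elem.tag): elem.text for elem in root.iter()}
print(texts["firsttag"])
```

Keying on the local name is convenient for small documents, but it discards the distinction between same-named tags from different namespaces.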