
Running Bash commands in Python

Question: Running Bash commands in Python


On my local machine, I run a python script which contains this line

bashCommand = "cwm --rdf test.rdf --ntriples > test.nt"
os.system(bashCommand)

This works fine.

Then I run the same code on a server and I get the following error message

'import site' failed; use -v for traceback
Traceback (most recent call last):
File "/usr/bin/cwm", line 48, in <module>
from swap import  diag
ImportError: No module named swap

So what I did then is I inserted a print bashCommand, which prints the command in the terminal before it runs it with os.system().

Of course, I get the error again (caused by os.system(bashCommand)), but before that error it prints the command in the terminal. Then I just copied that output, pasted it into the terminal, hit enter, and it works…

Does anyone have a clue what’s going on?


Answer 0


Don’t use os.system. It is discouraged in favor of subprocess. From the docs: “This module intends to replace several older modules and functions: os.system, os.spawn”.

Like in your case:

import subprocess

bashCommand = "cwm --rdf test.rdf --ntriples > test.nt"
# Caveat: .split() passes ">" and "test.nt" as literal arguments to cwm;
# the redirection only happens when a shell is involved (see answer 7).
process = subprocess.Popen(bashCommand.split(), stdout=subprocess.PIPE)
output, error = process.communicate()

Answer 1


To somewhat expand on the earlier answers here, there are a number of details which are commonly overlooked.

  • Prefer subprocess.run() over subprocess.check_call() and friends over subprocess.call() over subprocess.Popen() over os.system() over os.popen()
  • Understand and probably use text=True, aka universal_newlines=True.
  • Understand the meaning of shell=True or shell=False and how it changes quoting and the availability of shell conveniences.
  • Understand differences between sh and Bash
  • Understand how a subprocess is separate from its parent, and generally cannot change the parent.
  • Avoid running the Python interpreter as a subprocess of Python.

These topics are covered in some more detail below.

Prefer subprocess.run() or subprocess.check_call()

The subprocess.Popen() function is a low-level workhorse but it is tricky to use correctly and you end up copy/pasting multiple lines of code … which conveniently already exist in the standard library as a set of higher-level wrapper functions for various purposes, which are presented in more detail in the following.

Here’s a paragraph from the documentation:

The recommended approach to invoking subprocesses is to use the run() function for all use cases it can handle. For more advanced use cases, the underlying Popen interface can be used directly.

Unfortunately, the availability of these wrapper functions differs between Python versions.

  • subprocess.run() was officially introduced in Python 3.5. It is meant to replace all of the following.
  • subprocess.check_output() was introduced in Python 2.7 / 3.1. It is basically equivalent to subprocess.run(..., check=True, stdout=subprocess.PIPE).stdout
  • subprocess.check_call() was introduced in Python 2.5. It is basically equivalent to subprocess.run(..., check=True)
  • subprocess.call() was introduced in Python 2.4 in the original subprocess module (PEP-324). It is basically equivalent to subprocess.run(...).returncode
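As a rough sketch of those equivalences, using `sys.executable` as a stand-in command so the snippet is self-contained:

```python
import subprocess
import sys

# A child that simply exits successfully, standing in for a real command.
cmd = [sys.executable, "-c", "import sys; sys.exit(0)"]

# Older style: raises CalledProcessError on a nonzero exit status ...
subprocess.check_call(cmd)

# ... which is basically equivalent to the modern form:
result = subprocess.run(cmd, check=True)
print(result.returncode)  # 0
```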

High-level API vs subprocess.Popen()

The refactored and extended subprocess.run() is more logical and more versatile than the older legacy functions it replaces. It returns a CompletedProcess object which has various methods which allow you to retrieve the exit status, the standard output, and a few other results and status indicators from the finished subprocess.

subprocess.run() is the way to go if you simply need a program to run and return control to Python. For more involved scenarios (background processes, perhaps with interactive I/O with the Python parent program) you still need to use subprocess.Popen() and take care of all the plumbing yourself. This requires a fairly intricate understanding of all the moving parts and should not be undertaken lightly. The simpler Popen object represents the (possibly still-running) process which needs to be managed from your code for the remainder of the lifetime of the subprocess.

It should perhaps be emphasized that just subprocess.Popen() merely creates a process. If you leave it at that, you have a subprocess running concurrently alongside with Python, so a “background” process. If it doesn’t need to do input or output or otherwise coordinate with you, it can do useful work in parallel with your Python program.
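A minimal sketch of that “background process” idea; the child here is just another Python interpreter, chosen so the example is self-contained:

```python
import subprocess
import sys

# Popen returns immediately; the child keeps running alongside the parent.
child = subprocess.Popen(
    [sys.executable, "-c", "print('child done')"],
    stdout=subprocess.PIPE, text=True)

# ... the parent is free to do other work here ...

# Eventually collect the output and reap the process.
out, _ = child.communicate()
print(out.strip())  # child done
```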

Avoid os.system() and os.popen()

Since time eternal (well, since Python 2.5) the os module documentation has contained the recommendation to prefer subprocess over os.system():

The subprocess module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using this function.

The problems with system() are that it’s obviously system-dependent and doesn’t offer ways to interact with the subprocess. It simply runs, with standard output and standard error outside of Python’s reach. The only information Python receives back is the exit status of the command (zero means success, though the meaning of non-zero values is also somewhat system-dependent).
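To illustrate the difference, a sketch of what subprocess gives you that os.system() cannot (again using `sys.executable` as the child):

```python
import subprocess
import sys

# With os.system() both streams would go straight to the terminal;
# subprocess can capture them for the parent to inspect.
result = subprocess.run(
    [sys.executable, "-c",
     "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)

print(result.returncode)      # 0
print(result.stdout.strip())  # to stdout
print(result.stderr.strip())  # to stderr
```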

PEP-324 (which was already mentioned above) contains a more detailed rationale for why os.system is problematic and how subprocess attempts to solve those issues.

os.popen() used to be even more strongly discouraged:

Deprecated since version 2.6: This function is obsolete. Use the subprocess module.

However, since sometime in Python 3, it has been reimplemented to simply use subprocess, and redirects to the subprocess.Popen() documentation for details.

Understand and usually use check=True

You’ll also notice that subprocess.call() has many of the same limitations as os.system(). In regular use, you should generally check whether the process finished successfully, which subprocess.check_call() and subprocess.check_output() do (where the latter also returns the standard output of the finished subprocess). Similarly, you should usually use check=True with subprocess.run() unless you specifically need to allow the subprocess to return an error status.

In practice, with check=True or subprocess.check_*, Python will throw a CalledProcessError exception if the subprocess returns a nonzero exit status.

A common error with subprocess.run() is to omit check=True and be surprised when downstream code fails if the subprocess failed.

On the other hand, a common problem with check_call() and check_output() was that users who blindly used these functions were surprised when the exception was raised e.g. when grep did not find a match. (You should probably replace grep with native Python code anyway, as outlined below.)
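A sketch of that failure mode, simulating a command that exits with status 1 (as grep does when it finds no match):

```python
import subprocess
import sys

# A child that exits 1, like grep with no matching lines.
failing = [sys.executable, "-c", "raise SystemExit(1)"]

try:
    subprocess.run(failing, check=True)
except subprocess.CalledProcessError as exc:
    status = exc.returncode
    print("command failed with exit status", status)  # 1
```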

All things counted, you need to understand how shell commands return an exit code, and under what conditions they will return a non-zero (error) exit code, and make a conscious decision how exactly it should be handled.

Understand and probably use text=True aka universal_newlines=True

Since Python 3, strings internal to Python are Unicode strings. But there is no guarantee that a subprocess generates Unicode output, or strings at all.

(If the differences are not immediately obvious, Ned Batchelder’s Pragmatic Unicode is recommended, if not outright obligatory, reading. There is a 36-minute video presentation behind the link if you prefer, though reading the page yourself will probably take significantly less time.)

Deep down, Python has to fetch a bytes buffer and interpret it somehow. If it contains a blob of binary data, it shouldn’t be decoded into a Unicode string, because that’s error-prone and bug-inducing behavior – precisely the sort of pesky behavior which riddled many Python 2 scripts, before there was a way to properly distinguish between encoded text and binary data.

With text=True, you tell Python that you, in fact, expect back textual data in the system’s default encoding, and that it should be decoded into a Python (Unicode) string to the best of Python’s ability (usually UTF-8 on any moderately up to date system, except perhaps Windows?)

If that’s not what you request back, Python will just give you bytes strings in the stdout and stderr strings. Maybe at some later point you do know that they were text strings after all, and you know their encoding. Then, you can decode them.

normal = subprocess.run([external, arg],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE,
    check=True,
    text=True)
print(normal.stdout)

convoluted = subprocess.run([external, arg],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE,
    check=True)
# You have to know (or guess) the encoding
print(convoluted.stdout.decode('utf-8'))

Python 3.7 introduced the shorter and more descriptive and understandable alias text for the keyword argument which was previously somewhat misleadingly called universal_newlines.
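Python 3.7 also added `capture_output=True` as a shorthand for piping both streams, which combines nicely with `text=True`; a brief sketch:

```python
import subprocess
import sys

# capture_output=True (3.7+) is shorthand for
# stdout=subprocess.PIPE, stderr=subprocess.PIPE.
result = subprocess.run(
    [sys.executable, "-c", "print('captured')"],
    capture_output=True, text=True, check=True)
print(result.stdout.strip())  # captured
```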

Understand shell=True vs shell=False

With shell=True you pass a single string to your shell, and the shell takes it from there.

With shell=False you pass a list of arguments to the OS, bypassing the shell.

When you don’t have a shell, you save a process and get rid of a fairly substantial amount of hidden complexity, which may or may not harbor bugs or even security problems.

On the other hand, when you don’t have a shell, you don’t have redirection, wildcard expansion, job control, and a large number of other shell features.

A common mistake is to use shell=True and then still pass Python a list of tokens, or vice versa. This happens to work in some cases, but is really ill-defined and could break in interesting ways.

# XXX AVOID THIS BUG
buggy = subprocess.run('dig +short stackoverflow.com')

# XXX AVOID THIS BUG TOO
broken = subprocess.run(['dig', '+short', 'stackoverflow.com'],
    shell=True)

# XXX DEFINITELY AVOID THIS
pathological = subprocess.run(['dig +short stackoverflow.com'],
    shell=True)

correct = subprocess.run(['dig', '+short', 'stackoverflow.com'],
    # Probably don't forget these, too
    check=True, text=True)

# XXX Probably better avoid shell=True
# but this is nominally correct
fixed_but_fugly = subprocess.run('dig +short stackoverflow.com',
    shell=True,
    # Probably don't forget these, too
    check=True, text=True)

The common retort “but it works for me” is not a useful rebuttal unless you understand exactly under what circumstances it could stop working.
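If you have a command as a single string and want the shell=False list form, `shlex.split()` from the standard library understands shell quoting, unlike a naive `str.split()`:

```python
import shlex

command = "grep 'round-trip min/avg/max' results.txt"

# str.split() breaks the quoted argument apart:
print(command.split())
# ['grep', "'round-trip", "min/avg/max'", 'results.txt']  -- wrong

# shlex.split() keeps it as one token, which is what the OS should see:
print(shlex.split(command))
# ['grep', 'round-trip min/avg/max', 'results.txt']
```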

Refactoring Example

Very often, the features of the shell can be replaced with native Python code. Simple Awk or sed scripts should probably simply be translated to Python instead.

To partially illustrate this, here is a typical but slightly silly example which involves many shell features.

cmd = '''while read -r x;
   do ping -c 3 "$x" | grep 'round-trip min/avg/max'
   done <hosts.txt'''

# Trivial but horrible
results = subprocess.run(
    cmd, shell=True, universal_newlines=True, check=True,
    stdout=subprocess.PIPE)  # without this, results.stdout is None
print(results.stdout)

# Reimplement with shell=False
with open('hosts.txt') as hosts:
    for host in hosts:
        host = host.rstrip('\n')  # drop newline
        ping = subprocess.run(
             ['ping', '-c', '3', host],
             text=True,
             stdout=subprocess.PIPE,
             check=True)
        for line in ping.stdout.split('\n'):
             if 'round-trip min/avg/max' in line:
                 print('{}: {}'.format(host, line))

Some things to note here:

  • With shell=False you don’t need the quoting that the shell requires around strings. Putting quotes anyway is probably an error.
  • It often makes sense to run as little code as possible in a subprocess. This gives you more control over execution from within your Python code.
  • Having said that, complex shell pipelines are tedious and sometimes challenging to reimplement in Python.

The refactored code also illustrates just how much the shell really does for you with a very terse syntax — for better or for worse. Python says explicit is better than implicit but the Python code is rather verbose and arguably looks more complex than this really is. On the other hand, it offers a number of points where you can grab control in the middle of something else, as trivially exemplified by the enhancement that we can easily include the host name along with the shell command output. (This is by no means challenging to do in the shell, either, but at the expense of yet another diversion and perhaps another process.)

Common Shell Constructs

For completeness, here are brief explanations of some of these shell features, and some notes on how they can perhaps be replaced with native Python facilities.

  • Globbing aka wildcard expansion can be replaced with glob.glob() or very often with simple Python string comparisons like for file in os.listdir('.'): if not file.endswith('.png'): continue. Bash has various other expansion facilities like .{png,jpg} brace expansion and {1..100} as well as tilde expansion (~ expands to your home directory, and more generally ~account to the home directory of another user)
  • Shell variables like $SHELL or $my_exported_var can sometimes simply be replaced with Python variables. Exported shell variables are available as e.g. os.environ['SHELL'] (the meaning of export is to make the variable available to subprocesses — a variable which is not available to subprocesses will obviously not be available to Python running as a subprocess of the shell, or vice versa. The env= keyword argument to subprocess methods allows you to define the environment of the subprocess as a dictionary, so that’s one way to make a Python variable visible to a subprocess). With shell=False you will need to understand how to remove any quotes; for example, cd "$HOME" is equivalent to os.chdir(os.environ['HOME']) without quotes around the directory name. (Very often cd is not useful or necessary anyway, and many beginners omit the double quotes around the variable and get away with it until one day …)
  • Redirection allows you to read from a file as your standard input, and write your standard output to a file. grep 'foo' <inputfile >outputfile opens outputfile for writing and inputfile for reading, and passes its contents as standard input to grep, whose standard output then lands in outputfile. This is not generally hard to replace with native Python code.
  • Pipelines are a form of redirection. echo foo | nl runs two subprocesses, where the standard output of echo is the standard input of nl (on the OS level, in Unix-like systems, this is a single file handle). If you cannot replace one or both ends of the pipeline with native Python code, perhaps think about using a shell after all, especially if the pipeline has more than two or three processes (though look at the pipes module in the Python standard library or a number of more modern and versatile third-party competitors).
  • Job control lets you interrupt jobs, run them in the background, return them to the foreground, etc. The basic Unix signals to stop and continue a process are of course available from Python, too. But jobs are a higher-level abstraction in the shell which involve process groups etc which you have to understand if you want to do something like this from Python.
  • Quoting in the shell is potentially confusing until you understand that everything is basically a string. So ls -l / is equivalent to 'ls' '-l' '/' but the quoting around literals is completely optional. Unquoted strings which contain shell metacharacters undergo parameter expansion, whitespace tokenization and wildcard expansion; double quotes prevent whitespace tokenization and wildcard expansion but allow parameter expansions (variable substitution, command substitution, and backslash processing). This is simple in theory but can get bewildering, especially when there are several layers of interpretation (a remote shell command, for example).
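One of the bullets above mentions pipelines; for instance, `echo foo | nl` can be sketched without a shell by connecting two Popen objects (assuming a Unix-like system where `echo` and `nl` exist):

```python
import subprocess

# Connect the stdout of the first process to the stdin of the second,
# the way the shell does for `echo foo | nl`.
first = subprocess.Popen(["echo", "foo"], stdout=subprocess.PIPE)
second = subprocess.Popen(
    ["nl"], stdin=first.stdout, stdout=subprocess.PIPE, text=True)
first.stdout.close()  # let `first` receive SIGPIPE if `second` exits early
out, _ = second.communicate()
first.wait()
print(out)
```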

Understand differences between sh and Bash

subprocess runs your shell commands with /bin/sh unless you specifically request otherwise (except of course on Windows, where it uses the value of the COMSPEC variable). This means that various Bash-only features like arrays, [[ etc are not available.

If you need to use Bash-only syntax, you can pass in the path to the shell as executable='/bin/bash' (where of course if your Bash is installed somewhere else, you need to adjust the path).

subprocess.run('''
    # This for loop syntax is Bash only
    for((i=1;i<=$#;i++)); do
        # Arrays are Bash-only
        array[i]+=123
    done''',
    shell=True, check=True,
    executable='/bin/bash')

A subprocess is separate from its parent, and cannot change it

A somewhat common mistake is doing something like

subprocess.run('cd /tmp', shell=True)
subprocess.run('pwd', shell=True)  # Oops, doesn't print /tmp

The same thing will happen if the first subprocess tries to set an environment variable, which of course will have disappeared when you run another subprocess, etc.

A child process runs completely separate from Python, and when it finishes, Python has no idea what it did (apart from the vague indicators that it can infer from the exit status and output from the child process). A child generally cannot change the parent’s environment; it cannot set a variable, change the working directory, or, in so many words, communicate with its parent without cooperation from the parent.

The immediate fix in this particular case is to run both commands in a single subprocess;

subprocess.run('cd /tmp; pwd', shell=True)

though obviously this particular use case isn’t very useful; instead, use the cwd keyword argument, or simply os.chdir() before running the subprocess. Similarly, for setting a variable, you can manipulate the environment of the current process (and thus also its children) via

os.environ['foo'] = 'bar'

or pass an environment setting to a child process with

subprocess.run('echo "$foo"', shell=True, env={'foo': 'bar'})

(not to mention the obvious refactoring subprocess.run(['echo', 'bar']); but echo is a poor example of something to run in a subprocess in the first place, of course).

Don’t run Python from Python

This is slightly dubious advice; there are certainly situations where it does make sense or is even an absolute requirement to run the Python interpreter as a subprocess from a Python script. But very frequently, the correct approach is simply to import the other Python module into your calling script and call its functions directly.

If the other Python script is under your control, and it isn’t a module, consider turning it into one. (This answer is too long already so I will not delve into details here.)

If you need parallelism, you can run Python functions in subprocesses with the multiprocessing module. There is also threading which runs multiple tasks in a single process (which is more lightweight and gives you more control, but also more constrained in that threads within a process are tightly coupled, and bound to a single GIL.)


Answer 2


Call it with subprocess

import subprocess

# A single command string containing a redirect needs shell=True
subprocess.Popen("cwm --rdf test.rdf --ntriples > test.nt", shell=True)

The error you are getting seems to be because there is no swap module on the server; install swap on the server and run the script again.


Answer 3


You can also use the bash program itself, with the -c parameter, to execute the command:

import subprocess

bashCommand = "cwm --rdf test.rdf --ntriples > test.nt"
output = subprocess.check_output(['bash', '-c', bashCommand])

Answer 4


You can use subprocess, but I always felt that it was not a ‘Pythonic’ way of doing it. So I created Sultan (shameless plug) that makes it easy to run command line functions.

https://github.com/aeroxis/sultan


Answer 5


According to the error, you are missing a package named swap on the server. The /usr/bin/cwm script requires it. If you’re on Ubuntu/Debian, install python-swap using aptitude.


Answer 6


You can also use os.popen(). Example:

import os

command = os.popen('ls -al')
print(command.read())
print(command.close())

Output:

total 16
drwxr-xr-x 2 root root 4096 ago 13 21:53 .
drwxr-xr-x 4 root root 4096 ago 13 01:50 ..
-rw-r--r-- 1 root root 1278 ago 13 21:12 bot.py
-rw-r--r-- 1 root root   77 ago 13 21:53 test.py

None

Answer 7


To run the command without a shell, pass the command as a list and implement the redirection in Python using [subprocess]:

#!/usr/bin/env python
import subprocess

with open('test.nt', 'wb', 0) as file:
    subprocess.check_call("cwm --rdf test.rdf --ntriples".split(),
                          stdout=file)

Note: no > test.nt at the end. stdout=file implements the redirection.


To run the command using the shell in Python, pass the command as a string and enable shell=True:

#!/usr/bin/env python
import subprocess

subprocess.check_call("cwm --rdf test.rdf --ntriples > test.nt",
                      shell=True)

Here the shell is responsible for the output redirection (> test.nt is in the command).


To run a bash command that uses bashisms, specify the bash executable explicitly e.g., to emulate bash process substitution:

#!/usr/bin/env python
import subprocess

subprocess.check_call('program <(command) <(another-command)',
                      shell=True, executable='/bin/bash')

Answer 8


The pythonic way of doing this is using subprocess.Popen

subprocess.Popen takes a list where the first element is the command to be run followed by any command line arguments.

As an example:

import subprocess

args = ['echo', 'Hello!']
subprocess.Popen(args)  # same as running `echo Hello!` on the command line

args2 = ['echo', '-v', '"Hello Again"']
subprocess.Popen(args2)  # same as running `echo -v "Hello Again"` on the command line
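A caveat worth noting: splitting a command string with plain `.split()` (as earlier answers do) breaks any argument that contains spaces. The standard-library `shlex.split` respects shell-style quoting; a small illustrative sketch:

```python
import shlex

cmd = 'echo -v "Hello Again"'

# Naive whitespace splitting tears the quoted argument apart:
print(cmd.split())       # ['echo', '-v', '"Hello', 'Again"']

# shlex.split understands the quotes and keeps it as one argument:
print(shlex.split(cmd))  # ['echo', '-v', 'Hello Again']
```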

filename.whl 不是此平台支持的 wheel

问题:filename.whl 不是此平台支持的 wheel

我想安装scipy-0.15.1-cp33-none-win_amd64.whl已保存到本地驱动器的文件。我在用:

pip 6.0.8 from C:\Python27\Lib\site-packages
python 2.7.9 (default, Dec 10 2014, 12:28:03) [MSC v.1500 64 bit (AMD64)]

当我跑步时:

pip install scipy-0.15.1-cp33-none-win_amd64.whl

我收到以下错误:

scipy-0.15.1-cp33-none-win_amd64.whl is not supported wheel on this platform

我想知道是什么问题?

I would like to install scipy-0.15.1-cp33-none-win_amd64.whl that I have saved to local drive. I am using:

pip 6.0.8 from C:\Python27\Lib\site-packages
python 2.7.9 (default, Dec 10 2014, 12:28:03) [MSC v.1500 64 bit (AMD64)]

when I run:

pip install scipy-0.15.1-cp33-none-win_amd64.whl

I get the following error:

scipy-0.15.1-cp33-none-win_amd64.whl is not supported wheel on this platform

I would like to know what the problem is?


回答 0

cp33 意味着 CPython 3.3;您需要的是 scipy‑0.15.1‑cp27‑none‑win_amd64.whl。

cp33 means CPython 3.3; you need scipy‑0.15.1‑cp27‑none‑win_amd64.whl instead.
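To see which `cpXY` tag your own interpreter corresponds to, it can be derived from `sys.version_info` (a small sketch; the python tag embedded in the wheel's filename must match it):

```python
import sys

# Build the CPython tag of the running interpreter, e.g. "cp27" for
# Python 2.7 or "cp35" for Python 3.5.
tag = "cp{}{}".format(sys.version_info[0], sys.version_info[1])
print(tag)
```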


回答 1

这也可能是由于用过时的 pip 安装较新的 wheel 文件而引起的。

我非常困惑,因为我安装的是 numpy-1.10.4+mkl-cp27-cp27m-win_amd64.whl(来自此处),它绝对是与我的 Python 安装(Windows 64 位 Python 2.7.11)匹配的正确版本,但我仍收到了“not supported wheel on this platform”错误。

通过 python -m pip install --upgrade pip 升级 pip 解决了问题。

This can also be caused by using an out-of-date pip with a recent wheel file.

I was very confused, because I was installing numpy-1.10.4+mkl-cp27-cp27m-win_amd64.whl (from here), and it is definitely the correct version for my Python installation (Windows 64-bit Python 2.7.11). I got the “not supported wheel on this platform” error.

Upgrading pip with python -m pip install --upgrade pip solved it.


回答 2

安装scipy-0.17.0-cp35-none-win_amd64.whl时遇到相同的问题,我的Python版本是3.5。它返回了相同的错误消息:

 scipy-0.17.0-cp35-none-win_amd64.whl is not supported wheel on this platform.

我意识到amd64与Windows不相关,而与Python版本有关。实际上,我在64位Windows上使用32位Python。安装以下文件解决了该问题:

scipy-0.17.0-cp35-none-win32.whl

I had the same problem while installing scipy-0.17.0-cp35-none-win_amd64.whl and my Python version is 3.5. It returned the same error message:

 scipy-0.17.0-cp35-none-win_amd64.whl is not supported wheel on this platform.

I realized that amd64 is not about my Windows, but about the Python version. Actually I am using a 32 bit Python on a 64 bit Windows. Installing the following file solved the issue:

scipy-0.17.0-cp35-none-win32.whl
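To check the bitness of the interpreter itself (rather than that of the operating system), the pointer size is a reliable indicator; a quick sketch:

```python
import platform
import struct

# An 8-byte pointer means a 64-bit Python; 4 bytes means 32-bit.
# On a 64-bit Windows you may well be running a 32-bit Python.
bits = struct.calcsize("P") * 8
print(bits, platform.architecture()[0])
```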

回答 3

我遇到此问题是因为包的文件名有误(scipy-0.17.0-cp27-none-win_amd64 (1));删除“ (1)”、把文件名改回 scipy-0.17.0-cp27-none-win_amd64 后,问题得以解决。

I came across this problem because of the wrong name of my package (scipy-0.17.0-cp27-none-win_amd64 (1)); after I deleted the ‘(1)’ and renamed the package back to scipy-0.17.0-cp27-none-win_amd64, the problem was resolved.


回答 4

如果您是 Python 新手,请分步阅读;否则可直接跳到第 5 步。请按照以下方法在 64 位 Windows、64 位 Python 上安装 scipy 0.18.1。注意以下各项的版本:1. Python;2. Windows;3. numpy 和 scipy 的 .whl 文件版本;4. 先安装 numpy,再安装 scipy。

pip install FileName.whl
  1. numpy:http://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy ;scipy:http://www.lfd.uci.edu/~gohlke/pythonlibs/#scipy

注意文件名(我的意思是检查 cp 编号)。例如:scipy-0.18.1-cp35-cp35m-win_amd64.whl。要检查您的 pip 支持哪个 cp,请看下面第 2 点。

如果您正在使用.whl文件。可能会发生以下错误。

  1. 您正在使用pip版本7.1.0,但是版本8.1.2可用。

您应该考虑通过 'python -m pip install --upgrade pip' 命令进行升级

  1. 在此平台上不支持scipy-0.15.1-cp33-none-win_amd64.whl.whl

对于上述错误:启动Python(以我的情况为3.5),键入: import pip print(pip.pep425tags.get_supported())

输出:

[('cp35', 'cp35m', 'win_amd64'), ('cp35', 'none', 'win_amd64'), ('py3', 'none', 'win_amd64'), ('cp35', 'none', 'any'), ('cp3', 'none', 'any'), ('py35', 'none', 'any'), ('py3', 'none', 'any'), ('py34', 'none', 'any'), ('py33', 'none', 'any'), ('py32', 'none', 'any'), ('py31', 'none', 'any'), ('py30', 'none', 'any')]

在输出中,您将看到cp35在那里,因此为numpy和scipy下载cp35。欢迎进一步编辑!!!!

If you are totally new to python read step by step or go directly to 5th step directly. Follow the below method to install scipy 0.18.1 on Windows 64-bit , Python 64-bit . Be careful with the versions of 1. Python 2. Windows 3. .whl version of numpy and scipy files 4. First install numpy and then scipy.

pip install FileName.whl
  1. For numpy: http://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy ; for scipy: http://www.lfd.uci.edu/~gohlke/pythonlibs/#scipy

Be aware of the file name ( what I mean is check the cp no). Ex :scipy-0.18.1-cp35-cp35m-win_amd64.whl To check which cp is supported by your pip , go to point No 2 below.

If you are using .whl file . Following errors are likely to occur .

  1. You are using pip version 7.1.0, however version 8.1.2 is available.

You should consider upgrading via the 'python -m pip install --upgrade pip' command

  1. scipy-0.15.1-cp33-none-win_amd64.whl.whl is not supported wheel on this platform

For the above error : start Python(in my case 3.5), type : import pip print(pip.pep425tags.get_supported())

output :

[('cp35', 'cp35m', 'win_amd64'), ('cp35', 'none', 'win_amd64'), ('py3', 'none', 'win_amd64'), ('cp35', 'none', 'any'), ('cp3', 'none', 'any'), ('py35', 'none', 'any'), ('py3', 'none', 'any'), ('py34', 'none', 'any'), ('py33', 'none', 'any'), ('py32', 'none', 'any'), ('py31', 'none', 'any'), ('py30', 'none', 'any')]

In the output you will observe cp35 is there , so download cp35 for numpy as well as scipy. Further edits are most welcome !!!!


回答 5

将文件名更改为scipy-0.15.1-cp33-none-any.whl,然后运行以下命令:

pip install scipy-0.15.1-cp33-none-any.whl

它应该工作:-)

Change the filename to scipy-0.15.1-cp33-none-any.whl and then run this command:

pip install scipy-0.15.1-cp33-none-any.whl

It should work :-)


回答 6

首先,cp33意味着在系统上运行Python 3.3时将使用它。因此,如果您的系统上装有Python 2.7,请尝试安装cp27版本。

安装scipy-0.18.1-cp27-cp27m-win_amd64.whl,需要运行python 2.7和64位系统。

如果仍然收到错误消息“此平台上不支持scipy-0.18.1-cp27-cp27m-win_amd64.whl”,请使用Win32版本。我的意思是安装scipy-0.18.1-cp27-cp27m-win32.whl而不是第一个。这是因为您可能在64位系统上运行32位python。最后一步为我成功安装了scipy。

First of all, cp33 means that it is to be used when you have Python 3.3 running on your system. So if you have Python 2.7 on your system, try installing the cp27 version.

Installing scipy-0.18.1-cp27-cp27m-win_amd64.whl, needs a Python 2.7 running and a 64-bit system.

If you are still getting an error saying “scipy-0.18.1-cp27-cp27m-win_amd64.whl is not a supported wheel on this platform”, then go for the win32 version. By this I mean install scipy-0.18.1-cp27-cp27m-win32.whl instead of the first one. This is because you might be running a 32-bit python on a 64-bit system. The last step successfully installed scipy for me.


回答 7

请注意,所有平台要求均来自* .whl文件的名称

因此,在重命名 * .whl软件包时要非常小心。我偶尔将我新编译的tensorflow包从

tensorflow-1.11.0-cp36-cp36m-linux_x86_64.whl

tensorflow-1.11.0-cp36-cp36m-linux_x86_64_gpu.whl

只是想提醒自己有关gpu支持的问题,

tensorflow-1.11.0-cp36-cp36m-linux_x86_64_gpu.whl在此平台上不受支持。

这个错误折腾了我大约半小时。

Please do notice that all platform requirements are taken from the name of the *.whl file!

So be very careful with renaming of *.whl package. I occasionally renamed my newly compiled tensorflow package from

tensorflow-1.11.0-cp36-cp36m-linux_x86_64.whl

to

tensorflow-1.11.0-cp36-cp36m-linux_x86_64_gpu.whl

just to remind myself about gpu support and struggled with

tensorflow-1.11.0-cp36-cp36m-linux_x86_64_gpu.whl is not a supported wheel on this platform.

error for about half an hour.
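The naming scheme pip relies on is `{distribution}-{version}-{python tag}-{abi tag}-{platform tag}.whl`. A minimal sketch of pulling the tags back out of a filename (it ignores the optional build tag that can appear after the version):

```python
def parse_wheel_name(filename):
    """Split a wheel filename into its parts (no optional build tag)."""
    stem = filename[:-len(".whl")]
    dist, version, py_tag, abi_tag, plat_tag = stem.split("-")
    return dist, version, py_tag, abi_tag, plat_tag

print(parse_wheel_name("tensorflow-1.11.0-cp36-cp36m-linux_x86_64.whl"))
# ('tensorflow', '1.11.0', 'cp36', 'cp36m', 'linux_x86_64')

# Renaming the file changes what pip thinks the platform tag is:
print(parse_wheel_name("tensorflow-1.11.0-cp36-cp36m-linux_x86_64_gpu.whl")[-1])
# 'linux_x86_64_gpu' -- no longer a platform tag pip recognizes
```

This is why appending `_gpu` to the filename made pip reject an otherwise valid wheel.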


回答 8

我尝试安装scikit-image,但是即使我安装的python版本是2.7 32位,当我尝试安装.whl文件时也遇到以下错误。 scikit_image-0.12.3-cp27-cp27m-win32.whl is not a supported wheel on this platform.

但是我在错误消息之前也收到了此消息:

You are using pip version 7.1.0, however version 8.1.2 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.

然后,我运行了 python -m pip install --upgrade pip,之后 pip install scikit_image-0.12.3-cp27-cp27m-win32.whl 就正常工作了。希望这能帮到别人!

I tried to install scikit-image but got the following error when I tried to install the .whl file even though my installed version of python was 2.7 32-bit. scikit_image-0.12.3-cp27-cp27m-win32.whl is not a supported wheel on this platform.

However I also got this message before the error message:

You are using pip version 7.1.0, however version 8.1.2 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.

I then ran the command python -m pip install --upgrade pip and then pip install scikit_image-0.12.3-cp27-cp27m-win32.whl worked fine. I hope this can help someone!


回答 9

我在Windows 7 64bit上为python27安装64位版本时遇到了类似的问题。一切都是最新的,但我得到了消息

scipy-0.18.1-cp27-cp27m-win_amd64.whl is not supported wheel on this platform

然后我下载了 32 位的 whl,它起作用了。

pip install scipy-0.18.1-cp27-cp27m-win32.whl

我怀疑问题可能出在我用的不是 AMD 处理器而是 Intel 处理器,而 scipy 64 位版本的文件名末尾写的是 amd64。

I had similar problem, installing a 64-bit version for python27 on windows 7 64bit. Everything was up-to-date, yet I got the message

scipy-0.18.1-cp27-cp27m-win_amd64.whl is not supported wheel on this platform

Then I downloaded a 32-bit whl and it worked.

pip install scipy-0.18.1-cp27-cp27m-win32.whl

I suspect that the problem was that I don't have an AMD processor but rather an Intel one, and the scipy 64-bit version says amd64 at the end.


回答 10

检查事项:

  1. 确保您下载的是正确的版本,例如 cp27(适用于 Python 2.7)、cp36(适用于 Python 3.6)。
  2. 检查您的python是哪种架构(32位或64位)?(您可以通过打开python idle并输入来做到这一点)

    import platform  
    platform.architecture()

现在,下载与该位数对应的文件,而不必管您系统的体系结构。

  1. 检查您使用的文件名是否正确(即,不应在文件名后附加(1),如果您两次下载文件,可能会出现这种情况)

  2. 检查您的 pip 是否为最新。如果不是,可以使用

    python -m pip install --upgrade pip

Things to check:

  1. You are downloading proper version like cp27 (means for python 2.7) cp36(means for python 3.6).
  2. Check of which architecture (32 bit or 64 bit) your python is? (you can do it so by opening python idle and typing)

    import platform  
    platform.architecture()
    

Now download the file of that bit irrespective of your system architecture.

  1. Check whether you’re using the correct filename (i.e it should not be appended with (1) which might happen if you download the file twice)

  2. Check if your pip is updated or not. If not you can use

    python -m pip install --upgrade pip


回答 11

我正在IIS上使用Python34部署Flask。以下步骤对我有用

  1. 升级点
  2. 安装numpy的wheel文件
  3. pip install pandas

I’m deploying Flask using Python34 on IIS. The following steps worked for me

  1. Upgrade pip
  2. Install the wheel file for numpy
  3. pip install pandas

回答 12

对于将 dlib 安装到我的 Python [Python 3.6.9] 的情况,我发现把 WHL 文件名从 dlib-19.8.1-cp36-cp36m-win_amd64.whl 改为 dlib-19.8.1-cp36-none-any.whl 对我有效。

这是我运行pip install来安装dlib的方法:

pip3 install dlib-19.8.1-cp36-none-any.whl

但是,我仍然想知道是否可以通过pip命令安装WHL文件而不更改名称。

For my case with dlib installation into my python [Python 3.6.9], I have found that changing WHL file name from dlib-19.8.1-cp36-cp36m-win_amd64.whl to dlib-19.8.1-cp36-none-any.whl works for me.

Here is the way I run pip install to install dlib:

pip3 install dlib-19.8.1-cp36-none-any.whl

However, I still wonder whether there are any alternatives to install of WHL file by pip command without changing the name.


回答 13

尝试使用conda进行安装,似乎可以即时解析版本:
conda install scikit-learn

try conda for installation, seems to resolve versions on the fly:
conda install scikit-learn


回答 14

如果您的系统上有多个 Python(例如 2.7/3.4/3.5),就需要检查安装路径。:)

Simply if you have more than one python on your system for example 2.7/3.4/3.5, it’s necessary you check your installation path. :)


回答 15

我使用的是 Python 2.7 和 64 位 Windows 系统。我在执行 pip install lxml-3.8.0-cp27-cp27m-win_amd64.whl 时遇到了同样的错误。改为运行 pip install lxml 后,它自动检测并成功安装了 win32 版本(尽管我的系统是 64 位 Windows):

C:\Python27>pip install lxml
Collecting lxml
  Downloading lxml-3.8.0-cp27-cp27m-win32.whl (2.9MB)
    100% |################################| 2.9MB 20kB/s
Installing collected packages: lxml
Successfully installed lxml-3.8.0

因此,我会采用 @1man 的答案。

I am using Python 2.7 and a Windows 64-bit system. I was getting the same error while doing pip install lxml-3.8.0-cp27-cp27m-win_amd64.whl. Running pip install lxml instead auto-detected and successfully installed the win32 version (though my system is Windows 64-bit):

C:\Python27>pip install lxml
Collecting lxml
  Downloading lxml-3.8.0-cp27-cp27m-win32.whl (2.9MB)
    100% |################################| 2.9MB 20kB/s
Installing collected packages: lxml
Successfully installed lxml-3.8.0

So, I will go with @1man’s answer.


回答 16

在 Tensorflow 配置期间,我指定了 python3.6。但我系统上默认的 python 是 python2.7,因此我这里的 pip 指的是 2.7 的 pip。对我来说,

pip3 install /tmp/tensorflow_pkg/NAME.whl

做到了。

During Tensorflow configuration I specified python3.6. But default python on my system is python2.7. Thus pip in my case means pip for 2.7. For me

pip3 install /tmp/tensorflow_pkg/NAME.whl

did the trick.


回答 17

在我的情况下[Win64,Python 2.7,cygwin],问题是缺少一个gcc

使用 apt-cyg install gcc-core 之后,我就能用 pip2 wheel ... 自动构建并安装 wheel 了。

In my case [Win64, Python 2.7, cygwin] the issue was with a missing gcc.

Using apt-cyg install gcc-core enabled me to then use pip2 wheel ... to install my wheels automatically.


回答 18

最好检查您要把软件包安装到哪个 Python 版本。如果 wheel 是为 python3 构建的,而您的 Python 版本是 python2.x,就可能出现此错误。使用 pip 安装时,请遵循以下约定:

python2 -m pip install XXXXXX.whl #if .whl is for python2
python3 -m pip install XXXXXX.whl #if .whl is for python3

It’s better to check the version of python where you want to install your package. If the wheel was built for python3 and your python version is python2.x you may get this error. While installing using pip follow this convention

python2 -m pip install XXXXXX.whl #if .whl is for python2
python3 -m pip install XXXXXX.whl #if .whl is for python3

回答 19

就我而言,这与之前未安装 GDAL 内核有关。有关如何安装 GDAL 和 Basemap 库的指南,请访问:https://github.com/felipunky/GISPython/blob/master/README.md

In my case it had to do with not having installed previously the GDAL core. For a guide on how to install the GDAL and Basemap libraries go to: https://github.com/felipunky/GISPython/blob/master/README.md


回答 20

对我来说,当我按照自己 Python 版本的位数(而不是计算机的位数)来选择时,它就能工作。

我的 Python 是 32 位的,而我的计算机是 64 位的。这就是问题所在;改用 32 位版本后问题得以解决。

确切地说,这是我下载并为我工作的一个:

mysqlclient-1.3.13-cp37-cp37m-win32.whl

再强调一次,请确保按照 Python 的位数选择,而不是系统的位数。

For me, it worked when I selected the correct bit of my Python version, NOT the one of my computer version.

My Python is 32-bit, while my computer is 64-bit. That was the problem, and the 32-bit version fixed it.

To be exact, here is the one that I downloaded and worked for me:

mysqlclient-1.3.13-cp37-cp37m-win32.whl

Once again, just make sure to choose the bitness of your Python version, not that of your system.


回答 21

我尝试了上面的一堆东西,无济于事。

以前,我已升级到pip 18.1。

尝试时(对于pyFltk)保持以下错误:

from fltk import *

ImportError:DLL加载失败%1不是有效的Win32应用程序

我收到了各种各样的错误,有的说我的机器不支持该 *.whl 文件,有的说无法从 distutils 中删除正确的文件。

回看我的笔记,其中指出该 whl 文件:

pyFltk-1.3.3.1-cp36-cp36m-win_amd64.whl,但我一直收到上面的错误,所以……

需要 pip 9.0.3 才能安装。

我把 pip 的版本降级到 9.0.3:

pip install pip==9.0.3

并且.whl文件已正确安装。

这也涉及到:这里

I tried a bunch of the stuff above to no avail.

Previously, I upgraded to pip 18.1.

Kept getting the following error when trying (for pyFltk):

from fltk import *

ImportError: DLL load failed %1 is not a valid Win32 Application

I was getting all sorts of errors about the *.whl file not being supported by my machine or something about being unable to remove the correct files from distutils.

Went back to my notes and they indicated that the whl file:

pyFltk-1.3.3.1-cp36-cp36m-win_amd64.whl, but I kept getting the error above, so…

it required pip 9.0.3 to install.

I downgraded my version of pip to 9.0.3

pip install pip==9.0.3

and the .whl file installed properly.

This is also related to: here


回答 22

我正在尝试验证在Python 3.6上新创建的虚拟环境中此处指定的TensorFlow的安装。运行时:

pip install --ignore-installed --upgrade "/Users/Salman/Downloads/tensorflow-1.12.0-cp37-cp37m-macosx_10_13_x86_64.whl"

我收到错误和/或警告:

tensorflow-1.12.0-cp37-cp37m-macosx_10_13_x86_64.whl is not a supported wheel on this platform.

因为我之前已经从 pip 升级到了 pip3,所以只需把 pip 替换为 pip3,如下所示:

pip3 install --ignore-installed --upgrade "/Users/Salman/Downloads/tensorflow-1.12.0-cp37-cp37m-macosx_10_13_x86_64.whl"

它就像一个魅力!

I was trying to verify the installation of TensorFlow as specified here on a newly created virtual environment on Python 3.6. On running:

pip install --ignore-installed --upgrade "/Users/Salman/Downloads/tensorflow-1.12.0-cp37-cp37m-macosx_10_13_x86_64.whl"

I get the error and/or warning:

tensorflow-1.12.0-cp37-cp37m-macosx_10_13_x86_64.whl is not a supported wheel on this platform.

Since I had previously upgraded from pip to pip3, I simply replaced pip with pip3 as in:

pip3 install --ignore-installed --upgrade "/Users/Salman/Downloads/tensorflow-1.12.0-cp37-cp37m-macosx_10_13_x86_64.whl"

and it worked like a charm!


回答 23

我有同样的问题

我从https://pypi.org/project/pip/#files下载了最新的pip

然后…… pip install <<下载的文件位置>>

pygame和kivy安装成功了…谢谢… !!

I had the same problem

I downloaded latest pip from https://pypi.org/project/pip/#files

and then…. pip install << downloaded file location >>

then pygame and kivy installation worked… Thanks…!!


回答 24

好的,问题很简单。Tensorflow 需要 64 位的 Python 3.4–3.7,而我看到您正在使用的是 Python 2.7。

在此处阅读 tensorflow 安装说明:https://www.tensorflow.org/install/pip

Alright, the problem is simple. Tensorflow requires Python 3.4–3.7 and a 64-bit interpreter. I see that you’re using Python 2.7.

Read the tensorflow install instructions here: https://www.tensorflow.org/install/pip


如何对Python中的URL参数进行百分比编码?

问题:如何对Python中的URL参数进行百分比编码?

如果我做

url = "http://example.com?p=" + urllib.quote(query)
  1. 它不将 / 编码为 %2F(破坏 OAuth 规范化)
  2. 它不处理Unicode(引发异常)

有没有更好的图书馆?

If I do

url = "http://example.com?p=" + urllib.quote(query)
  1. It doesn’t encode / to %2F (breaks OAuth normalization)
  2. It doesn’t handle Unicode (it throws an exception)

Is there a better library?


回答 0

Python 2

文档

urllib.quote(string[, safe])

使用%xx转义符替换字符串中的特殊字符。字母,数字和字符“ _.-”都不会被引用。默认情况下,此函数用于引用URL的路径部分。可选的safe参数指定不应引用的其他字符- 其默认值为’/’

这意味着给 safe 参数传入空字符串 '' 将解决您的第一个问题:

>>> urllib.quote('/test')
'/test'
>>> urllib.quote('/test', safe='')
'%2Ftest'

关于第二个问题,有关于它的bug报告在这里。显然,它已在python 3中修复。您可以通过编码为utf8来解决此问题,如下所示:

>>> query = urllib.quote(u"Müller".encode('utf8'))
>>> print urllib.unquote(query).decode('utf8')
Müller

顺便看看urlencode

Python 3

相同,只是把 urllib.quote 替换为 urllib.parse.quote。

Python 2

From the docs:

urllib.quote(string[, safe])

Replace special characters in string using the %xx escape. Letters, digits, and the characters ‘_.-‘ are never quoted. By default, this function is intended for quoting the path section of the URL.The optional safe parameter specifies additional characters that should not be quoted — its default value is ‘/’

That means passing '' for safe will solve your first issue:

>>> urllib.quote('/test')
'/test'
>>> urllib.quote('/test', safe='')
'%2Ftest'

About the second issue, there is a bug report about it here. Apparently it was fixed in python 3. You can workaround it by encoding as utf8 like this:

>>> query = urllib.quote(u"Müller".encode('utf8'))
>>> print urllib.unquote(query).decode('utf8')
Müller

By the way have a look at urlencode

Python 3

The same, except replace urllib.quote with urllib.parse.quote.


回答 1

在 Python 3 中,urllib.quote 已移至 urllib.parse.quote,并且默认就能处理 unicode。

>>> from urllib.parse import quote
>>> quote('/test')
'/test'
>>> quote('/test', safe='')
'%2Ftest'
>>> quote('/El Niño/')
'/El%20Ni%C3%B1o/'

In Python 3, urllib.quote has been moved to urllib.parse.quote and it does handle unicode by default.

>>> from urllib.parse import quote
>>> quote('/test')
'/test'
>>> quote('/test', safe='')
'%2Ftest'
>>> quote('/El Niño/')
'/El%20Ni%C3%B1o/'

回答 2

我的答案类似于保罗的答案。

我认为模块requests要好得多。它基于urllib3。您可以尝试以下方法:

>>> from requests.utils import quote
>>> quote('/test')
'/test'
>>> quote('/test', safe='')
'%2Ftest'

My answer is similar to Paolo’s answer.

I think module requests is much better. It’s based on urllib3. You can try this:

>>> from requests.utils import quote
>>> quote('/test')
'/test'
>>> quote('/test', safe='')
'%2Ftest'

回答 3

如果您使用的是django,则可以使用urlquote:

>>> from django.utils.http import urlquote
>>> urlquote(u"Müller")
u'M%C3%BCller'

请注意,自发布此答案以来对Python的更改意味着它现在是旧版包装器。从django.utils.http的Django 2.1源代码中:

A legacy compatibility wrapper to Python's urllib.parse.quote() function.
(was used for unicode handling on Python 2)

If you’re using django, you can use urlquote:

>>> from django.utils.http import urlquote
>>> urlquote(u"Müller")
u'M%C3%BCller'

Note that changes to Python since this answer was published mean that this is now a legacy wrapper. From the Django 2.1 source code for django.utils.http:

A legacy compatibility wrapper to Python's urllib.parse.quote() function.
(was used for unicode handling on Python 2)

回答 4

这里最好使用 urlencode。对单个参数来说区别不大,但恕我直言代码会更清晰。(看到一个名为 quote_plus 的函数会让人困惑!尤其是对来自其他语言的人。)

In [21]: query='lskdfj/sdfkjdf/ksdfj skfj'

In [22]: val=34

In [23]: from urllib.parse import urlencode

In [24]: encoded = urlencode(dict(p=query,val=val))

In [25]: print(f"http://example.com?{encoded}")
http://example.com?p=lskdfj%2Fsdfkjdf%2Fksdfj+skfj&val=34

文件

urlencode:https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlencode

quote_plus:https://docs.python.org/3/library/urllib.parse.html#urllib.parse.quote_plus

It is better to use urlencode here. Not much difference for a single parameter, but IMHO it makes the code clearer. (It looks confusing to see a function called quote_plus, especially to those coming from other languages.)

In [21]: query='lskdfj/sdfkjdf/ksdfj skfj'

In [22]: val=34

In [23]: from urllib.parse import urlencode

In [24]: encoded = urlencode(dict(p=query,val=val))

In [25]: print(f"http://example.com?{encoded}")
http://example.com?p=lskdfj%2Fsdfkjdf%2Fksdfj+skfj&val=34

Docs

urlencode: https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlencode

quote_plus: https://docs.python.org/3/library/urllib.parse.html#urllib.parse.quote_plus


NameError:全局名称“ xrange”未在Python 3中定义

问题:NameError:全局名称“ xrange”未在Python 3中定义

运行python程序时出现错误:

Traceback (most recent call last):
  File "C:\Program Files (x86)\Wing IDE 101 4.1\src\debug\tserver\_sandbox.py", line 110, in <module>
  File "C:\Program Files (x86)\Wing IDE 101 4.1\src\debug\tserver\_sandbox.py", line 27, in __init__
  File "C:\Program Files (x86)\Wing IDE 101 4.1\src\debug\tserver\class\inventory.py", line 17, in __init__
builtins.NameError: global name 'xrange' is not defined

游戏来自这里。

是什么导致此错误?

I am getting an error when running a python program:

Traceback (most recent call last):
  File "C:\Program Files (x86)\Wing IDE 101 4.1\src\debug\tserver\_sandbox.py", line 110, in <module>
  File "C:\Program Files (x86)\Wing IDE 101 4.1\src\debug\tserver\_sandbox.py", line 27, in __init__
  File "C:\Program Files (x86)\Wing IDE 101 4.1\src\debug\tserver\class\inventory.py", line 17, in __init__
builtins.NameError: global name 'xrange' is not defined

The game is from here.

What causes this error?


回答 0

您正在尝试使用Python 3运行Python 2代码库。在Python 3中xrange()已重命名为range()

请改用 Python 2 来运行游戏。除非你知道自己在做什么,否则不要尝试移植它;很可能除了 xrange() 与 range() 之外还会有更多问题。

作为记录,您看到的不是语法错误,而是运行时异常。


如果您确实知道自己在做什么,并且正在积极地让 Python 2 代码库兼容 Python 3,可以在模块中把全局名称 xrange 定义为 range 的别名来桥接代码。(请注意,您可能必须把 Python 2 代码库中所有现有的 range() 用法改成 list(range(...)),以确保在 Python 3 中仍然得到列表对象):

try:
    # Python 2
    xrange
except NameError:
    # Python 3, xrange is now named range
    xrange = range

# Python 2 code that uses xrange(...) unchanged, and any
# range(...) replaced with list(range(...))

或者把代码库中所有 xrange(...) 替换为 range(...),然后使用另一种垫片,让 Python 3 语法兼容 Python 2:

try:
    # Python 2 forward compatibility
    range = xrange
except NameError:
    pass

# Python 2 code transformed from range(...) -> list(range(...)) and
# xrange(...) -> range(...).

对于希望最终只支持 Python 3 的代码库而言,后者更可取,这样就能尽可能直接使用 Python 3 语法。

You are trying to run a Python 2 codebase with Python 3. xrange() was renamed to range() in Python 3.

Run the game with Python 2 instead. Don’t try to port it unless you know what you are doing, most likely there will be more problems beyond xrange() vs. range().

For the record, what you are seeing is not a syntax error but a runtime exception instead.


If you do know what your are doing and are actively making a Python 2 codebase compatible with Python 3, you can bridge the code by adding the global name to your module as an alias for range. (Take into account that you may have to update any existing range() use in the Python 2 codebase with list(range(...)) to ensure you still get a list object in Python 3):

try:
    # Python 2
    xrange
except NameError:
    # Python 3, xrange is now named range
    xrange = range

# Python 2 code that uses xrange(...) unchanged, and any
# range(...) replaced with list(range(...))

or replace all uses of xrange(...) with range(...) in the codebase and then use a different shim to make the Python 3 syntax compatible with Python 2:

try:
    # Python 2 forward compatibility
    range = xrange
except NameError:
    pass

# Python 2 code transformed from range(...) -> list(range(...)) and
# xrange(...) -> range(...).

The latter is preferable for codebases that aim to be Python 3-only in the long run; it is then easier to just use Python 3 syntax whenever possible.


回答 1

在您的代码中添加 xrange=range :) 对我有用。

add xrange=range in your code :) It works to me.


回答 2

我通过添加这个导入解决了问题。
更多信息

from past.builtins import xrange

I solved the issue by adding this import
More info

from past.builtins import xrange

回答 3

在python 2.x中,xrange用于返回生成器,而range用于返回列表。在python 3.x中,xrange已被删除,并且range返回一个生成器,就像python 2.x中的xrange一样。因此,在python 3.x中,您需要使用range而不是xrange。

in python 2.x, xrange is used to return a generator while range is used to return a list. In python 3.x , xrange has been removed and range returns a generator just like xrange in python 2.x. Therefore, in python 3.x you need to use range rather than xrange.
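The difference is easy to demonstrate in a Python 3 session (a small sketch):

```python
# In Python 3, range() returns a lazy sequence object (comparable to
# Python 2's xrange); call list() when an actual list is needed.
r = range(5)
print(r)         # range(0, 5)
print(list(r))   # [0, 1, 2, 3, 4]
print(4 in r)    # True -- membership works without building a list
```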


回答 4

将

Python 2 的 xrange 替换为

Python 3 的 range

其余保持不变。

Replace

Python 2 xrange to

Python 3 range

The rest stays the same.


回答 5

我同意最后一个答案。但是还有另一种方法可以解决此问题。您可以下载名为future的软件包,例如pip install future。然后在.py文件中输入“ from past.builtins import xrange”。此方法用于文件中有很多xrange的情况。

I agree with the last answer.But there is another way to solve this problem.You can download the package named future,such as pip install future.And in your .py file input this “from past.builtins import xrange”.This method is for the situation that there are many xranges in your file.


熊猫-获取给定列的第一行值

问题:熊猫-获取给定列的第一行值

这似乎是一个非常简单的问题……但是我没有看到我期望的简单答案。

那么,如何获得Pandas中给定列的第n行的值?(我对第一行特别感兴趣,但也对更通用的做法感兴趣)。

例如,假设我想将Btime中的1.2值作为变量。

什么是正确的方法?

df_test =

  ATime   X   Y   Z   Btime  C   D   E
0    1.2  2  15   2    1.2  12  25  12
1    1.4  3  12   1    1.3  13  22  11
2    1.5  1  10   6    1.4  11  20  16
3    1.6  2   9  10    1.7  12  29  12
4    1.9  1   1   9    1.9  11  21  19
5    2.0  0   0   0    2.0   8  10  11
6    2.4  0   0   0    2.4  10  12  15

This seems like a ridiculously easy question… but I’m not seeing the easy answer I was expecting.

So, how do I get the value at an nth row of a given column in Pandas? (I am particularly interested in the first row, but would be interested in a more general practice as well).

For example, let’s say I want to pull the 1.2 value in Btime as a variable.

Whats the right way to do this?

df_test =

  ATime   X   Y   Z   Btime  C   D   E
0    1.2  2  15   2    1.2  12  25  12
1    1.4  3  12   1    1.3  13  22  11
2    1.5  1  10   6    1.4  11  20  16
3    1.6  2   9  10    1.7  12  29  12
4    1.9  1   1   9    1.9  11  21  19
5    2.0  0   0   0    2.0   8  10  11
6    2.4  0   0   0    2.4  10  12  15

回答 0

要选择第 i 行,请使用 iloc:

In [31]: df_test.iloc[0]
Out[31]: 
ATime     1.2
X         2.0
Y        15.0
Z         2.0
Btime     1.2
C        12.0
D        25.0
E        12.0
Name: 0, dtype: float64

要在Btime列中选择第i个值,可以使用:

In [30]: df_test['Btime'].iloc[0]
Out[30]: 1.2

df_test['Btime'].iloc[0](推荐)和 df_test.iloc[0]['Btime'] 之间有区别:

DataFrames 以基于列的块存储数据(每个块只有一个 dtype)。如果先按列选择,可以返回视图(比返回副本更快),并保留原始 dtype。相反,如果先按行选择,且 DataFrame 的各列 dtype 不同,Pandas 会把数据复制到一个新的 object dtype 的 Series 中。因此选择列比选择行要快一些。所以,虽然 df_test.iloc[0]['Btime'] 可行,但 df_test['Btime'].iloc[0] 的效率要稍高一点。

在分配方面,两者之间存在很大差异。 df_test['Btime'].iloc[0] = x影响df_test,但df_test.iloc[0]['Btime'] 可能不会。有关原因的说明,请参见下文。由于索引顺序的细微差别会在行为上产生很大差异,因此最好使用单个索引分配:

df.iloc[0, df.columns.get_loc('Btime')] = x

df.iloc[0, df.columns.get_loc('Btime')] = x (推荐的):

为 DataFrame 赋新值的推荐方法是避免链式索引,改用 andrew 展示的方法:

df.loc[df.index[n], 'Btime'] = x

要么

df.iloc[n, df.columns.get_loc('Btime')] = x

后一种方法要快一些,因为df.loc必须将行和列标签转换为位置索引,因此,如果使用df.iloc替代方法,则转换的必要性要少一些 。


df['Btime'].iloc[0] = x 可行,但不建议:

尽管这可行,但它利用了 DataFrame 当前的实现方式,不能保证 Pandas 将来仍会这样工作。特别是,它利用了(当前)df['Btime'] 始终返回视图(而不是副本)这一事实,因此 df['Btime'].iloc[n] = x 可用于给 df 的 Btime 列第 n 个位置赋新值。

由于 Pandas 没有明确保证索引器何时返回视图、何时返回副本,使用链式索引的赋值通常会引发 SettingWithCopyWarning,即使在本例中赋值成功修改了 df:

In [22]: df = pd.DataFrame({'foo':list('ABC')}, index=[0,2,1])
In [24]: df['bar'] = 100
In [25]: df['bar'].iloc[0] = 99
/home/unutbu/data/binky/bin/ipython:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self._setitem_with_indexer(indexer, value)

In [26]: df
Out[26]: 
  foo  bar
0   A   99  <-- assignment succeeded
2   B  100
1   C  100

df.iloc[0]['Btime'] = x 不起作用:

相比之下,用 df.iloc[0]['bar'] = 123 赋值不起作用,因为 df.iloc[0] 返回的是副本:

In [66]: df.iloc[0]['bar'] = 123
/home/unutbu/data/binky/bin/ipython:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

In [67]: df
Out[67]: 
  foo  bar
0   A   99  <-- assignment failed
2   B  100
1   C  100

警告:我之前曾建议过 df_test.ix[i, 'Btime']。但这不能保证给出第 i 个值,因为 ix 会先尝试按标签索引,然后才按位置索引。因此,如果 DataFrame 的整数索引不是从 0 开始的有序序列,使用 ix[i] 会返回标签为 i 的行,而不是第 i 行。例如,

In [1]: df = pd.DataFrame({'foo':list('ABC')}, index=[0,2,1])

In [2]: df
Out[2]: 
  foo
0   A
2   B
1   C

In [4]: df.ix[1, 'foo']
Out[4]: 'C'

To select the ith row, use iloc:

In [31]: df_test.iloc[0]
Out[31]: 
ATime     1.2
X         2.0
Y        15.0
Z         2.0
Btime     1.2
C        12.0
D        25.0
E        12.0
Name: 0, dtype: float64

To select the ith value in the Btime column you could use:

In [30]: df_test['Btime'].iloc[0]
Out[30]: 1.2

There is a difference between df_test['Btime'].iloc[0] (recommended) and df_test.iloc[0]['Btime']:

DataFrames store data in column-based blocks (where each block has a single dtype). If you select by column first, a view can be returned (which is quicker than returning a copy) and the original dtype is preserved. In contrast, if you select by row first, and if the DataFrame has columns of different dtypes, then Pandas copies the data into a new Series of object dtype. So selecting columns is a bit faster than selecting rows. Thus, although df_test.iloc[0]['Btime'] works, df_test['Btime'].iloc[0] is a little bit more efficient.

There is a big difference between the two when it comes to assignment. df_test['Btime'].iloc[0] = x affects df_test, but df_test.iloc[0]['Btime'] may not. See below for an explanation of why. Because a subtle difference in the order of indexing makes a big difference in behavior, it is better to use single indexing assignment:

df.iloc[0, df.columns.get_loc('Btime')] = x

df.iloc[0, df.columns.get_loc('Btime')] = x (recommended):

The recommended way to assign new values to a DataFrame is to avoid chained indexing, and instead use the method shown by andrew,

df.loc[df.index[n], 'Btime'] = x

or

df.iloc[n, df.columns.get_loc('Btime')] = x

The latter method is a bit faster, because df.loc has to convert the row and column labels to positional indices, so there is a little less conversion necessary if you use df.iloc instead.


df['Btime'].iloc[0] = x works, but is not recommended:

Although this works, it is taking advantage of the way DataFrames are currently implemented. There is no guarantee that Pandas has to work this way in the future. In particular, it is taking advantage of the fact that (currently) df['Btime'] always returns a view (not a copy) so df['Btime'].iloc[n] = x can be used to assign a new value at the nth location of the Btime column of df.

Since Pandas makes no explicit guarantees about when indexers return a view versus a copy, assignments that use chained indexing generally always raise a SettingWithCopyWarning even though in this case the assignment succeeds in modifying df:

In [22]: df = pd.DataFrame({'foo':list('ABC')}, index=[0,2,1])
In [24]: df['bar'] = 100
In [25]: df['bar'].iloc[0] = 99
/home/unutbu/data/binky/bin/ipython:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self._setitem_with_indexer(indexer, value)

In [26]: df
Out[26]: 
  foo  bar
0   A   99  <-- assignment succeeded
2   B  100
1   C  100

df.iloc[0]['Btime'] = x does not work:

In contrast, assignment with df.iloc[0]['bar'] = 123 does not work because df.iloc[0] is returning a copy:

In [66]: df.iloc[0]['bar'] = 123
/home/unutbu/data/binky/bin/ipython:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

In [67]: df
Out[67]: 
  foo  bar
0   A   99  <-- assignment failed
2   B  100
1   C  100

Warning: I had previously suggested df_test.ix[i, 'Btime']. But this is not guaranteed to give you the ith value since ix tries to index by label before trying to index by position. So if the DataFrame has an integer index which is not in sorted order starting at 0, then using ix[i] will return the row labeled i rather than the ith row. For example,

In [1]: df = pd.DataFrame({'foo':list('ABC')}, index=[0,2,1])

In [2]: df
Out[2]: 
  foo
0   A
2   B
1   C

In [4]: df.ix[1, 'foo']
Out[4]: 'C'

回答 1

请注意,@unutbu的答案在获取值时是正确的,但当您想把值设置为新值时,如果您的数据框是一个视图,它将不起作用。

In [4]: df = pd.DataFrame({'foo':list('ABC')}, index=[0,2,1])
In [5]: df['bar'] = 100
In [6]: df['bar'].iloc[0] = 99
/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas-0.16.0_19_g8d2818e-py2.7-macosx-10.9-x86_64.egg/pandas/core/indexing.py:118: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self._setitem_with_indexer(indexer, value)

可以同时在设置和获取上使用的另一种方法是:

In [7]: df.loc[df.index[0], 'foo']
Out[7]: 'A'
In [8]: df.loc[df.index[0], 'bar'] = 99
In [9]: df
Out[9]:
  foo  bar
0   A   99
2   B  100
1   C  100

Note that the answer from @unutbu will be correct until you want to set the value to something new, then it will not work if your dataframe is a view.

In [4]: df = pd.DataFrame({'foo':list('ABC')}, index=[0,2,1])
In [5]: df['bar'] = 100
In [6]: df['bar'].iloc[0] = 99
/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas-0.16.0_19_g8d2818e-py2.7-macosx-10.9-x86_64.egg/pandas/core/indexing.py:118: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self._setitem_with_indexer(indexer, value)

Another approach that will consistently work with both setting and getting is:

In [7]: df.loc[df.index[0], 'foo']
Out[7]: 'A'
In [8]: df.loc[df.index[0], 'bar'] = 99
In [9]: df
Out[9]:
  foo  bar
0   A   99
2   B  100
1   C  100

回答 2

另一种方法是:

first_value = df['Btime'].values[0]

这种方式似乎比使用.iloc更快:

In [1]: %timeit -n 1000 df['Btime'].values[20]
5.82 µs ± 142 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [2]: %timeit -n 1000 df['Btime'].iloc[20]
29.2 µs ± 1.28 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Another way to do this:

first_value = df['Btime'].values[0]

This way seems to be faster than using .iloc:

In [1]: %timeit -n 1000 df['Btime'].values[20]
5.82 µs ± 142 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [2]: %timeit -n 1000 df['Btime'].iloc[20]
29.2 µs ± 1.28 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

回答 3

  1. df.iloc[0].head(1) -仅取整个第一行中的第一个字段。
  2. df.iloc[0] -第一整行,以Series形式返回。
  1. df.iloc[0].head(1) – Only the first field from the entire first row.
  2. df.iloc[0] – The entire first row, returned as a Series.

回答 4

通常,如果您想从pandas dataframe的第J列中获取前N行,最好的方法是:

data = dataframe.iloc[0:N, J]

In a general way, if you want to pick up the first N rows from the J column from pandas dataframe the best way to do this is:

data = dataframe.iloc[0:N, J]
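As a runnable sketch (sample data is mine), the positional selection can be expressed with .iloc, which slices rows and picks the column in one indexer:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [10, 20, 30, 40]})
N, J = 2, 1  # first N rows, column at position J

data = df.iloc[0:N, J]   # positional row slice and positional column together
print(data.tolist())     # [10, 20]
```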

回答 5

例如,要从“test”列和第1行获取值,可以这样做:

df[['test']].values[0][0]

因为df[['test']].values[0]只返回一个数组

To get e.g the value from column ‘test’ and row 1 it works like

df[['test']].values[0][0]

as only df[['test']].values[0] gives back a array


回答 6

获取第一行并保留索引的另一种方法:

x = df.first('d') # Returns the first day. '3d' gives first three days.

Another way of getting the first row and preserving the index:

x = df.first('d') # Returns the first day. '3d' gives first three days.

PEP8的E128:连续行缩进不足以实现视觉缩进是什么?

问题:PEP8的E128:连续行缩进不足以实现视觉缩进是什么?

刚刚用Sublime Text(使用Sublime Linter)打开了一个文件,并注意到了一个我以前从未见过的PEP8格式错误。这是文本:

urlpatterns = patterns('',
    url(r'^$', listing, name='investment-listing'),
)

它标记的是第二个参数,即以url(...)开头的那一行

我本来打算在ST2中禁用此检查,但是在忽略它之前,我想知道自己做错了什么。谁知道呢,如果它看起来很重要,我甚至可能改变我的习惯 :)

Just opened a file with Sublime Text (with Sublime Linter) and noticed a PEP8 formatting error that I’d never seen before. Here’s the text:

urlpatterns = patterns('',
    url(r'^$', listing, name='investment-listing'),
)

It’s flagging the second argument, the line that starts url(...)

I was about to disable this check in ST2 but I’d like to know what I’m doing wrong before I ignore it. You never know, if it seems important I might even change my ways :)


回答 0

如果在第一行左括号后放置了任何内容,PEP-8建议将后续行缩进到与左括号对齐的位置,因此要么缩进到左括号处:

urlpatterns = patterns('',
                       url(r'^$', listing, name='investment-listing'))

或不将任何参数放在起始行上,然后缩进一个统一级别:

urlpatterns = patterns(
    '',
    url(r'^$', listing, name='investment-listing'),
)

urlpatterns = patterns(
    '', url(r'^$', listing, name='investment-listing'))

我建议您通读PEP-8-您可以浏览其中的很多内容,而且与某些技术性更高的PEP相比,它很容易理解。

PEP-8 recommends you indent lines to the opening parentheses if you put anything on the first line, so it should either be indenting to the opening bracket:

urlpatterns = patterns('',
                       url(r'^$', listing, name='investment-listing'))

or not putting any arguments on the starting line, then indenting to a uniform level:

urlpatterns = patterns(
    '',
    url(r'^$', listing, name='investment-listing'),
)

urlpatterns = patterns(
    '', url(r'^$', listing, name='investment-listing'))

I suggest taking a read through PEP-8 – you can skim through a lot of it, and it’s pretty easy to understand, unlike some of the more technical PEPs.


回答 1

对于这样的语句(由PyCharm自动格式化)也是如此:

    return combine_sample_generators(sample_generators['train']), \
           combine_sample_generators(sample_generators['dev']), \
           combine_sample_generators(sample_generators['test'])

它将发出相同的样式警告。为了摆脱它,我不得不将其重写为:

    return \
        combine_sample_generators(sample_generators['train']), \
        combine_sample_generators(sample_generators['dev']), \
        combine_sample_generators(sample_generators['test'])

This goes also for statements like this (auto-formatted by PyCharm):

    return combine_sample_generators(sample_generators['train']), \
           combine_sample_generators(sample_generators['dev']), \
           combine_sample_generators(sample_generators['test'])

Which will give the same style-warning. In order to get rid of it I had to rewrite it to:

    return \
        combine_sample_generators(sample_generators['train']), \
        combine_sample_generators(sample_generators['dev']), \
        combine_sample_generators(sample_generators['test'])

将字符串转换为有效的文件名?

问题:将字符串转换为有效的文件名?

我有一个要用作文件名的字符串,因此我想使用Python删除文件名中不允许的所有字符。

我宁愿严格一点,所以假设我只保留字母,数字和一小部分其他字符,例如"_-.() "。什么是最优雅的解决方案?

文件名在多个操作系统(Windows,Linux和Mac OS)上必须有效-这是我库中的MP3文件,歌曲名作为文件名,并且在3台计算机之间共享和备份。

I have a string that I want to use as a filename, so I want to remove all characters that wouldn’t be allowed in filenames, using Python.

I’d rather be strict than otherwise, so let’s say I want to retain only letters, digits, and a small set of other characters like "_-.() ". What’s the most elegant solution?

The filename needs to be valid on multiple operating systems (Windows, Linux and Mac OS) – it’s an MP3 file in my library with the song title as the filename, and is shared and backed up between 3 machines.


回答 0

您可以查看Django框架,了解它们如何从任意文本创建“slug”。slug对URL和文件名都是友好的。

Django的文本工具定义了一个函数slugify(),它可能是此类需求的黄金标准。本质上,它们的代码如下。

def slugify(value):
    """
    Normalizes string, converts to lowercase, removes non-alpha characters,
    and converts spaces to hyphens.
    """
    import re
    import unicodedata
    value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore')
    value = unicode(re.sub(r'[^\w\s-]', '', value).strip().lower())
    value = unicode(re.sub(r'[-\s]+', '-', value))
    # ...
    return value

还有更多内容,但我略去了,因为那部分处理的不是slug化,而是转义。

You can look at the Django framework for how they create a “slug” from arbitrary text. A slug is URL- and filename- friendly.

The Django text utils define a function, slugify(), that’s probably the gold standard for this kind of thing. Essentially, their code is the following.

def slugify(value):
    """
    Normalizes string, converts to lowercase, removes non-alpha characters,
    and converts spaces to hyphens.
    """
    import re
    import unicodedata
    value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore')
    value = unicode(re.sub(r'[^\w\s-]', '', value).strip().lower())
    value = unicode(re.sub(r'[-\s]+', '-', value))
    # ...
    return value

There’s more, but I left it out, since it doesn’t address slugification, but escaping.
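For reference, here is a hedged Python 3 adaptation of the same idea (unicode() no longer exists, and the bytes from encode() must be decoded back to str):

```python
import re
import unicodedata

def slugify(value):
    """Normalize to ASCII, lowercase, drop non-word chars, hyphenate whitespace."""
    value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore').decode('ascii')
    value = re.sub(r'[^\w\s-]', '', value).strip().lower()
    return re.sub(r'[-\s]+', '-', value)

print(slugify("Héllo, Wörld!"))  # hello-world
```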


回答 1

这种白名单方法(即仅允许valid_chars中存在的字符)在对文件名格式或非法的有效字符组合(例如“..”)没有限制时才有效。例如,按您所说的规则,将允许一个名为“ . txt”的文件名,而我认为它在Windows上是无效的。由于这是最简单的方法,我会尝试从valid_chars中删除空格,并在出错时在文件名前加上一个已知有效的字符串;任何其他方法都必须了解Windows文件命名限制中何处允许什么,因而要复杂得多。

>>> import string
>>> valid_chars = "-_.() %s%s" % (string.ascii_letters, string.digits)
>>> valid_chars
'-_.() abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
>>> filename = "This Is a (valid) - filename%$&$ .txt"
>>> ''.join(c for c in filename if c in valid_chars)
'This Is a (valid) - filename .txt'

This whitelist approach (ie, allowing only the chars present in valid_chars) will work if there aren’t limits on the formatting of the files or combination of valid chars that are illegal (like “..”), for example, what you say would allow a filename named ” . txt” which I think is not valid on Windows. As this is the most simple approach I’d try to remove whitespace from the valid_chars and prepend a known valid string in case of error, any other approach will have to know about what is allowed where to cope with Windows file naming limitations and thus be a lot more complex.

>>> import string
>>> valid_chars = "-_.() %s%s" % (string.ascii_letters, string.digits)
>>> valid_chars
'-_.() abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
>>> filename = "This Is a (valid) - filename%$&$ .txt"
>>> ''.join(c for c in filename if c in valid_chars)
'This Is a (valid) - filename .txt'

回答 2

您可以将列表理解与字符串方法一起使用。

>>> s
'foo-bar#baz?qux@127/\\9]'
>>> "".join(x for x in s if x.isalnum())
'foobarbazqux1279'

You can use list comprehension together with the string methods.

>>> s
'foo-bar#baz?qux@127/\\9]'
>>> "".join(x for x in s if x.isalnum())
'foobarbazqux1279'

回答 3

使用字符串作为文件名的原因是什么?如果不是人类可读性的因素,我将使用base64模块,该模块可以生成文件系统安全的字符串。它不是可读的,但您不必处理碰撞并且它是可逆的。

import base64
file_name_string = base64.urlsafe_b64encode(your_string)

更新:根据马修评论更改。

What is the reason to use the strings as file names? If human readability is not a factor I would go with base64 module which can produce file system safe strings. It won’t be readable but you won’t have to deal with collisions and it is reversible.

import base64
file_name_string = base64.urlsafe_b64encode(your_string)

Update: Changed based on Matthew comment.
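In Python 3, urlsafe_b64encode takes bytes, so a round trip looks roughly like this (the song title is made up):

```python
import base64

title = "My Song / Remix? *Live*"
encoded = base64.urlsafe_b64encode(title.encode('utf-8')).decode('ascii')
decoded = base64.urlsafe_b64decode(encoded).decode('utf-8')

print(encoded)           # filesystem-safe: only A-Za-z0-9, '-', '_', '='
print(decoded == title)  # True: the original name is recoverable
```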


回答 4

只是为了使事情更加复杂,不能保证仅通过删除无效字符就可以获得有效的文件名。由于不同文件名上允许的字符不同,因此保守的方法可能最终将有效名称变成无效名称。对于以下情况,您可能需要添加特殊处理:

  • 该字符串全部由无效字符组成(留给您一个空字符串)

  • 您最终得到一个具有特殊含义的字符串,例如“.”或“..”

  • 在Windows上,某些设备名称被保留。例如,您无法创建名为“ nul”,“ nul.txt”(或实际上为nul.anything)的文件。保留名称为:

    CON,PRN,AUX,NUL,COM1,COM2,COM3,COM4,COM5,COM6,COM7,COM8,COM9,LPT1,LPT2,LPT3,LPT4,LPT5,LPT6,LPT7,LPT8和LPT9

您可以通过在文件名前添加一些字符串(它们永远不会导致这些情况之一)并去除无效字符来解决这些问题。

Just to further complicate things, you are not guaranteed to get a valid filename just by removing invalid characters. Since allowed characters differ on different filenames, a conservative approach could end up turning a valid name into an invalid one. You may want to add special handling for the cases where:

  • The string is all invalid characters (leaving you with an empty string)

  • You end up with a string with a special meaning, eg “.” or “..”

  • On windows, certain device names are reserved. For instance, you can’t create a file named “nul”, “nul.txt” (or nul.anything in fact) The reserved names are:

    CON, PRN, AUX, NUL, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, and LPT9

You can probably work around these issues by prepending some string to the filenames that can never result in one of these cases, and stripping invalid characters.
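A sketch of that extra handling (the whitelist regex, fallback name, and function name are my assumptions, not from the answer):

```python
import re

# Windows reserved device names (case-insensitive, with or without extension)
RESERVED = {'CON', 'PRN', 'AUX', 'NUL',
            *(f'COM{i}' for i in range(1, 10)),
            *(f'LPT{i}' for i in range(1, 10))}

def make_safe(name, fallback='file'):
    name = re.sub(r'[^-\w.() ]', '', name).strip()
    if not name or name in {'.', '..'}:
        return fallback          # nothing usable left after stripping
    if name.split('.')[0].upper() in RESERVED:
        return '_' + name        # 'nul.txt' -> '_nul.txt'
    return name

print(make_safe('nul.txt'))      # _nul.txt
print(make_safe('???'))          # file
```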


回答 5

Github上有一个不错的项目python-slugify

安装:

pip install python-slugify

然后使用:

>>> from slugify import slugify
>>> txt = "This\ is/ a%#$ test ---"
>>> slugify(txt)
'this-is-a-test'

There is a nice project on Github called python-slugify:

Install:

pip install python-slugify

Then use:

>>> from slugify import slugify
>>> txt = "This\ is/ a%#$ test ---"
>>> slugify(txt)
'this-is-a-test'

回答 6

就像S.Lott回答的一样,您可以查看Django框架,了解它们如何将字符串转换为有效的文件名。

最新的更新版本位于utils/text.py中,其中定义的get_valid_filename如下:

def get_valid_filename(s):
    s = str(s).strip().replace(' ', '_')
    return re.sub(r'(?u)[^-\w.]', '', s)

(参见https://github.com/django/django/blob/master/django/utils/text.py

Just like S.Lott answered, you can look at the Django Framework for how they convert a string to a valid filename.

The most recent and updated version is found in utils/text.py, and defines “get_valid_filename”, which is as follows:

def get_valid_filename(s):
    s = str(s).strip().replace(' ', '_')
    return re.sub(r'(?u)[^-\w.]', '', s)

( See https://github.com/django/django/blob/master/django/utils/text.py )
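Made self-contained with the import it needs, plus a usage example:

```python
import re

def get_valid_filename(s):
    # strip surrounding whitespace, spaces -> underscores, drop everything
    # except word chars, dots and hyphens
    s = str(s).strip().replace(' ', '_')
    return re.sub(r'(?u)[^-\w.]', '', s)

print(get_valid_filename("john's portrait in 2004.jpg"))
# johns_portrait_in_2004.jpg
```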


回答 7

这是我最终使用的解决方案:

import unicodedata

validFilenameChars = "-_.() %s%s" % (string.ascii_letters, string.digits)

def removeDisallowedFilenameChars(filename):
    cleanedFilename = unicodedata.normalize('NFKD', filename).encode('ASCII', 'ignore')
    return ''.join(c for c in cleanedFilename if c in validFilenameChars)

unicodedata.normalize调用将重音字符替换为未重音的等效字符,这比简单地将它们剥离要好。之后,将删除所有不允许的字符。

我的解决方案没有在文件名前加上已知字符串来避免可能出现的不允许的文件名,因为我知道在我特定的文件名格式下它们不会出现。更通用的解决方案则需要这样做。

This is the solution I ultimately used:

import unicodedata

validFilenameChars = "-_.() %s%s" % (string.ascii_letters, string.digits)

def removeDisallowedFilenameChars(filename):
    cleanedFilename = unicodedata.normalize('NFKD', filename).encode('ASCII', 'ignore')
    return ''.join(c for c in cleanedFilename if c in validFilenameChars)

The unicodedata.normalize call replaces accented characters with the unaccented equivalent, which is better than simply stripping them out. After that all disallowed characters are removed.

My solution doesn’t prepend a known string to avoid possible disallowed filenames, because I know they can’t occur given my particular filename format. A more general solution would need to do so.


回答 8

请记住,Unix系统上的文件名实际上没有任何限制,除了:

  • 它可能不包含\ 0
  • 它可能不包含/

除此之外的一切都是允许的。

$ touch "
> even multiline
> haha
> ^[[31m red ^[[0m
> evil"
$ ls -la 
-rw-r--r--       0 Nov 17 23:39 ?even multiline?haha??[31m red ?[0m?evil
$ ls -lab
-rw-r--r--       0 Nov 17 23:39 \neven\ multiline\nhaha\n\033[31m\ red\ \033[0m\nevil
$ perl -e 'for my $i ( glob(q{./*even*}) ){ print $i; } '
./
even multiline
haha
 red 
evil

是的,我只是将ANSI颜色代码存储在文件名中,并使它们生效。

为了娱乐,可以把BEL字符放进一个目录名中,然后看看cd进入该目录时会发生什么有趣的事情 ;)

Keep in mind, there are actually no restrictions on filenames on Unix systems other than

  • It may not contain \0
  • It may not contain /

Everything else is fair game.

$ touch "
> even multiline
> haha
> ^[[31m red ^[[0m
> evil"
$ ls -la 
-rw-r--r--       0 Nov 17 23:39 ?even multiline?haha??[31m red ?[0m?evil
$ ls -lab
-rw-r--r--       0 Nov 17 23:39 \neven\ multiline\nhaha\n\033[31m\ red\ \033[0m\nevil
$ perl -e 'for my $i ( glob(q{./*even*}) ){ print $i; } '
./
even multiline
haha
 red 
evil

Yes, i just stored ANSI Colour Codes in a file name and had them take effect.

For entertainment, put a BEL character in a directory name and watch the fun that ensues when you CD into it ;)


回答 9

一行:

valid_file_name = re.sub('[^\w_.)( -]', '', any_string)

您也可以用'_'字符来替换,以提高可读性(例如在替换斜杠的情况下)

In one line:

valid_file_name = re.sub('[^\w_.)( -]', '', any_string)

you can also put the '_' character as the replacement to make it more readable (in case of replacing slashes, for example)


回答 10

您可以使用re.sub()方法替换非“类似文件”的任何内容。但实际上,每个字符都可以有效;因此,没有预构建的功能(我相信)可以完成它。

import os
import re

s = "File!name?.txt"
f = open(os.path.join("/tmp", re.sub('[^-a-zA-Z0-9_.() ]+', '', s)))

会得到一个指向/tmp/Filename.txt的文件句柄。

You could use the re.sub() method to replace anything not “filelike”. But in effect, every character could be valid; so there are no prebuilt functions (I believe), to get it done.

import os
import re

s = "File!name?.txt"
f = open(os.path.join("/tmp", re.sub('[^-a-zA-Z0-9_.() ]+', '', s)))

Would result in a filehandle to /tmp/Filename.txt.


回答 11

>>> import string
>>> safechars = bytearray(('_-.()' + string.digits + string.ascii_letters).encode())
>>> allchars = bytearray(range(0x100))
>>> deletechars = bytearray(set(allchars) - set(safechars))
>>> filename = u'#ab\xa0c.$%.txt'
>>> safe_filename = filename.encode('ascii', 'ignore').translate(None, deletechars).decode()
>>> safe_filename
'abc..txt'

它不处理空字符串,特殊文件名(“ nul”,“ con”等)。

>>> import string
>>> safechars = bytearray(('_-.()' + string.digits + string.ascii_letters).encode())
>>> allchars = bytearray(range(0x100))
>>> deletechars = bytearray(set(allchars) - set(safechars))
>>> filename = u'#ab\xa0c.$%.txt'
>>> safe_filename = filename.encode('ascii', 'ignore').translate(None, deletechars).decode()
>>> safe_filename
'abc..txt'

It doesn’t handle empty strings, special filenames (‘nul’, ‘con’, etc).


回答 12

不过您必须小心。问题描述中并没有明确说明您是否只处理拉丁语系的文本。如果仅用ASCII字符进行清理,某些单词可能变得毫无意义,或者变成另一种含义。

假设您有“Forêt poésie”(森林诗歌),清理后可能得到“fort-posie”(“强壮”+某个无意义的词)。

如果必须处理汉字,情况会更糟。比如“下北沢”,您的系统最终可能将其处理成“—”,这注定会在一段时间后出问题,而且没什么帮助。因此,如果您只处理文件,我建议要么用一个您可控的通用名称串来命名它们,要么保持字符原样。对于URI,情况大致相同。

Though you have to be careful. It is not clearly said in your intro, if you are looking only at latine language. Some words can become meaningless or another meaning if you sanitize them with ascii characters only.

imagine you have “forêt poésie” (forest poetry), your sanitization might give “fort-posie” (strong + something meaningless)

Worse if you have to deal with chinese characters.

“下北沢” your system might end up doing “—” which is doomed to fail after a while and not very helpful. So if you deal with only files I would encourage to either call them a generic chain that you control or to keep the characters as it is. For URIs, about the same.


回答 13

为什么不直接用try/except把os.open包起来,让底层的操作系统来判断文件名是否有效呢?

这似乎工作量少得多,并且无论使用哪种操作系统,这都是有效的。

Why not just wrap os.open with a try/except and let the underlying OS sort out whether the file is valid?

This seems like much less work and is valid no matter which OS you use.


回答 14

其他注释尚未解决的另一个问题是空字符串,这显然不是有效的文件名。您还可以通过剥离太多字符而最终得到一个空字符串。

考虑到Windows的保留文件名以及点号相关的问题,对“如何从任意用户输入规范化出一个有效文件名”这个问题,最安全的回答是“干脆别去尝试”:如果您能找到其他方法避免它(例如,使用数据库中的整数主键作为文件名),就那样做。

如果必须这样做,并且确实需要在名称中允许空格以及作为文件扩展名一部分的“.”,请尝试以下操作:

import re
badchars= re.compile(r'[^A-Za-z0-9_. ]+|^\.|\.$|^ | $|^$')
badnames= re.compile(r'(aux|com[1-9]|con|lpt[1-9]|prn)(\.|$)')

def makeName(s):
    name= badchars.sub('_', s)
    if badnames.match(name):
        name= '_'+name
    return name

即使这样也不能保证正确,尤其是在意料之外的操作系统上,例如RISC OS不喜欢空格,并使用“.”作为目录分隔符。

Another issue that the other comments haven’t addressed yet is the empty string, which is obviously not a valid filename. You can also end up with an empty string from stripping too many characters.

What with the Windows reserved filenames and issues with dots, the safest answer to the question “how do I normalise a valid filename from arbitrary user input?” is “don’t even bother try”: if you can find any other way to avoid it (eg. using integer primary keys from a database as filenames), do that.

If you must, and you really need to allow spaces and ‘.’ for file extensions as part of the name, try something like:

import re
badchars= re.compile(r'[^A-Za-z0-9_. ]+|^\.|\.$|^ | $|^$')
badnames= re.compile(r'(aux|com[1-9]|con|lpt[1-9]|prn)(\.|$)')

def makeName(s):
    name= badchars.sub('_', s)
    if badnames.match(name):
        name= '_'+name
    return name

Even this can’t be guaranteed right especially on unexpected OSs — for example RISC OS hates spaces and uses ‘.’ as a directory separator.


回答 15

我喜欢这里的python-slugify方法,但是它也剥离了点,这是不希望的。所以我对其进行了优化,以便以这种方式将干净的文件名上传到s3:

pip install python-slugify

示例代码:

s = 'Very / Unsafe / file\nname hähä \n\r .txt'
clean_basename = slugify(os.path.splitext(s)[0])
clean_extension = slugify(os.path.splitext(s)[1][1:])
if clean_extension:
    clean_filename = '{}.{}'.format(clean_basename, clean_extension)
elif clean_basename:
    clean_filename = clean_basename
else:
    clean_filename = 'none' # only unclean characters

输出:

>>> clean_filename
'very-unsafe-file-name-haha.txt'

这非常可靠:它适用于没有扩展名的文件名,甚至适用于完全由不安全字符组成的文件名(此时结果为none)。

I liked the python-slugify approach here but it was stripping dots also away which was not desired. So I optimized it for uploading a clean filename to s3 this way:

pip install python-slugify

Example code:

s = 'Very / Unsafe / file\nname hähä \n\r .txt'
clean_basename = slugify(os.path.splitext(s)[0])
clean_extension = slugify(os.path.splitext(s)[1][1:])
if clean_extension:
    clean_filename = '{}.{}'.format(clean_basename, clean_extension)
elif clean_basename:
    clean_filename = clean_basename
else:
    clean_filename = 'none' # only unclean characters

Output:

>>> clean_filename
'very-unsafe-file-name-haha.txt'

This is so failsafe, it works with filenames without extension and it even works for only unsafe characters file names (result is none here).


回答 16

为python 3.6修改的答案

import string
import unicodedata

validFilenameChars = "-_.() %s%s" % (string.ascii_letters, string.digits)
def removeDisallowedFilenameChars(filename):
    cleanedFilename = unicodedata.normalize('NFKD', filename).encode('ASCII', 'ignore')
    return ''.join(chr(c) for c in cleanedFilename if chr(c) in validFilenameChars)

Answer modified for python 3.6

import string
import unicodedata

validFilenameChars = "-_.() %s%s" % (string.ascii_letters, string.digits)
def removeDisallowedFilenameChars(filename):
    cleanedFilename = unicodedata.normalize('NFKD', filename).encode('ASCII', 'ignore')
    return ''.join(chr(c) for c in cleanedFilename if chr(c) in validFilenameChars)

回答 17

我知道有很多答案,但是它们大多依赖于正则表达式或外部模块,因此我想提出自己的答案。一个纯python函数,不需要外部模块,不使用正则表达式。我的方法不是清除无效字符,而仅允许有效字符。

def normalizefilename(fn):
    validchars = "-_.() "
    out = ""
    for c in fn:
      if str.isalpha(c) or str.isdigit(c) or (c in validchars):
        out += c
      else:
        out += "_"
    return out    

如果愿意,您可以在开头的validchars变量中添加自己的有效字符,例如英文字母表中不存在的本国字母。这可能是您想要的,也可能不是:某些不使用UTF-8的文件系统可能仍然对非ASCII字符有问题。

此函数用于测试单个文件名的有效性,因此它会把路径分隔符视为无效字符并替换为_。如果想允许路径分隔符,只需简单地修改if,把OS路径分隔符包含进去即可。

I realise there are many answers but they mostly rely on regular expressions or external modules, so I’d like to throw in my own answer. A pure python function, no external module needed, no regular expression used. My approach is not to clean invalid chars, but to only allow valid ones.

def normalizefilename(fn):
    validchars = "-_.() "
    out = ""
    for c in fn:
      if str.isalpha(c) or str.isdigit(c) or (c in validchars):
        out += c
      else:
        out += "_"
    return out    

if you like, you can add your own valid chars to the validchars variable at the beginning, such as your national letters that don’t exist in English alphabet. This is something you may or may not want: some file systems that don’t run on UTF-8 might still have problems with non-ASCII chars.

This function is to test for a single file name validity, so it will replace path separators with _ considering them invalid chars. If you want to add that, it is trivial to modify the if to include os path separator.


回答 18

这些解决方案大多数都不起作用。

'/hello/world' -> 'helloworld'

'/helloworld/' -> 'helloworld'

通常这不是您想要的:假设您为每个链接保存html,这样就会覆盖另一个网页的html。

我将一个如下的字典pickle起来:

{'helloworld': 
    (
    {'/hello/world': 'helloworld', '/helloworld/': 'helloworld1'},
    2)
    }

2表示应该附加到下一个文件名的数字。

每次都从字典中查找文件名。如果不存在,我就创建一个新的,必要时附加上最大编号。

Most of these solutions don’t work.

‘/hello/world’ -> ‘helloworld’

'/helloworld/' -> 'helloworld'

This isn’t what you want generally, say you are saving the html for each link, you’re going to overwrite the html for a different webpage.

I pickle a dict such as:

{'helloworld': 
    (
    {'/hello/world': 'helloworld', '/helloworld/': 'helloworld1'},
    2)
    }

2 represents the number that should be appended to the next filename.

I look up the filename each time from the dict. If it’s not there, I create a new one, appending the max number if needed.
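A minimal sketch of that bookkeeping (the registry shape and helper names are illustrative, not the author's exact code):

```python
def unique_filename(url, sanitize, registry):
    """registry: sanitized base name -> {url: assigned filename}."""
    base = sanitize(url)
    assigned = registry.setdefault(base, {})
    if url not in assigned:
        # first url keeps the bare name; later collisions get a numeric suffix
        suffix = '' if not assigned else str(len(assigned))
        assigned[url] = base + suffix
    return assigned[url]

sanitize = lambda u: ''.join(c for c in u if c.isalnum())
registry = {}
print(unique_filename('/hello/world', sanitize, registry))   # helloworld
print(unique_filename('/helloworld/', sanitize, registry))   # helloworld1
print(unique_filename('/hello/world', sanitize, registry))   # helloworld
```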


回答 19

不完全是OP的要求,但这是我使用的,因为我需要唯一且可逆的转换:

# p3 code
def safePath (url):
    return ''.join(map(lambda ch: chr(ch) if ch in safePath.chars else '%%%02x' % ch, url.encode('utf-8')))
safePath.chars = set(map(lambda x: ord(x), '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz+-_ .'))

至少从sysadmin的角度来看,结果是“有点”可读的。

Not exactly what OP was asking for but this is what I use because I need unique and reversible conversions:

# p3 code
def safePath (url):
    return ''.join(map(lambda ch: chr(ch) if ch in safePath.chars else '%%%02x' % ch, url.encode('utf-8')))
safePath.chars = set(map(lambda x: ord(x), '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz+-_ .'))

Result is “somewhat” readable, at least from a sysadmin point of view.
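Since the mapping is reversible, the decode direction is plain percent-decoding; a sketch (unsafePath is my name for the inverse):

```python
from urllib.parse import unquote

def safePath(url):
    # same idea as above: keep whitelisted bytes, %-encode everything else
    chars = set(map(ord, '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'
                         'abcdefghijklmnopqrstuvwxyz+-_ .'))
    return ''.join(chr(b) if b in chars else '%%%02x' % b
                   for b in url.encode('utf-8'))

def unsafePath(path):
    # inverse: %xx escapes decode back to the original utf-8 string
    return unquote(path)

s = 'naïve file/name?'
print(safePath(s))                   # na%c3%afve file%2fname%3f
print(unsafePath(safePath(s)) == s)  # True
```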


回答 20

如果您不介意安装软件包,这将非常有用:https : //pypi.org/project/pathvalidate/

来自https://pypi.org/project/pathvalidate/#sanitize-a-filename

from pathvalidate import sanitize_filename

fname = "fi:l*e/p\"a?t>h|.t<xt"
print(f"{fname} -> {sanitize_filename(fname)}\n")
fname = "\0_a*b:c<d>e%f/(g)h+i_0.txt"
print(f"{fname} -> {sanitize_filename(fname)}\n")

输出量

fi:l*e/p"a?t>h|.t<xt -> filepath.txt
_a*b:c<d>e%f/(g)h+i_0.txt -> _abcde%f(g)h+i_0.txt

If you don’t mind installing a package, this should be useful: https://pypi.org/project/pathvalidate/

From https://pypi.org/project/pathvalidate/#sanitize-a-filename:

from pathvalidate import sanitize_filename

fname = "fi:l*e/p\"a?t>h|.t<xt"
print(f"{fname} -> {sanitize_filename(fname)}\n")
fname = "\0_a*b:c<d>e%f/(g)h+i_0.txt"
print(f"{fname} -> {sanitize_filename(fname)}\n")

Output

fi:l*e/p"a?t>h|.t<xt -> filepath.txt
_a*b:c<d>e%f/(g)h+i_0.txt -> _abcde%f(g)h+i_0.txt

回答 21

我确定这不是一个很好的答案,因为它在循环时修改了正在遍历的字符串,但似乎可以正常工作:

import string

for ch in your_string:
    if ch == ' ':
        your_string = your_string.replace(' ', '_')
    elif ch not in string.ascii_letters and ch not in string.digits:
        your_string = your_string.replace(ch, '')

I’m sure this isn’t a great answer, since it modifies the string it’s looping over, but it seems to work alright:

import string

for ch in your_string:
    if ch == ' ':
        your_string = your_string.replace(' ', '_')
    elif ch not in string.ascii_letters and ch not in string.digits:
        your_string = your_string.replace(ch, '')

回答 22

更新

这个6年前的答案中的所有链接都已失效,无法修复。

另外,我现在也不会再这样做了,而是直接用base64编码或删除不安全的字符。Python 3示例:

import re
t = re.compile("[a-zA-Z0-9.,_-]")
unsafe = "abc∂éåß®∆˚˙©¬ñ√ƒµ©∆∫ø"
safe = [ch for ch in unsafe if t.match(ch)]
# => 'abc'

使用base64可以进行编码和解码,因此可以再次检索原始文件名。

但是根据使用情况,最好生成一个随机文件名并将元数据存储在单独的文件或数据库中。

from random import choice
from string import ascii_lowercase, ascii_uppercase, digits
allowed_chr = ascii_lowercase + ascii_uppercase + digits

safe = ''.join([choice(allowed_chr) for _ in range(16)])
# => 'CYQ4JDKE9JfcRzAZ'

原始(链接已失效)答案

bobcat项目包含一个执行此操作的python模块。

它并不完全健壮,请参阅此帖子和此回复

因此,如前所述:base64如果可读性无关紧要,则编码可能是一个更好的主意。

UPDATE

All links broken beyond repair in this 6 year old answer.

Also, I also wouldn’t do it this way anymore, just base64 encode or drop unsafe chars. Python 3 example:

import re
t = re.compile("[a-zA-Z0-9.,_-]")
unsafe = "abc∂éåß®∆˚˙©¬ñ√ƒµ©∆∫ø"
safe = [ch for ch in unsafe if t.match(ch)]
# => 'abc'

With base64 you can encode and decode, so you can retrieve the original filename again.

But depending on the use case you might be better off generating a random filename and storing the metadata in separate file or DB.

from random import choice
from string import ascii_lowercase, ascii_uppercase, digits
allowed_chr = ascii_lowercase + ascii_uppercase + digits

safe = ''.join([choice(allowed_chr) for _ in range(16)])
# => 'CYQ4JDKE9JfcRzAZ'

ORIGINAL LINKROTTEN ANSWER:

The bobcat project contains a python module that does just this.

It’s not completely robust, see this post and this reply.

So, as noted: base64 encoding is probably a better idea if readability doesn’t matter.


将Python字典转换为数据框

问题:将Python字典转换为数据框

我有如下的Python字典:

{u'2012-06-08': 388,
 u'2012-06-09': 388,
 u'2012-06-10': 388,
 u'2012-06-11': 389,
 u'2012-06-12': 389,
 u'2012-06-13': 389,
 u'2012-06-14': 389,
 u'2012-06-15': 389,
 u'2012-06-16': 389,
 u'2012-06-17': 389,
 u'2012-06-18': 390,
 u'2012-06-19': 390,
 u'2012-06-20': 390,
 u'2012-06-21': 390,
 u'2012-06-22': 390,
 u'2012-06-23': 390,
 u'2012-06-24': 390,
 u'2012-06-25': 391,
 u'2012-06-26': 391,
 u'2012-06-27': 391,
 u'2012-06-28': 391,
 u'2012-06-29': 391,
 u'2012-06-30': 391,
 u'2012-07-01': 391,
 u'2012-07-02': 392,
 u'2012-07-03': 392,
 u'2012-07-04': 392,
 u'2012-07-05': 392,
 u'2012-07-06': 392}

键是Unicode日期,值是整数。我想通过将日期及其对应的值作为两个单独的列将其转换为pandas数据框。示例:col1:日期col2:DateValue(日期仍为Unicode,日期值仍为整数)

     Date         DateValue
0    2012-07-01    391
1    2012-07-02    392
2    2012-07-03    392
.    2012-07-04    392
.    ...           ...
.    ...           ...

对此方向的任何帮助将不胜感激。我找不到有关熊猫文档的资源来帮助我。

我知道一个解决方案可能是将此dict中的每个键值对转换为dict,以便整个结构成为dict的dict,然后我们可以将每一行分别添加到数据帧中。但我想知道是否有更简单的方法和更直接的方法来执行此操作。

到目前为止,我已经尝试将dict转换为series对象,但这似乎并不能维持各列之间的关系:

s  = Series(my_dict,index=my_dict.keys())

I have a Python dictionary like the following:

{u'2012-06-08': 388,
 u'2012-06-09': 388,
 u'2012-06-10': 388,
 u'2012-06-11': 389,
 u'2012-06-12': 389,
 u'2012-06-13': 389,
 u'2012-06-14': 389,
 u'2012-06-15': 389,
 u'2012-06-16': 389,
 u'2012-06-17': 389,
 u'2012-06-18': 390,
 u'2012-06-19': 390,
 u'2012-06-20': 390,
 u'2012-06-21': 390,
 u'2012-06-22': 390,
 u'2012-06-23': 390,
 u'2012-06-24': 390,
 u'2012-06-25': 391,
 u'2012-06-26': 391,
 u'2012-06-27': 391,
 u'2012-06-28': 391,
 u'2012-06-29': 391,
 u'2012-06-30': 391,
 u'2012-07-01': 391,
 u'2012-07-02': 392,
 u'2012-07-03': 392,
 u'2012-07-04': 392,
 u'2012-07-05': 392,
 u'2012-07-06': 392}

The keys are Unicode dates and the values are integers. I would like to convert this into a pandas dataframe by having the dates and their corresponding values as two separate columns. Example: col1: Dates col2: DateValue (the dates are still Unicode and datevalues are still integers)

     Date         DateValue
0    2012-07-01    391
1    2012-07-02    392
2    2012-07-03    392
.    2012-07-04    392
.    ...           ...
.    ...           ...

Any help in this direction would be much appreciated. I am unable to find resources on the pandas docs to help me with this.

I know one solution might be to convert each key-value pair in this dict, into a dict so the entire structure becomes a dict of dicts, and then we can add each row individually to the dataframe. But I want to know if there is an easier way and a more direct way to do this.

So far I have tried converting the dict into a series object but this doesn’t seem to maintain the relationship between the columns:

s  = Series(my_dict,index=my_dict.keys())

回答 0

这里的错误是因为使用标量值调用了DataFrame构造函数(它期望值是列表/字典/……,即具有多个列):

pd.DataFrame(d)
ValueError: If using all scalar values, you must pass an index

您可以从字典中获取项目(即键值对):

In [11]: pd.DataFrame(d.items())  # or list(d.items()) in python 3
Out[11]:
             0    1
0   2012-07-02  392
1   2012-07-06  392
2   2012-06-29  391
3   2012-06-28  391
...

In [12]: pd.DataFrame(d.items(), columns=['Date', 'DateValue'])
Out[12]:
          Date  DateValue
0   2012-07-02        392
1   2012-07-06        392
2   2012-06-29        391

但是我认为使用Series构造函数更有意义:

In [21]: s = pd.Series(d, name='DateValue')
Out[21]:
2012-06-08    388
2012-06-09    388
2012-06-10    388

In [22]: s.index.name = 'Date'

In [23]: s.reset_index()
Out[23]:
          Date  DateValue
0   2012-06-08        388
1   2012-06-09        388
2   2012-06-10        388

The error here is because the DataFrame constructor is being called with scalar values (it expects the values to be lists/dicts/… i.e. to have multiple columns):

pd.DataFrame(d)
ValueError: If using all scalar values, you must pass an index

You could take the items from the dictionary (i.e. the key-value pairs):

In [11]: pd.DataFrame(d.items())  # or list(d.items()) in python 3
Out[11]:
             0    1
0   2012-07-02  392
1   2012-07-06  392
2   2012-06-29  391
3   2012-06-28  391
...

In [12]: pd.DataFrame(d.items(), columns=['Date', 'DateValue'])
Out[12]:
          Date  DateValue
0   2012-07-02        392
1   2012-07-06        392
2   2012-06-29        391

But I think it makes more sense to pass the Series constructor:

In [21]: s = pd.Series(d, name='DateValue')
Out[21]:
2012-06-08    388
2012-06-09    388
2012-06-10    388

In [22]: s.index.name = 'Date'

In [23]: s.reset_index()
Out[23]:
          Date  DateValue
0   2012-06-08        388
1   2012-06-09        388
2   2012-06-10        388
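Both routes from the answer above can be run side by side on a shortened version of the question's dict; this sketch checks they produce the same two-column frame:

```python
import pandas as pd

d = {u'2012-06-08': 388, u'2012-06-09': 388, u'2012-06-10': 388}

# Option 1: build the frame from the key-value pairs
df1 = pd.DataFrame(list(d.items()), columns=['Date', 'DateValue'])

# Option 2: go through a Series, then promote the index to a column
s = pd.Series(d, name='DateValue')
s.index.name = 'Date'
df2 = s.reset_index()

# Sorting guards against any difference in dict iteration order
print(df1.sort_values('Date').reset_index(drop=True).equals(
      df2.sort_values('Date').reset_index(drop=True)))
```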

回答 1

将字典转换为pandas数据框时,如果您希望键成为该数据框的列、值成为行值,只需像这样在字典外面加上方括号:

>>> dict_ = {'key 1': 'value 1', 'key 2': 'value 2', 'key 3': 'value 3'}
>>> pd.DataFrame([dict_])

    key 1     key 2     key 3
0   value 1   value 2   value 3

它免除了我的头疼,所以我希望它可以帮助某个人!

编辑:在pandas文档中,DataFrame构造函数的data参数的一个选项是字典列表。在这里,我们传递的列表中只有一个字典。

When converting a dictionary into a pandas dataframe where you want the keys to be the columns of said dataframe and the values to be the row values, you can do simply put brackets around the dictionary like this:

>>> dict_ = {'key 1': 'value 1', 'key 2': 'value 2', 'key 3': 'value 3'}
>>> pd.DataFrame([dict_])

    key 1     key 2     key 3
0   value 1   value 2   value 3

It’s saved me some headaches so I hope it helps someone out there!

EDIT: In the pandas docs one option for the data parameter in the DataFrame constructor is a list of dictionaries. Here we’re passing a list with one dictionary in it.
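The same idea extends to several records: a list of dicts becomes a row-per-dict frame. A small sketch with hypothetical sample records:

```python
import pandas as pd

# Each dict becomes one row; its keys become the columns
records = [
    {'Date': u'2012-06-08', 'DateValue': 388},
    {'Date': u'2012-06-09', 'DateValue': 388},
]
df = pd.DataFrame(records)
print(df.shape)  # (2, 2)
```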


回答 2

如另一个答案所述,直接使用pandas.DataFrame()在这里不会得到您期望的结果。

您可以使用带有orient='index'参数的pandas.DataFrame.from_dict:

In[7]: pandas.DataFrame.from_dict({u'2012-06-08': 388,
 u'2012-06-09': 388,
 u'2012-06-10': 388,
 u'2012-06-11': 389,
 u'2012-06-12': 389,
 .....
 u'2012-07-05': 392,
 u'2012-07-06': 392}, orient='index', columns=['foo'])
Out[7]: 
            foo
2012-06-08  388
2012-06-09  388
2012-06-10  388
2012-06-11  389
2012-06-12  389
........
2012-07-05  392
2012-07-06  392

As explained on another answer using pandas.DataFrame() directly here will not act as you think.

What you can do is use pandas.DataFrame.from_dict with orient='index':

In[7]: pandas.DataFrame.from_dict({u'2012-06-08': 388,
 u'2012-06-09': 388,
 u'2012-06-10': 388,
 u'2012-06-11': 389,
 u'2012-06-12': 389,
 .....
 u'2012-07-05': 392,
 u'2012-07-06': 392}, orient='index', columns=['foo'])
Out[7]: 
            foo
2012-06-08  388
2012-06-09  388
2012-06-10  388
2012-06-11  389
2012-06-12  389
........
2012-07-05  392
2012-07-06  392
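To get the question's exact two-column layout from this approach, the index can be promoted back to a column. A sketch on a shortened dict (the columns= keyword assumes pandas >= 0.23):

```python
import pandas as pd

d = {u'2012-06-08': 388, u'2012-06-09': 388, u'2012-06-10': 388}

# orient='index' puts the keys on the index, values in one column
df = pd.DataFrame.from_dict(d, orient='index', columns=['DateValue'])
df = df.rename_axis('Date').reset_index()
print(df)
```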

回答 3

将字典的项目传递给DataFrame构造函数,并指定列名称。之后,解析Date列以获取Timestamp值。

注意python 2.x和3.x之间的区别:

在python 2.x中:

df = pd.DataFrame(data.items(), columns=['Date', 'DateValue'])
df['Date'] = pd.to_datetime(df['Date'])

在Python 3.x中:(需要一个附加的“列表”)

df = pd.DataFrame(list(data.items()), columns=['Date', 'DateValue'])
df['Date'] = pd.to_datetime(df['Date'])

Pass the items of the dictionary to the DataFrame constructor, and give the column names. After that parse the Date column to get Timestamp values.

Note the difference between python 2.x and 3.x:

In python 2.x:

df = pd.DataFrame(data.items(), columns=['Date', 'DateValue'])
df['Date'] = pd.to_datetime(df['Date'])

In Python 3.x: (requiring an additional ‘list’)

df = pd.DataFrame(list(data.items()), columns=['Date', 'DateValue'])
df['Date'] = pd.to_datetime(df['Date'])
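A quick sanity check (Python 3 form) that pd.to_datetime actually changes the column dtype rather than leaving strings behind:

```python
import pandas as pd

data = {u'2012-06-08': 388, u'2012-06-09': 388}
df = pd.DataFrame(list(data.items()), columns=['Date', 'DateValue'])
df['Date'] = pd.to_datetime(df['Date'])

# The column is now a proper datetime64 column, not object/str
print(df['Date'].dtype)
```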

回答 4

来自列表和字典的df

尤其是ps,我发现面向行的示例很有帮助;因为通常记录是如何在外部存储的。

https://pbpython.com/pandas-list-dict.html

df from lists and dictionaries

p.s. in particular, I’ve found Row-Oriented examples helpful; since often that how records are stored externally.

https://pbpython.com/pandas-list-dict.html


回答 5

熊猫具有内置功能,可将字典转换为数据帧。

pd.DataFrame.from_dict(dictionaryObject, orient='index')

对于您的数据,您可以如下进行转换:

import pandas as pd
your_dict={u'2012-06-08': 388,
 u'2012-06-09': 388,
 u'2012-06-10': 388,
 u'2012-06-11': 389,
 u'2012-06-12': 389,
 u'2012-06-13': 389,
 u'2012-06-14': 389,
 u'2012-06-15': 389,
 u'2012-06-16': 389,
 u'2012-06-17': 389,
 u'2012-06-18': 390,
 u'2012-06-19': 390,
 u'2012-06-20': 390,
 u'2012-06-21': 390,
 u'2012-06-22': 390,
 u'2012-06-23': 390,
 u'2012-06-24': 390,
 u'2012-06-25': 391,
 u'2012-06-26': 391,
 u'2012-06-27': 391,
 u'2012-06-28': 391,
 u'2012-06-29': 391,
 u'2012-06-30': 391,
 u'2012-07-01': 391,
 u'2012-07-02': 392,
 u'2012-07-03': 392,
 u'2012-07-04': 392,
 u'2012-07-05': 392,
 u'2012-07-06': 392}

your_df_from_dict=pd.DataFrame.from_dict(your_dict,orient='index')
print(your_df_from_dict)

Pandas have built-in function for conversion of dict to data frame.

pd.DataFrame.from_dict(dictionaryObject, orient='index')

For your data you can convert it like below:

import pandas as pd
your_dict={u'2012-06-08': 388,
 u'2012-06-09': 388,
 u'2012-06-10': 388,
 u'2012-06-11': 389,
 u'2012-06-12': 389,
 u'2012-06-13': 389,
 u'2012-06-14': 389,
 u'2012-06-15': 389,
 u'2012-06-16': 389,
 u'2012-06-17': 389,
 u'2012-06-18': 390,
 u'2012-06-19': 390,
 u'2012-06-20': 390,
 u'2012-06-21': 390,
 u'2012-06-22': 390,
 u'2012-06-23': 390,
 u'2012-06-24': 390,
 u'2012-06-25': 391,
 u'2012-06-26': 391,
 u'2012-06-27': 391,
 u'2012-06-28': 391,
 u'2012-06-29': 391,
 u'2012-06-30': 391,
 u'2012-07-01': 391,
 u'2012-07-02': 392,
 u'2012-07-03': 392,
 u'2012-07-04': 392,
 u'2012-07-05': 392,
 u'2012-07-06': 392}

your_df_from_dict=pd.DataFrame.from_dict(your_dict,orient='index')
print(your_df_from_dict)

回答 6

pd.DataFrame({'date' : dict_dates.keys() , 'date_value' : dict_dates.values() })
pd.DataFrame({'date' : dict_dates.keys() , 'date_value' : dict_dates.values() })
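For the one-liner above, wrapping the views in list() keeps it working on Python 3; within a single dict, keys() and values() iterate in matching order, so the columns stay aligned. A minimal sketch:

```python
import pandas as pd

dict_dates = {u'2012-06-08': 388, u'2012-06-09': 389}
# list() keeps this working on Python 3, where keys()/values() are views
df = pd.DataFrame({'date': list(dict_dates.keys()),
                   'date_value': list(dict_dates.values())})
print(df)
```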

回答 7

您也可以只将字典的键和值传递给新的数据框,如下所示:

import pandas as pd

myDict = {<the_dict_from_your_example>}
df = pd.DataFrame()
df['Date'] = myDict.keys()
df['DateValue'] = myDict.values()

You can also just pass the keys and values of the dictionary to the new dataframe, like so:

import pandas as pd

myDict = {<the_dict_from_your_example>}
df = pd.DataFrame()
df['Date'] = myDict.keys()
df['DateValue'] = myDict.values()
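With a concrete stand-in for the elided dict, the same pattern runs end to end (list() is needed around the views on Python 3):

```python
import pandas as pd

# Hypothetical stand-in for the dict from the question
myDict = {u'2012-06-08': 388, u'2012-06-09': 388}
df = pd.DataFrame()
df['Date'] = list(myDict.keys())
df['DateValue'] = list(myDict.values())
print(df)
```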

回答 8

就我而言,我希望字典的键和值成为DataFrame的列和值。因此,唯一对我有用的是:

data = {'adjust_power': 'y', 'af_policy_r_submix_prio_adjust': '[null]', 'af_rf_info': '[null]', 'bat_ac': '3500', 'bat_capacity': '75'} 

columns = list(data.keys())
values = list(data.values())
arr_len = len(values)

pd.DataFrame(np.array(values, dtype=object).reshape(1, arr_len), columns=columns)

In my case I wanted keys and values of a dict to be columns and values of DataFrame. So the only thing that worked for me was:

data = {'adjust_power': 'y', 'af_policy_r_submix_prio_adjust': '[null]', 'af_rf_info': '[null]', 'bat_ac': '3500', 'bat_capacity': '75'} 

columns = list(data.keys())
values = list(data.values())
arr_len = len(values)

pd.DataFrame(np.array(values, dtype=object).reshape(1, arr_len), columns=columns)

回答 9

这对我有用,因为我想拥有一个单独的索引列

df = pd.DataFrame.from_dict(some_dict, orient="index").reset_index()
df.columns = ['A', 'B']

This is what worked for me, since I wanted to have a separate index column

df = pd.DataFrame.from_dict(some_dict, orient="index").reset_index()
df.columns = ['A', 'B']

回答 10

接受一个dict作为参数,并返回一个数据帧,其中dict的键作为索引,而值作为一列。

def dict_to_df(d):
    df=pd.DataFrame(d.items())
    df.set_index(0, inplace=True)
    return df

Accepts a dict as argument and returns a dataframe with the keys of the dict as index and values as a column.

def dict_to_df(d):
    df=pd.DataFrame(d.items())
    df.set_index(0, inplace=True)
    return df
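Used on a small dict, the helper above leaves the values in a column named 1 (since set_index consumed column 0); a sketch with list() added for Python 3 safety:

```python
import pandas as pd

def dict_to_df(d):
    # list() makes d.items() safe across pandas versions on Python 3
    df = pd.DataFrame(list(d.items()))
    df.set_index(0, inplace=True)
    return df

out = dict_to_df({u'2012-06-08': 388, u'2012-06-09': 389})
print(out)
```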

回答 11

这对我来说是这样的:

df= pd.DataFrame([d.keys(), d.values()]).T
df.columns= ['keys', 'values']  # call them whatever you like

我希望这有帮助

This is how it worked for me :

df= pd.DataFrame([d.keys(), d.values()]).T
df.columns= ['keys', 'values']  # call them whatever you like

I hope this helps
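On Python 3, d.keys() and d.values() are views, so they need wrapping in list() before building the two-row frame; a sketch of the same transpose trick:

```python
import pandas as pd

d = {u'2012-06-08': 388, u'2012-06-09': 389}
# Two rows (keys, values), then transpose into two columns
df = pd.DataFrame([list(d.keys()), list(d.values())]).T
df.columns = ['keys', 'values']
print(df)
```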


回答 12

d = {'Date': list(yourDict.keys()),'Date_Values': list(yourDict.values())}
df = pandas.DataFrame(data=d)

如果不将yourDict.keys()封装在list()中,最终所有的键和值都会被放进每一列的每一行里。像这样:

Date \ 0 (2012-06-08, 2012-06-09, 2012-06-10, 2012-06-1...
1 (2012-06-08, 2012-06-09, 2012-06-10, 2012-06-1...
2 (2012-06-08, 2012-06-09, 2012-06-10, 2012-06-1...
3 (2012-06-08, 2012-06-09, 2012-06-10, 2012-06-1...
4 (2012-06-08, 2012-06-09, 2012-06-10, 2012-06-1...

但是通过添加list(),结果看起来像这样:

         Date  Date_Values
0  2012-06-08          388
1  2012-06-09          388
2  2012-06-10          388
3  2012-06-11          389
4  2012-06-12          389
...

d = {'Date': list(yourDict.keys()),'Date_Values': list(yourDict.values())}
df = pandas.DataFrame(data=d)

If you don’t encapsulate yourDict.keys() inside of list() , then you will end up with all of your keys and values being placed in every row of every column. Like this:

Date \ 0 (2012-06-08, 2012-06-09, 2012-06-10, 2012-06-1...
1 (2012-06-08, 2012-06-09, 2012-06-10, 2012-06-1...
2 (2012-06-08, 2012-06-09, 2012-06-10, 2012-06-1...
3 (2012-06-08, 2012-06-09, 2012-06-10, 2012-06-1...
4 (2012-06-08, 2012-06-09, 2012-06-10, 2012-06-1...

But by adding list() then the result looks like this:

         Date  Date_Values
0  2012-06-08          388
1  2012-06-09          388
2  2012-06-10          388
3  2012-06-11          389
4  2012-06-12          389
...


回答 13

我已经遇到过几次这种情况。下面是一个由函数get_max_path()返回的示例字典:

{2: 0.3097502930247044, 3: 0.4413177909384636, 4: 0.5197224051562838, 5: 0.5717654946470984, 6: 0.6063959031223476, 7: 0.6365209824708223, 8: 0.655918861281035, 9: 0.680844386645206}

要将其转换为数据框,我运行了以下命令:

df = pd.DataFrame.from_dict(get_max_path(2), orient = 'index').reset_index()

返回带有单独索引的简单两列数据框:

   index         0
0      2  0.309750
1      3  0.441318

只需使用df.rename(columns={'index': 'Column1', 0: 'Column2'}, inplace=True)重命名列即可。

I have run into this several times. Here is an example dictionary returned by a function get_max_path():

{2: 0.3097502930247044, 3: 0.4413177909384636, 4: 0.5197224051562838, 5: 0.5717654946470984, 6: 0.6063959031223476, 7: 0.6365209824708223, 8: 0.655918861281035, 9: 0.680844386645206}

To convert this to a dataframe, I ran the following:

df = pd.DataFrame.from_dict(get_max_path(2), orient = 'index').reset_index()

Returns a simple two column dataframe with a separate index:

   index         0
0      2  0.309750
1      3  0.441318

Just rename the columns using df.rename(columns={'index': 'Column1', 0: 'Column2'}, inplace=True)


回答 14

我认为您可以在创建字典时对数据格式进行一些更改,然后将其轻松转换为DataFrame:

输入:

a={'Dates':['2012-06-08','2012-06-10'],'Date_value':[388,389]}

输出:

{'Date_value': [388, 389], 'Dates': ['2012-06-08', '2012-06-10']}

输入:

aframe=DataFrame(a)

输出:将是您的DataFrame

您只需要在Sublime或Excel之类的地方使用一些文本编辑即可。

I think that you can make some changes in your data format when you create dictionary, then you can easily convert it to DataFrame:

input:

a={'Dates':['2012-06-08','2012-06-10'],'Date_value':[388,389]}

output:

{'Date_value': [388, 389], 'Dates': ['2012-06-08', '2012-06-10']}

input:

aframe=DataFrame(a)

output: will be your DataFrame

You just need to use some text editing in somewhere like Sublime or maybe Excel.


Python将来五分钟创建unix时间戳

问题:Python将来五分钟创建unix时间戳

我必须创建一个5分钟后过期的“Expires”值,但必须以UNIX时间戳格式提供它。到目前为止我写出了下面的函数,但这看起来像是个取巧的hack。

def expires():
    '''return a UNIX style timestamp representing 5 minutes from now'''
    epoch = datetime.datetime(1970, 1, 1)
    seconds_in_a_day = 60 * 60 * 24
    five_minutes = datetime.timedelta(seconds=5*60)
    five_minutes_from_now = datetime.datetime.now() + five_minutes
    since_epoch = five_minutes_from_now - epoch
    return since_epoch.days * seconds_in_a_day + since_epoch.seconds

是否有为我转换时间戳的模块或功能?

I have to create an “Expires” value 5 minutes in the future, but I have to supply it in UNIX Timestamp format. I have this so far, but it seems like a hack.

def expires():
    '''return a UNIX style timestamp representing 5 minutes from now'''
    epoch = datetime.datetime(1970, 1, 1)
    seconds_in_a_day = 60 * 60 * 24
    five_minutes = datetime.timedelta(seconds=5*60)
    five_minutes_from_now = datetime.datetime.now() + five_minutes
    since_epoch = five_minutes_from_now - epoch
    return since_epoch.days * seconds_in_a_day + since_epoch.seconds

Is there a module or function that does the timestamp conversion for me?


回答 0

另一种方法是使用calendar.timegm

future = datetime.datetime.utcnow() + datetime.timedelta(minutes=5)
return calendar.timegm(future.timetuple())

它也比strftime的%s标记(在Windows上不起作用)更可移植。

Another way is to use calendar.timegm:

future = datetime.datetime.utcnow() + datetime.timedelta(minutes=5)
return calendar.timegm(future.timetuple())

It’s also more portable than %s flag to strftime (which doesn’t work on Windows).
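Putting the two lines above into the question's expires() helper gives a complete, portable version; it should agree with time.time() + 300 to within a couple of seconds:

```python
import calendar
import datetime
import time

def expires():
    """Return a UNIX-style timestamp representing 5 minutes from now."""
    future = datetime.datetime.utcnow() + datetime.timedelta(minutes=5)
    return calendar.timegm(future.timetuple())

print(abs(expires() - (time.time() + 300)) < 3)
```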


回答 1

现在,在Python> = 3.3中,您只需调用timestamp()方法即可将时间戳记作为浮点数获取。

import datetime
current_time = datetime.datetime.now(datetime.timezone.utc)
unix_timestamp = current_time.timestamp() # works if Python >= 3.3

unix_timestamp_plus_5_min = unix_timestamp + (5 * 60)  # 5 min * 60 seconds

Now in Python >= 3.3 you can just call the timestamp() method to get the timestamp as a float.

import datetime
current_time = datetime.datetime.now(datetime.timezone.utc)
unix_timestamp = current_time.timestamp() # works if Python >= 3.3

unix_timestamp_plus_5_min = unix_timestamp + (5 * 60)  # 5 min * 60 seconds

回答 2

刚发现,它甚至更短。

import time
def expires():
    '''return a UNIX style timestamp representing 5 minutes from now'''
    return int(time.time()+300)

Just found this, and its even shorter.

import time
def expires():
    '''return a UNIX style timestamp representing 5 minutes from now'''
    return int(time.time()+300)

回答 3

这是您需要的:

import time
import datetime
n = datetime.datetime.now()
unix_time = time.mktime(n.timetuple())

This is what you need:

import time
import datetime
n = datetime.datetime.now()
unix_time = time.mktime(n.timetuple())

回答 4

您可以使用datetime.strftime和%s格式字符串来获取Epoch格式的时间:

def expires():
    future = datetime.datetime.now() + datetime.timedelta(seconds=5*60)
    return int(future.strftime("%s"))

注意:此方法仅在Linux下有效,并且不适用于时区。

You can use datetime.strftime to get the time in Epoch form, using the %s format string:

def expires():
    future = datetime.datetime.now() + datetime.timedelta(seconds=5*60)
    return int(future.strftime("%s"))

Note: This only works under linux, and this method doesn’t work with timezones.


回答 5

这是一个基于datetime的、问题更少的解决方案,用于将datetime对象转换为POSIX时间戳:

future = datetime.datetime.utcnow() + datetime.timedelta(minutes=5)
return (future - datetime.datetime(1970, 1, 1)).total_seconds()

更多详细信息请参见《在Python中将datetime.date转换为UTC时间戳》。

Here’s a less broken datetime-based solution to convert from datetime object to posix timestamp:

future = datetime.datetime.utcnow() + datetime.timedelta(minutes=5)
return (future - datetime.datetime(1970, 1, 1)).total_seconds()

See more details at Converting datetime.date to UTC timestamp in Python.


回答 6

def in_unix(input):
  start = datetime.datetime(year=1970,month=1,day=1)
  diff = input - start
  return diff.total_seconds()
def in_unix(input):
  start = datetime.datetime(year=1970,month=1,day=1)
  diff = input - start
  return diff.total_seconds()

回答 7

关键是在开始转换之前,确保您使用的所有日期都在utc时区中。请参阅http://pytz.sourceforge.net/了解如何正确执行此操作。通过对utc进行标准化,可以消除夏令时转换的歧义。然后,您可以安全地使用timedelta来计算距Unix纪元的距离,然后将其转换为秒或毫秒。

请注意,生成的unix时间戳本身就是UTC时区。如果您希望查看本地化时区中的时间戳,则需要进行另一次转换。

另请注意,这仅适用于1970年之后的日期。

   import datetime
   import pytz

   UNIX_EPOCH = datetime.datetime(1970, 1, 1, 0, 0, tzinfo = pytz.utc)
   def EPOCH(utc_datetime):
      delta = utc_datetime - UNIX_EPOCH
      seconds = delta.total_seconds()
      ms = seconds * 1000
      return ms

The key is to ensure all the dates you are using are in the utc timezone before you start converting. See http://pytz.sourceforge.net/ to learn how to do that properly. By normalizing to utc, you eliminate the ambiguity of daylight savings transitions. Then you can safely use timedelta to calculate distance from the unix epoch, and then convert to seconds or milliseconds.

Note that the resulting unix timestamp is itself in the UTC timezone. If you wish to see the timestamp in a localized timezone, you will need to make another conversion.

Also note that this will only work for dates after 1970.

   import datetime
   import pytz

   UNIX_EPOCH = datetime.datetime(1970, 1, 1, 0, 0, tzinfo = pytz.utc)
   def EPOCH(utc_datetime):
      delta = utc_datetime - UNIX_EPOCH
      seconds = delta.total_seconds()
      ms = seconds * 1000
      return ms

回答 8

以下内容基于上述答案(并对毫秒数做了修正),在使用时区时为3.3之前的Python 3仿真了datetime.timestamp()。

from calendar import timegm
from pytz import utc

def datetime_timestamp(datetime):
    '''
    Equivalent to datetime.timestamp() for pre-3.3
    '''
    try:
        return datetime.timestamp()
    except AttributeError:
        # assumes an aware datetime; convert to UTC before computing seconds
        utc_datetime = datetime.astimezone(utc)
        return timegm(utc_datetime.timetuple()) + utc_datetime.microsecond / 1e6

要严格按照要求回答问题,您需要:

datetime_timestamp(my_datetime) + 5 * 60

datetime_timestamp是simple-date的一部分。但是,如果您使用的是该软件包,您可能会写:

SimpleDate(my_datetime).timestamp + 5 * 60

可以为my_datetime处理更多格式/类型。

The following is based on the answers above (plus a correction for the milliseconds) and emulates datetime.timestamp() for Python 3 before 3.3 when timezones are used.

from calendar import timegm
from pytz import utc

def datetime_timestamp(datetime):
    '''
    Equivalent to datetime.timestamp() for pre-3.3
    '''
    try:
        return datetime.timestamp()
    except AttributeError:
        # assumes an aware datetime; convert to UTC before computing seconds
        utc_datetime = datetime.astimezone(utc)
        return timegm(utc_datetime.timetuple()) + utc_datetime.microsecond / 1e6

To strictly answer the question as asked, you’d want:

datetime_timestamp(my_datetime) + 5 * 60

datetime_timestamp is part of simple-date. But if you were using that package you’d probably type:

SimpleDate(my_datetime).timestamp + 5 * 60

which handles many more formats / types for my_datetime.


回答 9

def expiration_time():
    import datetime,calendar
    timestamp = calendar.timegm(datetime.datetime.now().timetuple())
    returnValue = datetime.timedelta(minutes=5).total_seconds() + timestamp
    return returnValue
def expiration_time():
    import datetime,calendar
    timestamp = calendar.timegm(datetime.datetime.now().timetuple())
    returnValue = datetime.timedelta(minutes=5).total_seconds() + timestamp
    return returnValue

回答 10

请注意,使用timedelta.total_seconds()的解决方案适用于python-2.7+。对于更低版本的Python,请使用calendar.timegm(future.utctimetuple())。

Note that solutions with timedelta.total_seconds() work on python-2.7+. Use calendar.timegm(future.utctimetuple()) for lower versions of Python.


如何将列表合并为元组列表?

问题:如何将列表合并为元组列表?

实现以下目标的Python方法是什么?

# Original lists:

list_a = [1, 2, 3, 4]
list_b = [5, 6, 7, 8]

# List of tuples from 'list_a' and 'list_b':

list_c = [(1,5), (2,6), (3,7), (4,8)]

的每个成员list_c都是一个元组,其第一个成员是from list_a,第二个成员是from list_b

What is the Pythonic approach to achieve the following?

# Original lists:

list_a = [1, 2, 3, 4]
list_b = [5, 6, 7, 8]

# List of tuples from 'list_a' and 'list_b':

list_c = [(1,5), (2,6), (3,7), (4,8)]

Each member of list_c is a tuple, whose first member is from list_a and the second is from list_b.


回答 0

在Python 2中:

>>> list_a = [1, 2, 3, 4]
>>> list_b = [5, 6, 7, 8]
>>> zip(list_a, list_b)
[(1, 5), (2, 6), (3, 7), (4, 8)]

在Python 3中:

>>> list_a = [1, 2, 3, 4]
>>> list_b = [5, 6, 7, 8]
>>> list(zip(list_a, list_b))
[(1, 5), (2, 6), (3, 7), (4, 8)]

In Python 2:

>>> list_a = [1, 2, 3, 4]
>>> list_b = [5, 6, 7, 8]
>>> zip(list_a, list_b)
[(1, 5), (2, 6), (3, 7), (4, 8)]

In Python 3:

>>> list_a = [1, 2, 3, 4]
>>> list_b = [5, 6, 7, 8]
>>> list(zip(list_a, list_b))
[(1, 5), (2, 6), (3, 7), (4, 8)]
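The reverse operation (unzipping list_c back into the two original lists) uses the same function with argument unpacking:

```python
list_c = [(1, 5), (2, 6), (3, 7), (4, 8)]
# zip(*iterable) transposes: tuples of firsts and seconds come back out
a, b = zip(*list_c)
print(list(a))  # [1, 2, 3, 4]
print(list(b))  # [5, 6, 7, 8]
```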

回答 1

在python 3.0中,zip返回一个zip对象。您可以通过调用list(zip(a, b))来得到一个列表。

In python 3.0 zip returns a zip object. You can get a list out of it by calling list(zip(a, b)).


回答 2

您可以使用map和lambda:

a = [2,3,4]
b = [5,6,7]
c = map(lambda x,y:(x,y),a,b)

如果原始列表的长度不匹配,这也将起作用

You can use map lambda

a = [2,3,4]
b = [5,6,7]
c = map(lambda x,y:(x,y),a,b)

This will also work if the lengths of the original lists do not match


回答 3

您正在寻找的是内置函数zip。

You’re looking for the builtin function zip.


回答 4

我不确定这是否是pythonic方式,但是如果两个列表具有相同数量的元素,这似乎很简单:

list_a = [1, 2, 3, 4]

list_b = [5, 6, 7, 8]

list_c=[(list_a[i],list_b[i]) for i in range(0,len(list_a))]

I am not sure if this a pythonic way or not but this seems simple if both lists have the same number of elements :

list_a = [1, 2, 3, 4]

list_b = [5, 6, 7, 8]

list_c=[(list_a[i],list_b[i]) for i in range(0,len(list_a))]

回答 5

我知道这是一个古老的问题,已经得到回答,但是由于某些原因,我仍然想发布此替代解决方案。我知道很容易找出哪个内置函数可以完成您所需的“魔术”,但是知道您可以自己完成该操作也不会有什么害处。

>>> list_1 = ['Ace', 'King']
>>> list_2 = ['Spades', 'Clubs', 'Diamonds']
>>> deck = []
>>> for i in range(max((len(list_1),len(list_2)))):
        while True:
            try:
                card = (list_1[i],list_2[i])
            except IndexError:
                if len(list_1)>len(list_2):
                    list_2.append('')
                    card = (list_1[i],list_2[i])
                elif len(list_1)<len(list_2):
                    list_1.append('')
                    card = (list_1[i], list_2[i])
                continue
            deck.append(card)
            break
>>>
>>> #and the result should be:
>>> print deck
>>> [('Ace', 'Spades'), ('King', 'Clubs'), ('', 'Diamonds')]

I know this is an old question and was already answered, but for some reason, I still wanna post this alternative solution. I know it’s easy to just find out which built-in function does the “magic” you need, but it doesn’t hurt to know you can do it by yourself.

>>> list_1 = ['Ace', 'King']
>>> list_2 = ['Spades', 'Clubs', 'Diamonds']
>>> deck = []
>>> for i in range(max((len(list_1),len(list_2)))):
        while True:
            try:
                card = (list_1[i],list_2[i])
            except IndexError:
                if len(list_1)>len(list_2):
                    list_2.append('')
                    card = (list_1[i],list_2[i])
                elif len(list_1)<len(list_2):
                    list_1.append('')
                    card = (list_1[i], list_2[i])
                continue
            deck.append(card)
            break
>>>
>>> #and the result should be:
>>> print deck
>>> [('Ace', 'Spades'), ('King', 'Clubs'), ('', 'Diamonds')]

回答 6

您在问题陈述中显示的输出不是元组而是列表

list_c = [(1,5), (2,6), (3,7), (4,8)]

检查

type(list_c)

考虑到您想要结果作为list_a和list_b中的元组,请执行

tuple(zip(list_a,list_b)) 

The output you showed in the problem statement is not a tuple but a list

list_c = [(1,5), (2,6), (3,7), (4,8)]

check for

type(list_c)

considering you want the result as tuple out of list_a and list_b, do

tuple(zip(list_a,list_b)) 

回答 7

一种不使用的替代方法zip

list_c = [(p1, p2) for idx1, p1 in enumerate(list_a) for idx2, p2 in enumerate(list_b) if idx1==idx2]

如果不仅想要第1个与第1个、第2个与第2个……配对的元组,还想要两个列表的所有可能组合,可以使用

list_d = [(p1, p2) for p1 in list_a for p2 in list_b]

One alternative without using zip:

list_c = [(p1, p2) for idx1, p1 in enumerate(list_a) for idx2, p2 in enumerate(list_b) if idx1==idx2]

In case one wants to get not only tuples 1st with 1st, 2nd with 2nd… but all possible combinations of the 2 lists, that would be done with

list_d = [(p1, p2) for p1 in list_a for p2 in list_b]
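The all-combinations case is exactly what itertools.product computes, without a nested comprehension:

```python
from itertools import product

list_a = [1, 2]
list_b = [5, 6]
# Cartesian product: every pairing of an element from each list
list_d = list(product(list_a, list_b))
print(list_d)  # [(1, 5), (1, 6), (2, 5), (2, 6)]
```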