
How do you express binary literals in Python?

Question: How do you express binary literals in Python?

How do you express an integer as a binary number with Python literals?

I was easily able to find the answer for hex:

>>> 0x12AF
4783
>>> 0x100
256

and octal:

>>> 01267
695
>>> 0100
64

How do you use literals to express binary in Python?


Summary of Answers

  • Python 2.5 and earlier: can express binary using int('01010101111',2) but not with a literal.
  • Python 2.5 and earlier: there is no way to express binary literals.
  • Python 2.6 beta: You can do like so: 0b1100111 or 0B1100111.
  • Python 2.6 beta: will also allow 0o27 or 0O27 (second character is the letter O) to represent an octal.
  • Python 3.0 beta: Same as 2.6, but will no longer allow the older 027 syntax for octals.

Answer 0

For reference—future Python possibilities:
Starting with Python 2.6 you can express binary literals using the prefix 0b or 0B:

>>> 0b101111
47

You can also use the new bin function to get the binary representation of a number:

>>> bin(173)
'0b10101101'

Development version of the documentation: What’s New in Python 2.6
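
As a minimal sketch of the two features mentioned above (written for Python 3; the variable names are illustrative only), a binary literal and bin() can be combined into a round trip:

```python
# Binary integer literals use the prefix 0b or 0B.
mask = 0b101111
print(mask)                # 47

# bin() returns the binary representation as a string with a '0b' prefix.
text = bin(173)
print(text)                # 0b10101101

# int() with base 2 parses that string back; the '0b' prefix is accepted.
round_trip = int(text, 2)
print(round_trip == 173)   # True
```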


Answer 1

>>> print int('01010101111',2)
687
>>> print int('11111111',2)
255

Another way.


Answer 2

How do you express binary literals in Python?

They’re not “binary” literals, but rather, “integer literals”. You can express integer literals with a binary format with a 0 followed by a B or b followed by a series of zeros and ones, for example:

>>> 0b0010101010
170
>>> 0B010101
21

From the Python 3 docs, these are the ways of providing integer literals in Python:

Integer literals are described by the following lexical definitions:

integer      ::=  decinteger | bininteger | octinteger | hexinteger
decinteger   ::=  nonzerodigit (["_"] digit)* | "0"+ (["_"] "0")*
bininteger   ::=  "0" ("b" | "B") (["_"] bindigit)+
octinteger   ::=  "0" ("o" | "O") (["_"] octdigit)+
hexinteger   ::=  "0" ("x" | "X") (["_"] hexdigit)+
nonzerodigit ::=  "1"..."9"
digit        ::=  "0"..."9"
bindigit     ::=  "0" | "1"
octdigit     ::=  "0"..."7"
hexdigit     ::=  digit | "a"..."f" | "A"..."F"

There is no limit for the length of integer literals apart from what can be stored in available memory.

Note that leading zeros in a non-zero decimal number are not allowed. This is for disambiguation with C-style octal literals, which Python used before version 3.0.

Some examples of integer literals:

7     2147483647                        0o177    0b100110111
3     79228162514264337593543950336     0o377    0xdeadbeef
      100_000_000_000                   0b_1110_0101

Changed in version 3.6: Underscores are now allowed for grouping purposes in literals.

Other ways of expressing binary:

You can have the zeros and ones in a string object which can be manipulated (although you should probably just do bitwise operations on the integer in most cases) – just pass int the string of zeros and ones and the base you are converting from (2):

>>> int('010101', 2)
21

You can optionally have the 0b or 0B prefix:

>>> int('0b0010101010', 2)
170

If you pass it 0 as the base, it will assume base 10 if the string doesn’t specify with a prefix:

>>> int('10101', 0)
10101
>>> int('0b10101', 0)
21

Converting from int back to human readable binary:

You can pass an integer to bin to see the string representation of a binary literal:

>>> bin(21)
'0b10101'

And you can combine bin and int to go back and forth:

>>> bin(int('010101', 2))
'0b10101'

You can use a format specification as well, if you want to have minimum width with preceding zeros:

>>> format(int('010101', 2), '{fill}{width}b'.format(width=10, fill=0))
'0000010101'
>>> format(int('010101', 2), '010b')
'0000010101'
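
The format specifications above can be sketched a little further. This is only an illustration for Python 3.6 or later; the variable names are made up for the example:

```python
value = 0b_1110_0101                 # underscores in literals need Python 3.6+

padded = format(value, '010b')       # zero-padded to width 10, no prefix
prefixed = format(value, '#010b')    # '#' adds the 0b prefix, which counts toward the width
grouped = format(value, '_b')        # '_' groups binary digits in fours
in_fstring = f'{value:08b}'          # the same mini-language works in f-strings

print(padded)      # 0011100101
print(prefixed)    # 0b11100101
print(grouped)     # 1110_0101
print(in_fstring)  # 11100101
```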

Answer 3

In Python 2, a leading 0 here specifies that the base is 8 (not 10), which is pretty easy to see:

>>> int('010101', 0)
4161

If you don’t start with a 0, then Python assumes the number is base 10.

>>> int('10101', 0)
10101
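
The session above reflects Python 2. In Python 3, base 0 follows the rules for integer literals, so a bare leading zero is rejected and octal needs the 0o prefix. A small sketch of that behaviour (parse_auto is a hypothetical helper, not part of any library):

```python
def parse_auto(text):
    """Parse text with base 0; return None when Python rejects the literal."""
    try:
        return int(text, 0)
    except ValueError:
        return None

print(parse_auto('0o10101'))  # 4161, octal via the 0o prefix
print(parse_auto('0b10101'))  # 21
print(parse_auto('10101'))    # 10101, no prefix means base 10
print(parse_auto('010101'))   # None in Python 3: leading zeros are not allowed
```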

Answer 4

As far as I can tell Python, up through 2.5, only supports hexadecimal & octal literals. I did find some discussions about adding binary to future versions but nothing definite.


Answer 5

I am pretty sure this is one of the things due to change in Python 3.0 with perhaps bin() to go with hex() and oct().

EDIT: lbrandy’s answer is correct in all cases.


How to use the timeit module

Question: How to use the timeit module

I understand the concept of what timeit does but I am not sure how to implement it in my code.

How can I compare two functions, say insertion_sort and tim_sort, with timeit?


Answer 0

The way timeit works is to run setup code once and then make repeated calls to a series of statements. So, if you want to test sorting, some care is required so that one pass at an in-place sort doesn’t affect the next pass with already sorted data (that, of course, would make the Timsort really shine because it performs best when the data is already partially ordered).

Here is an example of how to set up a test for sorting:

>>> import timeit

>>> setup = '''
import random

random.seed('slartibartfast')
s = [random.random() for i in range(1000)]
timsort = list.sort
'''

>>> print min(timeit.Timer('a=s[:]; timsort(a)', setup=setup).repeat(7, 1000))
0.334147930145

Note that the series of statements makes a fresh copy of the unsorted data on every pass.

Also, note the timing technique of running the measurement suite seven times and keeping only the best time — this can really help reduce measurement distortions due to other processes running on your system.

Those are my tips for using timeit correctly. Hope this helps :-)
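
For the comparison asked about in the question, the same technique might look like this in Python 3. This is a sketch under assumptions: insertion_sort is an illustrative implementation written for this example (the asker's own function is not shown), best_time is a hypothetical helper, and the list size and repeat counts are arbitrary small values.

```python
import random
import timeit

def insertion_sort(items):
    """Sort a list in place with a plain insertion sort (illustrative only)."""
    for i in range(1, len(items)):
        current = items[i]
        j = i - 1
        while j >= 0 and items[j] > current:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = current

random.seed('slartibartfast')
data = [random.random() for _ in range(200)]

def best_time(sort_func, repeat=5, number=20):
    # Copy the unsorted data on every call so no run sees already-sorted input.
    timer = timeit.Timer(lambda: sort_func(data[:]))
    # Keep only the fastest of the repeated measurements.
    return min(timer.repeat(repeat=repeat, number=number))

print('insertion_sort:', best_time(insertion_sort))
print('list.sort     :', best_time(list.sort))
```

Note that the copy data[:] is timed along with the sort in both cases, so it adds the same overhead to each measurement.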


Answer 1

If you want to use timeit in an interactive Python session, there are two convenient options:

  1. Use the IPython shell. It features the convenient %timeit special function:

    In [1]: def f(x):
       ...:     return x*x
       ...: 
    
    In [2]: %timeit for x in range(100): f(x)
    100000 loops, best of 3: 20.3 us per loop
    
  2. In a standard Python interpreter, you can access functions and other names you defined earlier during the interactive session by importing them from __main__ in the setup statement:

    >>> def f(x):
    ...     return x * x 
    ... 
    >>> import timeit
    >>> timeit.repeat("for x in range(100): f(x)", "from __main__ import f",
                      number=100000)
    [2.0640320777893066, 2.0876040458679199, 2.0520210266113281]
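
On Python 3.5 and later there is a third option: timeit accepts a globals argument, which avoids the import from __main__. A minimal sketch (the repeat and number values are arbitrary):

```python
import timeit

def f(x):
    return x * x

# globals=globals() lets the timed statement see f directly.
times = timeit.repeat("for x in range(100): f(x)",
                      globals=globals(), repeat=3, number=1000)
print(min(times))
```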
    

Answer 2

I’ll let you in on a secret: the best way to use timeit is on the command line.

On the command line, timeit does proper statistical analysis: it tells you how long the shortest run took. This is good because all error in timing is positive. So the shortest time has the least error in it. There’s no way to get negative error because a computer can’t ever compute faster than it can compute!

So, the command-line interface:

%~> python -m timeit "1 + 2"
10000000 loops, best of 3: 0.0468 usec per loop

That’s quite simple, eh?

You can set stuff up:

%~> python -m timeit -s "x = range(10000)" "sum(x)"
1000 loops, best of 3: 543 usec per loop

which is useful, too!

If you want multiple lines, you can either use the shell’s automatic continuation or use separate arguments:

%~> python -m timeit -s "x = range(10000)" -s "y = range(100)" "sum(x)" "min(y)"
1000 loops, best of 3: 554 usec per loop

That gives a setup of

x = range(10000)
y = range(100)

and times

sum(x)
min(y)

If you want to have longer scripts you might be tempted to move to timeit inside a Python script. I suggest avoiding that because the analysis and timing is simply better on the command line. Instead, I tend to make shell scripts:

 SETUP="

 ... # lots of stuff

 "

 echo Minmod arr1
 python -m timeit -s "$SETUP" "Minmod(arr1)"

 echo pure_minmod arr1
 python -m timeit -s "$SETUP" "pure_minmod(arr1)"

 echo better_minmod arr1
 python -m timeit -s "$SETUP" "better_minmod(arr1)"

 ... etc

This can take a bit longer due to the multiple initialisations, but normally that’s not a big deal.


But what if you want to use timeit inside your module?

Well, the simple way is to do:

def function(...):
    ...

timeit.Timer(function).timeit(number=NUMBER)

and that gives you cumulative (not minimum!) time to run that number of times.

To get a good analysis, use .repeat and take the minimum:

min(timeit.Timer(function).repeat(repeat=REPEATS, number=NUMBER))

You should normally combine this with functools.partial instead of lambda: ... to lower overhead. Thus you could have something like:

import timeit
from functools import partial

def to_time(items):
    ...

test_items = [1, 2, 3] * 100
times = timeit.Timer(partial(to_time, test_items)).repeat(3, 1000)

# Divide by the number of executions per repeat (number=1000)
time_taken = min(times) / 1000

You can also do:

timeit.timeit("...", setup="from __main__ import ...", number=NUMBER)

which would give you something closer to the interface from the command-line, but in a much less cool manner. The "from __main__ import ..." lets you use code from your main module inside the artificial environment created by timeit.

It’s worth noting that this is a convenience wrapper for Timer(...).timeit(...) and so isn’t particularly good at timing. I personally far prefer using Timer(...).repeat(...) as I’ve shown above.


Warnings

There are a few caveats with timeit that hold everywhere.

  • Overhead is not accounted for. Say you want to time x += 1, to find out how long addition takes:

    %~> python -m timeit -s "x = 0" "x += 1"
    10000000 loops, best of 3: 0.0476 usec per loop
    

    Well, it’s not 0.0476 µs. You only know that it’s less than that. All error is positive.

    So try and find pure overhead:

    %~> python -m timeit -s "x = 0" ""
    100000000 loops, best of 3: 0.014 usec per loop
    

    That’s a good 30% overhead just from timing! This can massively skew relative timings. But you only really cared about the adding timings; the look-up timings for x also need to be included in overhead:

    %~> python -m timeit -s "x = 0" "x"
    100000000 loops, best of 3: 0.0166 usec per loop
    

    The difference isn’t much larger, but it’s there.

  • Mutating methods are dangerous.

    %~> python -m timeit -s "x = [0]*100000" "while x: x.pop()"
    10000000 loops, best of 3: 0.0436 usec per loop
    

    But that’s completely wrong! x is the empty list after the first iteration. You’ll need to reinitialize:

    %~> python -m timeit "x = [0]*100000" "while x: x.pop()"
    100 loops, best of 3: 9.79 msec per loop
    

    But then you have lots of overhead. Account for that separately.

    %~> python -m timeit "x = [0]*100000"
    1000 loops, best of 3: 261 usec per loop
    

    Note that subtracting the overhead is reasonable here only because the overhead is a small-ish fraction of the time.

    For your example, it’s worth noting that both Insertion Sort and Tim Sort have completely unusual timing behaviours for already-sorted lists. This means you will require a random.shuffle between sorts if you want to avoid wrecking your timings.
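
The "account for the overhead separately" step can also be sketched from inside Python. This is a rough illustration rather than a rigorous benchmark: best is a hypothetical helper, N is smaller than in the command-line runs above to keep it quick, and subtracting is only meaningful while the overhead stays a small fraction of the total.

```python
import timeit

N = 10_000

# The statement being measured mutates x, so x is rebuilt on every execution.
setup_only = "x = [0] * N"
setup_and_pop = "x = [0] * N\nwhile x: x.pop()"

def best(stmt, repeat=5, number=50):
    """Return the best per-execution time in seconds for stmt."""
    runs = timeit.repeat(stmt, globals={"N": N}, repeat=repeat, number=number)
    return min(runs) / number

overhead = best(setup_only)      # cost of rebuilding the list alone
total = best(setup_and_pop)      # rebuilding plus popping every element

print("overhead:", overhead)
print("total   :", total)
print("pop loop, roughly:", total - overhead)
```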


Answer 3

If you want to compare two blocks of code / functions quickly you could do:

import timeit

start_time = timeit.default_timer()
func1()
print(timeit.default_timer() - start_time)

start_time = timeit.default_timer()
func2()
print(timeit.default_timer() - start_time)

Answer 4

I find the easiest way to use timeit is from the command line:

Given test.py:

def InsertionSort(): ...
def TimSort(): ...

run timeit like this:

% python -mtimeit -s'import test' 'test.InsertionSort()'
% python -mtimeit -s'import test' 'test.TimSort()'

Answer 5

For me, this is the fastest way:

import timeit
def foo():
    print("here is my code to time...")


timeit.timeit(stmt=foo, number=1234567)

Answer 6

# Generate prime numbers up to x

def gen_prime(x):
    multiples = []
    results = []
    for i in range(2, x+1):
        if i not in multiples:
            results.append(i)
            for j in range(i*i, x+1, i):
                multiples.append(j)

    return results


import timeit

# Measure the elapsed time

start_time = timeit.default_timer()
gen_prime(3000)
print(timeit.default_timer() - start_time)

# start_time = timeit.default_timer()
# gen_prime(1001)
# print(timeit.default_timer() - start_time)

Answer 7

This works great:

  python -m timeit -c "$(cat file_name.py)"

Answer 8

Let’s set up the same dictionary in each of the following statements and test the execution time.

The setup argument sets up the dictionary.

The number argument runs the code 1000000 times; this applies to stmt, not to setup.

When you run this, you can see that indexing is much faster than get. You can run it multiple times to check.

The code simply gets the value of c in the dictionary.

import timeit

print('Getting value of C by index:', timeit.timeit(stmt="mydict['c']", setup="mydict={'a':5, 'b':6, 'c':7}", number=1000000))
print('Getting value of C by get:', timeit.timeit(stmt="mydict.get('c')", setup="mydict={'a':5, 'b':6, 'c':7}", number=1000000))

Here are my results, yours will differ.

by index: 0.20900007452246427

by get: 0.54841166886888


Answer 9

Simply pass your entire code as an argument to timeit:

import timeit

print(timeit.timeit(

"""   
limit = 10000
prime_list = [i for i in range(2, limit+1)]

for prime in prime_list:
    for elem in range(prime*2, max(prime_list)+1, prime):
        if elem in prime_list:
            prime_list.remove(elem)
"""   
, number=10))

Answer 10

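import timeit


def oct(x):  # note: this name shadows the built-in oct()
    return x*x


# The setup must import gc before enabling it, and must import oct from
# __main__; otherwise timeit would time the built-in oct() instead.
setup = "import gc; gc.enable(); from __main__ import oct"
timeit.Timer("for x in range(100): oct(x)", setup).timeit()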

Answer 11

The built-in timeit module works best from the IPython command line.

To time functions from within a module:

from timeit import default_timer as timer

def timefunc(func, *args, **kwargs):
    """Time a function, printing the best of several runs.

    args:
        iterations=3

    Usage example:
        timefunc(myfunc, 1, b=2)
    """
    # Pull our own keyword out so it is not passed on to func.
    iterations = kwargs.pop('iterations', 3)
    elapsed = float('inf')
    for _ in range(iterations):
        start = timer()
        result = func(*args, **kwargs)
        elapsed = min(timer() - start, elapsed)
    print('Best of {} {}(): {:.9f}'.format(iterations, func.__name__, elapsed))
    return result

Answer 12

Example of how to use timeit in the Python REPL with a function that accepts parameters.

>>> import timeit                                                                                         

>>> def naive_func(x):                                                                                    
...     a = 0                                                                                             
...     for i in range(a):                                                                                
...         a += i                                                                                        
...     return a                                                                                          

>>> def wrapper(func, *args, **kwargs):                                                                   
...     def wrapper():                                                                                    
...         return func(*args, **kwargs)                                                                  
...     return wrapper                                                                                    

>>> wrapped = wrapper(naive_func, 1_000)                                                                  

>>> timeit.timeit(wrapped, number=1_000_000)                                                              
0.4458435332577161                                                                                        
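
Note that naive_func above loops over range(a) with a = 0, so its loop body never runs and the timing mostly reflects call overhead. A possible variant, offered only as a sketch, loops over range(x) instead and uses functools.partial in place of the hand-written wrapper (naive_sum is a name chosen for this example, and number is reduced to keep the run short):

```python
import timeit
from functools import partial

def naive_sum(x):
    """Sum the integers below x with an explicit loop."""
    total = 0
    for i in range(x):
        total += i
    return total

# partial binds the argument, giving timeit a zero-argument callable.
wrapped = partial(naive_sum, 1_000)

elapsed = timeit.timeit(wrapped, number=1_000)
print(elapsed)
```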

Answer 13

You would create two functions and then run something similar to this. Note that you should choose the same number of executions for each so that you compare apples to apples.
This was tested under Python 3.7.

Here is the code, for ease of copying:

#!/usr/local/bin/python3
import timeit

def fibonacci(n):
    """
    Returns the n-th Fibonacci number.
    """
    if(n == 0):
        result = 0
    elif(n == 1):
        result = 1
    else:
        result = fibonacci(n-1) + fibonacci(n-2)
    return result

if __name__ == '__main__':
    t1 = timeit.Timer("fibonacci(13)", "from __main__ import fibonacci")
    # timeit() returns the total time in seconds for all `number` runs.
    print("fibonacci ran:", t1.timeit(number=1000), "seconds")

Circular (or cyclic) imports in Python

Question: Circular (or cyclic) imports in Python

What will happen if two modules import each other?

To generalize the problem, what about the cyclic imports in Python?


Answer 0

There was a really good discussion on this over at comp.lang.python last year. It answers your question pretty thoroughly.

Imports are pretty straightforward really. Just remember the following:

‘import’ and ‘from xxx import yyy’ are executable statements. They execute when the running program reaches that line.

If a module is not in sys.modules, then an import creates the new module entry in sys.modules and then executes the code in the module. It does not return control to the calling module until the execution has completed.

If a module does exist in sys.modules then an import simply returns that module whether or not it has completed executing. That is the reason why cyclic imports may return modules which appear to be partly empty.

Finally, the executing script runs in a module named __main__; importing the script under its own name will create a new module unrelated to __main__.

Take that lot together and you shouldn’t get any surprises when importing modules.
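
The last point, about __main__, can be illustrated with a small experiment. This is a sketch for Python 3.7 or later: it writes a throwaway script named me.py (a made-up name) to a temporary directory and runs it in a subprocess.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# A script that imports itself under its own file name.
SCRIPT = '''\
import sys
print("running as", __name__)
import me
print("same module object:", sys.modules["__main__"] is sys.modules["me"])
'''

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp, "me.py")
    path.write_text(SCRIPT)
    result = subprocess.run([sys.executable, str(path)],
                            capture_output=True, text=True)
    output = result.stdout

print(output)
```

The body runs twice, once as __main__ and once as me, and the two entries in sys.modules are different objects. The inner import me does not recurse, because by then me is already present in sys.modules.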


Answer 1

If you do import foo inside bar and import bar inside foo, it will work fine. By the time anything actually runs, both modules will be fully loaded and will have references to each other.

The problem is when instead you do from foo import abc and from bar import xyz. Because now each module requires the other module to already be imported (so that the name we are importing exists) before it can be imported.
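
Both cases can be tried out with throwaway modules. The sketch below is an illustration for Python 3.7 or later: run_pair is a hypothetical helper that writes foo.py and bar.py to a temporary directory and imports foo in a fresh interpreter, and the module contents are invented for the example.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

def run_pair(foo_src, bar_src):
    """Write foo.py and bar.py, import foo in a subprocess, return (code, output)."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "foo.py").write_text(foo_src)
        Path(tmp, "bar.py").write_text(bar_src)
        code = f"import sys; sys.path.insert(0, {tmp!r}); import foo; print('ok')"
        proc = subprocess.run([sys.executable, "-c", code],
                              capture_output=True, text=True)
        return proc.returncode, proc.stdout + proc.stderr

# Plain "import": each module only needs the other to exist in sys.modules.
plain = run_pair(
    "import bar\nabc = 1\ndef use(): return bar.xyz\n",
    "import foo\nxyz = 2\ndef use(): return foo.abc\n",
)

# "from ... import": bar asks for foo.abc before foo has defined it.
from_style = run_pair(
    "from bar import xyz\nabc = 1\n",
    "from foo import abc\nxyz = 2\n",
)

print("plain import exit code:", plain[0])
print("from-import exit code :", from_style[0])
print(from_style[1])
```

The first pair imports cleanly, while the second stops with an ImportError raised inside bar.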


Answer 2

Cyclic imports terminate, but you need to be careful not to use the cyclically-imported modules during module initialization.

Consider the following files:

a.py:

print "a in"
import sys
print "b imported: %s" % ("b" in sys.modules, )
import b
print "a out"

b.py:

print "b in"
import a
print "b out"
x = 3

If you execute a.py, you’ll get the following:

$ python a.py
a in
b imported: False
b in
a in
b imported: True
a out
b out
a out

On the second import of b.py (in the second a in), the Python interpreter does not import b again, because it already exists in the module dict.

If you try to access b.x from a during module initialization, you will get an AttributeError.

Append the following line to a.py:

print b.x

Then, the output is:

$ python a.py
a in                    
b imported: False
b in
a in
b imported: True
a out
Traceback (most recent call last):
  File "a.py", line 4, in <module>
    import b
  File "/home/shlomme/tmp/x/b.py", line 2, in <module>
    import a
 File "/home/shlomme/tmp/x/a.py", line 7, in <module>
    print b.x
AttributeError: 'module' object has no attribute 'x'

This is because modules are executed on import, and at the time b.x is accessed, the line x = 3 has not been executed yet; that will only happen after b out.


Answer 3

As other answers describe, this pattern is acceptable in Python:

def dostuff(self):
     from foo import bar
     ...

This avoids executing the import statement when the file is imported by other modules; the import is deferred until the function is called. It fails only if there is a genuine logical circular dependency.

Most circular imports are not actually logical circular imports; they raise ImportError because of the way import evaluates the top-level statements of the entire file when it is called.

These ImportErrors can almost always be avoided, even if you positively want your imports at the top:

Consider this circular import:

App A

# profiles/serializers.py

from images.serializers import SimplifiedImageSerializer

class SimplifiedProfileSerializer(serializers.Serializer):
    name = serializers.CharField()

class ProfileSerializer(SimplifiedProfileSerializer):
    recent_images = SimplifiedImageSerializer(many=True)

App B

# images/serializers.py

from profiles.serializers import SimplifiedProfileSerializer

class SimplifiedImageSerializer(serializers.Serializer):
    title = serializers.CharField()

class ImageSerializer(SimplifiedImageSerializer):
    profile = SimplifiedProfileSerializer()

From David Beazley’s excellent talk Modules and Packages: Live and Let Die! (PyCon 2015, 1:54:00), here is a way to deal with circular imports in Python:

try:
    from images.serializers import SimplifiedImageSerializer
except ImportError:
    import sys
    SimplifiedImageSerializer = sys.modules[__package__ + '.SimplifiedImageSerializer']

This tries to import SimplifiedImageSerializer and, if ImportError is raised because the module is already being imported, pulls it from the import cache (sys.modules) instead.

PS: You have to read this entire post in David Beazley’s voice.


回答 4

我这里有一个让我印象深刻的例子!

foo.py

import bar

class gX(object):
    g = 10

bar.py

from foo import gX

o = gX()

main.py

import foo
import bar

print "all done"

在命令行中: $ python main.py

Traceback (most recent call last):
  File "m.py", line 1, in <module>
    import foo
  File "/home/xolve/foo.py", line 1, in <module>
    import bar
  File "/home/xolve/bar.py", line 1, in <module>
    from foo import gX
ImportError: cannot import name gX

I got an example here that struck me!

foo.py

import bar

class gX(object):
    g = 10

bar.py

from foo import gX

o = gX()

main.py

import foo
import bar

print "all done"

At the command line: $ python main.py

Traceback (most recent call last):
  File "m.py", line 1, in <module>
    import foo
  File "/home/xolve/foo.py", line 1, in <module>
    import bar
  File "/home/xolve/bar.py", line 1, in <module>
    from foo import gX
ImportError: cannot import name gX
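One common way around the traceback above is to import the module rather than the name and to look the name up only when it is needed. The following is just a sketch for Python 3 under that assumption: the module names foo_bad, bar_bad, foo_ok and bar_ok and the helper make() are invented for the demonstration and written to a temporary directory.

```python
import importlib
import os
import sys
import tempfile


def write_module(directory, name, source):
    """Write a small throwaway module to `directory`."""
    with open(os.path.join(directory, name + ".py"), "w") as handle:
        handle.write(source)


tmp = tempfile.mkdtemp()

# Same shape as the failing example: bar wants the name gX while foo is half-built.
write_module(tmp, "foo_bad", "import bar_bad\n\nclass gX(object):\n    g = 10\n")
write_module(tmp, "bar_bad", "from foo_bad import gX\n\no = gX()\n")

# Variant: bar imports the module and touches foo_ok.gX only inside a function.
write_module(tmp, "foo_ok", "import bar_ok\n\nclass gX(object):\n    g = 10\n")
write_module(tmp, "bar_ok", "import foo_ok\n\ndef make():\n    return foo_ok.gX()\n")

sys.path.insert(0, tmp)
importlib.invalidate_caches()

try:
    importlib.import_module("foo_bad")
    failed = False
except ImportError:
    failed = True  # gX did not exist yet when bar_bad asked for it

importlib.import_module("foo_ok")
obj = sys.modules["bar_ok"].make()  # called after foo_ok finished executing

print(failed)  # True
print(obj.g)   # 10
```

The cycle is still there in the second pair; it works because nothing needs gX until both modules have finished executing.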

回答 5

模块a.py:

import b
print("This is from module a")

模块b.py

import a
print("This is from module b")

运行“模块a”将输出:

>>> 
'This is from module a'
'This is from module b'
'This is from module a'
>>> 

它输出这3行,而由于循环导入而应该输出无穷大。下面列出了在运行“模块a”时逐行发生的情况:

  1. 第一行是 import b。因此它将访问模块b
  2. 模块b的第一行是 import a。因此它将访问模块a
  3. 模块a的第一行是,import b但是请注意,此行将不再执行,因为python中的每个文件仅执行一次导入行,因此无论在何时何地执行都无关紧要。因此它将传递到下一行并打印"This is from module a"
  4. 从模块b访问完整个模块a后,我们仍在模块b中。所以下一行会打印"This is from module b"
  5. 模块b行完全执行。因此,我们将回到模块b的起始模块a。
  6. import b行已经执行,将不再执行。下一行将打印"This is from module a",程序将完成。

Module a.py :

import b
print("This is from module a")

Module b.py

import a
print("This is from module b")

Running “Module a” will output:

>>> 
'This is from module a'
'This is from module b'
'This is from module a'
>>> 

It outputs these 3 lines, although one might expect infinite output because of the circular import. What happens, line by line, while running “Module a” is listed here:

  1. The first line is import b, so Python visits module b.
  2. The first line of module b is import a, so Python visits module a. (The script itself is running under the name __main__, so a is not in sys.modules yet and its code is executed as a fresh module.)
  3. The first line of module a is import b, but this does not execute b again: b is already registered in sys.modules because its import is in progress, and a module’s code is executed only once no matter where or when it is imported. Python moves on to the next line and prints "This is from module a".
  4. After module a has been run in full from module b, we are still in module b, so the next line prints "This is from module b".
  5. Module b has now been executed completely, so we go back to the original script, where the import of b started.
  6. The import b line has finished and is not executed again. The next line prints "This is from module a" and the program finishes.
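The "executed only once" rule from the steps above can be checked directly. This is a small sketch for Python 3, assuming a throwaway module named once_demo (a made-up name) whose body just prints a marker; the printed output is captured so the executions can be counted.

```python
import contextlib
import importlib
import io
import os
import sys
import tempfile

# Hypothetical module whose body announces every execution.
tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "once_demo.py"), "w") as handle:
    handle.write('print("executing once_demo")\n')

sys.path.insert(0, tmp)
importlib.invalidate_caches()

buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    import once_demo             # first import: the module body runs
    import once_demo             # second import: served from sys.modules
    importlib.reload(once_demo)  # an explicit reload runs the body again

runs = buffer.getvalue().count("executing once_demo")
print(runs)  # 2: one real execution plus the explicit reload
```

Only the first import and the explicit reload execute the body; the repeated import statement merely rebinds the name to the cached module object.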

回答 6

我完全同意pythoneer的回答。但是我偶然发现了一些有循环导入缺陷的代码,并在尝试添加单元测试时引起了问题。因此,在不做任何更改的情况下快速修补它,可以通过动态导入解决问题。

# Hack to import something without circular import issue
def load_module(name):
    """Load module using imp.find_module"""
    names = name.split(".")
    path = None
    for name in names:
        f, path, info = imp.find_module(name, path)
        path = [path]
    return imp.load_module(name, f, path[0], info)
constants = load_module("app.constants")

同样,这不是永久性的修复,但是可以帮助想要修复导入错误而无需更改太多代码的人。

干杯!

I completely agree with pythoneer’s answer here. But I have stumbled on some code that was flawed with circular imports and caused issues when trying to add unit tests. So to quickly patch it without changing everything you can resolve the issue by doing a dynamic import.

# Hack to import something without circular import issue
import imp  # deprecated since Python 3.4 and removed in Python 3.12

def load_module(name):
    """Load module using imp.find_module"""
    names = name.split(".")
    path = None
    for name in names:
        # Look up each dotted component inside the location of the previous one
        f, path, info = imp.find_module(name, path)
        path = [path]
    return imp.load_module(name, f, path[0], info)

constants = load_module("app.constants")

Again, this isn’t a permanent fix but may help someone that wants to fix an import error without changing too much of the code.

Cheers!
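Since the imp module no longer exists in Python 3.12 and later, the same idea can be sketched with importlib.import_module from the standard library. This is an assumption about what a modern equivalent could look like, not the original poster's code; "app.constants" is the poster's own module, so the standard-library module json.decoder stands in for it here.

```python
import importlib


def load_module(name):
    """Import a module by its dotted name, e.g. "package.submodule"."""
    # import_module imports the parent packages first, then the submodule.
    return importlib.import_module(name)


# Stand-in for the poster's "app.constants".
decoder = load_module("json.decoder")
print(decoder.__name__)  # json.decoder
```

As with the original hack, calling this inside a function (rather than at module level) is what postpones the import until after both modules have finished loading.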


回答 7

这里有很多很好的答案。尽管通常可以快速解决问题,但有些解决方案比其他解决方案更具有Python风格,但如果您愿意进行一些重构,则另一种方法是分析代码的组织,并尝试消除循环依赖。例如,您可能会发现您具有:

档案a.py

from b import B

class A:
    @staticmethod
    def save_result(result):
        print('save the result')

    @staticmethod
    def do_something_a_ish(param):
        A.save_result(A.use_param_like_a_would(param))

    @staticmethod
    def do_something_related_to_b(param):
        B.do_something_b_ish(param)

文件b.py

from a import A

class B:
    @staticmethod
    def do_something_b_ish(param):
        A.save_result(B.use_param_like_b_would(param))

在这种情况下,只需将一个静态方法移至单独的文件,例如c.py

文件c.py

def save_result(result):
    print('save the result')

将允许save_result从A中删除方法,从而允许从b中的a中删除A的导入:

重构文件a.py

from b import B
from c import save_result

class A:
    @staticmethod
    def do_something_a_ish(param):
        save_result(A.use_param_like_a_would(param))

    @staticmethod
    def do_something_related_to_b(param):
        B.do_something_b_ish(param)

重构文件b.py

from c import save_result

class B:
    @staticmethod
    def do_something_b_ish(param):
        save_result(B.use_param_like_b_would(param))

总而言之,如果您有一个报告静态方法的工具(例如pylint或PyCharm),则仅在其上添加staticmethod修饰符可能不是使警告消失的最佳方法。即使该方法似乎与该类有关,也最好将其分开,尤其是当您有几个紧密相关的模块可能需要相同的功能并且打算实践DRY原理时。

There are a lot of great answers here. While there are usually quick solutions to the problem, some of which feel more pythonic than others, if you have the luxury of doing some refactoring, another approach is to analyze the organization of your code, and try to remove the circular dependency. You may find, for example, that you have:

File a.py

from b import B

class A:
    @staticmethod
    def save_result(result):
        print('save the result')

    @staticmethod
    def do_something_a_ish(param):
        A.save_result(A.use_param_like_a_would(param))

    @staticmethod
    def do_something_related_to_b(param):
        B.do_something_b_ish(param)

File b.py

from a import A

class B:
    @staticmethod
    def do_something_b_ish(param):
        A.save_result(B.use_param_like_b_would(param))

In this case, just moving one static method to a separate file, say c.py:

File c.py

def save_result(result):
    print('save the result')

will allow removing the save_result method from A, and thus allow removing the import of A from a in b:

Refactored File a.py

from b import B
from c import save_result

class A:
    @staticmethod
    def do_something_a_ish(param):
        save_result(A.use_param_like_a_would(param))

    @staticmethod
    def do_something_related_to_b(param):
        B.do_something_b_ish(param)

Refactored File b.py

from c import save_result

class B:
    @staticmethod
    def do_something_b_ish(param):
        save_result(B.use_param_like_b_would(param))

In summary, if you have a tool (e.g. pylint or PyCharm) that reports on methods that can be static, just throwing a staticmethod decorator on them might not be the best way to silence the warning. Even though the method seems related to the class, it might be better to separate it out, especially if you have several closely related modules that might need the same functionality and you intend to practice DRY principles.


回答 8

循环导入可能会造成混淆,因为导入有两件事:

  1. 它执行导入的模块代码
  2. 将导入模块添加到导入模块全局符号表中

前者仅执行一次,而后者在每个import语句中执行。当导入模块使用部分执行代码的已导入模块时,循环导入会产生情况。因此,它将不会看到import语句之后创建的对象。下面的代码示例对此进行了演示。

循环进口并不是不惜一切代价避免的最终罪恶。在诸如Flask之类的某些框架中,它们是很自然的,调整您的代码以消除它们并不能使代码变得更好。

main.py

print 'import b'
import b
print 'a in globals() {}'.format('a' in globals())
print 'import a'
import a
print 'a in globals() {}'.format('a' in globals())
if __name__ == '__main__':
    print 'imports done'
    print 'b has y {}, a is b.a {}'.format(hasattr(b, 'y'), a is b.a)

b.py

print "b in, __name__ = {}".format(__name__)
x = 3
print 'b imports a'
import a
y = 5
print "b out"

a.py

print 'a in, __name__ = {}'.format(__name__)
print 'a imports b'
import b
print 'b has x {}'.format(hasattr(b, 'x'))
print 'b has y {}'.format(hasattr(b, 'y'))
print "a out"

带有注释的python main.py输出

import b
b in, __name__ = b    # b code execution started
b imports a
a in, __name__ = a    # a code execution started
a imports b           # b code execution is already in progress
b has x True
b has y False         # b defines y after a import,
a out
b out
a in globals() False  # import only adds a to main global symbol table 
import a
a in globals() True
imports done
b has y True, a is b.a True # all b objects are available

Circular imports can be confusing because import does two things:

  1. it executes the imported module’s code
  2. it adds the imported module to the importing module’s global symbol table

The former is done only once, while the latter is done at each import statement. A circular import creates a situation in which the importing module uses an imported module whose code is only partially executed. As a consequence, it will not see objects created after the import statement. The code sample below demonstrates this.

Circular imports are not the ultimate evil to be avoided at all costs. In some frameworks like Flask they are quite natural, and tweaking your code to eliminate them does not make the code better.

main.py

print 'import b'
import b
print 'a in globals() {}'.format('a' in globals())
print 'import a'
import a
print 'a in globals() {}'.format('a' in globals())
if __name__ == '__main__':
    print 'imports done'
    print 'b has y {}, a is b.a {}'.format(hasattr(b, 'y'), a is b.a)

b.py

print "b in, __name__ = {}".format(__name__)
x = 3
print 'b imports a'
import a
y = 5
print "b out"

a.py

print 'a in, __name__ = {}'.format(__name__)
print 'a imports b'
import b
print 'b has x {}'.format(hasattr(b, 'x'))
print 'b has y {}'.format(hasattr(b, 'y'))
print "a out"

python main.py output with comments

import b
b in, __name__ = b    # b code execution started
b imports a
a in, __name__ = a    # a code execution started
a imports b           # b code execution is already in progress
b has x True
b has y False         # b defines y after a import,
a out
b out
a in globals() False  # import only adds a to main global symbol table 
import a
a in globals() True
imports done
b has y True, a is b.a True # all b objects are available

回答 9

我通过以下方式解决了问题,并且工作正常,没有任何错误。考虑两个文件a.pyb.py

我添加到a.py它,它的工作。

if __name__ == "__main__":
        main ()

a.py:

import b
y = 2
def main():
    print ("a out")
    print (b.x)

if __name__ == "__main__":
    main ()

b.py:

import a
print ("b out")
x = 3 + a.y

我得到的输出是

>>> b out 
>>> a out 
>>> 5

I solved the problem the following way, and it works well without any error. Consider two files a.py and b.py.

I added this to a.py and it worked.

if __name__ == "__main__":
        main ()

a.py:

import b
y = 2
def main():
    print ("a out")
    print (b.x)

if __name__ == "__main__":
    main ()

b.py:

import a
print ("b out")
x = 3 + a.y

The output I get is

>>> b out 
>>> a out 
>>> 5

回答 10

好的,我想我有一个很酷的解决方案。假设您有file a和file b。你有一个def或一个class文件b要在模块来使用a,但你有别的东西,无论是一个defclass或者从文件变量a您在文件中定义或类需要b。您可以做的是,在文件底部a,在调用文件a中所需的文件中b的函数或类之后,但是在从文件b中调用所需的文件中调用函数或类之前a,说import b 然后,这是关键部分,在文件b中所有需要the defor classfrom file 的定义或类中a(叫它CLASS),你说from a import CLASS

之所以可行,是因为您可以导入文件b而无需Python在file中执行任何import语句b,因此可以避免任何循环导入。

例如:

档案a:

class A(object):

     def __init__(self, name):

         self.name = name

CLASS = A("me")

import b

go = b.B(6)

go.dostuff()

文件b:

class B(object):

     def __init__(self, number):

         self.number = number

     def dostuff(self):

         from a import CLASS

         print "Hello " + CLASS.name + ", " + str(self.number) + " is an interesting number."

Ok, I think I have a pretty cool solution. Let’s say you have file a and file b. File b contains a def or a class that you want to use in module a, but file a also contains something (a def, a class, or a variable) that the def or class in file b needs. What you can do is put import b at the bottom of file a: after defining whatever file b needs from file a, but before using the function or class from file b. Then, and here is the key part, inside every def or class in file b that needs something from file a (let’s call it CLASS), you say from a import CLASS.

This works because importing file b does not run any import of a at module level: the from a import CLASS statements execute only when the functions that contain them are called, and thus you elude any circular imports.

For example:

File a:

class A(object):

     def __init__(self, name):

         self.name = name

CLASS = A("me")

import b

go = b.B(6)

go.dostuff()

File b:

class B(object):

     def __init__(self, number):

         self.number = number

     def dostuff(self):

         from a import CLASS

         print "Hello " + CLASS.name + ", " + str(self.number) + " is an interesting number."

Voila.


什么是logits,softmax和softmax_cross_entropy_with_logits?

问题:什么是logits,softmax和softmax_cross_entropy_with_logits?

我正在这里浏览tensorflow API文档。在tensorflow文档中,他们使用了名为的关键字logits。它是什么?API文档中的许多方法都将其编写为

tf.nn.softmax(logits, name=None)

如果写的是什么是那些logitsTensors,为什么保持一个不同的名称,如logits

另一件事是,我无法区分两种方法。他们是

tf.nn.softmax(logits, name=None)
tf.nn.softmax_cross_entropy_with_logits(logits, labels, name=None)

它们之间有什么区别?这些文档对我来说还不清楚。我知道是什么tf.nn.softmax呢。但是没有其他。一个例子将非常有帮助。

I was going through the tensorflow API docs here. In the tensorflow documentation, they used a keyword called logits. What is it? In a lot of methods in the API docs it is written like

tf.nn.softmax(logits, name=None)

If those logits are just Tensors, why keep a separate name like logits?

Another thing is that there are two methods I could not differentiate. They are

tf.nn.softmax(logits, name=None)
tf.nn.softmax_cross_entropy_with_logits(logits, labels, name=None)

What are the differences between them? The docs are not clear to me. I know what tf.nn.softmax does, but not the other. An example would be really helpful.


回答 0

Logits只是意味着函数在较早的图层的未缩放输出上运行,并且理解单位的相对缩放是线性的。特别是,这意味着输入的总和可能不等于1,这意味着值不是概率(输入可能为5)。

tf.nn.softmax仅产生将softmax函数应用于输入张量的结果。softmax“压缩”输入,以便sum(input) = 1:这是一种规范化方法。softmax的输出形状与输入相同:它只是将值标准化。softmax的输出可以解释为概率。

a = tf.constant(np.array([[.1, .3, .5, .9]]))
print s.run(tf.nn.softmax(a))
[[ 0.16838508  0.205666    0.25120102  0.37474789]]

相比之下,tf.nn.softmax_cross_entropy_with_logits在应用softmax函数之后计算结果的交叉熵(但以数学上更仔细的方式将其全部合并在一起)。它类似于以下结果:

sm = tf.nn.softmax(x)
ce = cross_entropy(sm)

交叉熵是一个汇总指标:跨元素求和。tf.nn.softmax_cross_entropy_with_logits形状[2,5]张量的输出是一定形状的[2,1](将第一维视为批处理)。

如果要进行优化以最小化交叉熵,并且要在最后一层之后进行软最大化,则应使用tf.nn.softmax_cross_entropy_with_logits而不是自己进行处理,因为它以数学上正确的方式涵盖了数值不稳定的拐角情况。否则,您最终会在这里和那里添加少量epsilon,从而对其进行破解。

编辑于2016-02-07: 如果您具有单类标签,而一个对象只能属于一个类,则现在可以考虑使用tf.nn.sparse_softmax_cross_entropy_with_logits,这样就不必将标签转换为密集的一键热阵列。在0.6.0版本之后添加了此功能。

Logits simply means that the function operates on the unscaled output of earlier layers and that the relative scale to understand the units is linear. It means, in particular, that the sum of the inputs may not equal 1 and that the values are not probabilities (you might have an input of 5).

tf.nn.softmax produces just the result of applying the softmax function to an input tensor. The softmax “squishes” the inputs so that sum(output) = 1: it’s a way of normalizing. The shape of the output of a softmax is the same as the input: it just normalizes the values. The outputs of softmax can be interpreted as probabilities.

a = tf.constant(np.array([[.1, .3, .5, .9]]))
print s.run(tf.nn.softmax(a))
[[ 0.16838508  0.205666    0.25120102  0.37474789]]

In contrast, tf.nn.softmax_cross_entropy_with_logits computes the cross entropy of the result after applying the softmax function (but it does it all together in a more mathematically careful way). It’s similar to the result of:

sm = tf.nn.softmax(x)
ce = cross_entropy(sm)

The cross entropy is a summary metric: it sums across the elements. The output of tf.nn.softmax_cross_entropy_with_logits on a shape [2,5] tensor is of shape [2], one loss value per row (the first dimension is treated as the batch).

If you want to do optimization to minimize the cross entropy AND you’re softmaxing after your last layer, you should use tf.nn.softmax_cross_entropy_with_logits instead of doing it yourself, because it covers numerically unstable corner cases in the mathematically right way. Otherwise, you’ll end up hacking it by adding little epsilons here and there.

Edited 2016-02-07: If you have single-class labels, where an object can only belong to one class, you might now consider using tf.nn.sparse_softmax_cross_entropy_with_logits so that you don’t have to convert your labels to a dense one-hot array. This function was added after release 0.6.0.
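To make the "numerically unstable corner cases" remark concrete, here is a rough NumPy sketch of what a fused softmax cross-entropy can look like. It is an illustration of the log-sum-exp trick under my own assumptions, not TensorFlow's actual implementation, and the function names softmax and softmax_cross_entropy are made up for this example.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax; subtracting the row maximum avoids overflow in exp."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)


def softmax_cross_entropy(logits, labels):
    """Cross entropy computed from log-softmax directly, one value per row."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    log_softmax = shifted - np.log(np.sum(np.exp(shifted), axis=-1, keepdims=True))
    return -np.sum(labels * log_softmax, axis=-1)


# The input from the answer above.
a = np.array([[0.1, 0.3, 0.5, 0.9]])
probs = softmax(a)
print(probs)  # approximately [[0.16838508 0.205666 0.25120102 0.37474789]]

# An extreme case: exp(1000) overflows and log(0) is -inf in a naive version.
big_logits = np.array([[1000.0, 0.0]])
labels = np.array([[0.0, 1.0]])
loss = softmax_cross_entropy(big_logits, labels)
print(loss)  # [1000.]
```

Working with log-softmax instead of log(softmax(x)) is what removes the need for the little epsilons mentioned above.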


回答 1

简洁版本:

假设您有两个张量,其中y_hat包含每个类的计算得分(例如,来自y = W * x + b),并且y_true包含一个热编码的真实标签。

y_hat  = ... # Predicted label, e.g. y = tf.matmul(X, W) + b
y_true = ... # True label, one-hot encoded

如果将分数解释y_hat为未归一化的对数概率,则它们为logits

此外,以这种方式计算的总交叉熵损失为:

y_hat_softmax = tf.nn.softmax(y_hat)
total_loss = tf.reduce_mean(-tf.reduce_sum(y_true * tf.log(y_hat_softmax), [1]))

基本上等于用函数计算的总交叉熵损失softmax_cross_entropy_with_logits()

total_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y_hat, y_true))

长版:

在神经网络的输出层中,您可能会计算一个数组,其中包含每个训练实例的类分数,例如来自计算y_hat = W*x + b。作为示例,我在下面创建了y_hat一个2 x 3的数组,其中行对应于训练实例,列对应于类。因此,这里有2个训练实例和3个类。

import tensorflow as tf
import numpy as np

sess = tf.Session()

# Create example y_hat.
y_hat = tf.convert_to_tensor(np.array([[0.5, 1.5, 0.1],[2.2, 1.3, 1.7]]))
sess.run(y_hat)
# array([[ 0.5,  1.5,  0.1],
#        [ 2.2,  1.3,  1.7]])

请注意,这些值未规范化(即,各行的总和不等于1)。为了对其进行归一化,我们可以应用softmax函数,该函数将输入解释为归一化的对数概率(aka logits),并输出归一化的线性概率。

y_hat_softmax = tf.nn.softmax(y_hat)
sess.run(y_hat_softmax)
# array([[ 0.227863  ,  0.61939586,  0.15274114],
#        [ 0.49674623,  0.20196195,  0.30129182]])

完全了解softmax输出在说什么很重要。下面我显示了一个表格,可以更清楚地表示上面的输出。可以看出,例如,训练实例1为“等级2”的概率为0.619。每个训练实例的类概率均已标准化,因此每行的总和为1.0。

                      Pr(Class 1)  Pr(Class 2)  Pr(Class 3)
                    ,--------------------------------------
Training instance 1 | 0.227863   | 0.61939586 | 0.15274114
Training instance 2 | 0.49674623 | 0.20196195 | 0.30129182

因此,现在我们有了每个训练实例的类概率,在这里我们可以采用每一行的argmax()来生成最终分类。从上面可以生成训练实例1属于“类别2”,训练实例2属于“类别1”。

这些分类正确吗?我们需要根据训练集中的真实标签进行衡量。您将需要一个一次性编码的y_true数组,其中行又是训练实例,列又是类。下面,我创建了y_true一个单热点数组示例,其中训练实例1的真实标签为“ Class 2”,训练实例2的真实标签为“ Class 3”。

y_true = tf.convert_to_tensor(np.array([[0.0, 1.0, 0.0],[0.0, 0.0, 1.0]]))
sess.run(y_true)
# array([[ 0.,  1.,  0.],
#        [ 0.,  0.,  1.]])

概率分布是否y_hat_softmax接近的概率分布y_true?我们可以使用交叉熵损失来测量误差。

交叉熵损失的公式

我们可以逐行计算交叉熵损失并查看结果。在下面我们可以看到训练实例1的损失为0.479,而训练实例2的损失为1.200。该结果之所以有意义,是因为在上面的示例中y_hat_softmax,训练实例1的最高机率是“类别2”,它与中的训练实例1相匹配y_true;但是,针对训练实例2的预测显示出“类别1”的最高概率,这与真实的类别“类别3”不匹配。

loss_per_instance_1 = -tf.reduce_sum(y_true * tf.log(y_hat_softmax), reduction_indices=[1])
sess.run(loss_per_instance_1)
# array([ 0.4790107 ,  1.19967598])

我们真正想要的是所有训练实例的总损失。因此我们可以计算:

total_loss_1 = tf.reduce_mean(-tf.reduce_sum(y_true * tf.log(y_hat_softmax), reduction_indices=[1]))
sess.run(total_loss_1)
# 0.83934333897877944

使用softmax_cross_entropy_with_logits()

相反,我们可以使用tf.nn.softmax_cross_entropy_with_logits()函数来计算总交叉熵损失,如下所示。

loss_per_instance_2 = tf.nn.softmax_cross_entropy_with_logits(y_hat, y_true)
sess.run(loss_per_instance_2)
# array([ 0.4790107 ,  1.19967598])

total_loss_2 = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y_hat, y_true))
sess.run(total_loss_2)
# 0.83934333897877922

请注意,total_loss_1total_loss_2在非常最后的数字有些小的差异产生几乎相同的结果。但是,您也可以使用第二种方法:它减少了一行代码,并减少了数值误差,因为softmax是在中完成的softmax_cross_entropy_with_logits()

Short version:

Suppose you have two tensors, where y_hat contains computed scores for each class (for example, from y = W*x +b) and y_true contains one-hot encoded true labels.

y_hat  = ... # Predicted label, e.g. y = tf.matmul(X, W) + b
y_true = ... # True label, one-hot encoded

If you interpret the scores in y_hat as unnormalized log probabilities, then they are logits.

Additionally, the total cross-entropy loss computed in this manner:

y_hat_softmax = tf.nn.softmax(y_hat)
total_loss = tf.reduce_mean(-tf.reduce_sum(y_true * tf.log(y_hat_softmax), [1]))

is essentially equivalent to the total cross-entropy loss computed with the function softmax_cross_entropy_with_logits():

total_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y_hat, y_true))

Long version:

In the output layer of your neural network, you will probably compute an array that contains the class scores for each of your training instances, such as from a computation y_hat = W*x + b. To serve as an example, below I’ve created a y_hat as a 2 x 3 array, where the rows correspond to the training instances and the columns correspond to classes. So here there are 2 training instances and 3 classes.

import tensorflow as tf
import numpy as np

sess = tf.Session()

# Create example y_hat.
y_hat = tf.convert_to_tensor(np.array([[0.5, 1.5, 0.1],[2.2, 1.3, 1.7]]))
sess.run(y_hat)
# array([[ 0.5,  1.5,  0.1],
#        [ 2.2,  1.3,  1.7]])

Note that the values are not normalized (i.e. the rows don’t add up to 1). In order to normalize them, we can apply the softmax function, which interprets the input as unnormalized log probabilities (aka logits) and outputs normalized linear probabilities.

y_hat_softmax = tf.nn.softmax(y_hat)
sess.run(y_hat_softmax)
# array([[ 0.227863  ,  0.61939586,  0.15274114],
#        [ 0.49674623,  0.20196195,  0.30129182]])

It’s important to fully understand what the softmax output is saying. Below I’ve shown a table that more clearly represents the output above. It can be seen that, for example, the probability of training instance 1 being “Class 2” is 0.619. The class probabilities for each training instance are normalized, so the sum of each row is 1.0.

                      Pr(Class 1)  Pr(Class 2)  Pr(Class 3)
                    ,--------------------------------------
Training instance 1 | 0.227863   | 0.61939586 | 0.15274114
Training instance 2 | 0.49674623 | 0.20196195 | 0.30129182

So now we have class probabilities for each training instance, and we can take the argmax() of each row to generate a final classification. From the table above, we can conclude that training instance 1 belongs to “Class 2” and training instance 2 belongs to “Class 1”.

Are these classifications correct? We need to measure against the true labels from the training set. You will need a one-hot encoded y_true array, where again the rows are training instances and columns are classes. Below I’ve created an example y_true one-hot array where the true label for training instance 1 is “Class 2” and the true label for training instance 2 is “Class 3”.

y_true = tf.convert_to_tensor(np.array([[0.0, 1.0, 0.0],[0.0, 0.0, 1.0]]))
sess.run(y_true)
# array([[ 0.,  1.,  0.],
#        [ 0.,  0.,  1.]])

Is the probability distribution in y_hat_softmax close to the probability distribution in y_true? We can use cross-entropy loss to measure the error.

Formula for cross-entropy loss

We can compute the cross-entropy loss on a row-wise basis and see the results. Below we can see that training instance 1 has a loss of 0.479, while training instance 2 has a higher loss of 1.200. This result makes sense because in our example above, y_hat_softmax showed that training instance 1’s highest probability was for “Class 2”, which matches training instance 1 in y_true; however, the prediction for training instance 2 showed a highest probability for “Class 1”, which does not match the true class “Class 3”.

loss_per_instance_1 = -tf.reduce_sum(y_true * tf.log(y_hat_softmax), reduction_indices=[1])
sess.run(loss_per_instance_1)
# array([ 0.4790107 ,  1.19967598])

What we really want is the total loss over all the training instances. So we can compute:

total_loss_1 = tf.reduce_mean(-tf.reduce_sum(y_true * tf.log(y_hat_softmax), reduction_indices=[1]))
sess.run(total_loss_1)
# 0.83934333897877944

Using softmax_cross_entropy_with_logits()

We can instead compute the total cross entropy loss using the tf.nn.softmax_cross_entropy_with_logits() function, as shown below.

loss_per_instance_2 = tf.nn.softmax_cross_entropy_with_logits(y_hat, y_true)
sess.run(loss_per_instance_2)
# array([ 0.4790107 ,  1.19967598])

total_loss_2 = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y_hat, y_true))
sess.run(total_loss_2)
# 0.83934333897877922

Note that total_loss_1 and total_loss_2 produce essentially equivalent results with some small differences in the very final digits. However, you might as well use the second approach: it takes one less line of code and accumulates less numerical error because the softmax is done for you inside of softmax_cross_entropy_with_logits().
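If you want to check the arithmetic of the walkthrough above without a TensorFlow session, the same numbers can be reproduced in plain NumPy. This is only a verification sketch using the y_hat and y_true values from the example; the variable names predicted and loss_per_instance are mine.

```python
import numpy as np

# Same scores and one-hot labels as in the walkthrough.
y_hat = np.array([[0.5, 1.5, 0.1], [2.2, 1.3, 1.7]])
y_true = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])

# Row-wise softmax (shifted by the row maximum for stability).
exps = np.exp(y_hat - y_hat.max(axis=1, keepdims=True))
y_hat_softmax = exps / exps.sum(axis=1, keepdims=True)

# Cross entropy per training instance, then the mean over the batch.
loss_per_instance = -np.sum(y_true * np.log(y_hat_softmax), axis=1)
total_loss = loss_per_instance.mean()

# Zero-based class index with the highest probability in each row.
predicted = y_hat_softmax.argmax(axis=1)

print(loss_per_instance)  # approximately [0.4790107 1.19967598]
print(total_loss)         # approximately 0.839343
print(predicted)          # [1 0], i.e. "Class 2" and "Class 1"
```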


回答 2

tf.nn.softmax计算通过softmax层的前向传播。计算模型输出的概率时,可以在评估模型时使用它。

tf.nn.softmax_cross_entropy_with_logits计算softmax层的成本。仅在训练期间使用。

logits是模型输出的未归一化对数概率(将softmax归一化之前对它们输出的值)。

tf.nn.softmax computes the forward propagation through a softmax layer. You use it during evaluation of the model when you compute the probabilities that the model outputs.

tf.nn.softmax_cross_entropy_with_logits computes the cost for a softmax layer. It is only used during training.

The logits are the unnormalized log probabilities output by the model (the values output before the softmax normalization is applied to them).


回答 3

以上答案对所提问题有足够的描述。

除此之外,Tensorflow还优化了应用激活函数的操作,然后使用其自身的激活以及成本函数来计算成本。因此,它是一个很好的做法,使用:tf.nn.softmax_cross_entropy()tf.nn.softmax(); tf.nn.cross_entropy()

您可以在资源密集型模型中找到它们之间的显着差异。

The answers above describe the question well enough.

Adding to that, TensorFlow has optimised the combined operation of applying the activation function and then calculating the cost, compared with calling the activation and the cost function separately. Hence it is good practice to use the fused tf.nn.softmax_cross_entropy_with_logits() rather than tf.nn.softmax() followed by a hand-written cross entropy.

You can see a prominent difference between them in a resource-intensive model.


回答 4

曾经发生过的softmax就是logit,这就是J. Hinton一直在Coursera视频中重复的内容。

Whatever goes into softmax is a logit; this is what Geoffrey Hinton repeats in his Coursera videos all the time.


回答 5

Tensorflow 2.0兼容答案:的解释dga,并stackoverflowuser2010有很详细的关于Logits和相关的功能。

所有这些功能Tensorflow 1.x都可以正常使用,但是如果您从1.x (1.14, 1.15, etc)2.x (2.0, 2.1, etc..),则使用这些功能会导致错误。

因此,如果我们从迁移,请为上面讨论的所有功能指定2.0兼容的调用。 1.x to 2.x为社区的利益。

1.x中的功能

  1. tf.nn.softmax
  2. tf.nn.softmax_cross_entropy_with_logits
  3. tf.nn.sparse_softmax_cross_entropy_with_logits

从1.x迁移到2.x的相应功能

  1. tf.compat.v2.nn.softmax
  2. tf.compat.v2.nn.softmax_cross_entropy_with_logits
  3. tf.compat.v2.nn.sparse_softmax_cross_entropy_with_logits

有关从1.x迁移到2.x的更多信息,请参阅此迁移指南

TensorFlow 2.0 compatible answer: the explanations by dga and stackoverflowuser2010 cover logits and the related functions in detail.

All of those functions work fine in TensorFlow 1.x, but if you migrate your code from 1.x (1.14, 1.15, etc.) to 2.x (2.0, 2.1, etc.), using them results in errors.

Hence, for the benefit of the community, here are the 2.0-compatible calls for all the functions discussed above, for use when migrating from 1.x to 2.x.

Functions in 1.x:

  1. tf.nn.softmax
  2. tf.nn.softmax_cross_entropy_with_logits
  3. tf.nn.sparse_softmax_cross_entropy_with_logits

Respective functions when migrated from 1.x to 2.x:

  1. tf.compat.v2.nn.softmax
  2. tf.compat.v2.nn.softmax_cross_entropy_with_logits
  3. tf.compat.v2.nn.sparse_softmax_cross_entropy_with_logits

For more information about migration from 1.x to 2.x, please refer to this Migration Guide.


回答 6

我肯定要强调的一件事是logit仅仅是原始输出,通常是最后一层的输出。这也可以是负值。如果我们将其用于“交叉熵”评估,如下所述:

-tf.reduce_sum(y_true * tf.log(logits))

那就行不通了。由于-ve的日志未定义。因此,使用o softmax激活将克服此问题。

这是我的理解,如果我错了,请纠正我。

One more thing I would definitely like to highlight: a logit is just a raw output, generally the output of the last layer. It can be a negative value as well. If we use it as it is for the “cross entropy” evaluation, as shown below:

-tf.reduce_sum(y_true * tf.log(logits))

then it won’t work, because the log of a negative number is not defined. Applying a softmax activation first overcomes this problem.

This is my understanding; please correct me if I’m wrong.
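The point about negative logits is easy to see with a few numbers. The values below are made up for illustration, and NumPy stands in for the TensorFlow ops: taking the log of the raw logits gives nan as soon as the true class has a negative score, while taking the log of the softmax output gives a well-defined loss.

```python
import numpy as np

# Hypothetical raw scores; the true class (index 1) has a negative logit.
logits = np.array([2.0, -1.0, 0.5])
y_true = np.array([0.0, 1.0, 0.0])

# Naive: log applied straight to the logits, so log(-1.0) is nan.
with np.errstate(invalid="ignore"):
    naive = -np.sum(y_true * np.log(logits))

# With softmax first, every value is in (0, 1), so the log is defined.
exps = np.exp(logits - logits.max())
probs = exps / exps.sum()
loss = -np.sum(y_true * np.log(probs))

print(np.isnan(naive))  # True
print(loss > 0)         # True
```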


如何使用python找出我的python路径?

问题:如何使用python找出我的python路径?

如何PYTHONPATH从Python脚本(或交互式外壳程序)中找出系统变量中列出了哪些目录?

How do I find out which directories are listed in my system’s PYTHONPATH variable, from within a Python script (or the interactive shell)?


回答 0

sys.path可能包括不是您的PYTHONPATH环境变量中特定的项目。要直接查询变量,请使用:

import os
try:
    user_paths = os.environ['PYTHONPATH'].split(os.pathsep)
except KeyError:
    user_paths = []

sys.path might include items that aren’t specifically in your PYTHONPATH environment variable. To query the variable directly, use:

import os
try:
    user_paths = os.environ['PYTHONPATH'].split(os.pathsep)
except KeyError:
    user_paths = []
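The same lookup can also be written without the try/except by using os.environ.get with a default. The helper below is just a sketch; the name pythonpath_entries and its environ parameter are my own additions so that the function can be tried against a plain dictionary.

```python
import os


def pythonpath_entries(environ=None):
    """Return the directories listed in PYTHONPATH, or [] if it is unset."""
    environ = os.environ if environ is None else environ
    raw = environ.get("PYTHONPATH", "")
    # Dropping empty strings handles both an unset and an empty variable.
    return [entry for entry in raw.split(os.pathsep) if entry]


user_paths = pythonpath_entries()

# Trying it against a fake environment (the paths are placeholders).
fake = {"PYTHONPATH": os.pathsep.join(["/opt/libs", "/srv/app"])}
print(pythonpath_entries(fake))  # ['/opt/libs', '/srv/app']
print(pythonpath_entries({}))    # []
```

Filtering out empty entries also avoids the [''] result that "".split(os.pathsep) would otherwise give for an empty variable.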

回答 1

您可能还希望这样:

import sys
print(sys.path)

或者作为从终端的一行代码:

python -c "import sys; print('\n'.join(sys.path))"

注意:如果您安装了多个版本的Python,则应使用相应的命令python2python3

You would probably also want this:

import sys
print(sys.path)

Or as a one liner from the terminal:

python -c "import sys; print('\n'.join(sys.path))"

Caveat: If you have multiple versions of Python installed you should use a corresponding command python2 or python3.


回答 2

似乎无法编辑其他答案。有一个小错误,因为它仅适用于Windows。更通用的解决方案是使用os.sep,如下所示:

sys.path可能包含不是您的PYTHONPATH环境变量中特定的项目。要直接查询变量,请使用:

import os
os.environ['PYTHONPATH'].split(os.pathsep)

Can’t seem to edit the other answer. It has a minor error in that it is Windows-only. The more generic solution is to use os.pathsep, as below:

sys.path might include items that aren’t specifically in your PYTHONPATH environment variable. To query the variable directly, use:

import os
os.environ['PYTHONPATH'].split(os.pathsep)

回答 3

PYTHONPATH是一个环境变量,其值是目录列表。设置后,Python将使用它与其他标准输入一起搜索导入的模块。和Python的“ sys.path”中列出的第三方库目录。

与其他任何环境变量一样,您可以将其导出到shell或〜/ .bashrc中,请参见此处。您可以在python中查询os.environ [‘PYTHONPATH’]的值,如下所示:

$ python3 -c "import os, sys; print(os.environ['PYTHONPATH'] + '\n' + str(sys.path) if 'PYTHONPATH' in os.environ else 'PYTHONPATH is not defined')"

如果在shell中将IF定义为

$ export PYTHONPATH=$HOME/Documents/DjangoTutorial/mysite

然后结果=>

/home/Documents/DjangoTutorial/mysite
['', '/home/Documents/DjangoTutorial/mysite', '/usr/local/lib/python37.zip', '/usr/local/lib/python3.7', '/usr/local/lib/python3.7/lib-dynload', '/usr/local/lib/python3.7/site-packages']

ELSE结果=>

PYTHONPATH is not defined

要将PYTHONPATH设置为多个路径,请参见此处

注意,可以在运行时通过sys.path.insert(),del或remove()添加或删除搜索路径,但不能通过os.environ []添加或删除搜索路径。例:

>>> os.environ['PYTHONPATH']="$HOME/Documents/DjangoTutorial/mysite"
>>> 'PYTHONPATH' in sorted(os.environ)
True
>>> sys.path  # but not there
['', '/usr/local/lib/python37.zip', '/usr/local/lib/python3.7', '/usr/local/lib/python3.7/lib-dynload', '/usr/local/lib/python3.7/site-packages']

>>> sys.path.insert(0,os.environ['PYTHONPATH'])
>>> sys.path  # it's there
['$HOME/Documents/DjangoTutorial/mysite', '', '/usr/local/lib/python37.zip', '/usr/local/lib/python3.7', '/usr/local/lib/python3.7/lib-dynload', '/usr/local/lib/python3.7/site-packages']
>>> 

总之,PYTHONPATH是在sys.path中为导入的模块指定Python搜索路径的一种方法。您也可以不借助PYTHONPATH将列表操作直接应用于sys.path。

PYTHONPATH is an environment variable whose value is a list of directories. Once set, Python uses it to search for imported modules, along with the standard-library and third-party directories listed in sys.path.

As with any other environment variable, you can export it in the shell or set it in ~/.bashrc. You can query os.environ['PYTHONPATH'] for its value in Python as shown below:

$ python3 -c "import os, sys; print(os.environ['PYTHONPATH'] + '\n' + str(sys.path) if 'PYTHONPATH' in os.environ else 'PYTHONPATH is not defined')"

IF defined in shell as

$ export PYTHONPATH=$HOME/Documents/DjangoTutorial/mysite

THEN result =>

/home/Documents/DjangoTutorial/mysite
['', '/home/Documents/DjangoTutorial/mysite', '/usr/local/lib/python37.zip', '/usr/local/lib/python3.7', '/usr/local/lib/python3.7/lib-dynload', '/usr/local/lib/python3.7/site-packages']

ELSE result =>

PYTHONPATH is not defined

To set PYTHONPATH to multiple paths, separate them with the platform's path separator (os.pathsep: ':' on Unix-like systems, ';' on Windows).

Note that one can add or delete a search path via sys.path.insert(), del or remove() at run-time, but NOT through os.environ[]. Example:

>>> os.environ['PYTHONPATH']="$HOME/Documents/DjangoTutorial/mysite"
>>> 'PYTHONPATH' in sorted(os.environ)
True
>>> sys.path  # but not there
['', '/usr/local/lib/python37.zip', '/usr/local/lib/python3.7', '/usr/local/lib/python3.7/lib-dynload', '/usr/local/lib/python3.7/site-packages']

>>> sys.path.insert(0,os.environ['PYTHONPATH'])
>>> sys.path  # it's there
['$HOME/Documents/DjangoTutorial/mysite', '', '/usr/local/lib/python37.zip', '/usr/local/lib/python3.7', '/usr/local/lib/python3.7/lib-dynload', '/usr/local/lib/python3.7/site-packages']
>>> 

In summary, PYTHONPATH is one way of specifying the Python search path(s) for imported modules in sys.path. You can also apply list operations directly to sys.path without the aid of PYTHONPATH.
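The run-time behaviour described above can be sketched as follows. Python reads PYTHONPATH once at interpreter startup, so changing `os.environ` affects only child interpreters started afterwards. The temporary directory and the variable names are assumptions made for this demo:

```python
import os
import subprocess
import sys
import tempfile

extra_dir = tempfile.mkdtemp()   # a throwaway directory used only for this demo

# Changing the variable now does not touch this interpreter's sys.path ...
os.environ["PYTHONPATH"] = extra_dir
in_current = extra_dir in sys.path

# ... but a child interpreter started afterwards inherits it and reads it at startup.
child = subprocess.run(
    [sys.executable, "-c", "import sys; print('\\n'.join(sys.path))"],
    capture_output=True, text=True, check=True,
)
child_paths = child.stdout.splitlines()
in_child = os.path.realpath(extra_dir) in [os.path.realpath(p) for p in child_paths]

print(in_current, in_child)
```

To change the search path of the running interpreter, modify `sys.path` directly, as the answer notes.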


回答 4

当它给我一条错误消息时,Python告诉我它住在哪里:)

>>> import os
>>> os.environ['PYTHONPATH'].split(os.pathsep)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\martin\AppData\Local\Programs\Python\Python36-32\lib\os.py", line 669, in __getitem__
    raise KeyError(key) from None
KeyError: 'PYTHONPATH'
>>>

Python tells me where it lives when it gives me an error message :)

>>> import os
>>> os.environ['PYTHONPATH'].split(os.pathsep)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\martin\AppData\Local\Programs\Python\Python36-32\lib\os.py", line 669, in __getitem__
    raise KeyError(key) from None
KeyError: 'PYTHONPATH'
>>>

回答 5

用这个:

import sys
print(sys.executable)

或从cmd一行:

python -c "import sys; print(sys.executable)"

Use this:

import sys
print(sys.executable)

Or one line from the cmd:

python -c "import sys; print(sys.executable)"

如何在Python中获取父目录?

问题:如何在Python中获取父目录?

有人可以告诉我如何以跨平台方式在Python中获取路径的父目录。例如

C:\Program Files ---> C:\

C:\ ---> C:\

如果该目录没有父目录,它将返回目录本身。这个问题看似简单,但我无法通过Google进行深入研究。

Could someone tell me how to get the parent directory of a path in Python in a cross platform way. E.g.

C:\Program Files ---> C:\

and

C:\ ---> C:\

If the directory doesn’t have a parent directory, it returns the directory itself. The question might seem simple but I couldn’t dig it up through Google.


回答 0

从Python 3.4更新

使用pathlib模块。

from pathlib import Path
path = Path("/here/your/path/file.txt")
print(path.parent)

旧答案

尝试这个:

import os.path
print(os.path.abspath(os.path.join(yourpath, os.pardir)))

yourpath您想要父级的路径在哪里?

Update from Python 3.4

Use the pathlib module.

from pathlib import Path
path = Path("/here/your/path/file.txt")
print(path.parent)

Old answer

Try this:

import os.path
print(os.path.abspath(os.path.join(yourpath, os.pardir)))

where yourpath is the path you want the parent for.


回答 1

使用os.path.dirname

>>> os.path.dirname(r'C:\Program Files')
'C:\\'
>>> os.path.dirname('C:\\')
'C:\\'
>>>

警告:os.path.dirname()根据路径中是否包含斜杠给出不同的结果。这可能是您想要的语义,也可能不是。cf. @kender使用的答案os.path.join(yourpath, os.pardir)

Using os.path.dirname:

>>> os.path.dirname(r'C:\Program Files')
'C:\\'
>>> os.path.dirname('C:\\')
'C:\\'
>>>

Caveat: os.path.dirname() gives different results depending on whether a trailing slash is included in the path. This may or may not be the semantics you want. Cf. @kender’s answer using os.path.join(yourpath, os.pardir).
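The trailing-slash caveat can be illustrated with `posixpath`, the POSIX flavour of `os.path`, so the results are the same on every operating system. The helper `parent_of` is a hypothetical name, shown as one possible way to make both spellings behave alike:

```python
import posixpath  # POSIX flavour of os.path; behaves identically on any OS

def parent_of(path):
    """Parent directory of `path`, ignoring any trailing slash (hypothetical helper)."""
    stripped = path.rstrip("/") or "/"   # keep the root itself intact
    return posixpath.dirname(stripped)

print(posixpath.dirname("/a/b"))    # /a
print(posixpath.dirname("/a/b/"))   # /a/b  (the trailing slash changes the result)
print(parent_of("/a/b/"))           # /a
print(parent_of("/"))               # /
```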


回答 2

Pathlib方法(Python 3.4+)

from pathlib import Path
Path(r'C:\Program Files').parent
# Returns a pathlib.Path object

传统方法

import os.path
os.path.dirname(r'C:\Program Files')
# Returns a string


我应该使用哪种方法?

在以下情况下,请使用传统方法:

  • 如果使用Pathlib对象,您会担心现有代码会生成错误。(由于Pathlib对象不能与字符串连接。)

  • 您的Python版本低于3.4。

  • 您需要一个字符串,并且您收到了一个字符串。假设您有一个表示文件路径的字符串,并且想要获取父目录,以便可以将其放入JSON字符串中。转换为Pathlib对象并为此再次返回是一种愚蠢的做法。

如果以上都不适用,请使用Pathlib。



什么是Pathlib?

如果您不知道Pathlib是什么,那么Pathlib模块是一个了不起的模块,它使您更轻松地处理文件。大多数(如果不是全部)与文件一起使用的内置Python模块将接受Pathlib对象和字符串。我在下面重点介绍了Pathlib文档中的几个示例,这些示例展示了您可以使用Pathlib进行的一些巧妙操作。

在目录树中导航:

>>> p = Path('/etc')
>>> q = p / 'init.d' / 'reboot'
>>> q
PosixPath('/etc/init.d/reboot')
>>> q.resolve()
PosixPath('/etc/rc.d/init.d/halt')

查询路径属性:

>>> q.exists()
True
>>> q.is_dir()
False

The Pathlib method (Python 3.4+)

from pathlib import Path
Path(r'C:\Program Files').parent
# Returns a pathlib.Path object

The traditional method

import os.path
os.path.dirname(r'C:\Program Files')
# Returns a string


Which method should I use?

Use the traditional method if:

  • You are worried about existing code generating errors if it were to use a Pathlib object. (Since Pathlib objects cannot be concatenated with strings.)

  • Your Python version is less than 3.4.

  • You need a string, and you received a string. Say for example you have a string representing a filepath, and you want to get the parent directory so you can put it in a JSON string. It would be kind of silly to convert to a Pathlib object and back again for that.

If none of the above apply, use Pathlib.



What is Pathlib?

If you don’t know what Pathlib is, the Pathlib module is a terrific module that makes working with files even easier for you. Most if not all of the built in Python modules that work with files will accept both Pathlib objects and strings. I’ve highlighted below a couple of examples from the Pathlib documentation that showcase some of the neat things you can do with Pathlib.

Navigating inside a directory tree:

>>> p = Path('/etc')
>>> q = p / 'init.d' / 'reboot'
>>> q
PosixPath('/etc/init.d/reboot')
>>> q.resolve()
PosixPath('/etc/rc.d/init.d/halt')

Querying path properties:

>>> q.exists()
True
>>> q.is_dir()
False
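As a small sketch of how pathlib covers the two cases from the question, the pure path classes can be used. They only manipulate path strings and never touch the file system, so Windows-style paths can be examined on any platform:

```python
from pathlib import PurePosixPath, PureWindowsPath

win = PureWindowsPath(r"C:\Program Files")
print(win.parent)                          # C:\
print(PureWindowsPath("C:\\").parent)      # C:\  (a root is its own parent)
print(PurePosixPath("/").parent)           # /
```

With a concrete `Path`, the same `.parent` attribute applies to whichever flavour matches the running system.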

回答 3

import os
p = os.path.abspath('..')

C:\Program Files -> C:\\\

C:\ -> C:\\\

import os
p = os.path.abspath('..')

With C:\Program Files as the current working directory, p is 'C:\\'.

With C:\ as the current working directory, p is also 'C:\\'.


回答 4

@kender的替代解决方案

import os
os.path.dirname(os.path.normpath(yourpath))

yourpath您想要父级的路径在哪里?

但是这种解决方案并不完美,因为它不能处理yourpath空字符串或点的情况。

这个其他解决方案可以更好地处理这种极端情况:

import os
os.path.normpath(os.path.join(yourpath, os.pardir))

在这里可以找到的每种情况的输出(输入路径是相对的):

os.path.dirname(os.path.normpath('a/b/'))          => 'a'
os.path.normpath(os.path.join('a/b/', os.pardir))  => 'a'

os.path.dirname(os.path.normpath('a/b'))           => 'a'
os.path.normpath(os.path.join('a/b', os.pardir))   => 'a'

os.path.dirname(os.path.normpath('a/'))            => ''
os.path.normpath(os.path.join('a/', os.pardir))    => '.'

os.path.dirname(os.path.normpath('a'))             => ''
os.path.normpath(os.path.join('a', os.pardir))     => '.'

os.path.dirname(os.path.normpath('.'))             => ''
os.path.normpath(os.path.join('.', os.pardir))     => '..'

os.path.dirname(os.path.normpath(''))              => ''
os.path.normpath(os.path.join('', os.pardir))      => '..'

os.path.dirname(os.path.normpath('..'))            => ''
os.path.normpath(os.path.join('..', os.pardir))    => '../..'

输入路径是绝对路径(Linux路径):

os.path.dirname(os.path.normpath('/a/b'))          => '/a'
os.path.normpath(os.path.join('/a/b', os.pardir))  => '/a'

os.path.dirname(os.path.normpath('/a'))            => '/'
os.path.normpath(os.path.join('/a', os.pardir))    => '/'

os.path.dirname(os.path.normpath('/'))             => '/'
os.path.normpath(os.path.join('/', os.pardir))     => '/'

An alternative to @kender's solution:

import os
os.path.dirname(os.path.normpath(yourpath))

where yourpath is the path you want the parent for.

But this solution is not perfect, since it does not handle the case where yourpath is an empty string or a dot.

This other solution handles that corner case more gracefully:

import os
os.path.normpath(os.path.join(yourpath, os.pardir))

Here are the outputs for every case I could find (input path is relative):

os.path.dirname(os.path.normpath('a/b/'))          => 'a'
os.path.normpath(os.path.join('a/b/', os.pardir))  => 'a'

os.path.dirname(os.path.normpath('a/b'))           => 'a'
os.path.normpath(os.path.join('a/b', os.pardir))   => 'a'

os.path.dirname(os.path.normpath('a/'))            => ''
os.path.normpath(os.path.join('a/', os.pardir))    => '.'

os.path.dirname(os.path.normpath('a'))             => ''
os.path.normpath(os.path.join('a', os.pardir))     => '.'

os.path.dirname(os.path.normpath('.'))             => ''
os.path.normpath(os.path.join('.', os.pardir))     => '..'

os.path.dirname(os.path.normpath(''))              => ''
os.path.normpath(os.path.join('', os.pardir))      => '..'

os.path.dirname(os.path.normpath('..'))            => ''
os.path.normpath(os.path.join('..', os.pardir))    => '../..'

Input path is absolute (Linux path):

os.path.dirname(os.path.normpath('/a/b'))          => '/a'
os.path.normpath(os.path.join('/a/b', os.pardir))  => '/a'

os.path.dirname(os.path.normpath('/a'))            => '/'
os.path.normpath(os.path.join('/a', os.pardir))    => '/'

os.path.dirname(os.path.normpath('/'))             => '/'
os.path.normpath(os.path.join('/', os.pardir))     => '/'
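A few rows of the table above can be turned into a self-checking sketch. It uses `posixpath` so the expected values hold on any operating system, and the wrapper name `parent_dir` is a hypothetical choice:

```python
import posixpath

def parent_dir(path):
    """Join with '..' and normalise, as in the second solution above."""
    return posixpath.normpath(posixpath.join(path, posixpath.pardir))

# Expected values are taken from the listing above.
cases = {"a/b/": "a", "a": ".", ".": "..", "": "..", "/a": "/", "/": "/"}
for given, expected in cases.items():
    assert parent_dir(given) == expected, (given, parent_dir(given))
print("all cases match")
```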

回答 5

os.path.split(os.path.abspath(mydir))[0]
os.path.split(os.path.abspath(mydir))[0]

回答 6

os.path.abspath(os.path.join(somepath, '..'))

观察:

import posixpath
import ntpath

print(ntpath.abspath(ntpath.join('C:\\', '..')))
print(ntpath.abspath(ntpath.join('C:\\foo', '..')))
print(posixpath.abspath(posixpath.join('/', '..')))
print(posixpath.abspath(posixpath.join('/home', '..')))
os.path.abspath(os.path.join(somepath, '..'))

Observe:

import posixpath
import ntpath

print(ntpath.abspath(ntpath.join('C:\\', '..')))
print(ntpath.abspath(ntpath.join('C:\\foo', '..')))
print(posixpath.abspath(posixpath.join('/', '..')))
print(posixpath.abspath(posixpath.join('/home', '..')))

回答 7

import os
print("------------------------------------------------------------")
SITE_ROOT = os.path.dirname(os.path.realpath(__file__))
print("example 1: "+SITE_ROOT)
PARENT_ROOT=os.path.abspath(os.path.join(SITE_ROOT, os.pardir))
print("example 2: "+PARENT_ROOT)
GRANDPAPA_ROOT=os.path.abspath(os.path.join(PARENT_ROOT, os.pardir))
print("example 3: "+GRANDPAPA_ROOT)
print("------------------------------------------------------------")
import os
print("------------------------------------------------------------")
SITE_ROOT = os.path.dirname(os.path.realpath(__file__))
print("example 1: "+SITE_ROOT)
PARENT_ROOT=os.path.abspath(os.path.join(SITE_ROOT, os.pardir))
print("example 2: "+PARENT_ROOT)
GRANDPAPA_ROOT=os.path.abspath(os.path.join(PARENT_ROOT, os.pardir))
print("example 3: "+GRANDPAPA_ROOT)
print("------------------------------------------------------------")

回答 8

如果你想名称是作为参数,并提供该文件的直接父文件夹的不是绝对路径到该文件中:

os.path.split(os.path.dirname(currentDir))[1]

即具有的currentDir/home/user/path/to/myfile/file.ext

上面的命令将返回:

myfile

If you want only the name of the folder that is the immediate parent of the file provided as an argument and not the absolute path to that file:

os.path.split(os.path.dirname(currentDir))[1]

i.e. with a currentDir value of /home/user/path/to/myfile/file.ext

The above command will return:

myfile
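The same result can be sketched with pathlib, assuming Python 3.4 or later. `PurePosixPath` is used here only so the example path from the answer behaves identically on every platform:

```python
from pathlib import PurePosixPath

file_path = PurePosixPath("/home/user/path/to/myfile/file.ext")
# .parent is the containing directory; .name is its last component
print(file_path.parent.name)   # myfile
```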


回答 9

>>> import os
>>> os.path.basename(os.path.dirname(<your_path>))

例如在Ubuntu中:

>>> my_path = '/home/user/documents'
>>> os.path.basename(os.path.dirname(my_path))
# Output: 'user'

例如在Windows中:

>>> my_path = r'C:\WINDOWS\system32'
>>> os.path.basename(os.path.dirname(my_path))
# Output: 'WINDOWS'

两个示例都在Python 2.7中尝试过

>>> import os
>>> os.path.basename(os.path.dirname(<your_path>))

For example in Ubuntu:

>>> my_path = '/home/user/documents'
>>> os.path.basename(os.path.dirname(my_path))
# Output: 'user'

For example in Windows:

>>> my_path = r'C:\WINDOWS\system32'
>>> os.path.basename(os.path.dirname(my_path))
# Output: 'WINDOWS'

Both examples tried in Python 2.7


回答 10

import os.path

os.path.abspath(os.pardir)
import os.path

os.path.abspath(os.pardir)

回答 11

假设我们有类似的目录结构

1]

/home/User/P/Q/R

我们要从目录R访问“ P”的路径,然后可以使用

ROOT = os.path.abspath(os.path.join("..", os.pardir));

2]

/home/User/P/Q/R

我们要从目录R访问“ Q”目录的路径,然后可以使用

ROOT = os.path.abspath(os.path.join(".", os.pardir));

Suppose we have a directory structure like this:

1]

/home/User/P/Q/R

To get the path of “P” when R is the current working directory, we can use:

ROOT = os.path.abspath(os.path.join("..", os.pardir))

2]

/home/User/P/Q/R

To get the path of the “Q” directory when R is the current working directory, we can use:

ROOT = os.path.abspath(os.path.join(".", os.pardir))

回答 12

只需在Tung的答案中添加一些内容即可(rstrip('/')如果您使用的是unix盒,则需要更加安全一些)。

>>> input = "../data/replies/"
>>> os.path.dirname(input.rstrip('/'))
'../data'
>>> input = "../data/replies"
>>> os.path.dirname(input.rstrip('/'))
'../data'

但是,如果您不使用rstrip('/'),则输入为

>>> input = "../data/replies/"

会输出

>>> os.path.dirname(input)
'../data/replies'

这可能不是您想要的,"../data/replies/"并且"../data/replies"行为方式相同。

Just adding something to Tung’s answer: to be on the safe side, use rstrip('/') if you’re on a Unix box.

>>> input = "../data/replies/"
>>> os.path.dirname(input.rstrip('/'))
'../data'
>>> input = "../data/replies"
>>> os.path.dirname(input.rstrip('/'))
'../data'

But if you don’t use rstrip('/'), the input

>>> input = "../data/replies/"

would give:

>>> os.path.dirname(input)
'../data/replies'

which is probably not what you want, since "../data/replies/" and "../data/replies" should behave the same way.


回答 13

import os

dir_path = os.path.dirname(os.path.realpath(__file__))
parent_path = os.path.abspath(os.path.join(dir_path, os.pardir))
import os

dir_path = os.path.dirname(os.path.realpath(__file__))
parent_path = os.path.abspath(os.path.join(dir_path, os.pardir))

回答 14

print(os.path.abspath(os.path.join(os.getcwd(), os.path.pardir)))

您可以使用它来获取py文件当前位置的父目录。

print(os.path.abspath(os.path.join(os.getcwd(), os.path.pardir)))

You can use this to get the parent of the current working directory. That matches the location of your .py file only when the script is run from its own directory.


回答 15

获取父目录路径创建新目录(名称new_dir

获取父目录路径

os.path.abspath('..')
os.pardir

例子1

import os
os.makedirs(os.path.join(os.path.dirname(__file__), os.pardir, 'new_dir'))

例子2

import os
# os.path.abspath('..') is absolute, so join() discards the dirname(__file__) part
os.makedirs(os.path.join(os.path.dirname(__file__), os.path.abspath('..'), 'new_dir'))

Get the parent directory path and make a new directory (named new_dir)

Get Parent Directory Path

os.path.abspath('..')
os.pardir

Example 1

import os
os.makedirs(os.path.join(os.path.dirname(__file__), os.pardir, 'new_dir'))

Example 2

import os
# os.path.abspath('..') is absolute, so join() discards the dirname(__file__) part
os.makedirs(os.path.join(os.path.dirname(__file__), os.path.abspath('..'), 'new_dir'))

回答 16

>>> os.path.abspath(r'D:\Dir1\Dir2\..')
'D:\\Dir1'

所以有..帮助

>>> os.path.abspath(r'D:\Dir1\Dir2\..')
'D:\\Dir1'

So a .. helps


回答 17

import os

def parent_filedir(n):
    return parent_filedir_iter(n, os.path.dirname(__file__))

def parent_filedir_iter(n, path):
    n = int(n)
    if n <= 1:
        return path
    return parent_filedir_iter(n - 1, os.path.dirname(path))

test_dir = os.path.abspath(parent_filedir(2))
import os

def parent_filedir(n):
    return parent_filedir_iter(n, os.path.dirname(__file__))

def parent_filedir_iter(n, path):
    n = int(n)
    if n <= 1:
        return path
    return parent_filedir_iter(n - 1, os.path.dirname(path))

test_dir = os.path.abspath(parent_filedir(2))

回答 18

上面给出的答案对于上移一个或两个目录级别都是非常好的,但是如果一个人需要遍历目录树许多级别(例如5或10),它们可能会变得有些麻烦。可以通过加入中的N os.pardirs 列表来简洁地完成此操作os.path.join。例:

import os
# Create list of ".." times 5
upup = [os.pardir]*5
# Extract list as arguments of join()
go_upup = os.path.join(*upup)
# Resolve relative to this file; the first ".." cancels the file name itself
up_dir = os.path.abspath(os.path.join(__file__, go_upup))

The answers given above are all perfectly fine for going up one or two directory levels, but they may get a bit cumbersome if one needs to traverse the directory tree by many levels (say, 5 or 10). This can be done concisely by joining a list of N os.pardirs in os.path.join. Example:

import os
# Create list of ".." times 5
upup = [os.pardir]*5
# Extract list as arguments of join()
go_upup = os.path.join(*upup)
# Resolve relative to this file; the first ".." cancels the file name itself
up_dir = os.path.abspath(os.path.join(__file__, go_upup))
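An alternative sketch for climbing several levels, assuming Python 3.4 or later, walks `.parent` repeatedly. The helper name `ancestor` is hypothetical, and `PurePosixPath` is used so the example behaves the same on every platform:

```python
from pathlib import PurePosixPath

def ancestor(path, levels):
    """Return the directory `levels` steps above `path` (hypothetical helper)."""
    p = PurePosixPath(path)
    for _ in range(levels):
        p = p.parent          # the root is its own parent, so this never overshoots
    return p

print(ancestor("/a/b/c/d/e/f.txt", 3))   # /a/b/c
print(ancestor("/a", 5))                 # /
```

Because the root's parent is the root itself, asking for more levels than exist simply returns the root instead of raising an error.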

用pip安装PIL

问题:用pip安装PIL

我正在尝试使用以下命令安装PIL(Python Imaging Library):

sudo pip install pil

但我收到以下消息:

Downloading/unpacking PIL
  You are installing a potentially insecure and unverifiable file. Future versions of pip will default to disallowing insecure files.
  Downloading PIL-1.1.7.tar.gz (506kB): 506kB downloaded
  Running setup.py egg_info for package PIL
    WARNING: '' not a valid package name; please use only.-separated package names in setup.py

Installing collected packages: PIL
  Running setup.py install for PIL
    WARNING: '' not a valid package name; please use only.-separated package names in setup.py
    --- using frameworks at /System/Library/Frameworks
    building '_imaging' extension
    clang -fno-strict-aliasing -fno-common -dynamic -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch i386 -arch x86_64 -pipe -IlibImaging -I/System/Library/Frameworks/Python.framework/Versions/2.7/include -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c _imaging.c -o build/temp.macosx-10.8-intel-2.7/_imaging.o
    unable to execute clang: No such file or directory
    error: command 'clang' failed with exit status 1
    Complete output from command /usr/bin/python -c "import setuptools;__file__='/private/tmp/pip_build_root/PIL/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-AYrxVD-record/install-record.txt --single-version-externally-managed:
    WARNING: '' not a valid package name; please use only.-separated package names in setup.py

running install

running build

.
.
.
.

copying PIL/XVThumbImagePlugin.py -> build/lib.macosx-10.8-intel-2.7

running build_ext

--- using frameworks at /System/Library/Frameworks

building '_imaging' extension

creating build/temp.macosx-10.8-intel-2.7

creating build/temp.macosx-10.8-intel-2.7/libImaging

clang -fno-strict-aliasing -fno-common -dynamic -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch i386 -arch x86_64 -pipe -IlibImaging -I/System/Library/Frameworks/Python.framework/Versions/2.7/include -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c _imaging.c -o build/temp.macosx-10.8-intel-2.7/_imaging.o

unable to execute clang: No such file or directory

error: command 'clang' failed with exit status 1

----------------------------------------
Cleaning up

您能帮我安装PIL吗?

I am trying to install PIL (the Python Imaging Library) using the command:

sudo pip install pil

but I get the following message:

Downloading/unpacking PIL
  You are installing a potentially insecure and unverifiable file. Future versions of pip will default to disallowing insecure files.
  Downloading PIL-1.1.7.tar.gz (506kB): 506kB downloaded
  Running setup.py egg_info for package PIL
    WARNING: '' not a valid package name; please use only.-separated package names in setup.py

Installing collected packages: PIL
  Running setup.py install for PIL
    WARNING: '' not a valid package name; please use only.-separated package names in setup.py
    --- using frameworks at /System/Library/Frameworks
    building '_imaging' extension
    clang -fno-strict-aliasing -fno-common -dynamic -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch i386 -arch x86_64 -pipe -IlibImaging -I/System/Library/Frameworks/Python.framework/Versions/2.7/include -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c _imaging.c -o build/temp.macosx-10.8-intel-2.7/_imaging.o
    unable to execute clang: No such file or directory
    error: command 'clang' failed with exit status 1
    Complete output from command /usr/bin/python -c "import setuptools;__file__='/private/tmp/pip_build_root/PIL/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-AYrxVD-record/install-record.txt --single-version-externally-managed:
    WARNING: '' not a valid package name; please use only.-separated package names in setup.py

running install

running build

.
.
.
.

copying PIL/XVThumbImagePlugin.py -> build/lib.macosx-10.8-intel-2.7

running build_ext

--- using frameworks at /System/Library/Frameworks

building '_imaging' extension

creating build/temp.macosx-10.8-intel-2.7

creating build/temp.macosx-10.8-intel-2.7/libImaging

clang -fno-strict-aliasing -fno-common -dynamic -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch i386 -arch x86_64 -pipe -IlibImaging -I/System/Library/Frameworks/Python.framework/Versions/2.7/include -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c _imaging.c -o build/temp.macosx-10.8-intel-2.7/_imaging.o

unable to execute clang: No such file or directory

error: command 'clang' failed with exit status 1

----------------------------------------
Cleaning up…

Could you please help me to install PIL??


回答 0

  1. 如上所述安装Xcode和Xcode命令行工具。
  2. 请改用Pillow,因为PIL基本已失效。枕头是PIL的保养品。

https://pypi.python.org/pypi/Pillow/2.2.1

pip install Pillow

如果您同时安装了两个Python,并且想为Python3安装此代码,请执行以下操作:

python3 -m pip install Pillow
  1. Install Xcode and Xcode Command Line Tools as mentioned.
  2. Use Pillow instead, as PIL is basically dead. Pillow is a maintained fork of PIL.

https://pypi.python.org/pypi/Pillow/2.2.1

pip install Pillow

If you have both Pythons installed and want to install this for Python3:

python3 -m pip install Pillow

回答 1

这对我有用:

apt-get install python-dev
apt-get install libjpeg-dev
apt-get install libjpeg8-dev
apt-get install libpng3
apt-get install libfreetype6-dev
ln -s /usr/lib/i386-linux-gnu/libfreetype.so /usr/lib
ln -s /usr/lib/i386-linux-gnu/libjpeg.so /usr/lib
ln -s /usr/lib/i386-linux-gnu/libz.so /usr/lib

pip install PIL  --allow-unverified PIL --allow-all-external

This works for me:

apt-get install python-dev
apt-get install libjpeg-dev
apt-get install libjpeg8-dev
apt-get install libpng3
apt-get install libfreetype6-dev
ln -s /usr/lib/i386-linux-gnu/libfreetype.so /usr/lib
ln -s /usr/lib/i386-linux-gnu/libjpeg.so /usr/lib
ln -s /usr/lib/i386-linux-gnu/libz.so /usr/lib

pip install PIL  --allow-unverified PIL --allow-all-external

回答 2

使用apt install非常简单,使用此命令即可完成

sudo apt-get install python-pil

要么

sudo pip install pillow

要么

sudo easy_install pillow

It is very simple with apt; use this command to get it done:

sudo apt-get install python-pil

or

sudo pip install pillow

or

sudo easy_install pillow

回答 3

在Mac OS X上,使用以下命令:

sudo pip install https://effbot.org/media/downloads/Imaging-1.1.7.tar.gz

On Mac OS X, use this command:

sudo pip install https://effbot.org/media/downloads/Imaging-1.1.7.tar.gz

回答 4

您应该描述安装在这里

pip install image

You should install as described here:

pip install image

回答 5

我从这里的讨论中得到了答案:

我试过了

pip install --no-index -f http://dist.plone.org/thirdparty/ -U PIL

而且有效。

I got the answer from a discussion here:

I tried

pip install --no-index -f http://dist.plone.org/thirdparty/ -U PIL

and it worked.


回答 6

安装

pip install Pillow

然后,只需导入文件,例如

from PIL import Image

我正在使用Windows。它为我工作。

注意

Pillow是Python Imaging Library的功能替代品。要使用Pillow运行现有的PIL兼容代码,需要对其进行修改以从PIL命名空间而不是全局命名空间导入Imaging模块。

即更改:

import Image

至:

from PIL import Image

https://pypi.org/project/Pillow/2.2.1/

Install

pip install Pillow

Then just import it in your file, like this:

from PIL import Image

I am using Windows, and it works for me.

NOTE:

Pillow is a functional drop-in replacement for the Python Imaging Library. To run your existing PIL-compatible code with Pillow, it needs to be modified to import the Imaging module from the PIL namespace instead of the global namespace.

i.e. change:

import Image

to:

from PIL import Image

https://pypi.org/project/Pillow/2.2.1/
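A hedged sketch of how a script might check for Pillow before using the namespaced import described above. The function name `pillow_available` is hypothetical; `importlib.util.find_spec` only looks the package up without importing it:

```python
import importlib.util

def pillow_available():
    """True if the `PIL` package (provided by Pillow) can be found."""
    return importlib.util.find_spec("PIL") is not None

if pillow_available():
    from PIL import Image      # Pillow's namespaced import, not `import Image`
    print("Pillow is installed:", Image.__name__)
else:
    print("Pillow is not installed; try: python -m pip install Pillow")
```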


回答 7

我认为您在Mac上。请参阅如何在Mac OS X 10.7.2 Lion上安装PIL

如果使用[homebrew] [],则可以使用just安装PIL brew install pil。然后,您可能需要将安装目录($(brew --prefix)/lib/python2.7/site-packages)添加到PYTHONPATH中,或者将PIL目录本身的位置添加到PIL.pth任何site-packages目录中名为file的文件中,内容如下:

/usr/local/lib/python2.7/site-packages/PIL

(假设brew --prefix/usr/local)。

另外,您也可以从源代码下载/构建/安装它:

# download
curl -O -L http://effbot.org/media/downloads/Imaging-1.1.7.tar.gz
# extract
tar -xzf Imaging-1.1.7.tar.gz
cd Imaging-1.1.7
# build and install
python setup.py build
sudo python setup.py install
# or install it for just you without requiring admin permissions:
# python setup.py install --user

我刚刚(在OSX 10.7.2,XCode 4.2.1和System Python 2.7.1上)运行了上面的代码,尽管在我的环境中某些内容可能不是默认值,但它的构建还不错。

[homebrew]:http : //mxcl.github.com/homebrew/ “ Homebrew”

I take it you’re on Mac. See How can I install PIL on mac os x 10.7.2 Lion

If you use [homebrew][], you can install PIL with just brew install pil. You may then need to add the install directory ($(brew --prefix)/lib/python2.7/site-packages) to your PYTHONPATH, or put the location of the PIL directory itself in a file called PIL.pth in any of your site-packages directories, with the contents:

/usr/local/lib/python2.7/site-packages/PIL

(assuming brew --prefix is /usr/local).

Alternatively, you can just download/build/install it from source:

# download
curl -O -L http://effbot.org/media/downloads/Imaging-1.1.7.tar.gz
# extract
tar -xzf Imaging-1.1.7.tar.gz
cd Imaging-1.1.7
# build and install
python setup.py build
sudo python setup.py install
# or install it for just you without requiring admin permissions:
# python setup.py install --user

I ran the above just now (on OSX 10.7.2, with XCode 4.2.1 and System Python 2.7.1) and it built just fine, though there is a possibility that something in my environment is non-default.

[homebrew]: http://mxcl.github.com/homebrew/ “Homebrew”


回答 8

如今,每个人都在PIL上使用友好的PIL叉子Pillow。

代替: sudo pip install pil

做: sudo pip install pillow

$ sudo apt-get install python-imaging
$ sudo -H pip install pillow

These days, everyone uses Pillow, a friendly PIL fork, over PIL.

Instead of: sudo pip install pil

Do: sudo pip install pillow

$ sudo apt-get install python-imaging
$ sudo -H pip install pillow

回答 9

对于Ubuntu,PIL不再起作用。我总是得到:

找不到与PIL匹配的分布

因此,安装python-imaging:

sudo apt-get install python-imaging

For Ubuntu, PIL is not working any more. I always get:

No matching distribution found for PIL

So install python-imaging:

sudo apt-get install python-imaging

回答 10

我遇到了同样的问题,但是通过安装 python-dev

在安装PIL之前,请运行以下命令:

sudo apt-get install python-dev

然后安装PIL:

pip install PIL

I had the same problem, and installing python-dev solved it.

Before installing PIL, run following command:

sudo apt-get install python-dev

Then install PIL:

pip install PIL

回答 11

安装过程中出现了一些错误。以防万一有人也有这个。尽管我已经是管理员用户,但不是root用户。

File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/os.py", line 157, in makedirs
    mkdir(name, mode)
OSError: [Errno 13] Permission denied: '/Library/Python/2.7/site-packages/PIL'

Storing debug log for failure in /Users/wzbozon/Library/Logs/pip.log

添加“ sudo”解决了问题,使用sudo可以解决问题:

~/Documents/mv-server: $ sudo pip install Pillow

I had some errors during installation; posting this in case somebody else runs into them too. I was logged in as an admin user, but not as root.

File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/os.py", line 157, in makedirs
    mkdir(name, mode)
OSError: [Errno 13] Permission denied: '/Library/Python/2.7/site-packages/PIL'

Storing debug log for failure in /Users/wzbozon/Library/Logs/pip.log

Adding sudo solved the problem:

~/Documents/mv-server: $ sudo pip install Pillow

回答 12

对于CentOS:

yum install python-imaging

For CentOS:

yum install python-imaging

回答 13

我尝试了所有答案,但失败了。直接从官方站点获取源代码,然后构建安装成功。

  1. 前往网站 http://www.pythonware.com/products/pil/#pil117
  2. 单击“ Python Imaging Library 1.1.7源工具包”以下载源
  3. tar xf Imaging-1.1.7.tar.gz
  4. cd Imaging-1.1.7
  5. sudo python setup.py install

I tried all the answers, but they failed. Getting the source directly from the official site and then building and installing it worked.

  1. Go to the site http://www.pythonware.com/products/pil/#pil117
  2. Click “Python Imaging Library 1.1.7 Source Kit” to download the source
  3. tar xf Imaging-1.1.7.tar.gz
  4. cd Imaging-1.1.7
  5. sudo python setup.py install

回答 14

我用 sudo port install py27-Pillow

I nailed it by using sudo port install py27-Pillow


回答 15

尝试这个:

sudo pip install PIL --allow-external PIL --allow-unverified PIL

Try this:

sudo pip install PIL --allow-external PIL --allow-unverified PIL

回答 16

(窗口)如果Pilow不起作用,请尝试从http://www.pythonware.com/products/pil/下载pil

(Windows) If Pillow does not work, try downloading PIL from http://www.pythonware.com/products/pil/


回答 17

  • 首先,您应该运行此程序sudo apt-get build-dep python-imaging,它将为您提供可能需要的所有依赖项

  • 然后跑 sudo apt-get update && sudo apt-get -y upgrade

  • 其次是 sudo apt-get install python-pip

  • 然后最后安装Pil pip install pillow

  • First you should run this sudo apt-get build-dep python-imaging which will give you all the dependencies that you might need

  • Then run sudo apt-get update && sudo apt-get -y upgrade

  • Followed by sudo apt-get install python-pip

  • And finally install Pillow with pip install pillow


回答 18

使用之前,请先搜索软件包管理器pip。在Arch Linux上,您可以通过以下方式获取PIL:pacman -S python2-pillow

Search your package manager before using pip. On Arch Linux you can get PIL with pacman -S python2-pillow


回答 19

还有另一个名为的Python打包工具conda。当某些库需要安装C ++和其他非纯Python的绑定时,Conda优于pip(我认为)。Conda的安装中还包括点子,因此您仍然可以使用点子,但您也可以从conda中受益。

默认情况下,Conda还安装IPython,pil和许多其他库。我想你会喜欢的。

There’s another Python package tool called conda. Conda is preferred (I believe) over pip when there are libraries that need to install C++ and other bindings that aren’t pure Python. Conda includes pip in its installation as well so you can still use pip, but you also get the benefits of conda.

Conda also installs IPython, pil, and many other libraries by default. I think you’ll like it.


是否可以使用pip从私有GitHub存储库安装软件包?

问题:是否可以使用pip从私有GitHub存储库安装软件包?

我正在尝试从私有GitHub存储库安装Python软件包。对于公共存储库,我可以发出以下正常运行的命令:

pip install git+git://github.com/django/django.git

但是,如果我尝试将其用于私有存储库:

pip install git+git://github.com/echweb/echweb-utils.git

我得到以下输出:

Downloading/unpacking git+git://github.com/echweb/echweb-utils.git
Cloning Git repository git://github.com/echweb/echweb-utils.git to /var/folders/cB/cB85g9P7HM4jcPn7nrvWRU+++TI/-Tmp-/pip-VRsIoo-build
Complete output from command /usr/local/bin/git clone git://github.com/echweb/echweb-utils.git /var/folders/cB/cB85g9P7HM4jcPn7nrvWRU+++TI/-Tmp-/pip-VRsIoo-build:
fatal: The remote end hung up unexpectedly

Cloning into /var/folders/cB/cB85g9P7HM4jcPn7nrvWRU+++TI/-Tmp-/pip-VRsIoo-build...

----------------------------------------
Command /usr/local/bin/git clone git://github.com/echweb/echweb-utils.git /var/folders/cB/cB85g9P7HM4jcPn7nrvWRU+++TI/-Tmp-/pip-VRsIoo-build failed with error code 128

我猜这是因为我试图在不提供任何身份验证的情况下访问私有存储库。因此,我尝试使用Git + ssh希望pip使用我的SSH公钥进行身份验证:

pip install git+ssh://github.com/echweb/echweb-utils.git

这给出以下输出:

Downloading/unpacking git+ssh://github.com/echweb/echweb-utils.git
Cloning Git repository ssh://github.com/echweb/echweb-utils.git to /var/folders/cB/cB85g9P7HM4jcPn7nrvWRU+++TI/-Tmp-/pip-DQB8s4-build
Complete output from command /usr/local/bin/git clone ssh://github.com/echweb/echweb-utils.git /var/folders/cB/cB85g9P7HM4jcPn7nrvWRU+++TI/-Tmp-/pip-DQB8s4-build:
Cloning into /var/folders/cB/cB85g9P7HM4jcPn7nrvWRU+++TI/-Tmp-/pip-DQB8s4-build...

Permission denied (publickey).

fatal: The remote end hung up unexpectedly

----------------------------------------
Command /usr/local/bin/git clone ssh://github.com/echweb/echweb-utils.git /var/folders/cB/cB85g9P7HM4jcPn7nrvWRU+++TI/-Tmp-/pip-DQB8s4-build failed with error code 128

我正在努力实现的目标是否可能?如果是这样,我该怎么办?

I am trying to install a Python package from a private GitHub repository. For a public repository, I can issue the following command which works fine:

pip install git+git://github.com/django/django.git

However, if I try this for a private repository:

pip install git+git://github.com/echweb/echweb-utils.git

I get the following output:

Downloading/unpacking git+git://github.com/echweb/echweb-utils.git
Cloning Git repository git://github.com/echweb/echweb-utils.git to /var/folders/cB/cB85g9P7HM4jcPn7nrvWRU+++TI/-Tmp-/pip-VRsIoo-build
Complete output from command /usr/local/bin/git clone git://github.com/echweb/echweb-utils.git /var/folders/cB/cB85g9P7HM4jcPn7nrvWRU+++TI/-Tmp-/pip-VRsIoo-build:
fatal: The remote end hung up unexpectedly

Cloning into /var/folders/cB/cB85g9P7HM4jcPn7nrvWRU+++TI/-Tmp-/pip-VRsIoo-build...

----------------------------------------
Command /usr/local/bin/git clone git://github.com/echweb/echweb-utils.git /var/folders/cB/cB85g9P7HM4jcPn7nrvWRU+++TI/-Tmp-/pip-VRsIoo-build failed with error code 128

I guess this is because I am trying to access a private repository without providing any authentication. I therefore tried to use Git + ssh hoping that pip would use my SSH public key to authenticate:

pip install git+ssh://github.com/echweb/echweb-utils.git

This gives the following output:

Downloading/unpacking git+ssh://github.com/echweb/echweb-utils.git
Cloning Git repository ssh://github.com/echweb/echweb-utils.git to /var/folders/cB/cB85g9P7HM4jcPn7nrvWRU+++TI/-Tmp-/pip-DQB8s4-build
Complete output from command /usr/local/bin/git clone ssh://github.com/echweb/echweb-utils.git /var/folders/cB/cB85g9P7HM4jcPn7nrvWRU+++TI/-Tmp-/pip-DQB8s4-build:
Cloning into /var/folders/cB/cB85g9P7HM4jcPn7nrvWRU+++TI/-Tmp-/pip-DQB8s4-build...

Permission denied (publickey).

fatal: The remote end hung up unexpectedly

----------------------------------------
Command /usr/local/bin/git clone ssh://github.com/echweb/echweb-utils.git /var/folders/cB/cB85g9P7HM4jcPn7nrvWRU+++TI/-Tmp-/pip-DQB8s4-build failed with error code 128

Is what I am trying to achieve even possible? If so, how can I do it?


回答 0

您可以使用git+sshURI方案,但是必须设置用户名:

pip install git+ssh://git@github.com/echweb/echweb-utils.git

git@在URI中看到该部分了吗?

PS:另请参阅有关部署密钥

PPS:在我的安装中,“ git + ssh” URI方案仅适用于“可编辑”的要求:

pip install -e URI#egg=EggName

切记:在命令中使用遥控器的地址之前:,请将要git remote -v打印的/字符更改为字符pip

$ git remote -v
origin  git@github.com:echweb/echweb-utils.git (fetch)
                      ^ change this to a '/' character

如果您忘记了,则会收到此错误:

ssh: Could not resolve hostname github.com:echweb:
         nodename nor servname provided, or not known

You can use the git+ssh URI scheme, but you must set a username:

pip install git+ssh://git@github.com/echweb/echweb-utils.git

Note the git@ part in the URI.

PS: Also read about deploy keys.

PPS: In my installation, the “git+ssh” URI scheme works only with “editable” requirements:

pip install -e URI#egg=EggName

Remember: Change the : character that git remote -v prints to a / character before using the remote’s address in the pip command:

$ git remote -v
origin  git@github.com:echweb/echweb-utils.git (fetch)
                      ^ change this to a '/' character

If you forget, you will get this error:

ssh: Could not resolve hostname github.com:echweb:
         nodename nor servname provided, or not known
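
The `:` to `/` substitution described above can be sketched as a small Python helper. The function name `scp_remote_to_pip_url` is hypothetical, and the sketch assumes the remote is in the `user@host:path` form that git remote -v prints for SSH remotes.

```python
import re

def scp_remote_to_pip_url(remote):
    """Convert 'git@host:owner/repo.git' to 'git+ssh://git@host/owner/repo.git'."""
    match = re.fullmatch(r"(?P<user>[^@/\s]+)@(?P<host>[^:/\s]+):(?P<path>\S+)", remote)
    if match is None:
        raise ValueError("not an scp-style remote: %r" % remote)
    # Only the ':' after the host name is replaced by '/'
    return "git+ssh://{user}@{host}/{path}".format(**match.groupdict())

print(scp_remote_to_pip_url("git@github.com:echweb/echweb-utils.git"))
# git+ssh://git@github.com/echweb/echweb-utils.git
```

The result can be passed to pip install as is.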

回答 1

作为另一项技术,如果您在本地克隆了专用存储库,则可以执行以下操作:

pip install git+file://c:/repo/directory

更现代的是,您可以执行此操作(这-e将意味着您不必在更改被反映之前就提交更改):

pip install -e C:\repo\directory

As an additional technique, if you have the private repository cloned locally, you can do:

pip install git+file://c:/repo/directory

A more modern approach is an editable install straight from the working directory (with -e, changes are reflected without having to commit them first):

pip install -e C:\repo\directory

回答 2

您可以使用HTTPS URL直接执行此操作,如下所示:

pip install git+https://github.com/username/repo.git

例如,这也可以仅将这一行添加到Django项目中的requirements.txt中。

You can do it directly with the HTTPS URL like this:

pip install git+https://github.com/username/repo.git

This also works if you just add that URL (git+https://github.com/username/repo.git) as a line in the requirements.txt of a Django project, for instance.


回答 3

它也可以与Bitbucket一起使用

pip install git+ssh://git@bitbucket.org/username/projectname.git

在这种情况下,Pip将使用您的SSH密钥。

It also works with Bitbucket:

pip install git+ssh://git@bitbucket.org/username/projectname.git

Pip will use your SSH keys in this case.


回答 4

需求文件的语法在此处给出:

https://pip.pypa.io/en/latest/reference/pip_install.html#requirements-file-format

因此,例如,使用:

-e git+http://github.com/rwillmer/django-behave#egg=django-behave

如果您希望源在安装后继续存在。

要不就

git+http://github.com/rwillmer/django-behave#egg=django-behave

如果您只想安装它。

The syntax for the requirements file is given here:

https://pip.pypa.io/en/latest/reference/pip_install.html#requirements-file-format

So for example, use:

-e git+http://github.com/rwillmer/django-behave#egg=django-behave

if you want the source to stick around after installation.

Or just

git+http://github.com/rwillmer/django-behave#egg=django-behave

if you just want it to be installed.


回答 5

我发现使用令牌比使用SSH密钥容易得多。我在这方面找不到很多好的文档,因此我主要是通过反复试验来遇到此解决方案。此外,从pip和setuptools安装有一些细微的差异。但是这种方式对两者都适用。

GitHub尚未提供(目前,截至2016年8月)提供了一种获取私有存储库zip / tarball的简便方法。因此,您需要指向setuptools来告诉setuptools您指向的是Git存储库:

from setuptools import setup
import os
# Get the deploy key from https://help.github.com/articles/git-automation-with-oauth-tokens/
github_token = os.environ['GITHUB_TOKEN']
package = 'package'  # name of the private repository / package
version = 'master'   # a branch, a tag, or a commit hash

setup(
    # ...
    install_requires=[package],
    dependency_links=[
        'git+https://{github_token}@github.com/user/{package}.git/@{version}#egg={package}-0'
        .format(github_token=github_token, package=package, version=version)
    ],
)

这里有几点注意事项:

  • 对于私有存储库,您需要通过GitHub进行身份验证;我发现的最简单的方法是创建OAuth令牌,将其放入您的环境中,然后将其包含在URL中
  • 即使在PyPI上没有任何软件包,您也需要在链接末尾包含一些版本号(在此处0)。这必须是实际数字,而不是单词。
  • 您需要以 git+序号告诉setuptools它是克隆存储库,而不是指向zip / tarball
  • version 可以是分支,标签或提交哈希
  • --process-dependency-links如果从pip安装,则需要提供

I found it much easier to use tokens than SSH keys. I couldn’t find much good documentation on this, so I came across this solution mainly through trial and error. Further, installing from pip and setuptools have some subtle differences; but this way should work for both.

GitHub doesn’t (currently, as of August 2016) offer an easy way to get the zip / tarball of private repositories. So you need to tell setuptools that you’re pointing to a Git repository:

from setuptools import setup
import os
# Get the deploy key from https://help.github.com/articles/git-automation-with-oauth-tokens/
github_token = os.environ['GITHUB_TOKEN']
package = 'package'  # name of the private repository / package
version = 'master'   # a branch, a tag, or a commit hash

setup(
    # ...
    install_requires=[package],
    dependency_links=[
        'git+https://{github_token}@github.com/user/{package}.git/@{version}#egg={package}-0'
        .format(github_token=github_token, package=package, version=version)
    ],
)

A couple of notes here:

  • For private repositories, you need to authenticate with GitHub; the simplest way I found is to create an OAuth token, drop that into your environment, and then include it with the URL
  • You need to include some version number (here it is 0) at the end of the link, even if there isn’t any package on PyPI. This has to be an actual number, not a word.
  • You need to preface with git+ to tell setuptools it’s to clone the repository, rather than pointing at a zip / tarball
  • version can be a branch, a tag, or a commit hash
  • You need to supply --process-dependency-links if installing from pip
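
As far as I know, later pip releases dropped --process-dependency-links, and the usual replacement is a direct reference of the form `name @ url` in install_requires. The sketch below only builds such a string; the helper name `direct_reference` and the `user` / `package` placeholders are assumptions for illustration, not part of the original answer.

```python
import os

def direct_reference(package, user, version="master", token=None):
    """Build a 'name @ url' requirement string for a private GitHub repository."""
    if token is None:
        token = os.environ["GITHUB_TOKEN"]  # same variable as in the answer above
    return "{package} @ git+https://{token}@github.com/{user}/{package}.git@{version}".format(
        package=package, token=token, user=user, version=version)

# Hypothetical usage inside setup.py:
#   setup(..., install_requires=[direct_reference("package", "user")])
print(direct_reference("package", "user", token="<token>"))
```

As before, version can be a branch, a tag, or a commit hash.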

回答 6

我想出了一种自动“点安装”不需要密码提示的GitLab私有存储库的方法。这种方法使用GitLab“部署密钥”和SSH配置文件,因此您可以使用个人SSH密钥以外的其他密钥进行部署(在我的情况下,由“机器人”使用)。也许有人会使用GitHub进行验证。

创建一个新的SSH密钥:

ssh-keygen -t rsa -C "GitLab_Robot_Deploy_Key"

该文件应显示为~/.ssh/GitLab_Robot_Deploy_Key~/.ssh/GitLab_Robot_Deploy_Key.pub

~/.ssh/GitLab_Robot_Deploy_Key.pub文件的内容复制并粘贴到GitLab的“部署密钥”对话框中。

测试新的部署密钥

以下命令告诉SSH使用新的部署密钥来建立连接。成功后,您将收到消息:“欢迎使用GitLab,用户名!”

ssh -T -i ~/.ssh/GitLab_Robot_Deploy_Key git@gitlab.mycorp.com

创建SSH配置文件

接下来,使用编辑器创建~/.ssh/config文件。添加以下内容。“主机”值可以是您想要的任何值(请记住它,因为稍后会使用它)。HostName是您的GitLab实例的URL。IdentifyFile是您在第一步中创建的SSH密钥文件的路径。

Host GitLab
  HostName gitlab.mycorp.com
  IdentityFile ~/.ssh/GitLab_Robot_Deploy_Key

将SSH指向配置文件

oxyum为我们提供了通过SSH使用pip的方法:

pip install git+ssh://git@gitlab.mycorp.com/my_name/my_repo.git

我们只需要对其稍作修改即可使SSH使用我们的新Deploy Key。为此,我们将SSH指向SSH配置文件中的Host条目。只需将命令中的“ gitlab.mycorp.com”替换为我们在SSH配置文件中使用的主机名即可:

pip install git+ssh://git@GitLab/my_name/my_repo.git

该软件包现在应该安装,没有任何密码提示。

参考文献A
参考文献B

I figured out a way to automagically ‘pip install’ a GitLab private repository that requires no password prompt. This approach uses GitLab “Deploy Keys” and an SSH configuration file, so you can deploy using keys other than your personal SSH keys (in my case, for use by a bot). Perhaps some kind soul can verify that it also works with GitHub.

Create a New SSH key:

ssh-keygen -t rsa -C "GitLab_Robot_Deploy_Key"

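If you enter ~/.ssh/GitLab_Robot_Deploy_Key when ssh-keygen asks for the file in which to save the key (the -C option only sets the key’s comment), the key pair should show up as ~/.ssh/GitLab_Robot_Deploy_Key and ~/.ssh/GitLab_Robot_Deploy_Key.pub.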

Copy and paste the contents of the ~/.ssh/GitLab_Robot_Deploy_Key.pub file into the GitLab “Deploy Keys” dialog.

Test the New Deploy Key

The following command tells SSH to use your new deploy key to set up the connection. On success, you should get the message: “Welcome to GitLab, UserName!”

ssh -T -i ~/.ssh/GitLab_Robot_Deploy_Key git@gitlab.mycorp.com

Create the SSH Configuration File

Next, use an editor to create a ~/.ssh/config file and add the following contents. The Host value can be anything you want (just remember it, because you’ll be using it later). HostName is the host name of your GitLab instance. IdentityFile is the path to the SSH key file you created in the first step.

Host GitLab
  HostName gitlab.mycorp.com
  IdentityFile ~/.ssh/GitLab_Robot_Deploy_Key

Point SSH to the Configuration file

oxyum gave us the recipe for using pip with SSH:

pip install git+ssh://git@gitlab.mycorp.com/my_name/my_repo.git

We just need to modify it a bit to make SSH use our new Deploy Key. We do that by pointing SSH to the Host entry in the SSH configuration file. Just replace ‘gitlab.mycorp.com’ in the command with the host name we used in the SSH configuration file:

pip install git+ssh://git@GitLab/my_name/my_repo.git

The package should now install without any password prompt.

Reference A
Reference B


回答 7

从GitHub安装时,我可以使用:

pip install git+ssh://git@github.com/<username>/<projectname>.git#egg=<eggname>

但是,由于我必须以pip as身份运行sudo,所以SSH密钥不再可与GitHub一起使用,并且“ git clone”在“权限被拒绝(公共密钥)”上失败。使用git+https允许我以sudo的身份运行命令,并让GitHub询问我的用户名/密码。

sudo pip install git+https://github.com/<username>/<projectname>.git#egg=<eggname>

When I was installing from GitHub I was able to use:

pip install git+ssh://git@github.com/<username>/<projectname>.git#egg=<eggname>

But, since I had to run pip as sudo, the SSH keys were not working with GitHub any more, and “git clone” failed on “Permission denied (publickey)”. Using git+https allowed me to run the command as sudo, and have GitHub ask me for my user/password.

sudo pip install git+https://github.com/<username>/<projectname>.git#egg=<eggname>

回答 8

您还可以通过提供登录凭据(登录名和密码,或部署令牌)通过git + https://github.com / … URL 安装私有存储库依赖关系,以使用该文件卷曲.netrc

echo "machine github.com login ei-grad password mypasswordshouldbehere" > ~/.netrc
pip install "git+https://github.com/ei-grad/my_private_repo.git#egg=my_private_repo"

You can also install a private repository dependency via git+https://github.com/… URL by providing login credentials (login and password, or deploy token) for curl with the .netrc file:

echo "machine github.com login ei-grad password mypasswordshouldbehere" > ~/.netrc
pip install "git+https://github.com/ei-grad/my_private_repo.git#egg=my_private_repo"
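
Before relying on the .netrc entry, it can help to confirm that the line parses the way you expect. The sketch below uses Python’s standard netrc module on a throwaway file; the helper name `netrc_credentials` is made up, and the credentials are the placeholder values from the answer.

```python
import netrc
import os
import tempfile

def netrc_credentials(path, host):
    """Return (login, password) stored for host in the given netrc file, or None."""
    entry = netrc.netrc(path).authenticators(host)
    if entry is None:
        return None
    login, _account, password = entry
    return login, password

# Demonstration with a throwaway file rather than the real ~/.netrc
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "netrc")
    with open(demo, "w") as f:
        f.write("machine github.com login ei-grad password mypasswordshouldbehere\n")
    os.chmod(demo, 0o600)  # keep credential files private
    print(netrc_credentials(demo, "github.com"))
```

Note that the echo command in the answer overwrites any existing ~/.netrc, so back it up first if you already have one.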

回答 9

如果要从CI服务器等中的需求文件中安装依赖项,可以执行以下操作:

git config --global credential.helper 'cache'
echo "protocol=https
host=example.com
username=${GIT_USER}
password=${GIT_PASS}
" | git credential approve
pip install -r requirements.txt

就我而言,我使用GIT_USER=gitlab-ci-tokenGIT_PASS=${CI_JOB_TOKEN}

该方法具有明显的优势。您只有一个包含所有依赖项的需求文件。

If you want to install dependencies from a requirements file on a CI server or the like, you can do this:

git config --global credential.helper 'cache'
echo "protocol=https
host=example.com
username=${GIT_USER}
password=${GIT_PASS}
" | git credential approve
pip install -r requirements.txt

In my case, I used GIT_USER=gitlab-ci-token and GIT_PASS=${CI_JOB_TOKEN}.

This method has a clear advantage. You have a single requirements file which contains all of your dependencies.
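
If the CI job is driven from Python rather than a shell script, the same hand-off to git credential approve might look like the sketch below. The function names are illustrative, and the sketch assumes git is on the PATH and a credential helper is configured as shown above.

```python
import subprocess

def credential_payload(host, username, password, protocol="https"):
    """Build the key=value block that `git credential approve` reads from stdin."""
    lines = [
        "protocol=" + protocol,
        "host=" + host,
        "username=" + username,
        "password=" + password,
    ]
    return "\n".join(lines) + "\n\n"  # a blank line terminates the block

def approve(host, username, password):
    """Hand the credentials to whichever credential helper git is configured with."""
    subprocess.run(["git", "credential", "approve"],
                   input=credential_payload(host, username, password),
                   text=True, check=True)

# Hypothetical usage in a CI job:
#   approve("example.com", os.environ["GIT_USER"], os.environ["GIT_PASS"])
```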


回答 10

如果您不想使用SSH,则可以在HTTPS URL中添加用户名和密码。

下面的代码假定您在工作目录中有一个包含密码的名为“ pass”的文件。

export PASS=$(cat pass)
pip install git+https://<username>:$PASS@github.com/echweb/echweb-utils.git

If you don’t want to use SSH, you could add the username and password in the HTTPS URL.

The code below assumes that you have a file called “pass” in the working directory that contains your password.

export PASS=$(cat pass)
pip install git+https://<username>:$PASS@github.com/echweb/echweb-utils.git

回答 11

oxyum的解决方案可以解决此问题。我只想指出,如果您使用进行安装,则需要小心,sudo因为密钥也必须存储为root(例如,/root/.ssh)。

然后你可以输入

sudo pip install git+ssh://git@github.com/echweb/echweb-utils.git

oxyum’s solution is OK for this answer. I just want to point out that you need to be careful if you are installing using sudo as the keys must be stored for root too (for example, /root/.ssh).

Then you can type

sudo pip install git+ssh://git@github.com/echweb/echweb-utils.git

回答 12

如果您在GitHub,GitLab等上拥有自己的库/软件包,则必须添加一个标签以提交该库的具体版本,例如v2.0,然后可以安装软件包:

pip install git+ssh://link/name/repo.git@v2.0

这对我有用。其他解决方案对我不起作用。

If you have your own library/package on GitHub, GitLab, etc., you have to tag a commit with a concrete version of the library, for example v2.0, and then you can install your package:

pip install git+ssh://link/name/repo.git@v2.0

This works for me. Other solutions haven’t worked for me.


回答 13

这是一种对我有用的快速方法。只需分叉存储库,并使用您自己的GitHub帐户进行安装即可

pip install git+https://github.com/yourName/repoName

Here’s a quick method that worked for me. Simply fork the repo and install it from your own GitHub account with

pip install git+https://github.com/yourName/repoName

回答 14

只需从原始git clone命令(或从git remote -v)复制遥控器。您将获得如下内容:

  • 位桶: git+ssh://git@bitbucket.org:your_account/my_pro.git

  • 的GitHub: git+ssh://git@github.com:your_account/my_pro.git

接下来,您需要替换:/旁边的域名。

因此,使用以下命令进行安装:

pip install git+ssh://git@bitbucket.org/your_account/my_pro.git

Just copy the remote from the original git clone command (or from git remote -v). You will get something like this:

  • Bitbucket: git+ssh://git@bitbucket.org:your_account/my_pro.git

  • GitHub: git+ssh://git@github.com:your_account/my_pro.git

Next, you need to replace : with / next to the domain name.

So install using:

pip install git+ssh://git@bitbucket.org/your_account/my_pro.git

回答 15

你可以试试

pip install git+git@gitlab.mycorp.com/my_name/my_repo.git

没有ssh:...。这对我行得通。

You may try

pip install git+git@gitlab.mycorp.com/my_name/my_repo.git

without ssh:.... That works for me.


如何在浏览器中增加Jupyter / ipython笔记本的单元格宽度?

问题:如何在浏览器中增加Jupyter / ipython笔记本的单元格宽度?

我想在浏览器中增加ipython笔记本的宽度。我有一个高分辨率的屏幕,我想扩展单元格的宽度/大小以利用这个额外的空间。

谢谢!


编辑:5/2017

我现在使用jupyterthemes:https://github.com/dunovank/jupyter-themes

和此命令:

jt -t oceans16 -f roboto -fs 12 -cellw 100%

可以将宽度设置为100%,并且主题很好。

I would like to increase the width of the ipython notebook in my browser. I have a high-resolution screen, and I would like to expand the cell width/size to make use of this extra space.

Thanks!


edit: 5/2017

I now use jupyterthemes: https://github.com/dunovank/jupyter-themes

and this command:

jt -t oceans16 -f roboto -fs 12 -cellw 100%

which sets the width to 100% with a nice theme.


回答 0

如果您不想更改默认设置,而只想更改正在使用的当前笔记本的宽度,则可以在单元格中输入以下内容:

from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))

If you don’t want to change your default settings, and you only want to change the width of the current notebook you’re working on, you can enter the following into a cell:

from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))
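
If you change the width often, the one-liner can be wrapped in a small helper. This is only a sketch: the function names are made up, and `.container` is the selector taken from the answer, which may not match every notebook front end.

```python
def container_style(width_percent=100):
    """Return the <style> snippet that sets the notebook container width."""
    if not 0 < width_percent <= 100:
        raise ValueError("width_percent must be in (0, 100]")
    return "<style>.container {{ width:{}% !important; }}</style>".format(width_percent)

def set_notebook_width(width_percent=100):
    """Apply the style in the current notebook (requires IPython)."""
    # Imported lazily so container_style() can be used without IPython installed
    from IPython.display import display, HTML
    display(HTML(container_style(width_percent)))

# In a notebook cell: set_notebook_width(90)
```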

回答 1

div.cell解决方案实际上不适用于我的IPython,但是幸运的是有人提出了适用于新IPython的可行解决方案:

创建包含内容的文件~/.ipython/profile_default/static/custom/custom.css(iPython)或~/.jupyter/custom/custom.css(Jupyter)

.container { width:100% !important; }

然后重新启动iPython / Jupyter笔记本。请注意,这将影响所有笔记本电脑。

That div.cell solution didn’t actually work on my IPython, but luckily someone suggested a working solution for newer IPython versions:

Create a file ~/.ipython/profile_default/static/custom/custom.css (IPython) or ~/.jupyter/custom/custom.css (Jupyter) with the following content:

.container { width:100% !important; }

Then restart the IPython/Jupyter notebook server. Note that this will affect all notebooks.
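
Creating the file can also be scripted. The sketch below assumes the Jupyter location named above (~/.jupyter/custom/custom.css) as its default; the helper name `write_custom_css` and its parameters are illustrative.

```python
from pathlib import Path

def write_custom_css(config_dir=None, width="100%"):
    """Create <config_dir>/custom/custom.css with a container-width rule and return its path."""
    base = Path(config_dir) if config_dir is not None else Path.home() / ".jupyter"
    css_file = base / "custom" / "custom.css"
    css_file.parent.mkdir(parents=True, exist_ok=True)
    # Append so that rules already present in the file are kept
    with css_file.open("a") as f:
        f.write(".container {{ width:{} !important; }}\n".format(width))
    return css_file

# Hypothetical usage: write_custom_css(), then restart the notebook server
```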


回答 2

为了使它与jupyter(版本4.0.6)一起使用,我创建了以下内容~/.jupyter/custom/custom.css

/* Make the notebook cells take almost all available width */
.container {
    width: 99% !important;
}   

/* Prevent the edit cell highlight box from getting clipped;
 * important so that it also works when cell is in edit mode*/
div.cell.selected {
    border-left-width: 1px !important;
}

To get this to work with jupyter (version 4.0.6) I created ~/.jupyter/custom/custom.css containing:

/* Make the notebook cells take almost all available width */
.container {
    width: 99% !important;
}   

/* Prevent the edit cell highlight box from getting clipped;
 * important so that it also works when cell is in edit mode*/
div.cell.selected {
    border-left-width: 1px !important;
}

回答 3

是时候使用jupyterlab

最后,笔记本电脑急需升级。默认情况下,它使用窗口的整个宽度,就像其他任何成熟的本机IDE一样。

您要做的就是:

pip install jupyterlab
# if you use conda
conda install -c conda-forge jupyterlab
# to run 
jupyter lab    # instead of jupyter notebook

这是blog.Jupyter.org的屏幕截图

It’s time to use jupyterlab

Finally, a much-needed upgrade has come to notebooks. By default, it uses the full width of your window like any other full-fledged native IDE.

All you have to do is:

pip install jupyterlab
# if you use conda
conda install -c conda-forge jupyterlab
# to run 
jupyter lab    # instead of jupyter notebook

Here is a screenshot from blog.Jupyter.org


回答 4

全新安装后,我通常要做的是修改存储所有视觉样式的主css文件。我使用Miniconda,但位置与其他人相似C:\Miniconda3\Lib\site-packages\notebook\static\style\style.min.css

在某些屏幕上,这些分辨率是不同的,并且大于1。为安全起见,我将所有分辨率更改为98%,因此,如果从笔记本电脑上的外接屏幕断开连接,则屏幕宽度仍为98%。

然后,将1140px替换为屏幕宽度的98%

@media (min-width: 1200px) {
  .container {
    width: 1140px;
  }
}

编辑后

@media (min-width: 1200px) {
  .container {
    width: 98%;
  }
}

保存并重新启动笔记本


更新资料

最近不得不在已安装的环境中扩展Jupyter单元,这导致我回到这里提醒自己。

如果您需要在虚拟环境中进行安装,请先安装jupyter。您可以在此子目录中找到css文件

env/lib/python3.6/site-packages/notebook/static/style/style.min.css

What I usually do after a new installation is modify the main CSS file where all visual styles are stored. I use Miniconda, but the location is similar for other distributions: C:\Miniconda3\Lib\site-packages\notebook\static\style\style.min.css

The file contains more than one such width rule, for different screen resolutions. To be on the safe side I change all of them to 98%, so if I disconnect my laptop from my external screens I still have 98% of the screen width.

So just replace 1140px with 98%:

@media (min-width: 1200px) {
  .container {
    width: 1140px;
  }
}

After editing

@media (min-width: 1200px) {
  .container {
    width: 98%;
  }
}

Save and restart your notebook.


Update

I recently had to widen the Jupyter cells in an environment where Jupyter was already installed, which led me to come back here and remind myself.

If you need to do this in a virtual env that you installed jupyter in, you can find the CSS file in this subdirectory:

env/lib/python3.6/site-packages/notebook/static/style/style.min.css
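
Rather than guessing the path for each environment, you can ask Python where the notebook package lives. The sketch assumes the classic notebook layout described above (static/style/style.min.css inside the package); the helper name is hypothetical, and it returns None when the package or the file is missing.

```python
import importlib.util
from pathlib import Path

def find_notebook_css(package="notebook"):
    """Return the path to style.min.css inside the installed package, or None."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None
    css = Path(spec.origin).parent / "static" / "style" / "style.min.css"
    return css if css.exists() else None

print(find_notebook_css())
```

Run it with the interpreter of the environment you want to change, so that the right site-packages directory is searched.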

回答 5

您可以通过从任何单元格调用样式表来设置笔记本的CSS。作为示例,请看Navier Stokes类12个步骤

特别是,创建一个包含

<style>
    div.cell{
        width:100%;
        margin-left:1%;
        margin-right:auto;
    }
</style>

应该给你一个起点。但是,可能有必要也进行调整,例如div.text_cell_render处理降价和代码单元。

如果是该文件,custom.css则添加包含以下内容的单元格:

from IPython.core.display import HTML
def css_styling():
    styles = open("custom.css", "r").read()
    return HTML(styles)
css_styling()

这将应用所有样式,尤其是更改像元宽度。

You can set the CSS of a notebook by calling a stylesheet from any cell. As an example, take a look at the 12 Steps to Navier Stokes course.

In particular, creating a file containing

<style>
    div.cell{
        width:100%;
        margin-left:1%;
        margin-right:auto;
    }
</style>

should give you a starting point. However, it may also be necessary to adjust, e.g., div.text_cell_render to deal with markdown cells as well as code cells.

If that file is custom.css then add a cell containing:

from IPython.core.display import HTML
def css_styling():
    styles = open("custom.css", "r").read()
    return HTML(styles)
css_styling()

This will apply all the stylings, and, in particular, change the cell width.


回答 6

(从2018年开始,我建议您尝试使用JupyterHub / JupyterLab。它使用监视器的整个宽度。如果这不是一种选择,则可能是因为您使用的是基于云的Jupyter即服务提供商,继续阅读)

(时尚被指控窃取用户数据,我已改为使用Stylus插件)

我建议使用时尚浏览器插件。这样,您可以覆盖所有笔记本的css,而无需向笔记本中添加任何代码。我们不喜欢在.ipython / profile_default中更改配置,因为我们正在为整个团队运行共享的Jupyter服务器,并且宽度是用户首选项。

我专门为垂直方向的高分辨率屏幕设计了一种样式,该样式使单元格变宽,并在底部添加了一些空白区域,因此您可以将最后一个单元格放置在屏幕的中央。 https://userstyles.org/styles/131230/jupyter-wide 当然,如果您使用其他布局,或者您不希望最后有多余的空格,则可以根据自己的喜好修改我的CSS。

最后但并非最不重要的一点是,Stylish是包含在工具集中的出色工具,因为您可以根据自己的喜好轻松自定义其他站点/工具(例如Jira,Podio,Slack等)。

@media (min-width: 1140px) {
  .container {
    width: 1130px;
  }
}

.end_space {
  height: 800px;
}

(As of 2018, I would advise trying out JupyterHub/JupyterLab. It uses the full width of the monitor. If this is not an option, maybe since you are using one of the cloud-based Jupyter-as-a-service providers, keep reading)

(Stylish is accused of stealing user data, I have moved on to using Stylus plugin instead)

I recommend using Stylish Browser Plugin. This way you can override css for all notebooks, without adding any code to notebooks. We don’t like to change configuration in .ipython/profile_default, since we are running a shared Jupyter server for the whole team and width is a user preference.

I made a style specifically for vertically-oriented high-res screens, that makes cells wider and adds a bit of empty-space in the bottom, so you can position the last cell in the centre of the screen. https://userstyles.org/styles/131230/jupyter-wide You can, of course, modify my css to your liking, if you have a different layout, or you don’t want extra empty-space in the end.

Last but not least, Stylish is a great tool to have in your toolset, since you can easily customise other sites/tools to your liking (e.g. Jira, Podio, Slack, etc.)

@media (min-width: 1140px) {
  .container {
    width: 1130px;
  }
}

.end_space {
  height: 800px;
}

回答 7

对于Chrome用户,我建议使用Stylebot,它可以让您覆盖任何页面上的所有CSS,还可以搜索和安装其他共享自定义CSS。但是,出于我们的目的,我们不需要任何高级主题。打开Stylebot,更改为Edit CSS。Jupyter捕获了一些击键,因此您将无法在其中键入以下代码。只需复制和粘贴,或者仅编辑即可:

#notebook-container.container {
    width: 90%;
}

根据需要更改宽度,我发现90%的外观比100%的外观好。但这完全取决于您。

For Chrome users, I recommend Stylebot, which lets you override any CSS on any page and also lets you search for and install custom CSS shared by others. For our purpose, however, we don’t need any advanced theme. Open Stylebot and switch to Edit CSS. Jupyter captures some keystrokes, so you will not be able to type the code below in. Just copy and paste it, or use your own editor:

#notebook-container.container {
    width: 90%;
}

Change the width as you like, I find 90% looks nicer than 100%. But it is totally up to your eye.


回答 8

这是我最终使用的代码。它将输入和输出单元格向左和向右拉伸。请注意,输入/输出编号指示将消失:

from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))
display(HTML("<style>.output_result { max-width:100% !important; }</style>"))
display(HTML("<style>.prompt { display:none !important; }</style>"))

This is the code I ended up using. It stretches input & output cells to the left and right. Note that the input/output number indication will be gone:

from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))
display(HTML("<style>.output_result { max-width:100% !important; }</style>"))
display(HTML("<style>.prompt { display:none !important; }</style>"))

回答 9

我对@ jvd10的解决方案进行了一些修改。“!important”似乎太强了,以至于显示TOC侧栏时容器不能很好地适应。我将其删除并添加了“最小宽度”以限制最小宽度。

这是我的.jupyter/custom/custom.css:

/* Make the notebook cells take almost all available width and limit minimal width to 1110px */
.container {
    width: 99%;
    min-width: 1110px;
}   

/* Prevent the edit cell highlight box from getting clipped;
 * important so that it also works when cell is in edit mode*/
div.cell.selected {
    border-left-width: 1px;
}

I made some modifications to @jvd10’s solution. The ‘!important’ seems so strong that the container doesn’t adapt well when the TOC sidebar is displayed. I removed it and added ‘min-width’ to limit the minimal width.

Here is my .jupyter/custom/custom.css:

/* Make the notebook cells take almost all available width and limit minimal width to 1110px */
.container {
    width: 99%;
    min-width: 1110px;
}   

/* Prevent the edit cell highlight box from getting clipped;
 * important so that it also works when cell is in edit mode*/
div.cell.selected {
    border-left-width: 1px;
}

回答 10

我尝试了一切,但对我没有用,最终我将数据框显示为HTML,如下所示

from IPython.display import HTML    
HTML(df.to_html())  # df is your pandas DataFrame

I tried everything and nothing worked for me. I ended up displaying my data frame as HTML, as follows:

from IPython.display import HTML
HTML(df.to_html())  # df is your pandas DataFrame

回答 11

对于Firefox / Chrome用户,一种实现100%宽度的好方法是使用自定义TamperMonkey脚本。

好处是

  1. 在浏览器中配置一次,无需修改服务器配置。
  2. 与多个jupyter服务器一起使用。
  3. TamperMonkey受信任,维护且稳定。
  4. 通过javascript可以进行许多其他自定义。

该脚本对我有用https://gist.githubusercontent.com/mrk-andreev/2a9c2538fad0b687c27e192d5948834f/raw/6aa1148573dc20a22fca126e56e3b03f4abf281b/jpn_tmonkey.js

For Firefox/Chrome users, a nice way to achieve 100% width is to use a custom TamperMonkey script.

The benefits are

  1. configure this once in your browser, no need to modify the server configuration.
  2. works with multiple jupyter servers.
  3. TamperMonkey is trusted, maintained, and stable.
  4. Lots of additional customization is possible via javascript.

This script works for me https://gist.githubusercontent.com/mrk-andreev/2a9c2538fad0b687c27e192d5948834f/raw/6aa1148573dc20a22fca126e56e3b03f4abf281b/jpn_tmonkey.js


生成文件的MD5校验和

问题:生成文件的MD5校验和

有没有简单的方法可以在Python中生成(和检查)文件列表的MD5校验和?(我正在处理一个小程序,我想确认文件的校验和)。

Is there any simple way of generating (and checking) MD5 checksums of a list of files in Python? (I have a small program I’m working on, and I’d like to confirm the checksums of the files).


回答 0

您可以使用hashlib.md5()

请注意,有时您将无法在内存中容纳整个文件。在这种情况下,您将必须顺序读取4096个字节的块并将其提供给md5方法:

import hashlib
def md5(fname):
    hash_md5 = hashlib.md5()
    with open(fname, "rb") as f:
        for chunk in iter(lambda: f.read(4096), b""):
            hash_md5.update(chunk)
    return hash_md5.hexdigest()

注意:hash_md5.hexdigest()将返回摘要的十六进制字符串表示形式;如果只需要打包字节,请改用return hash_md5.digest(),这样就不必转换回去。

You can use hashlib.md5()

Note that sometimes you won’t be able to fit the whole file in memory. In that case, you’ll have to read chunks of 4096 bytes sequentially and feed them to the md5 method:

import hashlib
def md5(fname):
    hash_md5 = hashlib.md5()
    with open(fname, "rb") as f:
        for chunk in iter(lambda: f.read(4096), b""):
            hash_md5.update(chunk)
    return hash_md5.hexdigest()

Note: hash_md5.hexdigest() returns the hex string representation of the digest. If you just need the packed bytes, use return hash_md5.digest() instead, so you don’t have to convert back.
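
The question also asks about checking checksums for a list of files. One possible sketch, reusing the chunked-read idea above, compares each file against an expected hex digest; the names `md5_of_file` and `check_files` and the shape of the `expected` mapping are assumptions for illustration.

```python
import hashlib

def md5_of_file(fname, chunk_size=4096):
    """Hex MD5 digest of a file, read in chunks so large files fit in memory."""
    digest = hashlib.md5()
    with open(fname, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def check_files(expected):
    """Map each file name to True/False depending on whether its MD5 matches.

    `expected` maps file name -> expected hex digest (case-insensitive).
    """
    return {fname: md5_of_file(fname) == want.lower() for fname, want in expected.items()}

# Hypothetical usage:
#   check_files({"a.iso": "d41d8cd98f00b204e9800998ecf8427e"})
```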


回答 1

有一种方法使内存效率很低

单个文件:

import hashlib
def file_as_bytes(file):
    with file:
        return file.read()

print(hashlib.md5(file_as_bytes(open(full_path, 'rb'))).hexdigest())

文件列表:

[(fname, hashlib.md5(file_as_bytes(open(fname, 'rb'))).digest()) for fname in fnamelst]

但是,请记住,MD5已知已损坏,并且不应将其用于任何目的,因为漏洞分析可能确实很棘手,并且分析代码可能用于将来的安全性问题是不可能的。恕我直言,应该从库中将其完全删除,以便使用它的每个人都必须进行更新。因此,这是您应该做的:

[(fname, hashlib.sha256(file_as_bytes(open(fname, 'rb'))).digest()) for fname in fnamelst]

如果只需要128位摘要,则可以执行.digest()[:16]

这将为您提供一个元组列表,每个元组都包含其文件名和哈希值。

我再次强烈质疑您对MD5的使用。您至少应该使用SHA1,并且鉴于SHA1中发现的最新缺陷,可能甚至没有。有人认为,只要您不将MD5用于“加密”目的,就可以了。但是,事情的范围最终趋向于超出您最初的预期,并且偶然的漏洞分析可能证明是完全有缺陷的。最好只是养成使用正确算法的习惯。只是输入了不同的字母而已。没那么难。

这是一种更复杂但内存有效的方法

import hashlib

def hash_bytestr_iter(bytesiter, hasher, ashexstr=False):
    for block in bytesiter:
        hasher.update(block)
    return hasher.hexdigest() if ashexstr else hasher.digest()

def file_as_blockiter(afile, blocksize=65536):
    with afile:
        block = afile.read(blocksize)
        while len(block) > 0:
            yield block
            block = afile.read(blocksize)


[(fname, hash_bytestr_iter(file_as_blockiter(open(fname, 'rb')), hashlib.md5()))
    for fname in fnamelst]

再说一次,由于MD5损坏了,不再应该再使用了:

[(fname, hash_bytestr_iter(file_as_blockiter(open(fname, 'rb')), hashlib.sha256()))
    for fname in fnamelst]

同样,如果只需要128位摘要[:16]hash_bytestr_iter(...)则可以在调用之后放置。

There is a way that’s pretty memory inefficient.

single file:

import hashlib
def file_as_bytes(file):
    with file:
        return file.read()

print(hashlib.md5(file_as_bytes(open(full_path, 'rb'))).hexdigest())

list of files:

[(fname, hashlib.md5(file_as_bytes(open(fname, 'rb'))).digest()) for fname in fnamelst]

Recall, though, that MD5 is known to be broken and should not be used for any purpose: vulnerability analysis can be really tricky, and it is impossible to analyze every future use your code might be put to for security issues. IMHO, it should be removed from the library outright so that everybody who uses it is forced to update. So, here’s what you should do instead:

[(fname, hashlib.sha256(file_as_bytes(open(fname, 'rb'))).digest()) for fname in fnamelst]

If you only want 128 bits of digest, you can do .digest()[:16].

This will give you a list of tuples, each tuple containing the name of its file and its hash.
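The truncation mentioned above can be sketched on its own as follows. The helper name sha256_128 and the sample input are hypothetical, introduced only for this illustration:

```python
import hashlib

def sha256_128(data: bytes) -> bytes:
    """Return the first 16 bytes (128 bits) of the SHA-256 digest of data."""
    return hashlib.sha256(data).digest()[:16]

short = sha256_128(b"example")  # hypothetical input
print(len(short), short.hex())
```

The result has the same length as an MD5 digest, so it fits wherever a 128-bit value is expected, but it is not interchangeable with existing MD5 values.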

Again, I strongly question your use of MD5. You should at least be using SHA1, and given the recent flaws discovered in SHA1, probably not even that. Some people think that as long as you’re not using MD5 for ‘cryptographic’ purposes, you’re fine. But things tend to end up broader in scope than you initially expect, and your casual vulnerability analysis may prove completely flawed. It’s best to get in the habit of using the right algorithm from the start. It only means typing a different bunch of letters. It’s not that hard.

Here is a way that is more complex, but memory efficient:

import hashlib

def hash_bytestr_iter(bytesiter, hasher, ashexstr=False):
    for block in bytesiter:
        hasher.update(block)
    return hasher.hexdigest() if ashexstr else hasher.digest()

def file_as_blockiter(afile, blocksize=65536):
    with afile:
        block = afile.read(blocksize)
        while len(block) > 0:
            yield block
            block = afile.read(blocksize)


[(fname, hash_bytestr_iter(file_as_blockiter(open(fname, 'rb')), hashlib.md5()))
    for fname in fnamelst]

And, again, since MD5 is broken and should not really ever be used anymore:

[(fname, hash_bytestr_iter(file_as_blockiter(open(fname, 'rb')), hashlib.sha256()))
    for fname in fnamelst]

Again, you can put [:16] after the call to hash_bytestr_iter(...) if you only want 128 bits worth of digest.
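As a quick sanity check, block-wise hashing should give the same digest as hashing the whole content in one call. The sketch below assumes a throwaway temporary file filled with random bytes. The function name sha256_of_file is hypothetical and is not part of the code above:

```python
import hashlib
import os
import tempfile

def sha256_of_file(path, blocksize=65536):
    """Hash a file in fixed-size blocks so memory use stays bounded."""
    hasher = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            block = f.read(blocksize)
            if not block:  # empty bytes object means end of file
                break
            hasher.update(block)
    return hasher.hexdigest()

# Demo on a temporary file that spans several blocks.
payload = os.urandom(200_000)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
try:
    chunked = sha256_of_file(tmp.name)
    whole = hashlib.sha256(payload).hexdigest()
    print(chunked == whole)
finally:
    os.remove(tmp.name)
```

Only one block is held in memory at a time, which is the point of the more complex version.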


Answer 2

I’m clearly not adding anything fundamentally new, but I added this answer before I had enough reputation to comment, and the code regions make things clearer. Anyway, this is specifically to answer @Nemo’s question from Omnifarious’s answer:

I happened to be thinking about checksums a bit (came here looking for suggestions on block sizes, specifically), and have found that this method may be faster than you’d expect. Taking the fastest (but pretty typical) timeit.timeit or /usr/bin/time result from each of several methods of checksumming a file of approx. 11MB:

$ ./sum_methods.py
crc32_mmap(filename) 0.0241742134094
crc32_read(filename) 0.0219960212708
subprocess.check_output(['cksum', filename]) 0.0553209781647
md5sum_mmap(filename) 0.0286180973053
md5sum_read(filename) 0.0311000347137
subprocess.check_output(['md5sum', filename]) 0.0332629680634
$ time md5sum /tmp/test.data.300k
d3fe3d5d4c2460b5daacc30c6efbc77f  /tmp/test.data.300k

real    0m0.043s
user    0m0.032s
sys     0m0.010s
$ stat -c '%s' /tmp/test.data.300k
11890400

So, looks like both Python and /usr/bin/md5sum take about 30ms for an 11MB file. The relevant md5sum function (md5sum_read in the above listing) is pretty similar to Omnifarious’s:

import hashlib
def md5sum(filename, blocksize=65536):
    hash = hashlib.md5()
    with open(filename, "rb") as f:
        for block in iter(lambda: f.read(blocksize), b""):
            hash.update(block)
    return hash.hexdigest()

Granted, these are from single runs (the mmap ones are always a smidge faster when at least a few dozen runs are made), and mine’s usually got an extra f.read(blocksize) after the buffer is exhausted, but it’s reasonably repeatable and shows that md5sum on the command line is not necessarily faster than a Python implementation…
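The listing mentions md5sum_mmap without showing it. The following is only a guess at what such a function could look like, not the author's actual code. It assumes a non-empty file, since mapping a zero-length file raises ValueError:

```python
import hashlib
import mmap

def md5sum_mmap(filename):
    """MD5 of a file via a read-only memory map (hypothetical reconstruction)."""
    with open(filename, "rb") as f:
        # Length 0 maps the whole file; empty files cannot be mapped.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # hashlib accepts any bytes-like object, including an mmap.
            return hashlib.md5(mm).hexdigest()
```

With this form the operating system pages the data in, so there is no explicit block size to choose.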

EDIT: Sorry for the long delay, I haven’t looked at this in some time, but to answer @EdRandall’s question, I’ll write down an Adler32 implementation. However, I haven’t run the benchmarks for it. It’s basically the same as the CRC32 would have been: instead of the init, update, and digest calls, everything is a zlib.adler32() call:

import zlib
def adler32sum(filename, blocksize=65536):
    checksum = zlib.adler32(b"")  # bytes literal: required on Python 3
    with open(filename, "rb") as f:
        for block in iter(lambda: f.read(blocksize), b""):
            checksum = zlib.adler32(block, checksum)
    return checksum & 0xffffffff

Note that this must start off with the checksum of the empty string, because Adler sums differ when starting from zero versus starting from the sum for empty input, which is 1. CRC can start with 0 instead. The AND-ing makes the result a 32-bit unsigned integer, which ensures it returns the same value across Python versions.
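The starting values described above can be checked directly. This sketch assumes Python 3 (hence the bytes literals), and the sample data is arbitrary:

```python
import zlib

# Checksum of empty input: Adler-32 starts at 1, CRC-32 at 0.
print(zlib.adler32(b""))  # 1
print(zlib.crc32(b""))    # 0

data = b"some example bytes" * 1000  # sample data, an assumption for the demo

# Feeding the data in two pieces, carrying the running value forward,
# matches a single call over the whole input.
running = zlib.adler32(b"")
running = zlib.adler32(data[:5000], running)
running = zlib.adler32(data[5000:], running)
print(running == zlib.adler32(data))
```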


Answer 3

In Python 3.8+ you can do:

import hashlib
with open("your_filename.txt", "rb") as f:
    file_hash = hashlib.md5()
    while chunk := f.read(8192):
        file_hash.update(chunk)

print(file_hash.digest())
print(file_hash.hexdigest())  # to get a printable str instead of bytes

Consider using hashlib.blake2b instead of md5 (just replace md5 with blake2b in the above snippet). It’s cryptographically secure and faster than MD5.
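The substitution suggested above might look like this when wrapped in a function. The name blake2b_file and the chunk_size parameter are hypothetical, and the walrus operator still requires Python 3.8+:

```python
import hashlib

def blake2b_file(path, chunk_size=8192):
    """Hex BLAKE2b digest of a file, read in chunks."""
    file_hash = hashlib.blake2b()
    with open(path, "rb") as f:
        # f.read() returns b"" at end of file, which ends the loop.
        while chunk := f.read(chunk_size):
            file_hash.update(chunk)
    return file_hash.hexdigest()
```

By default blake2b produces a 64-byte digest. If a shorter value is wanted, hashlib.blake2b accepts a digest_size argument, for example hashlib.blake2b(digest_size=16).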


Answer 4

import hashlib
import pathlib

hashlib.md5(pathlib.Path('path/to/file').read_bytes()).hexdigest()

Answer 5

I think relying on the invoke package and the md5sum binary is a bit more convenient than using subprocess or the md5 package:

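import shlex

import invoke

def get_file_hash(path):
    # Quote the path so spaces or shell metacharacters do not break the command.
    command = "md5sum {}".format(shlex.quote(path))
    return invoke.Context().run(command, hide=True).stdout.split(" ")[0]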

This of course assumes you have invoke and md5sum installed.
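For comparison, here is a sketch of the same idea using only the standard library's subprocess module. It still assumes an md5sum binary on the PATH, and the function name get_file_hash_subprocess is hypothetical:

```python
import subprocess

def get_file_hash_subprocess(path):
    """Run md5sum on path and return the hex digest (first output field)."""
    result = subprocess.run(
        ["md5sum", path],     # argument list: no shell, so no quoting needed
        capture_output=True,  # requires Python 3.7+
        text=True,
        check=True,           # raise CalledProcessError on non-zero exit
    )
    return result.stdout.split()[0]
```

Passing the arguments as a list avoids the shell entirely, so file names containing spaces need no special handling.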