Tag archive: python-3.x

Is it possible to “hack” Python’s print function?

Question: Is it possible to “hack” Python’s print function?

Note: This question is for informational purposes only. I am interested to see how deep into Python’s internals it is possible to go with this.

Not long ago, a discussion began under a certain question about whether a string passed to print could be modified during or after the call to print. For example, consider the function:

def print_something():
    print('This cat was scared.')

Now, when print is run, then the output to the terminal should display:

This dog was scared.

Notice the word “cat” has been replaced by the word “dog”. Something somewhere somehow was able to modify those internal buffers to change what was printed. Assume this is done without the original code author’s explicit permission (hence, hacking/hijacking).

This comment from the wise @abarnert, in particular, got me thinking:

There are a couple of ways to do that, but they’re all very ugly, and should never be done. The least ugly way is to probably replace the code object inside the function with one with a different co_consts list. Next is probably reaching into the C API to access the str’s internal buffer. […]

So, it looks like this is actually possible.

Here’s my naive way of approaching this problem:

>>> import inspect
>>> exec(inspect.getsource(print_something).replace('cat', 'dog'))
>>> print_something()
This dog was scared.

Of course, exec is bad, but that doesn’t really answer the question, because it does not actually modify anything during or after the call to print.

How would it be done as @abarnert has explained it?


Answer 0

First, there’s actually a much less hacky way. All we want to do is change what print prints, right?

_print = print
def print(*args, **kw):
    args = (arg.replace('cat', 'dog') if isinstance(arg, str) else arg
            for arg in args)
    _print(*args, **kw)

Or, similarly, you can monkeypatch sys.stdout instead of print.


Also, nothing wrong with the exec … getsource … idea. Well, of course there’s plenty wrong with it, but less than what follows here…


But if you do want to modify the function object’s code constants, we can do that.

If you really want to play around with code objects for real, you should use a library like bytecode (when it’s finished) or byteplay (until then, or for older Python versions) instead of doing it manually. Even for something this trivial, the CodeType initializer is a pain; if you actually need to do stuff like fixing up lnotab, only a lunatic would do that manually.

Also, it goes without saying that not all Python implementations use CPython-style code objects. This code will work in CPython 3.7, and probably all versions back to at least 2.2 with a few minor changes (and not the code-hacking stuff, but things like generator expressions), but it won’t work with any version of IronPython.

import types

def print_function():
    print ("This cat was scared.")

def main():
    # A function object is a wrapper around a code object, with
    # a bit of extra stuff like default values and closure cells.
    # See inspect module docs for more details.
    co = print_function.__code__
    # A code object is a wrapper around a string of bytecode, with a
    # whole bunch of extra stuff, including a list of constants used
    # by that bytecode. Again see inspect module docs. Anyway, inside
    # the bytecode for print_function (which you can read by typing
    # dis.dis(print_function) in your REPL), there's going to be an
    # instruction like LOAD_CONST 1 to load the string literal onto
    # the stack to pass to the print function, and that works by just
    # reading co.co_consts[1]. So, that's what we want to change.
    consts = tuple(c.replace("cat", "dog") if isinstance(c, str) else c
                   for c in co.co_consts)
    # Unfortunately, code objects are immutable, so we have to create
    # a new one, copying over everything except for co_consts, which
    # we'll replace. And the initializer has a zillion parameters.
    # Try help(types.CodeType) at the REPL to see the whole list.
    co = types.CodeType(
        co.co_argcount, co.co_kwonlyargcount, co.co_nlocals,
        co.co_stacksize, co.co_flags, co.co_code,
        consts, co.co_names, co.co_varnames, co.co_filename,
        co.co_name, co.co_firstlineno, co.co_lnotab,
        co.co_freevars, co.co_cellvars)
    print_function.__code__ = co
    print_function()

main()
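The positional constructor call above matches the CodeType signature of CPython 3.7; from 3.8 on the signature gained extra parameters, so that exact call no longer works there. A minimal sketch of the same constant swap for CPython 3.8 and later, using the code object's replace() method (the helper name swap_consts is an assumption for illustration, not part of the original answer):

```python
import contextlib
import io

def print_function():
    print("This cat was scared.")

def swap_consts(func, old, new):
    """Rebind func.__code__ to a copy whose string constants have old -> new."""
    co = func.__code__
    consts = tuple(c.replace(old, new) if isinstance(c, str) else c
                   for c in co.co_consts)
    # code.replace() (Python 3.8+) copies every field except the ones named.
    func.__code__ = co.replace(co_consts=consts)

swap_consts(print_function, "cat", "dog")

# Capture what the patched function prints so it can be inspected.
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    print_function()
output = buf.getvalue()
```

This still builds a new code object rather than mutating the old one; it only spares you from spelling out every constructor argument.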

What could go wrong with hacking up code objects? Mostly just segfaults, RuntimeErrors that eat up the whole stack, more normal RuntimeErrors that can be handled, or garbage values that will probably just raise a TypeError or AttributeError when you try to use them. For example, try creating a code object with just a RETURN_VALUE with nothing on the stack (bytecode b'S\0' for 3.6+, b'S' before), or with an empty tuple for co_consts when there’s a LOAD_CONST 0 in the bytecode, or with varnames decremented by 1 so the highest LOAD_FAST actually loads a freevar/cellvar cell. For some real fun, if you get the lnotab wrong enough, your code will only segfault when run in the debugger.

Using bytecode or byteplay won’t protect you from all of those problems, but they do have some basic sanity checks, and nice helpers that let you do things like insert a chunk of code and let it worry about updating all offsets and labels so you can’t get it wrong, and so on. (Plus, they keep you from having to type in that ridiculous 6-line constructor, and having to debug the silly typos that come from doing so.)


Now on to #2.

I mentioned that code objects are immutable. And of course the consts are a tuple, so we can’t change that directly. And the thing in the const tuple is a string, which we also can’t change directly. That’s why I had to build a new string to build a new tuple to build a new code object.
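A small sketch of those three layers of immutability, as seen from plain Python on CPython (the errors list is only there to record which exception each attempt raises):

```python
def print_function():
    print("This cat was scared.")

co = print_function.__code__
errors = []

# 1. The code object's attributes are read-only.
try:
    co.co_consts = ()
except AttributeError as exc:
    errors.append(type(exc).__name__)

# 2. The constants live in a tuple, which rejects item assignment.
try:
    co.co_consts[0] = None
except TypeError as exc:
    errors.append(type(exc).__name__)

# 3. The string constant itself rejects item assignment too.
text = next(c for c in co.co_consts if isinstance(c, str))
try:
    text[5] = "d"
except TypeError as exc:
    errors.append(type(exc).__name__)
```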

But what if you could change a string directly?

Well, deep enough under the covers, everything is just a pointer to some C data, right? If you’re using CPython, there’s a C API to access the objects, and you can use ctypes to access that API from within Python itself, which is such a terrible idea that they put a pythonapi right there in the stdlib’s ctypes module. :) The most important trick you need to know is that id(x) is the actual pointer to x in memory (as an int).
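To see that id(x) really is usable as a pointer in CPython, here is a minimal sketch that turns the integer back into the object with ctypes. This relies on a CPython implementation detail and is not guaranteed on other implementations:

```python
import ctypes

text = "This cat was scared."
address = id(text)  # CPython only: the object's memory address as an int

# Reinterpret that address as a PyObject* and hand it back to Python.
same_object = ctypes.cast(address, ctypes.py_object).value
```

Doing this with an address that no longer refers to a live object is exactly the kind of thing that segfaults.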

Unfortunately, the C API for strings won’t let us safely get at the internal storage of an already-frozen string. So screw safely, let’s just read the header files and find that storage ourselves.

If you’re using CPython 3.4 – 3.7 (it’s different for older versions, and who knows for the future), a string literal from a module that’s made of pure ASCII is going to be stored using the compact ASCII format, which means the struct ends early and the buffer of ASCII bytes follows immediately in memory. This will break (as in probably segfault) if you put a non-ASCII character in the string, or certain kinds of non-literal strings, but you can read up on the other 4 ways to access the buffer for different kinds of strings.

To make things slightly easier, I’m using the superhackyinternals project off my GitHub. (It’s intentionally not pip-installable because you really shouldn’t be using this except to experiment with your local build of the interpreter and the like.)

import ctypes
import internals # https://github.com/abarnert/superhackyinternals/blob/master/internals.py

def print_function():
    print ("This cat was scared.")

def main():
    for c in print_function.__code__.co_consts:
        if isinstance(c, str):
            idx = c.find('cat')
            if idx != -1:
                # Too much to explain here; just guess and learn to
                # love the segfaults...
                p = internals.PyUnicodeObject.from_address(id(c))
                assert p.compact and p.ascii
                addr = id(c) + internals.PyUnicodeObject.utf8_length.offset
                buf = (ctypes.c_int8 * 3).from_address(addr + idx)
                buf[:3] = b'dog'

    print_function()

main()

If you want to play with this stuff, int is a whole lot simpler under the covers than str. And it’s a lot easier to guess what you can break by changing the value of 2 to 1, right? Actually, forget imagining, let’s just do it (using the types from superhackyinternals again):

>>> n = 2
>>> pn = PyLongObject.from_address(id(n))
>>> pn.ob_digit[0]
2
>>> pn.ob_digit[0] = 1
>>> 2
1
>>> n * 3
3
>>> i = 10
>>> while i < 40:
...     i *= 2
...     print(i)
10
10
10

… pretend that code box has an infinite-length scrollbar.

I tried the same thing in IPython, and the first time I tried to evaluate 2 at the prompt, it went into some kind of uninterruptable infinite loop. Presumably it’s using the number 2 for something in its REPL loop, while the stock interpreter isn’t?


Answer 1

Monkey-patch print

print is a builtin function so it will use the print function defined in the builtins module (or __builtin__ in Python 2). So whenever you want to modify or change the behavior of a builtin function you can simply reassign the name in that module.

This process is called monkey-patching.

# Store the real print function in another variable otherwise
# it will be inaccessible after being modified.
_print = print  

# Actual implementation of the new print
def custom_print(*args, **options):
    _print('custom print called')
    _print(*args, **options)

# Change the print function globally
import builtins
builtins.print = custom_print

After that every print call will go through custom_print, even if the print is in an external module.

However, you don’t really want to print additional text; you want to change the text that is printed. One way to go about that is to replace it in the string that would be printed:

_print = print

def custom_print(*args, **options):
    # Get the desired separator or the default whitespace
    sep = options.pop('sep', ' ')
    # Create the final string (print accepts any object, so convert first)
    printed_string = sep.join(str(arg) for arg in args)
    # Modify the final string
    printed_string = printed_string.replace('cat', 'dog')
    # Call the default print function
    _print(printed_string, **options)

import builtins
builtins.print = custom_print

And indeed if you run:

>>> def print_something():
...     print('This cat was scared.')
>>> print_something()
This dog was scared.

Or if you write that to a file:

test_file.py

def print_something():
    print('This cat was scared.')

print_something()

and import it:

>>> import test_file
This dog was scared.
>>> test_file.print_something()
This dog was scared.

So it really works as intended.

However, in case you only temporarily want to monkey-patch print you could wrap this in a context-manager:

import builtins

class ChangePrint(object):
    def __init__(self):
        self.old_print = print

    def __enter__(self):
        def custom_print(*args, **options):
            # Get the desired separator or the default whitespace
            sep = options.pop('sep', ' ')
            # Create the final string (print accepts any object, so convert first)
            printed_string = sep.join(str(arg) for arg in args)
            # Modify the final string
            printed_string = printed_string.replace('cat', 'dog')
            # Call the default print function
            self.old_print(printed_string, **options)

        builtins.print = custom_print

    def __exit__(self, *args, **kwargs):
        builtins.print = self.old_print

So what gets printed depends on the context:

>>> with ChangePrint() as x:
...     test_file.print_something()
... 
This dog was scared.
>>> test_file.print_something()
This cat was scared.

So that’s how you could “hack” print by monkey-patching.
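For comparison, a sketch of the same temporary patch written as a generator-based context manager with contextlib.contextmanager. The name replaced_print and its old/new parameters are assumptions for illustration; the try/finally is what restores the real print if the body raises:

```python
import builtins
import contextlib
import io

@contextlib.contextmanager
def replaced_print(old, new):
    original = builtins.print

    def custom_print(*args, sep=" ", **options):
        # print treats sep=None like the default single space.
        sep = " " if sep is None else sep
        text = sep.join(str(arg) for arg in args).replace(old, new)
        original(text, **options)

    builtins.print = custom_print
    try:
        yield
    finally:
        # Always put the real print back, even after an exception.
        builtins.print = original

buf = io.StringIO()
with contextlib.redirect_stdout(buf), replaced_print("cat", "dog"):
    print("This cat was scared.")
    print("cat", 9, "lives")
captured = buf.getvalue()
```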

Modify the target instead of the print

If you look at the signature of print you’ll notice a file argument, which is sys.stdout by default. Note that this is a dynamic default argument (print really looks up sys.stdout every time you call it), unlike normal default arguments in Python. So if you change sys.stdout, print will actually print to the new target. Even more conveniently, Python also provides a redirect_stdout function (from Python 3.4 on, but it’s easy to create an equivalent function for earlier Python versions).

The downside is that it won’t work for print calls that don’t print to sys.stdout, and that creating your own stdout isn’t really straightforward.

import io
import sys

class CustomStdout(object):
    def __init__(self, *args, **kwargs):
        self.current_stdout = sys.stdout

    def write(self, string):
        self.current_stdout.write(string.replace('cat', 'dog'))

However this also works:

>>> import contextlib
>>> with contextlib.redirect_stdout(CustomStdout()):
...     test_file.print_something()
... 
This dog was scared.
>>> test_file.print_something()
This cat was scared.
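One reason a hand-rolled stdout is not entirely straightforward is that callers may expect more than write(), for example print(..., flush=True) calls flush(). A hedged sketch of a slightly fuller wrapper that delegates everything else to the wrapped stream (the class name ReplacingStream and its parameters are assumptions, not part of the original answer):

```python
import contextlib
import io

class ReplacingStream:
    def __init__(self, target, old, new):
        self._target = target
        self._old = old
        self._new = new

    def write(self, string):
        return self._target.write(string.replace(self._old, self._new))

    def __getattr__(self, name):
        # Anything not defined here (flush, encoding, isatty, ...) is delegated.
        return getattr(self._target, name)

sink = io.StringIO()
with contextlib.redirect_stdout(ReplacingStream(sink, "cat", "dog")):
    print("This cat was scared.", flush=True)
captured = sink.getvalue()
```

Because the replacement happens per write() call, a word that arrives split across two separate writes would not be replaced.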

Summary

Some of these points have already been mentioned by @abarnert, but I wanted to explore these options in more detail, especially how to modify print across modules (using builtins/__builtin__) and how to make that change only temporary (using context managers).


Answer 2

A simple way to capture all output from a print function and then process it is to change the output stream to something else, e.g. a file.

I’ll use PHP naming conventions (ob_start, ob_get_contents, …)

from functools import partial

output_buffer = None
print_orig = print

def ob_start(fname="print.txt"):
    global print, output_buffer
    # Open the file first so partial() binds the real file object, not None.
    output_buffer = open(fname, 'w')
    print = partial(print_orig, file=output_buffer)

def ob_end():
    global print, output_buffer
    # Closing flushes the buffered text to disk; then restore the real print.
    output_buffer.close()
    print = print_orig

def ob_get_contents(fname="print.txt"):
    with open(fname, 'r') as f:
        return f.read()

Usage:

print ("Hi John")
ob_start()
print ("Hi John")
ob_end()
print (ob_get_contents().replace("Hi", "Bye"))

Would print

Hi John
Bye John
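If the file on disk is not needed, a similar capture can be sketched in memory with io.StringIO and contextlib.redirect_stdout. This is a swapped-in technique rather than the file-based approach above, and capture_output and greet are hypothetical names used only for illustration:

```python
import contextlib
import io

def capture_output(func, *args, **kwargs):
    """Run func and return everything it printed to sys.stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func(*args, **kwargs)
    return buffer.getvalue()

def greet():
    print("Hi John")

captured = capture_output(greet)
result = captured.replace("Hi", "Bye")
```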


Answer 3

Let’s combine this with frame introspection!

import sys

_print = print

def print(*args, **kw):
    frame = sys._getframe(1)
    _print(frame.f_code.co_name)
    _print(*args, **kw)

def greetly(name, greeting = "Hi"):
    print(f"{greeting}, {name}!")

class Greeter:
    def __init__(self, greeting = "Hi"):
        self.greeting = greeting
    def greet(self, name):
        print(f"{self.greeting}, {name}!")

You’ll find this trick prefaces every greeting with the calling function or method. This might be very useful for logging or debugging; especially as it lets you “hijack” print statements in third party code.
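A self-contained sketch of what that looks like when run. The run_demo helper is an assumption added here only to collect the printed lines; sys._getframe is a CPython-specific function:

```python
import builtins
import contextlib
import io
import sys

_print = builtins.print

def print(*args, **kw):
    # Frame 0 is this wrapper; frame 1 is whoever called print.
    caller = sys._getframe(1).f_code.co_name
    _print(caller)
    _print(*args, **kw)

def greetly(name, greeting="Hi"):
    print(f"{greeting}, {name}!")

class Greeter:
    def __init__(self, greeting="Hi"):
        self.greeting = greeting
    def greet(self, name):
        print(f"{self.greeting}, {name}!")

def run_demo():
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        greetly("World")
        Greeter("Hello").greet("World")
    return buf.getvalue().splitlines()

lines = run_demo()
```

Each greeting is preceded by the name of the function or method that called print.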


Times-two faster than bit-shift, for Python 3.x integers?

Question: Times-two faster than bit-shift, for Python 3.x integers?

I was looking at the source of sorted_containers and was surprised to see this line:

self._load, self._twice, self._half = load, load * 2, load >> 1

Here load is an integer. Why use bit shift in one place, and multiplication in another? It seems reasonable that bit shifting may be faster than integer division by 2, but why not replace the multiplication with a shift as well? I benchmarked the following cases:

  1. (times, divide)
  2. (shift, shift)
  3. (times, shift)
  4. (shift, divide)

and found that #3 is consistently faster than other alternatives:

# self._load, self._twice, self._half = load, load * 2, load >> 1

import random
import timeit
import pandas as pd

x = random.randint(10 ** 3, 10 ** 6)

def test_naive():
    a, b, c = x, 2 * x, x // 2

def test_shift():
    a, b, c = x, x << 1, x >> 1    

def test_mixed():
    a, b, c = x, x * 2, x >> 1    

def test_mixed_swapped():
    a, b, c = x, x << 1, x // 2

def observe(k):
    print(k)
    return {
        'naive': timeit.timeit(test_naive),
        'shift': timeit.timeit(test_shift),
        'mixed': timeit.timeit(test_mixed),
        'mixed_swapped': timeit.timeit(test_mixed_swapped),
    }

def get_observations():
    return pd.DataFrame([observe(k) for k in range(100)])

The question:

Is my test valid? If so, why is (multiply, shift) faster than (shift, shift)?

I run Python 3.5 on Ubuntu 14.04.

Edit

Above is the original statement of the question. Dan Getz provides an excellent explanation in his answer.

For the sake of completeness, here are sample illustrations for larger x when multiplication optimizations do not apply.
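A minimal sketch of how such a comparison for larger x might be rerun using only timeit. The helper name compare, the repetition counts, and the choice of 2 ** 70 as a "large" value are assumptions, not taken from the original post, and no timings are claimed here:

```python
import timeit

def compare(x, number=100_000, repeat=5):
    """Time each (double, halve) pairing for a given x; lower is faster."""
    statements = {
        "times, divide": "x * 2; x // 2",
        "shift, shift": "x << 1; x >> 1",
        "times, shift": "x * 2; x >> 1",
        "shift, divide": "x << 1; x // 2",
    }
    return {
        # min() over repeats is the usual way to reduce timing noise.
        label: min(timeit.repeat(stmt, globals={"x": x},
                                 number=number, repeat=repeat))
        for label, stmt in statements.items()
    }

small = compare(12345)     # a small value like those in the question
large = compare(2 ** 70)   # a value well beyond the original range
```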


Answer 0

This seems to be because multiplication of small numbers is optimized in CPython 3.5, in a way that left shifts by small numbers are not. Positive left shifts always create a larger integer object to store the result, as part of the calculation, while for multiplications of the sort you used in your test, a special optimization avoids this and creates an integer object of the correct size. This can be seen in the source code of Python’s integer implementation.

Because integers in Python are arbitrary-precision, they are stored as arrays of integer “digits”, with a limit on the number of bits per integer digit. So in the general case, operations involving integers are not single operations, but instead need to handle the case of multiple “digits”. In pyport.h, this bit limit is defined as 30 bits on 64-bit platforms, or 15 bits otherwise. (I’ll just call this 30 from here on to keep the explanation simple. But note that if you were using Python compiled for 32-bit, your benchmark’s result would depend on whether x is less than 32,768 or not.)
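The digit width of the running interpreter can be checked from Python itself through sys.int_info. A small sketch (digit_count is a hypothetical helper that estimates how many internal digits a nonzero integer needs; it does not inspect the object's memory):

```python
import sys

BITS = sys.int_info.bits_per_digit  # 30 or 15, depending on the build

def digit_count(n):
    """Estimated number of internal digits CPython needs for a nonzero n."""
    # Ceiling division of the bit length by the bits available per digit.
    return max(1, -(-abs(n).bit_length() // BITS))

one_digit = digit_count(2 ** BITS - 1)   # largest single-digit value
two_digits = digit_count(2 ** BITS)      # first value needing a second digit
```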

When an operation’s inputs and outputs stay within this 30-bit limit, the operation can be handled in an optimized way instead of the general way. The beginning of the integer multiplication implementation is as follows:

static PyObject *
long_mul(PyLongObject *a, PyLongObject *b)
{
    PyLongObject *z;

    CHECK_BINOP(a, b);

    /* fast path for single-digit multiplication */
    if (Py_ABS(Py_SIZE(a)) <= 1 && Py_ABS(Py_SIZE(b)) <= 1) {
        stwodigits v = (stwodigits)(MEDIUM_VALUE(a)) * MEDIUM_VALUE(b);
#ifdef HAVE_LONG_LONG
        return PyLong_FromLongLong((PY_LONG_LONG)v);
#else
        /* if we don't have long long then we're almost certainly
           using 15-bit digits, so v will fit in a long.  In the
           unlikely event that we're using 30-bit digits on a platform
           without long long, a large v will just cause us to fall
           through to the general multiplication code below. */
        if (v >= LONG_MIN && v <= LONG_MAX)
            return PyLong_FromLong((long)v);
#endif
    }

So when multiplying two integers where each fits in a 30-bit digit, this is done as a direct multiplication by the CPython interpreter, instead of working with the integers as arrays. (MEDIUM_VALUE() called on a positive integer object simply gets its first 30-bit digit.) If the result fits in a single 30-bit digit, PyLong_FromLongLong() will notice this in a relatively small number of operations, and create a single-digit integer object to store it.

In contrast, left shifts are not optimized this way, and every left shift deals with the integer being shifted as an array. In particular, if you look at the source code for long_lshift(), in the case of a small but positive left shift, a 2-digit integer object is always created, if only to have its length truncated to 1 later: (my comments in /*** ***/)

static PyObject *
long_lshift(PyObject *v, PyObject *w)
{
    /*** ... ***/

    wordshift = shiftby / PyLong_SHIFT;   /*** zero for small w ***/
    remshift  = shiftby - wordshift * PyLong_SHIFT;   /*** w for small w ***/

    oldsize = Py_ABS(Py_SIZE(a));   /*** 1 for small v > 0 ***/
    newsize = oldsize + wordshift;
    if (remshift)
        ++newsize;   /*** here newsize becomes at least 2 for w > 0, v > 0 ***/
    z = _PyLong_New(newsize);

    /*** ... ***/
}
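
A rough way to observe the difference from Python is a timing sketch like the one below. The value of x and the repetition count are arbitrary choices for illustration, and the timings depend heavily on the CPython version and build, so treat the numbers as indicative only.

```python
import timeit

x = 12345  # small enough to fit in a single internal digit

# Both expressions compute the same value; only the internal code path differs.
assert x * 2 == x << 1

# x is looked up at run time, so neither expression is constant-folded.
mul_time = timeit.timeit("x * 2", globals={"x": x}, number=200_000)
shift_time = timeit.timeit("x << 1", globals={"x": x}, number=200_000)

print(f"x * 2 : {mul_time:.4f}s")
print(f"x << 1: {shift_time:.4f}s")
```

Newer interpreter versions may have changed either code path, so the gap may be smaller, absent, or reversed on your machine.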

Integer division

You didn’t ask about the worse performance of integer floor division compared to right shifts, because that fit your (and my) expectations. But dividing a small positive number by another small positive number is not as optimized as small multiplications, either. Every // computes both the quotient and the remainder using the function long_divrem(). This remainder is computed for a small divisor with a multiplication, and is stored in a newly-allocated integer object, which in this situation is immediately discarded.


How does tf.app.run() work?

Question: How does tf.app.run() work?

How does tf.app.run() work in Tensorflow translate demo?

In tensorflow/models/rnn/translate/translate.py, there is a call to tf.app.run(). How is it being handled?

if __name__ == "__main__":
    tf.app.run() 

Answer 0

if __name__ == "__main__":

means the current file is being run as a script rather than imported as a module.

tf.app.run()

As you can see in the file app.py:

def run(main=None, argv=None):
  """Runs the program with an optional 'main' function and 'argv' list."""
  f = flags.FLAGS

  # Extract the args from the optional `argv` list.
  args = argv[1:] if argv else None

  # Parse the known flags from that list, or from the command
  # line otherwise.
  # pylint: disable=protected-access
  flags_passthrough = f._parse_flags(args=args)
  # pylint: enable=protected-access

  main = main or sys.modules['__main__'].main

  # Call the main function, passing through any arguments
  # to the final program.
  sys.exit(main(sys.argv[:1] + flags_passthrough))

Let’s break it down line by line:

flags_passthrough = f._parse_flags(args=args)

This checks that the arguments you pass on the command line are valid, e.g. python my_model.py --data_dir='...' --max_iteration=10000. This feature is actually built on Python’s standard argparse module.

main = main or sys.modules['__main__'].main

The first main on the right side of = is the first argument of the current function, run(main=None, argv=None), while sys.modules['__main__'] refers to the currently running file (e.g. my_model.py).

So there are two cases:

  1. You don’t have a main function in my_model.py. Then you have to call tf.app.run(my_main_running_function).

  2. You have a main function in my_model.py. (This is mostly the case.)

Last line:

sys.exit(main(sys.argv[:1] + flags_passthrough))

calls your main(argv) or my_main_running_function(argv) function with the program name plus any arguments left over after flag parsing, and exits with its return value.
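
The pattern described above can be sketched without TensorFlow. This is a simplified illustration, not the actual app.py: the module-level FLAGS namespace and the --max_iteration flag are hypothetical stand-ins for flags.FLAGS and its registered flags.

```python
import argparse
import sys

# Hypothetical stand-in for flags.FLAGS: a module-level namespace.
FLAGS = argparse.Namespace()


def run(main, argv=None):
    """Parse known flags into FLAGS, then call main with the leftover args."""
    argv = sys.argv if argv is None else argv

    parser = argparse.ArgumentParser()
    # Illustrative flag only; a real program registers its own flags.
    parser.add_argument("--max_iteration", type=int, default=100)

    # Known flags are parsed; anything unrecognised is passed through.
    known, passthrough = parser.parse_known_args(argv[1:])
    FLAGS.__dict__.update(vars(known))

    # Exit with whatever main returns, as the original run() does.
    sys.exit(main(argv[:1] + passthrough))


def main(argv):
    print("max_iteration:", FLAGS.max_iteration, "leftover argv:", argv)
    return 0
```

Calling run(main, ["prog", "--max_iteration=5", "extra"]) would store 5 in FLAGS.max_iteration, call main(["prog", "extra"]), and then raise SystemExit with main's return value.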


Answer 1

It’s just a very quick wrapper that handles flag parsing and then dispatches to your own main. See the code.


Answer 2

There is nothing special in tf.app. This is just a generic entry point script, which

Runs the program with an optional ‘main’ function and ‘argv’ list.

It has nothing to do with neural networks and it just calls the main function, passing through any arguments to it.


Answer 3

In simple terms, the job of tf.app.run() is to first set the global flags for later usage like:

from tensorflow.python.platform import flags
f = flags.FLAGS

and then run your custom main function with a set of arguments.

For example, in the TensorFlow NMT codebase, program execution for training/inference starts at this entry point (see the code below):

if __name__ == "__main__":
  nmt_parser = argparse.ArgumentParser()
  add_arguments(nmt_parser)
  FLAGS, unparsed = nmt_parser.parse_known_args()
  tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)

After parsing the arguments using argparse, with tf.app.run() you run the function “main” which is defined like:

def main(unused_argv):
  default_hparams = create_hparams(FLAGS)
  train_fn = train.train
  inference_fn = inference.inference
  run_main(FLAGS, default_hparams, train_fn, inference_fn)

So, after setting the flags for global use, tf.app.run() simply runs that main function that you pass to it with argv as its parameters.

P.S.: As Salvador Dali’s answer says, I guess it’s just good software engineering practice, although I’m not sure whether TensorFlow runs the main function in any more optimized way than plain CPython would.


Answer 4

Google code depends a lot on global flags being accessed in libraries/binaries/Python scripts, so tf.app.run() parses those flags to create global state in the FLAGS (or similar) variable and then calls the Python main() as it should.

If they didn’t have this call to tf.app.run(), users might forget to do the FLAGS parsing, leaving these libraries/binaries/scripts without access to the FLAGS they need.


Answer 5

2.0-compatible answer: if you want to use tf.app.run() in TensorFlow 2.0, use the command

tf.compat.v1.app.run()

or use tf_upgrade_v2 to convert the 1.x code to 2.0.


Concurrent.futures vs Multiprocessing in Python 3

Question: Concurrent.futures vs Multiprocessing in Python 3

Python 3.2 introduced Concurrent Futures, which appear to be some advanced combination of the older threading and multiprocessing modules.

What are the advantages and disadvantages of using this for CPU bound tasks over the older multiprocessing module?

This article suggests they’re much easier to work with – is that the case?


Answer 0

I wouldn’t call concurrent.futures more “advanced” – it’s a simpler interface that works very much the same regardless of whether you use multiple threads or multiple processes as the underlying parallelization gimmick.

So, like virtually all instances of “simpler interface”, much the same trade-offs are involved: it has a shallower learning curve, in large part just because there’s so much less available to be learned; but, because it offers fewer options, it may eventually frustrate you in ways the richer interfaces won’t.

So far as CPU-bound tasks go, that’s way too under-specified to say much meaningful. For CPU-bound tasks under CPython, you need multiple processes rather than multiple threads to have any chance of getting a speedup. But how much (if any) of a speedup you get depends on the details of your hardware, your OS, and especially on how much inter-process communication your specific tasks require. Under the covers, all inter-process parallelization gimmicks rely on the same OS primitives – the high-level API you use to get at those isn’t a primary factor in bottom-line speed.

Edit: example

Here’s the final code shown in the article you referenced, but I’m adding an import statement needed to make it work:

from concurrent.futures import ProcessPoolExecutor
def pool_factorizer_map(nums, nprocs):
    # Let the executor divide the work among processes by using 'map'.
    with ProcessPoolExecutor(max_workers=nprocs) as executor:
        return {num:factors for num, factors in
                                zip(nums,
                                    executor.map(factorize_naive, nums))}

Here’s exactly the same thing using multiprocessing instead:

import multiprocessing as mp
def mp_factorizer_map(nums, nprocs):
    with mp.Pool(nprocs) as pool:
        return {num:factors for num, factors in
                                zip(nums,
                                    pool.map(factorize_naive, nums))}

Note that the ability to use multiprocessing.Pool objects as context managers was added in Python 3.3.

As for which one is easier to work with, they’re essentially identical.

One difference is that Pool supports so many different ways of doing things that you may not realize how easy it can be until you’ve climbed quite a way up the learning curve.

Again, all those different ways are both a strength and a weakness. They’re a strength because the flexibility may be required in some situations. They’re a weakness because of “preferably only one obvious way to do it”. A project sticking exclusively (if possible) to concurrent.futures will probably be easier to maintain over the long run, due to the lack of gratuitous novelty in how its minimal API can be used.
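
To make the point about the shared interface concrete, here is a self-contained sketch. The factorize_naive below is an assumed trial-division implementation (the article's own version is not shown here), and factorizer_map is a hypothetical helper that takes the executor class as a parameter.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def factorize_naive(n):
    """Return the prime factors of n by trial division (assumed helper)."""
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors


def factorizer_map(nums, executor_cls, workers):
    """Map factorize_naive over nums with any concurrent.futures executor."""
    with executor_cls(max_workers=workers) as executor:
        return dict(zip(nums, executor.map(factorize_naive, nums)))
```

The same function accepts either ThreadPoolExecutor or ProcessPoolExecutor, e.g. factorizer_map([12, 7], ThreadPoolExecutor, 2). For CPU-bound work under CPython you would pass ProcessPoolExecutor, calling it from under an if __name__ == "__main__": guard so that worker processes can import the module safely.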


What does the slash mean in help() output?

Question: What does the slash mean in help() output?

What does the / mean in Python 3.4’s help output for range before the closing parenthesis?

>>> help(range)
Help on class range in module builtins:

class range(object)
 |  range(stop) -> range object
 |  range(start, stop[, step]) -> range object
 |  
 |  Return a virtual sequence of numbers from start to stop by step.
 |  
 |  Methods defined here:
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.

                                        ...

Answer 0

It signifies the end of the positional-only parameters: parameters you cannot pass as keyword arguments. Before Python 3.8, such parameters could only be specified in the C API.

It means the key argument to __contains__ can only be passed in by position (range(5).__contains__(3)), not as a keyword argument (range(5).__contains__(key=3)), something you can do with positional arguments in pure-python functions.
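
A quick check of this behaviour, as a sketch run under CPython (the variable names are only for illustration, and the exact error text may differ between versions):

```python
r = range(5)

# Passing the argument by position works.
found = r.__contains__(3)

# Passing it by keyword is rejected, because `key` is positional-only.
try:
    r.__contains__(key=3)
    keyword_rejected = False
except TypeError as exc:
    keyword_rejected = True
    print("TypeError:", exc)
```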

Also see the Argument Clinic documentation:

To mark all parameters as positional-only in Argument Clinic, add a / on a line by itself after the last parameter, indented the same as the parameter lines.

and the (very recent addition to the) Python FAQ:

A slash in the argument list of a function denotes that the parameters prior to it are positional-only. Positional-only parameters are the ones without an externally-usable name. Upon calling a function that accepts positional-only parameters, arguments are mapped to parameters based solely on their position.

The syntax is now part of the Python language specification, as of version 3.8, see PEP 570 – Python Positional-Only Parameters. Before PEP 570, the syntax was already reserved for possible future inclusion in Python, see PEP 457 – Syntax For Positional-Only Parameters.

Positional-only parameters can lead to cleaner and clearer APIs, make pure-Python implementations of otherwise C-only modules more consistent and easier to maintain, and because positional-only parameters require very little processing, they lead to faster Python code.


Answer 1

I asked this question myself. :) I found out that / was originally proposed by Guido here.

Alternative proposal: how about using ‘/’ ? It’s kind of the opposite of ‘*’ which means “keyword argument”, and ‘/’ is not a new character.

Then his proposal won.

Heh. If that’s true, my ‘/’ proposal wins:

 def foo(pos_only, /, pos_or_kw, *, kw_only): ...

I think the most relevant document covering this is PEP 570, whose recap section is a nice summary.

Recap

The use case will determine which parameters to use in the function definition:

 def f(pos1, pos2, /, pos_or_kwd, *, kwd1, kwd2):

As guidance:

Use positional-only if names do not matter or have no meaning, and there are only a few arguments which will always be passed in the same order. Use keyword-only when names have meaning and the function definition is more understandable by being explicit with names.


If the parameter list ends with /

def foo(p1, p2, /)

this means all of the function’s parameters are positional-only.


Answer 2

A forward slash (/) indicates that all parameters before it are positional-only. The positional-only parameters feature was added in Python 3.8 after PEP 570 was accepted. The notation was initially defined in PEP 457 – Notation For Positional-Only Parameters.

In a function definition, parameters before the forward slash (/) are positional-only, and parameters after the slash can be of any kind the syntax allows. When the function is called, arguments are mapped to positional-only parameters solely by position. Passing positional-only parameters by keyword (name) is invalid.

Let’s take the following example:

def foo(a, b, / , x, y):
   print("positional ", a, b)
   print("positional or keyword", x, y)

In the function definition above, parameters a and b are positional-only, while x and y can be passed either by position or by keyword.

The following function calls are valid:

foo(40, 20, 99, 39)
foo(40, 3.14, "hello", y="world")
foo(1.45, 3.14, x="hello", y="world")

But the following function call is not valid; it raises a TypeError because a and b are passed as keyword arguments instead of positional arguments:

foo(a=1.45, b=3.14, x=1, y=4)

TypeError: foo() got some positional-only arguments passed as keyword arguments: ‘a, b’
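
The valid and invalid calls above can be checked with a short script (Python 3.8 or later). Here foo returns its arguments instead of printing them, purely so the result is easy to inspect; the variable names ok and error_message are illustrative.

```python
def foo(a, b, /, x, y):
    return (a, b, x, y)


# a and b by position, x and y by keyword: accepted.
ok = foo(1.45, 3.14, x="hello", y="world")

# a and b by keyword: rejected with a TypeError.
error_message = None
try:
    foo(a=1.45, b=3.14, x=1, y=4)
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)
```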

Many built-in functions in Python accept positional-only arguments where passing arguments by keyword doesn’t make sense. For example, the built-in function len accepts only one positional-only argument; calling it as len(obj="hello world") would impair readability. Check help(len):

>>> help(len)
Help on built-in function len in module builtins:

len(obj, /)
    Return the number of items in a container.

Positional-only parameters make the underlying C/library functions easier to maintain. They allow the names of positional-only parameters to be changed in the future without the risk of breaking client code that uses the API.

Last but not least, the names of positional-only parameters remain available for use in variable-length keyword arguments. Check the following example:

>>> def f(a, b, /, **kwargs):
...     print(a, b, kwargs)
...
>>> f(10, 20, a=1, b=2, c=3)         # a and b are used in two ways
10 20 {'a': 1, 'b': 2, 'c': 3}

Positional-only parameters are explained in more detail in Types of function arguments in python: Positional Only Parameters.

Positional-only parameter syntax was officially added in Python 3.8. Check out what’s new in Python 3.8 – positional only arguments.

PEP Related: PEP 570 — Python Positional-Only Parameters


TypeError: module.__init__() takes at most 2 arguments (3 given)

Question: TypeError: module.__init__() takes at most 2 arguments (3 given)

I have defined a class in a file named Object.py. When I try to inherit from this class in another file, calling the constructor throws an exception:

TypeError: module.__init__() takes at most 2 arguments (3 given)

This is my code:

import Object

class Visitor(Object):
    pass

instance = Visitor()  # this line throws the exception

What am I doing wrong?


Answer 0

Your error is happening because Object is a module, not a class. So your inheritance is screwy.

Change your import statement to:

from Object import ClassName

and your class definition to:

class Visitor(ClassName):

or

change your class definition to:

class Visitor(Object.ClassName):
   etc

Answer 1

Even after @Mickey Perlstein’s answer and his 3 hours of detective work, it still took me a few more minutes to apply this to my own mess. In case anyone else is like me and needs a little more help, here’s what was going on in my situation.

  • responses is a module
  • Response is a base class within the responses module
  • GeoJsonResponse is a new class derived from Response

Initial GeoJsonResponse class:

from pyexample.responses import Response

class GeoJsonResponse(Response):

    def __init__(self, geo_json_data):

Looks fine. No problems until you try to debug the thing, which is when you get a bunch of seemingly vague error messages like this:

from pyexample.responses import GeoJsonResponse ..\pyexample\responses\GeoJsonResponse.py:12: in (module) class GeoJsonResponse(Response):

E TypeError: module() takes at most 2 arguments (3 given)

=================================== ERRORS ====================================

___________________ ERROR collecting tests/test_geojson.py ____________________

test_geojson.py:2: in (module) from pyexample.responses import GeoJsonResponse ..\pyexample\responses \GeoJsonResponse.py:12: in (module)

class GeoJsonResponse(Response): E TypeError: module() takes at most 2 arguments (3 given)

ERROR: not found: \PyExample\tests\test_geojson.py::TestGeoJson::test_api_response

C:\Python37\lib\site-packages\aenum__init__.py:163

(no name ‘PyExample\ tests\test_geojson.py::TestGeoJson::test_api_response’ in any of [])

The errors were doing their best to point me in the right direction, and @Mickey Perlstein’s answer was dead on, it just took me a minute to put it all together in my own context:

I was importing the module:

from pyexample.responses import Response

when I should have been importing the class:

from pyexample.responses.Response import Response

Hope this helps someone. (In my defense, it’s still pretty early.)


Answer 2

You may also do the following in Python 3.6.1

from Object import Object as Parent

and change your class definition to:

class Visitor(Parent):

Answer 3

from Object import Object

or

from Class_Name import Class_name

If Object is a .py file.


Answer 4

In my case, the problem was that I was referring to a module when I tried to extend the class.

import logging
class UserdefinedLogging(logging):

If you look at the documentation, you’ll see “logging” listed as a module.

In this specific case, I simply had to inherit from a class in the logging module, not from the module itself, to create an extra class for the logging.


Why does `if None.__eq__("a")` seem to evaluate to True (but not quite)?

Question: Why does `if None.__eq__("a")` seem to evaluate to True (but not quite)?

If you execute the following statement in Python 3.7, it will (from my testing) print b:

if None.__eq__("a"):
    print("b")

However, None.__eq__("a") evaluates to NotImplemented.

Naturally, "a".__eq__("a") evaluates to True, and "b".__eq__("a") evaluates to False.

I initially discovered this when testing the return value of a function, but didn’t return anything in the second case — so, the function returned None.

What’s going on here?


Answer 0

This is a great example of why the __dunder__ methods should not be used directly as they are quite often not appropriate replacements for their equivalent operators; you should use the == operator instead for equality comparisons, or in this special case, when checking for None, use is (skip to the bottom of the answer for more information).

You’ve done

None.__eq__('a')
# NotImplemented

Which returns NotImplemented since the types being compared are different. Consider another example where two objects with different types are being compared in this fashion, such as 1 and 'a'. Doing (1).__eq__('a') is also not correct, and will return NotImplemented. The right way to compare these two values for equality would be

1 == 'a'
# False

What happens here is

  1. First, (1).__eq__('a') is tried, which returns NotImplemented. This indicates that the operation is not supported, so
  2. 'a'.__eq__(1) is called, which also returns the same NotImplemented. So,
  3. The objects are treated as if they are not the same, and False is returned.

Here’s a nice little MCVE using some custom classes to illustrate how this happens:

class A:
    def __eq__(self, other):
        print('A.__eq__')
        return NotImplemented

class B:
    def __eq__(self, other):
        print('B.__eq__')
        return NotImplemented

class C:
    def __eq__(self, other):
        print('C.__eq__')
        return True

a = A()
b = B()
c = C()

print(a == b)
# A.__eq__
# B.__eq__
# False

print(a == c)
# A.__eq__
# C.__eq__
# True

print(c == a)
# C.__eq__
# True

Of course, that doesn’t explain why the operation returns true. This is because NotImplemented is actually a truthy value:

bool(None.__eq__("a"))
# True

Same as,

bool(NotImplemented)
# True

For more information on what values are considered truthy and falsy, see the docs section on Truth Value Testing, as well as this answer. It is worth noting here that NotImplemented is truthy, but it would have been a different story had the class defined a __bool__ or __len__ method that returned False or 0 respectively.


If you want the functional equivalent of the == operator, use operator.eq:

import operator
operator.eq(1, 'a')
# False

However, as mentioned earlier, for this specific scenario, where you are checking for None, use is:

var = 'a'
var is None
# False

var2 = None
var2 is None
# True

The functional equivalent of this is using operator.is_:

operator.is_(var2, None)
# True

None is a special object, and only 1 version exists in memory at any point of time. IOW, it is the sole singleton of the NoneType class (but the same object may have any number of references). The PEP8 guidelines make this explicit:

Comparisons to singletons like None should always be done with is or is not, never the equality operators.

In summary, for singletons like None, a reference check with is is more appropriate, although both == and is will work just fine.


Answer 1

The result you are seeing is caused by the fact that

None.__eq__("a") # evaluates to NotImplemented

evaluates to NotImplemented, and NotImplemented's truth value is documented to be True:

https://docs.python.org/3/library/constants.html

Special value which should be returned by the binary special methods (e.g. __eq__(), __lt__(), __add__(), __rsub__(), etc.) to indicate that the operation is not implemented with respect to the other type; may be returned by the in-place binary special methods (e.g. __imul__(), __iand__(), etc.) for the same purpose. Its truth value is true.

If you call the __eq__() method manually rather than just using ==, you need to be prepared for the possibility that it returns NotImplemented, whose truth value is true.
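
If you really do need to call __eq__() by hand, one way to handle that possibility is to test for NotImplemented with is before using the result. The helper below, safe_eq, is a hypothetical name and only a simplified sketch of what == does: it ignores the rule that gives a subclass's reflected method priority.

```python
def safe_eq(a, b):
    """Simplified sketch of ==: try a's __eq__, then b's, then identity."""
    result = type(a).__eq__(a, b)
    if result is NotImplemented:
        # The left operand does not know how to compare; ask the right one.
        result = type(b).__eq__(b, a)
    if result is NotImplemented:
        # Neither side supports the comparison; fall back to identity.
        return a is b
    return bool(result)


print(safe_eq(None, "a"))
print(safe_eq(1, 1.0))
```

In ordinary code, plain == (or is for None) remains the right tool; the point is only that a raw __eq__() result must be compared against NotImplemented before it is used as a condition.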


Answer 2

As you already figured out, None.__eq__("a") evaluates to NotImplemented. However, if you try something like

if NotImplemented:
    print("Yes")
else:
    print("No")

the result is

Yes

This means that the truth value of NotImplemented is true.

Therefore the outcome in the question is obvious:

None.__eq__(something) yields NotImplemented,

and bool(NotImplemented) evaluates to True,

so if None.__eq__("a") always takes the true branch.


Answer 3

Why?

It returns NotImplemented:

>>> None.__eq__('a')
NotImplemented
>>> 

But if you look at this:

>>> bool(NotImplemented)
True
>>> 

NotImplemented is actually a truthy value, which is why 'b' is printed: anything truthy passes the if test, anything falsy does not.

How to solve it?

You have to check explicitly whether the result is True, so be stricter:

>>> NotImplemented == True
False
>>> 

So you would do:

>>> if None.__eq__('a') == True:
    print('b')


>>> 

As you can see, nothing is printed.


Generate a random letter in Python

Question: Generate a random letter in Python

Is there a way to generate random letters in Python (like random.randint but for letters)? The range functionality of random.randint would be nice, but having a generator that just outputs a random letter would be better than nothing.


Answer 0

Simple:

>>> import string
>>> string.ascii_letters
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
>>> import random
>>> random.choice(string.ascii_letters)
'j'

string.ascii_letters is a string constant containing the lowercase and uppercase ASCII letters. Unlike Python 2's string.letters, it does not depend on the current locale.

random.choice returns a single, random element from a sequence.


Answer 1

>>> import random
>>> import string
>>> random.choice(string.ascii_letters)
'g'

Answer 2

>>> import random
>>> import string
>>> def random_char(y):
...     return ''.join(random.choice(string.ascii_letters) for x in range(y))
...
>>> print(random_char(5))
fxkea

This generates y random characters.


Answer 3

>>> import random
>>> import string
>>> random.choice(string.ascii_lowercase)
'b'

Answer 4

Another way, for completeness:

>>> chr(random.randrange(97, 97 + 26))

This uses the fact that ASCII 'a' is 97 and that there are 26 letters in the alphabet.

When determining the upper and lower bound of the random.randrange() function call, remember that random.randrange() is exclusive on its upper bound, meaning it will only ever generate an integer up to 1 unit less than the provided value.
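The question also asked for something like randint's range functionality. A minimal sketch along those lines, building on the randrange idea above, could look like this. The function name random_letter and its parameters are made up for this example.

```python
import random


def random_letter(first="a", last="z"):
    """Return one random character between first and last, both inclusive."""
    # randrange excludes its upper bound, hence the + 1.
    return chr(random.randrange(ord(first), ord(last) + 1))


# Draw many letters from a narrow range to see that the bounds hold.
samples = [random_letter("a", "e") for _ in range(1000)]
single = random_letter("k", "k")
```

Because the bounds are converted with ord(), the same function also works for uppercase ranges such as random_letter("A", "F").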


Answer 5

You can use this to get one or more random letters:

import random
import string
random.seed(10)
letters = string.ascii_lowercase
rand_letters = random.choices(letters,k=5) # where k is the number of required rand_letters

print(rand_letters)

['o', 'l', 'p', 'f', 'v']

Answer 6

You can just make a list:

import random
list1=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
b=random.randint(0,7)
print(list1[b])

Answer 7

import random

def randchar(a, b):
    return chr(random.randint(ord(a), ord(b)))

Answer 8

import random
def guess_letter():
    return random.choice('abcdefghijklmnopqrstuvwxyz')

Answer 9

import random
def Random_Alpha():
    l = ['A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z']
    return l[random.randint(0,25)]

print(Random_Alpha())

Answer 10

You can use

import numpy as np
list(map(chr, np.random.randint(low=65, high=91, size=4)))  # high is exclusive, so 91 is needed to include 'Z' (90)

Answer 11

import string
import random

KEY_LEN = 20

def base_str():
    # string.letters only exists in Python 2; ascii_letters is the Python 3 name
    return (string.ascii_letters+string.digits)
def key_gen():
    keylist = [random.choice(base_str()) for i in range(KEY_LEN)]
    return ("".join(keylist))

You can get random strings like this:

g9CtUljUWD9wtk1z07iF
ndPbI1DDn6UvHSQoDMtd
klMFY3pTYNVWsNJ6cs34
Qgr7OEalfhXllcFDGh2l

Answer 12

import string
from random import choice

def create_key(key_len):
    key = ''
    # string.letters only exists in Python 2; ascii_letters is the Python 3 name
    valid_characters_list = string.ascii_letters + string.digits
    for i in range(key_len):
        character = choice(valid_characters_list)
        key = key + character
    return key

def create_key_list(key_num, key_len):
    # key_len was undefined in the original; here it is passed in explicitly
    keys = []
    for i in range(key_num):
        key = create_key(key_len)
        if key not in keys:
            keys.append(key)
    return keys

Answer 13

All previous answers are correct. If you are looking for random characters of various types (i.e. alphanumeric and special characters), here is a script that I created to demonstrate various ways of generating random characters. It has three functions: one for letters, one for numbers and one for special characters. The script simply generates passwords and is just an example.

import string
import random
import sys

#make sure it's 3.7 or above
print(sys.version)

def create_str(str_length):
    return random.sample(string.ascii_letters, str_length)

def create_num(num_length):
    digits = []
    for i in range(num_length):
        digits.append(str(random.randint(1, 100)))

    return digits

def create_special_chars(special_length):
    stringSpecial = []
    for i in range(special_length):
        stringSpecial.append(random.choice('!$%&()*+,-.:;<=>?@[]^_`{|}~'))

    return stringSpecial

print("how many characters would you like to use ? (DO NOT USE LESS THAN 8)")
str_cnt = input()
print("how many digits would you like to use ? (DO NOT USE LESS THAN 2)")
num_cnt = input()
print("how many special characters would you like to use ? (DO NOT USE LESS THAN 1)")
s_chars_cnt = input()
password_values = create_str(int(str_cnt)) +create_num(int(num_cnt)) + create_special_chars(int(s_chars_cnt))

#shuffle/mix the values
random.shuffle(password_values)

print("generated password is: ")
print(''.join(password_values))

Result:


Answer 14

Well, this is my answer! It works well. Just put the number of random letters you want in the while condition (20 here). (Python 3)

import random

def key_gen():
    keylist = random.choice('abcdefghijklmnopqrstuvwxyz')
    return keylist

number = 0
list_item = ''
while number < 20:
    number = number + 1
    list_item = list_item + key_gen()

print(list_item)

Answer 15

import string
import random

def random_char(y):
    return ''.join(random.choice(string.ascii_letters+string.digits+li) for x in range(y))
no=int(input("Enter the number of character for your password=  "))
li = random.choice('!@#$%^*&( )_+}{')
print(random_char(no)+li)

Answer 16

My overly complicated piece of code:

import random

letter = (random.randint(1,26))
if letter == 1:
   print ('a')
elif letter == 2:
    print ('b')
elif letter == 3:
    print ('c')
elif letter == 4:
    print ('d')
elif letter == 5:
    print ('e')
elif letter == 6:
    print ('f')
elif letter == 7:
    print ('g')
elif letter == 8:
    print ('h')
elif letter == 9:
    print ('i')
elif letter == 10:
    print ('j')
elif letter == 11:
    print ('k')
elif letter == 12:
    print ('l')
elif letter == 13:
    print ('m')
elif letter == 14:
    print ('n')
elif letter == 15:
    print ('o')
elif letter == 16:
    print ('p')
elif letter == 17:
    print ('q')
elif letter == 18:
    print ('r')
elif letter == 19:
    print ('s')
elif letter == 20:
    print ('t')
elif letter == 21:
    print ('u')
elif letter == 22:
    print ('v')
elif letter == 23:
    print ('w')
elif letter == 24:
    print ('x')
elif letter == 25:
    print ('y')
elif letter == 26:
    print ('z')

It basically generates a random number from 1 to 26 and then converts it into the corresponding letter. This could definitely be improved, but I am only a beginner and I am proud of this piece of code.


Answer 17

Maybe this can help you:

import random
e = ''  # the original never initialised e
for a in range(64,90):
    h = random.randint(64, a)
    e += chr(h)
print(e)

Answer 18

Place a python on the keyboard and let him roll over the letters until you find your preferred random combo. Just kidding!

import string  # This was a design above but failed to print. I remodeled it.
import random
irandom = random.choice(string.ascii_letters)
print(irandom)

How to delete the last item in a list?

Question: How to delete the last item in a list?

I have this program that calculates the time taken to answer a specific question, and quits out of the while loop when the answer is incorrect. I want to delete the last calculation so that I can call min() without it returning the time of the wrong answer. Sorry if this is confusing.

from time import time

q = input('What do you want to type? ')
a = ' '
record = []
while a != '':
    start = time()
    a = input('Type: ')
    end = time()
    v = end-start
    record.append(v)
    if a == q:
        print('Time taken to type name: {:.2f}'.format(v))
    else:
        break
for i in record:
    print('{:.2f} seconds.'.format(i))

Answer 0

If I understood the question correctly, you can use the slicing notation to keep everything except the last item:

record = record[:-1]

But a better way is to delete the item directly:

del record[-1]

Note 1: Using record = record[:-1] does not really remove the last element; it assigns a new, shorter list to record. This makes a difference if you run it inside a function and record is a parameter. With record = record[:-1] the original list (outside the function) is unchanged, whereas with del record[-1] or record.pop() the list itself is changed. (as stated by @pltrdy in the comments)

Note 2: The code could use some Python idioms. I highly recommend reading this:
Code Like a Pythonista: Idiomatic Python (via wayback machine archive).
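The difference described in Note 1 can be sketched as follows. The two helper functions and the sample timings are made up for illustration.

```python
def drop_last_by_slice(record):
    # Rebinds the local name to a new, shorter list;
    # the list object the caller passed in is not touched.
    record = record[:-1]
    return record


def drop_last_in_place(record):
    # Mutates the very list object the caller passed in.
    del record[-1]


times = [1.2, 0.9, 3.4]

shorter = drop_last_by_slice(times)
after_slice = list(times)   # snapshot: still three items

drop_last_in_place(times)
after_del = list(times)     # snapshot: now two items
```

At module level, as in the question's script, both forms leave record one item shorter; the distinction only matters when other names still refer to the original list.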


Answer 1

You should use this:

del record[-1]

The problem with

record = record[:-1]

is that it makes a copy of the list every time you remove an item, so it isn't very efficient.


Answer 2

list.pop() removes and returns the last element of the list.
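Applied to the question's record list, a small sketch might look like this. The timings are invented sample values, and the emptiness guard is an addition because pop() raises IndexError on an empty list.

```python
# Hypothetical timings; the last one belongs to the wrong answer.
record = [2.31, 1.87, 9.50]

# pop() removes the last item in place and hands it back.
wrong_attempt = record.pop() if record else None

# min() on an empty list raises ValueError, so guard that as well.
best = min(record) if record else None
```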


Answer 3

You need:

record = record[:-1]

before the for loop.

This will set record to the current record list but without the last item. You may, depending on your needs, want to ensure the list isn’t empty before doing this.


Answer 4

If you do a lot with timing, I can recommend this little (20 line) context manager.

Your code could then look like this:

#!/usr/bin/env python
# coding: utf-8

from timer import Timer

if __name__ == '__main__':
    a, record = None, []
    while not a == '':
        with Timer() as t: # everything in the block will be timed
            a = input('Type: ')
        record.append(t.elapsed_s)
    # drop the last item (makes a copy of the list):
    record = record[:-1] 
    # or just delete it:
    # del record[-1]

Just for reference, here’s the content of the Timer context manager in full:

from timeit import default_timer

class Timer(object):
    """ A timer as a context manager. """

    def __init__(self):
        self.timer = default_timer
        # measures wall clock time, not CPU time!
        # On Python 3.3+ default_timer is time.perf_counter on every platform;
        # older versions used time.time on Unix and time.clock on Windows

    def __enter__(self):
        self.start = self.timer() # measure start time
        return self

    def __exit__(self, exc_type, exc_value, exc_traceback):
        self.end = self.timer() # measure end time
        self.elapsed_s = self.end - self.start # elapsed time, in seconds
        self.elapsed_ms = self.elapsed_s * 1000  # elapsed time, in milliseconds

Answer 5

Simply use list.pop(). If you want to remove from the other end, use list.pop(0); popleft() exists only on collections.deque, not on list.


Answer 6

If you have a list of lists (tracked_output_sheet in my case) and want to delete the last element from each list, you can use the following code:

interim = []
for x in tracked_output_sheet:interim.append(x[:-1])
tracked_output_sheet= interim
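The same idea can be written more compactly. This is only a sketch with invented sample data; it shows a copying variant and an in-place variant side by side.

```python
# Hypothetical sample data standing in for tracked_output_sheet.
tracked_output_sheet = [[1, 2, 3], ["a", "b"], [True]]

# Copying variant: a new outer list of new, shorter inner lists.
trimmed = [row[:-1] for row in tracked_output_sheet]

# In-place variant: shrink each inner list where it lives.
for row in tracked_output_sheet:
    if row:  # del row[-1] would raise IndexError on an empty row
        del row[-1]
```

The copying variant is the safer choice when other code still holds references to the inner lists and expects them unchanged.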

AttributeError: 'module' object has no attribute 'urlopen'

Question: AttributeError: 'module' object has no attribute 'urlopen'

I'm trying to use Python to download the HTML source code of a website but I'm receiving this error.

Traceback (most recent call last):  
    File "C:\Users\Sergio.Tapia\Documents\NetBeansProjects\DICParser\src\WebDownload.py", line 3, in <module>
     file = urllib.urlopen("http://www.python.org")
AttributeError: 'module' object has no attribute 'urlopen'

I’m following the guide here: http://www.boddie.org.uk/python/HTML.html

import urllib

file = urllib.urlopen("http://www.python.org")
s = file.read()
f.close()

#I'm guessing this would output the html source code?
print(s)

I’m using Python 3.


Answer 0

This works in Python 2.x.

For Python 3, look in the docs:

import urllib.request

with urllib.request.urlopen("http://www.python.org") as url:
    s = url.read()
    # I'm guessing this would output the html source code ?
    print(s)
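Note that in Python 3, read() returns bytes rather than str, so printing it shows a b'...' literal. A minimal sketch of decoding the response follows. It uses a data: URL purely so that it runs without network access; with a real site you would pass its http(s) URL instead, and the fallback to UTF-8 is an assumption of this example.

```python
import urllib.request

# A data: URL keeps the sketch offline; it stands in for a real web address.
with urllib.request.urlopen("data:text/plain;charset=utf-8,hello%20world") as response:
    raw = response.read()  # bytes, not str
    # Use the charset declared by the response, falling back to UTF-8.
    charset = response.headers.get_content_charset() or "utf-8"

html = raw.decode(charset)  # now an ordinary str
```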

Answer 1

A Python 2+3 compatible solution is:

import sys

if sys.version_info[0] == 3:
    from urllib.request import urlopen
else:
    # Not Python 3 - today, it is most likely to be Python 2
    # But note that this might need an update when Python 4
    # might be around one day
    from urllib import urlopen


# Your code where you can use urlopen
with urlopen("http://www.python.org") as url:
    s = url.read()

print(s)

Answer 2

import urllib.request as ur
s = ur.urlopen("http://www.google.com")
sl = s.read()
print(sl)

In Python 3, urllib.request is a submodule of its own, so importing just urllib does not make urlopen available.


Answer 3

To get 'dataX = urllib.urlopen(url).read()' working in Python 3 (this would have been correct for Python 2), you just need to change two little things.

1: The urllib statement itself (add the .request in the middle):

dataX = urllib.request.urlopen(url).read()

2: The import statement preceding it (change from 'import urllib' to):

import urllib.request

And it should work in python3 :)


Answer 4

import urllib.request as ur

filehandler = ur.urlopen ('http://www.google.com')
for line in filehandler:
    print(line.strip())

Answer 5

For Python 3, try something like this:

import urllib.request
urllib.request.urlretrieve('http://crcv.ucf.edu/THUMOS14/UCF101/UCF101/v_YoYo_g19_c02.avi', "video_name.avi")

It will download the video to the current working directory

I got help from HERE


Answer 6

Solution for Python 3:

from urllib.request import urlopen

url = 'http://www.python.org'
file = urlopen(url)
html = file.read()
print(html)

Answer 7

Change two lines:

import urllib.request #line1

#Replace
urllib.urlopen("http://www.python.org")
#To
urllib.request.urlopen("http://www.python.org") #line2

If you get an HTTP Error 403: Forbidden exception, try this:

siteurl = "http://www.python.org"

req = urllib.request.Request(siteurl, headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.100 Safari/537.36'})
pageHTML = urllib.request.urlopen(req).read()
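The Request object can be built and inspected without sending anything, which is handy for checking which headers will go out. This is a small sketch; the build_request helper and its default User-Agent string are made up for illustration and are not recommended values.

```python
import urllib.request


def build_request(url, user_agent="Mozilla/5.0 (compatible; example-fetcher)"):
    # The default user_agent is a placeholder chosen for this example.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = build_request("http://www.python.org")

# Request normalises header names with str.capitalize(), hence "User-agent".
sent_agent = req.get_header("User-agent")
```

Passing req to urllib.request.urlopen() then performs the actual download, as in the snippet above.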

I hope this resolves your problem.


Answer 8

One possible way to do it:

import urllib
...

try:
    # Python 2
    from urllib2 import urlopen
except ImportError:
    # Python 3
    from urllib.request import urlopen

Answer 9

Use the six module to make your code compatible between Python 2 and Python 3:

from six.moves import urllib
urllib.request.urlopen("<your-url>")

Answer 10

Your code was written for Python 2.x. In Python 3 you can use it like this:

from urllib.request import urlopen
urlopen(url)

By the way, I suggest another module called requests, which is friendlier to use. You can install it with pip and use it like this:

import requests
requests.get(url)
requests.post(url)

I think it is easy to use. I am a beginner too... haha


Answer 11

import urllib
import urllib.request
from bs4 import BeautifulSoup


with urllib.request.urlopen("http://www.newegg.com/") as url:
    s = url.read()
    print(s)
soup = BeautifulSoup(s, "html.parser")
all_tag_a = soup.find_all("a", limit=10)

for links in all_tag_a:
    #print(links.get('href'))
    print(links)