Tag archive: Python

How big can a Python list get?

Question: How big can a Python list get?

In Python, how big can a list get? I need a list of about 12000 elements. Will I still be able to run list methods such as sorting, etc?


Answer 0

According to the source code, the maximum size of a list is PY_SSIZE_T_MAX/sizeof(PyObject*).

PY_SSIZE_T_MAX is defined in pyport.h to be ((size_t) -1)>>1

On a regular 32-bit system, this is (4294967295 >> 1) / 4, or 536870911 with integer division.

Therefore the maximum size of a Python list on a 32-bit system is 536,870,911 elements.

As long as the number of elements you have is equal or below this, all list functions should operate correctly.
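
The arithmetic above can be reproduced from Python itself. This is a minimal sketch, assuming CPython, where each list slot holds one pointer; `struct.calcsize("P")` reports the pointer size, and the helper name `max_list_len` is made up for illustration.

```python
import struct
import sys


def max_list_len(ssize_t_max, pointer_size):
    """Theoretical element limit: PY_SSIZE_T_MAX // sizeof(PyObject*)."""
    return ssize_t_max // pointer_size


# Limit for the interpreter that runs this code.
print(max_list_len(sys.maxsize, struct.calcsize("P")))

# The 32-bit case from the answer: PY_SSIZE_T_MAX is (2**32 - 1) >> 1
# and a pointer takes 4 bytes.
print(max_list_len((2**32 - 1) >> 1, 4))
```

In practice, available memory is usually exhausted long before this theoretical limit is reached.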


Answer 1

As the Python documentation says:

sys.maxsize

The largest positive integer supported by the platform’s Py_ssize_t type, and thus the maximum size lists, strings, dicts, and many other containers can have.

On my computer (Linux x86_64):

>>> import sys
>>> print(sys.maxsize)
9223372036854775807

Answer 2

Sure, it is OK. You can easily see for yourself:

l = list(range(12000))
l = sorted(l, reverse=True)

Running those lines on my machine took:

real    0m0.036s
user    0m0.024s
sys     0m0.004s

But, as everyone else said, the larger the list, the slower the operations will be.
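
To see how the cost grows with list size, the sort can also be timed from inside Python. A minimal sketch: the helper name `time_reverse_sort` and the two sizes are chosen for illustration only, and the measured times will differ from machine to machine.

```python
import time


def time_reverse_sort(n):
    """Return (sorted_list, elapsed_seconds) for sorting n integers in descending order."""
    data = list(range(n))
    start = time.perf_counter()
    result = sorted(data, reverse=True)
    elapsed = time.perf_counter() - start
    return result, elapsed


for n in (12_000, 1_200_000):
    result, elapsed = time_reverse_sort(n)
    print(f"{n:>9} elements: {elapsed:.4f} s")
```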


Answer 3

In casual code I’ve created lists with millions of elements. I believe that Python’s implementation of lists is bound only by the amount of memory on your system.

In addition, the list methods / functions should continue to work despite the size of the list.

If you care about performance, it might be worthwhile to look into a library such as NumPy.


Answer 4

Performance characteristics for lists are described on Effbot.

Python lists are implemented as a vector (a contiguous array of pointers) for fast random access, so the container will basically hold as many items as there is space for in memory. (You need space for the pointers contained in the list as well as space in memory for the objects being pointed to.)

Appending is O(1) (amortized constant complexity); however, inserting into or deleting from the middle of the sequence requires an O(n) (linear complexity) shift of the following elements, which gets slower as the number of elements in your list grows.

Your sorting question is more nuanced, since the comparison operation can take an unbounded amount of time. If you’re performing really slow comparisons, it will take a long time, though it’s no fault of Python’s list data type.

Reversal just takes the time required to swap all the pointers in the list (necessarily O(n), linear complexity, since you touch each pointer once).
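
The difference between appending and inserting can be observed with a small benchmark. This is a rough sketch: the function names are invented for illustration, and the absolute timings depend on the machine; only the trend matters.

```python
import timeit


def build_by_append(n):
    items = []
    for i in range(n):
        items.append(i)       # amortized O(1) per call
    return items


def build_by_front_insert(n):
    items = []
    for i in range(n):
        items.insert(0, i)    # O(len(items)) per call: every pointer shifts right
    return items


for n in (1_000, 10_000):
    t_append = timeit.timeit(lambda: build_by_append(n), number=5)
    t_insert = timeit.timeit(lambda: build_by_front_insert(n), number=5)
    print(f"n={n}: append {t_append:.4f} s, insert(0) {t_insert:.4f} s")
```

Growing n by a factor of ten should grow the append time roughly tenfold, while the front-insert time should grow noticeably faster.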


Answer 5

12000 elements is nothing in Python… and actually the number of elements can go as far as the Python interpreter has memory on your system.


Answer 6

It varies for different systems (depends on RAM). The easiest way to find out is

>>> import six
>>> six.MAXSIZE
9223372036854775807

This gives the maximum size of a list and of a dict too, as per the documentation.


Answer 7

I’d say you’re only limited by the total amount of RAM available. Obviously the larger the array the longer operations on it will take.


Answer 8

I got this from here on a x64 bit system: Python 3.7.0b5 (v3.7.0b5:abb8802389, May 31 2018, 01:54:01) [MSC v.1913 64 bit (AMD64)] on win32


Answer 9

There is no fixed limit on the number of elements in a list. The main cause of your error is running out of RAM; consider adding more memory.


How can I run an external command asynchronously from Python?

Question: How can I run an external command asynchronously from Python?

I need to run a shell command asynchronously from a Python script. By this I mean that I want my Python script to continue running while the external command goes off and does whatever it needs to do.

I read this post:

Calling an external command in Python

I then went off and did some testing, and it looks like os.system() will do the job provided that I use & at the end of the command so that I don’t have to wait for it to return. What I am wondering is if this is the proper way to accomplish such a thing? I tried commands.call() but it will not work for me because it blocks on the external command.

Please let me know if using os.system() for this is advisable or if I should try some other route.


Answer 0

subprocess.Popen does exactly what you want.

from subprocess import Popen
p = Popen(['watch', 'ls']) # something long running
# ... do other stuff while subprocess is running
p.terminate()

(Edit to complete the answer from comments)

The Popen instance can do various other things: you can poll() it to see whether it is still running, and you can communicate() with it to send it data on stdin and wait for it to terminate.
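
A short sketch of poll() and communicate() together. The child process here is an assumption made for portability: a Python one-liner started through `sys.executable` that upper-cases whatever it reads on stdin. Any long-running command could take its place.

```python
import subprocess
import sys

# Hypothetical child: reads all of stdin and echoes it in upper case.
child_code = "import sys; print(sys.stdin.read().upper())"
proc = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)

# The parent is not blocked here; poll() returns None while the child is alive.
still_running = proc.poll() is None

# communicate() sends the data, closes stdin, and waits for the child to exit.
output, _ = proc.communicate("hello")
print(still_running, output.strip(), proc.returncode)
```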


Answer 1

If you want to run many processes in parallel and then handle them when they yield results, you can use polling like in the following:

from subprocess import Popen, PIPE
import time

running_procs = [
    Popen(['/usr/bin/my_cmd', '-i %s' % path], stdout=PIPE, stderr=PIPE)
    for path in '/tmp/file0 /tmp/file1 /tmp/file2'.split()]

while running_procs:
    for proc in running_procs:
        retcode = proc.poll()
        if retcode is not None: # Process finished.
            running_procs.remove(proc)
            break
    else: # No process is done, wait a bit and check again.
        time.sleep(.1)
        continue

    # Here, `proc` has finished with return code `retcode`
    if retcode != 0:
        pass # Error handling goes here.
    handle_results(proc.stdout) # handle_results is your own function

The control flow there is a little bit convoluted because I’m trying to make it small — you can refactor to your taste. :-)

This has the advantage of servicing the early-finishing requests first. If you call communicate on the first running process and that turns out to run the longest, the other running processes will have been sitting there idle when you could have been handling their results.


Answer 2

What I am wondering is if this [os.system()] is the proper way to accomplish such a thing?

No. os.system() is not the proper way. That’s why everyone says to use subprocess.

For more information, read http://docs.python.org/library/os.html#os.system

The subprocess module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using this function. Use the subprocess module. Check especially the Replacing Older Functions with the subprocess Module section.


Answer 3

I’ve had good success with the asyncproc module, which deals nicely with the output from the processes. For example:

import os
from asyncproc import Process
myProc = Process("myprogram.app")

while True:
    # check to see if process has ended
    poll = myProc.wait(os.WNOHANG)
    if poll is not None:
        break
    # print any new output
    out = myProc.read()
    if out != "":
        print(out)

Answer 4

Using pexpect with non-blocking readlines is another way to do this. Pexpect solves the deadlock problems, allows you to easily run the processes in the background, and gives easy ways to have callbacks when your process spits out predefined strings, and generally makes interacting with the process much easier.


Answer 5

Considering “I don’t have to wait for it to return”, one of the easiest solutions is this:

import subprocess

# CREATE_NEW_CONSOLE is available on Windows only.
subprocess.Popen(
    [path_to_executable, arg1, arg2, ... argN],
    creationflags=subprocess.CREATE_NEW_CONSOLE,
).pid

But… from what I have read, this is not “the proper way to accomplish such a thing” because of security risks created by the subprocess.CREATE_NEW_CONSOLE flag.

The key points here are the use of subprocess.CREATE_NEW_CONSOLE to create a new console, and of .pid, which returns the process ID (so that you can check on the program later if you want to) instead of waiting for the program to finish its job.


Answer 6

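I have the same problem trying to connect to a 3270 terminal using the s3270 scripting software in Python. Now I’m solving the problem with a subclass of Process that I found here:

http://code.activestate.com/recipes/440554/

And here is the sample taken from that file. It is Python 2 code, Popen is the recipe’s own class (it provides recv(), recv_err() and send()), and message is defined elsewhere in the recipe: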

def recv_some(p, t=.1, e=1, tr=5, stderr=0):
    if tr < 1:
        tr = 1
    x = time.time()+t
    y = []
    r = ''
    pr = p.recv
    if stderr:
        pr = p.recv_err
    while time.time() < x or r:
        r = pr()
        if r is None:
            if e:
                raise Exception(message)
            else:
                break
        elif r:
            y.append(r)
        else:
            time.sleep(max((x-time.time())/tr, 0))
    return ''.join(y)

def send_all(p, data):
    while len(data):
        sent = p.send(data)
        if sent is None:
            raise Exception(message)
        data = buffer(data, sent)

if __name__ == '__main__':
    if sys.platform == 'win32':
        shell, commands, tail = ('cmd', ('dir /w', 'echo HELLO WORLD'), '\r\n')
    else:
        shell, commands, tail = ('sh', ('ls', 'echo HELLO WORLD'), '\n')

    a = Popen(shell, stdin=PIPE, stdout=PIPE)
    print recv_some(a),
    for cmd in commands:
        send_all(a, cmd + tail)
        print recv_some(a),
    send_all(a, 'exit' + tail)
    print recv_some(a, e=0)
    a.wait()

Answer 7

The accepted answer is very old.

I found a better modern answer here:

https://kevinmccarthy.org/2016/07/25/streaming-subprocess-stdin-and-stdout-with-asyncio-in-python/

and made some changes:

  1. make it work on Windows
  2. make it work with multiple commands

import sys
import asyncio

if sys.platform == "win32":
    asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())


async def _read_stream(stream, cb):
    while True:
        line = await stream.readline()
        if line:
            cb(line)
        else:
            break


async def _stream_subprocess(cmd, stdout_cb, stderr_cb):
    try:
        process = await asyncio.create_subprocess_exec(
            *cmd, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE
        )

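        # asyncio.wait() no longer accepts bare coroutines (Python 3.11+),
        # so gather() is used to run both readers concurrently.
        await asyncio.gather(
            _read_stream(process.stdout, stdout_cb),
            _read_stream(process.stderr, stderr_cb),
        )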
        rc = await process.wait()
        return process.pid, rc
    except OSError as e:
        # the program will hang if we let any exception propagate
        return e


def execute(*aws):
    """ run the given coroutines in an asyncio loop
    returns a list containing the values returned from each coroutine.
    """
    async def _gather_all():
        return await asyncio.gather(*aws)

    # asyncio.run() creates the event loop and closes it afterwards
    return asyncio.run(_gather_all())


def printer(label):
    def pr(*args, **kw):
        print(label, *args, **kw)

    return pr


def name_it(start=0, template="s{}"):
    """a simple generator for task names
    """
    while True:
        yield template.format(start)
        start += 1


def runners(cmds):
    """
    cmds is a list of commands to execute as subprocesses
    each item is a list appropriate for use by subprocess.call
    """
    next_name = name_it().__next__
    for cmd in cmds:
        name = next_name()
        out = printer(f"{name}.stdout")
        err = printer(f"{name}.stderr")
        yield _stream_subprocess(cmd, out, err)


if __name__ == "__main__":
    cmds = (
        [
            "sh",
            "-c",
            """echo "$SHELL"-stdout && sleep 1 && echo stderr 1>&2 && sleep 1 && echo done""",
        ],
        [
            "bash",
            "-c",
            "echo 'hello, Dave.' && sleep 1 && echo dave_err 1>&2 && sleep 1 && echo done",
        ],
        [sys.executable, "-c", 'print("hello from python");import sys;sys.exit(2)'],
    )

    print(execute(*runners(cmds)))

It is unlikely that the example commands will work perfectly on your system, and it doesn’t handle weird errors, but this code does demonstrate one way to run multiple subprocesses using asyncio and stream the output.


Answer 8

There are several answers here, but none of them satisfied the requirements below:

  1. I don’t want to wait for command to finish or pollute my terminal with subprocess outputs.

  2. I want to run bash script with redirects.

  3. I want to support piping within my bash script (for example find ... | tar ...).

The only combination that satisfies the above requirements is:

import subprocess

subprocess.Popen('./my_script.sh "arg1" > "redirect/path/to"',
                 stdout=subprocess.PIPE,
                 stderr=subprocess.PIPE,
                 shell=True)

Answer 9

This is covered by Python 3 Subprocess Examples under “Wait for command to terminate asynchronously”:

import asyncio

async def main():
    proc = await asyncio.create_subprocess_exec(
        'ls','-lha',
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE)

    # do something else while ls is working

    # if proc takes very long to complete, the CPUs are free to use cycles for
    # other processes
    stdout, stderr = await proc.communicate()

# await is only valid inside a coroutine, so run main() on an event loop
asyncio.run(main())

The process will start running as soon as the await asyncio.create_subprocess_exec(...) has completed. If it hasn’t finished by the time you call await proc.communicate(), it will wait there in order to give you your output status. If it has finished, proc.communicate() will return immediately.

The gist here is similar to Terrels answer but I think Terrels answer appears to overcomplicate things.

See asyncio.create_subprocess_exec for more information.
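
To make the “do something else while the child is working” part concrete, here is a minimal self-contained sketch. The child command (a Python one-liner started through `sys.executable`) and the coroutine names `other_work`, `run_child` and `main` are assumptions chosen for portability, not part of the answer above.

```python
import asyncio
import sys


async def other_work():
    # Stand-in for whatever the parent wants to do in the meantime.
    await asyncio.sleep(0.1)
    return "other work done"


async def run_child():
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", "print('child done')",
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    stdout, _ = await proc.communicate()
    return proc.returncode, stdout.decode().strip()


async def main():
    # Both coroutines make progress concurrently on one event loop.
    return await asyncio.gather(other_work(), run_child())


results = asyncio.run(main())
print(results)
```

gather() returns the results in the order the coroutines were passed in, regardless of which one finishes first.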


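How can you set class attributes from variable arguments (kwargs) in python

Question: How can you set class attributes from variable arguments (kwargs) in python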

Suppose I have a class with a constructor (or other function) that takes a variable number of arguments and then sets them as class attributes conditionally.

I could set them manually, but it seems that variable parameters are common enough in python that there should be a common idiom for doing this. But I’m not sure how to do this dynamically.

I have an example using eval, but that’s hardly safe. I want to know the proper way to do this — maybe with lambda?

class Foo:
    def setAllManually(self, a=None, b=None, c=None):
        if a is not None:
            self.a = a
        if b is not None:
            self.b = b
        if c is not None:
            self.c = c
    def setAllWithEval(self, **kwargs):
        for key in kwargs:
            if kwargs[key] is not None:
                # unsafe; exec() is used because eval() cannot run an assignment
                exec("self." + key + " = " + repr(kwargs[key]))

Answer 0

You could update the __dict__ attribute (which represents the instance attributes in the form of a dictionary) with the keyword arguments:

class Bar(object):
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

then you can:

>>> bar = Bar(a=1, b=2)
>>> bar.a
1

and with something like:

allowed_keys = {'a', 'b', 'c'}
self.__dict__.update((k, v) for k, v in kwargs.items() if k in allowed_keys)

you could filter the keys beforehand (use iteritems instead of items if you’re still using Python 2.x).
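
Put together, the filtered variant might look like the following sketch. Storing `allowed_keys` as a class attribute is an assumption made here for illustration; the answer does not say where the set should live.

```python
class Bar:
    # Hypothetical whitelist of attribute names accepted by the constructor.
    allowed_keys = {"a", "b", "c"}

    def __init__(self, **kwargs):
        # Keep only whitelisted keyword arguments; others are dropped silently.
        self.__dict__.update(
            (k, v) for k, v in kwargs.items() if k in self.allowed_keys
        )


bar = Bar(a=1, b=2, z=99)
print(vars(bar))
print(hasattr(bar, "z"))
```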


Answer 1

You can use the built-in setattr() function:

class Foo:
  def setAllWithKwArgs(self, **kwargs):
    for key, value in kwargs.items():
      setattr(self, key, value)

There is an analogous getattr() function for retrieving attributes.
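
A brief usage sketch of both built-ins. The method name `set_all` and the rule of skipping `None` values (taken from the question's manual version) are illustrative choices.

```python
class Foo:
    def set_all(self, **kwargs):
        for key, value in kwargs.items():
            if value is not None:   # skip arguments left at None, as in the question
                setattr(self, key, value)


foo = Foo()
foo.set_all(a=1, b=None, c="x")
print(getattr(foo, "a"))
# The third argument to getattr() is returned when the attribute does not exist.
print(getattr(foo, "b", "missing"))
```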


Answer 2

Most answers here do not cover a good way to initialize all allowed attributes to just one default value. So, to add to the answers given by @fqxp and @mmj:

class Myclass:

    def __init__(self, **kwargs):
        # all those keys will be initialized as instance attributes
        allowed_keys = {'attr1', 'attr2', 'attr3'}
        # initialize all allowed keys to false
        self.__dict__.update((key, False) for key in allowed_keys)
        # and update the given keys by their given values
        self.__dict__.update((key, value) for key, value in kwargs.items() if key in allowed_keys)

Answer 3

I propose a variation of fqxp’s answer, which, in addition to allowed attributes, lets you set default values for attributes:

class Foo():
    def __init__(self, **kwargs):
        # define default attributes
        default_attr = dict(a=0, b=None, c=True)
        # define (additional) allowed attributes with no default value
        more_allowed_attr = ['d','e','f']
        allowed_attr = list(default_attr.keys()) + more_allowed_attr
        default_attr.update(kwargs)
        self.__dict__.update((k,v) for k,v in default_attr.items() if k in allowed_attr)

This is Python 3.x code; for Python 2.x you need at least one adjustment: iteritems() in place of items().

VERY LATE FOLLOW UP

I recently rewrote the above code as a class decorator, so that hard coding of attributes is reduced to a minimum. In some way it resembles the @dataclass decorator, which is what you might want to use instead.

# class decorator definition
def classattributes(default_attr,more_allowed_attr):
    def class_decorator(cls):
        def new_init(self,*args,**kwargs):
            allowed_attr = list(default_attr.keys()) + more_allowed_attr
            # copy the defaults so one instance's kwargs do not leak into the next
            attr = dict(default_attr)
            attr.update(kwargs)
            self.__dict__.update((k,v) for k,v in attr.items() if k in allowed_attr)
        cls.__init__ = new_init
        return cls
    return class_decorator

# usage:
# 1st arg is a dict of attributes with default values
# 2nd arg is a list of additional allowed attributes which may be instantiated or not
@classattributes( dict(a=0, b=None, c=True) , ['d','e','f'] )
class Foo():
    pass # add here class body except __init__

@classattributes( dict(g=0, h=None, j=True) , ['k','m','n'] )
class Bar():
    pass # add here class body except __init__

obj1 = Foo(d=999,c=False)
obj2 = Bar(h=-999,k="Hello")

obj1.__dict__ # {'a': 0, 'b': None, 'c': False, 'd': 999}
obj2.__dict__ # {'g': 0, 'h': -999, 'j': True, 'k': 'Hello'}

Answer 4

Yet another variant based on the excellent answers by mmj and fqxp. What if we want to

  1. Avoid hardcoding a list of allowed attributes
  2. Directly and explicitly set default values for each attribute in the constructor
  3. Restrict kwargs to predefined attributes by either
    • silently rejecting invalid arguments or, alternatively,
    • raising an error.

By “directly”, I mean avoiding an extraneous default_attributes dictionary.

class Bar(object):
    def __init__(self, **kwargs):

        # Predefine attributes with default values
        self.a = 0
        self.b = 0
        self.A = True
        self.B = True

        # get a list of all predefined values directly from __dict__
        allowed_keys = list(self.__dict__.keys())

        # Update __dict__ but only for keys that have been predefined 
        # (silently ignore others)
        self.__dict__.update((key, value) for key, value in kwargs.items() 
                             if key in allowed_keys)

        # To NOT silently ignore rejected keys
        rejected_keys = set(kwargs.keys()) - set(allowed_keys)
        if rejected_keys:
            raise ValueError("Invalid arguments in constructor:{}".format(rejected_keys))

Not a major breakthrough, but maybe useful to someone…
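
A condensed sketch of the error-raising variant, with a usage example. Note one deliberate difference from the code above: this version checks for rejected keys before updating `__dict__`, so an invalid call leaves nothing half-applied. That ordering is a choice made for this sketch, as are the two example attributes.

```python
class Bar:
    def __init__(self, **kwargs):
        # Predefine attributes with default values.
        self.a = 0
        self.b = True

        # Everything defined so far is allowed; anything else is rejected.
        allowed_keys = set(self.__dict__)
        rejected_keys = set(kwargs) - allowed_keys
        if rejected_keys:
            raise ValueError(
                "Invalid arguments in constructor: {}".format(sorted(rejected_keys))
            )
        self.__dict__.update(kwargs)


print(vars(Bar(a=5)))
try:
    Bar(a=5, typo=1)
except ValueError as exc:
    print(exc)
```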

EDIT: If our class uses @property decorators to encapsulate “protected” attributes with getters and setters, and if we want to be able to set these properties with our constructor, we may want to expand the allowed_keys list with values from dir(self), as follows:

allowed_keys = [i for i in dir(self) if "__" not in i and any([j.endswith(i) for j in self.__dict__.keys()])]

The above code excludes

  • any hidden variable from dir() (exclusion based on presence of “__”), and
  • any method from dir() whose name is not found in the end of an attribute name (protected or otherwise) from __dict__.keys(), thereby likely keeping only @property decorated methods.

This edit is likely only valid for Python 3 and above.


Answer 5

class SymbolDict(object):
  def __init__(self, **kwargs):
    for key in kwargs:
      setattr(self, key, kwargs[key])

x = SymbolDict(foo=1, bar='3')
assert x.foo == 1

I called the class SymbolDict because it essentially is a dictionary that operates using symbols instead of strings. In other words, you do x.foo instead of x['foo'] but under the covers it’s really the same thing going on.


Answer 6

Solutions such as vars(self).update(kwargs) or self.__dict__.update(**kwargs) are not robust, because the user can pass any dictionary without getting an error message. If I need to check that the user supplies exactly the signature (‘a1’, ‘a2’, ‘a3’, ‘a4’, ‘a5’), those solutions do not work. Moreover, the user should be able to create the object by passing either positional parameters or key-value pair parameters.

So I suggest the following solution by using a metaclass.

from inspect import Parameter, Signature

def make_signature(names):
    # Build a Signature whose parameters may be passed positionally or by keyword.
    return Signature(
        [Parameter(v, Parameter.POSITIONAL_OR_KEYWORD) for v in names]
    )

class StructMeta(type):
    def __new__(cls, name, bases, namespace):
        clsobj = super().__new__(cls, name, bases, namespace)
        # Attach a signature built from the class's _fields list.
        clsobj.__signature__ = make_signature(clsobj._fields)
        return clsobj

class Structure(metaclass=StructMeta):
    _fields = []
    def __init__(self, *args, **kwargs):
        # bind() raises TypeError for missing or unexpected arguments.
        bound = self.__signature__.bind(*args, **kwargs)
        for name, val in bound.arguments.items():
            setattr(self, name, val)

if __name__ == '__main__':

    class A(Structure):
        _fields = ['a1', 'a2']

    a = A(a1=1, a2=2)
    print(vars(a))

    a = A(**{'a1': 1, 'a2': 2})
    print(vars(a))
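
To illustrate what the signature check buys you, here is a compact sketch of the same idea. The class Point and its fields x and y are hypothetical names made up for this example, and make_signature is assumed to be a plain module-level function. Signature.bind raises TypeError for unknown or missing arguments, so bad input fails loudly instead of being stored silently.

```python
from inspect import Parameter, Signature

def make_signature(names):
    return Signature(
        [Parameter(n, Parameter.POSITIONAL_OR_KEYWORD) for n in names]
    )

class StructMeta(type):
    def __new__(cls, name, bases, namespace):
        clsobj = super().__new__(cls, name, bases, namespace)
        clsobj.__signature__ = make_signature(clsobj._fields)
        return clsobj

class Structure(metaclass=StructMeta):
    _fields = []
    def __init__(self, *args, **kwargs):
        bound = self.__signature__.bind(*args, **kwargs)
        for name, val in bound.arguments.items():
            setattr(self, name, val)

class Point(Structure):      # hypothetical example class
    _fields = ['x', 'y']

p1 = Point(1, 2)             # positional arguments
p2 = Point(x=1, y=2)         # keyword arguments

try:
    Point(x=1, z=3)          # 'z' is not a declared field
    rejected = False
except TypeError:
    rejected = True

print(vars(p1), vars(p2), rejected)
```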

回答 7

他们可能是一个更好的解决方案,但我想到的是:

class Test:
    def __init__(self, *args, **kwargs):
        self.args=dict(**kwargs)

    def getkwargs(self):
        print(self.args)

t=Test(a=1, b=2, c="cats")
t.getkwargs()


python Test.py 
{'a': 1, 'c': 'cats', 'b': 2}

There might be a better solution, but what comes to mind for me is:

class Test:
    def __init__(self, *args, **kwargs):
        self.args=dict(**kwargs)

    def getkwargs(self):
        print(self.args)

t=Test(a=1, b=2, c="cats")
t.getkwargs()


python Test.py 
{'a': 1, 'c': 'cats', 'b': 2}

回答 8

这是通过幼虫最容易的

class Foo:
    def setAllWithKwArgs(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)

我的例子:

class Foo:
    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)

door = Foo(size='180x70', color='red chestnut', material='oak')
print(door.size) #180x70

This one, via larsks, is the easiest:

class Foo:
    def setAllWithKwArgs(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)

My example:

class Foo:
    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)

door = Foo(size='180x70', color='red chestnut', material='oak')
print(door.size) #180x70

回答 9

我怀疑在大多数情况下使用命名的args(以获得更好的自我记录代码)可能会更好,因此它看起来可能像这样:

class Foo:
    def setAll(self, a=None, b=None, c=None):
        # Pair each argument name with its value and set only those provided.
        for key, value in (('a', a), ('b', b), ('c', c)):
            if value is not None:
                setattr(self, key, value)

I suspect it might be better in most instances to use named args (for better self documenting code) so it might look something like this:

class Foo:
    def setAll(self, a=None, b=None, c=None):
        # Pair each argument name with its value and set only those provided.
        for key, value in (('a', a), ('b', b), ('c', c)):
            if value is not None:
                setattr(self, key, value)

How do I write data to CSV format as a string (not a file)?

Question: How do I write data to CSV format as a string (not a file)?

我想将数据[1,2,'a','He said "what do you mean?"']转换为CSV格式的字符串。

通常会用到csv.writer()它,因为它处理所有疯狂的情况(逗号转义,引号转义,CSV方言等)。捕获的结果是csv.writer()期望输出到文件对象,而不是字符串。

我当前的解决方案是此功能有点怪异:

def CSV_String_Writeline(data):
    class Dummy_Writer:
        def write(self,instring):
            self.outstring = instring.strip("\r\n")
    dw = Dummy_Writer()
    csv_w = csv.writer( dw )
    csv_w.writerow(data)
    return dw.outstring

谁能提供一种仍然可以很好地处理边缘情况的更优雅的解决方案?

编辑:这是我最终完成的方式:

def csv2string(data):
    si = StringIO.StringIO()
    cw = csv.writer(si)
    cw.writerow(data)
    return si.getvalue().strip('\r\n')

I want to cast data like [1,2,'a','He said "what do you mean?"'] to a CSV-formatted string.

Normally one would use csv.writer() for this, because it handles all the crazy edge cases (comma escaping, quote mark escaping, CSV dialects, etc.). The catch is that csv.writer() expects to output to a file object, not to a string.

My current solution is this somewhat hacky function:

def CSV_String_Writeline(data):
    class Dummy_Writer:
        def write(self,instring):
            self.outstring = instring.strip("\r\n")
    dw = Dummy_Writer()
    csv_w = csv.writer( dw )
    csv_w.writerow(data)
    return dw.outstring

Can anyone give a more elegant solution that still handles the edge cases well?

Edit: Here’s how I ended up doing it:

def csv2string(data):
    si = StringIO.StringIO()
    cw = csv.writer(si)
    cw.writerow(data)
    return si.getvalue().strip('\r\n')

回答 0

您可以使用StringIO而不是自己的Dummy_Writer

此模块实现了类似文件的类,该类StringIO读写字符串缓冲区(也称为内存文件)。

还有cStringIO,这是StringIO该类的更快版本。

You could use StringIO instead of your own Dummy_Writer:

This module implements a file-like class, StringIO, that reads and writes a string buffer (also known as memory files).

There is also cStringIO, which is a faster version of the StringIO class.
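
A minimal sketch of that suggestion for Python 3, where the class lives in the io module rather than in a separate StringIO module. The function name row_to_csv is made up for illustration.

```python
import csv
import io

def row_to_csv(row):
    # csv.writer only needs an object with a write() method,
    # so an in-memory text buffer works in place of a real file.
    buffer = io.StringIO()
    csv.writer(buffer).writerow(row)
    # Drop the trailing line terminator that writerow() appends.
    return buffer.getvalue().rstrip('\r\n')

line = row_to_csv([1, 2, 'a', 'He said "what do you mean?"'])
print(line)
```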


回答 1

在Python 3中:

>>> import io
>>> import csv
>>> output = io.StringIO()
>>> csvdata = [1,2,'a','He said "what do you mean?"',"Whoa!\nNewlines!"]
>>> writer = csv.writer(output, quoting=csv.QUOTE_NONNUMERIC)
>>> writer.writerow(csvdata)
59
>>> output.getvalue()
'1,2,"a","He said ""what do you mean?""","Whoa!\nNewlines!"\r\n'

对于Python 2,需要更改一些细节:

>>> output = io.BytesIO()
>>> writer = csv.writer(output)
>>> writer.writerow(csvdata)
57L
>>> output.getvalue()
'1,2,a,"He said ""what do you mean?""","Whoa!\nNewlines!"\r\n'

In Python 3:

>>> import io
>>> import csv
>>> output = io.StringIO()
>>> csvdata = [1,2,'a','He said "what do you mean?"',"Whoa!\nNewlines!"]
>>> writer = csv.writer(output, quoting=csv.QUOTE_NONNUMERIC)
>>> writer.writerow(csvdata)
59
>>> output.getvalue()
'1,2,"a","He said ""what do you mean?""","Whoa!\nNewlines!"\r\n'

Some details need to be changed a bit for Python 2:

>>> output = io.BytesIO()
>>> writer = csv.writer(output)
>>> writer.writerow(csvdata)
57L
>>> output.getvalue()
'1,2,a,"He said ""what do you mean?""","Whoa!\nNewlines!"\r\n'

回答 2

我发现答案总的来说有点令人困惑。对于Python 2,这种用法对我有用:

import csv, io

def csv2string(data):
    si = io.BytesIO()
    cw = csv.writer(si)
    cw.writerow(data)
    return si.getvalue().strip('\r\n')

data=[1,2,'a','He said "what do you mean?"']
print csv2string(data)

I found the answers, all in all, a bit confusing. For Python 2, this usage worked for me:

import csv, io

def csv2string(data):
    si = io.BytesIO()
    cw = csv.writer(si)
    cw.writerow(data)
    return si.getvalue().strip('\r\n')

data=[1,2,'a','He said "what do you mean?"']
print csv2string(data)

回答 3

由于我大量使用此代码将结果从sanic作为csv数据异步流回用户,因此我为Python 3编写了以下代码段。

该代码段可让您一次又一次地重复使用相同的StringIo缓冲区。


import csv
from io import StringIO


class ArgsToCsv:
    def __init__(self, separator=","):
        self.separator = separator
        self.buffer = StringIO()
        # Pass the separator through so the argument actually takes effect.
        self.writer = csv.writer(self.buffer, delimiter=self.separator)

    def stringify(self, *args):
        self.writer.writerow(args)
        value = self.buffer.getvalue().strip("\r\n")
        # Reset the buffer so it can be reused for the next row.
        self.buffer.seek(0)
        self.buffer.truncate(0)
        return value + "\n"

例:

csv_formatter = ArgsToCsv()

output = ""  # accumulate the CSV rows here
output += csv_formatter.stringify(
    10,
    """
    lol i have some pretty
    "freaky"
    strings right here \' yo!
    """,
    [10, 20, 30],
)

在github gist上查看更多用法:源代码和测试

Since I use this quite a lot to stream results asynchronously from Sanic back to the user as CSV data, I wrote the following snippet for Python 3.

The snippet lets you reuse the same StringIO buffer over and over again.


import csv
from io import StringIO


class ArgsToCsv:
    def __init__(self, separator=","):
        self.separator = separator
        self.buffer = StringIO()
        # Pass the separator through so the argument actually takes effect.
        self.writer = csv.writer(self.buffer, delimiter=self.separator)

    def stringify(self, *args):
        self.writer.writerow(args)
        value = self.buffer.getvalue().strip("\r\n")
        # Reset the buffer so it can be reused for the next row.
        self.buffer.seek(0)
        self.buffer.truncate(0)
        return value + "\n"

Example:

csv_formatter = ArgsToCsv()

output = ""  # accumulate the CSV rows here
output += csv_formatter.stringify(
    10,
    """
    lol i have some pretty
    "freaky"
    strings right here \' yo!
    """,
    [10, 20, 30],
)

Check out further usage at the github gist: source and test


回答 4

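import csv
from io import StringIO  # Python 3; on Python 2 use: from StringIO import StringIO

# Read the whole file into a string, then parse it from memory.
with open('file.csv') as f:
    text = f.read()

stream = StringIO(text)

csv_file = csv.DictReader(stream)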

回答 5

这是适用于utf-8的版本。csvline2string仅用于一行,末尾没有换行符,csv2string用于多行,具有换行符:

import csv, io

def csvline2string(one_line_of_data):
    si = io.StringIO()  # Python 3; on Python 2 use io.BytesIO()
    cw = csv.writer(si)
    cw.writerow(one_line_of_data)
    return si.getvalue().strip('\r\n')

def csv2string(data):
    si = io.StringIO()  # Python 3; on Python 2 use io.BytesIO()
    cw = csv.writer(si)
    for one_line_of_data in data:
        cw.writerow(one_line_of_data)
    return si.getvalue()

Here’s the version that works for utf-8. csvline2string for just one line, without linebreaks at the end, csv2string for many lines, with linebreaks:

import csv, io

def csvline2string(one_line_of_data):
    si = io.StringIO()  # Python 3; on Python 2 use io.BytesIO()
    cw = csv.writer(si)
    cw.writerow(one_line_of_data)
    return si.getvalue().strip('\r\n')

def csv2string(data):
    si = io.StringIO()  # Python 3; on Python 2 use io.BytesIO()
    cw = csv.writer(si)
    for one_line_of_data in data:
        cw.writerow(one_line_of_data)
    return si.getvalue()

How do you do a simple "chmod +x" from within Python?

Question: How do you do a simple "chmod +x" from within Python?

我想从可执行的python脚本中创建文件。

import os
import stat
os.chmod('somefile', stat.S_IEXEC)

它似乎os.chmod没有像unix chmod那样“添加”权限。在最后一行注释掉的情况下,文件具有filemode -rw-r--r--,而在未注释掉的情况下,文件模式为---x------。如何u+x在保持其余模式不变的同时添加标志?

From within a Python script, I want to create a file that is executable.

import os
import stat
os.chmod('somefile', stat.S_IEXEC)

It appears os.chmod doesn’t ‘add’ permissions the way Unix chmod does. With the last line commented out, the file has the file mode -rw-r--r--; with it not commented out, the file mode is ---x------. How can I just add the u+x flag while keeping the rest of the mode intact?


回答 0

使用os.stat()得到当前的权限,使用|或位在一起,并使用os.chmod()设置更新的权限。

例:

import os
import stat

st = os.stat('somefile')
os.chmod('somefile', st.st_mode | stat.S_IEXEC)

Use os.stat() to get the current permissions, use | to OR the bits together, and use os.chmod() to set the updated permissions.

Example:

import os
import stat

st = os.stat('somefile')
os.chmod('somefile', st.st_mode | stat.S_IEXEC)

回答 1

对于生成可执行文件的工具(例如脚本),以下代码可能会有所帮助:

import os

def make_executable(path):
    mode = os.stat(path).st_mode
    mode |= (mode & 0o444) >> 2    # copy R bits to X
    os.chmod(path, mode)

这使它(或多或少)尊重umask创建文件时的效果:仅为可读取的文件设置可执行文件。

用法:

path = 'foo.sh'
with open(path, 'w') as f:           # umask in effect when file is created
    f.write('#!/bin/sh\n')
    f.write('echo "hello world"\n')

make_executable(path)

For tools that generate executable files (e.g. scripts), the following code might be helpful:

import os

def make_executable(path):
    mode = os.stat(path).st_mode
    mode |= (mode & 0o444) >> 2    # copy R bits to X
    os.chmod(path, mode)

This makes it (more or less) respect the umask that was in effect when the file was created: the executable bit is only set for those who can read the file.
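
The bit arithmetic can be checked without touching the file system. In this sketch the helper add_x_where_readable is a hypothetical name that isolates the expression used in make_executable, applied to plain integers.

```python
def add_x_where_readable(mode):
    # 0o444 selects the three read bits (user, group, other);
    # shifting right by 2 moves each one onto its execute bit.
    return mode | ((mode & 0o444) >> 2)

# rw-r--r-- becomes rwxr-xr-x
print(oct(add_x_where_readable(0o644)))
# rw------- becomes rwx------ (group and other stay without access)
print(oct(add_x_where_readable(0o600)))
```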

Usage:

path = 'foo.sh'
with open(path, 'w') as f:           # umask in effect when file is created
    f.write('#!/bin/sh\n')
    f.write('echo "hello world"\n')

make_executable(path)

回答 2

如果知道所需的权限,则可以使用以下示例来简化操作。

Python 2:

os.chmod("/somedir/somefile", 0775)

Python 3:

os.chmod("/somedir/somefile", 0o775)

兼容(八进制转换):

os.chmod("/somedir/somefile", 509)

参考权限示例

If you know the permissions you want then the following example may be the way to keep it simple.

Python 2:

os.chmod("/somedir/somefile", 0775)

Python 3:

os.chmod("/somedir/somefile", 0o775)

Compatible with either (octal conversion):

os.chmod("/somedir/somefile", 509)

reference permissions examples


回答 3

您也可以这样做

>>> import os
>>> st = os.stat("hello.txt")

当前文件清单

$ ls -l hello.txt
-rw-r--r--  1 morrison  staff  17 Jan 13  2014 hello.txt

现在做。

>>> os.chmod("hello.txt", st.st_mode | 0o111)

然后您将在终端中看到此内容。

$ ls -l hello.txt
-rwxr-xr-x  1 morrison  staff  17 Jan 13  2014 hello.txt

您可以按位或0o111使所有可执行文件,0o222使所有可写,和0o444使所有可读。

You can also do this

>>> import os
>>> st = os.stat("hello.txt")

Current listing of file

$ ls -l hello.txt
-rw-r--r--  1 morrison  staff  17 Jan 13  2014 hello.txt

Now do this.

>>> os.chmod("hello.txt", st.st_mode | 0o111)

You will then see this in the terminal:

$ ls -l hello.txt
-rwxr-xr-x  1 morrison  staff  17 Jan 13  2014 hello.txt

You can bitwise OR the mode with 0o111 to make the file executable by everyone, 0o222 to make it writable by everyone, and 0o444 to make it readable by everyone.


回答 4

尊重umask喜欢chmod +x

man chmod说如果augo没有给出,如:

chmod +x mypath

然后a使用,但与umask

字母ugoa的组合控制将更改哪些用户对该文件的访问权限:拥有该文件的用户(u),该文件组中的其他用户(g),不在该文件组中的其他用户(o)或全部用户(a)。如果没有给出这些,则效果就好像给出了(a)一样,但是不影响umask中设置的位。

这是一个完全模拟该行为的版本:

#!/usr/bin/env python3

import os
import stat

def get_umask():
    umask = os.umask(0)
    os.umask(umask)
    return umask

def chmod_plus_x(path):
    os.chmod(
        path,
        os.stat(path).st_mode |
        (
            (
                stat.S_IXUSR |
                stat.S_IXGRP |
                stat.S_IXOTH
            )
            & ~get_umask()
        )
    )

chmod_plus_x('.gitignore')

另请参阅:如何获取Python中的默认文件权限?

已在Ubuntu 16.04,Python 3.5.2中进行了测试。

Respect umask like chmod +x

man chmod says that if none of a, u, g, or o is given, as in:

chmod +x mypath

then a is used, but subject to the umask:

A combination of the letters ugoa controls which users’ access to the file will be changed: the user who owns it (u), other users in the file’s group (g), other users not in the file’s group (o), or all users (a). If none of these are given, the effect is as if (a) were given, but bits that are set in the umask are not affected.

Here is a version that simulates that behavior exactly:

#!/usr/bin/env python3

import os
import stat

def get_umask():
    umask = os.umask(0)
    os.umask(umask)
    return umask

def chmod_plus_x(path):
    os.chmod(
        path,
        os.stat(path).st_mode |
        (
            (
                stat.S_IXUSR |
                stat.S_IXGRP |
                stat.S_IXOTH
            )
            & ~get_umask()
        )
    )

chmod_plus_x('.gitignore')

See also: How can I get the default file permissions in Python?

Tested in Ubuntu 16.04, Python 3.5.2.
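
To see what the masking step does, the sketch below applies the same expression to plain integers instead of a real file. The function name plus_x_mode and its explicit umask parameter are assumptions made for illustration; the version above reads the process umask instead.

```python
import stat

def plus_x_mode(mode, umask):
    # All three execute bits, minus any bit the umask turns off.
    x_bits = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    return mode | (x_bits & ~umask)

# With the common umask 022 every execute bit is added.
print(oct(plus_x_mode(0o644, 0o022)))
# With umask 027 the "other" execute bit is left alone.
print(oct(plus_x_mode(0o644, 0o027)))
```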


回答 5

在python3中:

import os
os.chmod("somefile", 0o664)

请记住要添加0o前缀,因为权限设置为八进制整数,Python会自动将前导零的任何整数视为八进制。否则,您os.chmod("somefile", 1230)的确通过了,这是的八进制664

In python3:

import os
os.chmod("somefile", 0o664)

Remember to add the 0o prefix, since permissions are given as an octal integer. Python 3 does not treat a bare leading zero as octal (0664 is a syntax error there; only Python 2 accepted it). If you pass plain 664 instead, you are really calling os.chmod("somefile", 0o1230), because decimal 664 is 1230 in octal.
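
The difference between the two literals is easy to verify in the interpreter. The variable names below are chosen only for this illustration.

```python
# 0o664 is an octal literal: rw-rw-r--
octal_mode = 0o664
print(octal_mode)         # the same number written in decimal

# Plain 664 is decimal; viewed as permission bits it is a different mode.
decimal_mode = 664
print(oct(decimal_mode))
```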


回答 6

如果您使用的是Python 3.4+,则可以使用标准库的便捷pathlib

它的Path类具有内置的chmodstat方法。

from pathlib import Path
import stat

f = Path("/path/to/file.txt")
f.chmod(f.stat().st_mode | stat.S_IEXEC)

If you’re using Python 3.4+, you can use the standard library’s convenient pathlib.

Its Path class has built-in chmod and stat methods.

from pathlib import Path
import stat


f = Path("/path/to/file.txt")
f.chmod(f.stat().st_mode | stat.S_IEXEC)

Converting a list to a set changes element order

Question: Converting a list to a set changes element order

最近,我注意到当我将a转换listset元素的顺序发生变化,由字符排序。

考虑以下示例:

x=[1,2,20,6,210]
print x 
# [1, 2, 20, 6, 210] # the order is same as initial order

set(x)
# set([1, 2, 20, 210, 6]) # in the set(x) output order is sorted

我的问题是-

  1. 为什么会这样呢?
  2. 如何进行设置操作(尤其是“设置差异”)而不丢失初始顺序?

Recently I noticed that when I convert a list to a set, the order of the elements changes and appears to be sorted by character.

Consider this example:

x=[1,2,20,6,210]
print x 
# [1, 2, 20, 6, 210] # the order is same as initial order

set(x)
# set([1, 2, 20, 210, 6]) # in the set(x) output order is sorted

My questions are –

  1. Why is this happening?
  2. How can I do set operations (especially Set Difference) without losing the initial order?

回答 0

  1. A set是无序的数据结构,因此它不保留插入顺序。

  2. 这取决于您的要求。如果您有一个普通列表,并且想要在保留列表顺序的同时删除一些元素集,则可以通过列表理解来做到这一点:

    >>> a = [1, 2, 20, 6, 210]
    >>> b = set([6, 20, 1])
    >>> [x for x in a if x not in b]
    [2, 210]

    如果需要同时支持快速成员资格测试保留插入顺序的数据结构,则可以使用Python字典的键,从Python 3.7开始保证可以保留插入顺序:

    >>> a = dict.fromkeys([1, 2, 20, 6, 210])
    >>> b = dict.fromkeys([6, 20, 1])
    >>> dict.fromkeys(x for x in a if x not in b)
    {2: None, 210: None}

    b并不需要在这里订购–您也可以使用set。请注意,a.keys() - b.keys()返回的设置差为set,因此不会保留插入顺序。

    在旧版本的Python中,您可以collections.OrderedDict改用:

    >>> a = collections.OrderedDict.fromkeys([1, 2, 20, 6, 210])
    >>> b = collections.OrderedDict.fromkeys([6, 20, 1])
    >>> collections.OrderedDict.fromkeys(x for x in a if x not in b)
    OrderedDict([(2, None), (210, None)])
  1. A set is an unordered data structure, so it does not preserve the insertion order.

  2. This depends on your requirements. If you have a normal list and want to remove some set of elements while preserving the order of the list, you can do this with a list comprehension:

    >>> a = [1, 2, 20, 6, 210]
    >>> b = set([6, 20, 1])
    >>> [x for x in a if x not in b]
    [2, 210]
    

    If you need a data structure that supports both fast membership tests and preservation of insertion order, you can use the keys of a Python dictionary, which starting from Python 3.7 is guaranteed to preserve the insertion order:

    >>> a = dict.fromkeys([1, 2, 20, 6, 210])
    >>> b = dict.fromkeys([6, 20, 1])
    >>> dict.fromkeys(x for x in a if x not in b)
    {2: None, 210: None}
    

    b doesn’t really need to be ordered here – you could use a set as well. Note that a.keys() - b.keys() returns the set difference as a set, so it won’t preserve the insertion order.

    In older versions of Python, you can use collections.OrderedDict instead:

    >>> a = collections.OrderedDict.fromkeys([1, 2, 20, 6, 210])
    >>> b = collections.OrderedDict.fromkeys([6, 20, 1])
    >>> collections.OrderedDict.fromkeys(x for x in a if x not in b)
    OrderedDict([(2, None), (210, None)])
    

回答 1

在Python 3.6中,set()现在应该保持顺序,但是对于Python 2和Python 3还有另一种解决方案:

>>> x = [1, 2, 20, 6, 210]
>>> sorted(set(x), key=x.index)
[1, 2, 20, 6, 210]

A set() does not keep insertion order, even in Python 3.6 and later (only dict is guaranteed to, from Python 3.7), but there is a solution that works in both Python 2 and 3:

>>> x = [1, 2, 20, 6, 210]
>>> sorted(set(x), key=x.index)
[1, 2, 20, 6, 210]

回答 2

回答第一个问题时,集合是针对集合操作进行优化的数据结构。像数学集一样,它不强制或维持元素的任何特定顺序。集合的抽象概念不强制执行顺序,因此不需要强制执行。从列表创建集合时,Python可以根据其用于集合的内部实现的需要自由更改元素的顺序,从而能够高效地执行集合操作。

Answering your first question, a set is a data structure optimized for set operations. Like a mathematical set, it does not enforce or maintain any particular order of the elements. The abstract concept of a set does not enforce order, so the implementation is not required to. When you create a set from a list, Python has the liberty to change the order of the elements for the needs of the internal implementation it uses for a set, which is able to perform set operations efficiently.


回答 3

通过以下功能删除重复项并保留顺序

def unique(sequence):
    seen = set()
    return [x for x in sequence if not (x in seen or seen.add(x))]

检查此链接

Remove duplicates and preserve order with the function below:

def unique(sequence):
    seen = set()
    return [x for x in sequence if not (x in seen or seen.add(x))]

check this link
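
The one-liner works because set.add() returns None, so the expression `x in seen or seen.add(x)` is falsy only the first time x is met. An expanded loop with the same behavior may be easier to read. The name unique_verbose and the sample list are made up for this sketch.

```python
def unique_verbose(sequence):
    seen = set()
    result = []
    for x in sequence:
        # Keep only the first occurrence of each element.
        if x not in seen:
            seen.add(x)
            result.append(x)
    return result

deduped = unique_verbose([1, 2, 20, 6, 210, 2, 1])
print(deduped)
```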


回答 4

在数学中,有集合有序集合(osets)。

  • set:唯一元素的无序容器(实现)
  • oset:唯一元素的有序容器(未实现)

在Python中,仅直接实现集合。我们可以使用常规的dict键(3.7+)模拟osets 。

给定

a = [1, 2, 20, 6, 210, 2, 1]
b = {2, 5, 6}

oset = dict.fromkeys(a).keys()
# dict_keys([1, 2, 20, 6, 210])

演示版

删除副本,保留插入顺序。

list(oset)
# [1, 2, 20, 6, 210]

对dict键进行类似集合的操作。

oset - b
# {1, 20, 210}

oset | b
# {1, 2, 5, 6, 20, 210}

oset & b
# {2, 6}

oset ^ b
# {1, 5, 20, 210}

细节

注意:无序结构并不排除有序元素。相反,不能保证维持订单。例:

assert {1, 2, 3} == {2, 3, 1}                    # sets (order is ignored)

assert [1, 2, 3] != [2, 3, 1]                    # lists (order is guaranteed)

可能会很高兴地发现列表多集(mset)是另外两种引人入胜的数学数据结构:

  • list:允许重复的元素的有序容器(已实现)
  • mset:允许重复的元素的无序容器(NotImplemented)*

摘要

Container | Ordered | Unique | Implemented
----------|---------|--------|------------
set       |    n    |    y   |     y
oset      |    y    |    y   |     n
list      |    y    |    n   |     y
mset      |    n    |    n   |     n*  

*可以使用collections.Counter()dict样的多重性(计数)映射间接模拟多重集。

In mathematics, there are sets and ordered sets (osets).

  • set: an unordered container of unique elements (Implemented)
  • oset: an ordered container of unique elements (NotImplemented)

In Python, only sets are directly implemented. We can emulate osets with regular dict keys (3.7+).

Given

a = [1, 2, 20, 6, 210, 2, 1]
b = {2, 5, 6}

Code

oset = dict.fromkeys(a).keys()
# dict_keys([1, 2, 20, 6, 210])

Demo

Replicates are removed, insertion-order is preserved.

list(oset)
# [1, 2, 20, 6, 210]

Set-like operations on dict keys.

oset - b
# {1, 20, 210}

oset | b
# {1, 2, 5, 6, 20, 210}

oset & b
# {2, 6}

oset ^ b
# {1, 5, 20, 210}

Details

Note: an unordered structure does not preclude ordered elements. Rather, maintained order is not guaranteed. Example:

assert {1, 2, 3} == {2, 3, 1}                    # sets (order is ignored)

assert [1, 2, 3] != [2, 3, 1]                    # lists (order is guaranteed)

One may be pleased to discover that a list and multiset (mset) are two more fascinating, mathematical data structures:

  • list: an ordered container of elements that permits replicates (Implemented)
  • mset: an unordered container of elements that permits replicates (NotImplemented)*

Summary

Container | Ordered | Unique | Implemented
----------|---------|--------|------------
set       |    n    |    y   |     y
oset      |    y    |    y   |     n
list      |    y    |    n   |     y
mset      |    n    |    n   |     n*  

*A multiset can be indirectly emulated with collections.Counter(), a dict-like mapping of multiplicities (counts).
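
As a small sketch of that footnote, collections.Counter can stand in for a multiset. The sample data and variable names are made up for illustration.

```python
from collections import Counter

m1 = Counter(['a', 'a', 'b'])
m2 = Counter(['a', 'b', 'b', 'c'])

# Multiset sum adds multiplicities.
total = m1 + m2
# Multiset intersection keeps the minimum multiplicity of each element.
common = m1 & m2
# Multiset union keeps the maximum multiplicity of each element.
union = m1 | m2

# Like sets, multisets ignore order when compared.
assert Counter('aab') == Counter('aba')

print(total, common, union)
```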


回答 5

如其他答案所示,集合是不保留元素顺序的数据结构(和数学概念)-

但是,通过使用集合和字典的组合,可以实现所需的功能-尝试使用以下代码段:

# save the element order in a dict:
x_dict = dict((x, y) for y, x in enumerate(my_list))
x_set = set(my_list)
# perform desired set operations, producing new_set
...
# retrieve ordered list from the set:
# (assumes every element of new_set came from my_list;
#  sorting by original index avoids gaps where elements were removed)
new_list = sorted(new_set, key=x_dict.get)

As noted in other answers, sets are data structures (and mathematical concepts) that do not preserve element order.

However, by combining sets and dictionaries you can achieve what you want. Try these snippets:

# save the element order in a dict:
x_dict = dict((x, y) for y, x in enumerate(my_list))
x_set = set(my_list)
# perform desired set operations, producing new_set
...
# retrieve ordered list from the set:
# (assumes every element of new_set came from my_list;
#  sorting by original index avoids gaps where elements were removed)
new_list = sorted(new_set, key=x_dict.get)

回答 6

在Sven的答案的基础上,我发现使用了collections.OrderedDict这样的代码,它帮助我完成了想要的工作,并允许我向dict中添加更多项:

import collections

x=[1,2,20,6,210]
z=collections.OrderedDict.fromkeys(x)
z
OrderedDict([(1, None), (2, None), (20, None), (6, None), (210, None)])

如果要添加项目,但仍将其视为一组,则可以执行以下操作:

z['nextitem']=None

您可以在字典上执行类似z.keys()的操作并获取集合:

z.keys()
[1, 2, 20, 6, 210]

Building on Sven’s answer, I found that using collections.OrderedDict as shown below accomplished what you want, and it also allows adding more items to the dict:

import collections

x=[1,2,20,6,210]
z=collections.OrderedDict.fromkeys(x)
z
OrderedDict([(1, None), (2, None), (20, None), (6, None), (210, None)])

If you want to add items but still treat it like a set you can just do:

z['nextitem']=None

And you can perform an operation like z.keys() on the dict and get the set:

z.keys()
[1, 2, 20, 6, 210]

回答 7

上面最高分数概念的实现将其带回到列表中:

def SetOfListInOrder(incominglist):
    from collections import OrderedDict
    outtemp = OrderedDict()
    for item in incominglist:
        outtemp[item] = None
    return(list(outtemp))

在Python 3.6和Python 2.7上进行了简短测试。

An implementation of the highest score concept above that brings it back to a list:

def SetOfListInOrder(incominglist):
    from collections import OrderedDict
    outtemp = OrderedDict()
    for item in incominglist:
        outtemp[item] = None
    return(list(outtemp))

Tested (briefly) on Python 3.6 and Python 2.7.


回答 8

如果您要在两个初始列表中进行少量元素设置差值运算,而不是使用collections.OrderedDict使实现复杂化并使可读性降低的元素,则可以使用:

# initial lists on which you want to do set difference
>>> nums = [1,2,2,3,3,4,4,5]
>>> evens = [2,4,4,6]
>>> evens_set = set(evens)
>>> result = []
>>> for n in nums:
...   if not n in evens_set and not n in result:
...     result.append(n)
... 
>>> result
[1, 3, 5]

它的时间复杂度不是很好,但是它整洁且易于阅读。

In case you have a small number of elements in your two initial lists on which you want to do set difference operation, instead of using collections.OrderedDict which complicates the implementation and makes it less readable, you can use:

# initial lists on which you want to do set difference
>>> nums = [1,2,2,3,3,4,4,5]
>>> evens = [2,4,4,6]
>>> evens_set = set(evens)
>>> result = []
>>> for n in nums:
...   if not n in evens_set and not n in result:
...     result.append(n)
... 
>>> result
[1, 3, 5]

Its time complexity is not that good but it is neat and easy to read.
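
If the lists grow, the slow part is the `n in result` test, which scans a list each time. A sketch of the same loop that tracks already-emitted values in a second set (the name seen is introduced here) keeps the order and makes each membership test constant time on average.

```python
nums = [1, 2, 2, 3, 3, 4, 4, 5]
evens = [2, 4, 4, 6]

evens_set = set(evens)
seen = set()      # values already appended to result
result = []
for n in nums:
    if n not in evens_set and n not in seen:
        seen.add(n)
        result.append(n)

print(result)
```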


回答 9

有趣的是,人们总是使用“现实世界中的问题”开玩笑来解释理论科学中的定义。

如果设置有顺序,则首先需要弄清楚以下问题。如果列表中有重复的元素,那么将其变成集合时的顺序应该是什么?如果我们将两个集合并集,顺序是什么?如果在同一元素上以不同顺序相交的两个集合相交,顺序是什么?

另外,set在搜索特定键方面要快得多,这在set操作中非常有用(这就是为什么需要set而不是list的原因)。

如果您真的在乎索引,只需将其保留为列表即可。如果仍要对许多列表中的元素进行设置操作,最简单的方法是为每个列表创建一个字典,该列表中的集合具有相同的键以及包含原始列表中所有键索引的list值。

def indx_dic(l):
    dic = {}
    for i in range(len(l)):
        if l[i] in dic:
            dic.get(l[i]).append(i)
        else:
            dic[l[i]] = [i]
    return(dic)

a = [1,2,3,4,5,1,3,2]
set_a  = set(a)
dic_a = indx_dic(a)

print(dic_a)
# {1: [0, 5], 2: [1, 7], 3: [2, 6], 4: [3], 5: [4]}
print(set_a)
# {1, 2, 3, 4, 5}

It’s interesting that people always use “real-world problems” to poke fun at definitions from theoretical science.

If a set had an order, you would first need to settle the following questions. If your list has duplicate elements, what should the order be when you turn it into a set? What is the order of the union of two sets? What is the order of the intersection of two sets that hold the same elements in different orders?

Also, a set is much faster at looking up a particular key, which is very useful in set operations (and that is why you need a set rather than a list).

If you really care about the index, just keep the data as a list. If you still want to do set operations on the elements of many lists, the simplest way is to create, for each list, a dictionary whose keys are the elements of the set and whose values are lists containing all indices of that key in the original list.

def indx_dic(l):
    dic = {}
    for i in range(len(l)):
        if l[i] in dic:
            dic.get(l[i]).append(i)
        else:
            dic[l[i]] = [i]
    return(dic)

a = [1,2,3,4,5,1,3,2]
set_a  = set(a)
dic_a = indx_dic(a)

print(dic_a)
# {1: [0, 5], 2: [1, 7], 3: [2, 6], 4: [3], 5: [4]}
print(set_a)
# {1, 2, 3, 4, 5}

回答 10

这是一种简单的方法:

x=[1,2,20,6,210]
print sorted(set(x))

Here’s an easy way to do it:

x=[1,2,20,6,210]
print sorted(set(x))

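How to test if a string contains one of the substrings in a list, in pandas?

Question: How to test if a string contains one of the substrings in a list, in pandas?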

有没有这将是一个组合的等同的任何功能df.isin()df[col].str.contains()

例如,假设我有系列 s = pd.Series(['cat','hat','dog','fog','pet']),并且我想找到s包含的任何一个的所有地方['og', 'at'],那么我想得到除“宠物”以外的所有东西。

我有一个解决方案,但这很不雅致:

searchfor = ['og', 'at']
found = [s.str.contains(x) for x in searchfor]
result = pd.DataFrame(found)
result.any()

有一个更好的方法吗?

Is there any function that would be the equivalent of a combination of df.isin() and df[col].str.contains()?

For example, say I have the series s = pd.Series(['cat','hat','dog','fog','pet']), and I want to find all places where s contains any of ['og', 'at'], I would want to get everything but ‘pet’.

I have a solution, but it’s rather inelegant:

searchfor = ['og', 'at']
found = [s.str.contains(x) for x in searchfor]
result = pd.DataFrame(found)
result.any()

Is there a better way to do this?


回答 0

一种选择是仅使用正则表达式|字符尝试匹配系列中单词中的每个子字符串s(仍使用str.contains)。

您可以通过将单词searchfor与结合在一起来构造正则表达式|

>>> searchfor = ['og', 'at']
>>> s[s.str.contains('|'.join(searchfor))]
0    cat
1    hat
2    dog
3    fog
dtype: object

就像@AndyHayden在下面的注释中指出的那样,请注意您的子字符串是否具有特殊字符,例如$^您想在字面上进行匹配。这些字符在正则表达式的上下文中具有特定含义,并且会影响匹配。

您可以通过转义非字母数字字符来使子字符串列表更安全re.escape

>>> import re
>>> matches = ['$money', 'x^y']
>>> safe_matches = [re.escape(m) for m in matches]
>>> safe_matches
['\\$money', 'x\\^y']

与结合使用时,此新列表中带有的字符串将逐字匹配每个字符str.contains

One option is just to use the regex | character to try to match each of the substrings in the words in your Series s (still using str.contains).

You can construct the regex by joining the words in searchfor with |:

>>> searchfor = ['og', 'at']
>>> s[s.str.contains('|'.join(searchfor))]
0    cat
1    hat
2    dog
3    fog
dtype: object

As @AndyHayden noted in the comments below, take care if your substrings have special characters such as $ and ^ which you want to match literally. These characters have specific meanings in the context of regular expressions and will affect the matching.

You can make your list of substrings safer by escaping non-alphanumeric characters with re.escape:

>>> import re
>>> matches = ['$money', 'x^y']
>>> safe_matches = [re.escape(m) for m in matches]
>>> safe_matches
['\\$money', 'x\\^y']

The strings in this new list will match each character literally when used with str.contains.
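
Putting the two steps together, the escaped strings can be joined with | to form a single pattern. The sketch below uses only the re module so that it runs without pandas, and the sample strings are made up; the same pattern string could then be passed to str.contains.

```python
import re

matches = ['$money', 'x^y']
# Escape each substring, then join the alternatives with |.
pattern = '|'.join(re.escape(m) for m in matches)

samples = ['costs $money', 'x^y squared', 'money', 'xy']
# re.search plays the role that str.contains plays for each Series element.
hits = [s for s in samples if re.search(pattern, s)]
print(hits)
```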


回答 1

您可以使用str.containsregex模式单独使用OR (|)

s[s.str.contains('og|at')]

或者您可以将系列添加到,dataframe然后使用str.contains

df = pd.DataFrame(s)
df[s.str.contains('og|at')] 

输出:

0 cat
1 hat
2 dog
3 fog 

You can use str.contains alone with a regex pattern using OR (|):

s[s.str.contains('og|at')]

Or you could add the series to a dataframe then use str.contains:

df = pd.DataFrame(s)
df[s.str.contains('og|at')] 

Output:

0 cat
1 hat
2 dog
3 fog 

回答 2

这是一个单行lambda,它也可以工作:

df["TrueFalse"] = df['col1'].apply(lambda x: 1 if any(i in x for i in searchfor) else 0)

输入:

searchfor = ['og', 'at']

df = pd.DataFrame([('cat', 1000.0), ('hat', 2000000.0), ('dog', 1000.0), ('fog', 330000.0),('pet', 330000.0)], columns=['col1', 'col2'])

   col1  col2
0   cat 1000.0
1   hat 2000000.0
2   dog 1000.0
3   fog 330000.0
4   pet 330000.0

应用Lambda:

df["TrueFalse"] = df['col1'].apply(lambda x: 1 if any(i in x for i in searchfor) else 0)

输出:

    col1    col2        TrueFalse
0   cat     1000.0      1
1   hat     2000000.0   1
2   dog     1000.0      1
3   fog     330000.0    1
4   pet     330000.0    0

Here is a one line lambda that also works:

df["TrueFalse"] = df['col1'].apply(lambda x: 1 if any(i in x for i in searchfor) else 0)

Input:

searchfor = ['og', 'at']

df = pd.DataFrame([('cat', 1000.0), ('hat', 2000000.0), ('dog', 1000.0), ('fog', 330000.0),('pet', 330000.0)], columns=['col1', 'col2'])

   col1  col2
0   cat 1000.0
1   hat 2000000.0
2   dog 1000.0
3   fog 330000.0
4   pet 330000.0

Apply Lambda:

df["TrueFalse"] = df['col1'].apply(lambda x: 1 if any(i in x for i in searchfor) else 0)

Output:

    col1    col2        TrueFalse
0   cat     1000.0      1
1   hat     2000000.0   1
2   dog     1000.0      1
3   fog     330000.0    1
4   pet     330000.0    0

Python Requests and persistent sessions

Question: Python Requests and persistent sessions

I am using the requests module (version 0.10.0 with Python 2.5). I have figured out how to submit data to a login form on a website and retrieve the session key, but I can’t see an obvious way to use this session key in subsequent requests. Can someone fill in the ellipsis in the code below or suggest another approach?

>>> import requests
>>> login_data =  {'formPosted':'1', 'login_email':'me@example.com', 'password':'pw'}
>>> r = requests.post('https://localhost/login.py', login_data)
>>> 
>>> r.text
u'You are being redirected <a href="profilePage?_ck=1349394964">here</a>'
>>> r.cookies
{'session_id_myapp': '127-0-0-1-825ff22a-6ed1-453b-aebc-5d3cf2987065'}
>>> 
>>> r2 = requests.get('https://localhost/profile_data.json', ...)

Answer 0

You can easily create a persistent session using:

s = requests.Session()

After that, continue with your requests as you would:

s.post('https://localhost/login.py', login_data)
#logged in! cookies saved for future requests.
r2 = s.get('https://localhost/profile_data.json', ...)
#cookies sent automatically!
#do whatever, s will keep your cookies intact :)

For more about sessions: https://requests.kennethreitz.org/en/master/user/advanced/#session-objects


Answer 1

The other answers help to understand how to maintain such a session. Additionally, I want to provide a class that keeps the session alive across different runs of a script (using a cache file). This means a proper “login” is only performed when required (on timeout, or when no session exists in the cache). It also supports proxy settings across subsequent calls to ‘get’ or ‘post’.

It is tested with Python 3.

Use it as a basis for your own code. The following snippets are released under GPL v3.

import pickle
import datetime
import os
from urllib.parse import urlparse
import requests    

class MyLoginSession:
    """
    a class which handles and saves login sessions. It also keeps track of proxy settings.
    It also maintains a cache file for restoring session data from earlier
    script executions.
    """
    def __init__(self,
                 loginUrl,
                 loginData,
                 loginTestUrl,
                 loginTestString,
                 sessionFileAppendix = '_session.dat',
                 maxSessionTimeSeconds = 30 * 60,
                 proxies = None,
                 userAgent = 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1',
                 debug = True,
                 forceLogin = False,
                 **kwargs):
        """
        save some information needed to login the session

        you'll have to provide 'loginTestString' which will be looked for in the
        responses html to make sure, you've properly been logged in

        'proxies' is of format { 'https' : 'https://user:pass@server:port', 'http' : ...
        'loginData' will be sent as post data (dictionary of id : value).
        'maxSessionTimeSeconds' will be used to determine when to re-login.
        """
        urlData = urlparse(loginUrl)

        self.proxies = proxies
        self.loginData = loginData
        self.loginUrl = loginUrl
        self.loginTestUrl = loginTestUrl
        self.maxSessionTime = maxSessionTimeSeconds
        self.sessionFile = urlData.netloc + sessionFileAppendix
        self.userAgent = userAgent
        self.loginTestString = loginTestString
        self.debug = debug

        self.login(forceLogin, **kwargs)

    def modification_date(self, filename):
        """
        return last file modification date as datetime object
        """
        t = os.path.getmtime(filename)
        return datetime.datetime.fromtimestamp(t)

    def login(self, forceLogin = False, **kwargs):
        """
        login to a session. Try to read last saved session from cache file. If this fails
        do proper login. If the last cache access was too old, also perform a proper login.
        Always updates session cache file.
        """
        wasReadFromCache = False
        if self.debug:
            print('loading or generating session...')
        if os.path.exists(self.sessionFile) and not forceLogin:
            time = self.modification_date(self.sessionFile)         

            # only load if the file is younger than maxSessionTime;
            # total_seconds() also counts whole days, unlike .seconds
            lastModification = (datetime.datetime.now() - time).total_seconds()
            if lastModification < self.maxSessionTime:
                with open(self.sessionFile, "rb") as f:
                    self.session = pickle.load(f)
                    wasReadFromCache = True
                    if self.debug:
                        print("loaded session from cache (last access %ds ago) "
                              % lastModification)
        if not wasReadFromCache:
            self.session = requests.Session()
            self.session.headers.update({'user-agent' : self.userAgent})
            res = self.session.post(self.loginUrl, data = self.loginData, 
                                    proxies = self.proxies, **kwargs)

            if self.debug:
                print('created new session with login' )
            self.saveSessionToCache()

        # test login
        res = self.session.get(self.loginTestUrl)
        if res.text.lower().find(self.loginTestString.lower()) < 0:
            raise Exception("could not log into provided site '%s'"
                            " (did not find successful login string)"
                            % self.loginUrl)

    def saveSessionToCache(self):
        """
        save session to a cache file
        """
        # always save (to update timeout)
        with open(self.sessionFile, "wb") as f:
            pickle.dump(self.session, f)
            if self.debug:
                print('updated session cache-file %s' % self.sessionFile)

    def retrieveContent(self, url, method = "get", postData = None, **kwargs):
        """
        return the content of the url with respect to the session.

        If 'method' is not 'get', the url will be called with 'postData'
        as a post request.
        """
        if method == 'get':
            res = self.session.get(url , proxies = self.proxies, **kwargs)
        else:
            res = self.session.post(url , data = postData, proxies = self.proxies, **kwargs)

        # the session has been updated on the server, so also update in cache
        self.saveSessionToCache()            

        return res

A code snippet for using the above class may look like this:

if __name__ == "__main__":
    # proxies = {'https' : 'https://user:pass@server:port',
    #           'http' : 'http://user:pass@server:port'}

    loginData = {'user' : 'usr',
                 'password' :  'pwd'}

    loginUrl = 'https://...'
    loginTestUrl = 'https://...'
    successStr = 'Hello Tom'
    s = MyLoginSession(loginUrl, loginData, loginTestUrl, successStr, 
                       #proxies = proxies
                       )

    res = s.retrieveContent('https://....')
    print(res.text)

    # if, for instance, login via JSON values required try this:
    s = MyLoginSession(loginUrl, None, loginTestUrl, successStr, 
                       #proxies = proxies,
                       json = loginData)
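
When comparing a cache file's age against a timeout such as maxSessionTimeSeconds, it may help to know that a timedelta's seconds attribute holds only the seconds component and ignores whole days, while total_seconds() returns the full duration. The sketch below illustrates the difference; the helper name age_in_seconds and the two timestamps are made up for this example.

```python
import datetime


def age_in_seconds(modified, now):
    """Return the full elapsed time between two datetimes, in seconds."""
    return (now - modified).total_seconds()


# Hypothetical timestamps: the cache file is one day and five minutes old.
modified = datetime.datetime(2024, 1, 1, 12, 0, 0)
now = datetime.datetime(2024, 1, 2, 12, 5, 0)

delta = now - modified
print(delta.seconds)                  # seconds component only, days dropped
print(age_in_seconds(modified, now))  # whole duration in seconds
```

With a 30-minute timeout, a check based on the seconds component alone would treat this day-old file as fresh.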

Answer 2

Check out my answer in this similar question:

python: urllib2 how to send cookie with urlopen request

import urllib2
import urllib
from cookielib import CookieJar

cj = CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# input-type values from the html form
formdata = { "username" : username, "password": password, "form-id" : "1234" }
data_encoded = urllib.urlencode(formdata)
response = opener.open("https://page.com/login.php", data_encoded)
content = response.read()

EDIT:

I see I’ve gotten a few downvotes for my answer, but no explaining comments. I’m guessing it’s because I’m referring to the urllib libraries instead of requests. I do that because the OP asks for help with requests or for someone to suggest another approach.
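
The snippet above targets Python 2 (urllib2, cookielib). A rough Python 3 equivalent, using urllib.request and http.cookiejar, might look like the sketch below. To keep it self-contained it starts a throwaway local server; the handler class, the /login and /profile paths, the cookie name and the form values are all invented for this demo and stand in for a real site.

```python
import http.cookiejar
import threading
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Toy server: POST /login sets a cookie, GET echoes the Cookie header."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # consume the submitted form data
        self.send_response(200)
        self.send_header("Set-Cookie", "session_id=abc123; Path=/")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        body = (self.headers.get("Cookie") or "").encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}),          # ignore any proxy settings
    urllib.request.HTTPCookieProcessor(cj),   # store and resend cookies
)

formdata = {"username": "me", "password": "pw"}
data_encoded = urllib.parse.urlencode(formdata).encode("ascii")
opener.open(base + "/login", data_encoded).read()

# The opener now sends the stored cookie back automatically.
echoed = opener.open(base + "/profile").read().decode()
print(echoed)
server.shutdown()
```

The key point is that every request made through the same opener shares one cookie jar, which is the role requests.Session plays in the requests library.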


Answer 3

The documentation says that get takes in an optional cookies argument allowing you to specify cookies to use:

From the docs:

>>> url = 'http://httpbin.org/cookies'
>>> cookies = dict(cookies_are='working')

>>> r = requests.get(url, cookies=cookies)
>>> r.text
'{"cookies": {"cookies_are": "working"}}'

http://docs.python-requests.org/en/latest/user/quickstart/#cookies


Answer 4

Upon trying all the answers above, I found that using “RequestsCookieJar” instead of the regular CookieJar for subsequent requests fixed my problem.

import requests
import json

# The Login URL
authUrl = 'https://whatever.com/login'

# The subsequent URL
testUrl = 'https://whatever.com/someEndpoint'

# Logout URL
testlogoutUrl = 'https://whatever.com/logout'

# Whatever you are posting
login_data =  {'formPosted':'1', 
               'login_email':'me@example.com', 
               'password':'pw'
               }

# The Authentication token or any other data that we will receive from the Authentication Request. 
token = ''

# Post the login Request
loginRequest = requests.post(authUrl, login_data)
print("{}".format(loginRequest.text))

# Save the request content to your variable. In this case I needed a field called token. 
token = str(json.loads(loginRequest.content)['token'])  # or ['access_token']
print("{}".format(token))

# Verify Successful login
print("{}".format(loginRequest.status_code))

# Create your Requests Cookie Jar for your subsequent requests and add the cookie
jar = requests.cookies.RequestsCookieJar()
jar.set('LWSSO_COOKIE_KEY', token)

# Execute your next request(s) with the Request Cookie Jar set
r = requests.get(testUrl, cookies=jar)
print("R.TEXT: {}".format(r.text))
print("R.STCD: {}".format(r.status_code))

# Execute your logout request(s) with the Request Cookie Jar set
r = requests.delete(testlogoutUrl, cookies=jar)
print("R.TEXT: {}".format(r.text))  # should show "Request Not Authorized"
print("R.STCD: {}".format(r.status_code))  # should show 401

Answer 5

Snippet to retrieve password-protected JSON data:

import requests

username = "my_user_name"
password = "my_super_secret"
url = "https://www.my_base_url.com"
the_page_i_want = "/my_json_data_page"

session = requests.Session()
# retrieve cookie value
resp = session.get(url+'/login')
csrf_token = resp.cookies['csrftoken']
# login, add referer
resp = session.post(url+"/login",
                  data={
                      'username': username,
                      'password': password,
                      'csrfmiddlewaretoken': csrf_token,
                      'next': the_page_i_want,
                  },
                  headers=dict(Referer=url+"/login"))
print(resp.json())

Answer 6

Save only required cookies and reuse them.

import os
import pickle
from urllib.parse import urljoin, urlparse

import requests

login = 'my@email.com'
password = 'secret'
# Assuming two cookies are used for persistent login.
# (Find it by tracing the login process)
persistentCookieNames = ['sessionId', 'profileId']
URL = 'http://example.com'
urlData = urlparse(URL)
cookieFile = urlData.netloc + '.cookie'
signinUrl = urljoin(URL, "/signin")
with requests.Session() as session:
    try:
        with open(cookieFile, 'rb') as f:
            print("Loading cookies...")
            session.cookies.update(pickle.load(f))
    except Exception:
        # If the cookies could not be loaded from file, get new ones by logging in
        print("Logging in...")
        post = session.post(
            signinUrl,
            data={
                'email': login,
                'password': password,
            }
        )
        try:
            with open(cookieFile, 'wb') as f:
                jar = requests.cookies.RequestsCookieJar()
                for cookie in session.cookies:
                    if cookie.name in persistentCookieNames:
                        jar.set_cookie(cookie)
                pickle.dump(jar, f)
        except Exception:
            os.remove(cookieFile)
            raise
    MyPage = urljoin(URL, "/mypage")
    page = session.get(MyPage)

Answer 7

This will work for you in Python:

# Call JIRA API with HTTPBasicAuth
import json
import requests
from requests.auth import HTTPBasicAuth

JIRA_EMAIL = "****"
JIRA_TOKEN = "****"
BASE_URL = "https://****.atlassian.net"
API_URL = "/rest/api/3/serverInfo"

API_URL = BASE_URL+API_URL

BASIC_AUTH = HTTPBasicAuth(JIRA_EMAIL, JIRA_TOKEN)
HEADERS = {'Content-Type' : 'application/json;charset=iso-8859-1'}

response = requests.get(
    API_URL,
    headers=HEADERS,
    auth=BASIC_AUTH
)

print(json.dumps(json.loads(response.text), sort_keys=True, indent=4, separators=(",", ": ")))

What is python's site-packages directory?

Question: What is python's site-packages directory?

The directory site-packages is mentioned in various Python related articles. What is it? How to use it?


Answer 0

site-packages is the target directory of manually built Python packages. When you build and install Python packages from source (using distutils, probably by executing python setup.py install), you will find the installed modules in site-packages by default.

There are standard locations:

  • Unix (pure) [1]: prefix/lib/pythonX.Y/site-packages
  • Unix (non-pure): exec-prefix/lib/pythonX.Y/site-packages
  • Windows: prefix\Lib\site-packages

[1] Pure means that the module uses only Python code. Non-pure modules can contain C/C++ code as well.

site-packages is by default part of the Python search path, so modules installed there can be imported easily afterwards.
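
As a small sketch, the standard sysconfig module can report where the running interpreter installs pure and non-pure packages, and sys.path shows whether that directory is on the search path. The exact directories printed depend on the installation (some Linux distributions use a differently named directory), so no particular output is assumed here.

```python
import sys
import sysconfig

paths = sysconfig.get_paths()

# "purelib" is where pure-Python packages are installed;
# "platlib" is where packages with compiled extensions are installed.
purelib = paths["purelib"]
platlib = paths["platlib"]
print("pure:    ", purelib)
print("non-pure:", platlib)

# Check whether the pure-Python directory is on the module search path.
print("on sys.path:", purelib in sys.path)
```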



Answer 1

When you use the --user option with pip, the package is installed in the user's folder instead of the global folder, so you don't need to run the pip command with admin privileges.

The location of the user's packages folder can be found using:

python -m site --user-site

This will print something like:

C:\Users\%USERNAME%\AppData\Roaming\Python\Python35\site-packages

When you don't use the --user option with pip, the package is installed in the global folder given by:

python -c "import site; print(site.getsitepackages())"

This will print something like:

['C:\\Program Files\\Anaconda3', 'C:\\Program Files\\Anaconda3\\lib\\site-packages']

Note: The values printed above are from Windows 10 with Anaconda 4.x installed with default settings.


Answer 2

site-packages is just the location where Python installs its modules.

No need to “find it”: Python knows where it is, because this location is always part of the module search path (sys.path).

Programmatically you can find it this way:

import sys
site_packages = next(p for p in sys.path if 'site-packages' in p)
print(site_packages)

/Users/foo/.envs/env1/lib/python2.7/site-packages


Replacing instances of a character in a string

Question: Replacing instances of a character in a string

This simple code, which tries to replace semicolons (at positions specified by i) with colons, does not work:

for i in range(0,len(line)):
     if (line[i]==";" and i in rightindexarray):
         line[i]=":"

It gives the error

line[i]=":"
TypeError: 'str' object does not support item assignment

How can I work around this to replace the semicolons with colons? Using replace does not work, as that function takes no index; there might be some semicolons I do not want to replace.

Example

In the string I might have any number of semicolons, eg “Hei der! ; Hello there ;!;”

I know which ones I want to replace (I have their index in the string). Using replace does not work as I’m not able to use an index with it.


Answer 0

Strings in python are immutable, so you cannot treat them as a list and assign to indices.

Use .replace() instead:

line = line.replace(';', ':')

If you need to replace only certain semicolons, you’ll need to be more specific. You could use slicing to isolate the section of the string to replace in:

line = line[:10].replace(';', ':') + line[10:]

That’ll replace all semi-colons in the first 10 characters of the string.
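
When the positions to change are known in advance, as in the question, one possible sketch is to convert the string to a list, change only the listed indices, and join the result. The function name replace_at and the index values below are chosen for illustration; rightindexarray is the name used in the question.

```python
def replace_at(line, indices, old=";", new=":"):
    """Return a copy of line with old replaced by new at the given indices only."""
    chars = list(line)  # lists are mutable, strings are not
    for i in set(indices):
        # Skip out-of-range indices and positions that do not hold `old`.
        if 0 <= i < len(chars) and chars[i] == old:
            chars[i] = new
    return "".join(chars)


line = "Hei der! ; Hello there ;!;"
rightindexarray = [9, 25]  # hypothetical positions chosen for this demo
print(replace_at(line, rightindexarray))
```

The semicolon at index 23 is left alone because it is not in the index list.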


Answer 1

If you would rather not use .replace(), you can replace the character at a given index like this:

word = 'python'
index = 4
char = 'i'

word = word[:index] + char + word[index + 1:]
print(word)

Output: pythin

Answer 2

Turn the string into a list; then you can change the characters individually. Then you can put it back together with .join:

s = 'a;b;c;d'
slist = list(s)
for i, c in enumerate(slist):
    if slist[i] == ';' and 0 <= i <= 3: # only replaces semicolons in the first part of the text
        slist[i] = ':'
s = ''.join(slist)
print(s)  # prints a:b:c;d

Answer 3

If you want to replace a single semicolon:

for i in range(0,len(line)):
 if (line[i]==";"):
     line = line[:i] + ":" + line[i+1:]

I haven't tested it, though.


Answer 4

This should cover a slightly more general case, but you should be able to customize it for your purpose

def selectiveReplace(myStr):
    answer = []
    for index,char in enumerate(myStr):
        if char == ';':
            if index%2 == 1: # replace ';' at odd indices with ":"
                answer.append(":")
            else:
                answer.append("!") # replace ';' at even indices with "!"
        else:
            answer.append(char)
    return ''.join(answer)

Hope this helps


Answer 5

You cannot simply assign value to a character in the string. Use this method to replace value of a particular character:

name = "India"
result=name .replace("d",'*')

Output: In*ia

Also, if you want to replace say * for all the occurrences of the first character except the first character, eg. string = babble output = ba**le

Code:

name = "babble"
front= name [0:1]
fromSecondCharacter = name [1:]
back=fromSecondCharacter.replace(front,'*')
return front+back
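
The second snippet could be wrapped in a function so that it can be reused and so that a return statement makes sense. A minimal sketch; the function name mask_repeats_of_first and the mask parameter are assumptions introduced here for illustration:

```python
def mask_repeats_of_first(name, mask="*"):
    """Keep the first character; replace its later occurrences with mask."""
    if not name:
        return name  # nothing to do for an empty string
    front, rest = name[0], name[1:]
    return front + rest.replace(front, mask)

print(mask_repeats_of_first("babble"))  # ba**le
```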

Answer 6

If you are replacing the character at an index given in the variable 'n', try the following:

def missing_char(text, n):
    # Slice around index n so that only that one character is replaced.
    # (text.replace(text[n], ":") would replace every occurrence of that character.)
    return text[:n] + ":" + text[n+1:]
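
One thing to keep in mind with index-based replacement is that str.replace works on values, not positions, so it changes every occurrence of the character it is given. The sketch below contrasts that with slicing; the helper name replace_at and the sample string are hypothetical:

```python
def replace_at(text, n, new_char=":"):
    """Return a copy of text with only the character at index n replaced."""
    if not 0 <= n < len(text):
        raise IndexError("index out of range")
    return text[:n] + new_char + text[n + 1:]

sample = "a;b;c"
by_index = replace_at(sample, 1)            # only index 1 changes
by_value = sample.replace(sample[1], ":")   # every ';' changes

print(by_index)  # a:b;c
print(by_value)  # a:b:c
```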

Answer 7

How about this:

sentence = 'After 1500 years of that thinking surpressed'

sentence = sentence.lower()

def removeLetter(text, char):
    # Despite the name, this masks every occurrence of char with '*'
    # rather than deleting it.
    return text.replace(char, '*')

text = removeLetter(sentence, 'a')

Answer 8

To use the .replace() method on strings without creating a separate list, take for example the list usernames, whose strings contain spaces. We want to replace the space in each username with an underscore.

usernames = ["Joey Tribbiani", "Monica Geller", "Chandler Bing", "Phoebe Buffay"]

To replace the spaces in each username, loop over the indices with the range function and assign each result back into the list:

for i in range(len(usernames)):
    usernames[i] = usernames[i].lower().replace(" ", "_")

print(usernames)
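
If building a new list is acceptable, a list comprehension is a common alternative to the index loop. This is a sketch rather than a replacement for the approach above; the name cleaned is introduced here for illustration:

```python
usernames = ["Joey Tribbiani", "Monica Geller", "Chandler Bing", "Phoebe Buffay"]

# Build a new list; the original list is left unchanged.
cleaned = [name.lower().replace(" ", "_") for name in usernames]

print(cleaned)
# ['joey_tribbiani', 'monica_geller', 'chandler_bing', 'phoebe_buffay']
```

To keep the in-place behavior of the loop, the comprehension can be assigned to a full slice instead, as in usernames[:] = [...].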

Answer 9

To replace a character at a specific position, the function is as follows:

def replace_char(s, n, c):
    n -= 1  # convert the 1-based position to a 0-based index
    s = s[:n] + c + s[n+1:]
    return s

where s is the string, n is the 1-based position of the character to replace, and c is the replacement character.


Answer 10

I wrote this method to replace a character or substring at a specific instance (occurrence). Instances are counted from 0 (this can easily be changed to 1 by changing the default of the optional inst argument to 1 and the initial value of the test_instance variable to 1).

def replace_instance(some_word, str_to_replace, new_str='', inst=0):
    # Fall back to the unchanged input if the requested instance is not found.
    return_word = some_word
    char_index, test_instance = 0, 0
    while char_index < len(some_word):
        test_str = some_word[char_index: char_index + len(str_to_replace)]
        if test_str == str_to_replace:
            if test_instance == inst:
                return_word = some_word[:char_index] + new_str + some_word[char_index + len(str_to_replace):]
                break
            else:
                test_instance += 1
        char_index += 1
    return return_word
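
The same idea can be sketched with str.find, which does the substring scanning itself. This is an illustrative variant, not the author's method; the name replace_nth is hypothetical, and it assumes that a missing occurrence should leave the input unchanged:

```python
def replace_nth(text, old, new="", n=0):
    """Replace the n-th (0-based) occurrence of old in text with new."""
    if not old:
        return text  # an empty search string has no meaningful occurrences
    index = text.find(old)
    while index != -1 and n > 0:
        # Resume the search one character past the previous match.
        index = text.find(old, index + 1)
        n -= 1
    if index == -1:
        return text  # fewer than n + 1 occurrences: nothing to replace
    return text[:index] + new + text[index + len(old):]

print(replace_nth("a;b;c;d", ";", ":", 1))  # a;b:c;d
```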

Answer 11

You can do this:

string = "this; is a; sample; ; python code;!;"  # your input string
result = ""
for i in range(len(string)):
    s = string[i]
    if s == ";" and i in [4, 18, 20]:  # indices at which ';' should be replaced
        s = ":"
    result = result + s
print(result)
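
The loop above can also be expressed with enumerate and str.join, which avoids building the result by repeated concatenation. A minimal sketch using the same sample string and indices; the names text and targets are introduced here for illustration:

```python
text = "this; is a; sample; ; python code;!;"
targets = {4, 18, 20}  # indices where ';' should become ':'

# Emit ':' for a ';' at a target index, and the original character otherwise.
result = "".join(
    ":" if ch == ";" and i in targets else ch
    for i, ch in enumerate(text)
)

print(result)  # this: is a; sample: : python code;!;
```

Using a set for the target indices keeps each membership check cheap even when the list of positions grows.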

Answer 12

names = ["Joey Tribbiani", "Monica Geller", "Chandler Bing", "Phoebe Buffay"]

usernames = []

for i in names:
    if " " in i:
        i = i.replace(" ", "_")
    usernames.append(i)  # collect the converted name
    print(i)

Output (one name per line): Joey_Tribbiani Monica_Geller Chandler_Bing Phoebe_Buffay