Python 实用宝典

Question 1

In Python, how big can a list get? I need a list of about 12000 elements. Will I still be able to run list methods such as sorting, etc?

Question 2

According to the source code, the maximum size of a list is PY_SSIZE_T_MAX/sizeof(PyObject*).

PY_SSIZE_T_MAX is defined in pyport.h to be ((size_t) -1)>>1

On a regular 32bit system, this is (4294967295 / 2) / 4 or 536870912.

Therefore the maximum size of a python list on a 32 bit system is 536,870,912 elements.

As long as the number of elements you have is equal or below this, all list functions should operate correctly.

Question 3

As the Python documentation says:

sys.maxsize

The largest positive integer supported by the platform’s Py_ssize_t type, and thus the maximum size lists, strings, dicts, and many other containers can have.

In my computer (Linux x86_64):

>>> import sys
>>> print sys.maxsize
9223372036854775807

Question 4

Sure it is OK. Actually you can see for yourself easily:

l = range(12000)
l = sorted(l, reverse=True)

Running the those lines on my machine took:

real    0m0.036s
user    0m0.024s
sys  0m0.004s

But sure as everyone else said. The larger the array the slower the operations will be.

Question 5

In casual code I’ve created lists with millions of elements. I believe that Python’s implementation of lists are only bound by the amount of memory on your system.

In addition, the list methods / functions should continue to work despite the size of the list.

If you care about performance, it might be worthwhile to look into a library such as NumPy.

Question 6

Performance characteristics for lists are described on Effbot.

Python lists are actually implemented as vector for fast random access, so the container will basically hold as many items as there is space for in memory. (You need space for pointers contained in the list as well as space in memory for the object(s) being pointed to.)

Appending is O(1) (amortized constant complexity), however, inserting into/deleting from the middle of the sequence will require an O(n) (linear complexity) reordering, which will get slower as the number of elements in your list.

Your sorting question is more nuanced, since the comparison operation can take an unbounded amount of time. If you’re performing really slow comparisons, it will take a long time, though it’s no fault of Python’s list data type.

Reversal just takes the amount of time it required to swap all the pointers in the list (necessarily O(n) (linear complexity), since you touch each pointer once).

Question 7

12000 elements is nothing in Python… and actually the number of elements can go as far as the Python interpreter has memory on your system.

Question 8

It varies for different systems (depends on RAM). The easiest way to find out is

import six six.MAXSIZE 9223372036854775807 This gives the max size of list and dict too ,as per the documentation

Question 9

I’d say you’re only limited by the total amount of RAM available. Obviously the larger the array the longer operations on it will take.

Question 10

I got this from here on a x64 bit system: Python 3.7.0b5 (v3.7.0b5:abb8802389, May 31 2018, 01:54:01) [MSC v.1913 64 bit (AMD64)] on win32

Question 11

There is no limitation of list number. The main reason which causes your error is the RAM. Please upgrade your memory size.

Question 12

I need to run a shell command asynchronously from a Python script. By this I mean that I want my Python script to continue running while the external command goes off and does whatever it needs to do.

I read this post:

Calling an external command in Python

I then went off and did some testing, and it looks like os.system() will do the job provided that I use & at the end of the command so that I don’t have to wait for it to return. What I am wondering is if this is the proper way to accomplish such a thing? I tried commands.call() but it will not work for me because it blocks on the external command.

Please let me know if using os.system() for this is advisable or if I should try some other route.

Question 13

subprocess.Popen does exactly what you want.

from subprocess import Popen
p = Popen(['watch', 'ls']) # something long running
# ... do other stuff while subprocess is running
p.terminate()

(Edit to complete the answer from comments)

The Popen instance can do various other things like you can poll() it to see if it is still running, and you can communicate() with it to send it data on stdin, and wait for it to terminate.

Question 14

If you want to run many processes in parallel and then handle them when they yield results, you can use polling like in the following:

from subprocess import Popen, PIPE
import time

running_procs = [
    Popen(['/usr/bin/my_cmd', '-i %s' % path], stdout=PIPE, stderr=PIPE)
    for path in '/tmp/file0 /tmp/file1 /tmp/file2'.split()]

while running_procs:
    for proc in running_procs:
        retcode = proc.poll()
        if retcode is not None: # Process finished.
            running_procs.remove(proc)
            break
        else: # No process is done, wait a bit and check again.
            time.sleep(.1)
            continue

    # Here, `proc` has finished with return code `retcode`
    if retcode != 0:
        """Error handling."""
    handle_results(proc.stdout)

The control flow there is a little bit convoluted because I’m trying to make it small — you can refactor to your taste. :-)

This has the advantage of servicing the early-finishing requests first. If you call communicate on the first running process and that turns out to run the longest, the other running processes will have been sitting there idle when you could have been handling their results.

Question 15

What I am wondering is if this [os.system()] is the proper way to accomplish such a thing?

No. os.system() is not the proper way. That’s why everyone says to use subprocess.

For more information, read http://docs.python.org/library/os.html#os.system

The subprocess module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using this function. Use the subprocess module. Check especially the Replacing Older Functions with the subprocess Module section.

Question 16

I’ve had good success with the asyncproc module, which deals nicely with the output from the processes. For example:

import os
from asynproc import Process
myProc = Process("myprogram.app")

while True:
    # check to see if process has ended
    poll = myProc.wait(os.WNOHANG)
    if poll is not None:
        break
    # print any new output
    out = myProc.read()
    if out != "":
        print out

Question 17

Using pexpect with non-blocking readlines is another way to do this. Pexpect solves the deadlock problems, allows you to easily run the processes in the background, and gives easy ways to have callbacks when your process spits out predefined strings, and generally makes interacting with the process much easier.

Question 18

Considering “I don’t have to wait for it to return”, one of the easiest solutions will be this:

subprocess.Popen( \
    [path_to_executable, arg1, arg2, ... argN],
    creationflags = subprocess.CREATE_NEW_CONSOLE,
).pid

But… From what I read this is not “the proper way to accomplish such a thing” because of security risks created by subprocess.CREATE_NEW_CONSOLE flag.

The key things that happen here is use of subprocess.CREATE_NEW_CONSOLE to create new console and .pid (returns process ID so that you could check program later on if you want to) so that not to wait for program to finish its job.

Question 19

I have the same problem trying to connect to an 3270 terminal using the s3270 scripting software in Python. Now I’m solving the problem with an subclass of Process that I found here:

http://code.activestate.com/recipes/440554/

And here is the sample taken from file:

def recv_some(p, t=.1, e=1, tr=5, stderr=0):
    if tr < 1:
        tr = 1
    x = time.time()+t
    y = []
    r = ''
    pr = p.recv
    if stderr:
        pr = p.recv_err
    while time.time() < x or r:
        r = pr()
        if r is None:
            if e:
                raise Exception(message)
            else:
                break
        elif r:
            y.append(r)
        else:
            time.sleep(max((x-time.time())/tr, 0))
    return ''.join(y)

def send_all(p, data):
    while len(data):
        sent = p.send(data)
        if sent is None:
            raise Exception(message)
        data = buffer(data, sent)

if __name__ == '__main__':
    if sys.platform == 'win32':
        shell, commands, tail = ('cmd', ('dir /w', 'echo HELLO WORLD'), '\r\n')
    else:
        shell, commands, tail = ('sh', ('ls', 'echo HELLO WORLD'), '\n')

    a = Popen(shell, stdin=PIPE, stdout=PIPE)
    print recv_some(a),
    for cmd in commands:
        send_all(a, cmd + tail)
        print recv_some(a),
    send_all(a, 'exit' + tail)
    print recv_some(a, e=0)
    a.wait()

Question 20

The accepted answer is very old.

I found a better modern answer here:

https://kevinmccarthy.org/2016/07/25/streaming-subprocess-stdin-and-stdout-with-asyncio-in-python/

and made some changes:

make it work on windows
make it work with multiple commands

import sys
import asyncio

if sys.platform == "win32":
    asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())


async def _read_stream(stream, cb):
    while True:
        line = await stream.readline()
        if line:
            cb(line)
        else:
            break


async def _stream_subprocess(cmd, stdout_cb, stderr_cb):
    try:
        process = await asyncio.create_subprocess_exec(
            *cmd, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE
        )

        await asyncio.wait(
            [
                _read_stream(process.stdout, stdout_cb),
                _read_stream(process.stderr, stderr_cb),
            ]
        )
        rc = await process.wait()
        return process.pid, rc
    except OSError as e:
        # the program will hang if we let any exception propagate
        return e


def execute(*aws):
    """ run the given coroutines in an asyncio loop
    returns a list containing the values returned from each coroutine.
    """
    loop = asyncio.get_event_loop()
    rc = loop.run_until_complete(asyncio.gather(*aws))
    loop.close()
    return rc


def printer(label):
    def pr(*args, **kw):
        print(label, *args, **kw)

    return pr


def name_it(start=0, template="s{}"):
    """a simple generator for task names
    """
    while True:
        yield template.format(start)
        start += 1


def runners(cmds):
    """
    cmds is a list of commands to excecute as subprocesses
    each item is a list appropriate for use by subprocess.call
    """
    next_name = name_it().__next__
    for cmd in cmds:
        name = next_name()
        out = printer(f"{name}.stdout")
        err = printer(f"{name}.stderr")
        yield _stream_subprocess(cmd, out, err)


if __name__ == "__main__":
    cmds = (
        [
            "sh",
            "-c",
            """echo "$SHELL"-stdout && sleep 1 && echo stderr 1>&2 && sleep 1 && echo done""",
        ],
        [
            "bash",
            "-c",
            "echo 'hello, Dave.' && sleep 1 && echo dave_err 1>&2 && sleep 1 && echo done",
        ],
        [sys.executable, "-c", 'print("hello from python");import sys;sys.exit(2)'],
    )

    print(execute(*runners(cmds)))

It is unlikely that the example commands will work perfectly on your system, and it doesn’t handle weird errors, but this code does demonstrate one way to run multiple subprocesses using asyncio and stream the output.

Question 21

There are several answers here but none of them satisfied my below requirements:

I don’t want to wait for command to finish or pollute my terminal with subprocess outputs.
I want to run bash script with redirects.
I want to support piping within my bash script (for example find ... | tar ...).

The only combination that satiesfies above requirements is:

subprocess.Popen(['./my_script.sh "arg1" > "redirect/path/to"'],
                 stdout=subprocess.PIPE, 
                 stderr=subprocess.PIPE,
                 shell=True)

Question 22

This is covered by Python 3 Subprocess Examples under “Wait for command to terminate asynchronously”:

import asyncio

proc = await asyncio.create_subprocess_exec(
    'ls','-lha',
    stdout=asyncio.subprocess.PIPE,
    stderr=asyncio.subprocess.PIPE)

# do something else while ls is working

# if proc takes very long to complete, the CPUs are free to use cycles for 
# other processes
stdout, stderr = await proc.communicate()

The process will start running as soon as the await asyncio.create_subprocess_exec(...) has completed. If it hasn’t finished by the time you call await proc.communicate(), it will wait there in order to give you your output status. If it has finished, proc.communicate() will return immediately.

The gist here is similar to Terrels answer but I think Terrels answer appears to overcomplicate things.

See asyncio.create_subprocess_exec for more information.

Question 23

Suppose I have a class with a constructor (or other function) that takes a variable number of arguments and then sets them as class attributes conditionally.

I could set them manually, but it seems that variable parameters are common enough in python that there should be a common idiom for doing this. But I’m not sure how to do this dynamically.

I have an example using eval, but that’s hardly safe. I want to know the proper way to do this — maybe with lambda?

class Foo:
    def setAllManually(self, a=None, b=None, c=None):
        if a!=None: 
            self.a = a
        if b!=None:
            self.b = b
        if c!=None:
            self.c = c
    def setAllWithEval(self, **kwargs):
        for key in **kwargs:
            if kwargs[param] != None
                eval("self." + key + "=" + kwargs[param])

Question 24

You could update the __dict__ attribute (which represents the instance attributes in the form of a dictionary) with the keyword arguments:

class Bar(object):
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

then you can:

>>> bar = Bar(a=1, b=2)
>>> bar.a
1

and with something like:

allowed_keys = {'a', 'b', 'c'}
self.__dict__.update((k, v) for k, v in kwargs.items() if k in allowed_keys)

you could filter the keys beforehand (use iteritems instead of items if you’re still using Python 2.x).

Question 25

You can use the setattr() method:

class Foo:
  def setAllWithKwArgs(self, **kwargs):
    for key, value in kwargs.items():
      setattr(self, key, value)

There is an analogous getattr() method for retrieving attributes.

Question 26

Most answers here do not cover a good way to initialize all allowed attributes to just one default value. So, to add to the answers given by @fqxp and @mmj:

class Myclass:

    def __init__(self, **kwargs):
        # all those keys will be initialized as class attributes
        allowed_keys = set(['attr1','attr2','attr3'])
        # initialize all allowed keys to false
        self.__dict__.update((key, False) for key in allowed_keys)
        # and update the given keys by their given values
        self.__dict__.update((key, value) for key, value in kwargs.items() if key in allowed_keys)

Question 27

I propose a variation of fqxp’s answer, which, in addition to allowed attributes, lets you set default values for attributes:

class Foo():
    def __init__(self, **kwargs):
        # define default attributes
        default_attr = dict(a=0, b=None, c=True)
        # define (additional) allowed attributes with no default value
        more_allowed_attr = ['d','e','f']
        allowed_attr = list(default_attr.keys()) + more_allowed_attr
        default_attr.update(kwargs)
        self.__dict__.update((k,v) for k,v in default_attr.items() if k in allowed_attr)

This is Python 3.x code, for Python 2.x you need at least one adjustment, iteritems() in place of items().

VERY LATE FOLLOW UP

I recently rewrote the above code as a class decorator, so that hard coding of attributes is reduced to a minimum. In some way it resembles the @dataclass decorator, which is what you might want to use instead.

# class decorator definition
def classattributes(default_attr,more_allowed_attr):
    def class_decorator(cls):
        def new_init(self,*args,**kwargs):
            allowed_attr = list(default_attr.keys()) + more_allowed_attr
            default_attr.update(kwargs)
            self.__dict__.update((k,v) for k,v in default_attr.items() if k in allowed_attr)
        cls.__init__ = new_init
        return cls
    return class_decorator

# usage:
# 1st arg is a dict of attributes with default values
# 2nd arg is a list of additional allowed attributes which may be instantiated or not
@classattributes( dict(a=0, b=None, c=True) , ['d','e','f'] )
class Foo():
    pass # add here class body except __init__

@classattributes( dict(g=0, h=None, j=True) , ['k','m','n'] )
class Bar():
    pass # add here class body except __init__

obj1 = Foo(d=999,c=False)
obj2 = Bar(h=-999,k="Hello")

obj1.__dict__ # {'a': 0, 'b': None, 'c': False, 'd': 999}
obj2.__dict__ # {'g': 0, 'h': -999, 'j': True, 'k': 'Hello'}

Question 28

Yet another variant based on the excellent answers by mmj and fqxp. What if we want to

Avoid hardcoding a list of allowed attributes
Directly and explicitly set default values for each attributes in the constructor
Restrict kwargs to predefined attributes by either
- silently rejecting invalid arguments or, alternatively,
- raising an error.

By “directly”, I mean avoiding an extraneous default_attributes dictionary.

class Bar(object):
    def __init__(self, **kwargs):

        # Predefine attributes with default values
        self.a = 0
        self.b = 0
        self.A = True
        self.B = True

        # get a list of all predefined values directly from __dict__
        allowed_keys = list(self.__dict__.keys())

        # Update __dict__ but only for keys that have been predefined 
        # (silently ignore others)
        self.__dict__.update((key, value) for key, value in kwargs.items() 
                             if key in allowed_keys)

        # To NOT silently ignore rejected keys
        rejected_keys = set(kwargs.keys()) - set(allowed_keys)
        if rejected_keys:
            raise ValueError("Invalid arguments in constructor:{}".format(rejected_keys))

Not a major breakthrough, but maybe useful to someone…

EDIT: If our class uses @property decorators to encapsulate “protected” attributes with getters and setters, and if we want to be able to set these properties with our constructor, we may want to expand the allowed_keys list with values from dir(self), as follows:

allowed_keys = [i for i in dir(self) if "__" not in i and any([j.endswith(i) for j in self.__dict__.keys()])]

The above code excludes

any hidden variable from dir() (exclusion based on presence of “__”), and
any method from dir() whose name is not found in the end of an attribute name (protected or otherwise) from __dict__.keys(), thereby likely keeping only @property decorated methods.

This edit is likely only valid for Python 3 and above.

Question 29

class SymbolDict(object):
  def __init__(self, **kwargs):
    for key in kwargs:
      setattr(self, key, kwargs[key])

x = SymbolDict(foo=1, bar='3')
assert x.foo == 1

I called the class SymbolDict because it essentially is a dictionary that operates using symbols instead of strings. In other words, you do x.foo instead of x['foo'] but under the covers it’s really the same thing going on.

Question 30

The following solutions vars(self).update(kwargs) or self.__dict__.update(**kwargs) are not robust, because the user can enter any dictionary with no error messages. If I need to check that the user insert the following signature (‘a1’, ‘a2’, ‘a3’, ‘a4’, ‘a5’) the solution does not work. Moreover, the user should be able to use the object by passing the “positional parameters” or the “kay-value pairs parameters”.

So I suggest the following solution by using a metaclass.

from inspect import Parameter, Signature

class StructMeta(type):
    def __new__(cls, name, bases, dict):
        clsobj = super().__new__(cls, name, bases, dict)
        sig = cls.make_signature(clsobj._fields)
        setattr(clsobj, '__signature__', sig)
        return clsobj

def make_signature(names):
    return Signature(
        Parameter(v, Parameter.POSITIONAL_OR_KEYWORD) for v in names
    )

class Structure(metaclass = StructMeta):
    _fields = []
    def __init__(self, *args, **kwargs):
        bond = self.__signature__.bind(*args, **kwargs)
        for name, val in bond.arguments.items():
            setattr(self, name, val)

if __name__ == 'main':

   class A(Structure):
      _fields = ['a1', 'a2']

   if __name__ == '__main__':
      a = A(a1 = 1, a2 = 2)
      print(vars(a))

      a = A(**{a1: 1, a2: 2})
      print(vars(a))

Question 31

Their might be a better solution but what comes to mind for me is:

class Test:
    def __init__(self, *args, **kwargs):
        self.args=dict(**kwargs)

    def getkwargs(self):
        print(self.args)

t=Test(a=1, b=2, c="cats")
t.getkwargs()


python Test.py 
{'a': 1, 'c': 'cats', 'b': 2}

Question 32

this one is the easiest via larsks

class Foo:
    def setAllWithKwArgs(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)

my example:

class Foo:
    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)

door = Foo(size='180x70', color='red chestnut', material='oak')
print(door.size) #180x70

Question 33

I suspect it might be better in most instances to use named args (for better self documenting code) so it might look something like this:

class Foo:
    def setAll(a=None, b=None, c=None):
        for key, value in (a, b, c):
            if (value != None):
                settattr(self, key, value)

Question 34

I want to cast data like [1,2,'a','He said "what do you mean?"'] to a CSV-formatted string.

Normally one would use csv.writer() for this, because it handles all the crazy edge cases (comma escaping, quote mark escaping, CSV dialects, etc.) The catch is that csv.writer() expects to output to a file object, not to a string.

My current solution is this somewhat hacky function:

def CSV_String_Writeline(data):
    class Dummy_Writer:
        def write(self,instring):
            self.outstring = instring.strip("\r\n")
    dw = Dummy_Writer()
    csv_w = csv.writer( dw )
    csv_w.writerow(data)
    return dw.outstring

Can anyone give a more elegant solution that still handles the edge cases well?

Edit: Here’s how I ended up doing it:

def csv2string(data):
    si = StringIO.StringIO()
    cw = csv.writer(si)
    cw.writerow(data)
    return si.getvalue().strip('\r\n')

Question 35

You could use StringIO instead of your own Dummy_Writer:

This module implements a file-like class, StringIO, that reads and writes a string buffer (also known as memory files).

There is also cStringIO, which is a faster version of the StringIO class.

Question 36

In Python 3:

>>> import io
>>> import csv
>>> output = io.StringIO()
>>> csvdata = [1,2,'a','He said "what do you mean?"',"Whoa!\nNewlines!"]
>>> writer = csv.writer(output, quoting=csv.QUOTE_NONNUMERIC)
>>> writer.writerow(csvdata)
59
>>> output.getvalue()
'1,2,"a","He said ""what do you mean?""","Whoa!\nNewlines!"\r\n'

Some details need to be changed a bit for Python 2:

>>> output = io.BytesIO()
>>> writer = csv.writer(output)
>>> writer.writerow(csvdata)
57L
>>> output.getvalue()
'1,2,a,"He said ""what do you mean?""","Whoa!\nNewlines!"\r\n'

Question 37

I found the answers, all in all, a bit confusing. For Python 2, this usage worked for me:

import csv, io

def csv2string(data):
    si = io.BytesIO()
    cw = csv.writer(si)
    cw.writerow(data)
    return si.getvalue().strip('\r\n')

data=[1,2,'a','He said "what do you mean?"']
print csv2string(data)

Question 38

since i use this quite a lot to stream results asynchronously from sanic back to the user as csv data i wrote the following snippet for Python 3.

The snippet lets you reuse the same StringIo buffer over and over again.


import csv
from io import StringIO


class ArgsToCsv:
    def __init__(self, seperator=","):
        self.seperator = seperator
        self.buffer = StringIO()
        self.writer = csv.writer(self.buffer)

    def stringify(self, *args):
        self.writer.writerow(args)
        value = self.buffer.getvalue().strip("\r\n")
        self.buffer.seek(0)
        self.buffer.truncate(0)
        return value + "\n"

example:

csv_formatter = ArgsToCsv()

output += csv_formatter.stringify(
    10,
    """
    lol i have some pretty
    "freaky"
    strings right here \' yo!
    """,
    [10, 20, 30],
)

Check out further usage at the github gist: source and test

Question 39

import csv
from StringIO import StringIO
with open('file.csv') as file:
    file = file.read()

stream = StringIO(file)

csv_file = csv.DictReader(stream)

Question 40

Here’s the version that works for utf-8. csvline2string for just one line, without linebreaks at the end, csv2string for many lines, with linebreaks:

import csv, io

def csvline2string(one_line_of_data):
    si = BytesIO.StringIO()
    cw = csv.writer(si)
    cw.writerow(one_line_of_data)
    return si.getvalue().strip('\r\n')

def csv2string(data):
    si = BytesIO.StringIO()
    cw = csv.writer(si)
    for one_line_of_data in data:
        cw.writerow(one_line_of_data)
    return si.getvalue()

Question 41

I want to create a file from within a python script that is executable.

import os
import stat
os.chmod('somefile', stat.S_IEXEC)

it appears os.chmod doesn’t ‘add’ permissions the way unix chmod does. With the last line commented out, the file has the filemode -rw-r--r--, with it not commented out, the file mode is ---x------. How can I just add the u+x flag while keeping the rest of the modes intact?

Question 42

Use os.stat() to get the current permissions, use | to or the bits together, and use os.chmod() to set the updated permissions.

Example:

import os
import stat

st = os.stat('somefile')
os.chmod('somefile', st.st_mode | stat.S_IEXEC)

Question 43

For tools that generate executable files (e.g. scripts), the following code might be helpful:

def make_executable(path):
    mode = os.stat(path).st_mode
    mode |= (mode & 0o444) >> 2    # copy R bits to X
    os.chmod(path, mode)

This makes it (more or less) respect the umask that was in effect when the file was created: Executable is only set for those that can read.

Usage:

path = 'foo.sh'
with open(path, 'w') as f:           # umask in effect when file is created
    f.write('#!/bin/sh\n')
    f.write('echo "hello world"\n')

make_executable(path)

Question 44

If you know the permissions you want then the following example may be the way to keep it simple.

Python 2:

os.chmod("/somedir/somefile", 0775)

Python 3:

os.chmod("/somedir/somefile", 0o775)

Compatible with either (octal conversion):

os.chmod("/somedir/somefile", 509)

reference permissions examples

Question 45

You can also do this

>>> import os
>>> st = os.stat("hello.txt")

Current listing of file

$ ls -l hello.txt
-rw-r--r--  1 morrison  staff  17 Jan 13  2014 hello.txt

Now do this.

>>> os.chmod("hello.txt", st.st_mode | 0o111)

and you will see this in the terminal.

ls -l hello.txt    
-rwxr-xr-x  1 morrison  staff  17 Jan 13  2014 hello.txt

You can bitwise or with 0o111 to make all executable, 0o222 to make all writable, and 0o444 to make all readable.

Question 46

Respect umask like chmod +x

man chmod says that if augo is not given as in:

chmod +x mypath

then a is used but with umask:

A combination of the letters ugoa controls which users’ access to the file will be changed: the user who owns it (u), other users in the file’s group (g), other users not in the file’s group (o), or all users (a). If none of these are given, the effect is as if (a) were given, but bits that are set in the umask are not affected.

Here is a version that simulates that behavior exactly:

#!/usr/bin/env python3

import os
import stat

def get_umask():
    umask = os.umask(0)
    os.umask(umask)
    return umask

def chmod_plus_x(path):
    os.chmod(
        path,
        os.stat(path).st_mode |
        (
            (
                stat.S_IXUSR |
                stat.S_IXGRP |
                stat.S_IXOTH
            )
            & ~get_umask()
        )
    )

chmod_plus_x('.gitignore')

See also: How can I get the default file permissions in Python?

Tested in Ubuntu 16.04, Python 3.5.2.

Question 47

In python3:

import os
os.chmod("somefile", 0o664)

Remember to add the 0o prefix since permissions are set as an octal integer, and Python automatically treats any integer with a leading zero as octal. Otherwise, you are passing os.chmod("somefile", 1230) indeed, which is octal of 664.

Question 48

If you’re using Python 3.4+, you can use the standard library’s convenient pathlib.

Its Path class has built-in chmod and stat methods.

from pathlib import Path
import stat


f = Path("/path/to/file.txt")
f.chmod(f.stat().st_mode | stat.S_IEXEC)

Question 49

Recently I noticed that when I am converting a list to set the order of elements is changed and is sorted by character.

Consider this example:

x=[1,2,20,6,210]
print x 
# [1, 2, 20, 6, 210] # the order is same as initial order

set(x)
# set([1, 2, 20, 210, 6]) # in the set(x) output order is sorted

My questions are –

Why is this happening?
How can I do set operations (especially Set Difference) without losing the initial order?

Question 50

A set is an unordered data structure, so it does not preserve the insertion order.
This depends on your requirements. If you have an normal list, and want to remove some set of elements while preserving the order of the list, you can do this with a list comprehension:
```
>>> a = [1, 2, 20, 6, 210]
>>> b = set([6, 20, 1])
>>> [x for x in a if x not in b]
[2, 210]
```
If you need a data structure that supports both fast membership tests and preservation of insertion order, you can use the keys of a Python dictionary, which starting from Python 3.7 is guaranteed to preserve the insertion order:
```
>>> a = dict.fromkeys([1, 2, 20, 6, 210])
>>> b = dict.fromkeys([6, 20, 1])
>>> dict.fromkeys(x for x in a if x not in b)
{2: None, 210: None}
```
b doesn’t really need to be ordered here – you could use a set as well. Note that a.keys() - b.keys() returns the set difference as a set, so it won’t preserve the insertion order.

In older versions of Python, you can use collections.OrderedDict instead:
```
>>> a = collections.OrderedDict.fromkeys([1, 2, 20, 6, 210])
>>> b = collections.OrderedDict.fromkeys([6, 20, 1])
>>> collections.OrderedDict.fromkeys(x for x in a if x not in b)
OrderedDict([(2, None), (210, None)])
```

Question 51

~~In Python 3.6, set() now should keep the order, but~~ there is another solution for Python 2 and 3:

>>> x = [1, 2, 20, 6, 210]
>>> sorted(set(x), key=x.index)
[1, 2, 20, 6, 210]

Question 52

Answering your first question, a set is a data structure optimized for set operations. Like a mathematical set, it does not enforce or maintain any particular order of the elements. The abstract concept of a set does not enforce order, so the implementation is not required to. When you create a set from a list, Python has the liberty to change the order of the elements for the needs of the internal implementation it uses for a set, which is able to perform set operations efficiently.

Question 53

remove duplicates and preserve order by below function

def unique(sequence):
    seen = set()
    return [x for x in sequence if not (x in seen or seen.add(x))]

check this link

Question 54

In mathematics, there are sets and ordered sets (osets).

set: an unordered container of unique elements (Implemented)
oset: an ordered container of unique elements (NotImplemented)

In Python, only sets are directly implemented. We can emulate osets with regular dict keys (3.7+).

Given

a = [1, 2, 20, 6, 210, 2, 1]
b = {2, 6}

Code

oset = dict.fromkeys(a).keys()
# dict_keys([1, 2, 20, 6, 210])

Demo

Replicates are removed, insertion-order is preserved.

list(oset)
# [1, 2, 20, 6, 210]

Set-like operations on dict keys.

oset - b
# {1, 20, 210}

oset | b
# {1, 2, 5, 6, 20, 210}

oset & b
# {2, 6}

oset ^ b
# {1, 5, 20, 210}

Details

Note: an unordered structure does not preclude ordered elements. Rather, maintained order is not guaranteed. Example:

assert {1, 2, 3} == {2, 3, 1}                    # sets (order is ignored)

assert [1, 2, 3] != [2, 3, 1]                    # lists (order is guaranteed)

One may be pleased to discover that a list and multiset (mset) are two more fascinating, mathematical data structures:

list: an ordered container of elements that permits replicates (Implemented)
mset: an unordered container of elements that permits replicates (NotImplemented)*

Summary

Container | Ordered | Unique | Implemented
----------|---------|--------|------------
set       |    n    |    y   |     y
oset      |    y    |    y   |     n
list      |    y    |    n   |     y
mset      |    n    |    n   |     n*

^{*A multiset can be indirectly emulated with collections.Counter(), a dict-like mapping of multiplicities (counts).}

Question 55

As denoted in other answers, sets are data structures (and mathematical concepts) that do not preserve the element order –

However, by using a combination of sets and dictionaries, it is possible that you can achieve wathever you want – try using these snippets:

# save the element order in a dict:
x_dict = dict(x,y for y, x in enumerate(my_list) )
x_set = set(my_list)
#perform desired set operations
...
#retrieve ordered list from the set:
new_list = [None] * len(new_set)
for element in new_set:
   new_list[x_dict[element]] = element

Question 56

Building on Sven’s answer, I found using collections.OrderedDict like so helped me accomplish what you want plus allow me to add more items to the dict:

import collections

x=[1,2,20,6,210]
z=collections.OrderedDict.fromkeys(x)
z
OrderedDict([(1, None), (2, None), (20, None), (6, None), (210, None)])

If you want to add items but still treat it like a set you can just do:

z['nextitem']=None

And you can perform an operation like z.keys() on the dict and get the set:

z.keys()
[1, 2, 20, 6, 210]

Question 57

An implementation of the highest score concept above that brings it back to a list:

def SetOfListInOrder(incominglist):
    from collections import OrderedDict
    outtemp = OrderedDict()
    for item in incominglist:
        outtemp[item] = None
    return(list(outtemp))

Tested (briefly) on Python 3.6 and Python 2.7.

Question 58

In case you have a small number of elements in your two initial lists on which you want to do set difference operation, instead of using collections.OrderedDict which complicates the implementation and makes it less readable, you can use:

# initial lists on which you want to do set difference
>>> nums = [1,2,2,3,3,4,4,5]
>>> evens = [2,4,4,6]
>>> evens_set = set(evens)
>>> result = []
>>> for n in nums:
...   if not n in evens_set and not n in result:
...     result.append(n)
... 
>>> result
[1, 3, 5]

Its time complexity is not that good but it is neat and easy to read.

Question 59

It’s interesting that people always use ‘real world problem’ to make joke on the definition in theoretical science.

If set has order, you first need to figure out the following problems. If your list has duplicate elements, what should the order be when you turn it into a set? What is the order if we union two sets? What is the order if we intersect two sets with different order on the same elements?

Plus, set is much faster in searching for a particular key which is very good in sets operation (and that’s why you need a set, but not list).

If you really care about the index, just keep it as a list. If you still want to do set operation on the elements in many lists, the simplest way is creating a dictionary for each list with the same keys in the set along with a value of list containing all the index of the key in the original list.

def indx_dic(l):
    dic = {}
    for i in range(len(l)):
        if l[i] in dic:
            dic.get(l[i]).append(i)
        else:
            dic[l[i]] = [i]
    return(dic)

a = [1,2,3,4,5,1,3,2]
set_a  = set(a)
dic_a = indx_dic(a)

print(dic_a)
# {1: [0, 5], 2: [1, 7], 3: [2, 6], 4: [3], 5: [4]}
print(set_a)
# {1, 2, 3, 4, 5}

Question 60

Here’s an easy way to do it:

x=[1,2,20,6,210]
print sorted(set(x))

Question 61

Is there any function that would be the equivalent of a combination of df.isin() and df[col].str.contains()?

For example, say I have the series s = pd.Series(['cat','hat','dog','fog','pet']), and I want to find all places where s contains any of ['og', 'at'], I would want to get everything but ‘pet’.

I have a solution, but it’s rather inelegant:

searchfor = ['og', 'at']
found = [s.str.contains(x) for x in searchfor]
result = pd.DataFrame[found]
result.any()

Is there a better way to do this?

Question 62

One option is just to use the regex | character to try to match each of the substrings in the words in your Series s (still using str.contains).

You can construct the regex by joining the words in searchfor with |:

>>> searchfor = ['og', 'at']
>>> s[s.str.contains('|'.join(searchfor))]
0    cat
1    hat
2    dog
3    fog
dtype: object

As @AndyHayden noted in the comments below, take care if your substrings have special characters such as $ and ^ which you want to match literally. These characters have specific meanings in the context of regular expressions and will affect the matching.

You can make your list of substrings safer by escaping non-alphanumeric characters with re.escape:

>>> import re
>>> matches = ['$money', 'x^y']
>>> safe_matches = [re.escape(m) for m in matches]
>>> safe_matches
['\\$money', 'x\\^y']

The strings with in this new list will match each character literally when used with str.contains.

Question 63

You can use str.contains alone with a regex pattern using OR (|):

s[s.str.contains('og|at')]

Or you could add the series to a dataframe then use str.contains:

df = pd.DataFrame(s)
df[s.str.contains('og|at')]

Output:

0 cat
1 hat
2 dog
3 fog

Question 64

Here is a one line lambda that also works:

df["TrueFalse"] = df['col1'].apply(lambda x: 1 if any(i in x for i in searchfor) else 0)

Input:

searchfor = ['og', 'at']

df = pd.DataFrame([('cat', 1000.0), ('hat', 2000000.0), ('dog', 1000.0), ('fog', 330000.0),('pet', 330000.0)], columns=['col1', 'col2'])

   col1  col2
0   cat 1000.0
1   hat 2000000.0
2   dog 1000.0
3   fog 330000.0
4   pet 330000.0

Apply Lambda:

df["TrueFalse"] = df['col1'].apply(lambda x: 1 if any(i in x for i in searchfor) else 0)

Output:

    col1    col2        TrueFalse
0   cat     1000.0      1
1   hat     2000000.0   1
2   dog     1000.0      1
3   fog     330000.0    1
4   pet     330000.0    0

问题：Python列表可以有多大？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

问题：如何从Python异步运行外部命令？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

问题：如何在python中从变量参数（kwargs）设置类属性

回答 0

回答 1

回答 2

回答 3

VERY LATE FOLLOW UP

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

问题：如何将数据作为字符串（而非文件）写入CSV格式？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

问题：您如何在python中执行简单的“ chmod + x”操作？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

问题：将列表转换为集合会更改元素顺序

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

回答 10

问题：如何在大熊猫中测试字符串是否包含列表中的子字符串之一？

回答 0

回答 1

回答 2

问题：Python请求和持久会话

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

问题：python的site-packages目录是什么？

回答 0

有用的阅读

Useful reading

回答 1

回答 2