Tag archive: list

List to array conversion to use the ravel() function

Question: List to array conversion to use the ravel() function

I have a list in Python and I want to convert it to an array so that I can use the ravel() function.


Answer 0

Use numpy.asarray:

import numpy as np
myarray = np.asarray(mylist)
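
As a minimal sketch of the conversion followed by the ravel() call the question asks about (the nested list mylist below is a made-up example, not data from the question):

```python
import numpy as np

# mylist is a hypothetical nested list used only for illustration.
mylist = [[1, 2, 3], [4, 5, 6]]

myarray = np.asarray(mylist)  # 2-D array built from the nested list
flat = myarray.ravel()        # flattened 1-D result

print(myarray.shape)
print(flat.tolist())
```

np.asarray avoids a copy when its input is already an array, which is probably why the answer prefers it over np.array here.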

Answer 1

Create a list and an int array from it with the standard-library array module (note that array.array is not a NumPy array and has no ravel() method):

from array import array
listA = list(range(0,50))
for item in listA:
    print(item)
arrayA = array("i", listA)
for item in arrayA:
    print(item)

Answer 2

I wanted a way to do this without using an extra module. First turn list to string, then append to an array:

dataset_list = ''.join(input_list)
dataset_array = []
for item in dataset_list.split(';'): # comma, or other
    dataset_array.append(item)

Answer 3

If all you want is calling ravel on your (nested, I s’pose?) list, you can do that directly, numpy will do the casting for you:

import numpy as np

L = [[1,None,3],["The", "quick", object]]
np.ravel(L)
# array([1, None, 3, 'The', 'quick', <class 'object'>], dtype=object)

Also worth mentioning that you needn’t go through numpy at all.


Answer 4

Use the following code:

import numpy as np

myArray = np.array([1, 2, 4])  # np.array converts the list [1, 2, 4] into an array
print(myArray)

Answer 5

If variable b holds a list, you can do the following:

Create a new variable “a” as a=[], then assign the list to “a” with a=b.

Now “a” refers to the same list object as “b”, with all of its elements.

Note that this only binds a second name to the list; it does not create an array, so ravel() is still unavailable on it.


Python: how to print range a-z?

Question: Python: how to print range a-z?

1. Print a-n: a b c d e f g h i j k l m n

2. Every second in a-n: a c e g i k m

3. Append a-n to index of urls{hello.com/, hej.com/, …, hallo.com/}: hello.com/a hej.com/b … hallo.com/n


Answer 0

>>> import string
>>> string.ascii_lowercase[:14]
'abcdefghijklmn'
>>> string.ascii_lowercase[:14:2]
'acegikm'

To do the urls, you could use something like this

[i + j for i, j in zip(list_of_urls, string.ascii_lowercase[:14])]
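
Putting the three parts of the question together, a short sketch might look like this. The three-entry list_of_urls is an assumption for illustration, since the question elides the entries between the first two and the last.

```python
import string

letters = string.ascii_lowercase[:14]  # the letters a through n
every_second = letters[::2]            # every second letter of a through n

# Hypothetical URL list; the real one in the question is longer.
list_of_urls = ["hello.com/", "hej.com/", "hallo.com/"]
tagged = [url + letter for url, letter in zip(list_of_urls, letters)]

print(" ".join(letters))
print(" ".join(every_second))
print(tagged)
```

zip stops at the shorter input, so with three URLs only the letters a, b and c are used.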

Answer 1

Assuming this is homework ;-) there is no need to summon libraries etc. It probably expects you to use range() with chr/ord, like so:

for i in range(ord('a'), ord('n')+1):
    print(chr(i), end=' ')

For the rest, just play a bit more with range().


Answer 2

Hints:

import string
print(string.ascii_lowercase)

and

for i in range(0, 10, 2):
    print(i)

and

"hello{0}, world!".format('z')

Answer 3

for one in range(97, 111):  # ord('a') == 97, ord('n') == 110
    print(chr(one))

Answer 4

Get a list with the desired values

small_letters = list(map(chr, range(ord('a'), ord('z')+1)))
big_letters = list(map(chr, range(ord('A'), ord('Z')+1)))
digits = list(map(chr, range(ord('0'), ord('9')+1)))

or

import string
string.ascii_letters
string.ascii_uppercase
string.digits

This solution uses the ASCII table. ord gets the ascii value from a character and chr vice versa.

Apply what you know about lists

>>> small_letters = list(map(chr, range(ord('a'), ord('z')+1)))

>>> an = small_letters[0:(ord('n')-ord('a')+1)]
>>> print(" ".join(an))
a b c d e f g h i j k l m n

>>> print(" ".join(small_letters[0::2]))
a c e g i k m o q s u w y

>>> s = small_letters[0:(ord('n')-ord('a')+1):2]
>>> print(" ".join(s))
a c e g i k m

>>> urls = ["hello.com/", "hej.com/", "hallo.com/"]
>>> print([x + y for x, y in zip(urls, an)])
['hello.com/a', 'hej.com/b', 'hallo.com/c']

Answer 5

import string
print(list(string.ascii_lowercase))
# ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

Answer 6

import string
print(list(string.ascii_lowercase))
# ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

and

for c in list(string.ascii_lowercase)[:5]:
    ...  # operation with the first 5 characters

Answer 7

myList = [chr(chNum) for chNum in range(ord('a'), ord('z')+1)]
print(myList)

Output

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

Answer 8

# 1)
print(" ".join(map(chr, range(ord('a'), ord('n')+1))))

# 2)
print(" ".join(map(chr, range(ord('a'), ord('n')+1, 2))))

# 3)
urls = ["hello.com/", "hej.com/", "hallo.com/"]
an = map(chr, range(ord('a'), ord('n')+1))
print([x + y for x, y in zip(urls, an)])

Answer 9

The answer to this question is simple, just make a string called ABC like so:

ABC = 'abcdefghijklmnopqrstuvwxyz'

And whenever you need to refer to it, just do:

print(ABC[0:10])  # prints abcdefghij
print(ABC)        # prints abcdefghijklmnopqrstuvwxyz
for x in range(0, 26):
    if x % 2 == 0:
        print(ABC[x])  # prints a, c, e, ..., y, one per line (all odd-numbered letters)

Also try this to break your device :D (it loops until you press Ctrl+C)

## Try this and call it AlphabetSoup.py:

ABC = 'abcdefghijklmnopqrstuvwxyz'


try:
    while True:
        for a in ABC:
            for b in ABC:
                for c in ABC:
                    for d in ABC:
                        for e in ABC:
                            for f in ABC:
                                print(a, b, c, d, e, f, '    ', end=' ')
except KeyboardInterrupt:
    pass

Answer 10

Try:

strng = ""
for i in range(97,123):
    strng = strng + chr(i)
print(strng)

Answer 11

This is your 2nd question: string.ascii_lowercase[ord('a')-97:ord('n')-97:2] because 97==ord('a'). If you want to learn a bit you should figure out the rest yourself ;-)


Answer 12

list(string.ascii_lowercase)

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

Answer 13

I hope this helps:

import string

alphas = list(string.ascii_letters[:26])
for ch in alphas:  # avoid naming the loop variable chr, which shadows the built-in
    print(ch)

Answer 14

About gnibbler’s answer.

The zip function (see its documentation for the full explanation) returns a list of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables (in Python 3 it returns an iterator of such tuples). The [...] construct is called a list comprehension, a very cool feature!


Answer 15

Another way to do it

import string

aalist = list(string.ascii_lowercase)
aaurls = ['alpha.com', 'bravo.com', 'chrly.com', 'delta.com']
iilen = len(aaurls)

ans01 = "".join(aalist[0:14])
ans02 = "".join(aalist[0:14:2])
ans03 = "".join("{vurl}/{vl}\n".format(vl=vjj[1], vurl=aaurls[vjj[0] % iilen]) for vjj in enumerate(aalist[0:14]))

print(ans01)
print(ans02)
print(ans03)

Result

abcdefghijklmn
acegikm
alpha.com/a
bravo.com/b
chrly.com/c
delta.com/d
alpha.com/e
bravo.com/f
chrly.com/g
delta.com/h
alpha.com/i
bravo.com/j
chrly.com/k
delta.com/l
alpha.com/m
bravo.com/n

How this differs from the other replies

  • iterate over an arbitrary number of base urls
  • cycle through the urls and do not stop until we run out of letters
  • use enumerate in conjunction with list comprehension and str.format
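
The cycling behaviour described in the bullets above can also be sketched with itertools.cycle instead of the modulo index. This is an alternative technique, not the answer's own code; the names base_urls, letters and pairs are chosen here for illustration.

```python
from itertools import cycle
import string

base_urls = ['alpha.com', 'bravo.com', 'chrly.com', 'delta.com']
letters = string.ascii_lowercase[:14]

# cycle() repeats the URLs endlessly; zip() stops once the letters run out.
pairs = ["{}/{}".format(url, letter) for url, letter in zip(cycle(base_urls), letters)]

print("\n".join(pairs))
```

Because zip ends with its shortest input, the infinite cycle is safe as long as the other iterable is finite.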

Non-alphanumeric list order from os.listdir()

Question: Non-alphanumeric list order from os.listdir()

I often use python to process directories of data. Recently, I have noticed that the default order of the lists has changed to something almost nonsensical. For example, if I am in a current directory containing the following subdirectories: run01, run02, … run19, run20, and then I generate a list from the following command:

dir = os.listdir(os.getcwd())

then I usually get a list in this order:

dir = ['run01', 'run18', 'run14', 'run13', 'run12', 'run11', 'run08', ... ]

and so on. The order used to be alphanumeric. But this new order has remained with me for a while now.

What is determining the (displayed) order of these lists?


Answer 0

I think the order has to do with the way the files are indexed on your FileSystem. If you really want to make it adhere to some order you can always sort the list after getting the files.


Answer 1

You can use the builtin sorted function to sort the strings however you want. Based on what you describe,

sorted(os.listdir(whatever_directory))

Alternatively, you can use the .sort method of a list:

lst = os.listdir(whatever_directory)
lst.sort()

I think that should do the trick.

Note that the order that os.listdir gets the filenames is probably completely dependent on your filesystem.


Answer 2

Per the documentation:

os.listdir(path)

Return a list containing the names of the entries in the directory given by path. The list is in arbitrary order. It does not include the special entries ‘.’ and ‘..’ even if they are present in the directory.

Order cannot be relied upon and is an artifact of the filesystem.

To sort the result, use sorted(os.listdir(path)).
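
A small self-contained sketch of this, using a temporary directory so it does not depend on any real path (the run01 to run03 names are borrowed from the question for illustration):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create the subdirectories out of order on purpose.
    for name in ["run03", "run01", "run02"]:
        os.mkdir(os.path.join(tmp, name))
    raw = os.listdir(tmp)  # arbitrary, filesystem-dependent order
    names = sorted(raw)    # lexicographic order, independent of the filesystem

print(names)
```

Sorting the result is what makes the order reproducible; the raw list may differ between machines or filesystems.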


Answer 3

Python for whatever reason does not come with a built-in way to have natural sorting (meaning 1, 2, 10 instead of 1, 10, 2), so you have to write it yourself:

import re
def sorted_alphanumeric(data):
    convert = lambda text: int(text) if text.isdigit() else text.lower()
    alphanum_key = lambda key: [ convert(c) for c in re.split('([0-9]+)', key) ] 
    return sorted(data, key=alphanum_key)

You can now use this function to sort a list:

dirlist = sorted_alphanumeric(os.listdir(...))
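
To see the difference from plain sorting, here is a small sketch of the same key idea on a made-up list of names. natural_key is a hypothetical helper written for this illustration; it follows the digit-splitting approach of the function above.

```python
import re

def natural_key(name):
    # Split into digit and non-digit runs; digit runs compare as integers,
    # everything else compares case-insensitively.
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r'([0-9]+)', name)]

names = ['run10', 'run2', 'run1', 'Run3']

plain = sorted(names)                     # character-by-character comparison
natural = sorted(names, key=natural_key)  # numeric runs compared as numbers

print(plain)
print(natural)
```

With the key, run2 sorts before run10 because 2 < 10 as integers, whereas plain string comparison puts run10 first.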

PROBLEMS: In case you use the above function to sort strings (for example folder names) and want them sorted like Windows Explorer does, it will not work properly in some edge cases.
This sorting function will return incorrect results on Windows, if you have folder names with certain ‘special’ characters in them. For example this function will sort 1, !1, !a, a, whereas Windows Explorer would sort !1, 1, !a, a.

So if you want to sort exactly like Windows Explorer does in Python you have to use the Windows built-in function StrCmpLogicalW via ctypes (this of course won’t work on Unix):

from ctypes import wintypes, windll
from functools import cmp_to_key
def winsort(data):
    _StrCmpLogicalW = windll.Shlwapi.StrCmpLogicalW
    _StrCmpLogicalW.argtypes = [wintypes.LPWSTR, wintypes.LPWSTR]
    _StrCmpLogicalW.restype  = wintypes.INT

    cmp_fnc = lambda psz1, psz2: _StrCmpLogicalW(psz1, psz2)
    return sorted(data, key=cmp_to_key(cmp_fnc))

This function is slightly slower than sorted_alphanumeric().

Bonus: winsort can also sort full paths on Windows.

Alternatively, especially if you use Unix, you can use the natsort library (pip install natsort) to sort by full paths in a correct way (meaning subfolders at the correct position).

You can use it like this to sort full paths:

from natsort import natsorted, ns
dirlist = natsorted(dirlist, alg=ns.PATH | ns.IGNORECASE)

Don’t use it for normal sorting of just folder names (or strings in general), as it’s quite a bit slower than the sorted_alphanumeric() function above.
The natsort library will give you incorrect results if you expect Windows Explorer sorting, so use winsort() for that.


Answer 4

I think by default the order is determined by the ASCII value. One way to change it is to sort by name length; names of equal length then keep the arbitrary order in which os.listdir() returned them:

dir = sorted(os.listdir(os.getcwd()), key=len)

Answer 5

It’s probably just the order that C’s readdir() returns. Try running this C program:

#include <dirent.h>
#include <stdio.h>
int main(void)
{   DIR *dirp;
    struct dirent* de;
    dirp = opendir(".");
    while(de = readdir(dirp)) // Yes, one '='.
        printf("%s\n", de->d_name);
    closedir(dirp);
    return 0;
}

The build line should be something like gcc -o foo foo.c.

P.S. Just ran this and your Python code, and they both gave me sorted output, so I can’t reproduce what you’re seeing.


Answer 6

import os

aaa = ['row_163.pkl', 'row_394.pkl', 'row_679.pkl', 'row_202.pkl', 'row_1449.pkl', 'row_247.pkl', 'row_1353.pkl', 'row_749.pkl', 'row_1293.pkl', 'row_1304.pkl', 'row_78.pkl', 'row_532.pkl', 'row_9.pkl', 'row_1435.pkl']
sorted(aaa, key=lambda x: int(os.path.splitext(x.split('_')[1])[0]))

In my case the names look like row_163.pkl. os.path.splitext('row_163.pkl') splits that into ('row_163', '.pkl'), so the name also has to be split on '_'.

For your requirement you can do something like

import re
sorted(aa, key=lambda x: (int(re.sub(r'\D', '', x)), x))

where

aa = ['run01', 'run08', 'run11', 'run12', 'run13', 'run14', 'run18']

For directory listings you can also use sorted(os.listdir(path)),

and for names like 'run01.txt' or 'run01.csv' you can strip the extension first and then extract the digits:

sorted(files, key=lambda x: int(re.sub(r'\D', '', os.path.splitext(x)[0])))

Answer 7

I found that “sort” does not always do what I expected. For example, I have a directory as below, and “sort” gives me a very strange result:

>>> os.listdir(pathon)
['2', '3', '4', '5', '403', '404', '407', '408', '410', '411', '412', '413', '414', '415', '416', '472']
>>> sorted([ f for f in os.listdir(pathon)])
['2', '3', '4', '403', '404', '407', '408', '410', '411', '412', '413', '414', '415', '416', '472', '5']

It seems it compares the strings character by character, starting with the first one, so '5' ends up last because its first character is the largest.
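
If the names are purely numeric, as in this listing, one possible fix is to sort with key=int so that the values are compared as numbers. A minimal sketch on a shortened copy of the list above:

```python
names = ['2', '3', '4', '5', '403', '404', '472']

as_text = sorted(names)              # string comparison, character by character
as_numbers = sorted(names, key=int)  # numeric comparison

print(as_text)
print(as_numbers)
```

key=int raises ValueError for any name that is not a plain integer, so this only suits directories like the one shown.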


Answer 8

From the documentation:

The list is in arbitrary order, and does not include the special entries ‘.’ and ‘..’ even if they are present in the directory.

This means that the order is probably OS/filesystem dependent, has no particularly meaningful order, and is therefore not guaranteed to be anything in particular. As many answers mentioned: if preferred, the retrieved list can be sorted.

Cheers :)


Answer 9

Elliot’s answer solves it perfectly but because it is a comment, it goes unnoticed so with the aim of helping someone, I am reiterating it as a solution.

Use natsort library:

Install the library with the following command for Ubuntu and other Debian versions

Python 2

sudo pip install natsort

Python 3

sudo pip3 install natsort

Details on how to use this library can be found in its documentation.


Answer 10

In [6]: os.listdir?

Type:       builtin_function_or_method
String Form:<built-in function listdir>
Docstring:
listdir(path) -> list_of_strings
Return a list containing the names of the entries in the directory.
path: path of directory to list
The list is in **arbitrary order**.  It does not include the special
entries '.' and '..' even if they are present in the directory.

Answer 11

The proposed combination of os.listdir and sorted commands generates the same result as ls -l command under Linux. The following example verifies this assumption:

user@user-PC:/tmp/test$ touch 3a 4a 5a b c d1 d2 d3 k l p0 p1 p3 q 410a 409a 408a 407a
user@user-PC:/tmp/test$ ls -l
total 0
-rw-rw-r-- 1 user user 0 Feb  15 10:31 3a
-rw-rw-r-- 1 user user 0 Feb  15 10:31 407a
-rw-rw-r-- 1 user user 0 Feb  15 10:31 408a
-rw-rw-r-- 1 user user 0 Feb  15 10:31 409a
-rw-rw-r-- 1 user user 0 Feb  15 10:31 410a
-rw-rw-r-- 1 user user 0 Feb  15 10:31 4a
-rw-rw-r-- 1 user user 0 Feb  15 10:31 5a
-rw-rw-r-- 1 user user 0 Feb  15 10:31 b
-rw-rw-r-- 1 user user 0 Feb  15 10:31 c
-rw-rw-r-- 1 user user 0 Feb  15 10:31 d1
-rw-rw-r-- 1 user user 0 Feb  15 10:31 d2
-rw-rw-r-- 1 user user 0 Feb  15 10:31 d3
-rw-rw-r-- 1 user user 0 Feb  15 10:31 k
-rw-rw-r-- 1 user user 0 Feb  15 10:31 l
-rw-rw-r-- 1 user user 0 Feb  15 10:31 p0
-rw-rw-r-- 1 user user 0 Feb  15 10:31 p1
-rw-rw-r-- 1 user user 0 Feb  15 10:31 p3
-rw-rw-r-- 1 user user 0 Feb  15 10:31 q

user@user-PC:/tmp/test$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.listdir( './' )
['d3', 'k', 'p1', 'b', '410a', '5a', 'l', 'p0', '407a', '409a', '408a', 'd2', '4a', 'p3', '3a', 'q', 'c', 'd1']
>>> sorted( os.listdir( './' ) )
['3a', '407a', '408a', '409a', '410a', '4a', '5a', 'b', 'c', 'd1', 'd2', 'd3', 'k', 'l', 'p0', 'p1', 'p3', 'q']
>>> exit()
user@user-PC:/tmp/test$ 

So, for someone who wants to reproduce the result of the well-known ls -l command in their python code, sorted( os.listdir( DIR ) ) works pretty well.


I want to exception handle “list index out of range”

Question: I want to exception handle “list index out of range”

I am using BeautifulSoup and parsing some HTMLs.

I’m getting a certain data from each HTML (using for loop) and adding that data to a certain list.

The problem is, some of the HTMLs have different format (and they don’t have the data that I want in them).

So, I was trying to use exception handling and add value null to the list (I should do this since the sequence of data is important.)

For instance, I have a code like:

soup = BeautifulSoup(links)
dlist = soup.findAll('dd', 'title')
# I'm trying to find content between <dd class='title'> and </dd>
gotdata = dlist[1]
# and what i want is the 2nd content of those
newlist.append(gotdata)
# and I add that to a newlist

and some of the links don’t have any <dd class='title'>, so what I want to do is add string null to the list instead.

The error appears:

list index out of range.

What I have tried is to add some lines like this:

if not dlist[1]:  
   newlist.append('null')
   continue

But it doesn’t work out. It still shows error:

list index out of range.

What should I do about this? Should I use exception handling? or is there any easier way?

Any suggestions? Any help would be really great!


Answer 0

Handling the exception is the way to go:

try:
    gotdata = dlist[1]
except IndexError:
    gotdata = 'null'

Of course you could also check the len() of dlist; but handling the exception is more intuitive.
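
Wrapped in a small helper, the same idea keeps the loop body short. second_or_default and the pages data below are hypothetical names and values made up for this sketch; pages stands in for the per-page results of soup.findAll.

```python
def second_or_default(items, default='null'):
    """Return items[1], or default when the list has fewer than two items."""
    try:
        return items[1]
    except IndexError:
        return default

# Stand-in for the dlist obtained from each HTML page.
pages = [['a', 'b', 'c'], ['only one'], []]

newlist = [second_or_default(dlist) for dlist in pages]
print(newlist)
```

Every page contributes exactly one entry, so the order of the data is preserved even when a page lacks the element.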


Answer 1

You have two options; either handle the exception or test the length:

if len(dlist) > 1:
    newlist.append(dlist[1])
    continue

or

try:
    newlist.append(dlist[1])
except IndexError:
    pass
continue

Use the first if there often is no second item, the second if there sometimes is no second item.


Answer 2

A ternary will suffice. Change:

gotdata = dlist[1]

to

gotdata = dlist[1] if len(dlist) > 1 else 'null'

This is a shorter way of expressing:

if len(dlist) > 1:
    gotdata = dlist[1]
else: 
    gotdata = 'null'

Answer 3

Building on ThiefMaster's answer: sometimes the value is ‘\n’ or null, and processing it raises a ValueError, so that exception needs to be handled as well.

Handling the exception is the way to go:

try:
    gotdata = dlist[1]
except (IndexError, ValueError):
    gotdata = 'null'

Answer 4

# items is the list being read (renamed from list to avoid shadowing the built-in)
for i in range(1, len(items)):
    try:
        print(items[i])
    except ValueError:
        print("Error Value.")
    except IndexError:
        print("Error index")
    except Exception:
        print('error')

Answer 5

For anyone interested in a shorter way:

gotdata = len(dlist)>1 and dlist[1] or 'null'

But for best performance, I suggest using False instead of 'null', then a one line test will suffice:

gotdata = len(dlist)>1 and dlist[1]
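
One caveat with the and/or form is worth a quick sketch: when the second item exists but is falsy (an empty string, 0, None), the expression falls through to 'null', whereas the conditional expression returns the item itself. The dlist below is a made-up example.

```python
dlist = ['first', '']  # the second item exists but is an empty string

short_circuit = len(dlist) > 1 and dlist[1] or 'null'
ternary = dlist[1] if len(dlist) > 1 else 'null'

print(repr(short_circuit))
print(repr(ternary))
```

If falsy values can legitimately occur in the data, the conditional expression is probably the safer of the two.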

Python: list of dicts, if a dict exists increment its value, if not append a new dict

Question: Python: list of dicts, if a dict exists increment its value, if not append a new dict

I would like to do something like this.

list_of_urls = ['http://www.google.fr/', 'http://www.google.fr/', 
                'http://www.google.cn/', 'http://www.google.com/', 
                'http://www.google.fr/', 'http://www.google.fr/', 
                'http://www.google.fr/', 'http://www.google.com/', 
                'http://www.google.fr/', 'http://www.google.com/', 
                'http://www.google.cn/']

urls = [{'url': 'http://www.google.fr/', 'nbr': 1}]

for url in list_of_urls:
    if url in [f['url'] for f in urls]:
         urls[??]['nbr'] += 1
    else:
         urls.append({'url': url, 'nbr': 1})

How can I do this? I don’t know whether I should take the tuple to edit it or figure out the tuple’s index.

Any help?


回答 0

那是组织事情的一种非常奇怪的方式。如果存储在字典中,这很容易:

# This example should work in any version of Python.
# urls_d will contain URL keys, with counts as values, like: {'http://www.google.fr/' : 1 }
urls_d = {}
for url in list_of_urls:
    if not url in urls_d:
        urls_d[url] = 1
    else:
        urls_d[url] += 1

这段更新计数字典的代码是Python中常见的“模式”。常见的是defaultdict,创建了一个特殊的数据结构,以使其变得更加容易:

from collections import defaultdict  # available in Python 2.5 and newer

urls_d = defaultdict(int)
for url in list_of_urls:
    urls_d[url] += 1

如果您defaultdict使用键访问,而该键尚未在中defaultdict,则该键会自动添加一个默认值。将defaultdict采取调用您传递,并调用它来获得默认值。在这种情况下,我们在课堂上通过了int;当Python调用时,int()它返回零值。因此,第一次引用URL时,其计数将初始化为零,然后将一个添加到计数中。

但是充满计数的字典也是一种常见的模式,因此Python提供了一个现成的类:containers.Counter 您只需Counter通过调用该类并传递任何可迭代的类来创建实例;它会建立一个字典,其中的键是可迭代的值,而值是键在可迭代中出现的次数的计数。上面的示例变为:

from collections import Counter  # available in Python 2.7 and newer

urls_d = Counter(list_of_urls)

如果您确实需要按照显示的方式进行操作,则最简单,最快的方法是使用这三个示例中的任何一个,然后构建所需的示例。

from collections import defaultdict  # available in Python 2.5 and newer

urls_d = defaultdict(int)
for url in list_of_urls:
    urls_d[url] += 1

urls = [{"url": key, "nbr": value} for key, value in urls_d.items()]

如果您使用的是Python 2.7或更高版本,则可以单行执行:

from collections import Counter

urls = [{"url": key, "nbr": value} for key, value in Counter(list_of_urls).items()]

That is a very strange way to organize things. If you stored the counts in a dictionary, this would be easy:

# This example should work in any version of Python.
# urls_d will contain URL keys, with counts as values, like: {'http://www.google.fr/' : 1 }
urls_d = {}
for url in list_of_urls:
    if not url in urls_d:
        urls_d[url] = 1
    else:
        urls_d[url] += 1

This code for updating a dictionary of counts is a common “pattern” in Python. It is so common that there is a special data structure, defaultdict, created just to make this even easier:

from collections import defaultdict  # available in Python 2.5 and newer

urls_d = defaultdict(int)
for url in list_of_urls:
    urls_d[url] += 1

If you access the defaultdict using a key, and the key is not already in the defaultdict, the key is automatically added with a default value. The defaultdict takes the callable you passed in, and calls it to get the default value. In this case, we passed in class int; when Python calls int() it returns a zero value. So, the first time you reference a URL, its count is initialized to zero, and then you add one to the count.
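
The default-value behaviour described above can be checked directly. A small sketch, reusing one URL from the question as sample data:

```python
from collections import defaultdict

counts = defaultdict(int)

# int() called with no arguments returns 0, which becomes the default value.
zero = int()

# Reading a missing key inserts it with that default.
first_read = counts['http://www.google.fr/']
counts['http://www.google.fr/'] += 1

# A plain dict would raise KeyError here instead.
plain = {}
try:
    plain['missing'] += 1
    raised = False
except KeyError:
    raised = True
```

After this runs, first_read is 0, the stored count is 1, and raised is True for the plain dict.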

But a dictionary full of counts is also a common pattern, so Python provides a ready-to-use class: collections.Counter. You just create a Counter instance by calling the class, passing in any iterable; it builds a dictionary where the keys are values from the iterable, and the values are counts of how many times the key appeared in the iterable. The above example then becomes:

from collections import Counter  # available in Python 2.7 and newer

urls_d = Counter(list_of_urls)

If you really need to do it the way you showed, the easiest and fastest way would be to use any one of these three examples, and then build the one you need.

from collections import defaultdict  # available in Python 2.5 and newer

urls_d = defaultdict(int)
for url in list_of_urls:
    urls_d[url] += 1

urls = [{"url": key, "nbr": value} for key, value in urls_d.items()]

If you are using Python 2.7 or newer you can do it in a one-liner:

from collections import Counter

urls = [{"url": key, "nbr": value} for key, value in Counter(list_of_urls).items()]

回答 1

使用默认值可以,但是:

urls[url] = urls.get(url, 0) + 1

使用.get,可以获取默认返回值(如果不存在)。默认情况下为None,但如果我发送给您,则为0。

Using the default works, but so does:

urls[url] = urls.get(url, 0) + 1

With .get, you can supply a default value to return if the key doesn’t exist. The default is None, but in the example above it is 0.
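
For context, the one-liner belongs inside a loop and assumes urls is a plain dict (not the list of dicts from the question). A minimal sketch with a shortened, hypothetical sample list:

```python
list_of_urls = ['http://www.google.fr/', 'http://www.google.cn/',
                'http://www.google.fr/']

urls = {}
for url in list_of_urls:
    # get() returns 0 for an unseen URL, so no KeyError is raised
    urls[url] = urls.get(url, 0) + 1
```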


回答 2

使用defaultdict

from collections import defaultdict

urls = defaultdict(int)

for url in list_of_urls:
    urls[url] += 1

Use defaultdict:

from collections import defaultdict

urls = defaultdict(int)

for url in list_of_urls:
    urls[url] += 1

回答 3

这对我来说总是正常的:

for url in list_of_urls:
    urls.setdefault(url, 0)
    urls[url] += 1

This always works fine for me:

for url in list_of_urls:
    urls.setdefault(url, 0)
    urls[url] += 1

回答 4

完全按照您的方式来做?您可以使用for … else结构

for url in list_of_urls:
    for url_dict in urls:
        if url_dict['url'] == url:
            url_dict['nbr'] += 1
            break
    else:
        urls.append(dict(url=url, nbr=1))

但这是很不雅观的。您是否真的必须将访问的URL存储为LIST?例如,如果将其排序为dict,并以url字符串索引,则它会更干净:

urls = {'http://www.google.fr/': dict(url='http://www.google.fr/', nbr=1)}

for url in list_of_urls:
    if url in urls:
        urls[url]['nbr'] += 1
    else:
        urls[url] = dict(url=url, nbr=1)

在第二个示例中需要注意的几件事:

  • 了解测试单个测试时如何使用dict urls消除整个urls列表的需求url。这种方法将更快。
  • 使用dict( )大括号代替您的代码
  • 使用list_of_urlsurlsurl作为变量名使代码挺难解析。这是更好地找到一些更清晰的,如urls_to_visiturls_already_visitedcurrent_url。我知道,时间更长。但这更清楚。

当然,我假设这dict(url='http://www.google.fr', nbr=1)是您自己的数据结构的简化,因为否则,urls可能只是:

urls = {'http://www.google.fr':1}

for url in list_of_urls:
    if url in urls:
        urls[url] += 1
    else:
        urls[url] = 1

使用defaultdict姿势可以很优雅:

urls = collections.defaultdict(int)
for url in list_of_urls:
    urls[url] += 1

To do it exactly your way? You could use the for…else structure

for url in list_of_urls:
    for url_dict in urls:
        if url_dict['url'] == url:
            url_dict['nbr'] += 1
            break
    else:
        urls.append(dict(url=url, nbr=1))

But it is quite inelegant. Do you really have to store the visited urls as a LIST? If you store them in a dict, indexed by url string, for example, it would be way cleaner:

urls = {'http://www.google.fr/': dict(url='http://www.google.fr/', nbr=1)}

for url in list_of_urls:
    if url in urls:
        urls[url]['nbr'] += 1
    else:
        urls[url] = dict(url=url, nbr=1)

A few things to note in that second example:

  • See how using a dict for urls removes the need to go through the whole urls list when testing for one single url. This approach will be faster.
  • Using dict( ) instead of braces makes your code shorter.
  • Using list_of_urls, urls and url as variable names makes the code quite hard to parse. It’s better to find something clearer, such as urls_to_visit, urls_already_visited and current_url. I know, it’s longer. But it’s clearer.
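
As an illustration of the last point, here is the same dict-based loop with the suggested names. The sample list is shortened and hypothetical, and the final conversion back to a list is an optional extra for readers who still need the shape from the question.

```python
urls_to_visit = ['http://www.google.fr/', 'http://www.google.cn/',
                 'http://www.google.fr/']
urls_already_visited = {}

for current_url in urls_to_visit:
    if current_url in urls_already_visited:
        urls_already_visited[current_url]['nbr'] += 1
    else:
        urls_already_visited[current_url] = dict(url=current_url, nbr=1)

# If the list-of-dicts shape from the question is still needed:
urls_as_list = list(urls_already_visited.values())
```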

And of course I’m assuming that dict(url='http://www.google.fr', nbr=1) is a simplification of your own data structure, because otherwise, urls could simply be:

urls = {'http://www.google.fr':1}

for url in list_of_urls:
    if url in urls:
        urls[url] += 1
    else:
        urls[url] = 1

Which becomes very elegant with a defaultdict:

import collections

urls = collections.defaultdict(int)
for url in list_of_urls:
    urls[url] += 1

回答 5

除了第一次以外,每次看到一个单词时,if语句的测试都会失败。如果您要计算大量的单词,许多单词可能会多次出现。在一个值的初始化仅发生一次且该值的增加将发生多次的情况下,使用try语句会更便宜:

urls_d = {}
for url in list_of_urls:
    try:
        urls_d[url] += 1
    except KeyError:
        urls_d[url] = 1

您可以阅读有关此内容的更多信息:https://wiki.python.org/moin/PythonSpeed/PerformanceTips

Except for the first time, each time a word is seen the if statement’s test fails. If you are counting a large number of words, many will probably occur multiple times. In a situation where the initialization of a value is only going to occur once and the augmentation of that value will occur many times it is cheaper to use a try statement:

urls_d = {}
for url in list_of_urls:
    try:
        urls_d[url] += 1
    except KeyError:
        urls_d[url] = 1

You can read more about this at https://wiki.python.org/moin/PythonSpeed/PerformanceTips
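
Whether try/except actually wins depends on the data and the Python version, so it is worth measuring. The following is a rough sketch, not a rigorous benchmark; the function names and the sample data (three URLs repeated many times) are assumptions made for illustration.

```python
from timeit import timeit

# Hypothetical sample data: three distinct URLs, each repeated many times.
list_of_urls = ['http://www.google.fr/', 'http://www.google.cn/',
                'http://www.google.com/'] * 1000


def count_with_if(items):
    """Count occurrences, testing for membership before every update."""
    counts = {}
    for item in items:
        if item in counts:
            counts[item] += 1
        else:
            counts[item] = 1
    return counts


def count_with_try(items):
    """Count occurrences, handling the first sighting via KeyError."""
    counts = {}
    for item in items:
        try:
            counts[item] += 1
        except KeyError:
            counts[item] = 1
    return counts


if_seconds = timeit(lambda: count_with_if(list_of_urls), number=100)
try_seconds = timeit(lambda: count_with_try(list_of_urls), number=100)
print(f'if/else:    {if_seconds:.4f}s')
print(f'try/except: {try_seconds:.4f}s')
```

Both functions return the same counts; only the timings differ, and those will vary from machine to machine.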


Why do tuples take less space in memory than lists?

Question: Why do tuples take less space in memory than lists?

A tuple在Python中占用更少的内存空间:

>>> a = (1,2,3)
>>> a.__sizeof__()
48

lists占用更多的内存空间:

>>> b = [1,2,3]
>>> b.__sizeof__()
64

Python内存管理内部会发生什么?

A tuple takes less memory space in Python:

>>> a = (1,2,3)
>>> a.__sizeof__()
48

whereas a list takes more memory space:

>>> b = [1,2,3]
>>> b.__sizeof__()
64

What happens internally in Python’s memory management?


回答 0

我假设您正在使用CPython并使用64位(在CPython 2.7 64位上得到的结果相同)。其他Python实现可能会有所不同,或者您拥有32位Python。

无论采用哪种实现方式,lists都是可变大小的,而tuples是固定大小的。

因此tuples可以将元素直接存储在struct内部,另一方面,列表需要一层间接寻址(它存储指向元素的指针)。间接层是一个指针,在64位系统(即64位,即8字节)上。

但是还有另一件事list:它们过度分配。否则,list.append始终是一项O(n)操作-要使其摊销(快得多!!!),它会过度分配。但是现在它必须跟踪分配的大小和填充的大小(s只需要存储一个大小,因为分配的和填充的大小始终相同)。这意味着每个列表必须存储另一个“大小”,它在64位系统上是64位整数,也是8个字节。O(1)tuple

因此lists比tuples 需要至少16个字节的内存。为什么我说“至少”?由于分配过多。过度分配意味着它分配了比所需更多的空间。但是,过度分配的数量取决于创建列表的“方式”和附加/删除历史记录:

>>> l = [1,2,3]
>>> l.__sizeof__()
64
>>> l.append(4)  # triggers re-allocation (with over-allocation), because the original list is full
>>> l.__sizeof__()
96

>>> l = []
>>> l.__sizeof__()
40
>>> l.append(1)  # re-allocation with over-allocation
>>> l.__sizeof__()
72
>>> l.append(2)  # no re-alloc
>>> l.append(3)  # no re-alloc
>>> l.__sizeof__()
72
>>> l.append(4)  # still has room, so no over-allocation needed (yet)
>>> l.__sizeof__()
72

图片

我决定创建一些图像以伴随以上说明。也许这些有帮助

在示例中,这是(示意性地)将其存储在内存中的方式。我强调了红色(徒手)循环的区别:

这实际上只是一个近似值,因为int对象也是Python对象,并且CPython甚至重用了小整数,因此内存中对象的一种可能更准确的表示形式(尽管不那么可读)将是:

请注意,__sizeof__它并不会真正返回“正确”的大小!它仅返回存储值的大小。但是,使用sys.getsizeof结果不同:

>>> import sys
>>> l = [1,2,3]
>>> t = (1, 2, 3)
>>> sys.getsizeof(l)
88
>>> sys.getsizeof(t)
72

有24个“额外”字节。这些是真实的,这是方法中未考虑的垃圾收集器开销__sizeof__。这是因为您通常不应该直接使用魔术方法-在这种情况下,请使用知道如何处理魔术方法的函数:(sys.getsizeof这实际上会将GC开销加到的返回值上__sizeof__)。

I assume you’re using 64-bit CPython (I got the same results on my CPython 2.7 64-bit). The numbers may differ on other Python implementations or on a 32-bit Python.

Regardless of the implementation, lists are variable-sized while tuples are fixed-size.

So tuples can store their elements directly inside the struct, whereas lists need a layer of indirection (the struct stores a pointer to the array of elements). That layer of indirection is a pointer, which is 64 bits (8 bytes) on 64-bit systems.

But there’s another thing that lists do: They over-allocate. Otherwise list.append would be an O(n) operation always – to make it amortized O(1) (much faster!!!) it over-allocates. But now it has to keep track of the allocated size and the filled size (tuples only need to store one size, because allocated and filled size are always identical). That means each list has to store another “size” which on 64bit systems is a 64bit integer, again 8 bytes.

So lists need at least 16 bytes more memory than tuples. Why did I say “at least”? Because of the over-allocation. Over-allocation means it allocates more space than needed. However, the amount of over-allocation depends on “how” you create the list and the append/deletion history:

>>> l = [1,2,3]
>>> l.__sizeof__()
64
>>> l.append(4)  # triggers re-allocation (with over-allocation), because the original list is full
>>> l.__sizeof__()
96

>>> l = []
>>> l.__sizeof__()
40
>>> l.append(1)  # re-allocation with over-allocation
>>> l.__sizeof__()
72
>>> l.append(2)  # no re-alloc
>>> l.append(3)  # no re-alloc
>>> l.__sizeof__()
72
>>> l.append(4)  # still has room, so no over-allocation needed (yet)
>>> l.__sizeof__()
72

Images

I decided to create some images to accompany the explanation above. Maybe these are helpful

This is how it (schematically) is stored in memory in your example. I highlighted the differences with red (free-hand) circles:

That’s actually just an approximation because int objects are also Python objects and CPython even reuses small integers, so a probably more accurate representation (although not as readable) of the objects in memory would be:

Note that __sizeof__ doesn’t really return the “correct” size! It only returns the size of the stored values. However when you use sys.getsizeof the result is different:

>>> import sys
>>> l = [1,2,3]
>>> t = (1, 2, 3)
>>> sys.getsizeof(l)
88
>>> sys.getsizeof(t)
72

There are 24 “extra” bytes. These are real, that’s the garbage collector overhead that isn’t accounted for in the __sizeof__ method. That’s because you’re generally not supposed to use magic methods directly – use the functions that know how to handle them, in this case: sys.getsizeof (which actually adds the GC overhead to the value returned from __sizeof__).
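
The gap between the two measurements can be printed for any object. A small sketch; the exact byte counts depend on the CPython version and platform, so no specific numbers are claimed here.

```python
import sys

l = [1, 2, 3]
t = (1, 2, 3)

# Bytes that sys.getsizeof adds on top of __sizeof__ (GC header on CPython).
list_overhead = sys.getsizeof(l) - l.__sizeof__()
tuple_overhead = sys.getsizeof(t) - t.__sizeof__()

print('list :', l.__sizeof__(), sys.getsizeof(l), list_overhead)
print('tuple:', t.__sizeof__(), sys.getsizeof(t), tuple_overhead)
```

On CPython one would expect the overhead to be the same for both types, with the tuple still coming out smaller overall.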


回答 1

我将更深入地研究CPython代码库,以便我们可以看到大小的实际计算方式。在您的特定示例中没有执行过度分配,因此我不会赘述

我将在这里使用64位值。


lists 的大小由以下函数计算得出list_sizeof

static PyObject *
list_sizeof(PyListObject *self)
{
    Py_ssize_t res;

    res = _PyObject_SIZE(Py_TYPE(self)) + self->allocated * sizeof(void*);
    return PyInt_FromSsize_t(res);
}

Py_TYPE(self)是一个抓取ob_typeself(返回PyList_Type),而 _PyObject_SIZE另一种宏抓斗tp_basicsize从该类型。tp_basicsize计算为实例结构sizeof(PyListObject)在哪里PyListObject

PyListObject结构包含三个字段:

PyObject_VAR_HEAD     # 24 bytes 
PyObject **ob_item;   #  8 bytes
Py_ssize_t allocated; #  8 bytes

这些内容有评论(我将它们修剪掉)以解释它们的含义,请点击上面的链接阅读它们。PyObject_VAR_HEAD扩展到3个8字节字段(ob_refcountob_typeob_size),所以一个24字节的贡献。

所以现在res是:

sizeof(PyListObject) + self->allocated * sizeof(void*)

要么:

40 + self->allocated * sizeof(void*)

如果列表实例具有已分配的元素。第二部分计算他们的贡献。self->allocated顾名思义,它保存分配的元素数。

没有任何元素,列表的大小计算为:

>>> [].__sizeof__()
40

即实例结构的大小。


tuple对象没有定义tuple_sizeof函数。而是使用它们object_sizeof来计算大小:

static PyObject *
object_sizeof(PyObject *self, PyObject *args)
{
    Py_ssize_t res, isize;

    res = 0;
    isize = self->ob_type->tp_itemsize;
    if (isize > 0)
        res = Py_SIZE(self) * isize;
    res += self->ob_type->tp_basicsize;

    return PyInt_FromSsize_t(res);
}

lists一样,它获取tp_basicsize和,如果对象具有非零值tp_itemsize(意味着它具有可变长度的实例),它将乘以元组中的项数(通过Py_SIZEtp_itemsize

tp_basicsize再次使用sizeof(PyTupleObject)其中的 PyTupleObject结构包含

PyObject_VAR_HEAD       # 24 bytes 
PyObject *ob_item[1];   # 8  bytes

因此,没有任何元素(即Py_SIZEreturn 0),空元组的大小等于sizeof(PyTupleObject)

>>> ().__sizeof__()
24

?? 嗯,这里是我还没有找到一个解释,一个古怪tp_basicsizetuples的实际计算公式如下:

sizeof(PyTupleObject) - sizeof(PyObject *)

为什么8要从中删除其他字节tp_basicsize是我一直无法找到的。(有关可能的解释,请参阅MSeifert的评论)


但是,这基本上是您特定示例中的区别list还会保留许多已分配的元素,这有助于确定何时再次过度分配。

现在,当添加其他元素时,列表确实会执行此过度分配以实现O(1)追加。由于MSeifert的封面很好地覆盖了他的答案,因此尺寸更大。

I’ll take a deeper dive into the CPython codebase so we can see how the sizes are actually calculated. In your specific example, no over-allocations have been performed, so I won’t touch on that.

I’m going to use 64-bit values here, as you are.


The size for lists is calculated from the following function, list_sizeof:

static PyObject *
list_sizeof(PyListObject *self)
{
    Py_ssize_t res;

    res = _PyObject_SIZE(Py_TYPE(self)) + self->allocated * sizeof(void*);
    return PyInt_FromSsize_t(res);
}

Here Py_TYPE(self) is a macro that grabs the ob_type of self (returning PyList_Type) while _PyObject_SIZE is another macro that grabs tp_basicsize from that type. tp_basicsize is calculated as sizeof(PyListObject) where PyListObject is the instance struct.

The PyListObject structure has three fields:

PyObject_VAR_HEAD     # 24 bytes 
PyObject **ob_item;   #  8 bytes
Py_ssize_t allocated; #  8 bytes

These have comments (which I trimmed) explaining what they are; follow the link above to read them. PyObject_VAR_HEAD expands into three 8-byte fields (ob_refcount, ob_type and ob_size), so it contributes 24 bytes.

So for now res is:

sizeof(PyListObject) + self->allocated * sizeof(void*)

or:

40 + self->allocated * sizeof(void*)

If the list instance has allocated elements, the second part calculates their contribution. self->allocated, as its name implies, holds the number of allocated elements.

Without any elements, the size of lists is calculated to be:

>>> [].__sizeof__()
40

i.e. the size of the instance struct.


tuple objects don’t define a tuple_sizeof function. Instead, they use object_sizeof to calculate their size:

static PyObject *
object_sizeof(PyObject *self, PyObject *args)
{
    Py_ssize_t res, isize;

    res = 0;
    isize = self->ob_type->tp_itemsize;
    if (isize > 0)
        res = Py_SIZE(self) * isize;
    res += self->ob_type->tp_basicsize;

    return PyInt_FromSsize_t(res);
}

This, as for lists, grabs the tp_basicsize and, if the object has a non-zero tp_itemsize (meaning it has variable-length instances), it multiplies the number of items in the tuple (which it gets via Py_SIZE) with tp_itemsize.

tp_basicsize again uses sizeof(PyTupleObject) where the PyTupleObject struct contains:

PyObject_VAR_HEAD       # 24 bytes 
PyObject *ob_item[1];   # 8  bytes

So, without any elements (that is, Py_SIZE returns 0) the size of empty tuples is equal to sizeof(PyTupleObject):

>>> ().__sizeof__()
24

Huh? Well, here’s an oddity that I haven’t found an explanation for: the tp_basicsize of tuples is actually calculated as follows:

sizeof(PyTupleObject) - sizeof(PyObject *)

Why an additional 8 bytes are removed from tp_basicsize is something I haven’t been able to find out. (See MSeifert’s comment for a possible explanation.)


But, this is basically the difference in your specific example. lists also keep around a number of allocated elements which helps determine when to over-allocate again.

Now, when additional elements are added, lists do indeed perform this over-allocation in order to achieve O(1) appends. This results in greater sizes, as MSeifert covers nicely in his answer.
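
Putting the numbers from this answer together reproduces the sizes in the question. The sketch below only mirrors the arithmetic of list_sizeof and object_sizeof for the 64-bit build discussed here; the constant and function names are made up for illustration.

```python
POINTER_SIZE = 8        # sizeof(void*) on a 64-bit build
LIST_BASICSIZE = 40     # sizeof(PyListObject): 24-byte header + 8 + 8
TUPLE_BASICSIZE = 24    # sizeof(PyTupleObject) - sizeof(PyObject *)


def list_size(allocated):
    """Mirror list_sizeof: basic size plus one pointer per allocated slot."""
    return LIST_BASICSIZE + allocated * POINTER_SIZE


def tuple_size(n_items):
    """Mirror object_sizeof: basic size plus tp_itemsize per item."""
    return TUPLE_BASICSIZE + n_items * POINTER_SIZE


print(list_size(3))   # 64, as in the question
print(tuple_size(3))  # 48, as in the question
```

The 16-byte difference is the extra ob_item pointer plus the allocated field in the list struct.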


回答 2

MSeifert的答案涵盖了广泛的范围;为简单起见,您可以想到:

tuple是一成不变的。一旦设置,您将无法更改。因此,您预先知道需要为该对象分配多少内存。

list易变。您可以在其中添加或删除项目。它必须知道它的大小(用于内部隐含)。根据需要调整大小。

没有免费的餐点 -这些功能需要付费。因此,列表的内存开销。

MSeifert’s answer covers it broadly; to keep it simple you can think of it like this:

A tuple is immutable. Once it is set, you can’t change it. So you know in advance how much memory you need to allocate for that object.

A list is mutable. You can add items to it or remove items from it. It has to know its own size (for the internal implementation). It resizes as needed.

There are no free meals: these capabilities come with a cost. Hence the memory overhead for lists.


回答 3

元组的大小是有前缀的,这意味着在元组初始化时,解释器会为所包含的数据分配足够的空间,这就是它的结尾,使其具有不变性(无法修改),而列表是可变对象,因此意味着动态分配内存,因此避免每次您追加或修改列表时都要分配空间(分配足够的空间来容纳已更改的数据并将数据复制到其中),它会为以后的追加,修改等分配更多的空间。总结。

The size of a tuple is fixed in advance: when the tuple is initialized, the interpreter allocates just enough space for the contained data, and that is the end of it, given that it is immutable (it can’t be modified). A list, on the other hand, is a mutable object, which implies dynamic memory allocation. To avoid reallocating each time you append to or modify the list (allocating enough space to hold the changed data and copying the data over), it allocates additional space for future appends and modifications. That pretty much sums it up.


How to convert a list of tuples into multiple lists?

Question: How to convert a list of tuples into multiple lists?

假设我有一个元组列表,并且我想转换为多个列表。

例如,元组列表是

[(1,2),(3,4),(5,6),]

Python中是否有任何内置函数可以将其转换为:

[1,3,5],[2,4,6]

这可以是一个简单的程序。但是我只是对Python中存在这种内置函数感到好奇。

Suppose I have a list of tuples and I want to convert it into multiple lists.

For example, the list of tuples is

[(1,2),(3,4),(5,6),]

Is there any built-in function in Python that converts it to:

[1,3,5],[2,4,6]

This can be a simple program. But I am just curious about the existence of such a built-in function in Python.


回答 0

内置功能zip()几乎可以满足您的需求:

>>> zip(*[(1, 2), (3, 4), (5, 6)])
[(1, 3, 5), (2, 4, 6)]

唯一的区别是您得到元组而不是列表。您可以使用将它们转换为列表

map(list, zip(*[(1, 2), (3, 4), (5, 6)]))

The built-in function zip() will almost do what you want:

>>> zip(*[(1, 2), (3, 4), (5, 6)])
[(1, 3, 5), (2, 4, 6)]

The only difference is that you get tuples instead of lists. You can convert them to lists using

map(list, zip(*[(1, 2), (3, 4), (5, 6)]))
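
The output shown above is from Python 2, where zip() and map() return lists. In Python 3 both return lazy iterators, so they need to be wrapped in list() (or unpacked) to get the same result. A short sketch with the data from the question; the variable names are illustrative:

```python
pairs = [(1, 2), (3, 4), (5, 6)]

# zip(*pairs) yields one tuple per position: (1, 3, 5) and (2, 4, 6)
as_tuples = list(zip(*pairs))

# Convert each tuple to a list, and materialise the map iterator
as_lists = list(map(list, zip(*pairs)))

# Unpack directly into two names when the number of columns is known
firsts, seconds = as_lists
```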

回答 1

python docs

zip()与*运算符结合可用于解压缩列表:

具体的例子:

>>> zip((1,3,5),(2,4,6))
[(1, 2), (3, 4), (5, 6)]
>>> zip(*[(1, 2), (3, 4), (5, 6)])
[(1, 3, 5), (2, 4, 6)]

或者,如果您确实想要列表:

>>> map(list, zip(*[(1, 2), (3, 4), (5, 6)]))
[[1, 3, 5], [2, 4, 6]]

From the python docs:

zip() in conjunction with the * operator can be used to unzip a list:

Specific example:

>>> zip((1,3,5),(2,4,6))
[(1, 2), (3, 4), (5, 6)]
>>> zip(*[(1, 2), (3, 4), (5, 6)])
[(1, 3, 5), (2, 4, 6)]

Or, if you really want lists:

>>> map(list, zip(*[(1, 2), (3, 4), (5, 6)]))
[[1, 3, 5], [2, 4, 6]]

回答 2

用:

a = [(1,2),(3,4),(5,6),]
b = list(zip(*a))  # list() is needed on Python 3, where zip() returns an iterator
# b == [(1, 3, 5), (2, 4, 6)]

Use:

a = [(1,2),(3,4),(5,6),]
b = list(zip(*a))  # list() is needed on Python 3, where zip() returns an iterator
# b == [(1, 3, 5), (2, 4, 6)]

回答 3

franklsf95在回答中选择了性能,因此选择list.append(),但是它们并不是最佳的。

添加列表理解后,我得到以下结果:

def t1(zs):
    xs, ys = zip(*zs)
    return xs, ys

def t2(zs):
    xs, ys = [], []
    for x, y in zs:
        xs.append(x)
        ys.append(y)
    return xs, ys

def t3(zs):
    xs, ys = [x for x, y in zs], [y for x, y in zs]
    return xs, ys

if __name__ == '__main__':
    from timeit import timeit
    setup_string='''\
N = 2000000
xs = list(range(1, N))
ys = list(range(N+1, N*2))
zs = list(zip(xs, ys))
from __main__ import t1, t2, t3
'''
    # Inner double quotes keep these f-strings valid before Python 3.12
    print(f'zip:\t\t{timeit("t1(zs)", setup=setup_string, number=1000)}')
    print(f'append:\t\t{timeit("t2(zs)", setup=setup_string, number=1000)}')
    print(f'list comp:\t{timeit("t3(zs)", setup=setup_string, number=1000)}')

结果如下:

zip:            122.11585397789766
append:         356.44876132614047
list comp:      144.637765085659

因此,如果您追求性能,那么zip()尽管列表理解并不太落后,但您可能应该使用。append相比较而言,的效果实际上很差。

franklsf95 goes for performance in his answer and opts for list.append(), but that is not optimal.

Adding list comprehensions, I ended up with the following:

def t1(zs):
    xs, ys = zip(*zs)
    return xs, ys

def t2(zs):
    xs, ys = [], []
    for x, y in zs:
        xs.append(x)
        ys.append(y)
    return xs, ys

def t3(zs):
    xs, ys = [x for x, y in zs], [y for x, y in zs]
    return xs, ys

if __name__ == '__main__':
    from timeit import timeit
    setup_string='''\
N = 2000000
xs = list(range(1, N))
ys = list(range(N+1, N*2))
zs = list(zip(xs, ys))
from __main__ import t1, t2, t3
'''
    # Inner double quotes keep these f-strings valid before Python 3.12
    print(f'zip:\t\t{timeit("t1(zs)", setup=setup_string, number=1000)}')
    print(f'append:\t\t{timeit("t2(zs)", setup=setup_string, number=1000)}')
    print(f'list comp:\t{timeit("t3(zs)", setup=setup_string, number=1000)}')

This gave the result:

zip:            122.11585397789766
append:         356.44876132614047
list comp:      144.637765085659

So if you are after performance, you should probably use zip() although list comprehensions are not too far behind. The performance of append is actually pretty poor in comparison.


回答 4

尽管使用的*zip是Pythonic,但以下代码具有更好的性能:

xs, ys = [], []
for x, y in zs:
    xs.append(x)
    ys.append(y)

同样,当原始列表zs为空时,*zip将引发,但是此代码可以正确处理。

我只是进行了一个快速实验,结果如下:

Using *zip:     1.54701614s
Using append:   0.52687597s

多次运行,append比运行速度快3到4倍zip!测试脚本在这里:

#!/usr/bin/env python3
import time

N = 2000000
xs = list(range(1, N))
ys = list(range(N+1, N*2))
zs = list(zip(xs, ys))

t1 = time.time()

xs_, ys_ = zip(*zs)
print(len(xs_), len(ys_))

t2 = time.time()

xs_, ys_ = [], []
for x, y in zs:
    xs_.append(x)
    ys_.append(y)
print(len(xs_), len(ys_))

t3 = time.time()

print('Using *zip:\t{:.8f}s'.format(t2 - t1))
print('Using append:\t{:.8f}s'.format(t3 - t2))

我的Python版本:

Python 3.6.3 (default, Oct 24 2017, 12:18:40)
[GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.42)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

Despite *zip being more Pythonic, the following code has much better performance:

xs, ys = [], []
for x, y in zs:
    xs.append(x)
    ys.append(y)

Also, when the original list zs is empty, unpacking the result of zip(*zs) raises a ValueError, whereas this code handles that case properly.
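
The empty-list case is easy to reproduce. A minimal sketch; zip_error is just an illustrative name used to capture the exception:

```python
zs = []

# Unpacking zip(*zs) fails because zip() with no arguments yields nothing.
try:
    xs, ys = zip(*zs)
    zip_error = None
except ValueError as exc:
    zip_error = exc

# The append loop simply leaves both lists empty.
xs, ys = [], []
for x, y in zs:
    xs.append(x)
    ys.append(y)
```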

I just ran a quick experiment, and here is the result:

Using *zip:     1.54701614s
Using append:   0.52687597s

Running it multiple times, append is 3x – 4x faster than zip! The test script is here:

#!/usr/bin/env python3
import time

N = 2000000
xs = list(range(1, N))
ys = list(range(N+1, N*2))
zs = list(zip(xs, ys))

t1 = time.time()

xs_, ys_ = zip(*zs)
print(len(xs_), len(ys_))

t2 = time.time()

xs_, ys_ = [], []
for x, y in zs:
    xs_.append(x)
    ys_.append(y)
print(len(xs_), len(ys_))

t3 = time.time()

print('Using *zip:\t{:.8f}s'.format(t2 - t1))
print('Using append:\t{:.8f}s'.format(t3 - t2))

My Python Version:

Python 3.6.3 (default, Oct 24 2017, 12:18:40)
[GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.42)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

回答 5

除了Claudiu的答案,您还可以使用:

>>> a, b = map(list, zip(*[(1, 2), (3, 4), (5, 6)]))
>>> a
[1, 3, 5]
>>> b
[2, 4, 6]

根据@Peyman mohseni kiasari编辑

In addition to Claudiu’s answer, you can use:

>>> a, b = map(list, zip(*[(1, 2), (3, 4), (5, 6)]))
>>> a
[1, 3, 5]
>>> b
[2, 4, 6]

Edited according to @Peyman mohseni kiasari


回答 6

添加到Claudiu和Claudiu的答案中,并且由于需要在python 3中从itertools导入map,因此您还可以使用以下列表推导:

[[*x] for x in zip(*[(1,2),(3,4),(5,6)])]
>>> [[1, 3, 5], [2, 4, 6]]

Adding to Claudiu’s answer: since map returns an iterator rather than a list in Python 3, you can also use a list comprehension like:

>>> [[*x] for x in zip(*[(1,2),(3,4),(5,6)])]
[[1, 3, 5], [2, 4, 6]]

Python pandas: insert a list into a cell

Question: Python pandas: insert a list into a cell

我有一个列表“ abc”和一个数据框“ df”:

abc = ['foo', 'bar']
df =
    A  B
0  12  NaN
1  23  NaN

我想将列表插入单元格1B中,所以我想要这个结果:

    A  B
0  12  NaN
1  23  ['foo', 'bar']

我能做到吗?

1)如果我使用这个:

df.ix[1,'B'] = abc

我收到以下错误消息:

ValueError: Must have equal len keys and value when setting with an iterable

因为它尝试将列表(具有两个元素)插入行/列而不插入单元格。

2)如果我使用这个:

df.ix[1,'B'] = [abc]

然后插入一个只有一个元素的列表,即“ abc”列表( [['foo', 'bar']])。

3)如果我使用这个:

df.ix[1,'B'] = ', '.join(abc)

然后插入一个字符串:( foo, bar),但不插入列表。

4)如果我使用这个:

df.ix[1,'B'] = [', '.join(abc)]

然后插入一个列表,但只有一个元素(['foo, bar']),但没有两个我想要的元素(['foo', 'bar'])。

感谢帮助!


编辑

我的新数据框和旧列表:

abc = ['foo', 'bar']
df2 =
    A    B         C
0  12  NaN      'bla'
1  23  NaN  'bla bla'

另一个数据框:

df3 =
    A    B         C                    D
0  12  NaN      'bla'  ['item1', 'item2']
1  23  NaN  'bla bla'        [11, 12, 13]

我想将“ abc”列表插入df2.loc[1,'B']和/或df3.loc[1,'B']

如果数据框仅包含具有整数值和/或NaN值和/或列表值的列,则将列表插入到单元格中的效果很好。如果数据框仅包含具有字符串值和/或NaN值和/或列表值的列,则将列表插入到单元格中的效果很好。但是,如果数据框具有包含整数和字符串值的列以及其他列,那么如果我使用此错误消息,则会出现错误消息:df2.loc[1,'B'] = abcdf3.loc[1,'B'] = abc

另一个数据框:

df4 =
          A     B
0      'bla'  NaN
1  'bla bla'  NaN

这些插入可以完美地工作:df.loc[1,'B'] = abcdf4.loc[1,'B'] = abc

I have a list ‘abc’ and a dataframe ‘df’:

abc = ['foo', 'bar']
df =
    A  B
0  12  NaN
1  23  NaN

I want to insert the list into cell 1B, so I want this result:

    A  B
0  12  NaN
1  23  ['foo', 'bar']

How can I do that?

1) If I use this:

df.ix[1,'B'] = abc

I get the following error message:

ValueError: Must have equal len keys and value when setting with an iterable

because it tries to insert the list (that has two elements) into a row / column but not into a cell.

2) If I use this:

df.ix[1,'B'] = [abc]

then it inserts a list that has only one element that is the ‘abc’ list ( [['foo', 'bar']] ).

3) If I use this:

df.ix[1,'B'] = ', '.join(abc)

then it inserts a string: ( foo, bar ) but not a list.

4) If I use this:

df.ix[1,'B'] = [', '.join(abc)]

then it inserts a list but it has only one element ( ['foo, bar'] ) but not two as I want ( ['foo', 'bar'] ).

Thanks for help!


EDIT

My new dataframe and the old list:

abc = ['foo', 'bar']
df2 =
    A    B         C
0  12  NaN      'bla'
1  23  NaN  'bla bla'

Another dataframe:

df3 =
    A    B         C                    D
0  12  NaN      'bla'  ['item1', 'item2']
1  23  NaN  'bla bla'        [11, 12, 13]

I want to insert the ‘abc’ list into df2.loc[1,'B'] and/or df3.loc[1,'B'].

If the dataframe has columns only with integer values and/or NaN values and/or list values then inserting a list into a cell works perfectly. If the dataframe has columns only with string values and/or NaN values and/or list values then inserting a list into a cell works perfectly. But if the dataframe has columns with integer and string values and other columns then the error message appears if I use this: df2.loc[1,'B'] = abc or df3.loc[1,'B'] = abc.

Another dataframe:

df4 =
          A     B
0      'bla'  NaN
1  'bla bla'  NaN

These inserts work perfectly: df.loc[1,'B'] = abc or df4.loc[1,'B'] = abc.


回答 0

由于自0.21.0版set_value开始不推荐使用,因此您现在应该使用at。它可以插入一个列表的小区没有抚养ValueErrorloc一样。我认为这是因为at 总是引用单个值,而loc可以引用值以及行和列。

df = pd.DataFrame(data={'A': [1, 2, 3], 'B': ['x', 'y', 'z']})

df.at[1, 'B'] = ['m', 'n']

df =
    A   B
0   1   x
1   2   [m, n]
2   3   z

您还需要确保要插入的具有dtype=object。例如

>>> df = pd.DataFrame(data={'A': [1, 2, 3], 'B': [1,2,3]})
>>> df.dtypes
A    int64
B    int64
dtype: object

>>> df.at[1, 'B'] = [1, 2, 3]
ValueError: setting an array element with a sequence

>>> df['B'] = df['B'].astype('object')
>>> df.at[1, 'B'] = [1, 2, 3]
>>> df
   A          B
0  1          1
1  2  [1, 2, 3]
2  3          3

Since set_value has been deprecated since version 0.21.0, you should now use at. It can insert a list into a cell without raising a ValueError as loc does. I think this is because at always refers to a single value, while loc can refer to values as well as rows and columns.

df = pd.DataFrame(data={'A': [1, 2, 3], 'B': ['x', 'y', 'z']})

df.at[1, 'B'] = ['m', 'n']

df =
    A   B
0   1   x
1   2   [m, n]
2   3   z

You also need to make sure the column you are inserting into has dtype=object. For example

>>> df = pd.DataFrame(data={'A': [1, 2, 3], 'B': [1,2,3]})
>>> df.dtypes
A    int64
B    int64
dtype: object

>>> df.at[1, 'B'] = [1, 2, 3]
ValueError: setting an array element with a sequence

>>> df['B'] = df['B'].astype('object')
>>> df.at[1, 'B'] = [1, 2, 3]
>>> df
   A          B
0  1          1
1  2  [1, 2, 3]
2  3          3

回答 1

df3.set_value(1, 'B', abc)适用于任何数据框。注意列“ B”的数据类型。例如。不能将列表插入浮点列,在这种情况下df['B'] = df['B'].astype(object)可以提供帮助。

df3.set_value(1, 'B', abc) works for any dataframe. Mind the data type of column ‘B’: for example, a list cannot be inserted into a float column; in that case df['B'] = df['B'].astype(object) can help.


回答 2

熊猫> = 0.21

set_value已不推荐使用。 现在,您可以使用DataFrame.at按标签DataFrame.iat设置和按整数位置设置。

使用at/ 设置单元格值iat

# Setup
df = pd.DataFrame({'A': [12, 23], 'B': [['a', 'b'], ['c', 'd']]})
df

    A       B
0  12  [a, b]
1  23  [c, d]

df.dtypes

A     int64
B    object
dtype: object

如果要将“ B”第二行中的值设置为一些新列表,请使用DataFrame.at

df.at[1, 'B'] = ['m', 'n']
df

    A       B
0  12  [a, b]
1  23  [m, n]

您也可以使用 DataFrame.iat

df.iat[1, df.columns.get_loc('B')] = ['m', 'n']
df

    A       B
0  12  [a, b]
1  23  [m, n]

如果得到了ValueError: setting an array element with a sequence怎么办?

我将尝试通过以下方式重现该内容:

df

    A   B
0  12 NaN
1  23 NaN

df.dtypes

A      int64
B    float64
dtype: object

df.at[1, 'B'] = ['m', 'n']
# ValueError: setting an array element with a sequence.

这是因为您的对象是float64dtype,而列表是objects,所以那里不匹配。在这种情况下,您要做的是先将列转换为对象。

df['B'] = df['B'].astype(object)
df.dtypes

A     int64
B    object
dtype: object

然后,它起作用:

df.at[1, 'B'] = ['m', 'n']
df

    A       B
0  12     NaN
1  23  [m, n]

可能,但是哈基

更古怪的是,我发现DataFrame.loc如果传递嵌套列表,您可以破解以实现相似的目的。

df.loc[1, 'B'] = [['m'], ['n'], ['o'], ['p']]
df

    A             B
0  12        [a, b]
1  23  [m, n, o, p]

您可以在这里阅读更多有关其工作原理的信息。

Pandas >= 0.21

set_value has been deprecated. You can now use DataFrame.at to set by label, and DataFrame.iat to set by integer position.

Setting Cell Values with at/iat

# Setup
df = pd.DataFrame({'A': [12, 23], 'B': [['a', 'b'], ['c', 'd']]})
df

    A       B
0  12  [a, b]
1  23  [c, d]

df.dtypes

A     int64
B    object
dtype: object

If you want to set the value in the second row of column “B” to some new list, use DataFrame.at:

df.at[1, 'B'] = ['m', 'n']
df

    A       B
0  12  [a, b]
1  23  [m, n]

You can also set by integer position using DataFrame.iat

df.iat[1, df.columns.get_loc('B')] = ['m', 'n']
df

    A       B
0  12  [a, b]
1  23  [m, n]

What if I get ValueError: setting an array element with a sequence?

I’ll try to reproduce this with:

df

    A   B
0  12 NaN
1  23 NaN

df.dtypes

A      int64
B    float64
dtype: object

df.at[1, 'B'] = ['m', 'n']
# ValueError: setting an array element with a sequence.

This happens because the column is of float64 dtype, whereas lists are Python objects, so there is a mismatch. In this situation, convert the column to object dtype first.

df['B'] = df['B'].astype(object)
df.dtypes

A     int64
B    object
dtype: object

Then, it works:

df.at[1, 'B'] = ['m', 'n']
df

    A       B
0  12     NaN
1  23  [m, n]
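
For reference, here is a minimal self-contained sketch of the fix above. The setup line is an assumption (the answer does not show how the NaN frame was built), so treat it as one possible way to reproduce the situation rather than the original code:

```python
import numpy as np
import pandas as pd

# Assumed setup: a frame whose column B is float64 and holds only NaN.
df = pd.DataFrame({'A': [12, 23], 'B': [np.nan, np.nan]})

# Cast the column to object so a single cell can hold any Python object.
df['B'] = df['B'].astype(object)

# .at addresses exactly one cell by row label and column label.
df.at[1, 'B'] = ['m', 'n']

print(df['B'].dtype)   # object
print(df.at[1, 'B'])   # ['m', 'n']
```

The cast has to happen before the assignment; once the column is object dtype, the same .at call that raised earlier stores the list as a single cell value.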

Possible, But Hacky

Even more wacky, I’ve found you can hack through DataFrame.loc to achieve something similar if you pass nested lists.

df.loc[1, 'B'] = [['m'], ['n'], ['o'], ['p']]
df

    A             B
0  12        [a, b]
1  23  [m, n, o, p]

You can read more about why this works here.


Answer 3

As mentioned in the post "pandas: how to store a list in a dataframe?", the dtypes in the dataframe may influence the result, as may whether or not the result of the call is assigned back to a dataframe.


Answer 4

Quick workaround

Simply enclose the lists within an outer list, as done for col2 in the data frame below. This works because pandas takes the outer list (of lists) and converts it into a column as if it contained ordinary scalar items; in our case each item happens to be a list rather than a scalar.

mydict={'col1':[1,2,3],'col2':[[1, 4], [2, 5], [3, 6]]}
data=pd.DataFrame(mydict)
data

   col1    col2
0     1  [1, 4]
1     2  [2, 5]
2     3  [3, 6]

Answer 5

I was also getting

ValueError: Must have equal len keys and value when setting with an iterable

Using .at rather than .loc did not make any difference in my case, but enforcing the datatype of the dataframe column did the trick:

df['B'] = df['B'].astype(object)

Then I could set lists, numpy array and all sorts of things as single cell values in my dataframes.


Checking if all elements in a list are unique

Question: Checking if all elements in a list are unique

What is the best way (best as in the conventional way) of checking whether all elements in a list are unique?

My current approach using a Counter is:

>>> from collections import Counter
>>> x = [1, 1, 1, 2, 3, 4, 5, 6, 2]
>>> counter = Counter(x)
>>> for count in counter.values():  # itervalues() on Python 2
...     if count > 1:
...         pass  # do something

Can I do better?


Answer 0

Not the most efficient, but straightforward and concise:

if len(x) > len(set(x)):
   pass # do something

Probably won’t make much of a difference for short lists.


Answer 1

Here is a two-liner that will also do early exit:

>>> def allUnique(x):
...     seen = set()
...     return not any(i in seen or seen.add(i) for i in x)
...
>>> allUnique("ABCDEF")
True
>>> allUnique("ABACDEF")
False

If the elements of x aren’t hashable, then you’ll have to resort to using a list for seen:

>>> def allUnique(x):
...     seen = list()
...     return not any(i in seen or seen.append(i) for i in x)
...
>>> allUnique([list("ABC"), list("DEF")])
True
>>> allUnique([list("ABC"), list("DEF"), list("ABC")])
False
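
To see why this exits early, here is a small sketch. The counting() helper is hypothetical and only records how many items were actually read from the input; it is not part of the answer above:

```python
def all_unique(items):
    """Return True if no element of items repeats (elements must be hashable)."""
    seen = set()
    # set.add() returns None, so `x in seen or seen.add(x)` is truthy only
    # when x was already seen; any() stops at the first truthy value.
    return not any(x in seen or seen.add(x) for x in items)


def counting(items, consumed):
    """Hypothetical helper: yield items while recording each one that is read."""
    for x in items:
        consumed.append(x)
        yield x


consumed = []
result = all_unique(counting("ABACDEF", consumed))
print(result)         # False
print(len(consumed))  # 3: iteration stopped at the second "A"
```

Because any() short-circuits, the remaining four characters are never pulled from the generator.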

Answer 2

An early-exit solution could be

def unique_values(g):
    s = set()
    for x in g:
        if x in s: return False
        s.add(x)
    return True

However, for small inputs, or when early exit is not the common case, I would expect len(x) != len(set(x)) to be the fastest method.


Answer 3

For speed:

import numpy as np
x = [1, 1, 1, 2, 3, 4, 5, 6, 2]
np.unique(x).size == len(x)

Answer 4

How about adding all the entries to a set and checking its length?

len(set(x)) == len(x)

Answer 5

As an alternative to a set, you can use a dict:

len({}.fromkeys(x)) == len(x)

Answer 6

Another approach entirely, using sorted and groupby:

from itertools import groupby
is_unique = lambda seq: all(sum(1 for _ in x[1])==1 for x in groupby(sorted(seq)))

It requires a sort, but exits on the first repeated value.
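
A possibly easier-to-read variant of the same sort-based idea, sketched here with zip instead of groupby (this is a swapped-in formulation, not the code from the answer): once the data is sorted, equal elements sit next to each other, so it is enough to compare neighbours.

```python
def is_unique_sorted(seq):
    """Return True if no two elements of seq are equal (elements must be orderable)."""
    ordered = sorted(seq)
    # After sorting, duplicates are adjacent, so comparing neighbours suffices;
    # all() stops at the first equal pair.
    return all(a != b for a, b in zip(ordered, ordered[1:]))


print(is_unique_sorted([3, 1, 2]))  # True
print(is_unique_sorted([1, 3, 1]))  # False
```

Like the groupby version, this needs orderable elements but not hashable ones, so it also works for lists of lists.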


Answer 7

Here is a recursive O(N^2) version for fun:

def is_unique(lst):
    if len(lst) > 1:
        return is_unique(lst[1:]) and (lst[0] not in lst[1:])
    return True

Answer 8

Here is a recursive early-exit function:

def distinct(L):
    if len(L) < 2:
        # An empty or single-element list has no duplicates.
        return True
    H = L[0]
    T = L[1:]
    if H in T:
        return False
    return distinct(T)

It's fast enough for me without using weird (slow) conversions, while keeping a functional-style approach.


Answer 9

How about this?

from collections import Counter

def is_unique(lst):
    if not lst:
        return True
    return Counter(lst).most_common(1)[0][1] == 1

Answer 10

You can use Yan’s syntax (len(x) > len(set(x))), but instead of set(x), define a function:

def f5(seq, idfun=None):
    # order preserving
    if idfun is None:
        def idfun(x): return x
    seen = {}
    result = []
    for item in seq:
        marker = idfun(item)
        # in old Python versions:
        # if seen.has_key(marker)
        # but in new ones:
        if marker in seen: continue
        seen[marker] = 1
        result.append(item)
    return result

and do len(x) > len(f5(x)). This will be fast and is also order preserving.

The code is taken from: http://www.peterbe.com/plog/uniqifiers-benchmark


Answer 11

Using a similar approach in a Pandas dataframe to test whether a column contains only unique values:

if tempDF['var1'].size == tempDF['var1'].unique().size:
    print("Unique")
else:
    print("Not unique")

For me, this is instantaneous on an int variable in a dataframe containing over a million rows.
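
As a side note, pandas also exposes this check directly through the Series.is_unique property. A small sketch, where the frame and its values are made up purely for illustration:

```python
import pandas as pd

# Hypothetical frame standing in for tempDF from the answer above.
tempDF = pd.DataFrame({'var1': [1, 2, 3, 4], 'var2': [1, 2, 2, 4]})

print(tempDF['var1'].is_unique)  # True
print(tempDF['var2'].is_unique)  # False
```

This avoids materialising the array of unique values just to compare sizes.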


Answer 12

All the answers above are good, but I prefer the all_unique example from 30 seconds of python.

Use set() on the given list to remove duplicates, then compare its length with the length of the list.

def all_unique(lst):
  return len(lst) == len(set(lst))

It returns True if all the values in a flat list are unique, False otherwise:

x = [1,2,3,4,5,6]
y = [1,2,2,3,4,5]
all_unique(x) # True
all_unique(y) # False

Answer 13

For beginners:

def AllDifferent(s):
    for i in range(len(s)):
        for i2 in range(len(s)):
            if i != i2:
                if s[i] == s[i2]:
                    return False
    return True
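
The nested loop above compares every pair twice. A slightly tighter sketch of the same idea (the name all_different is only an illustrative choice) starts the inner loop after the current position, which roughly halves the comparisons and drops the i != i2 test:

```python
def all_different(s):
    """Return True if no two positions of s hold equal values."""
    for i in range(len(s)):
        # Only look at later positions; earlier pairs were already compared.
        for j in range(i + 1, len(s)):
            if s[i] == s[j]:
                return False
    return True


print(all_different([1, 2, 3]))        # True
print(all_different([[1], [2], [1]]))  # False
```

It is still O(N^2), but unlike the set-based answers it only needs ==, so it also works for unhashable items such as lists.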

Move an item inside a list?

Question: Move an item inside a list?

In Python, how do I move an item to a definite index in a list?


Answer 0

Use the insert method of a list:

l = list(...)
l.insert(index, item)

Alternatively, you can use slice notation:

l[index:index] = [item]

If you want to move an item that’s already in the list to the specified position, you would have to delete it and insert it at the new position:

l.insert(newindex, l.pop(oldindex))
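
A short worked example of the pop-then-insert move, wrapped in a helper. The name move_item is just an illustrative choice, not something from the standard library:

```python
def move_item(lst, old_index, new_index):
    """Move lst[old_index] so that it ends up at new_index (modifies lst in place)."""
    lst.insert(new_index, lst.pop(old_index))


letters = ['a', 'b', 'c', 'd', 'e']
move_item(letters, 3, 1)
print(letters)  # ['a', 'd', 'b', 'c', 'e']
move_item(letters, 0, 4)
print(letters)  # ['d', 'b', 'c', 'e', 'a']
```

Because the item is popped first, new_index refers to a position in the list after removal, so the item lands exactly at new_index in the final list.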

Answer 1

A slightly shorter solution, which only moves the first item to the end rather than to an arbitrary position:

l += [l.pop(0)]

For example:

>>> l = [1,2,3,4,5]
>>> l += [l.pop(0)]
>>> l
[2, 3, 4, 5, 1]
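
If this kind of rotation is the only move you need, collections.deque from the standard library may be worth a look. This is a different data structure from the list used above, so take it as an alternative sketch rather than a drop-in replacement:

```python
from collections import deque

d = deque([1, 2, 3, 4, 5])
d.rotate(-1)    # negative values rotate left: the first item goes to the end
print(list(d))  # [2, 3, 4, 5, 1]
```

Popping from the front of a list shifts every remaining element, whereas a deque is designed for cheap operations at both ends.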

Answer 2

If you don’t know the position of the item, you may need to find the index first:

old_index = list1.index(item)

then move it:

list1.insert(new_index, list1.pop(old_index))

or IMHO a cleaner way:

try:
  list1.remove(item)
  list1.insert(new_index, item)
except ValueError:
  pass

Answer 3

A very simple solution, but you have to know the index of the original position and the index of the new position. Note that this swaps the two items rather than shifting the items in between:

list1[index1],list1[index2]=list1[index2],list1[index1]
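
To make the difference between swapping and moving concrete, here is a small comparison on a made-up list:

```python
items = ['a', 'b', 'c', 'd']

# Swap: the items at positions 0 and 3 trade places, the rest stay put.
swapped = items.copy()
swapped[0], swapped[3] = swapped[3], swapped[0]
print(swapped)  # ['d', 'b', 'c', 'a']

# Move: the item at position 0 goes to position 3, the rest shift left.
moved = items.copy()
moved.insert(3, moved.pop(0))
print(moved)    # ['b', 'c', 'd', 'a']
```

The two give the same result only when the positions are adjacent.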

Answer 4

I profiled a few methods to move an item within the same list with timeit. Here are the ones to use if j>i:

┌──────────┬──────────────────────┐
│ 14.4usec │ x[i:i]=x.pop(j),     │
│ 14.5usec │ x[i:i]=[x.pop(j)]    │
│ 15.2usec │ x.insert(i,x.pop(j)) │
└──────────┴──────────────────────┘

and here the ones to use if j<=i:

┌──────────┬───────────────────────────┐
│ 14.4usec │ x[i:i]=x[j],;del x[j]     │
│ 14.4usec │ x[i:i]=[x[j]];del x[j]    │
│ 15.4usec │ x.insert(i,x[j]);del x[j] │
└──────────┴───────────────────────────┘

Not a huge difference if you only use it a few times, but if you do heavy stuff like manual sorting, it’s important to take the fastest one. Otherwise, I’d recommend just taking the one that you think is most readable.
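
If you want to repeat this kind of measurement yourself, a rough sketch with timeit could look like the following. The list size, indices and function names are arbitrary assumptions, and the numbers will differ from the table above because each run also pays for building a fresh list:

```python
import timeit


def move_with_insert(x, i, j):
    x.insert(i, x.pop(j))


def move_with_slice(x, i, j):
    x[i:i] = [x.pop(j)]


for func in (move_with_insert, move_with_slice):
    # Each run works on a fresh list so earlier moves do not affect later ones.
    seconds = timeit.timeit(lambda: func(list(range(1000)), 10, 900), number=1000)
    print(f"{func.__name__}: {seconds:.4f} s for 1000 runs")
```

Both functions cover the j > i case from the first table; differences this small are easily swamped by noise, so run it several times before drawing conclusions.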