Tag archive: list

How does assignment work with Python list slices?

The Python docs say that slicing a list returns a new list.
If a “new” list is being returned, I have the following questions about “assignment to slices”:

a = [1, 2, 3]
a[0:2] = [4, 5]
print a

Now the output would be:

[4, 5, 3] 
  1. How can an expression that returns something appear on the left-hand side of an assignment?
  2. Yes, I read the docs and they say it is possible. But since slicing a list returns a “new” list, why is the original list being modified? I am not able to understand the mechanics behind it.

Answer 0

You are confusing two distinct operations that use very similar syntax:

1) slicing:

b = a[0:2]

This makes a copy of the slice of a and assigns it to b.

2) slice assignment:

a[0:2] = b

This replaces the slice of a with the contents of b.

Although the syntax is similar (I imagine by design!), these are two different operations.
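The following minimal Python 3 sketch is an addition to the original answer. It makes the difference visible: the slice on the right-hand side is a separate list, while slice assignment changes `a` itself and keeps its identity. The names `b` and `original_id` are only for illustration.

```python
a = [1, 2, 3]

# Slicing on the right-hand side builds a new list object.
b = a[0:2]
b[0] = 99            # changes only the copy
print(a, b)          # [1, 2, 3] [99, 2]

# Slice assignment on the left-hand side mutates `a` in place.
original_id = id(a)
a[0:2] = [4, 5]
print(a)                     # [4, 5, 3]
print(id(a) == original_id)  # True: still the same list object
```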


Answer 1

When you specify a on the left side of the = operator, you are using Python’s normal assignment, which changes the name a in the current context to point to the new value. This does not change the previous value to which a was pointing.

By specifying a[0:2] on the left side of the = operator, you are telling Python you want to use Slice Assignment. Slice Assignment is a special syntax for lists, where you can insert, delete, or replace contents from a list:

Insertion:

>>> a = [1, 2, 3]
>>> a[0:0] = [-3, -2, -1, 0]
>>> a
[-3, -2, -1, 0, 1, 2, 3]

Deletion:

>>> a
[-3, -2, -1, 0, 1, 2, 3]
>>> a[2:4] = []
>>> a
[-3, -2, 1, 2, 3]

Replacement:

>>> a
[-3, -2, 1, 2, 3]
>>> a[:] = [1, 2, 3]
>>> a
[1, 2, 3]

Note:

The length of the slice may be different from the length of the assigned sequence, thus changing the length of the target sequence, if the target sequence allows it. (Source: Python Language Reference, “Assignment statements”.)
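The small Python 3 sketch below illustrates this note and is not from the original answer. It assumes standard list semantics. A simple slice can be replaced by a sequence of a different length. An extended slice (one with a step) must receive exactly as many values as it selects, otherwise Python raises ValueError.

```python
a = [1, 2, 3, 4, 5]

# A simple slice may be replaced by a sequence of a different length.
a[1:3] = ["x"]              # two items replaced by one
print(a)                    # [1, 'x', 4, 5]

a[1:2] = ["p", "q", "r"]    # one item replaced by three
print(a)                    # [1, 'p', 'q', 'r', 4, 5]

# An extended slice (step != 1) needs exactly as many values as it selects.
caught = False
try:
    a[::2] = [0]            # selects 3 items but gets only 1 value
except ValueError as exc:
    caught = True
    print("ValueError:", exc)
```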

Slice assignment provides functionality similar to tuple unpacking. For example, a[0:2] = [4, 5] is equivalent to:

# Tuple Unpacking
a[0], a[1] = [4, 5]

With tuple unpacking, you can also modify non-contiguous elements:

>>> a
[4, 5, 3]
>>> a[-1], a[0] = [7, 3]
>>> a
[3, 5, 7]

However, tuple unpacking is limited to replacement, as you cannot insert or remove elements.

Before and after all these operations, a is the same exact list. Python simply provides nice syntactic sugar to modify a list in-place.
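One way to see the difference between rebinding a name and modifying a list in place is to keep a second reference to the same list. This is a minimal Python 3 sketch, in which `alias` is just an illustrative name:

```python
a = [1, 2, 3]
alias = a            # a second name for the same list object

a[:] = [7, 8]        # slice assignment: mutates the shared list in place
print(alias)         # [7, 8]

a = [0]              # plain assignment: rebinds only the name `a`
print(alias)         # [7, 8]  (the old list is untouched)
print(a)             # [0]
```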


Answer 2

I came across the same question before, and it’s related to the language specification. According to the “Assignment statements” section of the Python Language Reference:

  1. If the left side of the assignment is a subscription, Python will call __setitem__ on that object: a[i] = x is equivalent to a.__setitem__(i, x).

  2. If the left side of the assignment is a slicing, Python will also call __setitem__, but with different arguments: a[1:4] = [1,2,3] is equivalent to a.__setitem__(slice(1,4,None), [1,2,3]).

That’s why a list slice on the left side of ‘=’ behaves differently.
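The Python 3 sketch below makes this translation visible. `Spy` is a hypothetical list subclass that records the key `__setitem__` receives. In Python 3, a slice on the left-hand side arrives as a `slice` object. Python 2 lists also had a separate `__setslice__` hook, so treat this sketch as Python 3 only.

```python
class Spy(list):
    """List subclass that records the keys passed to __setitem__ (illustration only)."""

    def __init__(self, *args):
        super().__init__(*args)
        self.seen_keys = []

    def __setitem__(self, key, value):
        self.seen_keys.append(key)
        print("__setitem__ got key:", repr(key))
        super().__setitem__(key, value)


s = Spy([0, 0, 0, 0, 0])
s[2] = "x"           # key is the int 2
s[1:4] = [1, 2, 3]   # key is slice(1, 4, None)
print(s)             # [0, 1, 2, 3, 0]
```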


Answer 3

By slicing on the left hand side of an assignment operation, you are specifying which items to assign to.


Insert an element at a specific index in a list and return the updated list

I have this:

>>> a = [1, 2, 4]
>>> print a
[1, 2, 4]

>>> print a.insert(2, 3)
None

>>> print a
[1, 2, 3, 4]

>>> b = a.insert(3, 6)
>>> print b
None

>>> print a
[1, 2, 3, 6, 4]

Is there a way I can get the updated list as the result, instead of updating the original list in place?


Answer 0

l.insert(index, obj) doesn’t return the updated list (it returns None); it just updates the list in place.

As ATO said, you can do b = a[:index] + [obj] + a[index:]. However, another way is:

a = [1, 2, 4]
b = a[:]
b.insert(2, 3)
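If you need this often, the copy-then-insert idea can be wrapped in a small function. `inserted` is a hypothetical helper name, not a built-in. This is a Python 3 sketch:

```python
def inserted(seq, index, obj):
    """Return a new list with obj inserted at index; seq is left unmodified.

    `inserted` is a hypothetical helper used only for this sketch.
    """
    result = list(seq)          # shallow copy of the input
    result.insert(index, obj)   # list.insert mutates `result` and returns None
    return result


a = [1, 2, 4]
b = inserted(a, 2, 3)
print(a)  # [1, 2, 4]
print(b)  # [1, 2, 3, 4]
```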

Answer 1

Most efficient approach (performance-wise)

You may also insert the element using the slice indexing in the list. For example:

>>> a = [1, 2, 4]
>>> insert_at = 2  # Index at which you want to insert item

>>> b = a[:]   # Create a copy of list "a" as "b".
               # Skip this step if you are ok with modifying the original list

>>> b[insert_at:insert_at] = [3]  # Insert "3" within "b"
>>> b
[1, 2, 3, 4]

For inserting multiple elements together at a given index, all you need to do is to use a list of multiple elements that you want to insert. For example:

>>> a = [1, 2, 4]
>>> insert_at = 2   # Index starting from which multiple elements will be inserted

# List of elements that you want to insert together at the "insert_at" (above) position
>>> insert_elements = [3, 5, 6]

>>> a[insert_at:insert_at] = insert_elements
>>> a   # [3, 5, 6] are inserted together in `a` starting at index "2"
[1, 2, 3, 5, 6, 4]

Alternative using list comprehension (but very slow in terms of performance):

As an alternative, it can be achieved using list comprehension with enumerate too. (But please don’t do it this way. It is just for illustration):

>>> a = [1, 2, 4]
>>> insert_at = 2

>>> b = [y for i, x in enumerate(a) for y in ((3, x) if i == insert_at else (x, ))]
>>> b
[1, 2, 3, 4]

Performance comparison of all solutions

Here’s the timeit comparison of all the answers with list of 1000 elements for Python 3.4.5:

  • My answer using sliced insertion – Fastest (3.08 µsec per loop)

     mquadri$ python3 -m timeit -s "a = list(range(1000))" "b = a[:]; b[500:500] = [3]"
     100000 loops, best of 3: 3.08 µsec per loop
    
  • ATOzTOA’s accepted answer based on merge of sliced lists – Second (6.71 µsec per loop)

     mquadri$ python3 -m timeit -s "a = list(range(1000))" "b = a[:500] + [3] + a[500:]"
     100000 loops, best of 3: 6.71 µsec per loop
    
  • Rushy Panchal’s answer with the most votes, using list.insert(...) – Third (26.5 µsec per loop)

     python3 -m timeit -s "a = list(range(1000))" "b = a[:]; b.insert(500, 3)"
     10000 loops, best of 3: 26.5 µsec per loop
    
  • My answer with List Comprehension and enumerate – Fourth (very slow with 168 µsec per loop)

     mquadri$ python3 -m timeit -s "a = list(range(1000))" "[y for i, x in enumerate(a) for y in ((3, x) if i == 500 else (x, )) ]"
     10000 loops, best of 3: 168 µsec per loop
    

Answer 2

The shortest I got: b = a[:2] + [3] + a[2:]

>>>
>>> a = [1, 2, 4]
>>> print a
[1, 2, 4]
>>> b = a[:2] + [3] + a[2:]
>>> print a
[1, 2, 4]
>>> print b
[1, 2, 3, 4]

Answer 3

The cleanest approach is to copy the list and then insert the object into the copy. On Python 3 this can be done via list.copy:

new = old.copy()
new.insert(index, value)

On Python 2 copying the list can be achieved via new = old[:] (this also works on Python 3).

In terms of performance, there is practically no difference from the slice-assignment approach proposed in another answer:

$ python --version
Python 3.8.1
$ python -m timeit -s "a = list(range(1000))" "b = a.copy(); b.insert(500, 3)"
100000 loops, best of 5: 2.84 µsec per loop
$ python -m timeit -s "a = list(range(1000))" "b = a.copy(); b[500:500] = (3,)"
100000 loops, best of 5: 2.76 µsec per loop

Answer 4

Use the Python list insert() method. Usage:

Syntax:

The syntax of the insert() method is:

list.insert(index, obj)

Parameters:

  • index: the index at which the object obj needs to be inserted.
  • obj: the object to be inserted into the given list.

Return value: this method does not return any value (it returns None), but it inserts the given element at the given index.

Example:

a = [1,2,4,5]

a.insert(2,3)

print(a)

Prints [1, 2, 3, 4, 5]
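This short Python 3 sketch is an addition to the original answer and shows three more details of list.insert(). The call returns None. An index past the end appends the item. A negative index counts from the end, so the item goes before that position.

```python
a = [1, 2, 4, 5]

result = a.insert(2, 3)
print(result)      # None: insert() works in place
print(a)           # [1, 2, 3, 4, 5]

a.insert(100, 6)   # index past the end: the item is appended
a.insert(-1, 9)    # negative index: inserted before the last item
print(a)           # [1, 2, 3, 4, 5, 9, 6]
```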


Pythonic way to return a list of every nth item in a larger list

Say we have a list of numbers from 0 to 1000. Is there a pythonic/efficient way to produce a list of the first and every subsequent 10th item, i.e. [0, 10, 20, 30, ... ]?

Yes, I can do this using a for loop, but I’m wondering if there is a neater way to do this, perhaps even in one line?


Answer 0

>>> lst = list(range(165))
>>> lst[0::10]
[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160]

Note that this is around 100 times faster than looping and checking a modulus for each element:

$ python -m timeit -s "lst = list(range(1000))" "lst1 = [x for x in lst if x % 10 == 0]"
1000 loops, best of 3: 525 usec per loop
$ python -m timeit -s "lst = list(range(1000))" "lst1 = lst[0::10]"
100000 loops, best of 3: 4.02 usec per loop

Answer 1

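  1. source_list[::10] is the most obvious, but it only works on sequences that support slicing (not on arbitrary iterables), and it creates a new list, which is not memory-efficient for large lists.
  2. itertools.islice(source_sequence, 0, None, 10) works for any iterable and is memory-efficient, but it is probably not the fastest solution for a large list with a big step, because it still steps through every skipped item.
  3. (source_list[i] for i in xrange(0, len(source_list), 10)) is a generator expression that only touches the selected indices (on Python 3, use range instead of xrange).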
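This Python 3 sketch compares the three options on a small list, using range in place of Python 2's xrange. It also shows the case where only islice applies: a generator cannot be sliced with [::10].

```python
from itertools import islice

source_list = list(range(35))

# 1. Slicing: needs a sliceable sequence and builds the new list at once.
by_slice = source_list[::10]

# 2. islice: accepts any iterable and produces items lazily.
by_islice = list(islice(source_list, 0, None, 10))

# 3. Index-based generator expression (range instead of Python 2's xrange).
by_index = list(source_list[i] for i in range(0, len(source_list), 10))

print(by_slice)                            # [0, 10, 20, 30]
print(by_slice == by_islice == by_index)   # True

# Only islice works on a generator, which has no [::10] slicing.
squares = (n * n for n in range(35))
every_tenth_square = list(islice(squares, 0, None, 10))
print(every_tenth_square)                  # [0, 100, 400, 900]
```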

Answer 2

You can use the slice operator like this:

l = [1,2,3,4,5]
l2 = l[::2]  # every 2nd item: [1, 3, 5]

Answer 3

From the manual: s[i:j:k] is the slice of s from i to j with step k.

li = list(range(100))
sub = li[0::10]

>>> sub
[0, 10, 20, 30, 40, 50, 60, 70, 80, 90]

Answer 4

newlist = oldlist[::10]

This picks out every 10th element of the list, starting with the first.


Answer 5

Why not just use the step parameter of the range function to get:

l = range(0, 1000, 10)

For comparison, on my machine:

H:\>python -m timeit -s "l = range(1000)" "l1 = [x for x in l if x % 10 == 0]"
10000 loops, best of 3: 90.8 usec per loop
H:\>python -m timeit -s "l = range(1000)" "l1 = l[0::10]"
1000000 loops, best of 3: 0.861 usec per loop
H:\>python -m timeit -s "l = range(0, 1000, 10)"
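100000000 loops, best of 3: 0.0172 usec per loop

Note that the last command passes range(0, 1000, 10) only as setup code (-s) and gives no statement to time, so its figure reflects timeit's own loop overhead rather than the cost of building the range.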

Answer 6

existing_list = range(0, 1001)
filtered_list = [i for i in existing_list if i % 10 == 0]

Answer 7

Here is a better implementation of an “every 10th item” list comprehension, one that tests each item’s index rather than the list contents:

>>> l = range(165)
>>> [ item for i,item in enumerate(l) if i%10==0 ]
[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160]
>>> l = list("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
>>> [ item for i,item in enumerate(l) if i%10==0 ]
['A', 'K', 'U']

But this is still far slower than just using list slicing.
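This quick Python 3 check is an addition to the original answer. It confirms that the enumerate-based comprehension and a plain slice select the same items:

```python
letters = list("ABCDEFGHIJKLMNOPQRSTUVWXYZ")

# Keep items whose position is a multiple of 10 (the answer's approach).
by_enumerate = [item for i, item in enumerate(letters) if i % 10 == 0]

# Let slicing skip the unwanted positions directly.
by_slice = letters[::10]

print(by_enumerate)              # ['A', 'K', 'U']
print(by_enumerate == by_slice)  # True
```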


Answer 8

List comprehensions are made exactly for this:

smaller_list = [x for x in range(100001) if x % 10 == 0]

You can get more info about them in the official Python documentation: http://docs.python.org/tutorial/datastructures.html#list-comprehensions


Is there a zip-like function that pads to the longest length in Python?

Is there a built-in function that works like zip() but that will pad the results so that the length of the resultant list is the length of the longest input rather than the shortest input?

>>> a = ['a1']
>>> b = ['b1', 'b2', 'b3']
>>> c = ['c1', 'c2']

>>> zip(a, b, c)
[('a1', 'b1', 'c1')]

>>> What command goes here?
[('a1', 'b1', 'c1'), (None, 'b2', 'c2'), (None, 'b3', None)]

Answer 0

In Python 3 you can use itertools.zip_longest

>>> list(itertools.zip_longest(a, b, c))
[('a1', 'b1', 'c1'), (None, 'b2', 'c2'), (None, 'b3', None)]

You can pad with a different value than None by using the fillvalue parameter:

>>> list(itertools.zip_longest(a, b, c, fillvalue='foo'))
[('a1', 'b1', 'c1'), ('foo', 'b2', 'c2'), ('foo', 'b3', 'foo')]

With Python 2 you can either use itertools.izip_longest (Python 2.6+), or you can use map with None as the function. This is a little-known feature of map (but map changed in Python 3.x, so this only works in Python 2.x).

>>> map(None, a, b, c)
[('a1', 'b1', 'c1'), (None, 'b2', 'c2'), (None, 'b3', None)]

Answer 1

For Python 2.6+, use the itertools module’s izip_longest.

For Python 3 use zip_longest instead (no leading i).

>>> list(itertools.izip_longest(a, b, c))
[('a1', 'b1', 'c1'), (None, 'b2', 'c2'), (None, 'b3', None)]

Answer 2

A non-itertools Python 3 solution:

def zip_longest(*lists):
    # Yield every item of l, then None forever.
    def pad(l):
        for item in l:
            yield item
        while True:
            yield None
    gens = [pad(l) for l in lists]
    # Produce as many tuples as the longest input has items.
    for _ in range(max(map(len, lists))):
        yield tuple(next(g) for g in gens)

Answer 3

My non-itertools Python 2 solution (it pads the shorter of the two lists in place with None):

if len(list1) < len(list2):
    list1.extend([None] * (len(list2) - len(list1)))
else:
    list2.extend([None] * (len(list1) - len(list2)))
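The same idea extends to any number of lists. `pad_to_longest` below is a hypothetical helper, written as a Python 3 sketch (the keyword-only `fill` argument and `max(..., default=...)` need Python 3). Like the two-list version above, it pads in place. After padding, the built-in zip no longer drops anything.

```python
def pad_to_longest(*lists, fill=None):
    """Extend each list in place with `fill` until all lists have the same length.

    Hypothetical helper generalizing the two-list version above.
    """
    longest = max((len(lst) for lst in lists), default=0)
    for lst in lists:
        lst.extend([fill] * (longest - len(lst)))


list1 = ['a1']
list2 = ['b1', 'b2', 'b3']
list3 = ['c1', 'c2']
pad_to_longest(list1, list2, list3)
print(list(zip(list1, list2, list3)))
# [('a1', 'b1', 'c1'), (None, 'b2', 'c2'), (None, 'b3', None)]
```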

Answer 4

I'm using a 2D array, but the concept is similar, using Python 2.x:

if len(set([len(p) for p in printer])) > 1:
    printer = [column+['']*(max([len(p) for p in printer])-len(column)) for column in printer]

How do I construct a set from list items in Python?

I have a list of filenames in Python and I want to construct a set out of all the filenames.

filelist=[]
for filename in filelist:
    set(filename)

This does not seem to work. How can I do this?


Answer 0

If you have a list of hashable objects (filenames would probably be strings, so they should count):

lst = ['foo.py', 'bar.py', 'baz.py', 'qux.py', Ellipsis]

you can construct the set directly:

s = set(lst)

In fact, set will work this way with any iterable object! (Isn’t duck typing great?)


If you want to do it iteratively:

s = set()
for item in iterable:
    s.add(item)

But there’s rarely a need to do it this way. I only mention it because the set.add method is quite useful.


Answer 1

The most direct solution is this:

s = set(filelist)

The issue in your original code is that the values weren’t being assigned to the set. Here’s the fixed-up version of your code:

s = set()
for filename in filelist:
    s.add(filename)
print(s)

Answer 2

You can do

my_set = set(my_list)

or, in Python 3.5+,

my_set = {*my_list}

to create a set from a list. Conversely, you can also do

my_list = list(my_set)

or, in Python 3.5+,

my_list = [*my_set]

to create a list from a set.

Just note that the order of the elements in a list is generally lost when converting the list to a set since a set is inherently unordered. (One exception in CPython, though, seems to be if the list consists only of non-negative integers, but I assume this is a consequence of the implementation of sets in CPython and that this behavior can vary between different Python implementations.)
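This small Python 3 sketch shows the consequence: a set drops duplicates but does not promise any order. If you also need first-seen order, one alternative (not from the original answer) is dict.fromkeys, because dicts keep insertion order in Python 3.7 and later.

```python
my_list = ["b.py", "a.py", "b.py", "c.py", "a.py"]

# A set keeps one copy of each item but promises no particular order.
unique = set(my_list)
print(sorted(unique))   # ['a.py', 'b.py', 'c.py']

# dict keys are unique and keep insertion order (Python 3.7+),
# so this removes duplicates while keeping first-seen order.
ordered_unique = list(dict.fromkeys(my_list))
print(ordered_unique)   # ['b.py', 'a.py', 'c.py']
```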


Answer 3

Here is another solution:

>>> list1=["C:\\","D:\\","E:\\","C:\\"]
>>> set1=set(list1)
>>> set1
set(['E:\\', 'D:\\', 'C:\\'])

In this code I have used the set() constructor to turn the list into a set, which drops all duplicate values (the original list itself is left unchanged).


Answer 4

A general way to construct a set iteratively like this is a set comprehension:

aset = {e for e in alist}

Answer 5

Simply use:

new_set = set(your_list)

How do I get the position of an item in a list?

I am iterating over a list and I want to print out the index of the item if it meets a certain condition. How would I do this?

Example:

testlist = [1,2,3,5,3,1,2,1,6]
for item in testlist:
    if item == 1:
        print position

Answer 0

Hmmm. There was an answer with a list comprehension here, but it’s disappeared.

Here:

 [i for i,x in enumerate(testlist) if x == 1]

Example:

>>> testlist
[1, 2, 3, 5, 3, 1, 2, 1, 6]
>>> [i for i,x in enumerate(testlist) if x == 1]
[0, 5, 7]

Update:

Okay, you want a generator expression, we’ll have a generator expression. Here’s the list comprehension again, in a for loop:

>>> for i in [i for i,x in enumerate(testlist) if x == 1]:
...     print i
... 
0
5
7

Now we’ll construct a generator…

>>> (i for i,x in enumerate(testlist) if x == 1)
<generator object at 0x6b508>
>>> for i in (i for i,x in enumerate(testlist) if x == 1):
...     print i
... 
0
5
7

and niftily enough, we can assign that to a variable, and use it from there…

>>> gen = (i for i,x in enumerate(testlist) if x == 1)
>>> for i in gen: print i
... 
0
5
7

And to think I used to write FORTRAN.


Answer 1

What about the following? Note that list.index returns only the position of the first match.

print testlist.index(element)

If you are not sure whether the element to look for is actually in the list, you can add a preliminary check, like

if element in testlist:
    print testlist.index(element)

or

print(testlist.index(element) if element in testlist else None)

or the “pythonic way”, which I don’t like so much because the code is less clear, but which is sometimes more efficient:

try:
    print testlist.index(element)
except ValueError:
    pass
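The try/except variant can be wrapped in a small function that returns None when the element is missing. `index_or_none` is a hypothetical name, and this is a Python 3 sketch:

```python
def index_or_none(seq, element):
    """Return the first index of element in seq, or None if it is absent.

    `index_or_none` is a hypothetical helper wrapping the try/except pattern.
    """
    try:
        return seq.index(element)
    except ValueError:
        return None


testlist = [1, 2, 3, 5, 3, 1, 2, 1, 6]
print(index_or_none(testlist, 1))   # 0  (only the first match)
print(index_or_none(testlist, 42))  # None
```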

Answer 2

Use enumerate:

testlist = [1,2,3,5,3,1,2,1,6]
for position, item in enumerate(testlist):
    if item == 1:
        print position

Answer 3

for i in xrange(len(testlist)):
  if testlist[i] == 1:
    print i

xrange instead of range as requested (see comments).


Answer 4

Here is another way to do this (note that it only finds the first occurrence):

try:
   idx = testlist.index(1)
   print idx
except ValueError:
   print "Not Found"

Answer 5

[x for x in range(len(testlist)) if testlist[x]==1]

Answer 6

Try the below:

testlist = [1,2,3,5,3,1,2,1,6]    
position=0
for i in testlist:
   if i == 1:
      print(position)
   position=position+1

Answer 7

If your list is large and you expect to find the value at only a few indices, consider this code: it can run much faster, because each list.index call scans the list in C (in CPython) instead of iterating over every value in a Python-level loop.

lookingFor = 1
i = 0
index = 0
try:
  while i < len(testlist):
    index = testlist.index(lookingFor,i)
    i = index + 1
    print index
except ValueError: #testlist.index() cannot find lookingFor
  pass

If you expect to find the value many times, you should probably append each “index” to a list and print the list at the end, to save time per iteration.
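The following Python 3 sketch implements that approach. It calls list.index repeatedly, resuming after each match, and collects the positions in a list instead of printing them one by one. `indices_of` is a hypothetical helper name.

```python
def indices_of(seq, value):
    """Collect every index of value in seq using repeated list.index calls.

    `indices_of` is a hypothetical helper; each index() call resumes the
    scan just after the previous match.
    """
    found = []
    start = 0
    while True:
        try:
            pos = seq.index(value, start)
        except ValueError:   # no further matches
            return found
        found.append(pos)
        start = pos + 1


testlist = [1, 2, 3, 5, 3, 1, 2, 1, 6]
print(indices_of(testlist, 1))  # [0, 5, 7]
```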


Answer 8

I think that it might be useful to use the curselection() method from the Tkinter library:

from Tkinter import * 
listbox.curselection()

This method works on Tkinter listbox widgets, so you’ll need to construct one of them instead of a list.

This will return a position like this:

('0',) (although later versions of Tkinter may return a list of ints instead)

Which is for the first position and the number will change according to the item position.

For more information, see this page: http://effbot.org/tkinterbook/listbox.htm

Greetings.


Answer 9

Why complicate things?

testlist = [1,2,3,5,3,1,2,1,6]
for position, item in enumerate(testlist):
    if item == 1:
        print position

Answer 10

Just to illustrate a complete example: input_list contains series1 (input_list[0]) and series2 (input_list[1]). For each item of series2, we look it up in series1 and get its index there if it exists (otherwise None).

Note: if your “certain condition” is simple, it can go in the lambda expression.

input_list = [[1,2,3,4,5,6,7],[1,3,7]]
series1 = input_list[0]
series2 = input_list[1]
idx_list = list(map(lambda item: series1.index(item) if item in series1 else None, series2))
print(idx_list)

output:

[0, 2, 6]

Answer 11

testlist = [1,2,3,5,3,1,2,1,6]
for idx, value in enumerate(testlist):
    if value == 1:
        print idx

I guess that it’s exactly what you want. ;-) ‘idx’ will always be the index of the value in the list.


How do I convert a comma-separated string to a list in Python?

Given a string that is a sequence of several values separated by a comma:

mStr = 'A,B,C,D,E' 

How do I convert the string to a list?

mList = ['A', 'B', 'C', 'D', 'E']

Answer 0

You can use the str.split method.

>>> my_string = 'A,B,C,D,E'
>>> my_list = my_string.split(",")
>>> print my_list
['A', 'B', 'C', 'D', 'E']

If you want to convert it to a tuple, just

>>> print tuple(my_list)
('A', 'B', 'C', 'D', 'E')

If you are looking to append to a list, try this:

>>> my_list.append('F')
>>> print my_list
['A', 'B', 'C', 'D', 'E', 'F']
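Two details of str.split matter here. Splitting an empty string gives [''] rather than [], and spaces around the commas are kept. The Python 3 sketch below shows one way to handle both. `split_fields` is a hypothetical helper, not a standard function.

```python
def split_fields(text, sep=","):
    """Split text on sep, strip whitespace around each field, and map "" to [].

    `split_fields` is a hypothetical helper for illustration only.
    """
    if not text.strip():
        return []          # plain "".split(",") would return ['']
    return [part.strip() for part in text.split(sep)]


print("".split(","))                 # ['']
print(split_fields(""))              # []
print(split_fields("A, B ,C,D, E"))  # ['A', 'B', 'C', 'D', 'E']
```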

Answer 1

If the string contains integers and you want them converted to int during the split, instead of casting each one individually afterwards, you can do:

mList = [int(e) if e.isdigit() else e for e in mStr.split(',')]

This is called a list comprehension, and it is based on set-builder notation.

For example:

>>> mStr = "1,A,B,3,4"
>>> mList = [int(e) if e.isdigit() else e for e in mStr.split(',')]
>>> mList
[1, 'A', 'B', 3, 4]
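str.isdigit() returns False for values such as "-2", so with the comprehension above, negative numbers stay strings. The Python 3 sketch below is an alternative that tries int() instead. `to_int_if_possible` is a hypothetical helper name.

```python
def to_int_if_possible(field):
    """Return int(field) when field parses as an integer, else field unchanged.

    Unlike str.isdigit(), int() also accepts a leading sign.
    `to_int_if_possible` is a hypothetical helper for illustration only.
    """
    try:
        return int(field)
    except ValueError:
        return field


mStr = "1,A,-2,B,3"
mList = [to_int_if_possible(e) for e in mStr.split(",")]
print(mList)  # [1, 'A', -2, 'B', 3]
```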

Answer 2

>>> some_string='A,B,C,D,E'
>>> new_tuple= tuple(some_string.split(','))
>>> new_tuple
('A', 'B', 'C', 'D', 'E')

Answer 3

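You can use this function to convert a string of comma-delimited single characters to a list. It takes every second character, so it only works when each value is exactly one character long and each separator is a single comma: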

def stringtolist(x):
    mylist=[]
    for i in range(0,len(x),2):
        mylist.append(x[i])
    return mylist
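Because the function simply takes every second character, it is equivalent to the slice x[::2], and it only works when every value is exactly one character long and separated by a single comma. A quick check (the comment on the loop is mine):

```python
def stringtolist(x):
    mylist = []
    # take the characters at positions 0, 2, 4, ... skipping the commas
    for i in range(0, len(x), 2):
        mylist.append(x[i])
    return mylist

print(stringtolist('A,B,C,D,E'))  # ['A', 'B', 'C', 'D', 'E']
print(list('A,B,C,D,E'[::2]))     # same result via slicing
print(stringtolist('AB,C'))       # ['A', ','] : multi-character values break it
```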

Answer 4

# splits a string according to delimiters
'''
Let's make a function that can split a string
into a list according to the given delimiters.
example data: cat;dog:greff,snake/
example delimiters: ,;- /|:
'''
def string_to_splitted_array(data,delimeters):
    # result list
    res = []
    # we will add chars into sub_str until
    # we reach a delimiter
    sub_str = ''
    for c in data: # iterate over data char by char
        # if we reached a delimiter, we store the result
        if c in delimeters:
            # avoid empty strings
            if len(sub_str)>0:
                # looks like a valid string.
                res.append(sub_str)
                # reset sub_str to start over
                sub_str = ''
        else:
            # c is not a delimiter, so it is
            # part of the string.
            sub_str += c
    # there may not be a delimiter at the end of data.
    # if sub_str is not empty, we should add it to the list.
    if len(sub_str)>0:
        res.append(sub_str)
    # result is in res
    return res

# test the function.
delimeters = ',;- /|:'
# read the csv data from the console.
csv_string = input('csv string:')
# let's check that it works.
splitted_array = string_to_splitted_array(csv_string,delimeters)
print(splitted_array)
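For comparison, the standard-library re module can do the same splitting in a couple of lines. This is a swapped-in technique, not the answer's code, and split_on_delimiters is a hypothetical name. Like the loop above, it drops the empty strings produced by consecutive, leading, or trailing delimiters:

```python
import re

def split_on_delimiters(data, delimiters):
    # Build a character class such as [,;\-\ /\|:]+ ; re.escape makes
    # '-', ' ' and '|' literal inside the brackets.
    pattern = '[' + re.escape(delimiters) + ']+'
    # re.split leaves '' at the ends when data starts/ends with a delimiter
    return [part for part in re.split(pattern, data) if part]

print(split_on_delimiters('cat;dog:greff,snake/', ',;- /|:'))
# ['cat', 'dog', 'greff', 'snake']
```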

Answer 5

Consider the following in order to handle the case of an empty string:

>>> my_string = 'A,B,C,D,E'
>>> my_string.split(",") if my_string else []
['A', 'B', 'C', 'D', 'E']
>>> my_string = ""
>>> my_string.split(",") if my_string else []
[]
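The guard is needed because str.split() with an explicit separator never returns an empty list: splitting an empty string gives a list containing one empty string. A small sketch; split_or_empty is a hypothetical helper name:

```python
print("".split(","))     # [''] : one empty string, not an empty list
print("A,B".split(","))  # ['A', 'B']

def split_or_empty(s, sep=","):
    """Like s.split(sep), but returns [] for an empty string."""
    return s.split(sep) if s else []

print(split_or_empty(""))  # []
```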

Answer 6

You can split that string on ',' and directly get a list:

mStr = 'A,B,C,D,E'
list1 = mStr.split(',')
print(list1)

Output:

['A', 'B', 'C', 'D', 'E']

You can also convert it to an n-tuple:

print(tuple(list1))

Output:

('A', 'B', 'C', 'D', 'E')


Pandas DataFrame to list of dictionaries

Question: Pandas DataFrame to list of dictionaries

I have the following DataFrame:

customer    item1      item2    item3
1           apple      milk     tomato
2           water      orange   potato
3           juice      mango    chips

which I want to translate into a list of dictionaries, one per row:

rows = [{'customer': 1, 'item1': 'apple', 'item2': 'milk', 'item3': 'tomato'},
    {'customer': 2, 'item1': 'water', 'item2': 'orange', 'item3': 'potato'},
    {'customer': 3, 'item1': 'juice', 'item2': 'mango', 'item3': 'chips'}]

Answer 0

Edit

As John Galt mentions in his answer, you should probably use df.to_dict('records') instead. It’s faster than transposing manually.

In [20]: timeit df.T.to_dict().values()
1000 loops, best of 3: 395 µs per loop

In [21]: timeit df.to_dict('records')
10000 loops, best of 3: 53 µs per loop

Original answer

Use df.T.to_dict().values(), like below:

In [1]: df
Out[1]:
   customer  item1   item2   item3
0         1  apple    milk  tomato
1         2  water  orange  potato
2         3  juice   mango   chips

In [2]: df.T.to_dict().values()
Out[2]:
[{'customer': 1.0, 'item1': 'apple', 'item2': 'milk', 'item3': 'tomato'},
 {'customer': 2.0, 'item1': 'water', 'item2': 'orange', 'item3': 'potato'},
 {'customer': 3.0, 'item1': 'juice', 'item2': 'mango', 'item3': 'chips'}]
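In Python 3, dict.values() returns a dict_values view rather than a list, so the transposing approach needs an explicit list() call. A minimal sketch that rebuilds the question's frame by hand and checks that both approaches give the same rows (the exact numeric types shown may vary with the pandas version):

```python
import pandas as pd

# The question's frame, rebuilt by hand
df = pd.DataFrame({
    'customer': [1, 2, 3],
    'item1': ['apple', 'water', 'juice'],
    'item2': ['milk', 'orange', 'mango'],
    'item3': ['tomato', 'potato', 'chips'],
})

# Original approach: transpose, then take the per-row dicts.
# list() is needed in Python 3, where .values() returns a view.
rows_transposed = list(df.T.to_dict().values())

# Recommended approach: one dict per row, no transpose needed
rows_records = df.to_dict('records')

print(rows_records)
```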

Answer 1

Use df.to_dict('records'); it gives the output without having to transpose externally.

In [2]: df.to_dict('records')
Out[2]:
[{'customer': 1L, 'item1': 'apple', 'item2': 'milk', 'item3': 'tomato'},
 {'customer': 2L, 'item1': 'water', 'item2': 'orange', 'item3': 'potato'},
 {'customer': 3L, 'item1': 'juice', 'item2': 'mango', 'item3': 'chips'}]

Answer 2

As an extension to John Galt’s answer –

For the following DataFrame,

   customer  item1   item2   item3
0         1  apple    milk  tomato
1         2  water  orange  potato
2         3  juice   mango   chips

If you want to keep the index values, you can use:

df.to_dict('index')

This outputs a dictionary of dictionaries, where the keys of the outer dictionary are the index values. In this particular case:

{0: {'customer': 1, 'item1': 'apple', 'item2': 'milk', 'item3': 'tomato'},
 1: {'customer': 2, 'item1': 'water', 'item2': 'orange', 'item3': 'potato'},
 2: {'customer': 3, 'item1': 'juice', 'item2': 'mango', 'item3': 'chips'}}
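Note that to_dict('index') returns a dict keyed by the index, not a list. If you want a list of row dictionaries that also carries the index, one option is to move the index into a column first with reset_index(). In this sketch the non-default index values 10, 20, 30 are made up so the index is visible:

```python
import pandas as pd

df = pd.DataFrame({'customer': [1, 2, 3],
                   'item1': ['apple', 'water', 'juice']},
                  index=[10, 20, 30])

# dict of dicts: outer keys are the index values
by_index = df.to_dict('index')

# list of dicts: the unnamed index becomes a column called 'index'
records_with_index = df.reset_index().to_dict('records')

print(by_index)
print(records_with_index)
```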

Answer 3

If you only want to select one column, this will work:

df[["item1"]].to_dict("records")

The version below will NOT work and produces a TypeError (unsupported type). I believe this is because df["item1"] is a Series, not a DataFrame: Series.to_dict() has no orient argument, and its first parameter is into (the mapping class to build), so "records" is rejected as an unsupported type.

df["item1"].to_dict("records")

I needed to select only one column and convert it to a list of dicts with the column name as the key. I was stuck on this for a bit, so I figured I’d share.
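Another option is to turn the Series back into a one-column DataFrame with Series.to_frame(), which keeps the column name as the key. A minimal sketch with a trimmed-down version of the question's frame:

```python
import pandas as pd

df = pd.DataFrame({'customer': [1, 2, 3],
                   'item1': ['apple', 'water', 'juice']})

# Double brackets select a one-column DataFrame
via_double_brackets = df[["item1"]].to_dict("records")

# to_frame() converts the Series back into a DataFrame named after the column
via_to_frame = df["item1"].to_frame().to_dict("records")

print(via_to_frame)
```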


Pandas column of lists, create a row for each list element

Question: Pandas column of lists, create a row for each list element

I have a dataframe where some cells contain lists of multiple values. Rather than storing multiple values in a cell, I’d like to expand the dataframe so that each item in the list gets its own row (with the same values in all other columns). So if I have:

import pandas as pd
import numpy as np

df = pd.DataFrame(
    {'trial_num': [1, 2, 3, 1, 2, 3],
     'subject': [1, 1, 1, 2, 2, 2],
     'samples': [list(np.random.randn(3).round(2)) for i in range(6)]
    }
)

df
Out[10]: 
                 samples  subject  trial_num
0    [0.57, -0.83, 1.44]        1          1
1    [-0.01, 1.13, 0.36]        1          2
2   [1.18, -1.46, -0.94]        1          3
3  [-0.08, -4.22, -2.05]        2          1
4     [0.72, 0.79, 0.53]        2          2
5    [0.4, -0.32, -0.13]        2          3

How do I convert to long form, e.g.:

   subject  trial_num  sample  sample_num
0        1          1    0.57           0
1        1          1   -0.83           1
2        1          1    1.44           2
3        1          2   -0.01           0
4        1          2    1.13           1
5        1          2    0.36           2
6        1          3    1.18           0
# etc.

The index is not important; it’s OK to set existing columns as the index, and the final ordering isn’t important either.


Answer 0

lst_col = 'samples'

r = pd.DataFrame({
      col:np.repeat(df[col].values, df[lst_col].str.len())
      for col in df.columns.drop(lst_col)}
    ).assign(**{lst_col:np.concatenate(df[lst_col].values)})[df.columns]

Result:

In [103]: r
Out[103]:
    samples  subject  trial_num
0      0.10        1          1
1     -0.20        1          1
2      0.05        1          1
3      0.25        1          2
4      1.32        1          2
5     -0.17        1          2
6      0.64        1          3
7     -0.22        1          3
8     -0.71        1          3
9     -0.03        2          1
10    -0.65        2          1
11     0.76        2          1
12     1.77        2          2
13     0.89        2          2
14     0.65        2          2
15    -0.98        2          3
16     0.65        2          3
17    -0.30        2          3

P.S. Here you may find a somewhat more generic solution.


UPDATE: some explanations. IMO the easiest way to understand this code is to execute it step by step:

In the following line we repeat each value in one column N times, where N is the length of the corresponding list:

In [10]: np.repeat(df['trial_num'].values, df[lst_col].str.len())
Out[10]: array([1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 2, 2, 2, 3, 3, 3], dtype=int64)

This can be generalized to all columns containing scalar values:

In [11]: pd.DataFrame({
    ...:           col:np.repeat(df[col].values, df[lst_col].str.len())
    ...:           for col in df.columns.drop(lst_col)}
    ...:         )
Out[11]:
    trial_num  subject
0           1        1
1           1        1
2           1        1
3           2        1
4           2        1
5           2        1
6           3        1
..        ...      ...
11          1        2
12          2        2
13          2        2
14          2        2
15          3        2
16          3        2
17          3        2

[18 rows x 2 columns]

Using np.concatenate() we can flatten all values in the list column (samples) into a 1D vector:

In [12]: np.concatenate(df[lst_col].values)
Out[12]: array([-1.04, -0.58, -1.32,  0.82, -0.59, -0.34,  0.25,  2.09,  0.12,  0.83, -0.88,  0.68,  0.55, -0.56,  0.65, -0.04,  0.36, -0.31])

Putting it all together:

In [13]: pd.DataFrame({
    ...:           col:np.repeat(df[col].values, df[lst_col].str.len())
    ...:           for col in df.columns.drop(lst_col)}
    ...:         ).assign(**{lst_col:np.concatenate(df[lst_col].values)})
Out[13]:
    trial_num  subject  samples
0           1        1    -1.04
1           1        1    -0.58
2           1        1    -1.32
3           2        1     0.82
4           2        1    -0.59
5           2        1    -0.34
6           3        1     0.25
..        ...      ...      ...
11          1        2     0.68
12          2        2     0.55
13          2        2    -0.56
14          2        2     0.65
15          3        2    -0.04
16          3        2     0.36
17          3        2    -0.31

[18 rows x 3 columns]

Indexing the result of pd.DataFrame(...) with [df.columns] guarantees that the columns come out in the original order.
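To make the steps above easy to re-run, here is a sketch on a small fixed frame instead of random data. It also adds the sample_num column the question asks for; that extra np.arange step is my addition, not part of the original answer. The lists may differ in length, because np.repeat uses each row's own list length:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'trial_num': [1, 2],
    'subject': [1, 1],
    'samples': [[0.57, -0.83, 1.44], [-0.01, 1.13]],  # lengths 3 and 2
})
lst_col = 'samples'
lens = df[lst_col].str.len()  # length of the list in each row

# Repeat every scalar column per list element, then attach the flattened lists
r = pd.DataFrame({
    col: np.repeat(df[col].values, lens)
    for col in df.columns.drop(lst_col)
}).assign(**{lst_col: np.concatenate(df[lst_col].values)})[df.columns]

# Extra step: position of each element within its original list
r['sample_num'] = np.concatenate([np.arange(n) for n in lens])
print(r)
```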


Answer 1

A bit longer than I expected:

>>> df
                samples  subject  trial_num
0  [-0.07, -2.9, -2.44]        1          1
1   [-1.52, -0.35, 0.1]        1          2
2  [-0.17, 0.57, -0.65]        1          3
3  [-0.82, -1.06, 0.47]        2          1
4   [0.79, 1.35, -0.09]        2          2
5   [1.17, 1.14, -1.79]        2          3
>>>
>>> s = df.apply(lambda x: pd.Series(x['samples']),axis=1).stack().reset_index(level=1, drop=True)
>>> s.name = 'sample'
>>>
>>> df.drop('samples', axis=1).join(s)
   subject  trial_num  sample
0        1          1   -0.07
0        1          1   -2.90
0        1          1   -2.44
1        1          2   -1.52
1        1          2   -0.35
1        1          2    0.10
2        1          3   -0.17
2        1          3    0.57
2        1          3   -0.65
3        2          1   -0.82
3        2          1   -1.06
3        2          1    0.47
4        2          2    0.79
4        2          2    1.35
4        2          2   -0.09
5        2          3    1.17
5        2          3    1.14
5        2          3   -1.79

If you want a sequential index, you can apply reset_index(drop=True) to the result.

Update:

>>> res = df.set_index(['subject', 'trial_num'])['samples'].apply(pd.Series).stack()
>>> res = res.reset_index()
>>> res.columns = ['subject','trial_num','sample_num','sample']
>>> res
    subject  trial_num  sample_num  sample
0         1          1           0    1.89
1         1          1           1   -2.92
2         1          1           2    0.34
3         1          2           0    0.85
4         1          2           1    0.24
5         1          2           2    0.72
6         1          3           0   -0.96
7         1          3           1   -2.72
8         1          3           2   -0.11
9         2          1           0   -1.33
10        2          1           1    3.13
11        2          1           2   -0.65
12        2          2           0    0.10
13        2          2           1    0.65
14        2          2           2    0.15
15        2          3           0    0.64
16        2          3           1   -0.10
17        2          3           2   -0.76

Answer 2

Pandas >= 0.25

Both Series and DataFrame define an .explode() method that explodes lists into separate rows. See the docs section on Exploding a list-like column.

df = pd.DataFrame({
    'var1': [['a', 'b', 'c'], ['d', 'e',], [], np.nan], 
    'var2': [1, 2, 3, 4]
})
df
        var1  var2
0  [a, b, c]     1
1     [d, e]     2
2         []     3
3        NaN     4

df.explode('var1')

  var1  var2
0    a     1
0    b     1
0    c     1
1    d     2
1    e     2
2  NaN     3  # empty list converted to NaN
3  NaN     4  # NaN entry preserved as-is

# to reset the index to be monotonically increasing...
df.explode('var1').reset_index(drop=True)

  var1  var2
0    a     1
1    b     1
2    c     1
3    d     2
4    e     2
5  NaN     3
6  NaN     4
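Applied to the question's data, explode also makes it easy to recover sample_num: the exploded rows keep their original index, so numbering the rows within each index value with groupby(level=0).cumcount() gives each element's position in its list. A sketch on small fixed data (the column is renamed to sample to match the desired output; the cumcount step is my addition):

```python
import pandas as pd

df = pd.DataFrame({
    'trial_num': [1, 2],
    'subject': [1, 1],
    'samples': [[0.57, -0.83, 1.44], [-0.01, 1.13]],
})

long_df = df.explode('samples')  # one row per list element; index repeats
# Number elements within each original row; cumcount keeps row order,
# so assigning the plain array is safe
long_df['sample_num'] = long_df.groupby(level=0).cumcount().to_numpy()
long_df = long_df.rename(columns={'samples': 'sample'}).reset_index(drop=True)
long_df['sample'] = long_df['sample'].astype(float)  # explode leaves object dtype
print(long_df)
```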

Note that this also handles mixed columns of lists and scalars, as well as empty lists and NaNs, appropriately (repeat-based solutions struggle with these cases).

However, note that at the time of writing explode only worked on a single column. Since pandas 1.3, DataFrame.explode also accepts a list of columns, as long as the lists in each row have matching lengths.

P.S.: If you want to explode a column of strings, you need to split them on a separator first and then use explode. See this (very much) related answer of mine.
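A minimal sketch of that split-then-explode pattern, assuming comma-separated strings in var1 (the data is made up):

```python
import pandas as pd

df = pd.DataFrame({'var1': ['a,b,c', 'd,e'], 'var2': [1, 2]})

out = (df.assign(var1=df['var1'].str.split(','))  # 'a,b,c' -> ['a', 'b', 'c']
         .explode('var1')                          # one row per piece
         .reset_index(drop=True))
print(out)
```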


Answer 3

You can also use pd.concat and pd.melt for this:

>>> objs = [df, pd.DataFrame(df['samples'].tolist())]
>>> pd.concat(objs, axis=1).drop('samples', axis=1)
   subject  trial_num     0     1     2
0        1          1 -0.49 -1.00  0.44
1        1          2 -0.28  1.48  2.01
2        1          3 -0.52 -1.84  0.02
3        2          1  1.23 -1.36 -1.06
4        2          2  0.54  0.18  0.51
5        2          3 -2.18 -0.13 -1.35
>>> pd.melt(_, var_name='sample_num', value_name='sample', 
...         value_vars=[0, 1, 2], id_vars=['subject', 'trial_num'])
    subject  trial_num sample_num  sample
0         1          1          0   -0.49
1         1          2          0   -0.28
2         1          3          0   -0.52
3         2          1          0    1.23
4         2          2          0    0.54
5         2          3          0   -2.18
6         1          1          1   -1.00
7         1          2          1    1.48
8         1          3          1   -1.84
9         2          1          1   -1.36
10        2          2          1    0.18
11        2          3          1   -0.13
12        1          1          2    0.44
13        1          2          2    2.01
14        1          3          2    0.02
15        2          1          2   -1.06
16        2          2          2    0.51
17        2          3          2   -1.35

Finally, if needed, you can sort on the first three columns.
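The _ in the session above is the interactive shell's last result. A script version of the same idea, with the sort applied, might look like the sketch below (the data is made up). Passing index=df.index to the new DataFrame is my addition: pd.concat aligns rows on the index, so without it the rows only line up when df has a default RangeIndex.

```python
import pandas as pd

df = pd.DataFrame({
    'trial_num': [1, 2],
    'subject': [1, 1],
    'samples': [[0.57, -0.83, 1.44], [-0.01, 1.13, 0.36]],
})

# Spread each list into columns 0, 1, 2 next to the scalar columns
wide = pd.concat(
    [df.drop(columns='samples'),
     pd.DataFrame(df['samples'].tolist(), index=df.index)],
    axis=1)

# Melt the numbered columns back into rows, then sort
long_df = (pd.melt(wide, id_vars=['subject', 'trial_num'],
                   var_name='sample_num', value_name='sample')
             .sort_values(['subject', 'trial_num', 'sample_num'])
             .reset_index(drop=True))
print(long_df)
```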


Answer 4

Trying to work through Roman Pekar’s solution step-by-step to understand it better, I came up with my own solution, which uses melt to avoid some of the confusing stacking and index resetting. I can’t say that it’s obviously a clearer solution though:

items_as_cols = df.apply(lambda x: pd.Series(x['samples']), axis=1)
# Keep original df index as a column so it's retained after melt
items_as_cols['orig_index'] = items_as_cols.index

melted_items = pd.melt(items_as_cols, id_vars='orig_index', 
                       var_name='sample_num', value_name='sample')
melted_items.set_index('orig_index', inplace=True)

df.merge(melted_items, left_index=True, right_index=True)

Output (obviously we can drop the original samples column now):

                 samples  subject  trial_num sample_num  sample
0    [1.84, 1.05, -0.66]        1          1          0    1.84
0    [1.84, 1.05, -0.66]        1          1          1    1.05
0    [1.84, 1.05, -0.66]        1          1          2   -0.66
1    [-0.24, -0.9, 0.65]        1          2          0   -0.24
1    [-0.24, -0.9, 0.65]        1          2          1   -0.90
1    [-0.24, -0.9, 0.65]        1          2          2    0.65
2    [1.15, -0.87, -1.1]        1          3          0    1.15
2    [1.15, -0.87, -1.1]        1          3          1   -0.87
2    [1.15, -0.87, -1.1]        1          3          2   -1.10
3   [-0.8, -0.62, -0.68]        2          1          0   -0.80
3   [-0.8, -0.62, -0.68]        2          1          1   -0.62
3   [-0.8, -0.62, -0.68]        2          1          2   -0.68
4    [0.91, -0.47, 1.43]        2          2          0    0.91
4    [0.91, -0.47, 1.43]        2          2          1   -0.47
4    [0.91, -0.47, 1.43]        2          2          2    1.43
5  [-1.14, -0.24, -0.91]        2          3          0   -1.14
5  [-1.14, -0.24, -0.91]        2          3          1   -0.24
5  [-1.14, -0.24, -0.91]        2          3          2   -0.91

Answer 5

For those looking for a version of Roman Pekar’s answer that avoids manual column naming:

column_to_explode = 'samples'
res = (df
       .set_index([x for x in df.columns if x != column_to_explode])[column_to_explode]
       .apply(pd.Series)
       .stack()
       .reset_index())
res = res.rename(columns={
          res.columns[-2]:'exploded_{}_index'.format(column_to_explode),
          res.columns[-1]: '{}_exploded'.format(column_to_explode)})

Answer 6

I found the easiest way was to:

  1. Convert the samples column into a DataFrame
  2. Join it with the original df
  3. Melt the result

Shown here:

    df.samples.apply(lambda x: pd.Series(x)).join(df).melt(
        ['subject', 'trial_num'], [0, 1, 2], var_name='sample')

        subject  trial_num sample  value
    0         1          1      0  -0.24
    1         1          2      0   0.14
    2         1          3      0  -0.67
    3         2          1      0  -1.52
    4         2          2      0  -0.00
    5         2          3      0  -1.73
    6         1          1      1  -0.70
    7         1          2      1  -0.70
    8         1          3      1  -0.29
    9         2          1      1  -0.70
    10        2          2      1  -0.72
    11        2          3      1   1.30
    12        1          1      2  -0.55
    13        1          2      2   0.10
    14        1          3      2  -0.44
    15        2          1      2   0.13
    16        2          2      2  -1.44
    17        2          3      2   0.73

It’s worth noting that this may have only worked because each trial has the same number of samples (3). Something more clever may be necessary for trials of different sample sizes.
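For trials with different numbers of samples, one possible approach, assuming pandas >= 0.25 (which added DataFrame.explode), is to explode the list column and then number the samples within each original row. The data below is made up for illustration; this is a sketch, not the answer author's method.

```python
import pandas as pd

# Made-up data: the trials have different numbers of samples
df = pd.DataFrame({
    "subject": [1, 1, 2],
    "trial_num": [1, 2, 1],
    "samples": [[-0.24, -0.70, -0.55], [0.14], [-1.52, -0.70]],
})

# explode() emits one row per list element and repeats the original
# index label, so lists of any length are handled
long_df = df.explode("samples").rename(columns={"samples": "value"})

# Number the samples within each original row using the repeated index label
long_df["sample"] = long_df.groupby(level=0).cumcount().to_numpy()
long_df = long_df.reset_index(drop=True)
print(long_df)
```

Because the sample number comes from cumcount() rather than from hard-coded column labels such as [0, 1, 2], nothing has to change when the list lengths vary.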


Answer 7

Very late answer, but I want to add this:

Here is a fast solution using vanilla Python that also takes care of the sample_num column in the OP’s example. On my own large dataset with over 10 million rows (28 million rows after expanding), it takes only about 38 seconds. The accepted solution completely breaks down with that amount of data and leads to a memory error on my system, which has 128 GB of RAM.

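import pandas as pd

# "lstcol" is the name of the list-valued column to expand
df = df.reset_index(drop=True)
lstcol = df.lstcol.values
lstcollist = []  # flattened list elements
indexlist = []   # original row index of each element
countlist = []   # position of each element within its list
for ii in range(len(lstcol)):
    lstcollist.extend(lstcol[ii])
    indexlist.extend([ii] * len(lstcol[ii]))
    countlist.extend(range(len(lstcol[ii])))
# Join the flattened values back onto the other columns by original row index
df = pd.merge(df.drop("lstcol", axis=1),
              pd.DataFrame({"lstcol": lstcollist, "lstcol_num": countlist},
                           index=indexlist),
              left_index=True, right_index=True).reset_index(drop=True)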
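To see what the loop builds, here is the same flattening step run in plain Python (no pandas) on a small, made-up lstcol. The three lists become the new column values, the original row index of each element (used for the merge), and each element's position within its list (the lstcol_num column).

```python
# Made-up list column: one list per original row
lstcol = [["a", "b"], ["c"], ["d", "e", "f"]]

lstcollist = []  # flattened elements (the new lstcol values)
indexlist = []   # original row index for each element, used to merge back
countlist = []   # position of each element within its list (lstcol_num)
for ii in range(len(lstcol)):
    lstcollist.extend(lstcol[ii])
    indexlist.extend([ii] * len(lstcol[ii]))
    countlist.extend(range(len(lstcol[ii])))

print(lstcollist)  # ['a', 'b', 'c', 'd', 'e', 'f']
print(indexlist)   # [0, 0, 1, 2, 2, 2]
print(countlist)   # [0, 1, 0, 0, 1, 2]
```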

Answer 8

Also very late, but if you don’t have pandas >= 0.25, here is an answer from Karvy1 that worked well for me: https://stackoverflow.com/a/52511166/10740287

For the example above you may write:

data = [(row.subject, row.trial_num, sample) for row in df.itertuples() for sample in row.samples]
data = pd.DataFrame(data, columns=['subject', 'trial_num', 'samples'])
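A small, made-up df shows what the nested comprehension produces: the outer loop walks the rows, and the inner loop walks each row's samples list, so every sample gets its own (subject, trial_num, sample) tuple. The input data here is hypothetical.

```python
import pandas as pd

# Made-up input with a list-valued "samples" column
df = pd.DataFrame({
    "subject": [1, 2],
    "trial_num": [1, 1],
    "samples": [[0.1, 0.2], [0.3]],
})

# Outer loop over rows, inner loop over that row's list of samples
data = [(row.subject, row.trial_num, sample)
        for row in df.itertuples() for sample in row.samples]
data = pd.DataFrame(data, columns=['subject', 'trial_num', 'samples'])
print(data)
```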

Speed test:

%timeit data = pd.DataFrame([(row.subject, row.trial_num, sample) for row in df.itertuples() for sample in row.samples], columns=['subject', 'trial_num', 'samples'])

1.33 ms ± 74.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit data = df.set_index(['subject', 'trial_num'])['samples'].apply(pd.Series).stack().reset_index()

4.9 ms ± 189 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit data = pd.DataFrame({col: np.repeat(df[col].values, df['samples'].str.len()) for col in df.columns.drop('samples')}).assign(**{'samples': np.concatenate(df['samples'].values)})

1.38 ms ± 25 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


Answer 9

import pandas as pd

df = pd.DataFrame([{'Product': 'Coke', 'Prices': [100, 123, 101, 105, 99, 94, 98]},
                   {'Product': 'Pepsi', 'Prices': [101, 104, 104, 101, 99, 99, 99]}])
print(df)
# Prices already holds lists, so explode it directly (no str.split needed)
df = df.explode('Prices')
print(df)

Try this in pandas >= 0.25 (DataFrame.explode was added in that version).
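The str.split(',') step is only needed when the column holds comma-separated strings rather than lists. A minimal sketch of that case, assuming pandas >= 0.25 and made-up data:

```python
import pandas as pd

# Made-up data where the prices are stored as comma-separated strings
df = pd.DataFrame({"Product": ["Coke", "Pepsi"],
                   "Prices": ["100,123,101", "101,104"]})

# Split each string into a list, then give each list element its own row
df = df.assign(Prices=df.Prices.str.split(",")).explode("Prices")
print(df)
```

The exploded values are still strings here, so a follow-up such as df["Prices"].astype(int) may be needed before doing arithmetic on them.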


Print list without brackets in a single row

Question: Print list without brackets in a single row

I have a list in Python, e.g.

names = ["Sam", "Peter", "James", "Julian", "Ann"]

I want to print the list on a single line without the usual "[]" brackets.

names = ["Sam", "Peter", "James", "Julian", "Ann"]
print (names)

This gives the output:

['Sam', 'Peter', 'James', 'Julian', 'Ann']

That is not the format I want; instead I want it to look like this:

Sam, Peter, James, Julian, Ann

Note: It must be in a single row.


Answer 0

print(', '.join(names))

This, like it sounds, just takes all the elements of the list and joins them with ', '.
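A quick check of how join places the separator: it goes only between elements, never at the ends, so a single-element list gets no separator and an empty list gives an empty string. The helper name join_names is hypothetical, used only for illustration.

```python
def join_names(names):
    # The separator is inserted only between elements
    return ', '.join(names)

print(join_names(["Sam", "Peter", "James", "Julian", "Ann"]))  # Sam, Peter, James, Julian, Ann
print(join_names(["Ann"]))   # Ann
print(repr(join_names([])))  # ''
```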


Answer 1

Here is a simple one.

names = ["Sam", "Peter", "James", "Julian", "Ann"]
print(*names, sep=", ")

The star unpacks the list and passes every element to print as a separate argument.
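To make the unpacking concrete: print(*names, sep=", ") is the same call as passing each name as its own argument. This sketch captures both outputs with io.StringIO (through print's file argument) to show that they match.

```python
import io

names = ["Sam", "Peter", "James", "Julian", "Ann"]

unpacked = io.StringIO()
print(*names, sep=", ", file=unpacked)  # * spreads the list into separate arguments

explicit = io.StringIO()
print("Sam", "Peter", "James", "Julian", "Ann", sep=", ", file=explicit)

print(unpacked.getvalue() == explicit.getvalue())  # True
print(repr(unpacked.getvalue()))  # 'Sam, Peter, James, Julian, Ann\n'
```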


Answer 2

General solution that also works on lists of non-strings (note that string elements keep their quotes):

>>> print(str(names)[1:-1])
'Sam', 'Peter', 'James', 'Julian', 'Ann'
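A sketch of how the slice behaves on different element types in Python 3: str() of a list uses each element's repr, so strings keep their quotes while numbers print cleanly, and nested lists keep their inner brackets because [1:-1] only drops the outermost pair.

```python
names = ["Sam", "Peter", "James", "Julian", "Ann"]
numbers = [1, 2.5, 3]
nested = [1, [2, 3]]

# [1:-1] drops only the outer '[' and ']' of the list's string form
print(str(names)[1:-1])    # 'Sam', 'Peter', 'James', 'Julian', 'Ann'
print(str(numbers)[1:-1])  # 1, 2.5, 3
print(str(nested)[1:-1])   # 1, [2, 3]
```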

Answer 3

If the list contains integers, you first need to convert each element to a string and then use join with ', ', a space, or whatever separator you want. For example:

>>> arr = [1, 2, 4, 3]
>>> print(", " . join(arr))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sequence item 0: expected string, int found
>>> sarr = [str(a) for a in arr]
>>> print(", " . join(sarr))
1, 2, 4, 3
>>>

Calling join directly on a list that contains integers raises a TypeError, as shown above.
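An equivalent, slightly shorter conversion passes map(str, ...) to join. As sketched here, it also handles lists that mix types.

```python
arr = [1, 2, 4, 3]
mixed = ["Sam", 42, 3.5, None]

# map(str, ...) converts each element to a string before joining
print(", ".join(map(str, arr)))    # 1, 2, 4, 3
print(", ".join(map(str, mixed)))  # Sam, 42, 3.5, None
```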


Answer 4

There are two approaches. The first is to use the sep argument of print:

>>> print(*names, sep = ', ')

The other is to use str.join:

>>> print(', '.join(names))
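One practical difference between the two: print with sep converts each argument with str() itself, so it also accepts non-string items, while str.join raises TypeError unless every item is already a string. A quick sketch, capturing print's output with io.StringIO:

```python
import io

nums = [1, 2, 3]

buf = io.StringIO()
print(*nums, sep=', ', file=buf)  # print calls str() on each argument
print(repr(buf.getvalue()))  # '1, 2, 3\n'

join_failed = False
try:
    ', '.join(nums)  # join does not convert; every item must be a str
except TypeError as exc:
    join_failed = True
    print("join failed:", exc)
```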

Answer 5

This is what you need:

", ".join(names)

Answer 6

','.join(list) works only if all the items in the list are strings. If you want to convert a list of numbers to a comma-separated string, for example a = [1, 2, 3, 4] into '1, 2, 3, 4', you can use either

str(a)[1:-1] # '1, 2, 3, 4'

or

str(a).lstrip('[').rstrip(']') # '1, 2, 3, 4'

although neither approach removes the brackets of nested lists.

To convert it back to a list:

import ast

a = '1,2,3,4'
ast.literal_eval('[' + a + ']')  # [1, 2, 3, 4]
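If the elements are plain integers, a round trip without ast is also possible: split on the comma and convert each piece. int() tolerates surrounding whitespace, so this sketch works for both '1,2,3,4' and '1, 2, 3, 4'; ast.literal_eval remains the more general option when the items are other literal types.

```python
a = [1, 2, 3, 4]

# List -> string, dropping the outer brackets
s = str(a)[1:-1]
print(s)  # 1, 2, 3, 4

# String -> list; int() ignores the space left after each comma
back = [int(x) for x in s.split(',')]
print(back)  # [1, 2, 3, 4]
```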

Answer 7

You can loop through the list and use print's end argument to keep everything on one line:

names = ["Sam", "Peter", "James", "Julian", "Ann"]
for i, name in enumerate(names):
    # Print ", " after every name except the last, which ends the line instead
    print(name, end=", " if i < len(names) - 1 else "\n")

Answer 8

print(*names)

This works in Python 3 if you want the names printed separated by spaces. If you need commas or any other separator in between, go with the .join() solution.


Answer 9

I don’t know if this is as efficient as the others, but simple logic always works:

import sys

names = ["Sam", "Peter", "James", "Julian", "Ann"]
for i in range(len(names)):
    sys.stdout.write(names[i])
    # Write the separator after every name except the last
    if i != len(names) - 1:
        sys.stdout.write(", ")
sys.stdout.write("\n")

Output:

Sam, Peter, James, Julian, Ann


Answer 10

The following function takes a list and returns a string of the list’s items separated by ', '. This can then be used for logging or printing.

def listToString(inList):
    outString = ''
    if len(inList)==1:
        outString = outString+str(inList[0])
    if len(inList)>1:
        outString = outString+str(inList[0])
        for items in inList[1:]:
            outString = outString+', '+str(items)
    return outString
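For comparison, the same result can be produced with str.join after converting each item with str(). The sketch below repeats the function above so it is self-contained, adds a hypothetical list_to_string_join helper, and checks that the two agree on empty, single-item, multi-item, and mixed-type lists.

```python
def listToString(inList):
    outString = ''
    if len(inList) == 1:
        outString = outString + str(inList[0])
    if len(inList) > 1:
        outString = outString + str(inList[0])
        for items in inList[1:]:
            outString = outString + ', ' + str(items)
    return outString


def list_to_string_join(in_list):
    # Convert every item to a string, then put ', ' between them
    return ', '.join(str(item) for item in in_list)


for sample in ([], ["Sam"], ["Sam", "Peter", "Ann"], [1, 2.5, None]):
    assert listToString(sample) == list_to_string_join(sample)

print(list_to_string_join(["Sam", "Peter", "James", "Julian", "Ann"]))  # Sam, Peter, James, Julian, Ann
```

The join version also avoids building a new string on every loop iteration, which matters more as the list grows.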