Python 实用宝典

Question 1

Python doc says that slicing a list returns a new list.
Now if a “new” list is being returned I’ve the following questions related to “Assignment to slices”

a = [1, 2, 3]
a[0:2] = [4, 5]
print a

Now the output would be:

[4, 5, 3]

How can something that is returning something come on the left side of expression?
Yes, I read the docs and it says it is possible, now since slicing a list returns a “new” list, why is the original list being modified? I am not able to understand the mechanics behind it.

Question 2

You are confusing two distinct operation that use very similar syntax:

1) slicing:

b = a[0:2]

This makes a copy of the slice of a and assigns it to b.

2) slice assignment:

a[0:2] = b

This replaces the slice of a with the contents of b.

Although the syntax is similar (I imagine by design!), these are two different operations.

Question 3

When you specify a on the left side of the = operator, you are using Python’s normal assignment, which changes the name a in the current context to point to the new value. This does not change the previous value to which a was pointing.

By specifying a[0:2] on the left side of the = operator, you are telling Python you want to use Slice Assignment. Slice Assignment is a special syntax for lists, where you can insert, delete, or replace contents from a list:

Insertion:

>>> a = [1, 2, 3]
>>> a[0:0] = [-3, -2, -1, 0]
>>> a
[-3, -2, -1, 0, 1, 2, 3]

Deletion:

>>> a
[-3, -2, -1, 0, 1, 2, 3]
>>> a[2:4] = []
>>> a
[-3, -2, 1, 2, 3]

Replacement:

>>> a
[-3, -2, 1, 2, 3]
>>> a[:] = [1, 2, 3]
>>> a
[1, 2, 3]

Note:

The length of the slice may be different from the length of the assigned sequence, thus changing the length of the target sequence, if the target sequence allows it. – source

Slice Assignment provides similar function to Tuple Unpacking. For example, a[0:1] = [4, 5] is equivalent to:

# Tuple Unpacking
a[0], a[1] = [4, 5]

With Tuple Unpacking, you can modify non-sequential lists:

>>> a
[4, 5, 3]
>>> a[-1], a[0] = [7, 3]
>>> a
[3, 5, 7]

However, tuple unpacking is limited to replacement, as you cannot insert or remove elements.

Before and after all these operations, a is the same exact list. Python simply provides nice syntactic sugar to modify a list in-place.

Question 4

I came across the same question before and it’s related to the language specification. According to assignment-statements,

If the left side of assignment is subscription, Python will call __setitem__ on that object. a[i] = x is equivalent to a.__setitem__(i, x).
If the left side of assignment is slice, Python will also call __setitem__, but with different arguments: a[1:4]=[1,2,3] is equivalent to a.__setitem__(slice(1,4,None), [1,2,3])

That’s why list slice on the left side of ‘=’ behaves differently.

Question 5

By slicing on the left hand side of an assignment operation, you are specifying which items to assign to.

Question 6

I have this:

>>> a = [1, 2, 4]
>>> print a
[1, 2, 4]

>>> print a.insert(2, 3)
None

>>> print a
[1, 2, 3, 4]

>>> b = a.insert(3, 6)
>>> print b
None

>>> print a
[1, 2, 3, 6, 4]

Is there a way I can get the updated list as the result, instead of updating the original list in place?

Question 7

l.insert(index, obj) doesn’t actually return anything. It just updates the list.

As ATO said, you can do b = a[:index] + [obj] + a[index:]. However, another way is:

a = [1, 2, 4]
b = a[:]
b.insert(2, 3)

Question 8

Most performance efficient approach

You may also insert the element using the slice indexing in the list. For example:

>>> a = [1, 2, 4]
>>> insert_at = 2  # Index at which you want to insert item

>>> b = a[:]   # Created copy of list "a" as "b".
               # Skip this step if you are ok with modifying the original list

>>> b[insert_at:insert_at] = [3]  # Insert "3" within "b"
>>> b
[1, 2, 3, 4]

For inserting multiple elements together at a given index, all you need to do is to use a list of multiple elements that you want to insert. For example:

>>> a = [1, 2, 4]
>>> insert_at = 2   # Index starting from which multiple elements will be inserted

# List of elements that you want to insert together at "index_at" (above) position
>>> insert_elements = [3, 5, 6]

>>> a[insert_at:insert_at] = insert_elements
>>> a   # [3, 5, 6] are inserted together in `a` starting at index "2"
[1, 2, 3, 5, 6, 4]

Alternative using list comprehension (but very slow in terms of performance):

As an alternative, it can be achieved using list comprehension with enumerate too. (But please don’t do it this way. It is just for illustration):

>>> a = [1, 2, 4]
>>> insert_at = 2

>>> b = [y for i, x in enumerate(a) for y in ((3, x) if i == insert_at else (x, ))]
>>> b
[1, 2, 3, 4]

Performance comparison of all solutions

Here’s the timeit comparison of all the answers with list of 1000 elements for Python 3.4.5:

Mine answer using sliced insertion – Fastest (3.08 µsec per loop)

 mquadri$ python3 -m timeit -s "a = list(range(1000))" "b = a[:]; b[500:500] = [3]"
 100000 loops, best of 3: 3.08 µsec per loop

ATOzTOA’s accepted answer based on merge of sliced lists – Second (6.71 µsec per loop)

 mquadri$ python3 -m timeit -s "a = list(range(1000))" "b = a[:500] + [3] + a[500:]"
 100000 loops, best of 3: 6.71 µsec per loop

Rushy Panchal’s answer with most votes using list.insert(...)– Third (26.5 usec per loop)

 python3 -m timeit -s "a = list(range(1000))" "b = a[:]; b.insert(500, 3)"
 10000 loops, best of 3: 26.5 µsec per loop

My answer with List Comprehension and enumerate – Fourth (very slow with 168 µsec per loop)

 mquadri$ python3 -m timeit -s "a = list(range(1000))" "[y for i, x in enumerate(a) for y in ((3, x) if i == 500 else (x, )) ]"
 10000 loops, best of 3: 168 µsec per loop

Question 9

The shortest I got: b = a[:2] + [3] + a[2:]

>>>
>>> a = [1, 2, 4]
>>> print a
[1, 2, 4]
>>> b = a[:2] + [3] + a[2:]
>>> print a
[1, 2, 4]
>>> print b
[1, 2, 3, 4]

Question 10

The cleanest approach is to copy the list and then insert the object into the copy. On Python 3 this can be done via list.copy:

new = old.copy()
new.insert(index, value)

On Python 2 copying the list can be achieved via new = old[:] (this also works on Python 3).

In terms of performance there is no difference to other proposed methods:

$ python --version
Python 3.8.1
$ python -m timeit -s "a = list(range(1000))" "b = a.copy(); b.insert(500, 3)"
100000 loops, best of 5: 2.84 µsec per loop
$ python -m timeit -s "a = list(range(1000))" "b = a.copy(); b[500:500] = (3,)"
100000 loops, best of 5: 2.76 µsec per loop

Question 11

Use the Python list insert() method. Usage:

#Syntax

The syntax for the insert() method −

list.insert(index, obj)

#Parameters

index − This is the Index where the object obj need to be inserted.
obj − This is the Object to be inserted into the given list.

#Return Value This method does not return any value, but it inserts the given element at the given index.

Example:

a = [1,2,4,5]

a.insert(2,3)

print(a)

Returns [1, 2, 3, 4, 5]

Question 12

Say we have a list of numbers from 0 to 1000. Is there a pythonic/efficient way to produce a list of the first and every subsequent 10th item, i.e. [0, 10, 20, 30, ... ]?

Yes, I can do this using a for loop, but I’m wondering if there is a neater way to do this, perhaps even in one line?

Question 13

>>> lst = list(range(165))
>>> lst[0::10]
[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160]

Note that this is around 100 times faster than looping and checking a modulus for each element:

$ python -m timeit -s "lst = list(range(1000))" "lst1 = [x for x in lst if x % 10 == 0]"
1000 loops, best of 3: 525 usec per loop
$ python -m timeit -s "lst = list(range(1000))" "lst1 = lst[0::10]"
100000 loops, best of 3: 4.02 usec per loop

Question 14

source_list[::10] is the most obvious, but this doesn’t work for any iterable and is not memory efficient for large lists.
itertools.islice(source_sequence, 0, None, 10) works for any iterable and is memory-efficient, but probably is not the fastest solution for large list and big step.
(source_list[i] for i in xrange(0, len(source_list), 10))

Question 15

You can use the slice operator like this:

l = [1,2,3,4,5]
l2 = l[::2] # get subsequent 2nd item

Question 16

From manual: s[i:j:k] slice of s from i to j with step k

li = range(100)
sub = li[0::10]

>>> sub
[0, 10, 20, 30, 40, 50, 60, 70, 80, 90]

Question 17

newlist = oldlist[::10]

This picks out every 10th element of the list.

Question 18

Why not just use a step parameter of range function as well to get:

l = range(0, 1000, 10)

For comparison, on my machine:

H:\>python -m timeit -s "l = range(1000)" "l1 = [x for x in l if x % 10 == 0]"
10000 loops, best of 3: 90.8 usec per loop
H:\>python -m timeit -s "l = range(1000)" "l1 = l[0::10]"
1000000 loops, best of 3: 0.861 usec per loop
H:\>python -m timeit -s "l = range(0, 1000, 10)"
100000000 loops, best of 3: 0.0172 usec per loop

Question 19

existing_list = range(0, 1001)
filtered_list = [i for i in existing_list if i % 10 == 0]

Question 20

Here is a better implementation of an “every 10th item” list comprehension, that does not use the list contents as part of the membership test:

>>> l = range(165)
>>> [ item for i,item in enumerate(l) if i%10==0 ]
[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160]
>>> l = list("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
>>> [ item for i,item in enumerate(l) if i%10==0 ]
['A', 'K', 'U']

But this is still far slower than just using list slicing.

Question 21

List comprehensions are exactly made for that:

smaller_list = [x for x in range(100001) if x % 10 == 0]

You can get more info about them in the python official documentation: http://docs.python.org/tutorial/datastructures.html#list-comprehensions

Question 22

Is there a built-in function that works like zip() but that will pad the results so that the length of the resultant list is the length of the longest input rather than the shortest input?

>>> a = ['a1']
>>> b = ['b1', 'b2', 'b3']
>>> c = ['c1', 'c2']

>>> zip(a, b, c)
[('a1', 'b1', 'c1')]

>>> What command goes here?
[('a1', 'b1', 'c1'), (None, 'b2', 'c2'), (None, 'b3', None)]

Question 23

In Python 3 you can use itertools.zip_longest

>>> list(itertools.zip_longest(a, b, c))
[('a1', 'b1', 'c1'), (None, 'b2', 'c2'), (None, 'b3', None)]

You can pad with a different value than None by using the fillvalue parameter:

>>> list(itertools.zip_longest(a, b, c, fillvalue='foo'))
[('a1', 'b1', 'c1'), ('foo', 'b2', 'c2'), ('foo', 'b3', 'foo')]

With Python 2 you can either use itertools.izip_longest (Python 2.6+), or you can use map with None. It is a little known feature of map (but map changed in Python 3.x, so this only works in Python 2.x).

>>> map(None, a, b, c)
[('a1', 'b1', 'c1'), (None, 'b2', 'c2'), (None, 'b3', None)]

Question 24

For Python 2.6x use itertools module’s izip_longest.

For Python 3 use zip_longest instead (no leading i).

>>> list(itertools.izip_longest(a, b, c))
[('a1', 'b1', 'c1'), (None, 'b2', 'c2'), (None, 'b3', None)]

Question 25

non itertools Python 3 solution:

def zip_longest(*lists):
    def g(l):
        for item in l:
            yield item
        while True:
            yield None
    gens = [g(l) for l in lists]    
    for _ in range(max(map(len, lists))):
        yield tuple(next(g) for g in gens)

Question 26

non itertools My Python 2 solution:

if len(list1) < len(list2):
    list1.extend([None] * (len(list2) - len(list1)))
else:
    list2.extend([None] * (len(list1) - len(list2)))

Question 27

Im using a 2d array but the concept is the similar using python 2.x:

if len(set([len(p) for p in printer])) > 1:
    printer = [column+['']*(max([len(p) for p in printer])-len(column)) for column in printer]

Question 28

I have a list of filenames in python and I would want to construct a set out of all the filenames.

filelist=[]
for filename in filelist:
    set(filename)

This does not seem to work. How can do this?

Question 29

If you have a list of hashable objects (filenames would probably be strings, so they should count):

lst = ['foo.py', 'bar.py', 'baz.py', 'qux.py', Ellipsis]

you can construct the set directly:

s = set(lst)

In fact, set will work this way with any iterable object! (Isn’t duck typing great?)

If you want to do it iteratively:

s = set()
for item in iterable:
    s.add(item)

But there’s rarely a need to do it this way. I only mention it because the set.add method is quite useful.

Question 30

The most direct solution is this:

s = set(filelist)

The issue in your original code is that the values weren’t being assigned to the set. Here’s the fixed-up version of your code:

s = set()
for filename in filelist:
    s.add(filename)
print(s)

Question 31

You can do

my_set = set(my_list)

or, in Python 3,

my_set = {*my_list}

to create a set from a list. Conversely, you can also do

my_list = list(my_set)

or, in Python 3,

my_list = [*my_set]

to create a list from a set.

Just note that the order of the elements in a list is generally lost when converting the list to a set since a set is inherently unordered. (One exception in CPython, though, seems to be if the list consists only of non-negative integers, but I assume this is a consequence of the implementation of sets in CPython and that this behavior can vary between different Python implementations.)

Question 32

Here is another solution:

>>>list1=["C:\\","D:\\","E:\\","C:\\"]
>>>set1=set(list1)
>>>set1
set(['E:\\', 'D:\\', 'C:\\'])

In this code I have used the set method in order to turn it into a set and then it removed all duplicate values from the list

Question 33

One general way to construct set in iterative way like this:

aset = {e for e in alist}

Question 34

Simply put the line:

new_list = set(your_list)

Question 35

I am iterating over a list and I want to print out the index of the item if it meets a certain condition. How would I do this?

Example:

testlist = [1,2,3,5,3,1,2,1,6]
for item in testlist:
    if item == 1:
        print position

Question 36

Hmmm. There was an answer with a list comprehension here, but it’s disappeared.

Here:

 [i for i,x in enumerate(testlist) if x == 1]

Example:

>>> testlist
[1, 2, 3, 5, 3, 1, 2, 1, 6]
>>> [i for i,x in enumerate(testlist) if x == 1]
[0, 5, 7]

Update:

Okay, you want a generator expression, we’ll have a generator expression. Here’s the list comprehension again, in a for loop:

>>> for i in [i for i,x in enumerate(testlist) if x == 1]:
...     print i
... 
0
5
7

Now we’ll construct a generator…

>>> (i for i,x in enumerate(testlist) if x == 1)
<generator object at 0x6b508>
>>> for i in (i for i,x in enumerate(testlist) if x == 1):
...     print i
... 
0
5
7

and niftily enough, we can assign that to a variable, and use it from there…

>>> gen = (i for i,x in enumerate(testlist) if x == 1)
>>> for i in gen: print i
... 
0
5
7

And to think I used to write FORTRAN.

Question 37

What about the following?

print testlist.index(element)

If you are not sure whether the element to look for is actually in the list, you can add a preliminary check, like

if element in testlist:
    print testlist.index(element)

or

print(testlist.index(element) if element in testlist else None)

or the “pythonic way”, which I don’t like so much because code is less clear, but sometimes is more efficient,

try:
    print testlist.index(element)
except ValueError:
    pass

Question 38

Use enumerate:

testlist = [1,2,3,5,3,1,2,1,6]
for position, item in enumerate(testlist):
    if item == 1:
        print position

Question 39

for i in xrange(len(testlist)):
  if testlist[i] == 1:
    print i

xrange instead of range as requested (see comments).

Question 40

Here is another way to do this:

try:
   id = testlist.index('1')
   print testlist[id]
except ValueError:
   print "Not Found"

Question 41

[x for x in range(len(testlist)) if testlist[x]==1]

Question 42

Try the below:

testlist = [1,2,3,5,3,1,2,1,6]    
position=0
for i in testlist:
   if i == 1:
      print(position)
   position=position+1

Question 43

If your list got large enough and you only expected to find the value in a sparse number of indices, consider that this code could execute much faster because you don’t have to iterate every value in the list.

lookingFor = 1
i = 0
index = 0
try:
  while i < len(testlist):
    index = testlist.index(lookingFor,i)
    i = index + 1
    print index
except ValueError: #testlist.index() cannot find lookingFor
  pass

If you expect to find the value a lot you should probably just append “index” to a list and print the list at the end to save time per iteration.

Question 44

I think that it might be useful to use the curselection() method from thte Tkinter library:

from Tkinter import * 
listbox.curselection()

This method works on Tkinter listbox widgets, so you’ll need to construct one of them instead of a list.

This will return a position like this:

(‘0’,) (although later versions of Tkinter may return a list of ints instead)

Which is for the first position and the number will change according to the item position.

For more information, see this page: http://effbot.org/tkinterbook/listbox.htm

Greetings.

Question 45

Why complicate things?

testlist = [1,2,3,5,3,1,2,1,6]
for position, item in enumerate(testlist):
    if item == 1:
        print position

Question 46

Just to illustrate complete example along with the input_list which has searies1 (example: input_list[0]) in which you want to do a lookup of series2 (example: input_list[1]) and get indexes of series2 if it exists in series1.

Note: Your certain condition will go in lambda expression if conditions are simple

input_list = [[1,2,3,4,5,6,7],[1,3,7]]
series1 = input_list[0]
series2 = input_list[1]
idx_list = list(map(lambda item: series1.index(item) if item in series1 else None, series2))
print(idx_list)

output:

[0, 2, 6]

Question 47

testlist = [1,2,3,5,3,1,2,1,6]
for id, value in enumerate(testlist):
    if id == 1:
        print testlist[id]

I guess that it’s exacly what you want. ;-) ‘id’ will be always the index of the values on the list.

Question 48

Given a string that is a sequence of several values separated by a commma:

mStr = 'A,B,C,D,E'

How do I convert the string to a list?

mList = ['A', 'B', 'C', 'D', 'E']

Question 49

You can use the str.split method.

>>> my_string = 'A,B,C,D,E'
>>> my_list = my_string.split(",")
>>> print my_list
['A', 'B', 'C', 'D', 'E']

If you want to convert it to a tuple, just

>>> print tuple(my_list)
('A', 'B', 'C', 'D', 'E')

If you are looking to append to a list, try this:

>>> my_list.append('F')
>>> print my_list
['A', 'B', 'C', 'D', 'E', 'F']

Question 50

In the case of integers that are included at the string, if you want to avoid casting them to int individually you can do:

mList = [int(e) if e.isdigit() else e for e in mStr.split(',')]

It is called list comprehension, and it is based on set builder notation.

ex:

>>> mStr = "1,A,B,3,4"
>>> mList = [int(e) if e.isdigit() else e for e in mStr.split(',')]
>>> mList
>>> [1,'A','B',3,4]

Question 51

>>> some_string='A,B,C,D,E'
>>> new_tuple= tuple(some_string.split(','))
>>> new_tuple
('A', 'B', 'C', 'D', 'E')

Question 52

You can use this function to convert comma-delimited single character strings to list-

def stringtolist(x):
    mylist=[]
    for i in range(0,len(x),2):
        mylist.append(x[i])
    return mylist

Question 53

#splits string according to delimeters 
'''
Let's make a function that can split a string
into list according the given delimeters. 
example data: cat;dog:greff,snake/
example delimeters: ,;- /|:
'''
def string_to_splitted_array(data,delimeters):
    #result list
    res = []
    # we will add chars into sub_str until
    # reach a delimeter
    sub_str = ''
    for c in data: #iterate over data char by char
        # if we reached a delimeter, we store the result 
        if c in delimeters: 
            # avoid empty strings
            if len(sub_str)>0:
                # looks like a valid string.
                res.append(sub_str)
                # reset sub_str to start over
                sub_str = ''
        else:
            # c is not a deilmeter. then it is 
            # part of the string.
            sub_str += c
    # there may not be delimeter at end of data. 
    # if sub_str is not empty, we should att it to list. 
    if len(sub_str)>0:
        res.append(sub_str)
    # result is in res 
    return res

# test the function. 
delimeters = ',;- /|:'
# read the csv data from console. 
csv_string = input('csv string:')
#lets check if working. 
splitted_array = string_to_splitted_array(csv_string,delimeters)
print(splitted_array)

Question 54

Consider the following in order to handle the case of an empty string:

>>> my_string = 'A,B,C,D,E'
>>> my_string.split(",") if my_string else []
['A', 'B', 'C', 'D', 'E']
>>> my_string = ""
>>> my_string.split(",") if my_string else []
[]

Question 55

You can split that string on , and directly get a list:

mStr = 'A,B,C,D,E'
list1 = mStr.split(',')
print(list1)

Output:

['A', 'B', 'C', 'D', 'E']

You can also convert it to an n-tuple:

print(tuple(list1))

Output:

('A', 'B', 'C', 'D', 'E')

Question 56

I have the following DataFrame:

customer    item1      item2    item3
1           apple      milk     tomato
2           water      orange   potato
3           juice      mango    chips

which I want to translate it to list of dictionaries per row

rows = [{'customer': 1, 'item1': 'apple', 'item2': 'milk', 'item3': 'tomato'},
    {'customer': 2, 'item1': 'water', 'item2': 'orange', 'item3': 'potato'},
    {'customer': 3, 'item1': 'juice', 'item2': 'mango', 'item3': 'chips'}]

Question 57

Edit

As John Galt mentions in his answer , you should probably instead use df.to_dict('records'). It’s faster than transposing manually.

In [20]: timeit df.T.to_dict().values()
1000 loops, best of 3: 395 µs per loop

In [21]: timeit df.to_dict('records')
10000 loops, best of 3: 53 µs per loop

Original answer

Use df.T.to_dict().values(), like below:

In [1]: df
Out[1]:
   customer  item1   item2   item3
0         1  apple    milk  tomato
1         2  water  orange  potato
2         3  juice   mango   chips

In [2]: df.T.to_dict().values()
Out[2]:
[{'customer': 1.0, 'item1': 'apple', 'item2': 'milk', 'item3': 'tomato'},
 {'customer': 2.0, 'item1': 'water', 'item2': 'orange', 'item3': 'potato'},
 {'customer': 3.0, 'item1': 'juice', 'item2': 'mango', 'item3': 'chips'}]

Question 58

Use df.to_dict('records') — gives the output without having to transpose externally.

In [2]: df.to_dict('records')
Out[2]:
[{'customer': 1L, 'item1': 'apple', 'item2': 'milk', 'item3': 'tomato'},
 {'customer': 2L, 'item1': 'water', 'item2': 'orange', 'item3': 'potato'},
 {'customer': 3L, 'item1': 'juice', 'item2': 'mango', 'item3': 'chips'}]

Question 59

As an extension to John Galt’s answer –

For the following DataFrame,

   customer  item1   item2   item3
0         1  apple    milk  tomato
1         2  water  orange  potato
2         3  juice   mango   chips

If you want to get a list of dictionaries including the index values, you can do something like,

df.to_dict('index')

Which outputs a dictionary of dictionaries where keys of the parent dictionary are index values. In this particular case,

{0: {'customer': 1, 'item1': 'apple', 'item2': 'milk', 'item3': 'tomato'},
 1: {'customer': 2, 'item1': 'water', 'item2': 'orange', 'item3': 'potato'},
 2: {'customer': 3, 'item1': 'juice', 'item2': 'mango', 'item3': 'chips'}}

Question 60

If you are interested in only selecting one column this will work.

df[["item1"]].to_dict("records")

The below will NOT work and produces a TypeError: unsupported type: . I believe this is because it is trying to convert a series to a dict and not a Data Frame to a dict.

df["item1"].to_dict("records")

I had a requirement to only select one column and convert it to a list of dicts with the column name as the key and was stuck on this for a bit so figured I’d share.

Question 61

I have a dataframe where some cells contain lists of multiple values. Rather than storing multiple values in a cell, I’d like to expand the dataframe so that each item in the list gets its own row (with the same values in all other columns). So if I have:

import pandas as pd
import numpy as np

df = pd.DataFrame(
    {'trial_num': [1, 2, 3, 1, 2, 3],
     'subject': [1, 1, 1, 2, 2, 2],
     'samples': [list(np.random.randn(3).round(2)) for i in range(6)]
    }
)

df
Out[10]: 
                 samples  subject  trial_num
0    [0.57, -0.83, 1.44]        1          1
1    [-0.01, 1.13, 0.36]        1          2
2   [1.18, -1.46, -0.94]        1          3
3  [-0.08, -4.22, -2.05]        2          1
4     [0.72, 0.79, 0.53]        2          2
5    [0.4, -0.32, -0.13]        2          3

How do I convert to long form, e.g.:

   subject  trial_num  sample  sample_num
0        1          1    0.57           0
1        1          1   -0.83           1
2        1          1    1.44           2
3        1          2   -0.01           0
4        1          2    1.13           1
5        1          2    0.36           2
6        1          3    1.18           0
# etc.

The index is not important, it’s OK to set existing columns as the index and the final ordering isn’t important.

Question 62

lst_col = 'samples'

r = pd.DataFrame({
      col:np.repeat(df[col].values, df[lst_col].str.len())
      for col in df.columns.drop(lst_col)}
    ).assign(**{lst_col:np.concatenate(df[lst_col].values)})[df.columns]

Result:

In [103]: r
Out[103]:
    samples  subject  trial_num
0      0.10        1          1
1     -0.20        1          1
2      0.05        1          1
3      0.25        1          2
4      1.32        1          2
5     -0.17        1          2
6      0.64        1          3
7     -0.22        1          3
8     -0.71        1          3
9     -0.03        2          1
10    -0.65        2          1
11     0.76        2          1
12     1.77        2          2
13     0.89        2          2
14     0.65        2          2
15    -0.98        2          3
16     0.65        2          3
17    -0.30        2          3

PS here you may find a bit more generic solution

UPDATE: some explanations: IMO the easiest way to understand this code is to try to execute it step-by-step:

in the following line we are repeating values in one column N times where N – is the length of the corresponding list:

In [10]: np.repeat(df['trial_num'].values, df[lst_col].str.len())
Out[10]: array([1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 2, 2, 2, 3, 3, 3], dtype=int64)

this can be generalized for all columns, containing scalar values:

In [11]: pd.DataFrame({
    ...:           col:np.repeat(df[col].values, df[lst_col].str.len())
    ...:           for col in df.columns.drop(lst_col)}
    ...:         )
Out[11]:
    trial_num  subject
0           1        1
1           1        1
2           1        1
3           2        1
4           2        1
5           2        1
6           3        1
..        ...      ...
11          1        2
12          2        2
13          2        2
14          2        2
15          3        2
16          3        2
17          3        2

[18 rows x 2 columns]

using np.concatenate() we can flatten all values in the list column (samples) and get a 1D vector:

In [12]: np.concatenate(df[lst_col].values)
Out[12]: array([-1.04, -0.58, -1.32,  0.82, -0.59, -0.34,  0.25,  2.09,  0.12,  0.83, -0.88,  0.68,  0.55, -0.56,  0.65, -0.04,  0.36, -0.31])

putting all this together:

In [13]: pd.DataFrame({
    ...:           col:np.repeat(df[col].values, df[lst_col].str.len())
    ...:           for col in df.columns.drop(lst_col)}
    ...:         ).assign(**{lst_col:np.concatenate(df[lst_col].values)})
Out[13]:
    trial_num  subject  samples
0           1        1    -1.04
1           1        1    -0.58
2           1        1    -1.32
3           2        1     0.82
4           2        1    -0.59
5           2        1    -0.34
6           3        1     0.25
..        ...      ...      ...
11          1        2     0.68
12          2        2     0.55
13          2        2    -0.56
14          2        2     0.65
15          3        2    -0.04
16          3        2     0.36
17          3        2    -0.31

[18 rows x 3 columns]

using pd.DataFrame()[df.columns] will guarantee that we are selecting columns in the original order…

Question 63

A bit longer than I expected:

>>> df
                samples  subject  trial_num
0  [-0.07, -2.9, -2.44]        1          1
1   [-1.52, -0.35, 0.1]        1          2
2  [-0.17, 0.57, -0.65]        1          3
3  [-0.82, -1.06, 0.47]        2          1
4   [0.79, 1.35, -0.09]        2          2
5   [1.17, 1.14, -1.79]        2          3
>>>
>>> s = df.apply(lambda x: pd.Series(x['samples']),axis=1).stack().reset_index(level=1, drop=True)
>>> s.name = 'sample'
>>>
>>> df.drop('samples', axis=1).join(s)
   subject  trial_num  sample
0        1          1   -0.07
0        1          1   -2.90
0        1          1   -2.44
1        1          2   -1.52
1        1          2   -0.35
1        1          2    0.10
2        1          3   -0.17
2        1          3    0.57
2        1          3   -0.65
3        2          1   -0.82
3        2          1   -1.06
3        2          1    0.47
4        2          2    0.79
4        2          2    1.35
4        2          2   -0.09
5        2          3    1.17
5        2          3    1.14
5        2          3   -1.79

If you want sequential index, you can apply reset_index(drop=True) to the result.

update:

>>> res = df.set_index(['subject', 'trial_num'])['samples'].apply(pd.Series).stack()
>>> res = res.reset_index()
>>> res.columns = ['subject','trial_num','sample_num','sample']
>>> res
    subject  trial_num  sample_num  sample
0         1          1           0    1.89
1         1          1           1   -2.92
2         1          1           2    0.34
3         1          2           0    0.85
4         1          2           1    0.24
5         1          2           2    0.72
6         1          3           0   -0.96
7         1          3           1   -2.72
8         1          3           2   -0.11
9         2          1           0   -1.33
10        2          1           1    3.13
11        2          1           2   -0.65
12        2          2           0    0.10
13        2          2           1    0.65
14        2          2           2    0.15
15        2          3           0    0.64
16        2          3           1   -0.10
17        2          3           2   -0.76

Question 64

Pandas >= 0.25

Series and DataFrame methods define a .explode() method that explodes lists into separate rows. See the docs section on Exploding a list-like column.

df = pd.DataFrame({
    'var1': [['a', 'b', 'c'], ['d', 'e',], [], np.nan], 
    'var2': [1, 2, 3, 4]
})
df
        var1  var2
0  [a, b, c]     1
1     [d, e]     2
2         []     3
3        NaN     4

df.explode('var1')

  var1  var2
0    a     1
0    b     1
0    c     1
1    d     2
1    e     2
2  NaN     3  # empty list converted to NaN
3  NaN     4  # NaN entry preserved as-is

# to reset the index to be monotonically increasing...
df.explode('var1').reset_index(drop=True)

  var1  var2
0    a     1
1    b     1
2    c     1
3    d     2
4    e     2
5  NaN     3
6  NaN     4

Note that this also handles mixed columns of lists and scalars, as well as empty lists and NaNs appropriately (this is a drawback of repeat-based solutions).

However, you should note that explode only works on a single column (for now).

P.S.: if you are looking to explode a column of strings, you need to split on a separator first, then use explode. See this (very much) related answer by me.

问题：分配如何与Python列表切片一起使用？

回答 0

回答 1

回答 2

回答 3

问题：在列表中的特定索引处插入元素，然后返回更新后的列表

回答 0

回答 1

最高效的方法

所有解决方案的性能比较

Most performance efficient approach

Performance comparison of all solutions

回答 2

回答 3

回答 4

问题：返回较大列表中第n个项目的列表的Python方法

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

问题：是否有一个类似zip的函数可以在Python中填充最长的长度？

回答 0

回答 1

回答 2

回答 3

回答 4

问题：如何在python中构造一组列表项？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

问题：如何获取项目在列表中的位置？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

回答 10

回答 11

问题：如何将逗号分隔的字符串转换为Python中的列表？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

问题：Pandas DataFrame到字典列表

回答 0

编辑

原始答案

Edit

Original answer

回答 1

回答 2

回答 3

问题：列表的熊猫列，为每个列表元素创建一行

回答 0

回答 1

回答 2

熊猫> = 0.25

Pandas >= 0.25

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8