Set order of columns in pandas DataFrame

Question: Set order of columns in pandas DataFrame

Is there a way to reorder the columns of a pandas DataFrame according to my own preference (i.e. not sorted alphabetically or numerically, but following some convention of my own)?

Simple example:

frame = pd.DataFrame({
        'one thing':[1,2,3,4],
        'second thing':[0.1,0.2,1,2],
        'other thing':['a','e','i','o']})

produces this:

   one thing other thing  second thing
0          1           a           0.1
1          2           e           0.2
2          3           i           1.0
3          4           o           2.0

But instead, I would like this:

   one thing second thing  other thing
0          1           0.1           a
1          2           0.2           e
2          3           1.0           i
3          4           2.0           o

(Please provide a generic solution rather than one specific to this case. Many thanks.)


Answer 0

Just select the order yourself by typing in the column names. Note the double brackets:

frame = frame[['column I want first', 'column I want second'...etc.]]
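
As a concrete version of the line above, here is a minimal sketch using the frame from the question. The variable name desired_order is only an illustrative choice and not part of pandas.

```python
import pandas as pd

frame = pd.DataFrame({
    'one thing': [1, 2, 3, 4],
    'other thing': ['a', 'e', 'i', 'o'],
    'second thing': [0.1, 0.2, 1, 2],
})

# Hypothetical name: the columns, listed in the order we want them.
desired_order = ['one thing', 'second thing', 'other thing']

# The outer brackets index the frame; the inner brackets are the list of names.
frame = frame[desired_order]
print(frame.columns.tolist())
```

Selecting with a list returns a new DataFrame, so the result has to be assigned back. A name that is not in the frame raises a KeyError, which makes typos easy to spot.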

Answer 1

You can use this:

columnsTitles = ['one thing', 'second thing', 'other thing']

frame = frame.reindex(columns=columnsTitles)
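
One thing to watch with reindex: the names must match the existing column names exactly, spaces included. The sketch below, which reuses the frame from the question, shows that an unknown name does not raise an error but yields a column filled with NaN. The names column_titles, reordered and misspelled are illustrative only.

```python
import pandas as pd

frame = pd.DataFrame({
    'one thing': [1, 2, 3, 4],
    'other thing': ['a', 'e', 'i', 'o'],
    'second thing': [0.1, 0.2, 1, 2],
})

# Names must match the existing columns exactly.
column_titles = ['one thing', 'second thing', 'other thing']
reordered = frame.reindex(columns=column_titles)
print(reordered.columns.tolist())

# 'onething' (no space) is not an existing column, so reindex creates
# a new column of NaN values instead of raising an error.
misspelled = frame.reindex(columns=['onething', 'second thing'])
print(misspelled['onething'].isna().all())
```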

Answer 2

Here is a solution I use very often. When you have a large data set with tons of columns, you definitely do not want to manually rearrange all the columns.

What you can, and most likely want to, do is order just the first few columns that you use frequently and leave all the other columns as they are. This is a common approach in R: df %>% select(one, two, three, everything())

First, manually list the columns that you want placed before all the others, in the order you want them, in a list cols_to_order.

Then construct the full list of columns by appending the remaining columns:

new_columns = cols_to_order + (frame.columns.drop(cols_to_order).tolist())

After this, you can use new_columns as the other solutions suggest.

import pandas as pd
frame = pd.DataFrame({
    'one thing': [1, 2, 3, 4],
    'other thing': ['a', 'e', 'i', 'o'],
    'more things': ['a', 'e', 'i', 'o'],
    'second thing': [0.1, 0.2, 1, 2],
})

cols_to_order = ['one thing', 'second thing']
new_columns = cols_to_order + (frame.columns.drop(cols_to_order).tolist())
frame = frame[new_columns]

   one thing  second thing other thing more things
0          1           0.1           a           a
1          2           0.2           e           e
2          3           1.0           i           i
3          4           2.0           o           o
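
If you do this often, the same idea can be wrapped in a small helper. This is only a sketch: move_to_front is a hypothetical name, not a pandas function, and it assumes every name in cols_to_order exists in the frame (Index.drop raises a KeyError otherwise).

```python
import pandas as pd


def move_to_front(df, cols_to_order):
    """Return df with cols_to_order first; the other columns keep their order."""
    # Index.drop removes the chosen labels and keeps the rest in place.
    remaining = df.columns.drop(cols_to_order).tolist()
    return df[list(cols_to_order) + remaining]


frame = pd.DataFrame({
    'one thing': [1, 2, 3, 4],
    'other thing': ['a', 'e', 'i', 'o'],
    'more things': ['a', 'e', 'i', 'o'],
    'second thing': [0.1, 0.2, 1, 2],
})

frame = move_to_front(frame, ['one thing', 'second thing'])
print(frame.columns.tolist())
```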

Answer 3

You could also do something like df = df[['x', 'y', 'a', 'b']]

import pandas as pd
frame = pd.DataFrame({'one thing':[1,2,3,4],'second thing':[0.1,0.2,1,2],'other thing':['a','e','i','o']})
frame = frame[['second thing', 'other thing', 'one thing']]
print(frame)
   second thing other thing  one thing
0           0.1           a          1
1           0.2           e          2
2           1.0           i          3
3           2.0           o          4

Also, you can get the list of columns with:

cols = list(df.columns.values)

The resulting list looks something like this:

['x', 'y', 'a', 'b']

It is then easy to rearrange manually.
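
Putting the two steps together, a minimal sketch might look like this. The frame with columns x, y, a, b is made up for illustration, and moving the last column to the front is just one example of a rearrangement.

```python
import pandas as pd

# Made-up frame with the column names used above.
df = pd.DataFrame({'x': [1, 2], 'y': [3, 4], 'a': [5, 6], 'b': [7, 8]})

# Get the current column names as a plain Python list.
cols = df.columns.tolist()

# Rearrange the list with ordinary list operations:
# here the last column is moved to the front.
cols = cols[-1:] + cols[:-1]

# Apply the new order.
df = df[cols]
print(df.columns.tolist())
```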


Answer 4

Construct it with a list of rows instead of a dictionary:

frame = pd.DataFrame([
        [1, .1, 'a'],
        [2, .2, 'e'],
        [3,  1, 'i'],
        [4,  2, 'o']
    ], columns=['one thing', 'second thing', 'other thing'])

frame

   one thing  second thing other thing
0          1           0.1           a
1          2           0.2           e
2          3           1.0           i
3          4           2.0           o

Answer 5

You can also use OrderedDict:

In [183]: from collections import OrderedDict

In [184]: data = OrderedDict()

In [185]: data['one thing'] = [1,2,3,4]

In [186]: data['second thing'] = [0.1,0.2,1,2]

In [187]: data['other thing'] = ['a','e','i','o']

In [188]: frame = pd.DataFrame(data)

In [189]: frame
Out[189]:
   one thing  second thing other thing
0          1           0.1           a
1          2           0.2           e
2          3           1.0           i
3          4           2.0           o
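
For what it is worth, on current setups a plain dict may already be enough. The sketch below assumes Python 3.7 or later and a pandas version recent enough to keep dict insertion order for columns; that is an assumption you should check against your installed versions.

```python
import pandas as pd

# Assumption: Python 3.7+ (dicts keep insertion order) and a pandas
# version that follows that order when building columns from a dict.
data = {}
data['one thing'] = [1, 2, 3, 4]
data['second thing'] = [0.1, 0.2, 1, 2]
data['other thing'] = ['a', 'e', 'i', 'o']

frame = pd.DataFrame(data)
print(frame.columns.tolist())
```

If the printed order does not match the insertion order on your setup, OrderedDict as shown above remains the safer choice.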

Answer 6

Pass the columns parameter to the constructor to set the order explicitly:

frame = pd.DataFrame({
        'one thing':[1,2,3,4],
        'second thing':[0.1,0.2,1,2],
        'other thing':['a','e','i','o']},
        columns=['one thing', 'second thing', 'other thing']
)
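
A side effect worth knowing when the data is a dict: the columns list also acts as a filter, so keys left out of it are dropped from the result. A minimal sketch, with the variable names data and subset chosen only for illustration:

```python
import pandas as pd

data = {
    'one thing': [1, 2, 3, 4],
    'second thing': [0.1, 0.2, 1, 2],
    'other thing': ['a', 'e', 'i', 'o'],
}

# Only the listed keys are kept, in the listed order;
# 'second thing' is left out and therefore dropped.
subset = pd.DataFrame(data, columns=['other thing', 'one thing'])
print(subset.columns.tolist())
```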

Answer 7

Try positional indexing. Since you want a generic solution, you can list the column positions in whatever order you like:

l=[0,2,1] # index order
frame=frame[[frame.columns[i] for i in l]]

Now:

print(frame)

gives:

   one thing second thing  other thing
0          1           0.1           a
1          2           0.2           e
2          3           1.0           i
3          4           2.0           o
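
The same positional reordering can also be written with iloc, which selects columns by position directly. The sketch below builds the frame with the column order shown in the question (one thing, other thing, second thing) so that the positions [0, 2, 1] mean the same thing as above; the names order, by_name and by_position are illustrative.

```python
import pandas as pd

# Build the frame with the starting column order from the question.
frame = pd.DataFrame({
    'one thing': [1, 2, 3, 4],
    'second thing': [0.1, 0.2, 1, 2],
    'other thing': ['a', 'e', 'i', 'o'],
}, columns=['one thing', 'other thing', 'second thing'])

order = [0, 2, 1]  # column positions in the desired order

# Look up the names at those positions, then select by name.
by_name = frame[[frame.columns[i] for i in order]]

# Equivalent: select all rows and the columns at those positions.
by_position = frame.iloc[:, order]

print(by_position.columns.tolist())
```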

Answer 8

I find this to be the most straightforward approach that works:

df = pd.DataFrame({
        'one thing':[1,2,3,4],
        'second thing':[0.1,0.2,1,2],
        'other thing':['a','e','i','o']})

df = df[['one thing','second thing', 'other thing']]