Question: Delete a column from a pandas DataFrame

When deleting a column in a DataFrame I use:

del df['column_name']

And this works great. Why can’t I use the following?

del df.column_name

Since it is possible to access the column/Series as df.column_name, I expected this to work.


Answer 0

As you’ve guessed, the right syntax is

del df['column_name']

It’s difficult to make del df.column_name work simply as the result of syntactic limitations in Python. del df[name] gets translated to df.__delitem__(name) under the covers by Python.
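To make the two translations concrete, here is a minimal sketch using a toy class (the name Table and its attributes are hypothetical and are not part of pandas). It only shows which hook Python calls for each form of del: the subscript form goes to __delitem__, the dot form goes to __delattr__.

```python
class Table:
    """Toy container (hypothetical, not pandas) that records which hook Python calls."""

    def __init__(self, **columns):
        self.data = dict(columns)  # column name -> values
        self.calls = []            # log of (hook, argument) pairs

    def __delitem__(self, key):
        # Invoked by: del table[key]
        self.calls.append(("__delitem__", key))
        del self.data[key]

    def __delattr__(self, name):
        # Invoked by: del table.name
        self.calls.append(("__delattr__", name))
        super().__delattr__(name)


table = Table(a=[1, 2], b=[3, 4])
del table["a"]      # dispatched to Table.__delitem__("a")
table.scratch = 1   # an ordinary instance attribute
del table.scratch   # dispatched to Table.__delattr__("scratch")
print(table.calls)
```

For del df.column_name to drop a column, a class would therefore have to route __delattr__ to its column storage, which is a separate code path from the one del df['column_name'] uses.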


Answer 1

The best way to do this in pandas is to use drop:

df = df.drop('column_name', axis=1)

where 1 is the axis number (0 for rows and 1 for columns). Pass axis as a keyword: recent pandas versions no longer accept it positionally.

To delete the column without having to reassign df you can do:

df.drop('column_name', axis=1, inplace=True)

Finally, to drop by column number instead of by column label, try this to delete, e.g. the 1st, 2nd and 4th columns:

df = df.drop(df.columns[[0, 1, 3]], axis=1)  # df.columns is zero-based pd.Index 

This also works with a list of column labels:

df.drop(['column_nameA', 'column_nameB'], axis=1, inplace=True)
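The variants above can be tried side by side. This is a small sketch on a made-up four-column frame (the column names a to d are placeholders, not from the question); each call returns a new frame and leaves df untouched.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6], "d": [7, 8]})

by_label = df.drop("a", axis=1)                       # single label
by_position = df.drop(df.columns[[0, 1, 3]], axis=1)  # 1st, 2nd and 4th columns
by_list = df.drop(["a", "b"], axis=1)                 # list of labels

print(list(by_label.columns))     # ['b', 'c', 'd']
print(list(by_position.columns))  # ['c']
print(list(by_list.columns))      # ['c', 'd']
print(list(df.columns))           # ['a', 'b', 'c', 'd'] (original unchanged)
```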

Answer 2

Use:

columns = ['Col1', 'Col2', ...]
df.drop(columns, inplace=True, axis=1)

This will delete one or more columns in-place. Note that inplace=True was added in pandas v0.13 and won’t work on older versions. You’d have to assign the result back in that case:

df = df.drop(columns, axis=1)
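One detail worth seeing in code: with inplace=True the call returns None, so the result must not be assigned back. A minimal sketch, with three placeholder column names standing in for the elided list above:

```python
import pandas as pd

df = pd.DataFrame({"Col1": [1, 2], "Col2": [3, 4], "Col3": [5, 6]})
columns = ["Col1", "Col2"]

# inplace=True mutates df and returns None.
result = df.drop(columns, axis=1, inplace=True)

print(result)            # None
print(list(df.columns))  # ['Col3']
```

Writing df = df.drop(columns, axis=1, inplace=True) would therefore replace df with None.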

Answer 3

Drop by index

Delete first, second and fourth columns:

df.drop(df.columns[[0,1,3]], axis=1, inplace=True)

Delete first column:

df.drop(df.columns[[0]], axis=1, inplace=True)

There is an optional parameter inplace so that the original data can be modified without creating a copy.

Popped

Column selection, addition, deletion

Delete column column-name:

df.pop('column-name')

Examples:

df = DataFrame.from_dict({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}, orient='index', columns=['one', 'two', 'three'])

print(df):

   one  two  three
A    1    2      3
B    4    5      6
C    7    8      9

df.drop(df.columns[[0]], axis=1, inplace=True), then print(df):

   two  three
A    2      3
B    5      6
C    8      9

three = df.pop('three'), then print(df):

   two
A    2
B    5
C    8
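Unlike drop, pop hands back the removed column. The sketch below rebuilds the example frame with the plain DataFrame constructor (an assumption made here because from_items is not available in recent pandas releases) and checks both the returned Series and the remaining columns.

```python
import pandas as pd

df = pd.DataFrame(
    {"one": [1, 4, 7], "two": [2, 5, 8], "three": [3, 6, 9]},
    index=["A", "B", "C"],
)

three = df.pop("three")  # removes the column in place and returns it as a Series

print(three.tolist())    # [3, 6, 9]
print(list(df.columns))  # ['one', 'two']
```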

Answer 4

The actual question posed, missed by most answers here, is:

Why can’t I use del df.column_name?

First we need to understand the problem, which requires us to dive into Python magic methods.

As Wes points out in his answer, del df['column'] maps to the Python magic method df.__delitem__('column'), which is implemented in pandas to drop the column.

However, as pointed out in the link above about python magic methods:

In fact, __del__ should almost never be used because of the precarious circumstances under which it is called; use it with caution!

You could argue that del df['column_name'] should not be used or encouraged, and thereby del df.column_name should not even be considered.

However, in theory, del df.column_name could be implemented to work in pandas using the magic method __delattr__. This does, however, introduce certain problems, which the del df['column_name'] implementation already has, but to a lesser degree.

Example Problem

What if I define a column in a dataframe called “dtypes” or “columns”?

Then assume I want to delete these columns.

del df.dtypes would leave the __delattr__ method unable to tell whether it should delete the “dtypes” attribute or the “dtypes” column.
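The name clash is easy to reproduce for reads. In this sketch (the frame and its price column are made up for illustration), a column called dtypes is reachable with brackets, while the dot form still returns the built-in dtypes attribute:

```python
import pandas as pd

df = pd.DataFrame({"dtypes": [1, 2], "price": [3.0, 4.0]})

column = df["dtypes"]   # bracket access: always the column
attribute = df.dtypes   # dot access: the DataFrame.dtypes attribute wins

print(column.tolist())        # [1, 2]
print(list(attribute.index))  # ['dtypes', 'price'] (one dtype per column)
print(df.price.tolist())      # [3.0, 4.0] (no clash, so the dot form finds the column)

del df["dtypes"]              # unambiguous: removes the column
print(list(df.columns))       # ['price']
```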

Architectural questions behind this problem

  1. Is a dataframe a collection of columns?
  2. Is a dataframe a collection of rows?
  3. Is a column an attribute of a dataframe?

Pandas answers:

  1. Yes, in all ways
  2. No, but if you want it to be, you can use the .loc or .iloc indexers (.ix was removed in pandas 1.0).
  3. Maybe, do you want to read data? Then yes, unless the name of the attribute is already taken by another attribute belonging to the dataframe. Do you want to modify data? Then no.

TLDR;

You cannot do del df.column_name because pandas has a quite wildly grown architecture that needs to be reconsidered in order for this kind of cognitive dissonance not to occur to its users.

Protip:

Don’t use df.column_name. It may be pretty, but it causes cognitive dissonance.

Zen of Python quotes that fit in here:

There are multiple ways of deleting a column.

There should be one– and preferably only one –obvious way to do it.

Columns are sometimes attributes but sometimes not.

Special cases aren’t special enough to break the rules.

Does del df.dtypes delete the dtypes attribute or the dtypes column?

In the face of ambiguity, refuse the temptation to guess.


Answer 5

A nice addition is the ability to drop columns only if they exist. This way you can cover more use cases, and it will only drop the existing columns from the labels passed to it:

Simply add errors='ignore', for example:

df.drop(['col_name_1', 'col_name_2', ..., 'col_name_N'], inplace=True, axis=1, errors='ignore')
  • This is new from pandas 0.16.1 onward, as described in the DataFrame.drop documentation.
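A short sketch of the difference, using a made-up two-column frame and a label ("missing") that does not exist: with errors='ignore' the unknown label is skipped, while the default behaviour raises KeyError.

```python
import pandas as pd

df = pd.DataFrame({"a": [1], "b": [2]})

# Unknown labels are silently skipped.
trimmed = df.drop(["b", "missing"], axis=1, errors="ignore")
print(list(trimmed.columns))  # ['a']

# With the default errors='raise', the same call fails.
try:
    df.drop(["b", "missing"], axis=1)
    raised = False
except KeyError:
    raised = True
print(raised)  # True
```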

Answer 6

From version 0.16.1 you can do:

df.drop(['column_name'], axis = 1, inplace = True, errors = 'ignore')

Answer 7

It’s good practice to always use the [] notation. One reason is that attribute notation (df.column_name) does not work for numbered indices:

In [1]: df = DataFrame([[1, 2, 3], [4, 5, 6]])

In [2]: df[1]
Out[2]:
0    2
1    5
Name: 1

In [3]: df.1
  File "<ipython-input-3-e4803c0d1066>", line 1
    df.1
       ^
SyntaxError: invalid syntax

Answer 8

Pandas 0.21+ answer

Pandas version 0.21 has changed the method slightly to include both the index and columns parameters to match the signature of the rename and reindex methods.

df.drop(columns=['column_a', 'column_c'])

Personally, I prefer using the axis parameter to denote columns or index because it is the predominant keyword parameter used in nearly all pandas methods. But, now you have some added choices in version 0.21.
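The two spellings are interchangeable. A minimal check on a made-up three-column frame (column_b is added here only so that something remains after the drop):

```python
import pandas as pd

df = pd.DataFrame({"column_a": [1], "column_b": [2], "column_c": [3]})

with_columns = df.drop(columns=["column_a", "column_c"])  # 0.21+ keyword
with_axis = df.drop(["column_a", "column_c"], axis=1)     # axis form

print(list(with_columns.columns))      # ['column_b']
print(with_columns.equals(with_axis))  # True
```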


Answer 9

In pandas 0.16.1+ you can drop columns only if they exist per the solution posted by @eiTanLaVi. Prior to that version, you can achieve the same result via a conditional list comprehension:

df.drop([col for col in ['col_name_1','col_name_2',...,'col_name_N'] if col in df], 
        axis=1, inplace=True)

Answer 10

TL;DR

It takes a lot of effort to find a marginally more efficient solution, and it is difficult to justify the added complexity when it sacrifices the simplicity of df.drop(dlst, 1, errors='ignore').

df.reindex_axis(np.setdiff1d(df.columns.values, dlst), 1)

Preamble
Deleting a column is semantically the same as selecting the other columns. I’ll show a few additional methods to consider.

I’ll also focus on the general solution of deleting multiple columns at once and allowing for the attempt to delete columns not present.

These solutions are general and will work for the simple case as well.


Setup
Consider the pd.DataFrame df and the list of columns to delete, dlst:

df = pd.DataFrame(dict(zip('ABCDEFGHIJ', range(1, 11))), range(3))
dlst = list('HIJKLM')

df

   A  B  C  D  E  F  G  H  I   J
0  1  2  3  4  5  6  7  8  9  10
1  1  2  3  4  5  6  7  8  9  10
2  1  2  3  4  5  6  7  8  9  10

dlst

['H', 'I', 'J', 'K', 'L', 'M']

The result should look like:

df.drop(dlst, 1, errors='ignore')

   A  B  C  D  E  F  G
0  1  2  3  4  5  6  7
1  1  2  3  4  5  6  7
2  1  2  3  4  5  6  7

Since I’m equating deleting a column to selecting the other columns, I’ll break it into two types:

  1. Label selection
  2. Boolean selection

Label Selection

We start by manufacturing the list/array of labels that represents the columns we want to keep, excluding the columns we want to delete.

  1. df.columns.difference(dlst)

    Index(['A', 'B', 'C', 'D', 'E', 'F', 'G'], dtype='object')
    
  2. np.setdiff1d(df.columns.values, dlst)

    array(['A', 'B', 'C', 'D', 'E', 'F', 'G'], dtype=object)
    
  3. df.columns.drop(dlst, errors='ignore')

    Index(['A', 'B', 'C', 'D', 'E', 'F', 'G'], dtype='object')
    
  4. list(set(df.columns.values.tolist()).difference(dlst))

    # does not preserve order
    ['E', 'D', 'B', 'F', 'G', 'A', 'C']
    
  5. [x for x in df.columns.values.tolist() if x not in dlst]

    ['A', 'B', 'C', 'D', 'E', 'F', 'G']
    

Columns from Labels
For the sake of comparing the selection process, assume:

 cols = [x for x in df.columns.values.tolist() if x not in dlst]

Then we can evaluate

  1. df.loc[:, cols]
  2. df[cols]
  3. df.reindex(columns=cols)
  4. df.reindex_axis(cols, 1)

Which all evaluate to:

   A  B  C  D  E  F  G
0  1  2  3  4  5  6  7
1  1  2  3  4  5  6  7
2  1  2  3  4  5  6  7
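The label-based selections can be checked against drop directly. In this sketch, df and dlst follow the setup above; one assumption is that reindex(cols, axis=1) stands in for reindex_axis(cols, 1), since reindex_axis is not available in recent pandas releases.

```python
import pandas as pd

df = pd.DataFrame(dict(zip("ABCDEFGHIJ", range(1, 11))), index=range(3))
dlst = list("HIJKLM")

# Labels to keep, in their original order.
cols = [x for x in df.columns if x not in dlst]

expected = df.drop(dlst, axis=1, errors="ignore")
candidates = {
    "loc": df.loc[:, cols],
    "slice": df[cols],
    "reindex_columns": df.reindex(columns=cols),
    "reindex_axis1": df.reindex(cols, axis=1),
}

for name, frame in candidates.items():
    print(name, frame.equals(expected))
```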

Boolean Slice

We can construct an array/list of booleans for slicing

  1. ~df.columns.isin(dlst)
  2. ~np.in1d(df.columns.values, dlst)
  3. [x not in dlst for x in df.columns.values.tolist()]
  4. (df.columns.values[:, None] != dlst).all(1)

Columns from Boolean
For the sake of comparison

bools = [x not in dlst for x in df.columns.values.tolist()]
  1. df.loc[:, bools]

Which all evaluate to:

   A  B  C  D  E  F  G
0  1  2  3  4  5  6  7
1  1  2  3  4  5  6  7
2  1  2  3  4  5  6  7
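A sketch of the boolean route, again using the df and dlst from the setup above. It builds the mask two ways (Index.isin and a list comprehension) and confirms that both select the same columns as drop.

```python
import pandas as pd

df = pd.DataFrame(dict(zip("ABCDEFGHIJ", range(1, 11))), index=range(3))
dlst = list("HIJKLM")

mask = ~df.columns.isin(dlst)                 # NumPy boolean array
bools = [x not in dlst for x in df.columns]   # plain list of booleans

kept_mask = df.loc[:, mask]
kept_bools = df.loc[:, bools]

expected = df.drop(dlst, axis=1, errors="ignore")
print(kept_mask.equals(expected), kept_bools.equals(expected))
```

Note the comma in df.loc[:, mask]: the mask goes in the column position, with the bare colon selecting all rows.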

Robust Timing

Functions

setdiff1d = lambda df, dlst: np.setdiff1d(df.columns.values, dlst)
difference = lambda df, dlst: df.columns.difference(dlst)
columndrop = lambda df, dlst: df.columns.drop(dlst, errors='ignore')
setdifflst = lambda df, dlst: list(set(df.columns.values.tolist()).difference(dlst))
comprehension = lambda df, dlst: [x for x in df.columns.values.tolist() if x not in dlst]

loc = lambda df, cols: df.loc[:, cols]
slc = lambda df, cols: df[cols]
ridx = lambda df, cols: df.reindex(columns=cols)
ridxa = lambda df, cols: df.reindex_axis(cols, 1)

isin = lambda df, dlst: ~df.columns.isin(dlst)
in1d = lambda df, dlst: ~np.in1d(df.columns.values, dlst)
comp = lambda df, dlst: [x not in dlst for x in df.columns.values.tolist()]
brod = lambda df, dlst: (df.columns.values[:, None] != dlst).all(1)

Testing

res1 = pd.DataFrame(
    index=pd.MultiIndex.from_product([
        'loc slc ridx ridxa'.split(),
        'setdiff1d difference columndrop setdifflst comprehension'.split(),
    ], names=['Select', 'Label']),
    columns=[10, 30, 100, 300, 1000],
    dtype=float
)

res2 = pd.DataFrame(
    index=pd.MultiIndex.from_product([
        'loc'.split(),
        'isin in1d comp brod'.split(),
    ], names=['Select', 'Label']),
    columns=[10, 30, 100, 300, 1000],
    dtype=float
)

res = res1.append(res2).sort_index()

dres = pd.Series(index=res.columns, name='drop')

for j in res.columns:
    dlst = list(range(j))
    cols = list(range(j // 2, j + j // 2))
    d = pd.DataFrame(1, range(10), cols)
    dres.at[j] = timeit('d.drop(dlst, 1, errors="ignore")', 'from __main__ import d, dlst', number=100)
    for s, l in res.index:
        stmt = '{}(d, {}(d, dlst))'.format(s, l)
        setp = 'from __main__ import d, dlst, {}, {}'.format(s, l)
        res.at[(s, l), j] = timeit(stmt, setp, number=100)

rs = res / dres

rs

                          10        30        100       300        1000
Select Label                                                           
loc    brod           0.747373  0.861979  0.891144  1.284235   3.872157
       columndrop     1.193983  1.292843  1.396841  1.484429   1.335733
       comp           0.802036  0.732326  1.149397  3.473283  25.565922
       comprehension  1.463503  1.568395  1.866441  4.421639  26.552276
       difference     1.413010  1.460863  1.587594  1.568571   1.569735
       in1d           0.818502  0.844374  0.994093  1.042360   1.076255
       isin           1.008874  0.879706  1.021712  1.001119   0.964327
       setdiff1d      1.352828  1.274061  1.483380  1.459986   1.466575
       setdifflst     1.233332  1.444521  1.714199  1.797241   1.876425
ridx   columndrop     0.903013  0.832814  0.949234  0.976366   0.982888
       comprehension  0.777445  0.827151  1.108028  3.473164  25.528879
       difference     1.086859  1.081396  1.293132  1.173044   1.237613
       setdiff1d      0.946009  0.873169  0.900185  0.908194   1.036124
       setdifflst     0.732964  0.823218  0.819748  0.990315   1.050910
ridxa  columndrop     0.835254  0.774701  0.907105  0.908006   0.932754
       comprehension  0.697749  0.762556  1.215225  3.510226  25.041832
       difference     1.055099  1.010208  1.122005  1.119575   1.383065
       setdiff1d      0.760716  0.725386  0.849949  0.879425   0.946460
       setdifflst     0.710008  0.668108  0.778060  0.871766   0.939537
slc    columndrop     1.268191  1.521264  2.646687  1.919423   1.981091
       comprehension  0.856893  0.870365  1.290730  3.564219  26.208937
       difference     1.470095  1.747211  2.886581  2.254690   2.050536
       setdiff1d      1.098427  1.133476  1.466029  2.045965   3.123452
       setdifflst     0.833700  0.846652  1.013061  1.110352   1.287831

fig, axes = plt.subplots(2, 2, figsize=(8, 6), sharey=True)
for i, (n, g) in enumerate([(n, g.xs(n)) for n, g in rs.groupby('Select')]):
    ax = axes[i // 2, i % 2]
    g.plot.bar(ax=ax, title=n)
    ax.legend_.remove()
fig.tight_layout()

This is relative to the time it takes to run df.drop(dlst, 1, errors='ignore'). It seems like after all that effort, we only improve performance modestly.

In fact, the best solutions use reindex or reindex_axis on the hack list(set(df.columns.values.tolist()).difference(dlst)). A close second, and still very marginally better than drop, is np.setdiff1d.

rs.idxmin().pipe(
    lambda x: pd.DataFrame(
        dict(idx=x.values, val=rs.lookup(x.values, x.index)),
        x.index
    )
)

                      idx       val
10     (ridx, setdifflst)  0.653431
30    (ridxa, setdifflst)  0.746143
100   (ridxa, setdifflst)  0.816207
300    (ridx, setdifflst)  0.780157
1000  (ridxa, setdifflst)  0.861622

Answer 11

The dot syntax works for deleting object properties in JavaScript, but pandas does not support it for deleting columns in Python.

  • Python: del df['column_name']
  • JavaScript: delete df['column_name'] or delete df.column_name

Answer 12

If your original dataframe df is not too big, you have no memory constraints, and you only need to keep a few columns then you might as well create a new dataframe with only the columns you need:

new_df = df[['spam', 'sausage']]

Answer 13

We can remove a specified column, or several specified columns, with the drop() method.

Suppose df is a dataframe.

Column to be removed = column0

Code:

df = df.drop(column0, axis=1)

To remove multiple columns col1, col2, . . . , coln, put all the columns that need to be removed in a list, then remove them with the drop() method.

Code:

df = df.drop([col1, col2, . . . , coln], axis=1)

I hope this helps.


Answer 14

Another way of deleting a column in a pandas DataFrame

If you’re not looking for in-place deletion, you can create a new DataFrame by specifying the columns to keep with the DataFrame(...) constructor:

my_dict = { 'name' : ['a','b','c','d'], 'age' : [10,20,25,22], 'designation' : ['CEO', 'VP', 'MD', 'CEO']}

df = pd.DataFrame(my_dict)

Create a new DataFrame as

newdf = pd.DataFrame(df, columns=['name', 'age'])

You get a result as good as what you get with del / drop

