Question: python dataframe pandas drop column using int
I understand that to drop a column you use df.drop('column name', axis=1). Is there a way to drop a column using a numerical index instead of the column name?
Answer 0
You can drop the column at position i like this:
df.drop(df.columns[i], axis=1)
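This can behave unexpectedly if the DataFrame has duplicate column names, because drop works on labels and removes every column with that name. To avoid this, you can first give the column you want to delete a new, unique name. Or you can reassign the DataFrame, selecting by position: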
df = df.iloc[:, [j for j, c in enumerate(df.columns) if j != i]]
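To make the difference concrete, here is a minimal sketch comparing the two approaches on a frame with a duplicated column name. The sample frame and the names by_label, keep and by_position are made up for illustration.

```python
import pandas as pd

# Hypothetical example frame with a duplicated column name "x".
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["x", "x", "y"])

i = 0  # zero-based position of the column to drop

# Label-based drop: df.columns[i] is the label "x", so every "x" column goes.
by_label = df.drop(df.columns[i], axis=1)

# Position-based selection: keep every position except i.
keep = [j for j in range(df.shape[1]) if j != i]
by_position = df.iloc[:, keep]

print(list(by_label.columns))     # ['y']
print(list(by_position.columns))  # ['x', 'y']
```

With unique column names the two approaches give the same result, so the iloc version only matters when names repeat.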
Answer 1
Drop multiple columns like this:
cols = [1,2,4,5,12]
df.drop(df.columns[cols],axis=1,inplace=True)
inplace=True makes the change in the DataFrame itself instead of returning a modified copy. If you need to keep your original intact, use:
df_after_dropping = df.drop(df.columns[cols],axis=1)
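A short sketch of the difference between the two calls. The sample frame and the names df_inplace and result are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical example frame with four uniquely named columns.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6], "d": [7, 8]})
cols = [1, 3]  # zero-based positions of the columns to drop

# Without inplace=True, drop returns a new frame and leaves df untouched.
df_after_dropping = df.drop(df.columns[cols], axis=1)

# With inplace=True, the frame itself is modified and the call returns None.
df_inplace = df.copy()
result = df_inplace.drop(df_inplace.columns[cols], axis=1, inplace=True)

print(list(df.columns))                 # ['a', 'b', 'c', 'd']
print(list(df_after_dropping.columns))  # ['a', 'c']
print(result)                           # None
```

Because the in-place call returns None, writing df = df.drop(..., inplace=True) would replace the frame with None.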
Answer 2
If several columns share the same name, the solutions given here so far will remove all of them, which may not be what you want, for example when you are trying to remove duplicate columns but keep one instance. The example below illustrates this:
import pandas as pd

# make a df with duplicate columns 'x'
# (built from rows, because a dict literal cannot hold the key 'x' twice)
df = pd.DataFrame([[i, i, i + 6] for i in range(5)], columns=['x', 'x', 'y'])
df
Out[495]:
   x  x   y
0  0  0   6
1  1  1   7
2  2  2   8
3  3  3   9
4  4  4  10
# attempting to drop the first column according to the solution offered so far
df.drop(df.columns[0], axis = 1)
    y
0   6
1   7
2   8
3   9
4  10
As you can see, both 'x' columns were dropped. An alternative solution:
column_numbers = list(range(df.shape[1]))  # list of the columns' integer indices
column_numbers.remove(0)                   # remove column integer index 0
df.iloc[:, column_numbers]                 # return all columns except the 0th column
   x   y
0  0   6
1  1   7
2  2   8
3  3   9
4  4  10
As you can see, this truly removed only the 0th column (first ‘x’).
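If you need this often, the same idea can be wrapped in a small helper. This is only a sketch: drop_by_position is a hypothetical name, not a pandas function, and it swaps the list-and-remove step for a NumPy boolean mask passed to iloc.

```python
import numpy as np
import pandas as pd


def drop_by_position(frame, positions):
    """Return frame without the columns at the given zero-based positions.

    Hypothetical helper for illustration; it never looks at column labels,
    so duplicate names are not a problem.
    """
    mask = np.ones(frame.shape[1], dtype=bool)  # True = keep the column
    mask[list(positions)] = False               # mark positions to drop
    return frame.iloc[:, mask]


# Example frame with the duplicated column name 'x'.
df = pd.DataFrame([[i, i, i + 6] for i in range(5)], columns=["x", "x", "y"])

out = drop_by_position(df, [0])
print(list(out.columns))  # ['x', 'y']
```

The original df is left unchanged; assign the result back if you want to replace it.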
Answer 3
You need to identify the columns by their position in the DataFrame. Positions are zero-based, so to drop the columns at positions 2, 3 and 5 (the third, fourth and sixth columns), use:
df.drop(df.columns[[2,3,5]], axis = 1)
Answer 4
If you have two columns with the same name, one simple way is to manually rename the columns like this:
df.columns = ['column1', 'column2', 'column3']
Then you can drop by column index, as you requested, like this:
df.drop(df.columns[1], axis=1, inplace=True)
Here df.columns[1] is the label of the column at index 1, so that is the column that gets dropped.
Remember axis 1 = columns and axis 0 = rows.
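If typing out every name by hand is impractical, one possible variation is to generate unique names by appending each column's position. The name_position naming scheme below is an assumption for illustration, not something pandas does for you.

```python
import pandas as pd

# Hypothetical example frame with a duplicated column name "x".
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["x", "x", "y"])

# Make every label unique: "x", "x", "y" becomes "x_0", "x_1", "y_2".
df.columns = [f"{name}_{pos}" for pos, name in enumerate(df.columns)]

# Labels are unique now, so dropping by df.columns[1] removes one column only.
df.drop(df.columns[1], axis=1, inplace=True)

print(list(df.columns))  # ['x_0', 'y_2']
```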
Answer 5
If you really want to do it with integers (but why?), then you could build a dictionary that maps positions to column names.
col_dict = {x: col for x, col in enumerate(df.columns)}
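Then df = df.drop(col_dict[0], axis=1) will work as desired. (The axis must be passed by keyword; recent pandas versions no longer accept it as a second positional argument.)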
Edit: you can put it in a function that does this for you, though this way the dictionary is rebuilt every time you call it:
def drop_col_n(df, col_n_to_drop):
    # map each integer position to its column label
    col_dict = {x: col for x, col in enumerate(df.columns)}
    return df.drop(col_dict[col_n_to_drop], axis=1)

df = drop_col_n(df, 2)
Answer 6
You can use the following line to drop the first two columns (or any column you don’t need):
df.drop([df.columns[0], df.columns[1]], axis=1)
Answer 7
Since several columns can share the same name, we should first rename the columns to their integer positions. Here is the code for this solution:
df.columns = list(range(len(df.columns)))  # rename columns to 0, 1, 2, ...
df.drop(columns=[1, 2])  # drop second and third columns
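Note that this overwrites the original column names. If you want them back afterwards, one option is to save them first and map the surviving positions back. This is only a sketch; the sample frame and the names original_names and dropped are illustrative.

```python
import pandas as pd

# Hypothetical example frame with a duplicated column name "b".
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=["a", "b", "b", "c"])

# Remember the original labels, then rename the columns to 0, 1, 2, ...
original_names = list(df.columns)
df.columns = list(range(len(df.columns)))

# Drop the second and third columns by their (now unique) integer labels.
dropped = df.drop(columns=[1, 2])

# Map the surviving integer labels back to the original names.
dropped.columns = [original_names[pos] for pos in dropped.columns]

print(list(dropped.columns))  # ['a', 'c']
```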