Question: How do I find numeric columns in pandas?
Let's say df is a pandas DataFrame. I would like to find all columns of numeric type. Something like:
isNumeric = is_numeric(df)
Answer 0
You could use the select_dtypes method of DataFrame. It takes two parameters, include and exclude. So isNumeric would look like:
numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
newdf = df.select_dtypes(include=numerics)
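One caveat worth illustrating: the explicit list above leaves out int8 and the unsigned integer types, so columns with those dtypes are dropped. The sketch below compares that list with np.number, which select_dtypes also accepts. The DataFrame and its column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame covering several dtypes (names are placeholders).
df = pd.DataFrame({
    "tiny": np.array([1, 2, 3], dtype="int8"),
    "big": np.array([1, 2, 3], dtype="int64"),
    "unsigned": np.array([1, 2, 3], dtype="uint8"),
    "ratio": np.array([0.1, 0.2, 0.3], dtype="float32"),
    "label": pd.Series(["a", "b", "c"], dtype=object),
})

# The explicit list from the answer above.
numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
from_list = df.select_dtypes(include=numerics).columns.tolist()

# np.number is the abstract parent of all NumPy integer and float types.
from_abstract = df.select_dtypes(include=np.number).columns.tolist()

print(from_list)      # int8 and uint8 columns are missing here
print(from_abstract)  # all four numeric columns
```

If the data may contain small or unsigned integer types, np.number is probably the safer choice.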
Answer 1
You can use the undocumented function _get_numeric_data() to filter only numeric columns:
df._get_numeric_data()
Example:
In [32]: data
Out[32]:
   A  B
0  1  s
1  2  s
2  3  s
3  4  s
In [33]: data._get_numeric_data()
Out[33]:
   A
0  1
1  2
2  3
3  4
Note that this is a "private method" (i.e., an implementation detail) and is subject to change or total removal in the future. Use with caution.
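For readers who would rather avoid the private method, here is a minimal sketch built on the public pandas.api.types.is_numeric_dtype check. The helper name numeric_columns is hypothetical, not part of pandas, and the data mirrors the example above.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

def numeric_columns(df):
    """Return the names of columns whose dtype pandas considers numeric.

    Hypothetical helper for illustration; not a pandas function.
    """
    return [name for name, dtype in df.dtypes.items() if is_numeric_dtype(dtype)]

# Same shape of data as the example above.
data = pd.DataFrame({"A": [1, 2, 3, 4], "B": ["s", "s", "s", "s"]})

names = numeric_columns(data)
numeric_only = data[names]  # DataFrame restricted to the numeric columns
print(names)
```

Be aware that is_numeric_dtype also treats boolean columns as numeric, so the result can differ from select_dtypes(include='number') when bool columns are present.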
Answer 2
Simple one-line answer to create a new dataframe with only numeric columns:
df.select_dtypes(include=np.number)
If you want the names of numeric columns:
df.select_dtypes(include=np.number).columns.tolist()
Complete code:
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': range(7, 10),
                   'B': np.random.rand(3),
                   'C': ['foo','bar','baz'],
                   'D': ['who','what','when']})
df
#    A         B    C     D
# 0  7  0.704021  foo   who
# 1  8  0.264025  bar  what
# 2  9  0.230671  baz  when
df_numerics_only = df.select_dtypes(include=np.number)
df_numerics_only
#    A         B
# 0  7  0.704021
# 1  8  0.264025
# 2  9  0.230671
colnames_numerics_only = df.select_dtypes(include=np.number).columns.tolist()
colnames_numerics_only
# ['A', 'B']
Answer 3
df.select_dtypes(exclude=['object'])
Update:
df.select_dtypes(include=np.number)
# or, with newer versions of pandas
df.select_dtypes('number')
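The two forms above are not equivalent. Excluding object keeps every other dtype, including booleans and datetimes, while including 'number' keeps only numeric columns. A small sketch with an invented DataFrame (column names are placeholders) shows the difference:

```python
import pandas as pd

# Hypothetical frame with one column per kind of dtype.
df = pd.DataFrame({
    "n": [1, 2, 3],
    "flag": [True, False, True],
    "when": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
    "label": pd.Series(["x", "y", "z"], dtype=object),
})

# Drops only the object column; bool and datetime columns survive.
not_object = df.select_dtypes(exclude=["object"]).columns.tolist()

# Keeps only the integer column.
only_numbers = df.select_dtypes(include="number").columns.tolist()

print(not_object)
print(only_numbers)
```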
Answer 4
Simple one-liner:
df.select_dtypes('number').columns
Answer 5
The following code returns a list of the names of the numeric columns of a data set:
cnames = list(marketing_train.select_dtypes(exclude=['object']).columns)
Here marketing_train is my data set, select_dtypes() is the method that selects columns by data type using its exclude and include arguments, and columns fetches the column names of the result. The output of the above code is:
['custAge',
 'campaign',
 'pdays',
 'previous',
 'emp.var.rate',
 'cons.price.idx',
 'cons.conf.idx',
 'euribor3m',
 'nr.employed',
 'pmonths',
 'pastEmail']
Thanks
Answer 6
Here is another simple way to find the numeric columns in a pandas data frame:
numeric_clmns = df.dtypes[df.dtypes != "object"].index
Answer 7
import numpy as np
import pandas as pd

def is_type(df, baseType):
    # One row per column: True if the column's NumPy scalar type derives from baseType
    test = [issubclass(np.dtype(d).type, baseType) for d in df.dtypes]
    return pd.DataFrame(data=test, index=df.columns, columns=["test"])

def is_float(df):
    # np.floating is the abstract base of all NumPy float types
    # (the np.float alias was removed in NumPy 1.24)
    return is_type(df, np.floating)

def is_number(df):
    return is_type(df, np.number)

def is_integer(df):
    return is_type(df, np.integer)
Answer 8
Adapting this answer, you could do
df.loc[:, df.applymap(np.isreal).all(axis=0)]
Here, df.applymap(np.isreal) shows whether each cell in the data frame is numeric, and .all(axis=0) checks whether all values in a column are True, returning a Series of Booleans that can be used to index the desired columns. Note that applymap is deprecated in favor of DataFrame.map as of pandas 2.1.
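The same cell-by-cell idea can be sketched without np.isreal, by testing each value against Python's numbers.Number. This is a swapped-in technique rather than the code from the answer, and the helper name columns_with_only_numeric_cells is hypothetical. Unlike a dtype check, it also catches object columns that happen to hold only numbers.

```python
import numbers
import pandas as pd

def columns_with_only_numeric_cells(df):
    """Return names of columns in which every cell is a numbers.Number.

    Hypothetical helper for illustration; checks values, not dtypes.
    """
    keep = []
    for name, column in df.items():
        if all(isinstance(value, numbers.Number) for value in column):
            keep.append(name)
    return keep

# Hypothetical frame: "mixed" has object dtype but holds only numbers.
df = pd.DataFrame({
    "ints": [1, 2, 3],
    "mixed": pd.Series([1, 2.5, 3], dtype=object),
    "text": pd.Series(["a", "b", "c"], dtype=object),
})

selected = columns_with_only_numeric_cells(df)
numeric_view = df.loc[:, selected]
print(selected)
```

Checking every cell is much slower than a dtype check on large frames, so this is mainly useful for messy object columns.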
Answer 9
Please see the code below:
# display() is available in IPython/Jupyter
if dataset.select_dtypes(include=[np.number]).shape[1] > 0:
    display(dataset.select_dtypes(include=[np.number]).describe())
# use 'object' here; the np.object alias was removed in NumPy 1.24
if dataset.select_dtypes(include=['object']).shape[1] > 0:
    display(dataset.select_dtypes(include=['object']).describe())
This way you can describe the numeric columns (such as float and int) and the string columns separately. The second if statement handles the string values, which are stored with the object dtype.
Answer 10
We can include and exclude data types as required:
train.select_dtypes(include=None, exclude=None)
train.select_dtypes(include='number')  # will include all the numeric types
Notes from the docstring, as displayed in Jupyter Notebook:
To select all numeric types, use np.number or 'number'
To select strings you must use the object dtype, but note that this will return all object dtype columns
See the NumPy dtype hierarchy <http://docs.scipy.org/doc/numpy/reference/arrays.scalars.html>
To select datetimes, use np.datetime64, 'datetime' or 'datetime64'
To select timedeltas, use np.timedelta64, 'timedelta' or 'timedelta64'
To select Pandas categorical dtypes, use 'category'
To select Pandas datetimetz dtypes, use 'datetimetz' (new in 0.20.0) or 'datetime64[ns, tz]'
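As a rough illustration of the non-numeric selectors listed above, the sketch below builds a small made-up DataFrame (column names are placeholders) and picks out the datetime, timedelta, and categorical columns by their string aliases:

```python
import pandas as pd

# Hypothetical frame with one column for each selector being shown.
df = pd.DataFrame({
    "when": pd.to_datetime(["2021-01-01", "2021-01-02"]),
    "elapsed": pd.to_timedelta(["1 days", "2 days"]),
    "grade": pd.Categorical(["a", "b"]),
    "score": [1.5, 2.5],
})

datetime_cols = df.select_dtypes(include="datetime").columns.tolist()
timedelta_cols = df.select_dtypes(include="timedelta").columns.tolist()
category_cols = df.select_dtypes(include="category").columns.tolist()

print(datetime_cols, timedelta_cols, category_cols)
```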