Question: How do I find numeric columns in Pandas?

Let’s say df is a pandas DataFrame. I would like to find all columns of numeric type. Something like:

isNumeric = is_numeric(df)

Answer 0

You can use the select_dtypes method of DataFrame. It takes two parameters, include and exclude. So isNumeric would look like:

numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']

newdf = df.select_dtypes(include=numerics)
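
One thing to watch with an explicit list like this: it only matches the dtypes it names, so int8 or unsigned integer columns are skipped. The following is a minimal sketch on a made-up DataFrame (the column names and values are assumptions for illustration, not part of the original answer) comparing the list against the broader 'number' selector:

```python
import numpy as np
import pandas as pd

# Hypothetical data: column names and values are for illustration only.
df = pd.DataFrame({
    "a": np.array([1, 2, 3], dtype="int8"),
    "b": np.array([1, 2, 3], dtype="uint16"),
    "c": np.array([0.5, 1.5, 2.5], dtype="float64"),
    "d": pd.Series(["x", "y", "z"], dtype=object),
})

numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']

# Only dtypes named in the list are matched, so int8 and uint16 are left out.
listed_only = df.select_dtypes(include=numerics).columns.tolist()

# 'number' matches every NumPy numeric dtype.
all_numeric = df.select_dtypes(include="number").columns.tolist()

print(listed_only)
print(all_numeric)
```

If the list approach is preferred, adding 'int8' and the unsigned types to it would be one way to close the gap.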

Answer 1

You can use the undocumented function _get_numeric_data() to filter only numeric columns:

df._get_numeric_data()

Example:

In [32]: data
Out[32]:
   A  B
0  1  s
1  2  s
2  3  s
3  4  s

In [33]: data._get_numeric_data()
Out[33]:
   A
0  1
1  2
2  3
3  4

Note that this is a “private method” (i.e., an implementation detail) and is subject to change or total removal in the future. Use with caution.


Answer 2

Simple one-line answer to create a new dataframe with only numeric columns:

df.select_dtypes(include=np.number)

If you want the names of numeric columns:

df.select_dtypes(include=np.number).columns.tolist()

Complete code:

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': range(7, 10),
                   'B': np.random.rand(3),
                   'C': ['foo','bar','baz'],
                   'D': ['who','what','when']})
df
#    A         B    C     D
# 0  7  0.704021  foo   who
# 1  8  0.264025  bar  what
# 2  9  0.230671  baz  when

df_numerics_only = df.select_dtypes(include=np.number)
df_numerics_only
#    A         B
# 0  7  0.704021
# 1  8  0.264025
# 2  9  0.230671

colnames_numerics_only = df.select_dtypes(include=np.number).columns.tolist()
colnames_numerics_only
# ['A', 'B']

Answer 3

df.select_dtypes(exclude=['object'])

Update

df.select_dtypes(include=np.number)
# or, with newer versions of pandas
df.select_dtypes('number')
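
The two forms above are not equivalent: excluding object keeps every non-object column, including booleans and datetimes, while 'number' keeps only numeric dtypes. A small sketch of the difference, using a made-up DataFrame whose column names are purely illustrative:

```python
import pandas as pd

# Hypothetical data: names and values are assumptions for illustration.
df = pd.DataFrame({
    "n": [1, 2, 3],
    "x": [0.1, 0.2, 0.3],
    "flag": [True, False, True],
    "when": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
    "s": pd.Series(["a", "b", "c"], dtype=object),
})

# Everything that is not object dtype: bool and datetime columns survive.
not_object = df.select_dtypes(exclude=["object"]).columns.tolist()

# Only numeric dtypes: bool and datetime columns are dropped.
numeric = df.select_dtypes(include="number").columns.tolist()

print(not_object)
print(numeric)
```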

Answer 4

Simple one-liner:

df.select_dtypes('number').columns

Answer 5

The following code returns a list of the names of the numeric columns of a data set.

cnames=list(marketing_train.select_dtypes(exclude=['object']).columns)

Here marketing_train is my data set, select_dtypes() is the method that selects columns by data type through its include and exclude arguments, and columns fetches the column names of the result. The output of the code above is:

['custAge',
     'campaign',
     'pdays',
     'previous',
     'emp.var.rate',
     'cons.price.idx',
     'cons.conf.idx',
     'euribor3m',
     'nr.employed',
     'pmonths',
     'pastEmail']

Thanks


Answer 6

This is another simple way to find the numeric columns in a pandas data frame (strictly speaking, it returns every column whose dtype is not object):

numeric_clmns = df.dtypes[df.dtypes != "object"].index 
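
A dtype-by-dtype check can also be written with pandas.api.types.is_numeric_dtype, which avoids treating datetime columns as numeric. The sketch below is only one possible approach; the helper name numeric_columns and the sample data are assumptions. As far as I know, is_numeric_dtype also counts boolean columns as numeric, so filter those out separately if that is not wanted.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def numeric_columns(df):
    """Return the names of columns whose dtype pandas considers numeric.

    Hypothetical helper for illustration; not part of pandas itself.
    """
    return [name for name, dtype in df.dtypes.items() if is_numeric_dtype(dtype)]


# Made-up sample data.
df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [0.5, 1.5, 2.5],
    "C": ["foo", "bar", "baz"],
    "D": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"]),
})

print(numeric_columns(df))
```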

Answer 7

import numpy as np
import pandas as pd

def is_type(df, baseType):
    # One boolean per column: is the column's scalar type a subclass of baseType?
    test = [issubclass(d.type, baseType) for d in df.dtypes]
    return pd.DataFrame(data=test, index=df.columns, columns=["test"])

def is_float(df):
    # np.float was removed in NumPy 1.24; np.floating is the abstract float type
    return is_type(df, np.floating)

def is_number(df):
    return is_type(df, np.number)

def is_integer(df):
    return is_type(df, np.integer)

Answer 8

Adapting this answer, you could do

df.loc[:, df.applymap(np.isreal).all(axis=0)]

Here, df.applymap(np.isreal) tests whether each cell in the data frame is numeric, and .all(axis=0) checks whether all values in a column are True, returning a Series of Booleans that can be used to index the desired columns.
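
A cell-by-cell test like this matters mostly for object columns that happen to hold only numbers, which a dtype-based selection will miss. The sketch below illustrates the same idea with a different test, isinstance(value, numbers.Number), applied through Series.map; this is a swapped-in variant rather than the np.isreal call above, and the data and the helper name all_cells_numeric are made up.

```python
import numbers

import pandas as pd

# Hypothetical data: an object column that holds only numbers,
# an object column of strings, and a regular integer column.
df = pd.DataFrame({
    "mixed_numbers": pd.Series([1, 2.5, 3], dtype=object),
    "text": pd.Series(["a", "b", "c"], dtype=object),
    "ints": [10, 20, 30],
})


def all_cells_numeric(col):
    """True if every value in the column is a number (illustrative helper)."""
    return col.map(lambda v: isinstance(v, numbers.Number)).all()


# One boolean per column, usable as a column mask.
mask = df.apply(all_cells_numeric).astype(bool)

by_value = df.loc[:, mask].columns.tolist()
by_dtype = df.select_dtypes(include="number").columns.tolist()

print(by_value)
print(by_dtype)
```

Checking every cell is much slower than checking dtypes, so it is probably only worth it when object columns with numeric content are expected.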


Answer 9

Please see the code below:

if dataset.select_dtypes(include=[np.number]).shape[1] > 0:
    display(dataset.select_dtypes(include=[np.number]).describe())
if dataset.select_dtypes(include=['object']).shape[1] > 0:
    display(dataset.select_dtypes(include=['object']).describe())

This way you can check whether the columns hold numeric values, such as float and int, or string values. The second if statement checks for string values, which are stored with the object dtype.


Answer 10

We can include and exclude data types as required:

train.select_dtypes(include=None, exclude=None)  # signature; at least one of include/exclude must be given
train.select_dtypes(include='number')  # selects all numeric types

Taken from the help text shown in Jupyter Notebook:

  • To select all numeric types, use np.number or 'number'

  • To select strings you must use the object dtype, but note that this will return all object dtype columns

  • See the NumPy dtype hierarchy: http://docs.scipy.org/doc/numpy/reference/arrays.scalars.html

  • To select datetimes, use np.datetime64, 'datetime' or 'datetime64'

  • To select timedeltas, use np.timedelta64, 'timedelta' or 'timedelta64'

  • To select Pandas categorical dtypes, use 'category'

  • To select Pandas datetimetz dtypes, use 'datetimetz' (new in 0.20.0) or 'datetime64[ns, tz]'
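
The non-numeric selectors in the list above can be tried the same way as 'number'. A short sketch, with a made-up DataFrame whose column names are assumptions for illustration:

```python
import pandas as pd

# Hypothetical data covering the datetime, timedelta and category selectors.
df = pd.DataFrame({
    "when": pd.to_datetime(["2020-01-01", "2020-06-01"]),
    "elapsed": pd.to_timedelta([1, 2], unit="D"),
    "grade": pd.Categorical(["a", "b"]),
    "score": [1.5, 2.5],
})

datetime_cols = df.select_dtypes(include="datetime").columns.tolist()
timedelta_cols = df.select_dtypes(include="timedelta").columns.tolist()
category_cols = df.select_dtypes(include="category").columns.tolist()

print(datetime_cols)
print(timedelta_cols)
print(category_cols)
```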

