熊猫中的dtype（’O’）是什么？

Question 1

I have a dataframe in pandas and I’m trying to figure out what the types of its values are. I am unsure what the type is of column 'Test'. However, when I run myFrame['Test'].dtype, I get;

dtype('O')

What does this mean?

Question 2

It means:

'O'     (Python) objects

Source.

The first character specifies the kind of data and the remaining characters specify the number of bytes per item, except for Unicode, where it is interpreted as the number of characters. The item size must correspond to an existing type, or an error will be raised. The supported kinds are to an existing type, or an error will be raised. The supported kinds are:

'b'       boolean
'i'       (signed) integer
'u'       unsigned integer
'f'       floating-point
'c'       complex-floating point
'O'       (Python) objects
'S', 'a'  (byte-)string
'U'       Unicode
'V'       raw data (void)

Another answer helps if need check types.

Question 3

When you see `dtype('O')` inside dataframe this means Pandas string.

What is dtype?

Something that belongs to pandas or numpy, or both, or something else? If we examine pandas code:

df = pd.DataFrame({'float': [1.0],
                    'int': [1],
                    'datetime': [pd.Timestamp('20180310')],
                    'string': ['foo']})
print(df)
print(df['float'].dtype,df['int'].dtype,df['datetime'].dtype,df['string'].dtype)
df['string'].dtype

It will output like this:

   float  int   datetime string    
0    1.0    1 2018-03-10    foo
---
float64 int64 datetime64[ns] object
---
dtype('O')

You can interpret the last as Pandas dtype('O') or Pandas object which is Python type string, and this corresponds to Numpy string_, or unicode_ types.

Pandas dtype    Python type     NumPy type          Usage
object          str             string_, unicode_   Text

Like Don Quixote is on ass, Pandas is on Numpy and Numpy understand the underlying architecture of your system and uses the class numpy.dtype for that.

Data type object is an instance of numpy.dtype class that understand the data type more precise including:

Type of the data (integer, float, Python object, etc.)
Size of the data (how many bytes is in e.g. the integer)
Byte order of the data (little-endian or big-endian)
If the data type is structured, an aggregate of other data types, (e.g., describing an array item consisting of an integer and a float)
What are the names of the “fields” of the structure
What is the data-type of each field
Which part of the memory block each field takes
If the data type is a sub-array, what is its shape and data type

In the context of this question dtype belongs to both pands and numpy and in particular dtype('O') means we expect the string.

Here is some code for testing with explanation: If we have the dataset as dictionary

import pandas as pd
import numpy as np
from pandas import Timestamp

data={'id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5}, 'date': {0: Timestamp('2018-12-12 00:00:00'), 1: Timestamp('2018-12-12 00:00:00'), 2: Timestamp('2018-12-12 00:00:00'), 3: Timestamp('2018-12-12 00:00:00'), 4: Timestamp('2018-12-12 00:00:00')}, 'role': {0: 'Support', 1: 'Marketing', 2: 'Business Development', 3: 'Sales', 4: 'Engineering'}, 'num': {0: 123, 1: 234, 2: 345, 3: 456, 4: 567}, 'fnum': {0: 3.14, 1: 2.14, 2: -0.14, 3: 41.3, 4: 3.14}}
df = pd.DataFrame.from_dict(data) #now we have a dataframe

print(df)
print(df.dtypes)

The last lines will examine the dataframe and note the output:

   id       date                  role  num   fnum
0   1 2018-12-12               Support  123   3.14
1   2 2018-12-12             Marketing  234   2.14
2   3 2018-12-12  Business Development  345  -0.14
3   4 2018-12-12                 Sales  456  41.30
4   5 2018-12-12           Engineering  567   3.14
id               int64
date    datetime64[ns]
role            object
num              int64
fnum           float64
dtype: object

All kind of different dtypes

df.iloc[1,:] = np.nan
df.iloc[2,:] = None

But if we try to set np.nan or None this will not affect the original column dtype. The output will be like this:

print(df)
print(df.dtypes)

    id       date         role    num   fnum
0  1.0 2018-12-12      Support  123.0   3.14
1  NaN        NaT          NaN    NaN    NaN
2  NaN        NaT         None    NaN    NaN
3  4.0 2018-12-12        Sales  456.0  41.30
4  5.0 2018-12-12  Engineering  567.0   3.14
id             float64
date    datetime64[ns]
role            object
num            float64
fnum           float64
dtype: object

So np.nan or None will not change the columns dtype, unless we set the all column rows to np.nan or None. In that case column will become float64 or object respectively.

You may try also setting single rows:

df.iloc[3,:] = 0 # will convert datetime to object only
df.iloc[4,:] = '' # will convert all columns to object

And to note here, if we set string inside a non string column it will become string or object dtype.

Question 4

It means “a python object”, i.e. not one of the builtin scalar types supported by numpy.

np.array([object()]).dtype
=> dtype('O')

Question 5

‘O’ stands for object.

#Loading a csv file as a dataframe
import pandas as pd 
train_df = pd.read_csv('train.csv')
col_name = 'Name of Employee'

#Checking the datatype of column name
train_df[col_name].dtype

#Instead try printing the same thing
print train_df[col_name].dtype

The first line returns: dtype('O')

The line with the print statement returns the following: object

熊猫中的dtype（’O’）是什么？

问题：熊猫中的dtype（’O’）是什么？

回答 0

回答 1

当您`dtype('O')`在数据框内看到这意味着熊猫字符串。

When you see `dtype('O')` inside dataframe this means Pandas string.

回答 2

回答 3

排行榜展示

Python 情人节超强技能导出微信聊天记录生成词云

你不得不知道的python超级文献批量搜索下载工具

7行代码 Python热力图可视化分析缺失数据处理

Python 流程图 — 一键转化代码为流程图

Python 优化—算出每条语句执行时间

你的10W块放哪里能赚最多钱？

文章展示

将多个功能应用于多个groupby列

如何模拟导入

将matplotlib图例移到轴外使其被图框切断

找不到pg_config可执行文件

DistutilsOptionError：必须提供home或prefix / exec-prefix —不能同时提供

如何可靠地在与Python脚本相同的目录中打开文件

熊猫中的dtype（’O’）是什么？

问题：熊猫中的dtype（’O’）是什么？

回答 0

回答 1

当您dtype('O')在数据框内看到这意味着熊猫字符串。

When you see dtype('O') inside dataframe this means Pandas string.

回答 2

回答 3

相关文章

排行榜展示

文章展示

当您`dtype('O')`在数据框内看到这意味着熊猫字符串。

When you see `dtype('O')` inside dataframe this means Pandas string.