Question: Converting strings to float in a DataFrame
How do I convert a DataFrame column containing strings and NaN values to floats?
There is also another column whose values are a mix of strings and floats; how do I convert that entire column to floats?
Answer 0
NOTE: convert_objects is now deprecated (it was removed in pandas 1.0). Use pd.Series.astype(float) or pd.to_numeric instead, as described in the other answers.
convert_objects is available from pandas 0.11. With convert_numeric=True it forces the conversion, setting values it cannot parse to NaN, so it works even where astype would fail. It also works series by series (column by column), so it will not convert, say, a column made up entirely of non-numeric strings.
In [9]: from pandas import DataFrame, Series
In [10]: df = DataFrame(dict(A = Series(['1.0','1']), B = Series(['1.0','foo'])))
In [11]: df
Out[11]:
     A    B
0  1.0  1.0
1    1  foo
In [12]: df.dtypes
Out[12]:
A    object
B    object
dtype: object
In [13]: df.convert_objects(convert_numeric=True)
Out[13]:
   A    B
0  1    1
1  1  NaN
In [14]: df.convert_objects(convert_numeric=True).dtypes
Out[14]:
A    float64
B    float64
dtype: object
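Since convert_objects is no longer available in current pandas, here is a minimal sketch of how the same result might be obtained today. It assumes that applying pd.to_numeric to each column with errors="coerce" is an acceptable stand-in; the variable name converted is only illustrative.

```python
import pandas as pd

# Same toy frame as above: every value starts out as a string.
df = pd.DataFrame({"A": ["1.0", "1"], "B": ["1.0", "foo"]})

# Apply pd.to_numeric column by column. errors="coerce" turns entries
# that cannot be parsed (such as "foo") into NaN instead of raising.
converted = df.apply(pd.to_numeric, errors="coerce")

print(converted)
print(converted.dtypes)
```

Note that, unlike the behaviour described for convert_objects, this approach turns a column of purely non-numeric strings into all NaN, so it is best applied only to the columns that are meant to be numeric.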
Answer 1
You can try df.column_name = df.column_name.astype(float). NaN values pass through the conversion as NaN; if you want them replaced by a number, you have to decide what that number should be, and the .fillna method will do the replacement.
Example:
In [12]: df
Out[12]:
     a    b
0  0.1  0.2
1  NaN  0.3
2  0.4  0.5
In [13]: df.a.values
Out[13]: array(['0.1', nan, '0.4'], dtype=object)
In [14]: df.a = df.a.astype(float).fillna(0.0)
In [15]: df
Out[15]:
     a    b
0  0.1  0.2
1  0.0  0.3
2  0.4  0.5
In [16]: df.a.values
Out[16]: array([ 0.1, 0. , 0.4])
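The session above can be reproduced as a standalone script. This is a sketch under the assumption that the column holds numeric strings plus NaN; the names as_float, filled and failed are illustrative, and the last part shows the limitation that astype(float) raises on text that is not a number.

```python
import numpy as np
import pandas as pd

# Object column mixing numeric strings and a missing value.
a = pd.Series(["0.1", np.nan, "0.4"], dtype=object)

# astype(float) parses the strings; NaN stays NaN.
as_float = a.astype(float)

# Replace the missing values afterwards, here with 0.0.
filled = as_float.fillna(0.0)
print(filled.tolist())

# astype(float) raises ValueError when a value is not a number.
try:
    pd.Series(["0.1", "foo"], dtype=object).astype(float)
    failed = False
except ValueError:
    failed = True
print(failed)
```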
Answer 2
In newer versions of pandas (0.17 and up), you can use the to_numeric function. It converts a single column (a Series) directly; to convert a whole dataframe, apply it column by column with df.apply(pd.to_numeric). Its errors argument lets you choose how to treat values that can't be converted to numbers:
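import pandas as pd
# every value parses cleanly, so the result is float64
s = pd.Series(['1.0', '2', -3])
pd.to_numeric(s)
# 'apple' cannot be parsed; the errors argument decides what happens
s = pd.Series(['apple', '1.0', '2', -3])
pd.to_numeric(s, errors='ignore')   # returns the input unchanged
pd.to_numeric(s, errors='coerce')   # 'apple' becomes NaN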
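To make the effect of the errors argument concrete, here is a small sketch comparing the default (errors="raise") with errors="coerce". It leaves out errors="ignore" on purpose, since to my knowledge recent pandas releases deprecate that option; the names raised and coerced are illustrative.

```python
import pandas as pd

s = pd.Series(["apple", "1.0", "2", -3], dtype=object)

# Default behaviour (errors="raise"): an unparseable value raises ValueError.
try:
    pd.to_numeric(s)
    raised = False
except ValueError:
    raised = True
print(raised)

# errors="coerce": unparseable values become NaN, the rest become floats.
coerced = pd.to_numeric(s, errors="coerce")
print(coerced.tolist())  # [nan, 1.0, 2.0, -3.0]
```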
Answer 3
df['MyColumnName'] = df['MyColumnName'].astype('float64')
Answer 4
You have to replace empty strings ('') with np.nan before converting to float, because astype(float) cannot parse an empty string. I.e.:
df['a'] = df.a.replace('', np.nan).astype(float)
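A self-contained version of that one-liner, as a sketch: the sample values are made up for illustration, and the column name a follows the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical column where a blank string stands for a missing value.
df = pd.DataFrame({"a": pd.Series(["1.5", "", "3"], dtype=object)})

# float("") raises ValueError, so turn blanks into NaN first,
# then convert the remaining numeric strings.
df["a"] = df["a"].replace("", np.nan).astype(float)

print(df["a"].tolist())
print(df["a"].dtype)
```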
Answer 5
Here is an example:
                                      GHI  Temp  Power  Day_Type
2016-03-15 06:00:00  -7.99999952505459e-7  18.3      0       NaN
2016-03-15 06:01:00  -7.99999952505459e-7  18.2      0       NaN
2016-03-15 06:02:00  -7.99999952505459e-7  18.3      0       NaN
2016-03-15 06:03:00  -7.99999952505459e-7  18.3      0       NaN
2016-03-15 06:04:00  -7.99999952505459e-7  18.3      0       NaN
If these are all string values, as they were in my case, convert the desired columns to floats:
df_inv_29['GHI'] = df_inv_29.GHI.astype(float)
df_inv_29['Temp'] = df_inv_29.Temp.astype(float)
df_inv_29['Power'] = df_inv_29.Power.astype(float)
Your dataframe will now have float values :-)
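The three assignments can also be written as a single call by passing astype a mapping from column name to dtype. The sketch below rebuilds two rows of the example frame with string values (the Day_Type column is left out for brevity); the list name cols is illustrative.

```python
import pandas as pd

# Two rows of the example data, stored as strings.
df_inv_29 = pd.DataFrame(
    {
        "GHI": ["-7.99999952505459e-7", "-7.99999952505459e-7"],
        "Temp": ["18.3", "18.2"],
        "Power": ["0", "0"],
    },
    index=pd.to_datetime(["2016-03-15 06:00:00", "2016-03-15 06:01:00"]),
)

# Convert several columns in one call with a column -> dtype mapping.
cols = ["GHI", "Temp", "Power"]
df_inv_29 = df_inv_29.astype({col: float for col in cols})

print(df_inv_29.dtypes)
```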