如何将熊猫系列或索引转换为Numpy数组？

Question 1

Do you know how to get the index or column of a DataFrame as a NumPy array or python list?

Question 2

To get a NumPy array, you should use the values attribute:

In [1]: df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['a', 'b', 'c']); df
   A  B
a  1  4
b  2  5
c  3  6

In [2]: df.index.values
Out[2]: array(['a', 'b', 'c'], dtype=object)

This accesses how the data is already stored, so there’s no need for a conversion.
Note: This attribute is also available for many other pandas’ objects.

In [3]: df['A'].values
Out[3]: Out[16]: array([1, 2, 3])

To get the index as a list, call tolist:

In [4]: df.index.tolist()
Out[4]: ['a', 'b', 'c']

And similarly, for columns.

Question 3

You can use df.index to access the index object and then get the values in a list using df.index.tolist(). Similarly, you can use df['col'].tolist() for Series.

Question 4

pandas >= 0.24

Deprecate your usage of `.values` in favour of these methods!

From v0.24.0 onwards, we will have two brand spanking new, preferred methods for obtaining NumPy arrays from Index, Series, and DataFrame objects: they are to_numpy(), and .array. Regarding usage, the docs mention:

We haven’t removed or deprecated Series.values or DataFrame.values, but we highly recommend and using .array or .to_numpy() instead.

See this section of the v0.24.0 release notes for more information.

to_numpy() Method

df.index.to_numpy()
# array(['a', 'b'], dtype=object)

df['A'].to_numpy()
#  array([1, 4])

By default, a view is returned. Any modifications made will affect the original.

v = df.index.to_numpy()
v[0] = -1

df
    A  B
-1  1  2
b   4  5

If you need a copy instead, use to_numpy(copy=True);

v = df.index.to_numpy(copy=True)
v[-1] = -123

df
   A  B
a  1  2
b  4  5

Note that this function also works for DataFrames (while .array does not).

array Attribute
This attribute returns an ExtensionArray object that backs the Index/Series.

pd.__version__
# '0.24.0rc1'

# Setup.
df = pd.DataFrame([[1, 2], [4, 5]], columns=['A', 'B'], index=['a', 'b'])
df

   A  B
a  1  2
b  4  5

df.index.array    
# <PandasArray>
# ['a', 'b']
# Length: 2, dtype: object

df['A'].array
# <PandasArray>
# [1, 4]
# Length: 2, dtype: int64

From here, it is possible to get a list using list:

list(df.index.array)
# ['a', 'b']

list(df['A'].array)
# [1, 4]

or, just directly call .tolist():

df.index.tolist()
# ['a', 'b']

df['A'].tolist()
# [1, 4]

Regarding what is returned, the docs mention,

For Series and Indexes backed by normal NumPy arrays, Series.array will return a new arrays.PandasArray, which is a thin (no-copy) wrapper around a numpy.ndarray. arrays.PandasArray isn’t especially useful on its own, but it does provide the same interface as any extension array defined in pandas or by a third-party library.

So, to summarise, .array will return either

The existing ExtensionArray backing the Index/Series, or
If there is a NumPy array backing the series, a new ExtensionArray object is created as a thin wrapper over the underlying array.

Rationale for adding TWO new methods
These functions were added as a result of discussions under two GitHub issues GH19954 and GH23623.

Specifically, the docs mention the rationale:

[…] with .values it was unclear whether the returned value would be the actual array, some transformation of it, or one of pandas custom arrays (like Categorical). For example, with PeriodIndex, .values generates a new ndarray of period objects each time. […]

These two functions aim to improve the consistency of the API, which is a major step in the right direction.

Lastly, .values will not be deprecated in the current version, but I expect this may happen at some point in the future, so I would urge users to migrate towards the newer API, as soon as you can.

Question 5

If you are dealing with a multi-index dataframe, you may be interested in extracting only the column of one name of the multi-index. You can do this as

df.index.get_level_values('name_sub_index')

and of course name_sub_index must be an element of the FrozenList df.index.names

Question 6

Since pandas v0.13 you can also use get_values:

df.index.get_values()

Question 7

I converted the pandas dataframe to list and then used the basic list.index(). Something like this:

dd = list(zone[0]) #Where zone[0] is some specific column of the table
idx = dd.index(filename[i])

You have you index value as idx.

Question 8

A more recent way to do this is to use the .to_numpy() function.

If I have a dataframe with a column ‘price’, I can convert it as follows:

priceArray = df['price'].to_numpy()

You can also pass the data type, such as float or object, as an argument of the function

Question 9

Below is a simple way to convert dataframe column into numpy array.

df = pd.DataFrame(somedict) 
ytrain = df['label']
ytrain_numpy = np.array([x for x in ytrain['label']])

ytrain_numpy is a numpy array.

I tried with to.numpy() but it gave me the below error: TypeError: no supported conversion for types: (dtype(‘O’),) while doing Binary Relevance classfication using Linear SVC. to.numpy() was converting the dataFrame into numpy array but the inner element’s data type was list because of which the above error was observed.

如何将熊猫系列或索引转换为Numpy数组？

问题：如何将熊猫系列或索引转换为Numpy数组？

回答 0

回答 1

回答 2

熊猫> = 0.24

`.values`不赞成使用这些方法，而推荐使用这些方法！

pandas >= 0.24

Deprecate your usage of `.values` in favour of these methods!

回答 3

回答 4

回答 5

回答 6

回答 7

排行榜展示

Python 情人节超强技能导出微信聊天记录生成词云

你不得不知道的python超级文献批量搜索下载工具

7行代码 Python热力图可视化分析缺失数据处理

Python 流程图 — 一键转化代码为流程图

Python 优化—算出每条语句执行时间

你的10W块放哪里能赚最多钱？

文章展示

使用Python的re.compile是否值得？

“ yield”关键字有什么作用？

Tensorflow后端的Keras能否被迫随意使用CPU或GPU？

如何验证一个列表是否是另一个列表的子集？

确定整数是否在其他两个整数之间？

如何基于Paddle训练一个98%准确率的抑郁文本预测模型

如何将熊猫系列或索引转换为Numpy数组？

问题：如何将熊猫系列或索引转换为Numpy数组？

回答 0

回答 1

回答 2

熊猫> = 0.24

.values不赞成使用这些方法，而推荐使用这些方法！

pandas >= 0.24

Deprecate your usage of .values in favour of these methods!

回答 3

回答 4

回答 5

回答 6

回答 7

相关文章

排行榜展示

文章展示

`.values`不赞成使用这些方法，而推荐使用这些方法！

Deprecate your usage of `.values` in favour of these methods!