Python 实用宝典

Question 1

创建给定大小的零填充熊猫数据框的最佳方法是什么？

我用过了：

zero_data = np.zeros(shape=(len(data),len(feature_list)))
d = pd.DataFrame(zero_data, columns=feature_list)

有更好的方法吗？

Question 2

What is the best way to create a zero-filled pandas data frame of a given size?

I have used:

zero_data = np.zeros(shape=(len(data),len(feature_list)))
d = pd.DataFrame(zero_data, columns=feature_list)

Is there a better way to do it?

Question 3

您可以尝试以下方法：

d = pd.DataFrame(0, index=np.arange(len(data)), columns=feature_list)

Question 4

You can try this:

d = pd.DataFrame(0, index=np.arange(len(data)), columns=feature_list)

Question 5

我认为最好用numpy做到这一点

import numpy as np
import pandas as pd
d = pd.DataFrame(np.zeros((N_rows, N_cols)))

Question 6

It’s best to do this with numpy in my opinion

import numpy as np
import pandas as pd
d = pd.DataFrame(np.zeros((N_rows, N_cols)))

Question 7

类似于@Shravan，但不使用numpy：

  height = 10
  width = 20
  df_0 = pd.DataFrame(0, index=range(height), columns=range(width))

然后，您可以使用它做任何您想做的事情：

post_instantiation_fcn = lambda x: str(x)
df_ready_for_whatever = df_0.applymap(post_instantiation_fcn)

Question 8

Similar to @Shravan, but without the use of numpy:

  height = 10
  width = 20
  df_0 = pd.DataFrame(0, index=range(height), columns=range(width))

Then you can do whatever you want with it:

post_instantiation_fcn = lambda x: str(x)
df_ready_for_whatever = df_0.applymap(post_instantiation_fcn)

Question 9

如果您希望新数据框具有与现有数据框相同的索引和列，则可以将现有数据框乘以零：

df_zeros = df * 0

Question 10

If you would like the new data frame to have the same index and columns as an existing data frame, you can just multiply the existing data frame by zero:

df_zeros = df * 0

Question 11

如果您已经有一个数据框，这是最快的方法：

In [1]: columns = ["col{}".format(i) for i in range(10)]
In [2]: orig_df = pd.DataFrame(np.ones((10, 10)), columns=columns)
In [3]: %timeit d = pd.DataFrame(np.zeros_like(orig_df), index=orig_df.index, columns=orig_df.columns)
10000 loops, best of 3: 60.2 µs per loop

相比于：

In [4]: %timeit d = pd.DataFrame(0, index = np.arange(10), columns=columns)
10000 loops, best of 3: 110 µs per loop

In [5]: temp = np.zeros((10, 10))
In [6]: %timeit d = pd.DataFrame(temp, columns=columns)
10000 loops, best of 3: 95.7 µs per loop

Question 12

If you already have a dataframe, this is the fastest way:

In [1]: columns = ["col{}".format(i) for i in range(10)]
In [2]: orig_df = pd.DataFrame(np.ones((10, 10)), columns=columns)
In [3]: %timeit d = pd.DataFrame(np.zeros_like(orig_df), index=orig_df.index, columns=orig_df.columns)
10000 loops, best of 3: 60.2 µs per loop

Compare to:

In [4]: %timeit d = pd.DataFrame(0, index = np.arange(10), columns=columns)
10000 loops, best of 3: 110 µs per loop

In [5]: temp = np.zeros((10, 10))
In [6]: %timeit d = pd.DataFrame(temp, columns=columns)
10000 loops, best of 3: 95.7 µs per loop

Question 13

假设有一个模板DataFrame，要在此处填充零值进行复制…

如果您的数据集中没有NaN，那么乘以零可能会更快：

In [19]: columns = ["col{}".format(i) for i in xrange(3000)]                                                                                       

In [20]: indices = xrange(2000)

In [21]: orig_df = pd.DataFrame(42.0, index=indices, columns=columns)

In [22]: %timeit d = pd.DataFrame(np.zeros_like(orig_df), index=orig_df.index, columns=orig_df.columns)
100 loops, best of 3: 12.6 ms per loop

In [23]: %timeit d = orig_df * 0.0
100 loops, best of 3: 7.17 ms per loop

改进取决于DataFrame的大小，但从未发现它会变慢。

只是为了它：

In [24]: %timeit d = orig_df * 0.0 + 1.0
100 loops, best of 3: 13.6 ms per loop

In [25]: %timeit d = pd.eval('orig_df * 0.0 + 1.0')
100 loops, best of 3: 8.36 ms per loop

但：

In [24]: %timeit d = orig_df.copy()
10 loops, best of 3: 24 ms per loop

编辑！！！

假设您有一个使用float64的框架，那么这将是最快的！通过将0.0替换为所需的填充编号，它还可以生成任何值。

In [23]: %timeit d = pd.eval('orig_df > 1.7976931348623157e+308 + 0.0')
100 loops, best of 3: 3.68 ms per loop

根据口味的不同，可以从外部定义nan，并做出通用的解决方案，而与特定的浮点类型无关：

In [39]: nan = np.nan
In [40]: %timeit d = pd.eval('orig_df > nan + 0.0')
100 loops, best of 3: 4.39 ms per loop

Question 14

Assuming having a template DataFrame, which one would like to copy with zero values filled here…

If you have no NaNs in your data set, multiplying by zero can be significantly faster:

In [19]: columns = ["col{}".format(i) for i in xrange(3000)]                                                                                       

In [20]: indices = xrange(2000)

In [21]: orig_df = pd.DataFrame(42.0, index=indices, columns=columns)

In [22]: %timeit d = pd.DataFrame(np.zeros_like(orig_df), index=orig_df.index, columns=orig_df.columns)
100 loops, best of 3: 12.6 ms per loop

In [23]: %timeit d = orig_df * 0.0
100 loops, best of 3: 7.17 ms per loop

Improvement depends on DataFrame size, but never found it slower.

And just for the heck of it:

In [24]: %timeit d = orig_df * 0.0 + 1.0
100 loops, best of 3: 13.6 ms per loop

In [25]: %timeit d = pd.eval('orig_df * 0.0 + 1.0')
100 loops, best of 3: 8.36 ms per loop

But:

In [24]: %timeit d = orig_df.copy()
10 loops, best of 3: 24 ms per loop

EDIT!!!

Assuming you have a frame using float64, this will be the fastest by a huge margin! It is also able to generate any value by replacing 0.0 to the desired fill number.

In [23]: %timeit d = pd.eval('orig_df > 1.7976931348623157e+308 + 0.0')
100 loops, best of 3: 3.68 ms per loop

Depending on taste, one can externally define nan, and do a general solution, irrespective of the particular float type:

In [39]: nan = np.nan
In [40]: %timeit d = pd.eval('orig_df > nan + 0.0')
100 loops, best of 3: 4.39 ms per loop

Python 实用宝典

创建零填充的熊猫数据框

问题：创建零填充的熊猫数据框

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

有趣好用的Python教程