Tag archive: size

How to create a DataFrame of random integers with Pandas?

Question: How to create a DataFrame of random integers with Pandas?

I know that if I use randn,

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(100, 4), columns=list('ABCD'))

gives me what I am looking for, but with elements from a normal distribution. But what if I just wanted random integers?

randint works by providing a range, but not an array like randn does. So how do I do this with random integers between some range?


Answer 0

numpy.random.randint accepts a third argument (size), in which you can specify the shape of the output array. You can use this to create your DataFrame:

df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))

Here, np.random.randint(0,100,size=(100, 4)) creates an output array of shape (100, 4) whose elements are random integers in the half-open range [0, 100).


Demo:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))

which produces:

     A   B   C   D
0   45  88  44  92
1   62  34   2  86
2   85  65  11  31
3   74  43  42  56
4   90  38  34  93
5    0  94  45  10
6   58  23  23  60
..  ..  ..  ..  ..
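
As a quick sanity check on the half-open range, the following sketch builds the same frame and confirms that every value falls in [0, 100). The seed value 0 is an arbitrary choice made here so repeated runs give the same frame; it is not part of the answer.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # arbitrary seed (an assumption), only for repeatable runs
df = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))

# The upper bound is exclusive, so 100 itself should never appear.
smallest = int(df.values.min())
largest = int(df.values.max())
print(df.shape, smallest, largest)
```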

Answer 1

The recommended way to create random integers with NumPy these days is to use numpy.random.Generator.integers. (documentation)

import numpy as np
import pandas as pd

rng = np.random.default_rng()
df = pd.DataFrame(rng.integers(0, 100, size=(100, 4)), columns=list('ABCD'))
df
----------------------
      A    B    C    D
 0   58   96   82   24
 1   21    3   35   36
 2   67   79   22   78
 3   81   65   77   94
 4   73    6   70   96
... ...  ...  ...  ...
95   76   32   28   51
96   33   68   54   77
97   76   43   57   43
98   34   64   12   57
99   81   77   32   50
100 rows × 4 columns
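
Two options of the Generator API may be useful here: passing a seed to default_rng makes the frame reproducible, and endpoint=True makes the upper bound inclusive. The sketch below simulates dice rolls from 1 to 6; the seed 42 and the dice scenario are illustrative assumptions, not part of the answer.

```python
import numpy as np
import pandas as pd

# Seeded generator: the same seed yields the same stream of integers.
rng = np.random.default_rng(42)
# endpoint=True makes the range closed, i.e. [1, 6] instead of [1, 6).
rolls = pd.DataFrame(rng.integers(1, 6, size=(100, 4), endpoint=True),
                     columns=list('ABCD'))

# A second generator with the same seed reproduces the frame exactly.
rng_again = np.random.default_rng(42)
rolls_again = pd.DataFrame(rng_again.integers(1, 6, size=(100, 4), endpoint=True),
                           columns=list('ABCD'))
print(rolls.equals(rolls_again))
```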

How big can a Python list get?

Question: How big can a Python list get?

In Python, how big can a list get? I need a list of about 12000 elements. Will I still be able to run list methods such as sorting, etc?


Answer 0

According to the source code, the maximum size of a list is PY_SSIZE_T_MAX/sizeof(PyObject*).

PY_SSIZE_T_MAX is defined in pyport.h to be ((size_t) -1)>>1

On a regular 32-bit system, this is (4294967295 >> 1) / 4, or 536870911 with integer division.

Therefore the maximum size of a Python list on a 32-bit system is 536,870,911 elements.

As long as the number of elements you have is equal to or below this, all list functions should operate correctly.
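
The arithmetic can be reproduced in Python itself. This is only a sketch of the formula quoted above: the helper name max_list_length is made up here, and it assumes that sizeof(PyObject*) equals the platform pointer size and that Py_ssize_t has the same width. With integer division the 32-bit result is 536,870,911, one below the rounded figure that is often quoted.

```python
import struct
import sys

def max_list_length(pointer_size_bytes):
    """Hypothetical helper: PY_SSIZE_T_MAX // sizeof(PyObject*) for a given pointer size."""
    # ((size_t) -1) >> 1 is the largest signed value of that width.
    ssize_t_max = (1 << (8 * pointer_size_bytes - 1)) - 1
    return ssize_t_max // pointer_size_bytes

limit_32bit = max_list_length(4)
limit_64bit = max_list_length(8)
# The same formula for the interpreter running this code.
limit_here = sys.maxsize // struct.calcsize("P")

print(limit_32bit)  # 536870911
print(limit_64bit)  # 1152921504606846975
print(limit_here)
```

In practice available memory runs out long before the 64-bit bound is reached.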


Answer 1

As the Python documentation says:

sys.maxsize

The largest positive integer supported by the platform’s Py_ssize_t type, and thus the maximum size lists, strings, dicts, and many other containers can have.

On my computer (Linux x86_64):

>>> import sys
>>> print(sys.maxsize)
9223372036854775807

Answer 2

Sure, it is OK. You can easily see for yourself:

l = range(12000)
l = sorted(l, reverse=True)

Running those lines on my machine took:

real    0m0.036s
user    0m0.024s
sys  0m0.004s

But, as everyone else said, the larger the list, the slower the operations will be.
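
The timing above comes from the shell's time command, which also includes interpreter start-up. To measure only the sort, a sketch with the standard timeit module could look like this; the repeat count of 100 is an arbitrary choice, and the measured time depends on the machine.

```python
import timeit

values = list(range(12000))

# Time only the sort itself, averaged over several runs.
runs = 100
elapsed = timeit.timeit(lambda: sorted(values, reverse=True), number=runs)
result = sorted(values, reverse=True)

print(f"{elapsed / runs * 1e3:.3f} ms per sort of {len(values)} elements")
```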


Answer 3

In casual code I’ve created lists with millions of elements. I believe that Python’s implementation of lists is only bound by the amount of memory on your system.

In addition, the list methods / functions should continue to work despite the size of the list.

If you care about performance, it might be worthwhile to look into a library such as NumPy.
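
As a rough illustration of why NumPy can help with large numeric data, the sketch below compares the storage of one million integers. It rests on a few assumptions: the element count is arbitrary, int64 is chosen explicitly, and sys.getsizeof reports only the list's own pointer array, not the int objects it points to.

```python
import sys
import numpy as np

n = 1_000_000
as_list = list(range(n))
as_array = np.arange(n, dtype=np.int64)

# The list stores one pointer per element; each int object lives elsewhere.
list_pointer_bytes = sys.getsizeof(as_list)
# The array stores the 8-byte integers themselves, contiguously.
array_bytes = as_array.nbytes

print(list_pointer_bytes, array_bytes)
```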


Answer 4

Performance characteristics for lists are described on Effbot.

Python lists are actually implemented as a vector for fast random access, so the container will basically hold as many items as there is space for in memory. (You need space for the pointers contained in the list as well as space in memory for the objects being pointed to.)

Appending is O(1) (amortized constant complexity); however, inserting into or deleting from the middle of the sequence requires an O(n) (linear complexity) reordering, which gets slower as the number of elements in your list grows.

Your sorting question is more nuanced, since the comparison operation can take an unbounded amount of time. If you’re performing really slow comparisons, it will take a long time, though it’s no fault of Python’s list data type.

Reversal just takes the time required to swap all the pointers in the list (necessarily O(n), linear complexity, since you touch each pointer once).
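
The difference between appending and inserting can be observed directly. The sketch below uses two made-up helper functions and an arbitrary size of 20,000; inserting at index 0 is used as the worst case, since every existing pointer has to shift on each call. Exact timings depend on the machine.

```python
import timeit

def build_by_append(n):
    """Build [0, 1, ..., n-1] with append: amortized O(1) per call."""
    items = []
    for i in range(n):
        items.append(i)
    return items

def build_by_front_insert(n):
    """Build [n-1, ..., 1, 0] with insert(0, ...): O(len(items)) per call."""
    items = []
    for i in range(n):
        items.insert(0, i)  # shifts every existing pointer one slot
    return items

n = 20_000
append_seconds = timeit.timeit(lambda: build_by_append(n), number=5)
insert_seconds = timeit.timeit(lambda: build_by_front_insert(n), number=5)
print(f"append: {append_seconds:.4f}s  insert(0, ...): {insert_seconds:.4f}s")
```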


Answer 5

12000 elements is nothing in Python… and actually the number of elements can go as far as the Python interpreter has memory on your system.


Answer 6

It varies for different systems (depends on RAM). The easiest way to find out is

>>> import six
>>> six.MAXSIZE
9223372036854775807

This gives the maximum size of list and dict too, as per the documentation.


Answer 7

I’d say you’re only limited by the total amount of RAM available. Obviously the larger the array the longer operations on it will take.


Answer 8

I got this from here on a x64 bit system: Python 3.7.0b5 (v3.7.0b5:abb8802389, May 31 2018, 01:54:01) [MSC v.1913 64 bit (AMD64)] on win32


Answer 9

There is no limit on the number of list elements. The main cause of your error is RAM; please increase your memory.


How to change the legend size with matplotlib.pyplot

Question: How to change the legend size with matplotlib.pyplot

Simple question here: I’m trying to get the size of my legend using matplotlib.pyplot to be smaller (i.e., the text to be smaller). The code I’m using goes something like this:

plot.figure()
plot.scatter(k, sum_cf, color='black', label='Sum of Cause Fractions')
plot.scatter(k, data[:, 0],  color='b', label='Dis 1: cf = .6, var = .2')
plot.scatter(k, data[:, 1],  color='r',  label='Dis 2: cf = .2, var = .1')
plot.scatter(k, data[:, 2],  color='g', label='Dis 3: cf = .1, var = .01')
plot.legend(loc=2)

Answer 0

You can set an individual font size for the legend by adjusting the prop keyword.

plot.legend(loc=2, prop={'size': 6})

This takes a dictionary of keywords corresponding to matplotlib.font_manager.FontProperties properties. See the documentation for legend:

Keyword arguments:

prop: [ None | FontProperties | dict ]
    A matplotlib.font_manager.FontProperties instance. If prop is a 
    dictionary, a new instance will be created with prop. If None, use
    rc settings.

It is also possible, as of version 1.2.1, to use the keyword fontsize.
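
The two spellings can be compared side by side. This is a minimal sketch with made-up data; the Agg backend is selected only so it runs without a display, and the resulting text size is read back from the legend to show that both calls have the same effect.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for headless runs
import matplotlib.pyplot as plot

fig, ax = plot.subplots()
ax.scatter([1, 2, 3], [1, 4, 9], color='b', label='made-up data')

# Font size via the prop dictionary.
legend_prop = ax.legend(loc=2, prop={'size': 6})
size_from_prop = legend_prop.get_texts()[0].get_fontsize()

# Font size via the fontsize keyword (this call replaces the previous legend).
legend_kw = ax.legend(loc=2, fontsize=6)
size_from_kw = legend_kw.get_texts()[0].get_fontsize()

print(size_from_prop, size_from_kw)
```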


Answer 1

This should do it:

import pylab as plot
params = {'legend.fontsize': 20,
          'legend.handlelength': 2}
plot.rcParams.update(params)

Then do the plot afterwards.

There are a ton of other rcParams, they can also be set in the matplotlibrc file.

Presumably you can also change it by passing a matplotlib.font_manager.FontProperties instance, but I don’t know how to do that (see Yann’s answer).
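
For completeness, passing a FontProperties instance might look like the sketch below. The data, the size of 8 points and the bold weight are illustrative assumptions; the Agg backend is used only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for headless runs
import matplotlib.pyplot as plot
from matplotlib.font_manager import FontProperties

fig, ax = plot.subplots()
ax.scatter([1, 2, 3], [3, 2, 1], color='r', label='made-up data')

# Build the font description once and hand it to legend via prop.
legend_font = FontProperties(size=8, weight='bold')
legend = ax.legend(loc=2, prop=legend_font)

legend_text_size = legend.get_texts()[0].get_fontsize()
print(legend_text_size)
```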


Answer 2

using import matplotlib.pyplot as plt

Method 1: specify the fontsize when calling legend (repetitive)

plt.legend(fontsize=20) # using a size in points
plt.legend(fontsize="x-large") # using a named size

With this method you can set the fontsize for each legend at creation (allowing you to have multiple legends with different fontsizes). However, you will have to type everything manually each time you create a legend.

(Note: @Mathias711 listed the available named fontsizes in his answer)

Method 2: specify the fontsize in rcParams (convenient)

plt.rc('legend',fontsize=20) # using a size in points
plt.rc('legend',fontsize='medium') # using a named size

With this method you set the default legend fontsize, and all legends will automatically use that unless you specify otherwise using method 1. This means you can set your legend fontsize at the beginning of your code, and not worry about setting it for each individual legend.

If you use a named size, e.g. 'medium', then the legend text will scale with the global font.size in rcParams. To change font.size, use plt.rc('font', size=12)
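
How the two methods interact can be sketched as follows. The example uses plt.rc_context instead of plt.rc so the settings stay local to the block; that substitution, the base size of 10 and the plotted line are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for headless runs
import matplotlib.pyplot as plt

# Temporary rc settings: a base font size and a named (relative) legend size.
with plt.rc_context({'font.size': 10, 'legend.fontsize': 'small'}):
    fig, ax = plt.subplots()
    ax.plot([0, 1], [0, 1], label='made-up line')

    # Method 2: the legend picks up the rc default, scaled from font.size.
    relative_size = ax.legend().get_texts()[0].get_fontsize()

    # Method 1: an explicit fontsize overrides the rc default.
    explicit_size = ax.legend(fontsize=20).get_texts()[0].get_fontsize()

print(relative_size, explicit_size)
```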


Answer 3

There are also a few named fontsizes, apart from the size in points:

xx-small
x-small
small
medium
large
x-large
xx-large

Usage:

pyplot.legend(loc=2, fontsize = 'x-small')

Answer 4

There are multiple settings for adjusting the legend size. The two I find most useful are:

  • labelspacing: which sets the spacing between label entries in multiples of the font size. For instance with a 10 point font, legend(..., labelspacing=0.2) will reduce the spacing between entries to 2 points. The default on my install is about 0.5.
  • prop: which allows full control of the font size, etc. You can set an 8 point font using legend(..., prop={'size':8}). The default on my install is about 14 points.

In addition, the legend documentation lists a number of other padding and spacing parameters including: borderpad, handlelength, handletextpad, borderaxespad, and columnspacing. These all follow the same form as labelspacing and are also in multiples of fontsize.

These values can also be set as the defaults for all figures using the matplotlibrc file.


Answer 5

On my install, FontProperties only changes the text size, but it’s still too large and spaced out. I found a parameter in pyplot.rcParams: legend.labelspacing, which I’m guessing is set to a fraction of the font size. I’ve changed it with

pyplot.rcParams.update({'legend.labelspacing':0.25})

I’m not sure how to specify it to the pyplot.legend function – passing

prop={'labelspacing':0.25}

or

prop={'legend.labelspacing':0.25}

comes back with an error.
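
The error is expected, because prop only describes the font. The spacing is a separate keyword argument of legend itself, as in this sketch with made-up data (the Agg backend is used only so it runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for headless runs
import matplotlib.pyplot as pyplot

fig, ax = pyplot.subplots()
ax.scatter([1, 2, 3], [1, 2, 3], color='b', label='first made-up series')
ax.scatter([1, 2, 3], [3, 2, 1], color='r', label='second made-up series')

# Font settings go in prop; labelspacing is passed directly to legend.
legend = ax.legend(loc=2, prop={'size': 8}, labelspacing=0.25)

legend_text_size = legend.get_texts()[0].get_fontsize()
print(legend_text_size, legend.labelspacing)
```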


Answer 6

plot.legend(loc='lower right', fontsize=11, title='Hey there', title_fontsize=20)