Tag archive: Python

Python group by

Question: Python group by

Assume that I have a set of data pairs where index 0 is the value and index 1 is the type:

input = [
          ('11013331', 'KAT'), 
          ('9085267',  'NOT'), 
          ('5238761',  'ETH'), 
          ('5349618',  'ETH'), 
          ('11788544', 'NOT'), 
          ('962142',   'ETH'), 
          ('7795297',  'ETH'), 
          ('7341464',  'ETH'), 
          ('9843236',  'KAT'), 
          ('5594916',  'ETH'), 
          ('1550003',  'ETH')
        ]

I want to group them by their type (the string at index 1), like this:

result = [ 
           { 
             'type': 'KAT', 
             'items': ['11013331', '9843236'] 
           },
           {
             'type': 'NOT', 
             'items': ['9085267', '11788544'] 
           },
           {
             'type': 'ETH', 
             'items': ['5238761', '5349618', '962142', '7795297', '7341464', '5594916', '1550003'] 
           }
         ] 

How can I achieve this in an efficient way?


Answer 0

Do it in 2 steps. First, create a dictionary.

>>> input = [('11013331', 'KAT'), ('9085267', 'NOT'), ('5238761', 'ETH'), ('5349618', 'ETH'), ('11788544', 'NOT'), ('962142', 'ETH'), ('7795297', 'ETH'), ('7341464', 'ETH'), ('9843236', 'KAT'), ('5594916', 'ETH'), ('1550003', 'ETH')]
>>> from collections import defaultdict
>>> res = defaultdict(list)
>>> for v, k in input: res[k].append(v)
...

Then, convert that dictionary into the expected format.

>>> [{'type':k, 'items':v} for k,v in res.items()]
[{'items': ['9085267', '11788544'], 'type': 'NOT'}, {'items': ['5238761', '5349618', '962142', '7795297', '7341464', '5594916', '1550003'], 'type': 'ETH'}, {'items': ['11013331', '9843236'], 'type': 'KAT'}]

It is also possible with itertools.groupby, but that requires the input to be sorted first.

>>> from itertools import groupby
>>> from operator import itemgetter
>>> sorted_input = sorted(input, key=itemgetter(1))
>>> groups = groupby(sorted_input, key=itemgetter(1))
>>> [{'type':k, 'items':[x[0] for x in v]} for k, v in groups]
[{'items': ['5238761', '5349618', '962142', '7795297', '7341464', '5594916', '1550003'], 'type': 'ETH'}, {'items': ['11013331', '9843236'], 'type': 'KAT'}, {'items': ['9085267', '11788544'], 'type': 'NOT'}]

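Note that neither of these is guaranteed to respect the original order of the keys: the groupby version returns the keys in sorted order, and before Python 3.7 a plain dict (including defaultdict) did not preserve insertion order. If you need to keep the order on those older versions, you need an OrderedDict.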

>>> from collections import OrderedDict
>>> res = OrderedDict()
>>> for v, k in input:
...   if k in res: res[k].append(v)
...   else: res[k] = [v]
... 
>>> [{'type':k, 'items':v} for k,v in res.items()]
[{'items': ['11013331', '9843236'], 'type': 'KAT'}, {'items': ['9085267', '11788544'], 'type': 'NOT'}, {'items': ['5238761', '5349618', '962142', '7795297', '7341464', '5594916', '1550003'], 'type': 'ETH'}]
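
The two steps above can also be sketched as a small script. This is only an illustration, assuming Python 3.7 or later (where dicts keep insertion order); the names pairs and group_pairs are made up here, and pairs replaces input so that the built-in of that name is not shadowed.

```python
from collections import defaultdict

# Same data as in the question, renamed from "input" to avoid shadowing the built-in.
pairs = [
    ('11013331', 'KAT'), ('9085267', 'NOT'), ('5238761', 'ETH'),
    ('5349618', 'ETH'), ('11788544', 'NOT'), ('962142', 'ETH'),
    ('7795297', 'ETH'), ('7341464', 'ETH'), ('9843236', 'KAT'),
    ('5594916', 'ETH'), ('1550003', 'ETH'),
]


def group_pairs(pairs):
    """Group (value, type) pairs into a list of {'type', 'items'} dicts."""
    grouped = defaultdict(list)
    # Step 1: collect the values under their type.
    for value, kind in pairs:
        grouped[kind].append(value)
    # Step 2: convert to the requested format.
    # On Python 3.7+ the types come out in first-seen order.
    return [{'type': kind, 'items': values} for kind, values in grouped.items()]


result = group_pairs(pairs)
for entry in result:
    print(entry)
```

This makes a single pass over the data and needs no sorting.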

Answer 1

Python’s built-in itertools module actually has a groupby function, but it only groups adjacent elements, so the input must first be sorted so that the elements to be grouped are contiguous in the list:

from operator import itemgetter
sortkeyfn = itemgetter(1)
input = [('11013331', 'KAT'), ('9085267', 'NOT'), ('5238761', 'ETH'), 
 ('5349618', 'ETH'), ('11788544', 'NOT'), ('962142', 'ETH'), ('7795297', 'ETH'), 
 ('7341464', 'ETH'), ('9843236', 'KAT'), ('5594916', 'ETH'), ('1550003', 'ETH')] 
input.sort(key=sortkeyfn)

Now input looks like:

[('5238761', 'ETH'), ('5349618', 'ETH'), ('962142', 'ETH'), ('7795297', 'ETH'),
 ('7341464', 'ETH'), ('5594916', 'ETH'), ('1550003', 'ETH'), ('11013331', 'KAT'),
 ('9843236', 'KAT'), ('9085267', 'NOT'), ('11788544', 'NOT')]

groupby returns a sequence of 2-tuples, of the form (key, values_iterator). What we want is to turn this into a list of dicts where the ‘type’ is the key, and ‘items’ is a list of the 0’th elements of the tuples returned by the values_iterator. Like this:

from itertools import groupby
result = []
for key,valuesiter in groupby(input, key=sortkeyfn):
    result.append(dict(type=key, items=list(v[0] for v in valuesiter)))

Now result contains the list of dicts you asked for in your question.

You might consider, though, just making a single dict out of this, keyed by type, and each value containing the list of values. In your current form, to find the values for a particular type, you’ll have to iterate over the list to find the dict containing the matching ‘type’ key, and then get the ‘items’ element from it. If you use a single dict instead of a list of 1-item dicts, you can find the items for a particular type with a single keyed lookup into the master dict. Using groupby, this would look like:

result = {}
for key,valuesiter in groupby(input, key=sortkeyfn):
    result[key] = list(v[0] for v in valuesiter)

result now contains this dict (this is similar to the intermediate res defaultdict in @KennyTM’s answer):

{'NOT': ['9085267', '11788544'], 
 'ETH': ['5238761', '5349618', '962142', '7795297', '7341464', '5594916', '1550003'], 
 'KAT': ['11013331', '9843236']}

If you want to reduce this to a one-liner, you can write:

result = dict((key, list(v[0] for v in valuesiter))
              for key, valuesiter in groupby(input, key=sortkeyfn))

or, using the newer dict-comprehension form:

result = {key: list(v[0] for v in valuesiter)
              for key, valuesiter in groupby(input, key=sortkeyfn)}
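
To see why the sort matters for the groupby approach, here is a small sketch that compares the keys groupby yields on the unsorted and on the sorted data. The names pairs and by_type are chosen for this illustration only, and sorted() is used instead of list.sort() so the original list is left untouched.

```python
from itertools import groupby
from operator import itemgetter

pairs = [
    ('11013331', 'KAT'), ('9085267', 'NOT'), ('5238761', 'ETH'),
    ('5349618', 'ETH'), ('11788544', 'NOT'), ('962142', 'ETH'),
    ('7795297', 'ETH'), ('7341464', 'ETH'), ('9843236', 'KAT'),
    ('5594916', 'ETH'), ('1550003', 'ETH'),
]
by_type = itemgetter(1)

# groupby starts a new group every time the key changes,
# so unsorted input produces the same key several times.
unsorted_keys = [key for key, _ in groupby(pairs, key=by_type)]

# After sorting by the same key, each type appears exactly once.
sorted_keys = [key for key, _ in groupby(sorted(pairs, key=by_type), key=by_type)]

print(unsorted_keys)
print(sorted_keys)
```

With the unsorted data, a dict built from the groups would silently keep only the last run of each type.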

Answer 2

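I also like pandas’ simple grouping. It is powerful, simple, and well suited to large data sets. Note that .groups maps each type to the row labels of its members rather than to the values themselves.

import pandas
result = pandas.DataFrame(input).groupby(1).groups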


Answer 3

This answer is similar to @PaulMcG’s answer but doesn’t require sorting the input.

For those into functional programming, groupBy can be written in one line (not including imports!), and unlike itertools.groupby it doesn’t require the input to be sorted:

from functools import reduce # import needed for python3; builtin in python2
from collections import defaultdict

def groupBy(key, seq):
 return reduce(lambda grp, val: grp[key(val)].append(val) or grp, seq, defaultdict(list))

(The reason for ... or grp in the lambda is that for this reduce() to work, the lambda needs to return its first argument; because list.append() always returns None the or will always return grp. I.e. it’s a hack to get around python’s restriction that a lambda can only evaluate a single expression.)

This returns a dict whose keys are found by evaluating the given function and whose values are a list of the original items in the original order. For the OP’s example, calling this as groupBy(lambda pair: pair[1], input) will return this dict:

{'KAT': [('11013331', 'KAT'), ('9843236', 'KAT')],
 'NOT': [('9085267', 'NOT'), ('11788544', 'NOT')],
 'ETH': [('5238761', 'ETH'), ('5349618', 'ETH'), ('962142', 'ETH'), ('7795297', 'ETH'), ('7341464', 'ETH'), ('5594916', 'ETH'), ('1550003', 'ETH')]}

And, as in @PaulMcG’s answer, a format close to what the OP requested can be produced by wrapping that in a dict comprehension. So this will do it:

result = {key: [pair[0] for pair in values]
          for key, values in groupBy(lambda pair: pair[1], input).items()}
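
For completeness, here is a self-contained sketch that runs the groupBy helper from this answer end to end and then builds the list-of-dicts format from the question. The variable names pairs and grouped are chosen for this illustration, and the first-seen key order relies on Python 3.7 or later.

```python
from collections import defaultdict
from functools import reduce


def groupBy(key, seq):
    # list.append() returns None, so "... or grp" hands the accumulator back to reduce().
    return reduce(lambda grp, val: grp[key(val)].append(val) or grp, seq, defaultdict(list))


pairs = [
    ('11013331', 'KAT'), ('9085267', 'NOT'), ('5238761', 'ETH'),
    ('5349618', 'ETH'), ('11788544', 'NOT'), ('962142', 'ETH'),
    ('7795297', 'ETH'), ('7341464', 'ETH'), ('9843236', 'KAT'),
    ('5594916', 'ETH'), ('1550003', 'ETH'),
]

grouped = groupBy(lambda pair: pair[1], pairs)

# Keep only the value (index 0) of each pair and emit one dict per type.
result = [{'type': kind, 'items': [pair[0] for pair in members]}
          for kind, members in grouped.items()]
print(result)
```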

Answer 4

The following function will quickly (no sorting required) group tuples of any length by a key having any index:

# given a sequence of tuples like [(3,'c',6),(7,'a',2),(88,'c',4),(45,'a',0)],
# returns a dict grouping tuples by idx-th element - with idx=1 we have:
# if merge is True {'c':(3,6,88,4),     'a':(7,2,45,0)}
# if merge is False {'c':((3,6),(88,4)), 'a':((7,2),(45,0))}
def group_by(seqs,idx=0,merge=True):
    d = dict()
    for seq in seqs:
        k = seq[idx]
        v = d.get(k,tuple()) + (seq[:idx]+seq[idx+1:] if merge else (seq[:idx]+seq[idx+1:],))
        d.update({k:v})
    return d

In the case of your question, the index of key you want to group by is 1, therefore:

group_by(input,1)

gives

{'ETH': ('5238761','5349618','962142','7795297','7341464','5594916','1550003'),
 'KAT': ('11013331', '9843236'),
 'NOT': ('9085267', '11788544')}

which is not exactly the output you asked for, but might as well suit your needs.


Answer 5

result = []
# Make a set of your "types":
input_set = set(tpl[1] for tpl in input)
# input_set is now {'ETH', 'KAT', 'NOT'}
# Iterate over the input_set
for type_ in input_set:
    # a dict to gather things:
    D = {}
    # filter all tuples from your input with the same type as type_
    tuples = filter(lambda tpl: tpl[1] == type_, input)
    # write them in the D:
    D["type"] = type_
    D["items"] = [tpl[0] for tpl in tuples]
    # append D to results:
    result.append(D)

result
>>> [{'items': ['9085267', '11788544'], 'type': 'NOT'}, {'items': ['5238761', '5349618', '962142', '7795297', '7341464', '5594916', '1550003'], 'type': 'ETH'}, {'items': ['11013331', '9843236'], 'type': 'KAT'}]

How do I add tab completion to the Python shell?

Question: How do I add tab completion to the Python shell?

When starting a django application using python manage.py shell, I get an InteractiveConsole shell – I can use tab completion, etc.

Python 2.5.1 (r251:54863, Apr 15 2008, 22:57:26) 
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)

When just starting a python interpreter using python, it doesn’t offer tab completion.

Can someone tell me what django is doing to give me an interactive console, or what I need to do to start an interactive console without a django app?


Answer 0

I may have found a way to do it.

Create a file .pythonrc

# ~/.pythonrc
# enable syntax completion
try:
    import readline
except ImportError:
    print("Module readline not available.")
else:
    import rlcompleter
    readline.parse_and_bind("tab: complete")

then in your .bashrc file, add

export PYTHONSTARTUP=~/.pythonrc

That seems to work.
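
To get a feel for what the startup file above hooks into, the sketch below queries rlcompleter directly, without an interactive shell. The helper name completions is made up for this illustration. The exact list of matches depends on the platform and Python version.

```python
import os
import rlcompleter


def completions(text, namespace):
    """Collect every completion rlcompleter offers for text in the given namespace."""
    completer = rlcompleter.Completer(namespace)
    matches = []
    state = 0
    while True:
        # complete() returns the state-th match, or None once the matches are exhausted.
        match = completer.complete(text, state)
        if match is None:
            break
        matches.append(match)
        state += 1
    return matches


matches = completions("os.pat", {"os": os})
print(matches)
```

readline.parse_and_bind("tab: complete") makes the Tab key trigger this same kind of lookup at the interactive prompt.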


Answer 1

I think django does something like https://docs.python.org/library/rlcompleter.html

If you want to have a really good interactive interpreter have a look at IPython.


Answer 2

For the record, this is covered in the tutorial: http://docs.python.org/tutorial/interactive.html


Answer 3

I use ptpython: https://github.com/jonathanslenders/ptpython/

ptpython is a wonderful shell with autocompletion. Installing ptpython is very easy with pip:

pip install ptpython

For the Django shell, you should set up the Django environment first, like this:

import os

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "testweb.settings")

Trust me, this is the best way for you!


Answer 4

Fix for the Windows 10 shell:

  • pip install pyreadline
  • pip install ipython[shell]

Answer 5

It looks like Python 3 has it out of the box!


Answer 6

In Python3 this feature is enabled by default. My system didn’t have the module readline installed. I am on Manjaro. I didn’t face this tab completion issue on other linux distributions (elementary, ubuntu, mint).

After installing the module with pip, importing it raised the following error:

ImportError: libncursesw.so.5: cannot open shared object file: No such file or directory

To solve this, I ran:

cd /usr/lib
ln -s libncursesw.so libncursesw.so.5

This resolved the import error. It also enabled tab completion in the Python REPL without creating or changing .pythonrc or .bashrc.


Answer 7

Yes. It’s built in to 3.6.

fernanr@gnuruwi ~ $ python3.6
Python 3.6.3 (default, Apr 10 2019, 14:37:36)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.
Display all 318 possibilities? (y or n)
os.CLD_CONTINUED  os.O_RDONLY  os.ST_NOEXEC  os.environ   os.getpid(   os.readlink(  os.spawnvpe(
os.CLD_DUMPED     os.O_RDWR    os.ST_NOSUID  os.environb  os.getppid(  os.readv(     os.st


Answer 8

For older versions (2.x), the script above works like a charm :)

fernanr@crsatx4 ~ $ cat .bashrc | grep -i python
#Tab completion for python shell
export PYTHONSTARTUP=~/.pythonrc
fernanr@crsatx4 ~ $ . ~/.bashrc
fernanr@crsatx4 ~ $ echo $?
0
fernanr@crsatx4 ~ $ python2
Python 2.7.5 (default, Jun 11 2019, 14:33:56)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-39)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.
Display all 249 possibilities? (y or n)
os.EX_CANTCREAT             os.O_WRONLY                 

How to form a tuple column from two columns in Pandas

Question: How to form a tuple column from two columns in Pandas

I’ve got a Pandas DataFrame and I want to combine the ‘lat’ and ‘long’ columns to form a tuple.

<class 'pandas.core.frame.DataFrame'>
Int64Index: 205482 entries, 0 to 209018
Data columns:
Month           205482  non-null values
Reported by     205482  non-null values
Falls within    205482  non-null values
Easting         205482  non-null values
Northing        205482  non-null values
Location        205482  non-null values
Crime type      205482  non-null values
long            205482  non-null values
lat             205482  non-null values
dtypes: float64(4), object(5)

The code I tried to use was:

def merge_two_cols(series): 
    return (series['lat'], series['long'])

sample['lat_long'] = sample.apply(merge_two_cols, axis=1)

However, this returned the following error:

---------------------------------------------------------------------------
 AssertionError                            Traceback (most recent call last)
<ipython-input-261-e752e52a96e6> in <module>()
      2     return (series['lat'], series['long'])
      3 
----> 4 sample['lat_long'] = sample.apply(merge_two_cols, axis=1)
      5

AssertionError: Block shape incompatible with manager 

How can I solve this problem?


Answer 0

Get comfortable with zip. It comes in handy when dealing with column data.

df['new_col'] = list(zip(df.lat, df.long))

It’s less complicated and faster than using apply or map. Something like np.dstack is twice as fast as zip, but wouldn’t give you tuples.
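
As a minimal illustration of what zip does here, the sketch below uses plain Python lists as stand-ins for the df.lat and df.long columns. The sample values are taken from the first rows of the example frame in the next answer, and the names lats, longs and lat_long are made up for this illustration.

```python
# Plain lists stand in for the two DataFrame columns.
lats = [0.484370, 0.497116, 2.120676]
longs = [-0.628298, 1.047605, -2.436831]

# zip pairs the elements positionally; list() materialises the tuples
# so they can be assigned as one column-like sequence.
lat_long = list(zip(lats, longs))
print(lat_long)
```

zip stops at the shorter input, which is not an issue for two columns of the same DataFrame because they always have the same length.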


Answer 1

In [10]: df
Out[10]:
          A         B       lat      long
0  1.428987  0.614405  0.484370 -0.628298
1 -0.485747  0.275096  0.497116  1.047605
2  0.822527  0.340689  2.120676 -2.436831
3  0.384719 -0.042070  1.426703 -0.634355
4 -0.937442  2.520756 -1.662615 -1.377490
5 -0.154816  0.617671 -0.090484 -0.191906
6 -0.705177 -1.086138 -0.629708  1.332853
7  0.637496 -0.643773 -0.492668 -0.777344
8  1.109497 -0.610165  0.260325  2.533383
9 -1.224584  0.117668  1.304369 -0.152561

In [11]: df['lat_long'] = df[['lat', 'long']].apply(tuple, axis=1)

In [12]: df
Out[12]:
          A         B       lat      long                             lat_long
0  1.428987  0.614405  0.484370 -0.628298      (0.484370195967, -0.6282975278)
1 -0.485747  0.275096  0.497116  1.047605      (0.497115615839, 1.04760475074)
2  0.822527  0.340689  2.120676 -2.436831      (2.12067574274, -2.43683074367)
3  0.384719 -0.042070  1.426703 -0.634355      (1.42670326172, -0.63435462504)
4 -0.937442  2.520756 -1.662615 -1.377490     (-1.66261469102, -1.37749004179)
5 -0.154816  0.617671 -0.090484 -0.191906  (-0.0904840623396, -0.191905582481)
6 -0.705177 -1.086138 -0.629708  1.332853     (-0.629707821728, 1.33285348929)
7  0.637496 -0.643773 -0.492668 -0.777344   (-0.492667604075, -0.777344111021)
8  1.109497 -0.610165  0.260325  2.533383        (0.26032456699, 2.5333825651)
9 -1.224584  0.117668  1.304369 -0.152561     (1.30436900612, -0.152560909725)

Answer 2

Pandas has the itertuples method to do exactly this:

list(df[['lat', 'long']].itertuples(index=False, name=None))

Answer 3

I’d like to add df.values.tolist(), as long as you don’t mind getting a column of lists rather than tuples.

import pandas as pd
import numpy as np

size = int(1e+07)
df = pd.DataFrame({'a': np.random.rand(size), 'b': np.random.rand(size)}) 

%timeit df.values.tolist()
1.47 s ± 38.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit list(zip(df.a,df.b))
1.92 s ± 131 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Can I run a Keras model on a GPU?

Question: Can I run a Keras model on a GPU?

I’m running a Keras model with a submission deadline in 36 hours. If I train my model on the CPU it will take approximately 50 hours. Is there a way to run Keras on a GPU?

I’m using the TensorFlow backend and running it in a Jupyter notebook, without Anaconda installed.


Answer 0

Yes, you can run Keras models on a GPU. There are a few things you will have to check first.

  1. Your system has a GPU (Nvidia, as AMD doesn’t work yet).
  2. You have installed the GPU version of TensorFlow.
  3. You have installed CUDA (see the installation instructions).
  4. Verify that TensorFlow is running with the GPU (check if the GPU is working):

# TensorFlow 1.x API
import tensorflow as tf
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

OR

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

output will be something like this:

[
  name: "/cpu:0"device_type: "CPU",
  name: "/gpu:0"device_type: "GPU"
]

Once all this is done, your model will run on the GPU.

To check whether Keras (>=2.1.1) is using the GPU:

from keras import backend as K
K.tensorflow_backend._get_available_gpus()

All the best.


Answer 1

Sure. I assume that you have already installed TensorFlow for GPU.

You need to add the following block after importing keras. I am working on a machine that has a 56-core CPU and a GPU.

import keras
import tensorflow as tf


config = tf.ConfigProto( device_count = {'GPU': 1 , 'CPU': 56} ) 
sess = tf.Session(config=config) 
keras.backend.set_session(sess)

Of course, this configuration uses my machine’s maximum limits. You can lower the CPU and GPU values to reduce consumption.


Answer 2

2.0-compatible answer: While the answers above explain in detail how to use a GPU with a Keras model, I want to explain how it can be done for TensorFlow version 2.0.

To find out how many GPUs are available, we can use the code below:

import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))

To find out which devices your operations and tensors are assigned to, put tf.debugging.set_log_device_placement(True) as the first statement of your program.

Enabling device placement logging causes any Tensor allocations or operations to be printed. For example, running the below code:

tf.debugging.set_log_device_placement(True)

# Create some tensors
a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
c = tf.matmul(a, b)

print(c)

gives the output shown below:

Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0
tf.Tensor(
[[22. 28.]
 [49. 64.]], shape=(2, 2), dtype=float32)

For more information, refer to this link.


Answer 3

Of course. If you are running on the TensorFlow or CNTK backend, your code will run on your GPU devices by default. But with the Theano backend, you can use the following Theano flags:

THEANO_FLAGS=device=gpu,floatX=float32 python my_keras_script.py


Answer 4

Check in Task Manager whether your script is using the GPU. If it is not, check whether your CUDA version is the right one for the TensorFlow version you are using, as the other answers have already suggested.

Additionally, TensorFlow needs a cuDNN library that matches the CUDA version in order to run on the GPU. Download and extract it from here, and put the DLL (e.g., cudnn64_7.dll) into the CUDA bin folder (e.g., C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin).


Flask SQLAlchemy query, specify column names

Question: Flask SQLAlchemy query, specify column names

How do I specify the columns that I want in my query using a model (it selects all columns by default)? I know how to do this with the SQLAlchemy session: session.query(self.col1), but how do I do it with models? I can't do SomeModel.query(). Is there a way?


Answer 0

You can use the with_entities() method to restrict which columns you’d like to return in the result. (documentation)

result = SomeModel.query.with_entities(SomeModel.col1, SomeModel.col2)

Depending on your requirements, you may also find deferreds useful. They allow you to return the full object but restrict the columns that come over the wire.
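
To make the effect concrete, here is a minimal sketch of what narrowing the column list means at the SQL level. It deliberately uses the standard-library sqlite3 module rather than Flask-SQLAlchemy, and the table some_model with columns col1, col2 and col3 is a hypothetical stand-in for SomeModel. A query built with with_entities(SomeModel.col1, SomeModel.col2) should give rows shaped like the second result: tuple-like rows holding only the selected columns, not model instances.

```python
import sqlite3

# Hypothetical table standing in for SomeModel (all names are assumptions).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE some_model (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT, col3 TEXT)"
)
conn.execute("INSERT INTO some_model (col1, col2, col3) VALUES ('a', 'b', 'c')")

# Every column: roughly what a plain SomeModel.query asks the database for.
all_columns = conn.execute("SELECT * FROM some_model").fetchall()

# Two columns only: roughly what with_entities(col1, col2) asks for.
two_columns = conn.execute("SELECT col1, col2 FROM some_model").fetchall()

print(all_columns)
print(two_columns)
```

The narrower SELECT list is what keeps unused columns from coming over the wire.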


Answer 1

session.query().with_entities(SomeModel.col1)

is the same as

session.query(SomeModel.col1)

For an alias, we can use .label():

session.query(SomeModel.col1.label('some alias name'))

Answer 2

You can use the load_only function:

from sqlalchemy.orm import load_only

fields = ['name', 'addr', 'phone', 'url']
companies = session.query(SomeModel).options(load_only(*fields)).all()

Answer 3

You can use Model.query because the Model (or usually its base class, especially when the declarative extension is used) is assigned Session.query_property. In this case, Model.query is equivalent to Session.query(Model).

I am not aware of a way to modify the columns returned by the query (except by adding more using add_columns()).
So your best option is to use Session.query(Model.col1, Model.col2, ...) (as already shown by Salil).


Answer 4

You can use Query.values:

session.query(SomeModel).values('id', 'user')


Answer 5

Here is an example:

from sqlalchemy import desc

movies = Movie.query.filter(Movie.rating != 0).order_by(desc(Movie.rating)).all()

I query the database for movies with a rating other than 0, and then order them by rating, with the highest rating first.

Take a look here: Select, Insert, Delete in Flask-SQLAlchemy


How do I get monitor resolution in Python?

Question: How do I get monitor resolution in Python?

What is the simplest way to get monitor resolution (preferably in a tuple)?


Answer 0

On Windows:

from win32api import GetSystemMetrics

print("Width =", GetSystemMetrics(0))
print("Height =", GetSystemMetrics(1))

If you are working with high resolution screen, make sure your python interpreter is HIGHDPIAWARE.

Based on this post.


Answer 1

In Windows, you can also use ctypes with GetSystemMetrics():

import ctypes
user32 = ctypes.windll.user32
screensize = user32.GetSystemMetrics(0), user32.GetSystemMetrics(1)

so that you don’t need to install the pywin32 package; it doesn’t need anything that doesn’t come with Python itself.

For multi-monitor setups, you can retrieve the combined width and height of the virtual monitor:

import ctypes
user32 = ctypes.windll.user32
screensize = user32.GetSystemMetrics(78), user32.GetSystemMetrics(79)
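
The two snippets above can be folded into one helper. This is only a sketch: the function name get_screen_size and the choice to return None outside Windows are assumptions made for illustration, while the index values 0, 1, 78 and 79 are the ones used above (SM_CXSCREEN, SM_CYSCREEN, SM_CXVIRTUALSCREEN, SM_CYVIRTUALSCREEN in the Win32 headers).

```python
import ctypes
import sys

# Index values passed to the Win32 GetSystemMetrics function.
SM_CXSCREEN, SM_CYSCREEN = 0, 1                  # primary monitor
SM_CXVIRTUALSCREEN, SM_CYVIRTUALSCREEN = 78, 79  # all monitors combined


def get_screen_size(virtual=False):
    """Return (width, height) in pixels, or None when not running on Windows."""
    if not sys.platform.startswith("win"):
        return None
    user32 = ctypes.windll.user32
    if virtual:
        return (user32.GetSystemMetrics(SM_CXVIRTUALSCREEN),
                user32.GetSystemMetrics(SM_CYVIRTUALSCREEN))
    return (user32.GetSystemMetrics(SM_CXSCREEN),
            user32.GetSystemMetrics(SM_CYSCREEN))
```

Calling get_screen_size() gives the primary monitor as a tuple, and get_screen_size(virtual=True) gives the combined virtual desktop.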

Answer 2

I created a PyPI module for this reason:

pip install screeninfo

The code:

from screeninfo import get_monitors
for m in get_monitors():
    print(str(m))

Result:

monitor(1920x1080+1920+0)
monitor(1920x1080+0+0)

It supports multi monitor environments. Its goal is to be cross platform; for now it supports Cygwin and X11 but pull requests are totally welcome.


Answer 3

If you’re using wxWindows, you can simply do:

import wx

app = wx.App(False) # the wx.App object must be created first.    
print(wx.GetDisplaySize())  # returns a tuple

Answer 4

Taken directly from an answer to this post: How to get the screen size in Tkinter?

import tkinter as tk

root = tk.Tk()

screen_width = root.winfo_screenwidth()
screen_height = root.winfo_screenheight()

Answer 5

On Windows 8.1, I am not getting the correct resolution from either ctypes or tk. Other people are having the same problem with ctypes: getsystemmetrics returns wrong screen size. To get the correct full resolution of a high-DPI monitor on Windows 8.1, you must call SetProcessDPIAware and use the following code:

import ctypes
user32 = ctypes.windll.user32
user32.SetProcessDPIAware()
[w, h] = [user32.GetSystemMetrics(0), user32.GetSystemMetrics(1)]

Full Details Below:

I found out that this is because windows is reporting a scaled resolution. It appears that python is by default a ‘system dpi aware’ application. Types of DPI aware applications are listed here: http://msdn.microsoft.com/en-us/library/windows/desktop/dn469266%28v=vs.85%29.aspx#dpi_and_the_desktop_scaling_factor

Basically, rather than displaying content at the full monitor resolution, which would make fonts tiny, the content is scaled up until the fonts are big enough.

On my monitor I get:
Physical resolution: 2560 x 1440 (220 DPI)
Reported python resolution: 1555 x 875 (158 DPI)

Per this windows site: http://msdn.microsoft.com/en-us/library/aa770067%28v=vs.85%29.aspx The formula for reported system effective resolution is: (reported_px*current_dpi)/(96 dpi) = physical_px
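
As a quick check of that formula against the numbers in this answer, here is a small sketch. The function names physical_pixels and implied_dpi are made up for illustration; the 96 DPI baseline and the sample values (1555 reported pixels, 2560 physical pixels) come from the text.

```python
STANDARD_DPI = 96.0  # baseline DPI used in the formula above


def physical_pixels(reported_px, current_dpi):
    """Apply (reported_px * current_dpi) / 96 to recover physical pixels."""
    return reported_px * current_dpi / STANDARD_DPI


def implied_dpi(physical_px, reported_px):
    """Rearranged form: the DPI implied by a physical/reported pixel pair."""
    return physical_px * STANDARD_DPI / reported_px


print(round(physical_pixels(1555, 158.045016)))  # reported width -> physical width
print(implied_dpi(2560, 1555))                   # physical and reported width -> DPI
```

The second function is the same rearrangement that the code below uses when it computes curr_dpi = w*96/width_px.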

I’m able to get the correct full screen resolution, and current DPI with the below code. Note that I call SetProcessDPIAware() to allow the program to see the real resolution.

import tkinter as tk
root = tk.Tk()

width_px = root.winfo_screenwidth()
height_px = root.winfo_screenheight() 
width_mm = root.winfo_screenmmwidth()
height_mm = root.winfo_screenmmheight() 
# 25.4 mm = 1 in
width_in = width_mm / 25.4
height_in = height_mm / 25.4
width_dpi = width_px/width_in
height_dpi = height_px/height_in 

print('Width: %i px, Height: %i px' % (width_px, height_px))
print('Width: %i mm, Height: %i mm' % (width_mm, height_mm))
print('Width: %f in, Height: %f in' % (width_in, height_in))
print('Width: %f dpi, Height: %f dpi' % (width_dpi, height_dpi))

import ctypes
user32 = ctypes.windll.user32
user32.SetProcessDPIAware()
[w, h] = [user32.GetSystemMetrics(0), user32.GetSystemMetrics(1)]
print('Size is %f %f' % (w, h))

curr_dpi = w*96/width_px
print('Current DPI is %f' % (curr_dpi))    

Which returned:

Width: 1555 px, Height: 875 px
Width: 411 mm, Height: 232 mm
Width: 16.181102 in, Height: 9.133858 in
Width: 96.099757 dpi, Height: 95.797414 dpi
Size is 2560.000000 1440.000000
Current DPI is 158.045016

I am running Windows 8.1 with a 220 DPI capable monitor. My display scaling sets my current DPI to 158.

I'll use the 158 to make sure my matplotlib plots are the right size with:

from pylab import rcParams
rcParams['figure.dpi'] = curr_dpi


Answer 6

And for completeness, Mac OS X

import AppKit
[(screen.frame().size.width, screen.frame().size.height)
    for screen in AppKit.NSScreen.screens()]

will give you a list of tuples containing all screen sizes (if multiple monitors present)


Answer 7

If you are using the Qt toolkit, specifically PySide, you can do the following:

from PySide import QtGui
import sys

app = QtGui.QApplication(sys.argv)
screen_rect = app.desktop().screenGeometry()
width, height = screen_rect.width(), screen_rect.height()

Answer 8

On Linux, the simplest way is to execute the bash command

xrandr | grep '*'

and parse its output using regexp.

Also you can do it through PyGame: http://www.daniweb.com/forums/thread54881.html
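
A minimal Python sketch of the xrandr approach might look like the following. It assumes that xrandr marks each active mode with a '*' after the refresh rate, which is what the grep above relies on. The helper names and the SAMPLE text are invented for illustration and were not captured from a real machine.

```python
import re
import shutil
import subprocess

# Active modes are flagged with '*' after the refresh rate, e.g.
# "   1920x1080     60.00*+  59.94"
ACTIVE_MODE = re.compile(r"^\s*(\d+)x(\d+)\s.*\*", re.MULTILINE)


def parse_xrandr(text):
    """Return a list of (width, height) tuples for modes marked active."""
    return [(int(w), int(h)) for w, h in ACTIVE_MODE.findall(text)]


def current_resolutions():
    """Run xrandr if it is installed; return [] when it is missing or prints nothing."""
    if shutil.which("xrandr") is None:
        return []
    result = subprocess.run(["xrandr"], capture_output=True, text=True)
    return parse_xrandr(result.stdout)


# Illustrative sample only (an assumption about typical xrandr output).
SAMPLE = """\
Screen 0: minimum 8 x 8, current 1920 x 1080, maximum 32767 x 32767
eDP-1 connected primary 1920x1080+0+0 309mm x 174mm
   1920x1080     60.00*+  59.94
   1280x720      60.00
"""
print(parse_xrandr(SAMPLE))
```

Keeping the parsing separate from the subprocess call makes it possible to test the regular expression without an X server.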


Answer 9

Here is a quick little Python program that will display the information about your multi-monitor setup:

import gtk

window = gtk.Window()

# the screen contains all monitors
screen = window.get_screen()
print "screen size: %d x %d" % (gtk.gdk.screen_width(),gtk.gdk.screen_height())

# collect data about each monitor
monitors = []
nmons = screen.get_n_monitors()
print "there are %d monitors" % nmons
for m in range(nmons):
  mg = screen.get_monitor_geometry(m)
  print "monitor %d: %d x %d" % (m,mg.width,mg.height)
  monitors.append(mg)

# current monitor
curmon = screen.get_monitor_at_window(screen.get_active_window())
x, y, width, height = monitors[curmon]
print "monitor %d: %d x %d (current)" % (curmon,width,height)  

Here’s an example of its output:

screen size: 5120 x 1200
there are 3 monitors
monitor 0: 1600 x 1200
monitor 1: 1920 x 1200
monitor 2: 1600 x 1200
monitor 1: 1920 x 1200 (current)

Answer 10

I am using a get_screen_resolution method like the one below in one of my projects. It is basically an import chain. You can adapt it to your needs by removing the parts you don't need and moving the more likely options up the chain.

PYTHON_V3 = sys.version_info >= (3,0,0) and sys.version_info < (4,0,0)
#[...]
    def get_screen_resolution(self, measurement="px"):
        """
        Tries to detect the screen resolution from the system.
        @param measurement: The measurement to describe the screen resolution in. Can be either 'px', 'inch' or 'mm'. 
        @return: (screen_width,screen_height) where screen_width and screen_height are int types according to measurement.
        """
        mm_per_inch = 25.4
        px_per_inch =  72.0 #most common
        try: # Platforms supported by GTK3, Fx Linux/BSD
            from gi.repository import Gdk 
            screen = Gdk.Screen.get_default()
            if measurement=="px":
                width = screen.get_width()
                height = screen.get_height()
            elif measurement=="inch":
                width = screen.get_width_mm()/mm_per_inch
                height = screen.get_height_mm()/mm_per_inch
            elif measurement=="mm":
                width = screen.get_width_mm()
                height = screen.get_height_mm()
            else:
                raise NotImplementedError("Handling %s is not implemented." % measurement)
            return (width,height)
        except:
            try: #Probably the most OS independent way
                if PYTHON_V3: 
                    import tkinter 
                else:
                    import Tkinter as tkinter
                root = tkinter.Tk()
                if measurement=="px":
                    width = root.winfo_screenwidth()
                    height = root.winfo_screenheight()
                elif measurement=="inch":
                    width = root.winfo_screenmmwidth()/mm_per_inch
                    height = root.winfo_screenmmheight()/mm_per_inch
                elif measurement=="mm":
                    width = root.winfo_screenmmwidth()
                    height = root.winfo_screenmmheight()
                else:
                    raise NotImplementedError("Handling %s is not implemented." % measurement)
                return (width,height)
            except:
                try: #Windows only
                    from win32api import GetSystemMetrics 
                    width_px = GetSystemMetrics (0)
                    height_px = GetSystemMetrics (1)
                    if measurement=="px":
                        return (width_px,height_px)
                    elif measurement=="inch":
                        return (width_px/px_per_inch,height_px/px_per_inch)
                    elif measurement=="mm":
                        return (width_px/mm_per_inch,height_px/mm_per_inch)
                    else:
                        raise NotImplementedError("Handling %s is not implemented." % measurement)
                except:
                    try: # Windows only
                        import ctypes
                        user32 = ctypes.windll.user32
                        width_px = user32.GetSystemMetrics(0)
                        height_px = user32.GetSystemMetrics(1)
                        if measurement=="px":
                            return (width_px,height_px)
                        elif measurement=="inch":
                            return (width_px/px_per_inch,height_px/px_per_inch)
                        elif measurement=="mm":
                            return (width_px/mm_per_inch,height_px/mm_per_inch)
                        else:
                            raise NotImplementedError("Handling %s is not implemented." % measurement)
                    except:
                        try: # Mac OS X only
                            import AppKit 
                            for screen in AppKit.NSScreen.screens():
                                width_px = screen.frame().size.width
                                height_px = screen.frame().size.height
                                if measurement=="px":
                                    return (width_px,height_px)
                                elif measurement=="inch":
                                    return (width_px/px_per_inch,height_px/px_per_inch)
                                elif measurement=="mm":
                                    return (width_px/mm_per_inch,height_px/mm_per_inch)
                                else:
                                    raise NotImplementedError("Handling %s is not implemented." % measurement)
                        except: 
                            try: # Linux/Unix
                                import Xlib.display
                                resolution = Xlib.display.Display().screen().root.get_geometry()
                                width_px = resolution.width
                                height_px = resolution.height
                                if measurement=="px":
                                    return (width_px,height_px)
                                elif measurement=="inch":
                                    return (width_px/px_per_inch,height_px/px_per_inch)
                                elif measurement=="mm":
                                    return (width_px/mm_per_inch,height_px/mm_per_inch)
                                else:
                                    raise NotImplementedError("Handling %s is not implemented." % measurement)
                            except:
                                try: # Linux/Unix
                                    if not self.is_in_path("xrandr"):
                                        raise ImportError("Cannot read the output of xrandr, if any.")
                                    else:
                                        args = ["xrandr", "-q", "-d", ":0"]
                                        proc = subprocess.Popen(args,stdout=subprocess.PIPE)
                                        for line in proc.stdout:
                                            if isinstance(line, bytes):
                                                line = line.decode("utf-8")
                                            if "Screen" in line:
                                                width_px = int(line.split()[7])
                                                height_px = int(line.split()[9][:-1])
                                                if measurement=="px":
                                                    return (width_px,height_px)
                                                elif measurement=="inch":
                                                    return (width_px/px_per_inch,height_px/px_per_inch)
                                                elif measurement=="mm":
                                                    return (width_px/mm_per_inch,height_px/mm_per_inch)
                                                else:
                                                    raise NotImplementedError("Handling %s is not implemented." % measurement)
                                except:
                                    # Failover
                                    screensize = 1366, 768
                                    sys.stderr.write("WARNING: Failed to detect screen size. Falling back to %sx%s" % screensize)
                                    if measurement=="px":
                                        return screensize
                                    elif measurement=="inch":
                                        return (screensize[0]/px_per_inch,screensize[1]/px_per_inch)
                                    elif measurement=="mm":
                                        return (screensize[0]/mm_per_inch,screensize[1]/mm_per_inch)
                                    else:
                                        raise NotImplementedError("Handling %s is not implemented." % measurement)
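
The same "try each backend in turn" idea can be written without the deep nesting. The sketch below is not the author's code: first_successful, _from_tkinter, _from_ctypes and get_screen_resolution_px are hypothetical names, only two of the backends are shown, and only pixel output is handled. The default of 1366 x 768 mirrors the failover value used above.

```python
def first_successful(providers, default):
    """Call each provider in order; return the first result that does not raise."""
    for provider in providers:
        try:
            return provider()
        except Exception:
            continue  # this backend is unavailable, try the next one
    return default


def _from_tkinter():
    import tkinter
    root = tkinter.Tk()  # raises if no display is available
    try:
        return root.winfo_screenwidth(), root.winfo_screenheight()
    finally:
        root.destroy()


def _from_ctypes():
    import ctypes
    user32 = ctypes.windll.user32  # AttributeError outside Windows
    return user32.GetSystemMetrics(0), user32.GetSystemMetrics(1)


def get_screen_resolution_px(default=(1366, 768)):
    """Pixel-only variant of the chain; the default mirrors the failover above."""
    return first_successful([_from_tkinter, _from_ctypes], default)
```

Reordering the chain then only means reordering the list passed to first_successful.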

Answer 11

Old question, but this is missing. I'm new to Python, so please tell me if this is a "bad" solution. This solution is supported on Windows and macOS only, and it works just for the main screen, but the OS is not mentioned in the question.

Measure the size by taking a screenshot. As the screen size should not change, this has to be done only once. There are more elegant solutions if you have a GUI toolkit such as GTK or wx installed.

See Pillow:

pip install Pillow

from PIL import ImageGrab

img = ImageGrab.grab()
print (img.size)

Answer 12

XWindows version:

#!/usr/bin/python

import Xlib
import Xlib.display

resolution = Xlib.display.Display().screen().root.get_geometry()
print(str(resolution.width) + "x" + str(resolution.height))

Answer 13

Expanding on @user2366975’s answer, to get the current screen size in a multi-screen setup using Tkinter (code in Python 2/3):

try:
    # for Python 3
    import tkinter as tk
except ImportError:
    # for Python 2
    import Tkinter as tk


def get_curr_screen_geometry():
    """
    Workaround to get the size of the current screen in a multi-screen setup.

    Returns:
        geometry (str): The standard Tk geometry string.
            [width]x[height]+[left]+[top]
    """
    root = tk.Tk()
    root.update_idletasks()
    root.attributes('-fullscreen', True)
    root.state('iconic')
    geometry = root.winfo_geometry()
    root.destroy()
    return geometry

(Should work cross-platform, tested on Linux only)
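The function above returns a Tk geometry string of the form [width]x[height]+[left]+[top]. A minimal sketch of turning that string into integers, assuming Python 3 and assuming negative offsets appear as "+-10" (the helper name parse_tk_geometry is hypothetical, not part of Tkinter):

```python
import re


def parse_tk_geometry(geometry):
    """Split a Tk geometry string such as '1920x1080+0+0' into four integers."""
    # Assumed layout: WIDTHxHEIGHT+LEFT+TOP, where LEFT/TOP may be negative.
    match = re.fullmatch(r"(\d+)x(\d+)\+(-?\d+)\+(-?\d+)", geometry)
    if match is None:
        raise ValueError("unexpected geometry string: %r" % geometry)
    width, height, left, top = (int(part) for part in match.groups())
    return width, height, left, top


print(parse_tk_geometry("1920x1080+0+0"))
```

In practice the argument would be the value returned by get_curr_screen_geometry().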


回答 14

尝试以下代码:

import subprocess
# communicate() returns bytes on Python 3, so decode before splitting
output = subprocess.Popen(['xrandr'], stdout=subprocess.PIPE).communicate()[0].decode()
results = output.split("current")[1].split(",")[0]
width = results.split("x")[0].strip()
height = results.split("x")[1].strip()
print(width + "x" + height)

Try the following code:

import subprocess
# communicate() returns bytes on Python 3, so decode before splitting
output = subprocess.Popen(['xrandr'], stdout=subprocess.PIPE).communicate()[0].decode()
results = output.split("current")[1].split(",")[0]
width = results.split("x")[0].strip()
height = results.split("x")[1].strip()
print(width + "x" + height)

回答 15

如果您已安装PyQt4,请尝试以下代码:

from PyQt4 import QtGui
import sys

MyApp = QtGui.QApplication(sys.argv)
V = MyApp.desktop().screenGeometry()
h = V.height()
w = V.width()
print("The screen resolution (width X height) is the following:")
print(str(w) + "X" + str(h))

对于PyQt5,以下操作将起作用:

from PyQt5 import QtWidgets
import sys

MyApp = QtWidgets.QApplication(sys.argv)
V = MyApp.desktop().screenGeometry()
h = V.height()
w = V.width()
print("The screen resolution (width X height) is the following:")
print(str(w) + "X" + str(h))

In case you have PyQt4 installed, try the following code:

from PyQt4 import QtGui
import sys

MyApp = QtGui.QApplication(sys.argv)
V = MyApp.desktop().screenGeometry()
h = V.height()
w = V.width()
print("The screen resolution (width X height) is the following:")
print(str(w) + "X" + str(h))

For PyQt5, the following will work:

from PyQt5 import QtWidgets
import sys

MyApp = QtWidgets.QApplication(sys.argv)
V = MyApp.desktop().screenGeometry()
h = V.height()
w = V.width()
print("The screen resolution (width X height) is the following:")
print(str(w) + "X" + str(h))

回答 16

一个跨平台的简单方法是使用几乎所有python版本都随附的TKinter,因此您无需安装任何东西:

import tkinter
root = tkinter.Tk()
root.withdraw()
WIDTH, HEIGHT = root.winfo_screenwidth(), root.winfo_screenheight()

A cross-platform and easy way to do this is to use Tkinter, which ships with nearly all Python versions, so you don’t have to install anything:

import tkinter
root = tkinter.Tk()
root.withdraw()
WIDTH, HEIGHT = root.winfo_screenwidth(), root.winfo_screenheight()

回答 17

使用Linux使用regexp代替第一行,并取出当前的分辨率值。

显示器当前分辨率:0

>>> import os
>>> screen = os.popen("xrandr -q -d :0").readlines()[0]
>>> print(screen)
Screen 0: minimum 320 x 200, current 1920 x 1080, maximum 1920 x 1920
>>> width = screen.split()[7]
>>> print(width)
1920
>>> height = screen.split()[9][:-1]
>>> print(height)
1080
>>> print("Current resolution is %s x %s" % (width, height))
Current resolution is 1920 x 1080

这是在xrandr 1.3.5上完成的,我不知道其他版本的输出是否不同,但这应该可以很容易地弄清楚。

On Linux, instead of using a regexp, take the first line of the xrandr output and pull out the current resolution values.

Current resolution of display :0

>>> import os
>>> screen = os.popen("xrandr -q -d :0").readlines()[0]
>>> print(screen)
Screen 0: minimum 320 x 200, current 1920 x 1080, maximum 1920 x 1920
>>> width = screen.split()[7]
>>> print(width)
1920
>>> height = screen.split()[9][:-1]
>>> print(height)
1080
>>> print("Current resolution is %s x %s" % (width, height))
Current resolution is 1920 x 1080

This was done on xrandr 1.3.5, I don’t know if the output is different on other versions, but this should make it easy to figure out.
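Since the answer is unsure whether other xrandr versions format the line identically, a regular expression may be less brittle than fixed word positions. A minimal sketch, assuming the line still contains "current W x H" (the helper name parse_xrandr_screen_line is hypothetical):

```python
import re


def parse_xrandr_screen_line(line):
    """Return (width, height) from an xrandr 'Screen N:' line, or None if absent."""
    # Tolerates variable spacing around the 'x' separator.
    match = re.search(r"current\s+(\d+)\s*x\s*(\d+)", line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Sample line copied from the session above.
sample = "Screen 0: minimum 320 x 200, current 1920 x 1080, maximum 1920 x 1920"
print(parse_xrandr_screen_line(sample))
```

Unlike the split-based version, this returns integers and signals a missing match with None rather than raising an IndexError.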


回答 18

要获得每个像素的位数:

import ctypes
user32 = ctypes.windll.user32
gdi32 = ctypes.windll.gdi32

screensize = (user32.GetSystemMetrics(0), user32.GetSystemMetrics(1))
print("screensize =%s" % (str(screensize)))
dc = user32.GetDC(None)

screensize = (gdi32.GetDeviceCaps(dc,8), gdi32.GetDeviceCaps(dc,10), gdi32.GetDeviceCaps(dc,12))
print("screensize =%s" % (str(screensize)))
screensize = (gdi32.GetDeviceCaps(dc,118), gdi32.GetDeviceCaps(dc,117), gdi32.GetDeviceCaps(dc,12))
print("screensize =%s" % (str(screensize)))

gdi32中的参数:

#/// Vertical height of entire desktop in pixels
#DESKTOPVERTRES = 117,
#/// Horizontal width of entire desktop in pixels
#DESKTOPHORZRES = 118,
#/// Horizontal width in pixels
#HORZRES = 8,
#/// Vertical height in pixels
#VERTRES = 10,
#/// Number of bits per pixel
#BITSPIXEL = 12,

To get bits per pixel:

import ctypes
user32 = ctypes.windll.user32
gdi32 = ctypes.windll.gdi32

screensize = (user32.GetSystemMetrics(0), user32.GetSystemMetrics(1))
print("screensize =%s" % (str(screensize)))
dc = user32.GetDC(None)

screensize = (gdi32.GetDeviceCaps(dc,8), gdi32.GetDeviceCaps(dc,10), gdi32.GetDeviceCaps(dc,12))
print("screensize =%s" % (str(screensize)))
screensize = (gdi32.GetDeviceCaps(dc,118), gdi32.GetDeviceCaps(dc,117), gdi32.GetDeviceCaps(dc,12))
print("screensize =%s" % (str(screensize)))

parameters in gdi32:

#/// Vertical height of entire desktop in pixels
#DESKTOPVERTRES = 117,
#/// Horizontal width of entire desktop in pixels
#DESKTOPHORZRES = 118,
#/// Horizontal width in pixels
#HORZRES = 8,
#/// Vertical height in pixels
#VERTRES = 10,
#/// Number of bits per pixel
#BITSPIXEL = 12,

回答 19

尝试pyautogui:

import pyautogui
resolution = pyautogui.size()
print(resolution) 

Try pyautogui:

import pyautogui
resolution = pyautogui.size()
print(resolution) 

回答 20

使用的另一个版本xrandr

import re
from subprocess import run, PIPE

output = run(['xrandr'], stdout=PIPE).stdout.decode()
result = re.search(r'current (\d+) x (\d+)', output)
width, height = map(int, result.groups()) if result else (800, 600)

Another version using xrandr:

import re
from subprocess import run, PIPE

output = run(['xrandr'], stdout=PIPE).stdout.decode()
result = re.search(r'current (\d+) x (\d+)', output)
width, height = map(int, result.groups()) if result else (800, 600)

回答 21

使用pygame

import pygame
pygame.init()
infos = pygame.display.Info()
screen_size = (infos.current_w, infos.current_h)

[1]

但是,如果尝试将窗口设置为屏幕大小,则可能只想执行以下操作:

pygame.display.set_mode((0,0),pygame.FULLSCREEN)

将您的显示设置为全屏模式。[2]

Using pygame:

import pygame
pygame.init()
infos = pygame.display.Info()
screen_size = (infos.current_w, infos.current_h)

[1]

However, if you’re trying to set your window to the size of the screen, you might just want to do:

pygame.display.set_mode((0,0),pygame.FULLSCREEN)

to set your display to fullscreen mode. [2]


回答 22

您可以使用PyMouse。要获取屏幕尺寸,只需使用screen_size()属性:

from pymouse import PyMouse
m = PyMouse()
a = m.screen_size()

a将返回一个元组,(X, Y)其中X水平位置和Y垂直位置。

链接到文档中的功能。

You could use PyMouse. To get the screen size, just call the screen_size() method:

from pymouse import PyMouse
m = PyMouse()
a = m.screen_size()

a is a tuple (X, Y), where X is the horizontal size and Y is the vertical size.

Link to function in documentation.


回答 23

如果您使用的是Windows操作系统,则可以使用OS模块来获取它:

import os
cmd = 'wmic desktopmonitor get screenheight, screenwidth'
size_tuple = tuple(map(int,os.popen(cmd).read().split()[-2::]))

它将返回一个元组(Y,X),其中Y是垂直大小,X是水平大小。此代码适用于Python 2和Python 3

If you are working on Windows, you can use the os module to get it:

import os
cmd = 'wmic desktopmonitor get screenheight, screenwidth'
size_tuple = tuple(map(int,os.popen(cmd).read().split()[-2::]))

It will return a tuple (Y,X) where Y is the vertical size and X is the horizontal size. This code works on Python 2 and Python 3


回答 24

在Linux上,我们可以使用子流程模块

import subprocess
cmd = ['xrandr']
cmd2 = ['grep', '*']
p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
p2 = subprocess.Popen(cmd2, stdin=p.stdout, stdout=subprocess.PIPE)
p.stdout.close()

resolution_string, junk = p2.communicate()
resolution = resolution_string.split()[0]
resolution = resolution.decode("utf-8") 
width = int(resolution.split("x")[0].strip())
height = int(resolution.split("x")[1].strip())

On Linux, we can use the subprocess module:

import subprocess
cmd = ['xrandr']
cmd2 = ['grep', '*']
p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
p2 = subprocess.Popen(cmd2, stdin=p.stdout, stdout=subprocess.PIPE)
p.stdout.close()

resolution_string, junk = p2.communicate()
resolution = resolution_string.split()[0]
resolution = resolution.decode("utf-8") 
width = int(resolution.split("x")[0].strip())
height = int(resolution.split("x")[1].strip())

回答 25

对于视网膜屏幕来说有点麻烦,我使用tkinter来获取假尺寸,使用Pillow抓取来获取实际尺寸:

import tkinter
from PIL import ImageGrab
root = tkinter.Tk()
resolution_width = root.winfo_screenwidth()
resolution_height = root.winfo_screenheight()
image = ImageGrab.grab()
real_width, real_height = image.width, image.height
ratio_width = real_width / resolution_width
ratio_height = real_height/ resolution_height

It’s a little troublesome for Retina screens: I use tkinter to get the fake (scaled) size and Pillow’s ImageGrab to get the real size:

import tkinter
from PIL import ImageGrab

root = tkinter.Tk()
resolution_width = root.winfo_screenwidth()
resolution_height = root.winfo_screenheight()
image = ImageGrab.grab()
real_width, real_height = image.width, image.height
ratio_width = real_width / resolution_width
ratio_height = real_height/ resolution_height

回答 26

对于更高版本的PyGtk:

import gi
gi.require_version("Gdk", "3.0")
from gi.repository import Gdk

display = Gdk.Display.get_default()
n_monitors = display.get_n_monitors()
print("there are %d monitors" % n_monitors)
for m in range(n_monitors):
  monitor = display.get_monitor(m)
  geometry = monitor.get_geometry()
  print("monitor %d: %d x %d" % (m, geometry.width, geometry.height))

For later versions of PyGtk:

import gi
gi.require_version("Gdk", "3.0")
from gi.repository import Gdk

display = Gdk.Display.get_default()
n_monitors = display.get_n_monitors()
print("there are %d monitors" % n_monitors)
for m in range(n_monitors):
  monitor = display.get_monitor(m)
  geometry = monitor.get_geometry()
  print("monitor %d: %d x %d" % (m, geometry.width, geometry.height))

回答 27

对于Linux,您可以使用以下命令:

import gi
gi.require_version("Gdk", "3.0")
from gi.repository import Gdk

s = Gdk.Screen.get_default()
screen_width = s.get_width()
screen_height = s.get_height()
print(screen_width)
print(screen_height)

For Linux, you can use this:

import gi
gi.require_version("Gdk", "3.0")
from gi.repository import Gdk

s = Gdk.Screen.get_default()
screen_width = s.get_width()
screen_height = s.get_height()
print(screen_width)
print(screen_height)

回答 28

使用pynput库的实用程序脚本。在此处发布参考:

 
from pynput.mouse import Controller as MouseController

def get_screen_size():
    """Utility function to get screen resolution"""

    mouse = MouseController()

    width = height = 0

    def _reset_mouse_position():
        # Move the mouse to the top left of 
        # the screen
        mouse.position = (0, 0)

    # Reset mouse position
    _reset_mouse_position()

    count = 0
    while 1:
        count += 1
        mouse.move(count, 0)
        
        # Get the current position of the mouse
        left = mouse.position[0]

        # If the left doesn't change anymore, then
        # that's the screen resolution's width
        if width == left:
            # Add the last pixel
            width += 1

            # Reset count for use for height
            count = 0
            break

        # On each iteration, assign the left to 
        # the width
        width = left
    
    # Reset mouse position
    _reset_mouse_position()

    while 1:
        count += 1
        mouse.move(0, count)

        # Get the current position of the mouse
        right = mouse.position[1]

        # If the right doesn't change anymore, then
        # that's the screen resolution's height
        if height == right:
            # Add the last pixel
            height += 1
            break

        # On each iteration, assign the right to 
        # the height
        height = right

    return width, height

>>> get_screen_size()
(1920, 1080)

Utility script using pynput library. Posting here for ref.:

 
from pynput.mouse import Controller as MouseController

def get_screen_size():
    """Utility function to get screen resolution"""

    mouse = MouseController()

    width = height = 0

    def _reset_mouse_position():
        # Move the mouse to the top left of 
        # the screen
        mouse.position = (0, 0)

    # Reset mouse position
    _reset_mouse_position()

    count = 0
    while 1:
        count += 1
        mouse.move(count, 0)
        
        # Get the current position of the mouse
        left = mouse.position[0]

        # If the left doesn't change anymore, then
        # that's the screen resolution's width
        if width == left:
            # Add the last pixel
            width += 1

            # Reset count for use for height
            count = 0
            break

        # On each iteration, assign the left to 
        # the width
        width = left
    
    # Reset mouse position
    _reset_mouse_position()

    while 1:
        count += 1
        mouse.move(0, count)

        # Get the current position of the mouse
        right = mouse.position[1]

        # If the right doesn't change anymore, then
        # that's the screen resolution's height
        if height == right:
            # Add the last pixel
            height += 1
            break

        # On each iteration, assign the right to 
        # the height
        height = right

    return width, height

>>> get_screen_size()
(1920, 1080)

如何估算熊猫的DataFrame需要多少内存?

问题:如何估算熊猫的DataFrame需要多少内存?

我一直在想…如果我正在将400MB的csv文件读入熊猫数据帧(使用read_csv或read_table),是否有任何方法可以估算出这将需要多少内存?只是试图更好地了解数据帧和内存…

I have been wondering… If I am reading, say, a 400MB csv file into a pandas dataframe (using read_csv or read_table), is there any way to guesstimate how much memory this will need? Just trying to get a better feel of data frames and memory…


回答 0

df.memory_usage() 将返回每列占用多少:

>>> df.memory_usage()

Row_ID            20906600
Household_ID      20906600
Vehicle           20906600
Calendar_Year     20906600
Model_Year        20906600
...

要包含索引,请传递index=True

因此,要获得整体内存消耗:

>>> df.memory_usage(index=True).sum()
731731000

此外,传递deep=True将启用更准确的内存使用情况报告,该报告说明了所包含对象的全部使用情况。

这是因为内存使用量不包括非数组元素if占用的内存deep=False(默认情况下)。

df.memory_usage() will return how many bytes each column occupies:

>>> df.memory_usage()

Row_ID            20906600
Household_ID      20906600
Vehicle           20906600
Calendar_Year     20906600
Model_Year        20906600
...

To include indexes, pass index=True.

So to get overall memory consumption:

>>> df.memory_usage(index=True).sum()
731731000

Also, passing deep=True will enable a more accurate memory usage report, that accounts for the full usage of the contained objects.

This is because memory usage does not include memory consumed by elements that are not components of the array if deep=False (default case).


回答 1

这是不同方法的比较- sys.getsizeof(df)最简单。

对于此示例,df是一个具有814行,11列(2个整数,9个对象)的数据帧-从427kb shapefile中读取

sys.getsizeof(df)

>>> import sys
>>> sys.getsizeof(df)
(给出的结果以字节为单位)
462456

df.memory_usage()

>>> df.memory_usage()
...
(以8字节/行列出每一列)

>>> df.memory_usage().sum()
71712
(大约行*列* 8字节)

>>> df.memory_usage(deep=True)
(列出每列的全部内存使用情况)

>>> df.memory_usage(deep=True).sum()
(给出的结果以字节为单位)
462432

df.info()

将数据框信息打印到标准输出。从技术上讲,它们是千字节(KiB),而不是千字节-正如文档字符串所说,“内存使用情况以人类可读的单位(以2为基数的表示形式)显示”。因此,要获取字节将乘以1024,例如451.6 KiB = 462,438字节。

>>> df.info()
...
memory usage: 70.0+ KB

>>> df.info(memory_usage='deep')
...
memory usage: 451.6 KB

Here’s a comparison of the different methods – sys.getsizeof(df) is simplest.

For this example, df is a dataframe with 814 rows, 11 columns (2 ints, 9 objects) – read from a 427kb shapefile

sys.getsizeof(df)

>>> import sys
>>> sys.getsizeof(df)
(gives results in bytes)
462456

df.memory_usage()

>>> df.memory_usage()
...
(lists each column at 8 bytes/row)

>>> df.memory_usage().sum()
71712
(roughly rows * cols * 8 bytes)

>>> df.memory_usage(deep=True)
(lists each column's full memory usage)

>>> df.memory_usage(deep=True).sum()
(gives results in bytes)
462432

df.info()

Prints dataframe info to stdout. Technically these are kibibytes (KiB), not kilobytes – as the docstring says, “Memory usage is shown in human-readable units (base-2 representation).” So to get bytes, multiply by 1024, e.g. 451.6 KiB ≈ 462,438 bytes.

>>> df.info()
...
memory usage: 70.0+ KB

>>> df.info(memory_usage='deep')
...
memory usage: 451.6 KB
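The KiB-to-bytes conversion mentioned above is plain arithmetic. A minimal sketch, where kib_to_bytes is a hypothetical helper and not a pandas function:

```python
def kib_to_bytes(kib):
    """Convert kibibytes (1 KiB = 1024 bytes) to a whole number of bytes."""
    return int(round(kib * 1024))


# The two figures reported by df.info() in this answer.
print(kib_to_bytes(451.6))
print(kib_to_bytes(70.0))
```

The results land close to the byte counts reported by memory_usage(deep=True).sum() and memory_usage().sum() respectively; the small differences come from df.info() rounding to one decimal place.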

回答 2

我想我可以带一些更多的数据来讨论。

我对此问题进行了一系列测试。

通过使用python resource包,我得到了进程的内存使用情况。

通过将csv写入StringIO缓冲区,我可以轻松地以字节为单位测量它的大小。

我进行了两个实验,每个实验创建20个数据框,这些数据框的大小在10,000行和1,000,000行之间递增。两者都有10列。

在第一个实验中,我仅在数据集中使用浮点数。

与csv文件相比,这是内存随行数变化的方式。(以兆字节为单位)

第二个实验我采用了相同的方法,但是数据集中的数据仅包含短字符串。

似乎csv的大小与数据帧的大小之间的关系可以相差很多,但是内存中的大小将始终以2-3的倍数增大(对于本实验中的帧大小)

我希望通过更多实验来完成此答案,如果您想让我尝试一些特别的事情,请发表评论。

I thought I would bring some more data to the discussion.

I ran a series of tests on this issue.

By using the python resource package I got the memory usage of my process.

And by writing the csv into a StringIO buffer, I could easily measure the size of it in bytes.

I ran two experiments, each one creating 20 dataframes of increasing sizes between 10,000 lines and 1,000,000 lines. Both having 10 columns.

In the first experiment I used only floats in my dataset.

This is how the memory increased in comparison to the csv file as a function of the number of lines. (Size in Megabytes)

The second experiment I had the same approach, but the data in the dataset consisted of only short strings.

It seems that the relation of the size of the csv and the size of the dataframe can vary quite a lot, but the size in memory will always be bigger by a factor of 2-3 (for the frame sizes in this experiment)

I would love to complete this answer with more experiments, please comment if you want me to try something special.
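The CSV-size half of that measurement can be sketched with the standard library alone. This is an illustrative stand-in that uses the csv module rather than pandas’ own to_csv, and csv_size_in_bytes is a hypothetical helper name; the process-memory half (the resource package) is not reproduced here.

```python
import csv
import io


def csv_size_in_bytes(rows):
    """Serialise rows as CSV into an in-memory buffer and return its size in bytes."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    # Encode so that the result counts bytes, not characters.
    return len(buffer.getvalue().encode("utf-8"))


rows = [[1.5, 2.5], [3.5, 4.5]]
print(csv_size_in_bytes(rows))
```

Comparing this number with an in-memory size estimate for the same rows gives the kind of ratio the experiments above describe.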


回答 3

您必须反向执行此操作。

In [4]: DataFrame(randn(1000000,20)).to_csv('test.csv')

In [5]: !ls -ltr test.csv
-rw-rw-r-- 1 users 399508276 Aug  6 16:55 test.csv

从技术上讲,内存与此有关(包括索引)

In [16]: df.values.nbytes + df.index.nbytes + df.columns.nbytes
Out[16]: 168000160

内存为168MB,文件大小为400MB,1M行包含20个浮点数

DataFrame(randn(1000000,20)).to_hdf('test.h5','df')

!ls -ltr test.h5
-rw-rw-r-- 1 users 168073944 Aug  6 16:57 test.h5

作为二进制HDF5文件写入时,更加紧凑

In [12]: DataFrame(randn(1000000,20)).to_hdf('test.h5','df',complevel=9,complib='blosc')

In [13]: !ls -ltr test.h5
-rw-rw-r-- 1 users 154727012 Aug  6 16:58 test.h5

数据是随机的,因此压缩没有太大帮助

You have to do this in reverse.

In [4]: DataFrame(randn(1000000,20)).to_csv('test.csv')

In [5]: !ls -ltr test.csv
-rw-rw-r-- 1 users 399508276 Aug  6 16:55 test.csv

Technically memory is about this (which includes the indexes)

In [16]: df.values.nbytes + df.index.nbytes + df.columns.nbytes
Out[16]: 168000160

So 168MB in memory with a 400MB file, 1M rows of 20 float columns

DataFrame(randn(1000000,20)).to_hdf('test.h5','df')

!ls -ltr test.h5
-rw-rw-r-- 1 users 168073944 Aug  6 16:57 test.h5

MUCH more compact when written as a binary HDF5 file

In [12]: DataFrame(randn(1000000,20)).to_hdf('test.h5','df',complevel=9,complib='blosc')

In [13]: !ls -ltr test.h5
-rw-rw-r-- 1 users 154727012 Aug  6 16:58 test.h5

The data was random, so compression doesn’t help too much


回答 4

如果知道dtype数组的,则可以直接计算存储数据所需的字节数+ Python对象本身的字节数。numpy数组的有用属性是nbytes。您可以DataFrame通过执行以下操作从熊猫数组中获取字节数

nbytes = sum(block.values.nbytes for block in df.blocks.values())

objectdtype数组为每个对象存储8个字节(对象dtype数组存储指向opaque的指针PyObject),因此如果csv中有字符串,则需要考虑read_csv将这些字符串转换为objectdtype数组并相应地调整计算的情况。

编辑:

有关的更多详细信息,请参见numpy标量类型页面object dtype。由于仅存储一个引用,因此您还需要考虑数组中对象的大小。如该页面所述,对象数组在某种程度上类似于Python list对象。

If you know the dtypes of your array then you can directly compute the number of bytes that it will take to store your data + some for the Python objects themselves. A useful attribute of numpy arrays is nbytes. You can get the number of bytes from the arrays in a pandas DataFrame by doing

nbytes = sum(block.values.nbytes for block in df.blocks.values())

object dtype arrays store 8 bytes per object (object dtype arrays store a pointer to an opaque PyObject), so if you have strings in your csv you need to take into account that read_csv will turn those into object dtype arrays and adjust your calculations accordingly.

EDIT:

See the numpy scalar types page for more details on the object dtype. Since only a reference is stored you need to take into account the size of the object in the array as well. As that page says, object arrays are somewhat similar to Python list objects.
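Because the comparison is to Python list objects, the "references only" effect can be illustrated with a plain list. A minimal sketch using only the standard library; the exact numbers depend on the Python build, so none are quoted here:

```python
import sys

# 1000 distinct string objects, standing in for an object-dtype column.
values = ["row-%06d" % i for i in range(1000)]

# Shallow size: the list object and its array of pointers only.
shallow = sys.getsizeof(values)

# Deep size: additionally count every string the pointers refer to.
deep = shallow + sum(sys.getsizeof(v) for v in values)

print(shallow, deep)
```

The gap between the two numbers is the part that a pointer-only estimate of 8 bytes per object leaves out.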


回答 5

就在这里。熊猫会将您的数据存储在二维numpy ndarray结构中,并按dtypes将其分组。ndarray基本上是带有小标头的原始C数据数组。因此,您可以通过将dtype其包含的大小乘以数组的大小来估算其大小。

例如:如果您有1000行2 列np.int32和5 np.float64列,则DataFrame将具有np.int32一个2×1000 np.float64数组和一个5×1000 数组,即:

4bytes * 2 * 1000 + 8bytes * 5 * 1000 = 48000字节

Yes, there is. Pandas stores your data in 2-dimensional numpy ndarray structures, grouping them by dtype. An ndarray is basically a raw C array of data with a small header, so you can estimate its size just by multiplying the size of the dtype it contains by the dimensions of the array.

For example: if you have 1000 rows with 2 np.int32 and 5 np.float64 columns, your DataFrame will have one 2×1000 np.int32 array and one 5×1000 np.float64 array which is:

4bytes*2*1000 + 8bytes*5*1000 = 48000 bytes
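The same arithmetic as a small function. This is a sketch: the ITEMSIZE table hard-codes the usual byte widths instead of querying numpy, and estimate_frame_bytes is a hypothetical name.

```python
# Assumed byte widths of a few common fixed-size dtypes.
ITEMSIZE = {"int32": 4, "int64": 8, "float32": 4, "float64": 8}


def estimate_frame_bytes(n_rows, columns_per_dtype):
    """Estimate raw data size: itemsize * number of columns * number of rows, per dtype."""
    return sum(
        ITEMSIZE[dtype] * n_columns * n_rows
        for dtype, n_columns in columns_per_dtype.items()
    )


# The example from the answer: 1000 rows, 2 int32 columns, 5 float64 columns.
print(estimate_frame_bytes(1000, {"int32": 2, "float64": 5}))
```

This ignores the index, the small array headers, and anything stored as object dtype, so it is a lower bound rather than an exact figure.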


回答 6

我相信这可以为python中的任何对象提供内存中的大小。需要检查熊猫和numpy的内部

>>> import sys
#assuming the dataframe to be df 
>>> sys.getsizeof(df) 
59542497

I believe this gives the in-memory size of any object in Python. The internals need to be checked with regard to pandas and numpy.

>>> import sys
#assuming the dataframe to be df 
>>> sys.getsizeof(df) 
59542497

如何在IPython Notebook中打开交互式matplotlib窗口?

问题:如何在IPython Notebook中打开交互式matplotlib窗口?

我正在使用IPython,--pylab=inline有时想快速切换到交互式可缩放的matplotlib GUI来查看图(在终端Python控制台中绘制图时会弹出的图)。我该怎么办?最好不要离开或重新启动笔记本。

IPy笔记本中的内联绘图的问题在于它们的分辨率有限,我无法放大以查看一些较小的部分。使用从终端启动的matplotlib GUI,我可以选择要放大的图形矩形,并相应地调整轴。我尝试过

from matplotlib import interactive
interactive(True)

interactive(False)

但这什么也没做。我在网上也找不到任何提示。

I am using IPython with --pylab=inline and would sometimes like to quickly switch to the interactive, zoomable matplotlib GUI for viewing plots (the one that pops up when you plot something in a terminal Python console). How could I do that? Preferably without leaving or restarting my notebook.

The problem with inline plots in IPy notebook is that they are of a limited resolution and I can’t zoom into them to see some smaller parts. With the matplotlib GUI that starts from a terminal, I can select a rectangle of the graph that I want to zoom into and the axes adjust accordingly. I tried experimenting with

from matplotlib import interactive
interactive(True)

and

interactive(False)

but that didn’t do anything. I couldn’t find any hint online either.


回答 0

根据文档,您应该能够像这样来回切换:

In [2]: %matplotlib inline 
In [3]: plot(...)

In [4]: %matplotlib qt  # wx, gtk, osx, tk, empty uses default
In [5]: plot(...) 

然后会弹出一个常规绘图窗口(可能需要在笔记本计算机上重新启动)。

我希望这有帮助。

According to the documentation, you should be able to switch back and forth like this:

In [2]: %matplotlib inline 
In [3]: plot(...)

In [4]: %matplotlib qt  # wx, gtk, osx, tk, empty uses default
In [5]: plot(...) 

and that will pop up a regular plot window (a restart on the notebook may be necessary).

I hope this helps.


回答 1

如果您要做的只是从内联图切换到交互式图,然后再切换回去(以便可以平移/缩放),则最好使用%matplotlib magic。

#interactive plotting in separate window
%matplotlib qt 

然后返回html

#normal charts inside notebooks
%matplotlib inline 

%pylab magic会导入很多其他内容,甚至可能导致冲突。它执行“从pylab导入*”。

您还可以使用新的笔记本后端(在matplotlib 1.4中添加):

#interactive charts inside notebooks, matplotlib 1.4+
%matplotlib notebook 

如果您想在图表中增加交互性,可以查看mpld3和bokeh。mpld3很棒,如果您没有大量数据点(例如<5k+),并且您想要使用普通的matplotlib语法,但与%matplotlib notebook相比,则具有更多的交互性。Bokeh可以处理大量数据,但是您需要学习它的语法,因为它是一个单独的库。

你也可以签出pivottablejs(pip install pivottablejs)

from pivottablejs import pivot_ui
pivot_ui(df)

不管是多么酷的交互式数据探索,它都完全会破坏可重复性。它发生在我身上,所以一旦我感觉到数据,我就尝试只在早期就使用它,并切换到纯内联matplotlib / seaborn。

If all you want to do is to switch from inline plots to interactive and back (so that you can pan/zoom), it is better to use %matplotlib magic.

#interactive plotting in separate window
%matplotlib qt 

and back to html

#normal charts inside notebooks
%matplotlib inline 

%pylab magic imports a bunch of other things and may even result in a conflict. It does “from pylab import *”.

You also can use new notebook backend (added in matplotlib 1.4):

#interactive charts inside notebooks, matplotlib 1.4+
%matplotlib notebook 

If you want to have more interactivity in your charts, you can look at mpld3 and bokeh. mpld3 is great if you don’t have tons of data points (e.g. <5k+) and you want to use normal matplotlib syntax but with more interactivity than %matplotlib notebook. Bokeh can handle lots of data, but you need to learn its syntax, as it is a separate library.

Also you can check out pivottablejs (pip install pivottablejs)

from pivottablejs import pivot_ui
pivot_ui(df)

However cool interactive data exploration is, it can totally mess with reproducibility. It has happened to me, so I try to use it only at the very early stage and switch to pure inline matplotlib/seaborn, once I got the feel for the data.


回答 2

从matplotlib 1.4.0开始,现在有一个用于笔记本的交互式后端

%matplotlib notebook

有一些版本的IPython尚未注册该别名,回退是:

%matplotlib nbagg

如果那不起作用,请更新您的IPython。

要玩这个游戏,请转到tmpnb.org

并粘贴

%matplotlib notebook

import pandas as pd
import numpy as np
import matplotlib

from matplotlib import pyplot as plt
import seaborn as sns

ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
ts = ts.cumsum()

df = pd.DataFrame(np.random.randn(1000, 4), index=ts.index,
                  columns=['A', 'B', 'C', 'D'])
df = df.cumsum()
df.plot(); plt.legend(loc='best')    

进入代码单元(或仅修改现有的python演示笔记本)

Starting with matplotlib 1.4.0, there is now an interactive backend for use in the notebook:

%matplotlib notebook

There are a few versions of IPython which do not have that alias registered; the fallback is:

%matplotlib nbagg

If that does not work, update your IPython.

To play with this, go to tmpnb.org

and paste

%matplotlib notebook

import pandas as pd
import numpy as np
import matplotlib

from matplotlib import pyplot as plt
import seaborn as sns

ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
ts = ts.cumsum()

df = pd.DataFrame(np.random.randn(1000, 4), index=ts.index,
                  columns=['A', 'B', 'C', 'D'])
df = df.cumsum()
df.plot(); plt.legend(loc='best')    

into a code cell (or just modify the existing python demo notebook)


回答 3

更好的解决方案可能是图表库。它使您能够使用出色的Highcharts javascript库制作精美的交互式绘图。Highcharts使用HTMLsvg标记,因此您的所有图表实际上都是矢量图像。

一些功能:

  • 您可以下载.png,.jpg和.svg格式的矢量图,因此永远不会遇到分辨率问题
  • 交互式图表(缩放,滑动,将鼠标悬停在点上,…)
  • 在IPython笔记本中可用
  • 使用异步绘图功能可同时探索数百个数据结构。

免责声明:我是图书馆的开发人员

A better solution for your problem might be the Charts library. It enables you to use the excellent Highcharts javascript library to make beautiful and interactive plots. Highcharts uses the HTML svg tag so all your charts are actually vector images.

Some features:

  • Vector plots which you can download in .png, .jpg and .svg formats so you will never run into resolution problems
  • Interactive charts (zoom, slide, hover over points, …)
  • Usable in an IPython notebook
  • Explore hundreds of data structures at the same time using the asynchronous plotting capabilities.

Disclaimer: I’m the developer of the library


回答 4

我在2011年5月28日从www.continuum.io/downloads的Anaconda的“ jupyter QTConsole”中使用ipython。

这是一个使用ipython magic在一个单独的窗口和一个内联绘图模式之间来回切换的示例。

>>> import matplotlib.pyplot as plt

# data to plot
>>> x1 = [x for x in range(20)]

# Show in separate window
>>> %matplotlib
>>> plt.plot(x1)
>>> plt.close() 

# Show in console window
>>> %matplotlib inline
>>> plt.plot(x1)
>>> plt.close() 

# Show in separate window
>>> %matplotlib
>>> plt.plot(x1)
>>> plt.close() 

# Show in console window
>>> %matplotlib inline
>>> plt.plot(x1)
>>> plt.close() 

# Note: the %matplotlib magic above causes:
#      plt.plot(...) 
# to implicitly include a:
#      plt.show()
# after the command.
#
# (Not sure how to turn off this behavior
# so that it matches behavior without using %matplotlib magic...)
# but its ok for interactive work...

I’m using ipython in “jupyter QTConsole” from Anaconda at www.continuum.io/downloads on 5/28/2017.

Here’s an example to flip back and forth between a separate window and an inline plot mode using ipython magic.

>>> import matplotlib.pyplot as plt

# data to plot
>>> x1 = [x for x in range(20)]

# Show in separate window
>>> %matplotlib
>>> plt.plot(x1)
>>> plt.close() 

# Show in console window
>>> %matplotlib inline
>>> plt.plot(x1)
>>> plt.close() 

# Show in separate window
>>> %matplotlib
>>> plt.plot(x1)
>>> plt.close() 

# Show in console window
>>> %matplotlib inline
>>> plt.plot(x1)
>>> plt.close() 

# Note: the %matplotlib magic above causes:
#      plt.plot(...) 
# to implicitly include a:
#      plt.show()
# after the command.
#
# (Not sure how to turn off this behavior
# so that it matches behavior without using %matplotlib magic...)
# but its ok for interactive work...

回答 5

重新启动内核并清除输出(如果不是从新笔记本开始),然后运行

%matplotlib tk

有关更多信息,请转到使用matplotlib进行绘图

Restart kernel and clear output (if not starting with new notebook), then run

%matplotlib tk

For more info go to Plotting with matplotlib


回答 6

您可以使用

%matplotlib qt

如果出现错误,ImportError: Failed to import any qt binding则将PyQt5安装为:pip install PyQt5它对我有用。

You can use

%matplotlib qt

If you got the error ImportError: Failed to import any qt binding then install PyQt5 as: pip install PyQt5 and it works for me.


Python从导入的模块中模拟函数

问题:Python从导入的模块中模拟函数

我想了解如何@patch从导入的模块执行功能。

这是我到目前为止的位置。

app / mocking.py:

from app.my_module import get_user_name

def test_method():
  return get_user_name()

if __name__ == "__main__":
  print("Starting Program...")
  test_method()

app / my_module / __ init__.py:

def get_user_name():
  return "Unmocked User"

测试/模拟测试.py:

import unittest
from mock import patch
from app.mocking import test_method 

def mock_get_user():
  return "Mocked This Silly"

@patch('app.my_module.get_user_name')
class MockingTestTestCase(unittest.TestCase):

  def test_mock_stubs(self, mock_method):
    mock_method.return_value = 'Mocked This Silly'
    ret = test_method()
    self.assertEqual(ret, 'Mocked This Silly')

if __name__ == '__main__':
  unittest.main()

这不符合我的预期。“已修补”模块仅返回的未模拟值get_user_name。如何模拟要导入到被测命名空间中的其他包中的方法?

I want to understand how to @patch a function from an imported module.

This is where I am so far.

app/mocking.py:

from app.my_module import get_user_name

def test_method():
  return get_user_name()

if __name__ == "__main__":
  print("Starting Program...")
  test_method()

app/my_module/__init__.py:

def get_user_name():
  return "Unmocked User"

test/mock-test.py:

import unittest
from mock import patch
from app.mocking import test_method 

def mock_get_user():
  return "Mocked This Silly"

@patch('app.my_module.get_user_name')
class MockingTestTestCase(unittest.TestCase):

  def test_mock_stubs(self, mock_method):
    mock_method.return_value = 'Mocked This Silly'
    ret = test_method()
    self.assertEqual(ret, 'Mocked This Silly')

if __name__ == '__main__':
  unittest.main()

This does not work as I would expect. The “patched” module simply returns the unmocked value of get_user_name. How do I mock methods from other packages that I am importing into a namespace under test?


回答 0

当您patchunittest.mock包中使用装饰器时,您未在修补命名空间,而是从(在这种情况下app.my_module.get_user_name)导入模块,而是在被测试的命名空间中对其进行修补app.mocking.get_user_name

为此,请Mock尝试以下类似方法:

import unittest
from mock import patch  # on Python 3.3+: from unittest.mock import patch
from app.mocking import test_method 

class MockingTestTestCase(unittest.TestCase):

    @patch('app.mocking.get_user_name')
    def test_mock_stubs(self, test_patch):
        test_patch.return_value = 'Mocked This Silly'
        ret = test_method()
        self.assertEqual(ret, 'Mocked This Silly')

标准库文档中包含一个有用的部分对此进行了描述。

When you use the patch decorator from the unittest.mock package, you are not patching the namespace the function is imported from (in this case app.my_module.get_user_name); you are patching it in the namespace under test, app.mocking.get_user_name.

To do the above with Mock try something like the below:

import unittest
from mock import patch  # on Python 3.3+, `from unittest.mock import patch` also works
from app.mocking import test_method

class MockingTestTestCase(unittest.TestCase):

    @patch('app.mocking.get_user_name')
    def test_mock_stubs(self, test_patch):
        test_patch.return_value = 'Mocked This Silly'
        ret = test_method()
        self.assertEqual(ret, 'Mocked This Silly')

The standard library documentation for unittest.mock includes a useful section describing this ("Where to patch").
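
The difference between the two patch targets can be reproduced in a single file. The sketch below is only an illustration, not the asker's project: it builds two throwaway in-memory modules (the names demo_provider and demo_consumer are made up) that mirror app.my_module and app.mocking, then patches each name in turn. It assumes unittest.mock from the standard library (Python 3.3+).

```python
import sys
import types
from unittest.mock import patch

# Hypothetical stand-in for app/my_module/__init__.py
provider = types.ModuleType("demo_provider")
exec('def get_user_name():\n    return "Unmocked User"\n', provider.__dict__)
sys.modules["demo_provider"] = provider

# Hypothetical stand-in for app/mocking.py: it copies the name into its own namespace
consumer = types.ModuleType("demo_consumer")
sys.modules["demo_consumer"] = consumer
exec(
    "from demo_provider import get_user_name\n"
    "def test_method():\n"
    "    return get_user_name()\n",
    consumer.__dict__,
)

# Patching where the function is defined leaves the consumer's copy of the name untouched.
with patch("demo_provider.get_user_name", return_value="Mocked This Silly"):
    patched_at_source = consumer.test_method()

# Patching the name in the namespace that looks it up takes effect.
with patch("demo_consumer.get_user_name", return_value="Mocked This Silly"):
    patched_at_use = consumer.test_method()

print(patched_at_source)
print(patched_at_use)
```

The first call still reaches the original function because `from ... import` bound a second reference to it inside the consuming module; only the second patch replaces the reference that test_method actually looks up.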


Answer 1

While Matti John’s answer solves your issue (and helped me too, thanks!), I would suggest localizing the replacement of the original ‘get_user_name’ function with the mocked one. This lets you control when the function is replaced and when it isn’t, and it lets you make several replacements in the same test. To do so, use the ‘with’ statement in a very similar manner:

import unittest
from mock import patch
from app.mocking import test_method

class MockingTestTestCase(unittest.TestCase):

    def test_mock_stubs(self):
        with patch('app.mocking.get_user_name', return_value = 'Mocked This Silly'):
            ret = test_method()
            self.assertEqual(ret, 'Mocked This Silly')
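
As a rough, self-contained illustration of the scoping described above, the sketch below patches two standard-library functions inside a single with statement and shows that both originals are back once the block ends. The function where_am_i and the replacement values are made up for the example; unittest.mock (Python 3.3+) is assumed.

```python
import os
from unittest.mock import patch


def where_am_i():
    # Looks up os.getpid and os.getcwd on the os module at call time,
    # so patching the attributes on "os" is enough here.
    return "{}:{}".format(os.getpid(), os.getcwd())


# Two replacements, both active only inside this block.
with patch("os.getpid", return_value=1), patch("os.getcwd", return_value="/demo"):
    inside = where_am_i()

# After the block, both originals have been restored.
outside = where_am_i()

print(inside)
print(outside)
```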

python pandas remove duplicate columns

Question: python pandas remove duplicate columns

What is the easiest way to remove duplicate columns from a dataframe?

I am reading a text file that has duplicate columns via:

import pandas as pd

df=pd.read_table(fname)

The column names are:

Time, Time Relative, N2, Time, Time Relative, H2, etc...

All the Time and Time Relative columns contain the same data. I want:

Time, Time Relative, N2, H2

All my attempts at dropping, deleting, etc such as:

df=df.T.drop_duplicates().T

Result in uniquely valued index errors:

Reindexing only valid with uniquely valued index objects

Sorry for being a pandas noob. Any suggestions would be appreciated.


Additional Details

Pandas version: 0.9.0
Python Version: 2.7.3
Windows 7
(installed via Pythonxy 2.7.3.0)

data file (note: in the real file, columns are separated by tabs, here they are separated by 4 spaces):

Time    Time Relative [s]    N2[%]    Time    Time Relative [s]    H2[ppm]
2/12/2013 9:20:55 AM    6.177    9.99268e+001    2/12/2013 9:20:55 AM    6.177    3.216293e-005    
2/12/2013 9:21:06 AM    17.689    9.99296e+001    2/12/2013 9:21:06 AM    17.689    3.841667e-005    
2/12/2013 9:21:18 AM    29.186    9.992954e+001    2/12/2013 9:21:18 AM    29.186    3.880365e-005    
... etc ...
2/12/2013 2:12:44 PM    17515.269    9.991756+001    2/12/2013 2:12:44 PM    17515.269    2.800279e-005    
2/12/2013 2:12:55 PM    17526.769    9.991754e+001    2/12/2013 2:12:55 PM    17526.769    2.880386e-005
2/12/2013 2:13:07 PM    17538.273    9.991797e+001    2/12/2013 2:13:07 PM    17538.273    3.131447e-005

Answer 0

There’s a one-line solution to the problem. This applies if some column names are duplicated and you wish to remove them:

df = df.loc[:,~df.columns.duplicated()]

How it works:

Suppose the columns of the data frame are ['alpha','beta','alpha']

df.columns.duplicated() returns a boolean array: a True or False for each column. If it is False, the column name is unique up to that point; if it is True, the column name appeared earlier. For example, using the given example, the returned value would be [False,False,True].

Pandas allows indexing with boolean values, selecting only the positions that are True. Since we want to keep the unduplicated columns, we need the above boolean array to be flipped (i.e., [True, True, False] = ~[False,False,True]).

Finally, df.loc[:,[True,True,False]] selects only the non-duplicated columns using the aforementioned indexing capability.

Note: the above only checks column names, not column values.
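
The walk-through above can be tried on a tiny frame. This is a minimal sketch with made-up data, using the same ['alpha','beta','alpha'] column names as the explanation.

```python
import pandas as pd

# Toy frame with a repeated column name (values are invented for the example).
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["alpha", "beta", "alpha"])

mask = df.columns.duplicated()   # True where the name has been seen before
deduped = df.loc[:, ~mask]       # keep only the first occurrence of each name

print(list(mask))
print(deduped)
```

If the later copy is the one worth keeping, `df.columns.duplicated(keep="last")` marks the earlier occurrences instead.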


Answer 1

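It sounds like you already know the unique column names. If that’s the case, then df = df[['Time', 'Time Relative', 'N2']] would select them (note the double brackets; if a name is still duplicated in df, every column carrying that name is returned).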

If not, your solution should work:

In [101]: vals = np.random.randint(0,20, (4,3))
          vals
Out[101]:
array([[ 3, 13,  0],
       [ 1, 15, 14],
       [14, 19, 14],
       [19,  5,  1]])

In [106]: df = pd.DataFrame(np.hstack([vals, vals]), columns=['Time', 'H1', 'N2', 'Time Relative', 'N2', 'Time'] )
          df
Out[106]:
   Time  H1  N2  Time Relative  N2  Time
0     3  13   0              3  13     0
1     1  15  14              1  15    14
2    14  19  14             14  19    14
3    19   5   1             19   5     1

In [107]: df.T.drop_duplicates().T
Out[107]:
   Time  H1  N2
0     3  13   0
1     1  15  14
2    14  19  14
3    19   5   1

You probably have something specific to your data that’s messing it up. We could give more help if you could give us more details about the data.

Edit: Like Andy said, the problem is probably with the duplicate column titles.

For a sample table file ‘dummy.csv’ I made up:

Time    H1  N2  Time    N2  Time Relative
3   13  13  3   13  0
1   15  15  1   15  14
14  19  19  14  19  14
19  5   5   19  5   1

using read_table gives unique columns and works properly:

In [151]: df2 = pd.read_table('dummy.csv')
          df2
Out[151]:
         Time  H1  N2  Time.1  N2.1  Time Relative
      0     3  13  13       3    13              0
      1     1  15  15       1    15             14
      2    14  19  19      14    19             14
      3    19   5   5      19     5              1
In [152]: df2.T.drop_duplicates().T
Out[152]:
             Time  H1  Time Relative
          0     3  13              0
          1     1  15             14
          2    14  19             14
          3    19   5              1  

If your version doesn’t do this for you, you can hack together a solution to make them unique:

In [169]: df2 = pd.read_table('dummy.csv', header=None)
          df2
Out[169]:
              0   1   2     3   4              5
        0  Time  H1  N2  Time  N2  Time Relative
        1     3  13  13     3  13              0
        2     1  15  15     1  15             14
        3    14  19  19    14  19             14
        4    19   5   5    19   5              1
In [171]: from collections import defaultdict
          col_counts = defaultdict(int)
          col_ix = df2.first_valid_index()
In [172]: cols = []
          for col in df2.loc[col_ix]:
              cnt = col_counts[col]
              col_counts[col] += 1
              suf = '_' + str(cnt) if cnt else ''
              cols.append(col + suf)
          cols
Out[172]:
          ['Time', 'H1', 'N2', 'Time_1', 'N2_1', 'Time Relative']
In [174]: df2.columns = cols
          df2 = df2.drop([col_ix])
In [177]: df2
Out[177]:
          Time  H1  N2 Time_1 N2_1 Time Relative
        1    3  13  13      3   13             0
        2    1  15  15      1   15            14
        3   14  19  19     14   19            14
        4   19   5   5     19    5             1
In [178]: df2.T.drop_duplicates().T
Out[178]:
          Time  H1 Time Relative
        1    3  13             0
        2    1  15            14
        3   14  19            14
        4   19   5             1 
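
The renaming behaviour described in this answer can be checked without a file on disk. The sketch below is an illustration with invented tab-separated data read from an in-memory buffer; it relies on the default behaviour of recent pandas readers, which append .1, .2, ... to repeated header names.

```python
import io
import pandas as pd

# Made-up tab-separated data with a repeated "Time" header.
raw = "Time\tN2\tTime\tH2\n1\t10\t1\t5\n2\t20\t2\t6\n"

df = pd.read_csv(io.StringIO(raw), sep="\t")
print(list(df.columns))   # the second "Time" is renamed by the reader

# The names are now unique, so a name-based check finds nothing;
# comparing values via the transpose still removes the repeated column.
by_value = df.T.drop_duplicates().T
print(list(by_value.columns))
```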

Answer 2

Transposing is inefficient for large DataFrames. Here is an alternative:

def duplicate_columns(frame):
    groups = frame.columns.to_series().groupby(frame.dtypes).groups
    dups = []
    for t, v in groups.items():
        dcols = frame[v].to_dict(orient="list")

        # list() is needed on Python 3, where dict views cannot be indexed
        vs = list(dcols.values())
        ks = list(dcols.keys())
        lvs = len(vs)

        for i in range(lvs):
            for j in range(i+1,lvs):
                if vs[i] == vs[j]: 
                    dups.append(ks[i])
                    break

    return dups       

Use it like this:

dups = duplicate_columns(frame)
frame = frame.drop(dups, axis=1)

Edit

A memory-efficient version that treats NaNs like any other value:

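try:
    # location in newer pandas releases (a private module, so it may move again)
    from pandas.core.dtypes.missing import array_equivalent
except ImportError:
    # location used when this answer was written
    from pandas.core.common import array_equivalent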

def duplicate_columns(frame):
    groups = frame.columns.to_series().groupby(frame.dtypes).groups
    dups = []

    for t, v in groups.items():

        cs = frame[v].columns
        vs = frame[v]
        lcs = len(cs)

        for i in range(lcs):
            ia = vs.iloc[:,i].values
            for j in range(i+1, lcs):
                ja = vs.iloc[:,j].values
                if array_equivalent(ia, ja):
                    dups.append(cs[i])
                    break

    return dups
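
Because array_equivalent is a private helper, a version built only on public API may be easier to keep working. The sketch below is a variation, not the answer's own code: it swaps in Series.equals (which also treats NaNs in the same positions as equal) and reports the later of two identical columns rather than the earlier one. The sample frame is invented.

```python
import numpy as np
import pandas as pd


def duplicate_columns(frame):
    """Return labels of columns whose values repeat an earlier column."""
    dups = []
    for j in range(frame.shape[1]):
        col_j = frame.iloc[:, j]
        for i in range(j):
            # Series.equals compares values and dtype; NaNs in matching positions count as equal
            if frame.iloc[:, i].equals(col_j):
                dups.append(frame.columns[j])
                break
    return dups


frame = pd.DataFrame({"a": [1.0, np.nan], "b": [1.0, np.nan], "c": [3.0, 4.0]})
dups = duplicate_columns(frame)
deduped = frame.drop(columns=dups)

print(dups)
print(list(deduped.columns))
```

Note that drop works by label, so this assumes the column labels themselves are unique.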

Answer 3

If I’m not mistaken, the following does what was asked without the memory problems of the transpose solution and with fewer lines than @kalu’s function, keeping the first of any identically named columns.

Cols = list(df.columns)
for i,item in enumerate(df.columns):
    if item in df.columns[:i]: Cols[i] = "toDROP"
df.columns = Cols
df = df.drop("toDROP", axis=1)

Answer 4

It looks like you were on the right path. Here is the one-liner you were looking for:

df.reset_index().T.drop_duplicates().T

But since there is no example data frame that produces the referenced error message Reindexing only valid with uniquely valued index objects, it is tough to say exactly what would solve the problem. If restoring the original index is important to you, do this:

original_index = list(df.index.names)  # assumes the index levels are named
df = df.reset_index().T.drop_duplicates().T.set_index(original_index)

Answer 5

First step: read only the first row (the column names) and drop the duplicated names.

Second step: read the file again, loading only those columns.

cols = pd.read_csv("file.csv", header=None, nrows=1).iloc[0].drop_duplicates()
df = pd.read_csv("file.csv", usecols=cols)
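
A variation on the two steps above that selects columns by position rather than by name, which sidesteps any question of how repeated names are matched. This is a sketch on invented in-memory data, not the answer's exact code.

```python
import io
import pandas as pd

# Made-up CSV content with a repeated "Time" header.
raw = "Time,N2,Time,H2\n1,10,1,5\n2,20,2,6\n"

# Step 1: read only the header row as data and find the first position of each name.
header = pd.read_csv(io.StringIO(raw), header=None, nrows=1).iloc[0]
keep = header.drop_duplicates().index.tolist()

# Step 2: read the file again, loading only those positions.
df = pd.read_csv(io.StringIO(raw), usecols=keep)

print(keep)
print(list(df.columns))
```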

Answer 6

I ran into this problem, and the one-liner provided by the first answer worked well. However, I had the extra complication that the second copy of the column had all of the data; the first copy did not.

The solution was to create two data frames by splitting the one data frame, toggling the negation operator. Once I had the two data frames, I ran a join using lsuffix. This way, I could then reference and delete the column without the data.

– E
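
The answer gives no code, so the following is only one possible reading of it, on invented data: split on the duplicated-name mask, join the halves with an lsuffix so the empty first copy gets a distinct name, then drop that copy. The suffix "_empty" is made up for the example.

```python
import numpy as np
import pandas as pd

# Made-up frame: the first "Time" column is empty, the second holds the data.
df = pd.DataFrame(
    [[np.nan, 1, 10.0], [np.nan, 2, 20.0]],
    columns=["Time", "N2", "Time"],
)

dup = df.columns.duplicated()
first = df.loc[:, ~dup]    # first occurrence of each name: Time (empty), N2
second = df.loc[:, dup]    # later occurrences: Time (with data)

# lsuffix renames the overlapping column coming from the left frame.
joined = first.join(second, lsuffix="_empty")
result = joined.drop(columns=["Time_empty"])

print(list(result.columns))
```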


Answer 7

The approach below identifies the duplicated column names, so you can review what went wrong when the dataframe was originally built.

dupes = pd.DataFrame(df.columns)
dupes[dupes.duplicated()]
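
A similar check can be written directly against the column index. This is a small sketch on an invented frame; keep=False is used to list every copy of a repeated name rather than only the later ones.

```python
import pandas as pd

# Made-up frame with two repeated names.
df = pd.DataFrame([[1, 2, 3, 4]], columns=["Time", "N2", "Time", "N2"])

# Names that repeat an earlier column.
repeated = df.columns[df.columns.duplicated()].tolist()

# Every column whose name occurs more than once.
all_copies = df.columns[df.columns.duplicated(keep=False)].tolist()

print(repeated)
print(all_copies)
```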

Answer 8

Fast and easy way to drop the duplicated columns by their values:

df = df.T.drop_duplicates().T

More info: the pandas DataFrame.drop_duplicates documentation.
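
One side effect worth knowing when using the transpose approach on frames with mixed column types: the round trip through the transpose leaves the surviving columns as object dtype. The sketch below, on invented data, shows this and one way to restore inferred dtypes with infer_objects.

```python
import pandas as pd

# Made-up frame mixing integer and string columns; "b" repeats the values of "a".
df = pd.DataFrame({"a": [1, 2], "b": [1, 2], "c": ["x", "y"]})

out = df.T.drop_duplicates().T
print(list(out.columns))
print(out.dtypes)          # columns come back as object after the transpose

fixed = out.infer_objects()
print(fixed.dtypes)        # numeric column is inferred as an integer type again
```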