标签归档:Python

在Python中使用字符串格式打印元组

问题:在Python中使用字符串格式打印元组

所以,我有这个问题。我有元组(1,2,3),应该以字符串格式打印。例如。

tup = (1,2,3)
print "this is a tuple %something" % (tup)

这应该打印带有括号的元组表示形式,例如

这是一个元组(1,2,3)

但是我得到了TypeError: not all arguments converted during string formatting

我到底该怎么做?Kinda在这里输了,所以如果你们能指出我正确的方向:)

So, i have this problem. I got tuple (1,2,3) which i should print with string formatting. eg.

tup = (1,2,3)
print "this is a tuple %something" % (tup)

and this should print tuple representation with brackets, like

This is a tuple (1,2,3)

But I get TypeError: not all arguments converted during string formatting instead.

How in the world am I able to do this? Kinda lost here so if you guys could point me to a right direction :)


回答 0

>>> thetuple = (1, 2, 3)
>>> print "this is a tuple: %s" % (thetuple,)
this is a tuple: (1, 2, 3)

以感兴趣的元组作为唯一元素构造一个单元素元组(即 (thetuple,) 这一部分),是这里的关键。

>>> thetuple = (1, 2, 3)
>>> print "this is a tuple: %s" % (thetuple,)
this is a tuple: (1, 2, 3)

Making a singleton tuple with the tuple of interest as the only item, i.e. the (thetuple,) part, is the key bit here.
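For contrast, here is a minimal REPL sketch (illustrative) of what goes wrong without the trailing comma: the % operator treats the tuple itself as the argument list, so three values meet a single %s and you get the error from the question.

>>> thetuple = (1, 2, 3)
>>> "this is a tuple: %s" % thetuple        # tuple supplies 3 arguments for 1 placeholder
Traceback (most recent call last):
  ...
TypeError: not all arguments converted during string formatting
>>> "this is a tuple: %s" % (thetuple,)     # singleton tuple: exactly 1 argument
'this is a tuple: (1, 2, 3)'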


回答 1

请注意,该%语法已过时。使用str.format,它更简单易读:

t = 1,2,3
print 'This is a tuple {0}'.format(t)

Note that the % syntax is obsolete. Use str.format, which is simpler and more readable:

t = 1,2,3
print 'This is a tuple {0}'.format(t)
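For reference, a minimal Python 3 equivalent (print is a function there; the f-string variant needs Python 3.6+):

t = 1, 2, 3
print('This is a tuple {0}'.format(t))   # This is a tuple (1, 2, 3)
print(f'This is a tuple {t}')            # same output with an f-string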

回答 2

上面给出的许多答案都是正确的。正确的方法是:

>>> thetuple = (1, 2, 3)
>>> print "this is a tuple: %s" % (thetuple,)
this is a tuple: (1, 2, 3)

但是,关于 '%' 字符串运算符是否已过时存在争议。正如许多人指出的,它绝对没有过时,因为用 '%' 字符串运算符把字符串模板和元组数据组合起来更容易。

例:

>>> tup = (1,2,3)
>>> print "First: %d, Second: %d, Third: %d" % tup
First: 1, Second: 2, Third: 3

但是,使用该.format()函数将得到一个冗长的语句。

例:

>>> tup = (1,2,3)
>>> print "First: %d, Second: %d, Third: %d" % tup
>>> print 'First: {}, Second: {}, Third: {}'.format(1,2,3)
>>> print 'First: {0[0]}, Second: {0[1]}, Third: {0[2]}'.format(tup)

First: 1, Second: 2, Third: 3
First: 1, Second: 2, Third: 3
First: 1, Second: 2, Third: 3

此外,'%' 字符串运算符还能帮助我们校验数据类型,例如 %s、%d、%i;而 .format() 只支持两个转换标志:'!s' 和 '!r'。

Many answers given above were correct. The right way to do it is:

>>> thetuple = (1, 2, 3)
>>> print "this is a tuple: %s" % (thetuple,)
this is a tuple: (1, 2, 3)

However, there was a dispute over whether the '%' string operator is obsolete. As many have pointed out, it is definitely not obsolete, because the '%' string operator makes it easier to combine a string template with tuple data.

Example:

>>> tup = (1,2,3)
>>> print "First: %d, Second: %d, Third: %d" % tup
First: 1, Second: 2, Third: 3

However, using the .format() function, you will end up with a verbose statement.

Example:

>>> tup = (1,2,3)
>>> print "First: %d, Second: %d, Third: %d" % tup
>>> print 'First: {}, Second: {}, Third: {}'.format(1,2,3)
>>> print 'First: {0[0]}, Second: {0[1]}, Third: {0[2]}'.format(tup)

First: 1, Second: 2, Third: 3
First: 1, Second: 2, Third: 3
First: 1, Second: 2, Third: 3

Furthermore, the '%' string operator also lets us validate data types with codes such as %s, %d, and %i, while .format() only supports two conversion flags: '!s' and '!r'.


回答 3

>>> tup = (1, 2, 3)
>>> print "Here it is: %s" % (tup,)
Here it is: (1, 2, 3)
>>>

请注意,这(tup,)是一个包含元组的元组。外部元组是%运算符的参数。内部元组是其内容,它实际上是打印的。

(tup)是括号中的表达式,计算时结果为tup

(tup,) 带有尾随逗号,才是一个元组,它以 tup 作为唯一成员。

>>> tup = (1, 2, 3)
>>> print "Here it is: %s" % (tup,)
Here it is: (1, 2, 3)
>>>

Note that (tup,) is a tuple containing a tuple. The outer tuple is the argument to the % operator. The inner tuple is its content, which is actually printed.

(tup) is an expression in brackets, which when evaluated results in tup.

(tup,) with the trailing comma is a tuple, which contains tup as its only member.
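A quick REPL check (illustrative) makes the difference visible:

>>> tup = (1, 2, 3)
>>> (tup) is tup      # plain parentheses: still the same object
True
>>> (tup,)            # trailing comma: a new one-element tuple
((1, 2, 3),)
>>> len((tup,))
1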


回答 4

这不使用字符串格式,但是您应该能够:

print 'this is a tuple ', (1, 2, 3)

如果您确实要使用字符串格式设置:

print 'this is a tuple %s' % str((1, 2, 3))
# or
print 'this is a tuple %s' % ((1, 2, 3),)

请注意,这假设您使用的Python版本低于3.0。

This doesn’t use string formatting, but you should be able to do:

print 'this is a tuple ', (1, 2, 3)

If you really want to use string formatting:

print 'this is a tuple %s' % str((1, 2, 3))
# or
print 'this is a tuple %s' % ((1, 2, 3),)

Note, this assumes you are using a Python version earlier than 3.0.


回答 5

t = (1, 2, 3)

# the comma (,) concatenates the strings and adds a space
print "this is a tuple", (t)

# format is the most flexible way to do string formatting
print "this is a tuple {0}".format(t)

# classic string formatting
# I use it only when working with older Python versions
print "this is a tuple %s" % repr(t)
print "this is a tuple %s" % str(t)
t = (1, 2, 3)

# the comma (,) concatenates the strings and adds a space
print "this is a tuple", (t)

# format is the most flexible way to do string formatting
print "this is a tuple {0}".format(t)

# classic string formatting
# I use it only when working with older Python versions
print "this is a tuple %s" % repr(t)
print "this is a tuple %s" % str(t)

回答 6

即使这个问题已经很老并且有很多不同的答案,我仍然想补充一个(恕我直言)最"pythonic"、也最易读/简洁的答案。

由于 Antimony 已经正确展示了常规的元组打印方法,这里补充的是如何分别打印元组中的每个元素,正如 Fong Kah Chun 用 %s 语法正确展示的那样。

有趣的是,这一点只在评论中被提到过:使用星号运算符来解包元组,在用 str.format 方法分别打印元组元素时,能同时获得完全的灵活性和可读性。

tup = (1, 2, 3)
print('Element(s) of the tuple: One {0}, two {1}, three {2}'.format(*tup))

这也避免了在打印单元素元组时输出尾随逗号,Jacob CUI 是用 replace 来绕过这一点的。(尽管恕我直言,如果想在打印时保留类型表示,带尾随逗号的形式才是正确的):

tup = (1, )
print('Element(s) of the tuple: One {0}'.format(*tup))

Even though this question is quite old and has many different answers, I’d still like to add the imho most “pythonic” and also readable/concise answer.

Since the general tuple printing method is already shown correctly by Antimony, this is an addition for printing each element in a tuple separately, as Fong Kah Chun has shown correctly with the %s syntax.

Interestingly it has been only mentioned in a comment, but using an asterisk operator to unpack the tuple yields full flexibility and readability using the str.format method when printing tuple elements separately.

tup = (1, 2, 3)
print('Element(s) of the tuple: One {0}, two {1}, three {2}'.format(*tup))

This also avoids printing a trailing comma when printing a single-element tuple, as circumvented by Jacob CUI with replace. (Even though imho the trailing comma representation is correct if wanting to preserve the type representation when printing):

tup = (1, )
print('Element(s) of the tuple: One {0}'.format(*tup))

回答 7

我认为最好的方法是:

t = (1,2,3)

print "This is a tuple: %s" % str(t)

如果您熟悉printf样式格式,则Python支持其自己的版本。在Python中,这是通过将“%”运算符应用于字符串(模运算符的重载)来完成的,该运算符采用任何字符串并对其应用printf样式格式。

在我们的例子中,我们告诉它打印 "This is a tuple: ",然后加上一个 "%s" 占位符;至于实际的字符串,我们传入该元组的字符串表示形式(通过调用 str(t))。

如果您不熟悉printf样式格式,我强烈建议您学习,因为它非常标准。大多数语言都以某种方式支持它。

I think the best way to do this is:

t = (1,2,3)

print "This is a tuple: %s" % str(t)

If you’re familiar with printf style formatting, then Python supports its own version. In Python, this is done using the “%” operator applied to strings (an overload of the modulo operator), which takes any string and applies printf-style formatting to it.

In our case, we are telling it to print “This is a tuple: “, and then adding a string “%s”, and for the actual string, we’re passing in a string representation of the tuple (by calling str(t)).

If you’re not familiar with printf style formatting, I highly suggest learning, since it’s very standard. Most languages support it in one way or another.


回答 8

请注意,如果元组只有一项,则会在末尾添加逗号。例如:

t = (1,)
print 'this is a tuple {}'.format(t)

您会得到:

'this is a tuple (1,)'

在某些情况下,例如,您想获取要在mysql查询字符串中使用的带引号的列表,例如

SELECT name FROM students WHERE name IN ('Tom', 'Jerry');

您需要考虑在格式化后用 replace(',)', ')') 删除尾部逗号,因为元组可能只有一个元素,例如 ('Tom',),此时会多出一个尾部逗号:

query_string = 'SELECT name FROM students WHERE name IN {}'.format(t).replace(',)', ')')

如果您有更妥当的方法来去掉输出中的这个逗号,欢迎提出建议。

Please note a trailing comma will be added if the tuple only has one item. e.g:

t = (1,)
print 'this is a tuple {}'.format(t)

and you’ll get:

'this is a tuple (1,)'

in some cases e.g. you want to get a quoted list to be used in mysql query string like

SELECT name FROM students WHERE name IN ('Tom', 'Jerry');

you need to consider removing the trailing comma with replace(',)', ')') after formatting, because the tuple may have only one item, like ('Tom',), in which case the trailing comma needs to be removed:

query_string = 'SELECT name FROM students WHERE name IN {}'.format(t).replace(',)', ')')

Please suggest if you have a decent way of removing this comma in the output.
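One hedged alternative, instead of patching the tuple's repr with replace(): build the quoted list yourself with str.join (the names below are illustrative), which never produces a trailing comma whether there is one item or many. For real queries, letting the database driver bind parameters is safer than string interpolation.

names = ('Tom',)   # works the same for one or many names
quoted = ', '.join("'%s'" % n for n in names)
query_string = 'SELECT name FROM students WHERE name IN (%s)' % quoted
# -> SELECT name FROM students WHERE name IN ('Tom')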


回答 9

除了其他答案中提出的方法外,从Python 3.6开始,您还可以使用文字字符串插值(f-strings):

>>> tup = (1,2,3)
>>> print(f'this is a tuple {tup}')
this is a tuple (1, 2, 3)

Besides the methods proposed in the other answers, since Python 3.6 you can also use Literal String Interpolation (f-strings):

>>> tup = (1,2,3)
>>> print(f'this is a tuple {tup}')
this is a tuple (1, 2, 3)

回答 10

试试这个来获得答案:

>>>d = ('1', '2') 
>>> print("Value: %s" %(d))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: not all arguments converted during string formatting

如果我们在()中只放入一个元组,它本身就会构成一个元组:

>>> (d)
('1', '2')

这意味着上面的打印语句实际上等价于 print("Value: %s" % ('1', '2')),这是一个错误!

因此:

>>> (d,)
(('1', '2'),)
>>> 

这样,参数才能被正确地传给 print。

Try this to get an answer:

>>>d = ('1', '2') 
>>> print("Value: %s" %(d))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: not all arguments converted during string formatting

If we put only-one tuple inside (), it makes a tuple itself:

>>> (d)
('1', '2')

This means the above print statement will look like print("Value: %s" % ('1', '2')), which is an error!

Hence:

>>> (d,)
(('1', '2'),)
>>> 

The above will be passed correctly as the argument to print.


回答 11

对于python 3

tup = (1,2,3)
print("this is a tuple %s" % str(tup))

For python 3

tup = (1,2,3)
print("this is a tuple %s" % str(tup))

回答 12

您也可以尝试这个。

tup = (1,2,3)
print("this is a tuple {something}".format(something=tup))

由于元组的打包与解包机制,您不能直接对 (tup) 使用 %something。

You can try this one as well;

tup = (1,2,3)
print("this is a tuple {something}".format(something=tup))

You can't use %something with (tup) directly because of how tuple packing and unpacking works.


回答 13

废话不多说,直接看代码:

>>> tup = (10, 20, 30)
>>> i = 50
>>> print '%d      %s'%(i,tup)
50  (10, 20, 30)
>>> print '%s'%(tup,)
(10, 20, 30)
>>> 

Talk is cheap, show you the code:

>>> tup = (10, 20, 30)
>>> i = 50
>>> print '%d      %s'%(i,tup)
50  (10, 20, 30)
>>> print '%s'%(tup,)
(10, 20, 30)
>>> 

seek()函数?

问题:seek()函数?

请原谅我在这里的困惑。我已经(在不得不用到它之后)阅读了 Python 中关于 seek() 函数的文档,虽然文档对我有所帮助,但我对它到底做了什么还是有些困惑。非常感谢任何解释,谢谢。

Please excuse my confusion here but I have read the documentation regarding the seek() function in python (after having to use it) and although it helped me I am still a bit confused on the actual meaning of what it does, any explanations are much appreciated, thank you.


回答 0

关于seek()没有太多的担心。

首先,对打开的文件进行操作时非常有用。

重要的是要注意其语法如下:

fp.seek(offset, from_what)

其中 fp 是您正在使用的文件指针;offset 表示要移动多少个位置;from_what 定义参考点:

  • 0:表示您的参考点是文件的开头
  • 1:表示参考点是当前文件位置
  • 2:表示您的参考点是文件的结尾

如果省略,则from_what默认为0。

永远不要忘记,在操作文件时,文件里总有一个您当前所处的位置。刚打开时,该位置位于文件开头,但随着读写操作它会向前移动。
当您需要在这个打开的文件中来回移动时(就像沿着一条路径行走),seek 就很有用。

Regarding seek() there’s not too much to worry about.

First of all, it is useful when operating over an open file.

It’s important to note that its syntax is as follows:

fp.seek(offset, from_what)

where fp is the file pointer you’re working with; offset means how many positions you will move; from_what defines your point of reference:

  • 0: means your reference point is the beginning of the file
  • 1: means your reference point is the current file position
  • 2: means your reference point is the end of the file

if omitted, from_what defaults to 0.

Never forget that when managing files, there’ll always be a position inside that file where you are currently working on. When just open, that position is the beginning of the file, but as you work with it, you may advance.
seek will be useful to you when you need to walk along that open file, just as a path you are traveling into.
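A small illustrative sketch of the three reference points, using a throwaway file (Python 3; binary mode is used so that seeking relative to the current position and the end works):

import os

with open('demo.bin', 'wb') as f:          # create a 10-byte sample file
    f.write(b'0123456789')

with open('demo.bin', 'rb') as f:
    f.seek(3)                  # from_what=0 (default): 3 bytes from the start
    print(f.read(1))           # b'3'
    f.seek(2, os.SEEK_CUR)     # from_what=1: 2 bytes past the current position
    print(f.read(1))           # b'6'
    f.seek(-1, os.SEEK_END)    # from_what=2: 1 byte before the end
    print(f.read(1))           # b'9'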


回答 1

当您打开文件时,系统指向文件的开头。您所做的任何读取或写入都将从一开始就发生。一个seek()操作将该指针移动到文件的其他部分,以便您可以在该位置进行读取或写入。

因此,如果您想读取整个文件但跳过前20个字节,请打开文件,seek(20)移至要开始读取的位置,然后继续读取文件。

或者,假设您想每隔 10 个字节读取一次,可以写一个循环:先 seek(9, 1)(相对当前位置向前移动 9 个字节),再 read(1)(读取一个字节),然后重复。

When you open a file, the system points to the beginning of the file. Any read or write you do will happen from the beginning. A seek() operation moves that pointer to some other part of the file so you can read or write at that place.

So, if you want to read the whole file but skip the first 20 bytes, open the file, seek(20) to move to where you want to start reading, then continue with reading the file.

Or say you want to read every 10th byte, you could write a loop that does seek(9, 1) (moves 9 bytes forward relative to the current positions), read(1) (reads one byte), repeat.
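A hedged sketch of the "skip 20 bytes, then read every 10th byte" idea described above (Python 3, binary mode; the file name is illustrative):

with open('data.bin', 'rb') as f:
    f.seek(20)             # skip the first 20 bytes
    while True:
        b = f.read(1)      # read one byte
        if not b:          # empty result means end of file
            break
        # ... process b here ...
        f.seek(9, 1)       # move 9 bytes forward from the current position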


回答 2

seek 函数期望的偏移量以字节为单位。

Ascii文件示例:

因此,如果您的文本文件包含以下内容:

simple.txt

abc

您可以跳过1个字节来跳过第一个字符,如下所示:

fp = open('simple.txt', 'r')
fp.seek(1)
print fp.readline()
>>> bc

二进制文件示例(读取宽度):

import struct

fp = open('afile.png', 'rb')
fp.seek(16)
print 'width: {0}'.format(struct.unpack('>i', fp.read(4))[0])
print 'height: ', struct.unpack('>i', fp.read(4))[0]

注意:调用 read 之后,读取位置也会随之改变,其作用类似于 seek。

The seek function expects an offset in bytes.

Ascii File Example:

So if you have a text file with the following content:

simple.txt

abc

You can jump 1 byte to skip over the first character as following:

fp = open('simple.txt', 'r')
fp.seek(1)
print fp.readline()
>>> bc

Binary file example gathering width :

import struct

fp = open('afile.png', 'rb')
fp.seek(16)
print 'width: {0}'.format(struct.unpack('>i', fp.read(4))[0])
print 'height: ', struct.unpack('>i', fp.read(4))[0]

Note: Once you call read you are changing the position of the read head, which acts like seek.


回答 3

对于字符串,可以不用 WHENCE 参数:用 f.seek(0) 定位到文件开头,用 f.seek(len(f)+1) 定位到文件结尾。用 open(file, "r+") 可以在文件的任意位置读写。如果使用 "a+",则无论把光标放在哪里,都只能在文件末尾写入(追加)。

For strings, forget about using WHENCE: use f.seek(0) to position at beginning of file and f.seek(len(f)+1) to position at the end of file. Use open(file, “r+”) to read/write anywhere in a file. If you use “a+” you’ll only be able to write (append) at the end of the file regardless of where you position the cursor.
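One caveat about the suggestion above: len(f) raises a TypeError if f is a file object rather than the file's contents held in a string. A portable way to jump to the end is to use the reference-point argument instead (a minimal sketch; the file name is illustrative):

import os

with open('notes.txt', 'r+') as f:
    f.seek(0)               # beginning of the file
    f.seek(0, os.SEEK_END)  # end of the file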


在Python字符串中转义正则表达式特殊字符

问题:在Python字符串中转义正则表达式特殊字符

Python是否具有可用来在正则表达式中转义特殊字符的函数?

例如,I'm "stuck" :\应成为I\'m \"stuck\" :\\

Does Python have a function that I can use to escape special characters in a regular expression?

For example, I'm "stuck" :\ should become I\'m \"stuck\" :\\.


回答 0

re.escape

>>> import re
>>> re.escape(r'\ a.*$')
'\\\\\\ a\\.\\*\\$'
>>> print(re.escape(r'\ a.*$'))
\\\ a\.\*\$
>>> re.escape('www.stackoverflow.com')
'www\\.stackoverflow\\.com'
>>> print(re.escape('www.stackoverflow.com'))
www\.stackoverflow\.com

在这里重复:

re.escape(string)

返回所有非字母数字加反斜杠的字符串;如果您想匹配其中可能包含正则表达式元字符的任意文字字符串,这将很有用。

从 Python 3.7 开始,re.escape() 改为仅转义对正则表达式有特殊意义的字符。

Use re.escape

>>> import re
>>> re.escape(r'\ a.*$')
'\\\\\\ a\\.\\*\\$'
>>> print(re.escape(r'\ a.*$'))
\\\ a\.\*\$
>>> re.escape('www.stackoverflow.com')
'www\\.stackoverflow\\.com'
>>> print(re.escape('www.stackoverflow.com'))
www\.stackoverflow\.com

Repeating it here:

re.escape(string)

Return string with all non-alphanumerics backslashed; this is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.

As of Python 3.7 re.escape() was changed to escape only characters which are meaningful to regex operations.


回答 1

我很惊讶没有人提到通过 re.sub() 这种方式使用正则表达式:

import re
print re.sub(r'([\"])',    r'\\\1', 'it\'s "this"')  # it's \"this\"
print re.sub(r"([\'])",    r'\\\1', 'it\'s "this"')  # it\'s "this"
print re.sub(r'([\" \'])', r'\\\1', 'it\'s "this"')  # it\'s\ \"this\"

重要注意事项:

  • 在搜索模式中,除了要查找的字符,还要包括 \。因为您要用 \ 来转义这些字符,所以它本身也需要被转义。
  • 在搜索模式两边加上括号,例如 ([\"]),这样替换模式在把 \ 加到找到的字符前面时,就可以引用该字符。(这就是 \1 的作用:使用第一个括号分组的值。)
  • r'([\"])' 前面的 r 表示它是一个原始字符串。原始字符串对反斜杠使用不同的转义规则。若要以普通字符串的形式写出 ([\"]),需要把所有反斜杠加倍,写成 '([\\"])'。编写正则表达式时,原始字符串更友好。
  • 在替换模式中,您需要转义 \,以便与位于替换分组(例如 \1)前面的反斜杠区分开,因此写作 r'\\\1'。若用普通字符串来写,则需要 '\\\\\\1',没人想这么写。

I’m surprised no one has mentioned using regular expressions via re.sub():

import re
print re.sub(r'([\"])',    r'\\\1', 'it\'s "this"')  # it's \"this\"
print re.sub(r"([\'])",    r'\\\1', 'it\'s "this"')  # it\'s "this"
print re.sub(r'([\" \'])', r'\\\1', 'it\'s "this"')  # it\'s\ \"this\"

Important things to note:

  • In the search pattern, include \ as well as the character(s) you’re looking for. You’re going to be using \ to escape your characters, so you need to escape that as well.
  • Put parentheses around the search pattern, e.g. ([\"]), so that the substitution pattern can use the found character when it adds \ in front of it. (That’s what \1 does: uses the value of the first parenthesized group.)
  • The r in front of r'([\"])' means it’s a raw string. Raw strings use different rules for escaping backslashes. To write ([\"]) as a plain string, you’d need to double all the backslashes and write '([\\"])'. Raw strings are friendlier when you’re writing regular expressions.
  • In the substitution pattern, you need to escape \ to distinguish it from a backslash that precedes a substitution group, e.g. \1, hence r'\\\1'. To write that as a plain string, you’d need '\\\\\\1' — and nobody wants that.

回答 2

使用repr()[1:-1]。在这种情况下,双引号不需要转义。[1:-1] 切片用于去掉开头和结尾的单引号。

>>> x = raw_input()
I'm "stuck" :\
>>> print x
I'm "stuck" :\
>>> print repr(x)[1:-1]
I\'m "stuck" :\\

或者,也许您只是想转义一个短语以粘贴到您的程序中?如果是这样,请执行以下操作:

>>> raw_input()
I'm "stuck" :\
'I\'m "stuck" :\\'

Use repr()[1:-1]. In this case, the double quotes don't need to be escaped. The [1:-1] slice removes the single quotes from the beginning and the end.

>>> x = raw_input()
I'm "stuck" :\
>>> print x
I'm "stuck" :\
>>> print repr(x)[1:-1]
I\'m "stuck" :\\

Or maybe you just want to escape a phrase to paste into your program? If so, do this:

>>> raw_input()
I'm "stuck" :\
'I\'m "stuck" :\\'

回答 3

如上所述,答案取决于您的情况。如果要转义正则表达式的字符串,则应使用re.escape()。但是,如果要转义一组特定的字符,请使用此lambda函数:

>>> escape = lambda s, escapechar, specialchars: "".join(escapechar + c if c in specialchars or c == escapechar else c for c in s)
>>> s = raw_input()
I'm "stuck" :\
>>> print s
I'm "stuck" :\
>>> print escape(s, "\\", ['"'])
I'm \"stuck\" :\\

As it was mentioned above, the answer depends on your case. If you want to escape a string for a regular expression then you should use re.escape(). But if you want to escape a specific set of characters then use this lambda function:

>>> escape = lambda s, escapechar, specialchars: "".join(escapechar + c if c in specialchars or c == escapechar else c for c in s)
>>> s = raw_input()
I'm "stuck" :\
>>> print s
I'm "stuck" :\
>>> print escape(s, "\\", ['"'])
I'm \"stuck\" :\\

回答 4

这并不难:

def escapeSpecialCharacters ( text, characters ):
    for character in characters:
        text = text.replace( character, '\\' + character )
    return text

>>> escapeSpecialCharacters( 'I\'m "stuck" :\\', '\'"' )
'I\\\'m \\"stuck\\" :\\'
>>> print( _ )
I\'m \"stuck\" :\

It’s not that hard:

def escapeSpecialCharacters ( text, characters ):
    for character in characters:
        text = text.replace( character, '\\' + character )
    return text

>>> escapeSpecialCharacters( 'I\'m "stuck" :\\', '\'"' )
'I\\\'m \\"stuck\\" :\\'
>>> print( _ )
I\'m \"stuck\" :\

回答 5

如果只想替换某些字符,则可以使用以下命令:

import re

print re.sub(r'([\.\\\+\*\?\[\^\]\$\(\)\{\}\!\<\>\|\:\-])', r'\\\1', "example string.")

If you only want to replace some characters you could use this:

import re

print re.sub(r'([\.\\\+\*\?\[\^\]\$\(\)\{\}\!\<\>\|\:\-])', r'\\\1', "example string.")

如何在Python中创建一组集?

问题:如何在Python中创建一组集?

我正在尝试在 Python 中创建一个集合的集合(set of sets),但不知道该怎么做。

从空集 xx 开始:

xx = set([])
# Now we have some other set, for example
elements = set([2,3,4])
xx.add(elements)

但我明白了

TypeError: unhashable type: 'list'

要么

TypeError: unhashable type: 'set'

Python中可能有一组集合吗?

我正在处理大量的集合,并且希望不必处理重复的集合(对于由集合 A1, A2, …, An 组成的集合 B,如果 Ai = Aj,两者会相互"抵消")。

I’m trying to make a set of sets in Python. I can’t figure out how to do it.

Starting with the empty set xx:

xx = set([])
# Now we have some other set, for example
elements = set([2,3,4])
xx.add(elements)

but I get

TypeError: unhashable type: 'list'

or

TypeError: unhashable type: 'set'

Is it possible to have a set of sets in Python?

I am dealing with a large collection of sets and I want to be able to not have to deal duplicate sets (a set B of sets A1, A2, …., An would “cancel” two sets if Ai = Aj)


回答 0

Python 之所以报错,是因为内部的 set 对象是可变的,因此不可散列。解决方案是对内部集合使用 frozenset,以表明您无意修改它们。

Python’s complaining because the inner set objects are mutable and thus not hashable. The solution is to use frozenset for the inner sets, to indicate that you have no intention of modifying them.
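A minimal illustration of the suggestion, reusing the variables from the question (Python 3 repr shown):

>>> xx = set()
>>> elements = frozenset([2, 3, 4])
>>> xx.add(elements)
>>> xx.add(frozenset([2, 3, 4]))   # an equal frozenset is not added twice
>>> xx
{frozenset({2, 3, 4})}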


回答 1

前面已经有人提到可以用 frozenset() 做到这一点,所以我补充一段实现它的代码:

例如,您要从以下列表列表中创建一组集合:

t = [[], [1, 2], [5], [1, 2, 5], [1, 2, 3, 4], [1, 2, 3, 6]]

您可以通过以下方式创建集合:

t1 = set(frozenset(i) for i in t)

People already mentioned that you can do this with a frozenset(), so I will just add a code how to achieve this:

For example you want to create a set of sets from the following list of lists:

t = [[], [1, 2], [5], [1, 2, 5], [1, 2, 3, 4], [1, 2, 3, 6]]

you can create your set in the following way:

t1 = set(frozenset(i) for i in t)

回答 2

在内部使用 frozenset。


回答 3

所以我有完全相同的问题。我想制作一个可以作为一组集合使用的数据结构。问题在于集合必须包含不可变的对象。因此,您可以做的只是将其作为一组元组。对我来说很好!

A = set()
A.add( (2,3,4) )##adds the element
A.add( (2,3,4) )##does not add the same element
A.add( (2,3,5) )##adds the element, because it is different!

So I had the exact same problem. I wanted to make a data structure that works as a set of sets. The problem is that the sets must contain immutable objects. So, what you can do is simply make it as a set of tuples. That worked fine for me!

A = set()
A.add( (2,3,4) )##adds the element
A.add( (2,3,4) )##does not add the same element
A.add( (2,3,5) )##adds the element, because it is different!

回答 4

截至 2020 年,Python 官方文档建议使用 frozenset 来表示集合的集合。

As of 2020, the official Python documentation advise using frozenset to represent sets of sets.


无法通过套接字’/tmp/mysql.sock连接到本地MySQL服务器

问题:无法通过套接字’/tmp/mysql.sock连接到本地MySQL服务器

当我在测试套件中尝试连接到本地MySQL服务器时,它失败并显示以下错误:

OperationalError: (2002, "Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)")

但是,我始终可以通过运行命令行 mysql 程序连接到 MySQL。ps aux | grep mysql 显示服务器正在运行,stat /tmp/mysql.sock 也确认套接字存在。此外,如果我在捕获该异常的 except 子句中打开调试器,就能用完全相同的参数可靠地建立连接。

这个问题可以相当可靠地重现,但似乎不是 100%:偶尔我的测试套件也能在不触发此错误的情况下跑完。当我尝试用 sudo dtruss 运行时,问题没有重现。

所有客户端代码都是 Python 写的,尽管我不确定这有什么关系。

切换为使用主机127.0.0.1会产生错误:

DatabaseError: Can't connect to MySQL server on '127.0.0.1' (61)

When I attempted to connect to a local MySQL server during my test suite, it fails with the error:

OperationalError: (2002, "Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)")

However, I’m able to at all times, connect to MySQL by running the command line mysql program. A ps aux | grep mysql shows the server is running, and stat /tmp/mysql.sock confirm that the socket exists. Further, if I open a debugger in except clause of that exception, I’m able to reliably connect with the exact same parameters.

This issue reproduces fairly reliably, however it doesn’t appear to be 100%, because every once in a blue moon, my test suite does in fact run without hitting this error. When I attempted to run with sudo dtruss it did not reproduce.

All the client code is in Python, though I can’t figure how that’d be relevant.

Switching to use host 127.0.0.1 produces the error:

DatabaseError: Can't connect to MySQL server on '127.0.0.1' (61)

回答 0

sudo /usr/local/mysql/support-files/mysql.server start 

这对我有用。但是,如果这不起作用,请确保mysqld正在运行并尝试连接。

sudo /usr/local/mysql/support-files/mysql.server start 

This worked for me. However, if this doesnt work then make sure that mysqld is running and try connecting.


回答 1

MySQL手册的相关部分在这里。我首先要完成列出的调试步骤。

另外,请记住,在这种情况下,localhost和127.0.0.1是不同的:

  • 如果host设置为localhost,则使用套接字或管道。
  • 如果将host设置为127.0.0.1,则客户端将被强制使用TCP / IP。

因此,您可以通过 netstat -nlp 检查数据库是否正在侦听 TCP 连接。看起来它确实在侦听 TCP 连接,因为您说 mysql -h 127.0.0.1 可以正常工作。要检查能否通过套接字连接到数据库,请使用 mysql -h localhost。

如果以上方法均无济于事,那么您可能需要发布有关MySQL配置,实例化连接的确切方式等的更多详细信息。

The relevant section of the MySQL manual is here. I’d start by going through the debugging steps listed there.

Also, remember that localhost and 127.0.0.1 are not the same thing in this context:

  • If host is set to localhost, then a socket or pipe is used.
  • If host is set to 127.0.0.1, then the client is forced to use TCP/IP.

So, for example, you can check if your database is listening for TCP connections via netstat -nlp. It seems likely that it IS listening for TCP connections because you say that mysql -h 127.0.0.1 works just fine. To check if you can connect to your database via sockets, use mysql -h localhost.

If none of this helps, then you probably need to post more details about your MySQL config, exactly how you’re instantiating the connection, etc.
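Since the failing client is Python, the same distinction applies in code. A hedged sketch with pymysql (credentials and database name are illustrative) showing how to force either transport:

import pymysql

# Unix socket, the equivalent of `mysql -h localhost`
conn_sock = pymysql.connect(user='user', passwd='pwd', db='db',
                            unix_socket='/tmp/mysql.sock')

# TCP/IP, the equivalent of `mysql -h 127.0.0.1`
conn_tcp = pymysql.connect(user='user', passwd='pwd', db='db',
                           host='127.0.0.1', port=3306)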


回答 2

对我来说,问题是我没有运行mysql服务器。首先运行服务器,然后执行mysql

$ mysql.server start
$ mysql -h localhost -u root -p

For me the problem was I wasn’t running MySQL Server. Run server first and then execute mysql.

$ mysql.server start
$ mysql -h localhost -u root -p

回答 3

当我的开发人员装了 MAMP 这类预先把 MySQL 安装在非标准位置的集成环境时,我在我们团队就见过这种情况。

在您的终端运行

mysql_config --socket

这会给出 sock 文件的路径。记下该路径,并把它用在数据库设置的 HOST 参数中。

您需要做的就是把配置指向它:

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'test',
        'USER': 'test',
        'PASSWORD': 'test',
        'HOST': '/Applications/MAMP/tmp/mysql/mysql.sock',
        'PORT': '',
    },
}

注意

另外,可以运行 which mysql_config 看看;如果您的机器上装了多个 mysql 服务器实例,您可能连到了错误的那一个。

I’ve seen this happen at my shop when my devs have a stack manager like MAMP installed that comes preconfigured with MySQL installed in a non standard place.

at your terminal run

mysql_config --socket

That will give you the path to the sock file. Take that path and use it in your DATABASES HOST parameter.

What you need to do is point your

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'test',
        'USER': 'test',
        'PASSWORD': 'test',
        'HOST': '/Applications/MAMP/tmp/mysql/mysql.sock',
        'PORT': '',
    },
}

NOTE

also run which mysql_config if you somehow have multiple instances of mysql server installed on the machine you may be connecting to the wrong one.


回答 4

我只是把 HOST 从 localhost 改成了 127.0.0.1,就正常工作了:

# settings.py of Django project
...

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'db_name',
        'USER': 'username',
        'PASSWORD': 'password',
        'HOST': '127.0.0.1',
        'PORT': '',
},
...

I just changed the HOST from localhost to 127.0.0.1 and it works fine:

# settings.py of Django project
...

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'db_name',
        'USER': 'username',
        'PASSWORD': 'password',
        'HOST': '127.0.0.1',
        'PORT': '',
},
...

回答 5

如果在 Mac OS X 中找不到 mysql 的守护进程套接字,但它存在于其他路径(例如 private/var)中,请执行以下命令:

1)

ln -s /private/var/mysql/mysql.sock /tmp/mysql.sock

2)使用以下命令重新启动与mysql的连接:

mysql -u username -p -h host databasename

也适用于mariadb

If your mysql daemon socket is missing on Mac OS X but present in another path (for example in private/var), do the following:

1)

ln -s /private/var/mysql/mysql.sock /tmp/mysql.sock

2) restart your connexion to mysql with :

mysql -u username -p -h host databasename

works also for mariadb


回答 6

在终端中运行以下cmd

/usr/local/mysql/bin/mysqld_safe

然后重新启动机器以生效。有用!!

Run the below cmd in terminal

/usr/local/mysql/bin/mysqld_safe

Then restart the machine to take effect. It works!!


回答 7

使用lsof命令检查mysql进程的打开文件数。

增加打开文件的限制,然后再次运行。

Check number of open files for the mysql process using lsof command.

Increase the open files limit and run again.


回答 8

在尝试了其中一些解决方案但没有成功之后,这对我有用:

  1. 重启系统
  2. mysql.server启动
  3. 成功!

After attempting a few of these solutions and not having any success, this is what worked for me:

  1. Restart system
  2. mysql.server start
  3. Success!

回答 9

这可能是以下问题之一。

  1. 不正确的 mysql 套接字。解决方案:您必须用以下命令找出正确的 mysql 套接字:

mysqladmin -p variables | grep socket

然后将其放在您的数据库连接代码中:

pymysql.connect(db='db', user='user', passwd='pwd', unix_socket="/tmp/mysql.sock")

/tmp/mysql.sock是grep返回的

2. 不正确的 mysql 端口。解决方案:您必须找出正确的 mysql 端口:

mysqladmin -p variables | grep port

然后在您的代码中:

pymysql.connect(db='db', user='user', passwd='pwd', host='localhost', port=3306)

3306是从grep返回的端口

我认为第一种选择可以解决您的问题。

This may be one of following problems.

  1. Incorrect mysql lock. solution: You have to find out the correct mysql socket by,

mysqladmin -p variables | grep socket

and then put it in your db connection code:

pymysql.connect(db='db', user='user', passwd='pwd', unix_socket="/tmp/mysql.sock")

/tmp/mysql.sock is the returned from grep

2.Incorrect mysql port solution: You have to find out the correct mysql port:

mysqladmin -p variables | grep port

and then in your code:

pymysql.connect(db='db', user='user', passwd='pwd', host='localhost', port=3306)

3306 is the port returned from the grep

I think first option will resolve your problem.


回答 10

对于通过 Homebrew 从 5.7 升级到 8.0 的用户,此错误很可能是升级未完成造成的。就我而言,mysql.server start 报了以下错误:

错误!服务器退出而不更新PID文件

然后我用 cat /usr/local/var/mysql/YOURS.err | tail -n 50 检查了日志文件,发现:

InnoDB:不支持崩溃后升级。

如果您遇到同样的情况,请先通过 Homebrew 安装 mysql@5.7,用它启动并停止一次服务器,然后再启动 8.0 版本。

brew install mysql@5.7

/usr/local/opt/mysql@5.7/bin/mysql.server start
/usr/local/opt/mysql@5.7/bin/mysql.server stop

然后,

mysql.server start

这将使您的MySQL(8.0)再次正常工作。

To those who upgraded from 5.7 to 8.0 via homebrew, this error is likely caused by the upgrade not being complete. In my case, mysql.server start got me the following error:

ERROR! The server quit without updating PID file

I then checked the log file via cat /usr/local/var/mysql/YOURS.err | tail -n 50, and found the following:

InnoDB: Upgrade after a crash is not supported.

If you are on the same boat, first install mysql@5.7 via homebrew, stop the server, and then start the 8.0 system again.

brew install mysql@5.7

/usr/local/opt/mysql@5.7/bin/mysql.server start
/usr/local/opt/mysql@5.7/bin/mysql.server stop

Then,

mysql.server start

This would get your MySQL (8.0) working again.


回答 11

我想我前一段时间也见过相同的行为,但记不清细节了。
在我们的案例中,问题出在测试运行器初始化数据库连接的时机,相对于第一次需要数据库交互的时机(例如由 settings.py 或某个 __init__.py 中的模块导入触发)。我会再找找更多信息,但这也许已经能给您一些提示。

I think i saw this same behavior some time ago, but can’t remember the details.
In our case, the problem was the moment the testrunner initialises database connections relative to first database interaction required, for instance, by import of a module in settings.py or some __init__.py. I’ll try to digg up some more info, but this might already ring a bell for your case.


回答 12

确保您的 /etc/hosts 中包含 127.0.0.1 localhost 这一行,应该就能正常工作。

Make sure your /etc/hosts has 127.0.0.1 localhost in it and it should work fine


回答 13

我对此有两个不太成熟的猜想。

猜想 1

排查一下是否无法访问 /tmp/mysql.sock 文件。我搭建 MySQL 数据库时,通常让套接字文件位于 /var/lib/mysql。如果您以 root@localhost 身份登录 mysql,您的操作系统会话需要能访问 /tmp 文件夹。请确保 /tmp 在操作系统中具有正确的访问权限,并确保 sudo 用户始终可以读取 /tmp 中的文件。

猜想 2

如果不注意,通过 127.0.0.1 访问 MySQL 可能会造成一些混乱。为什么?

在命令行中,如果用 127.0.0.1 连接 MySQL,则可能需要指定 TCP/IP 协议。

mysql -uroot -p -h127.0.0.1 --protocol=tcp

或尝试使用DNS名称

mysql -uroot -p -hDNSNAME

这样会绕过以 root@localhost 身份登录,但请确保已经定义了 root@'127.0.0.1' 这个用户。

下次连接到MySQL时,运行以下命令:

SELECT USER(),CURRENT_USER();

这会返回什么?

  • USER() 报告您尝试以何种身份在 MySQL 中进行认证
  • CURRENT_USER() 报告您实际被允许以何种身份进行认证

如果这两个函数返回的值相同,则说明您正在按预期进行连接和认证。如果值不同,则可能需要创建对应的用户 root@127.0.0.1。

I have two sneaky conjectures on this one

CONJECTURE #1

Look into the possibility of not being able to access the /tmp/mysql.sock file. When I setup MySQL databases, I normally let the socket file site in /var/lib/mysql. If you login to mysql as root@localhost, your OS session needs access to the /tmp folder. Make sure /tmp has the correct access rights in the OS. Also, make sure the sudo user can always read file in /tmp.

CONJECTURE #2

Accessing mysql via 127.0.0.1 can cause some confusion if you are not paying attention. How?

From the command line, if you connect to MySQL with 127.0.0.1, you may need to specify the TCP/IP protocol.

mysql -uroot -p -h127.0.0.1 --protocol=tcp

or try the DNS name

mysql -uroot -p -hDNSNAME

This will bypass logging in as root@localhost, but make sure you have root@'127.0.0.1' defined.

Next time you connect to MySQL, run this:

SELECT USER(),CURRENT_USER();

What does this give you?

  • USER() reports how you attempted to authenticate in MySQL
  • CURRENT_USER() reports how you were allowed to authenticate in MySQL

If these functions return with the same values, then you are connecting and authenticating as expected. If the values are different, you may need to create the corresponding user root@127.0.0.1.


回答 14

遇到了同样的问题。原来mysqld已经停止运行了(我在Mac OSX上)。我重新启动它,错误消失了。

我主要是通过这个链接才弄清 mysqld 没有在运行:http://dev.mysql.com/doc/refman/5.6/en/can-not-connect-to-server.html

注意第一个技巧!

Had this same problem. Turned out mysqld had stopped running (I’m on Mac OSX). I restarted it and the error went away.

I figured out that mysqld was not running largely because of this link: http://dev.mysql.com/doc/refman/5.6/en/can-not-connect-to-server.html

Notice the first tip!


回答 15

如果出现如下错误:

django.db.utils.OperationalError: (2002, "Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)")

然后只要找到您的mysqld.sock文件位置并将其添加到“主机”即可。

比如我在 Linux 上用的是 xampp,mysqld.sock 文件在另一个位置,因此 '/var/run/mysqld/mysqld.sock' 对我不起作用。

DATABASES = {

    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'asd',
        'USER' : 'root',
        'PASSWORD' : '',
        'HOST' : '/opt/lampp/var/mysql/mysql.sock',
        'PORT' : ''
    }
}

if you get an error like below :

django.db.utils.OperationalError: (2002, "Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)")

Then just find your mysqld.sock file location and add it to “HOST”.

Like i am using xampp on linux so my mysqld.sock file is in another location. so it is not working for ‘/var/run/mysqld/mysqld.sock

DATABASES = {

    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'asd',
        'USER' : 'root',
        'PASSWORD' : '',
        'HOST' : '/opt/lampp/var/mysql/mysql.sock',
        'PORT' : ''
    }
}

回答 16

请检查您的 mysql 是否达到了最大连接数,或者是否陷入了某种反复重启的循环;当 my.cnf 中的设置不正确时,这种情况相当常见。

使用ps aux | grep mysql检查PID是否正在更改。

Check that your mysql has not reached maximum connections, or is not in some sort of booting loop as happens quite often if the settings are incorrect in my.cnf.

Use ps aux | grep mysql to check if the PID is changing.


回答 17

在网上查了太久,不贡献一下说不过去。尝试在命令行进入 mysql 提示符时,我一直收到这条消息:

错误2002(HY000):无法通过套接字’/tmp/mysql.sock’连接到本地MySQL服务器(2)

这是因为我的本地mysql服务器不再运行。为了重新启动服务器,我导航到

shell> cd /usr/local/bin

我的mysql.server所在的位置。在这里,只需键入:

shell> mysql.server start

这将重新启动本地mysql服务器。

从那里您可以根据需要重置root密码。

mysql> UPDATE mysql.user SET Password=PASSWORD('MyNewPass')
->                   WHERE User='root';
mysql> FLUSH PRIVILEGES;

Looked around online too long not to contribute. After trying to type in the mysql prompt from the command line, I was continuing to receive this message:

ERROR 2002 (HY000): Can’t connect to local MySQL server through socket ‘/tmp/mysql.sock’ (2)

This was due to the fact that my local mysql server was no longer running. In order to restart the server, I navigated to

shell> cd /usr/local/bin

where my mysql.server was located. From here, simply type:

shell> mysql.server start

This will relaunch the local mysql server.

From there you can reset the root password if need be..

mysql> UPDATE mysql.user SET Password=PASSWORD('MyNewPass')
->                   WHERE User='root';
mysql> FLUSH PRIVILEGES;

回答 18

我必须首先找到所有进程ID,以杀死mysql的所有实例:

ps aux | grep mysql

然后杀死他们:

kill -9 {pid}

然后:

mysql.server start

为我工作。

I had to kill off all instances of mysql by first finding all the process IDs:

ps aux | grep mysql

And then killing them off:

kill -9 {pid}

Then:

mysql.server start

Worked for me.


回答 19

套接字位于 /tmp 中。在 Unix 系统上,由于 /tmp 的权限模式和所有权,这可能会引起一些问题。但既然您说可以正常用 mysql 客户端连接,我想这对您的系统来说不是问题。一个初步的检查是把 mysql.sock 放到一个更中性的目录中。

问题“随机”发生(或并非每次都发生),这一事实让我认为这可能是服务器问题。

  • 您的 /tmp 位于普通磁盘上,还是位于特殊的挂载点上(例如在内存中)?

  • /tmp 是空的吗?

  • 遇到问题时,iotop 是否显示出什么异常?

The socket is located in /tmp. On Unix system, due to modes & ownerships on /tmp, this could cause some problem. But, as long as you tell us that you CAN use your mysql connexion normally, I guess it is not a problem on your system. A primal check should be to relocate mysql.sock in a more neutral directory.

The fact that the problem occurs “randomly” (or not every time) let me think that it could be a server problem.

  • Is your /tmp located on a standard disk, or on an exotic mount (like in the RAM) ?

  • Is your /tmp empty ?

  • Does iotop show you something wrong when you encounter the problem?


回答 20

在“管理数据库连接”对话框中配置数据库连接。选择“标准(TCP / IP)”作为连接方法。

有关更多详细信息,请参见此页面 http://dev.mysql.com/doc/workbench/en/wb-manage-db-connections.html

根据另一页,即使您指定localhost,也会使用套接字文件。

如果您未指定主机名或指定特殊主机名localhost,则使用Unix套接字文件。

它还显示了如何通过运行以下命令来检查服务器:

如果mysqld进程正在运行,则可以通过尝试以下命令进行检查。您的设置中的端口号或Unix套接字文件名可能不同。host_ip代表运行服务器的计算机的IP地址。

shell> mysqladmin version 
shell> mysqladmin variables 
shell> mysqladmin -h `hostname` version variables 
shell> mysqladmin -h `hostname` --port=3306 version 
shell> mysqladmin -h host_ip version 
shell> mysqladmin --protocol=SOCKET --socket=/tmp/mysql.sock version

Configure your DB connection in the ‘Manage DB Connections dialog. Select ‘Standard (TCP/IP)’ as connection method.

See this page for more details http://dev.mysql.com/doc/workbench/en/wb-manage-db-connections.html

According to this other page a socket file is used even if you specify localhost.

A Unix socket file is used if you do not specify a host name or if you specify the special host name localhost.

It also shows how to check on your server by running these commands:

If a mysqld process is running, you can check it by trying the following commands. The port number or Unix socket file name might be different in your setup. host_ip represents the IP address of the machine where the server is running.

shell> mysqladmin version 
shell> mysqladmin variables 
shell> mysqladmin -h `hostname` version variables 
shell> mysqladmin -h `hostname` --port=3306 version 
shell> mysqladmin -h host_ip version 
shell> mysqladmin --protocol=SOCKET --socket=/tmp/mysql.sock version

回答 21

在ubuntu14.04中,您可以执行此操作以解决此问题。

zack@zack:~/pycodes/python-scraping/chapter5$ mysqladmin -p variables | grep socket
Enter password: 
| socket                                            | /var/run/mysqld/mysqld.sock                                                                                            |
zack@zack:~/pycodes/python-scraping/chapter5$ ln -s /var/run/mysqld/mysqld.sock /tmp/mysql.sock
zack@zack:~/pycodes/python-scraping/chapter5$ ll /tmp/mysql.sock 
lrwxrwxrwx 1 zack zack 27 11月 29 13:08 /tmp/mysql.sock -> /var/run/mysqld/mysqld.sock=

In Ubuntu 14.04 you can do this to solve the problem:

zack@zack:~/pycodes/python-scraping/chapter5$ mysqladmin -p variables | grep socket
Enter password: 
| socket                                            | /var/run/mysqld/mysqld.sock                                                                                            |
zack@zack:~/pycodes/python-scraping/chapter5$ ln -s /var/run/mysqld/mysqld.sock /tmp/mysql.sock
zack@zack:~/pycodes/python-scraping/chapter5$ ll /tmp/mysql.sock 
lrwxrwxrwx 1 zack zack 27 11月 29 13:08 /tmp/mysql.sock -> /var/run/mysqld/mysqld.sock=

回答 22

对我来说,我确定mysqld已启动,并且命令行mysql可以正常工作。但是httpd服务器显示了问题(无法通过套接字连接到mysql)。

我使用mysqld_safe&启动了该服务。

最后我发现,用 service mysqld start 启动 mysqld 服务时存在问题(selinux 权限问题);修复 selinux 问题并用 service mysqld start 启动 mysqld 后,httpd 的连接问题就消失了。而用 mysqld_safe & 启动 mysqld 时,mysqld 本身可以工作(mysql 客户端正常),但从 httpd 连接仍然有问题。

For me, I’m sure mysqld is started, and command line mysql can work properly. But the httpd server show the issue(can’t connect to mysql through socket).

I started the service with mysqld_safe&.

Finally, I found that when I started the mysqld service with service mysqld start there were issues (an selinux permission issue); once I fixed the selinux issue and started mysqld with "service mysqld start", the httpd connection issue disappeared. When I started mysqld with mysqld_safe& instead, mysqld itself worked (the mysql client worked properly), but there were still issues when connecting from httpd.


回答 23

如果与套接字相关,请阅读此文件

/etc/mysql/my.cnf

并查看标准的套接字位置是什么。它是类似这样的一行:

socket = /var/run/mysqld/mysqld.sock

现在为您的shell创建一个别名,例如:

alias mysql="mysql --socket=/var/run/mysqld/mysqld.sock"

这样,您就不需要root特权。

If it’s socket related read this file

/etc/mysql/my.cnf

and see what is the standard socket location. It’s a line like:

socket = /var/run/mysqld/mysqld.sock

now create an alias for your shell like:

alias mysql="mysql --socket=/var/run/mysqld/mysqld.sock"

This way you don’t need root privileges.


回答 24

只需尝试运行 mysqld

在 Mac 上,我就是因为它没在运行才出的问题。如果不起作用,请查看 /usr/local/var/mysql/<your_name>.err 获取详细的错误日志。

Simply try to run mysqld.

This was what was not working for me on Mac. If it doesn't work, try looking at /usr/local/var/mysql/<your_name>.err to see detailed error logs.


回答 25

# shell script ,ignore the first 
$ $(dirname `which mysql`)\/mysql.server start

可能会有帮助。

# shell script ,ignore the first 
$ $(dirname `which mysql`)\/mysql.server start

May be helpful.


回答 26

在 MacOS Mojave 10.14.6 上,使用通过 Homebrew 安装的 MySQL 8.0.19:

  • sudo find / -name my.cnf
  • 文件位于 /usr/local/etc/my.cnf

正常工作了一段时间,后来错误又出现了。于是卸载了 Homebrew 版本的 MySQL,改为直接从这里下载 .dmg 文件安装。

从那以后连接一直正常。

Using MacOS Mojave 10.14.6 for MySQL 8.0.19 installed via Homebrew

  • Ran sudo find / -name my.cnf
  • File found at /usr/local/etc/my.cnf

Worked for a time then eventually the error returned. Uninstalled the Homebrew version of MySQL and installed the .dmg file directly from here

Happily connecting since then.


回答 27

就我而言,起作用的是编辑文件 /etc/mysql/mysql.conf.d/mysqld.cnf,把这一行:

socket      = /var/run/mysqld/mysqld.sock

替换为

socket      = /tmp/mysql.sock

然后我重新启动服务器,它就正常工作了。有趣的是,如果我把那一行改回原样再重启,它居然仍然可以工作。

In my case what helped was to edit the file /etc/mysql/mysql.conf.d/mysqld.cnfand replace the line:

socket      = /var/run/mysqld/mysqld.sock

with

socket      = /tmp/mysql.sock

Then I restarted the server and it worked fine. The funny thing is that if I put back the line as it was before and restarted it still worked..


回答 28

我最近也遇到过类似的问题,看了许多答案,最后按照以下步骤解决了。

  1. 更改 /etc/my.cnf 中的套接字路径(因为我反复遇到 /tmp/mysql.sock 的错误)。参考:如何更改套接字路径
  2. 运行 mysqld_safe 重新启动服务器,因为这是出现错误时推荐的重启方式。参考:mysqld_safe

I had faced similar problem recently. Went through many answers. I got it working by following steps.

  1. change the socket path in /etc/my.cnf (as i was repeatedly getting error with /tmp/mysql.sock ) reference to change the socket path
  2. run mysqld_safe to restart the server as it is the recommended way to restart in case of errors. reference to mysqld_safe

回答 29

对我而言,是 mysql 服务器没有运行。因此,我先启动了 mysql 服务器:

mysql.server start

然后

mysql_secure_installation

来加固服务器。现在我可以通过下面的命令访问 MySQL 服务器:

sudo mysql -uroot -p

For me, the mysql server was not running. So, i started the mysql server through

mysql.server start

then

mysql_secure_installation

to secure the server and now I can visit the MySQL server through

sudo mysql -uroot -p


字典与对象-哪个更有效,为什么?

问题:字典与对象-哪个更有效,为什么?

在内存使用和CPU消耗方面,在Python中更有效的方法是-字典还是对象?

背景: 我必须将大量数据加载到Python中。我创建了一个只是字段容器的对象。创建4M实例并将其放入字典中大约需要10分钟和约6GB的内存。字典准备就绪后,只需眨眼即可访问。

示例: 为了检查性能,我编写了两个简单的程序,它们执行相同的操作-一个使用对象,另一个使用字典:

对象(执行时间〜18sec):

class Obj(object):
  def __init__(self, i):
    self.i = i
    self.l = []
all = {}
for i in range(1000000):
  all[i] = Obj(i)

字典(执行时间约12秒):

all = {}
for i in range(1000000):
  o = {}
  o['i'] = i
  o['l'] = []
  all[i] = o

问题: 我做错什么了吗?字典比对象快?如果确实字典表现更好,有人可以解释为什么吗?

What is more efficient in Python in terms of memory usage and CPU consumption – Dictionary or Object?

Background: I have to load huge amount of data into Python. I created an object that is just a field container. Creating 4M instances and putting them into a dictionary took about 10 minutes and ~6GB of memory. After dictionary is ready, accessing it is a blink of an eye.

Example: To check the performance I wrote two simple programs that do the same – one is using objects, other dictionary:

Object (execution time ~18sec):

class Obj(object):
  def __init__(self, i):
    self.i = i
    self.l = []
all = {}
for i in range(1000000):
  all[i] = Obj(i)

Dictionary (execution time ~12sec):

all = {}
for i in range(1000000):
  o = {}
  o['i'] = i
  o['l'] = []
  all[i] = o

Question: Am I doing something wrong or dictionary is just faster than object? If indeed dictionary performs better, can somebody explain why?


回答 0

您是否尝试过使用__slots__

文档中

默认情况下,新旧类的实例都有用于属性存储的字典。这浪费了具有很少实例变量的对象的空间。创建大量实例时,空间消耗会变得非常大。

可以通过在新式类定义中定义 __slots__ 来覆盖这一默认行为。__slots__ 声明接受一系列实例变量,并在每个实例中只保留刚好足够存放这些变量值的空间。由于不会为每个实例创建 __dict__,因此节省了空间。

那么,这样既节省时间又节省内存吗?

比较计算机上的三种方法:

test_slots.py:

class Obj(object):
  __slots__ = ('i', 'l')
  def __init__(self, i):
    self.i = i
    self.l = []
all = {}
for i in range(1000000):
  all[i] = Obj(i)

test_obj.py:

class Obj(object):
  def __init__(self, i):
    self.i = i
    self.l = []
all = {}
for i in range(1000000):
  all[i] = Obj(i)

test_dict.py:

all = {}
for i in range(1000000):
  o = {}
  o['i'] = i
  o['l'] = []
  all[i] = o

test_namedtuple.py(在2.6中受支持):

import collections

Obj = collections.namedtuple('Obj', 'i l')

all = {}
for i in range(1000000):
  all[i] = Obj(i, [])

运行基准测试(使用CPython 2.5):

$ lshw | grep product | head -n 1
          product: Intel(R) Pentium(R) M processor 1.60GHz
$ python --version
Python 2.5
$ time python test_obj.py && time python test_dict.py && time python test_slots.py 

real    0m27.398s (using 'normal' object)
real    0m16.747s (using __dict__)
real    0m11.777s (using __slots__)

使用CPython 2.6.2,包括命名的元组测试:

$ python --version
Python 2.6.2
$ time python test_obj.py && time python test_dict.py && time python test_slots.py && time python test_namedtuple.py 

real    0m27.197s (using 'normal' object)
real    0m17.657s (using __dict__)
real    0m12.249s (using __slots__)
real    0m12.262s (using namedtuple)

因此,是的(并不意外),使用 __slots__ 是一种性能优化。使用命名元组的性能与 __slots__ 相近。

Have you tried using __slots__?

From the documentation:

By default, instances of both old and new-style classes have a dictionary for attribute storage. This wastes space for objects having very few instance variables. The space consumption can become acute when creating large numbers of instances.

The default can be overridden by defining __slots__ in a new-style class definition. The __slots__ declaration takes a sequence of instance variables and reserves just enough space in each instance to hold a value for each variable. Space is saved because __dict__ is not created for each instance.

So does this save time as well as memory?

Comparing the three approaches on my computer:

test_slots.py:

class Obj(object):
  __slots__ = ('i', 'l')
  def __init__(self, i):
    self.i = i
    self.l = []
all = {}
for i in range(1000000):
  all[i] = Obj(i)

test_obj.py:

class Obj(object):
  def __init__(self, i):
    self.i = i
    self.l = []
all = {}
for i in range(1000000):
  all[i] = Obj(i)

test_dict.py:

all = {}
for i in range(1000000):
  o = {}
  o['i'] = i
  o['l'] = []
  all[i] = o

test_namedtuple.py (supported in 2.6):

import collections

Obj = collections.namedtuple('Obj', 'i l')

all = {}
for i in range(1000000):
  all[i] = Obj(i, [])

Run benchmark (using CPython 2.5):

$ lshw | grep product | head -n 1
          product: Intel(R) Pentium(R) M processor 1.60GHz
$ python --version
Python 2.5
$ time python test_obj.py && time python test_dict.py && time python test_slots.py 

real    0m27.398s (using 'normal' object)
real    0m16.747s (using __dict__)
real    0m11.777s (using __slots__)

Using CPython 2.6.2, including the named tuple test:

$ python --version
Python 2.6.2
$ time python test_obj.py && time python test_dict.py && time python test_slots.py && time python test_namedtuple.py 

real    0m27.197s (using 'normal' object)
real    0m17.657s (using __dict__)
real    0m12.249s (using __slots__)
real    0m12.262s (using namedtuple)

So yes (not really a surprise), using __slots__ is a performance optimization. Using a named tuple has similar performance to __slots__.


回答 1

对象中的属性访问使用幕后的字典访问-因此,使用属性访问会增加额外的开销。另外,在对象情况下,由于例如额外的内存分配和代码执行(例如__init__方法的执行),您将承担额外的开销。

在您的代码中,如果 o 是一个 Obj 实例,那么 o.attr 等效于 o.__dict__['attr'],外加少量额外开销。

Attribute access in an object uses dictionary access behind the scenes – so by using attribute access you are adding extra overhead. Plus in the object case, you are incurring additional overhead because of e.g. additional memory allocations and code execution (e.g. of the __init__ method).

In your code, if o is an Obj instance, o.attr is equivalent to o.__dict__['attr'] with a small amount of extra overhead.
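An illustrative REPL session showing that the instance attributes really do live in a plain dict:

>>> class Obj(object):
...     def __init__(self, i):
...         self.i = i
...         self.l = []
...
>>> o = Obj(1)
>>> o.__dict__                 # per-instance attribute storage
{'i': 1, 'l': []}
>>> o.i, o.__dict__['i']       # the same value, reached two ways
(1, 1)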


回答 2

您是否考虑过使用namedtuple?(python 2.4 / 2.5的链接

这是表示结构化数据的新标准方式,可为您提供元组的性能和类的便利性。

与字典相比,它的唯一缺点是(如元组)它不具有创建后更改属性的能力。

Have you considered using a namedtuple? (link for python 2.4/2.5)

It’s the new standard way of representing structured data that gives you the performance of a tuple and the convenience of a class.

Its only downside compared with dictionaries is that (like tuples) it doesn't give you the ability to change attributes after creation.

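For illustration, the immutability mentioned above looks like this (the exact error message can vary slightly between Python versions):

>>> from collections import namedtuple
>>> Obj = namedtuple('Obj', 'i l')
>>> o = Obj(i=1, l=[])
>>> o.i
1
>>> o.i = 2
Traceback (most recent call last):
  ...
AttributeError: can't set attribute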

回答 3

这是python 3.6.1的@hughdbrown答案的副本,我将计数增加了5倍,并在每次运行结束时添加了一些代码来测试python进程的内存占用量。

在各位准备投反对票之前,请注意:这种统计对象大小的方法并不精确。

from datetime import datetime
import os
import psutil

process = psutil.Process(os.getpid())


ITER_COUNT = 1000 * 1000 * 5

RESULT=None

def makeL(i):
    # Use this line to negate the effect of the strings on the test 
    # return "Python is smart and will only create one string with this line"

    # Use this if you want to see the difference with 5 million unique strings
    return "This is a sample string %s" % i

def timeit(method):
    def timed(*args, **kw):
        global RESULT
        s = datetime.now()
        RESULT = method(*args, **kw)
        e = datetime.now()

        sizeMb = process.memory_info().rss / 1024 / 1024
        sizeMbStr = "{0:,}".format(round(sizeMb, 2))

        print('Time Taken = %s, \t%s, \tSize = %s' % (e - s, method.__name__, sizeMbStr))

    return timed

class Obj(object):
    def __init__(self, i):
       self.i = i
       self.l = makeL(i)

class SlotObj(object):
    __slots__ = ('i', 'l')
    def __init__(self, i):
       self.i = i
       self.l = makeL(i)

from collections import namedtuple
NT = namedtuple("NT", ["i", 'l'])

@timeit
def profile_dict_of_nt():
    return [NT(i=i, l=makeL(i)) for i in range(ITER_COUNT)]

@timeit
def profile_list_of_nt():
    return dict((i, NT(i=i, l=makeL(i))) for i in range(ITER_COUNT))

@timeit
def profile_dict_of_dict():
    return dict((i, {'i': i, 'l': makeL(i)}) for i in range(ITER_COUNT))

@timeit
def profile_list_of_dict():
    return [{'i': i, 'l': makeL(i)} for i in range(ITER_COUNT)]

@timeit
def profile_dict_of_obj():
    return dict((i, Obj(i)) for i in range(ITER_COUNT))

@timeit
def profile_list_of_obj():
    return [Obj(i) for i in range(ITER_COUNT)]

@timeit
def profile_dict_of_slot():
    return dict((i, SlotObj(i)) for i in range(ITER_COUNT))

@timeit
def profile_list_of_slot():
    return [SlotObj(i) for i in range(ITER_COUNT)]

profile_dict_of_nt()
profile_list_of_nt()
profile_dict_of_dict()
profile_list_of_dict()
profile_dict_of_obj()
profile_list_of_obj()
profile_dict_of_slot()
profile_list_of_slot()

这些是我的结果

Time Taken = 0:00:07.018720,    provile_dict_of_nt,     Size = 951.83
Time Taken = 0:00:07.716197,    provile_list_of_nt,     Size = 1,084.75
Time Taken = 0:00:03.237139,    profile_dict_of_dict,   Size = 1,926.29
Time Taken = 0:00:02.770469,    profile_list_of_dict,   Size = 1,778.58
Time Taken = 0:00:07.961045,    profile_dict_of_obj,    Size = 1,537.64
Time Taken = 0:00:05.899573,    profile_list_of_obj,    Size = 1,458.05
Time Taken = 0:00:06.567684,    profile_dict_of_slot,   Size = 1,035.65
Time Taken = 0:00:04.925101,    profile_list_of_slot,   Size = 887.49

我的结论是:

  1. 插槽具有最佳的内存占用,并且速度合理。
  2. dict是最快的,但使用最多的内存。

Here is a copy of @hughdbrown answer for python 3.6.1, I’ve made the count 5x larger and added some code to test the memory footprint of the python process at the end of each run.

Before the downvoters have at it, Be advised that this method of counting the size of objects is not accurate.

from datetime import datetime
import os
import psutil

process = psutil.Process(os.getpid())


ITER_COUNT = 1000 * 1000 * 5

RESULT=None

def makeL(i):
    # Use this line to negate the effect of the strings on the test 
    # return "Python is smart and will only create one string with this line"

    # Use this if you want to see the difference with 5 million unique strings
    return "This is a sample string %s" % i

def timeit(method):
    def timed(*args, **kw):
        global RESULT
        s = datetime.now()
        RESULT = method(*args, **kw)
        e = datetime.now()

        sizeMb = process.memory_info().rss / 1024 / 1024
        sizeMbStr = "{0:,}".format(round(sizeMb, 2))

        print('Time Taken = %s, \t%s, \tSize = %s' % (e - s, method.__name__, sizeMbStr))

    return timed

class Obj(object):
    def __init__(self, i):
       self.i = i
       self.l = makeL(i)

class SlotObj(object):
    __slots__ = ('i', 'l')
    def __init__(self, i):
       self.i = i
       self.l = makeL(i)

from collections import namedtuple
NT = namedtuple("NT", ["i", 'l'])

@timeit
def profile_dict_of_nt():
    return [NT(i=i, l=makeL(i)) for i in range(ITER_COUNT)]

@timeit
def profile_list_of_nt():
    return dict((i, NT(i=i, l=makeL(i))) for i in range(ITER_COUNT))

@timeit
def profile_dict_of_dict():
    return dict((i, {'i': i, 'l': makeL(i)}) for i in range(ITER_COUNT))

@timeit
def profile_list_of_dict():
    return [{'i': i, 'l': makeL(i)} for i in range(ITER_COUNT)]

@timeit
def profile_dict_of_obj():
    return dict((i, Obj(i)) for i in range(ITER_COUNT))

@timeit
def profile_list_of_obj():
    return [Obj(i) for i in range(ITER_COUNT)]

@timeit
def profile_dict_of_slot():
    return dict((i, SlotObj(i)) for i in range(ITER_COUNT))

@timeit
def profile_list_of_slot():
    return [SlotObj(i) for i in range(ITER_COUNT)]

profile_dict_of_nt()
profile_list_of_nt()
profile_dict_of_dict()
profile_list_of_dict()
profile_dict_of_obj()
profile_list_of_obj()
profile_dict_of_slot()
profile_list_of_slot()

And these are my results

Time Taken = 0:00:07.018720,    provile_dict_of_nt,     Size = 951.83
Time Taken = 0:00:07.716197,    provile_list_of_nt,     Size = 1,084.75
Time Taken = 0:00:03.237139,    profile_dict_of_dict,   Size = 1,926.29
Time Taken = 0:00:02.770469,    profile_list_of_dict,   Size = 1,778.58
Time Taken = 0:00:07.961045,    profile_dict_of_obj,    Size = 1,537.64
Time Taken = 0:00:05.899573,    profile_list_of_obj,    Size = 1,458.05
Time Taken = 0:00:06.567684,    profile_dict_of_slot,   Size = 1,035.65
Time Taken = 0:00:04.925101,    profile_list_of_slot,   Size = 887.49

My conclusion is:

  1. Slots have the best memory footprint and are reasonable on speed.
  2. dicts are the fastest, but use the most memory.

回答 4

from datetime import datetime

ITER_COUNT = 1000 * 1000

def timeit(method):
    def timed(*args, **kw):
        s = datetime.now()
        result = method(*args, **kw)
        e = datetime.now()

        print method.__name__, '(%r, %r)' % (args, kw), e - s
        return result
    return timed

class Obj(object):
    def __init__(self, i):
       self.i = i
       self.l = []

class SlotObj(object):
    __slots__ = ('i', 'l')
    def __init__(self, i):
       self.i = i
       self.l = []

@timeit
def profile_dict_of_dict():
    return dict((i, {'i': i, 'l': []}) for i in xrange(ITER_COUNT))

@timeit
def profile_list_of_dict():
    return [{'i': i, 'l': []} for i in xrange(ITER_COUNT)]

@timeit
def profile_dict_of_obj():
    return dict((i, Obj(i)) for i in xrange(ITER_COUNT))

@timeit
def profile_list_of_obj():
    return [Obj(i) for i in xrange(ITER_COUNT)]

@timeit
def profile_dict_of_slotobj():
    return dict((i, SlotObj(i)) for i in xrange(ITER_COUNT))

@timeit
def profile_list_of_slotobj():
    return [SlotObj(i) for i in xrange(ITER_COUNT)]

if __name__ == '__main__':
    profile_dict_of_dict()
    profile_list_of_dict()
    profile_dict_of_obj()
    profile_list_of_obj()
    profile_dict_of_slotobj()
    profile_list_of_slotobj()

结果:

hbrown@hbrown-lpt:~$ python ~/Dropbox/src/StackOverflow/1336791.py 
profile_dict_of_dict ((), {}) 0:00:08.228094
profile_list_of_dict ((), {}) 0:00:06.040870
profile_dict_of_obj ((), {}) 0:00:11.481681
profile_list_of_obj ((), {}) 0:00:10.893125
profile_dict_of_slotobj ((), {}) 0:00:06.381897
profile_list_of_slotobj ((), {}) 0:00:05.860749
from datetime import datetime

ITER_COUNT = 1000 * 1000

def timeit(method):
    def timed(*args, **kw):
        s = datetime.now()
        result = method(*args, **kw)
        e = datetime.now()

        print method.__name__, '(%r, %r)' % (args, kw), e - s
        return result
    return timed

class Obj(object):
    def __init__(self, i):
       self.i = i
       self.l = []

class SlotObj(object):
    __slots__ = ('i', 'l')
    def __init__(self, i):
       self.i = i
       self.l = []

@timeit
def profile_dict_of_dict():
    return dict((i, {'i': i, 'l': []}) for i in xrange(ITER_COUNT))

@timeit
def profile_list_of_dict():
    return [{'i': i, 'l': []} for i in xrange(ITER_COUNT)]

@timeit
def profile_dict_of_obj():
    return dict((i, Obj(i)) for i in xrange(ITER_COUNT))

@timeit
def profile_list_of_obj():
    return [Obj(i) for i in xrange(ITER_COUNT)]

@timeit
def profile_dict_of_slotobj():
    return dict((i, SlotObj(i)) for i in xrange(ITER_COUNT))

@timeit
def profile_list_of_slotobj():
    return [SlotObj(i) for i in xrange(ITER_COUNT)]

if __name__ == '__main__':
    profile_dict_of_dict()
    profile_list_of_dict()
    profile_dict_of_obj()
    profile_list_of_obj()
    profile_dict_of_slotobj()
    profile_list_of_slotobj()

Results:

hbrown@hbrown-lpt:~$ python ~/Dropbox/src/StackOverflow/1336791.py 
profile_dict_of_dict ((), {}) 0:00:08.228094
profile_list_of_dict ((), {}) 0:00:06.040870
profile_dict_of_obj ((), {}) 0:00:11.481681
profile_list_of_obj ((), {}) 0:00:10.893125
profile_dict_of_slotobj ((), {}) 0:00:06.381897
profile_list_of_slotobj ((), {}) 0:00:05.860749

回答 5

毫无疑问。
您有没有其他属性的数据(没有方法,没有任何东西)。因此,您有一个数据容器(在本例中为字典)。

我通常更喜欢在数据建模方面进行思考。如果存在巨大的性能问题,那么我可以放弃抽象中的某些内容,但是只有非常好的理由。
编程是关于管理复杂性的,维护正确的抽象常常是实现这种结果的最有用的方法之一。

关于对象更慢的原因,我认为您的测量方式不正确。
您在for循环内执行的赋值次数太少,因此您看到的只是实例化dict(内置对象)与“custom”对象所需时间的差异。尽管从语言角度看它们是相同的,但它们的实现却大不相同。
在那之后,两者的赋值时间应当几乎相同,因为最终成员都保存在一个字典中。

There is no question.
You have data, with no other attributes (no methods, nothing). Hence you have a data container (in this case, a dictionary).

I usually prefer to think in terms of data modeling. If there is some huge performance issue, then I can give up something in the abstraction, but only with very good reasons.
Programming is all about managing complexity, and maintaining the correct abstraction is very often one of the most useful ways to achieve such a result.

About the reasons an object is slower, I think your measurement is not correct.
You are performing too few assignments inside the for loop, and therefore what you see there is the different time necessary to instantiate a dict (intrinsic object) and a “custom” object. Although from the language perspective they are the same, they have quite a different implementation.
After that, the assignment time should be almost the same for both, as in the end members are maintained inside a dictionary.
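
To illustrate that last point about assignment time, here is a minimal, hypothetical timeit sketch (not the OP's benchmark; absolute numbers will vary by machine and Python version). It simply compares setting a dict key with setting an attribute on a plain object:

import timeit

class Plain(object):
    pass

d = {}
o = Plain()

# Time one million key assignments vs. one million attribute assignments.
dict_time = timeit.timeit(lambda: d.__setitem__('x', 1), number=1000000)
attr_time = timeit.timeit(lambda: setattr(o, 'x', 1), number=1000000)

print('dict assignment:     ', dict_time)
print('attribute assignment:', attr_time)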


回答 6

如果数据结构不需要包含引用循环,则还有另一种减少内存使用的方法。

让我们比较两个类:

class DataItem:
    __slots__ = ('name', 'age', 'address')
    def __init__(self, name, age, address):
        self.name = name
        self.age = age
        self.address = address

$ pip install recordclass

>>> import sys
>>> from recordclass import structclass
>>> DataItem2 = structclass('DataItem', 'name age address')
>>> inst = DataItem('Mike', 10, 'Cherry Street 15')
>>> inst2 = DataItem2('Mike', 10, 'Cherry Street 15')
>>> print(inst2)
>>> print(sys.getsizeof(inst), sys.getsizeof(inst2))
DataItem(name='Mike', age=10, address='Cherry Street 15')
64 40

这之所以成为可能,是因为基于structclass的类不支持循环垃圾回收,而在这种情况下并不需要它。

与基于__slots__的类相比,它还有一个优点:您可以添加额外的属性:

>>> DataItem3 = structclass('DataItem', 'name age address', usedict=True)
>>> inst3 = DataItem3('Mike', 10, 'Cherry Street 15')
>>> inst3.hobby = ['drawing', 'singing']
>>> print(inst3)
>>> print(sys.getsizeof(inst3), 'has dict:',  bool(inst3.__dict__))
DataItem(name='Mike', age=10, address='Cherry Street 15', **{'hobby': ['drawing', 'singing']})
48 has dict: True

There is yet another way to reduce memory usage if data structure isn’t supposed to contain reference cycles.

Let’s compare two classes:

class DataItem:
    __slots__ = ('name', 'age', 'address')
    def __init__(self, name, age, address):
        self.name = name
        self.age = age
        self.address = address

and

$ pip install recordclass

>>> import sys
>>> from recordclass import structclass
>>> DataItem2 = structclass('DataItem', 'name age address')
>>> inst = DataItem('Mike', 10, 'Cherry Street 15')
>>> inst2 = DataItem2('Mike', 10, 'Cherry Street 15')
>>> print(inst2)
>>> print(sys.getsizeof(inst), sys.getsizeof(inst2))
DataItem(name='Mike', age=10, address='Cherry Street 15')
64 40

It became possible since structclass-based classes don't support cyclic garbage collection, which is not needed in such cases.

There is also one advantage over __slots__-based class: you are able to add extra attributes:

>>> DataItem3 = structclass('DataItem', 'name age address', usedict=True)
>>> inst3 = DataItem3('Mike', 10, 'Cherry Street 15')
>>> inst3.hobby = ['drawing', 'singing']
>>> print(inst3)
>>> print(sys.getsizeof(inst3), 'has dict:',  bool(inst3.__dict__))
DataItem(name='Mike', age=10, address='Cherry Street 15', **{'hobby': ['drawing', 'singing']})
48 has dict: True

回答 7

这是我对@Jarrod-Chesney这个非常好的脚本的几次测试运行。为了进行比较,我还在python2下运行了它,将“range”替换为“xrange”。

出于好奇,我还使用OrderedDict(ordict)添加了类似的测试以进行比较。

Python 3.6.9:

Time Taken = 0:00:04.971369,    profile_dict_of_nt,     Size = 944.27
Time Taken = 0:00:05.743104,    profile_list_of_nt,     Size = 1,066.93
Time Taken = 0:00:02.524507,    profile_dict_of_dict,   Size = 1,920.35
Time Taken = 0:00:02.123801,    profile_list_of_dict,   Size = 1,760.9
Time Taken = 0:00:05.374294,    profile_dict_of_obj,    Size = 1,532.12
Time Taken = 0:00:04.517245,    profile_list_of_obj,    Size = 1,441.04
Time Taken = 0:00:04.590298,    profile_dict_of_slot,   Size = 1,030.09
Time Taken = 0:00:04.197425,    profile_list_of_slot,   Size = 870.67

Time Taken = 0:00:08.833653,    profile_ordict_of_ordict, Size = 3,045.52
Time Taken = 0:00:11.539006,    profile_list_of_ordict, Size = 2,722.34
Time Taken = 0:00:06.428105,    profile_ordict_of_obj,  Size = 1,799.29
Time Taken = 0:00:05.559248,    profile_ordict_of_slot, Size = 1,257.75

Python 2.7.15+:

Time Taken = 0:00:05.193900,    profile_dict_of_nt,     Size = 906.0
Time Taken = 0:00:05.860978,    profile_list_of_nt,     Size = 1,177.0
Time Taken = 0:00:02.370905,    profile_dict_of_dict,   Size = 2,228.0
Time Taken = 0:00:02.100117,    profile_list_of_dict,   Size = 2,036.0
Time Taken = 0:00:08.353666,    profile_dict_of_obj,    Size = 2,493.0
Time Taken = 0:00:07.441747,    profile_list_of_obj,    Size = 2,337.0
Time Taken = 0:00:06.118018,    profile_dict_of_slot,   Size = 1,117.0
Time Taken = 0:00:04.654888,    profile_list_of_slot,   Size = 964.0

Time Taken = 0:00:59.576874,    profile_ordict_of_ordict, Size = 7,427.0
Time Taken = 0:10:25.679784,    profile_list_of_ordict, Size = 11,305.0
Time Taken = 0:05:47.289230,    profile_ordict_of_obj,  Size = 11,477.0
Time Taken = 0:00:51.485756,    profile_ordict_of_slot, Size = 11,193.0

因此,在两个主要版本上,@Jarrod-Chesney的结论仍然看起来不错。

Here are my test runs of the very nice script of @Jarrod-Chesney. For comparison, I also run it against python2 with “range” replaced by “xrange”.

By curiosity, I also added similar tests with OrderedDict (ordict) for comparison.

Python 3.6.9:

Time Taken = 0:00:04.971369,    profile_dict_of_nt,     Size = 944.27
Time Taken = 0:00:05.743104,    profile_list_of_nt,     Size = 1,066.93
Time Taken = 0:00:02.524507,    profile_dict_of_dict,   Size = 1,920.35
Time Taken = 0:00:02.123801,    profile_list_of_dict,   Size = 1,760.9
Time Taken = 0:00:05.374294,    profile_dict_of_obj,    Size = 1,532.12
Time Taken = 0:00:04.517245,    profile_list_of_obj,    Size = 1,441.04
Time Taken = 0:00:04.590298,    profile_dict_of_slot,   Size = 1,030.09
Time Taken = 0:00:04.197425,    profile_list_of_slot,   Size = 870.67

Time Taken = 0:00:08.833653,    profile_ordict_of_ordict, Size = 3,045.52
Time Taken = 0:00:11.539006,    profile_list_of_ordict, Size = 2,722.34
Time Taken = 0:00:06.428105,    profile_ordict_of_obj,  Size = 1,799.29
Time Taken = 0:00:05.559248,    profile_ordict_of_slot, Size = 1,257.75

Python 2.7.15+:

Time Taken = 0:00:05.193900,    profile_dict_of_nt,     Size = 906.0
Time Taken = 0:00:05.860978,    profile_list_of_nt,     Size = 1,177.0
Time Taken = 0:00:02.370905,    profile_dict_of_dict,   Size = 2,228.0
Time Taken = 0:00:02.100117,    profile_list_of_dict,   Size = 2,036.0
Time Taken = 0:00:08.353666,    profile_dict_of_obj,    Size = 2,493.0
Time Taken = 0:00:07.441747,    profile_list_of_obj,    Size = 2,337.0
Time Taken = 0:00:06.118018,    profile_dict_of_slot,   Size = 1,117.0
Time Taken = 0:00:04.654888,    profile_list_of_slot,   Size = 964.0

Time Taken = 0:00:59.576874,    profile_ordict_of_ordict, Size = 7,427.0
Time Taken = 0:10:25.679784,    profile_list_of_ordict, Size = 11,305.0
Time Taken = 0:05:47.289230,    profile_ordict_of_obj,  Size = 11,477.0
Time Taken = 0:00:51.485756,    profile_ordict_of_slot, Size = 11,193.0

So, on both major versions, the conclusions of @Jarrod-Chesney are still looking good.


查找列表模式

问题:查找列表模式

给定一个项目列表,回想一下,列表的模式(mode)是其中出现次数最多的项目。

我想知道如何创建一个函数,它可以找到列表的模式,但在列表没有模式时(例如列表中的所有项目都只出现一次)显示一条消息。我希望在不导入任何函数的情况下实现它。我正在尝试从头开始编写自己的函数。

Given a list of items, recall that the mode of the list is the item that occurs most often.

I would like to know how to create a function that can find the mode of a list but that displays a message if the list does not have a mode (e.g., all the items in the list only appear once). I want to make this function without importing any functions. I’m trying to make my own function from scratch.


回答 0

您可以使用max函数和一个key。可以参考使用 'key' 和 lambda 表达式的 Python max 函数。

max(set(lst), key=lst.count)

You can use the max function and a key. Have a look at python max function using ‘key’ and lambda expression.

max(set(lst), key=lst.count)
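
A quick usage sketch (the sample list is invented): the candidates come from set(lst), ties between equally frequent values are broken arbitrarily, and every lst.count call rescans the list, so this is roughly O(n²):

lst = [1, 3, 3, 2, 1, 3]
print(max(set(lst), key=lst.count))   # 3, the value that occurs most often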

回答 1

您可以使用collections软件包中提供的Counter,它带有一个类似mode的函数:

from collections import Counter
data = Counter(your_list_in_here)
data.most_common()   # Returns all unique items and their counts
data.most_common(1)  # Returns the highest occurring item

注意:Counter在python 2.7中是新的,并且在早期版本中不可用。

You can use the Counter supplied in the collections package which has a mode-esque function

from collections import Counter
data = Counter(your_list_in_here)
data.most_common()   # Returns all unique items and their counts
data.most_common(1)  # Returns the highest occurring item

Note: Counter is new in python 2.7 and is not available in earlier versions.
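
As a small usage sketch (sample data invented), most_common(1) returns a list of (value, count) pairs, so the mode itself is extracted by indexing:

from collections import Counter

data = Counter([1, 1, 2, 3, 3, 3])
print(data.most_common(1))          # [(3, 3)]  -> list of (value, count) pairs
value, count = data.most_common(1)[0]
print(value)                        # 3, the most frequent item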


回答 2

Python 3.4包含了statistics.mode方法,因此非常简单:

>>> from statistics import mode
>>> mode([1, 1, 2, 3, 3, 3, 3, 4])
 3

列表中可以有任何类型的元素,而不仅仅是数字:

>>> mode(["red", "blue", "blue", "red", "green", "red", "red"])
 'red'

Python 3.4 includes the method statistics.mode, so it is straightforward:

>>> from statistics import mode
>>> mode([1, 1, 2, 3, 3, 3, 3, 4])
 3

You can have any type of elements in the list, not just numeric:

>>> mode(["red", "blue", "blue", "red", "green", "red", "red"])
 'red'

回答 3

借鉴一些统计软件(即SciPy和MATLAB)的做法:它们只返回最小的最常见值,因此如果两个值出现得同样频繁,则返回其中较小的那个。希望下面的例子能有所帮助:

>>> from scipy.stats import mode

>>> mode([1, 2, 3, 4, 5])
(array([ 1.]), array([ 1.]))

>>> mode([1, 2, 2, 3, 3, 4, 5])
(array([ 2.]), array([ 2.]))

>>> mode([1, 2, 2, -3, -3, 4, 5])
(array([-3.]), array([ 2.]))

有什么原因导致您无法遵守该约定?

Taking a leaf from some statistics software, namely SciPy and MATLAB, these just return the smallest most common value, so if two values occur equally often, the smallest of these are returned. Hopefully an example will help:

>>> from scipy.stats import mode

>>> mode([1, 2, 3, 4, 5])
(array([ 1.]), array([ 1.]))

>>> mode([1, 2, 2, 3, 3, 4, 5])
(array([ 2.]), array([ 2.]))

>>> mode([1, 2, 2, -3, -3, 4, 5])
(array([-3.]), array([ 2.]))

Is there any reason why you can't follow this convention?


回答 4

有许多简单的方法可以在Python中找到列表模式,例如:

import statistics
statistics.mode([1,2,3,3])
>>> 3

或者,您可以通过计数找到最大值

max(array, key = array.count)

这两种方法的问题在于它们无法处理存在多个模式的情况:第一个会返回错误,而第二个只返回第一个模式。

为了找到集合的模式,您可以使用以下功能:

def mode(array):
    most = max(list(map(array.count, array)))
    return list(set(filter(lambda x: array.count(x) == most, array)))

There are many simple ways to find the mode of a list in Python such as:

import statistics
statistics.mode([1,2,3,3])
>>> 3

Or, you could find the max by its count

max(array, key = array.count)

The problem with those two methods is that they don't work with multiple modes. The first returns an error, while the second returns the first mode.

In order to find the modes of a set, you could use this function:

def mode(array):
    most = max(list(map(array.count, array)))
    return list(set(filter(lambda x: array.count(x) == most, array)))
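
A usage sketch for the function above (sample lists invented); it always returns a list, and the order of ties may vary because a set is used internally:

print(mode([1, 1, 2, 2, 3]))   # [1, 2] (order may vary) -> two modes
print(mode([1, 2, 3, 3]))      # [3] -> a single mode, still wrapped in a list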

回答 5

社区答案在列表为空时不起作用;在其基础上加以扩展,下面是可用的mode代码:

def mode(arr):
        if arr==[]:
            return None
        else:
            return max(set(arr), key=arr.count)

Extending the Community answer that will not work when the list is empty, here is working code for mode:

def mode(arr):
        if arr==[]:
            return None
        else:
            return max(set(arr), key=arr.count)

回答 6

如果您对最小,最大或所有模式都感兴趣:

def get_small_mode(numbers, out_mode):
    counts = {k:numbers.count(k) for k in set(numbers)}
    modes = sorted(dict(filter(lambda x: x[1] == max(counts.values()), counts.items())).keys())
    if out_mode=='smallest':
        return modes[0]
    elif out_mode=='largest':
        return modes[-1]
    else:
        return modes

In case you are interested in either the smallest, largest or all modes:

def get_small_mode(numbers, out_mode):
    counts = {k:numbers.count(k) for k in set(numbers)}
    modes = sorted(dict(filter(lambda x: x[1] == max(counts.values()), counts.items())).keys())
    if out_mode=='smallest':
        return modes[0]
    elif out_mode=='largest':
        return modes[-1]
    else:
        return modes

回答 7

我写了这个方便的功能来找到模式。

def mode(nums):
    corresponding={}
    occurances=[]
    for i in nums:
            count = nums.count(i)
            corresponding.update({i:count})

    for i in corresponding:
            freq=corresponding[i]
            occurances.append(freq)

    maxFreq=max(occurances)

    keys=corresponding.keys()
    values=corresponding.values()

    index_v = values.index(maxFreq)
    global mode
    mode = keys[index_v]
    return mode

I wrote up this handy function to find the mode.

def mode(nums):
    corresponding={}
    occurances=[]
    for i in nums:
            count = nums.count(i)
            corresponding.update({i:count})

    for i in corresponding:
            freq=corresponding[i]
            occurances.append(freq)

    maxFreq=max(occurances)

    keys=corresponding.keys()
    values=corresponding.values()

    index_v = values.index(maxFreq)
    global mode
    mode = keys[index_v]
    return mode

回答 8

简短,但有点丑陋:

def mode(arr) :
    m = max([arr.count(a) for a in arr])
    return [x for x in arr if arr.count(x) == m][0] if m>1 else None

使用字典,稍微不那么难看:

def mode(arr) :
    f = {}
    for a in arr : f[a] = f.get(a,0)+1
    m = max(f.values())
    t = [(x,f[x]) for x in f if f[x]==m]
    return t[0][0] if m > 1 else None

Short, but somehow ugly:

def mode(arr) :
    m = max([arr.count(a) for a in arr])
    return [x for x in arr if arr.count(x) == m][0] if m>1 else None

Using a dictionary, slightly less ugly:

def mode(arr) :
    f = {}
    for a in arr : f[a] = f.get(a,0)+1
    m = max(f.values())
    t = [(x,f[x]) for x in f if f[x]==m]
    return t[0][0] if m > 1 else None

回答 9

稍长一些,但它可以返回多个模式,并且可以处理出现次数最多的字符串或混合数据类型。

def getmode(inplist):
    '''with list of items as input, returns mode
    '''
    dictofcounts = {}
    listofcounts = []
    for i in inplist:
        countofi = inplist.count(i) # count items for each item in list
        listofcounts.append(countofi) # add counts to list
        dictofcounts[i]=countofi # add counts and item in dict to get later
    maxcount = max(listofcounts) # get max count of items
    if maxcount ==1:
        print "There is no mode for this dataset, values occur only once"
    else:
        modelist = [] # if more than one mode, add to list to print out
        for key, item in dictofcounts.iteritems():
            if item ==maxcount: # get item from original list with most counts
                modelist.append(str(key))
        print "The mode(s) are:",' and '.join(modelist)
        return modelist 

A little longer, but can have multiple modes and can get string with most counts or mix of datatypes.

def getmode(inplist):
    '''with list of items as input, returns mode
    '''
    dictofcounts = {}
    listofcounts = []
    for i in inplist:
        countofi = inplist.count(i) # count items for each item in list
        listofcounts.append(countofi) # add counts to list
        dictofcounts[i]=countofi # add counts and item in dict to get later
    maxcount = max(listofcounts) # get max count of items
    if maxcount ==1:
        print "There is no mode for this dataset, values occur only once"
    else:
        modelist = [] # if more than one mode, add to list to print out
        for key, item in dictofcounts.iteritems():
            if item ==maxcount: # get item from original list with most counts
                modelist.append(str(key))
        print "The mode(s) are:",' and '.join(modelist)
        return modelist 

回答 10

要使一个数字成为模式(mode),它出现的次数必须比列表中至少一个其他数字多,并且它不能是列表中唯一的数字。因此,我重构了@mathwizurd的答案(使用difference方法),如下所示:

def mode(array):
    '''
    returns a set containing valid modes
    returns a message if no valid mode exists
      - when all numbers occur the same number of times
      - when only one number occurs in the list 
      - when no number occurs in the list 
    '''
    most = max(map(array.count, array)) if array else None
    mset = set(filter(lambda x: array.count(x) == most, array))
    return mset if set(array) - mset else "list does not have a mode!" 

这些测试成功通过:

mode([]) == None 
mode([1]) == None
mode([1, 1]) == None 
mode([1, 1, 2, 2]) == None 

For a number to be a mode, it must occur more number of times than at least one other number in the list, and it must not be the only number in the list. So, I refactored @mathwizurd’s answer (to use the difference method) as follows:

def mode(array):
    '''
    returns a set containing valid modes
    returns a message if no valid mode exists
      - when all numbers occur the same number of times
      - when only one number occurs in the list 
      - when no number occurs in the list 
    '''
    most = max(map(array.count, array)) if array else None
    mset = set(filter(lambda x: array.count(x) == most, array))
    return mset if set(array) - mset else "list does not have a mode!" 

These tests pass successfully:

mode([]) == None 
mode([1]) == None
mode([1, 1]) == None 
mode([1, 1, 2, 2]) == None 

回答 11

为什么不只是

def print_mode (thelist):
  counts = {}
  for item in thelist:
    counts [item] = counts.get (item, 0) + 1
  maxcount = 0
  maxitem = None
  for k, v in counts.items ():
    if v > maxcount:
      maxitem = k
      maxcount = v
  if maxcount == 1:
    print "All values only appear once"
  elif counts.values().count (maxcount) > 1:
    print "List has multiple modes"
  else:
    print "Mode of list:", maxitem

它缺少一些本应有的错误检查,但它可以在不导入任何函数的情况下找到模式,并且在所有值都只出现一次时打印一条消息。它还能检测到多个项目共享相同最大计数的情况,尽管尚不清楚您是否需要这个功能。

Why not just

def print_mode (thelist):
  counts = {}
  for item in thelist:
    counts [item] = counts.get (item, 0) + 1
  maxcount = 0
  maxitem = None
  for k, v in counts.items ():
    if v > maxcount:
      maxitem = k
      maxcount = v
  if maxcount == 1:
    print "All values only appear once"
  elif counts.values().count (maxcount) > 1:
    print "List has multiple modes"
  else:
    print "Mode of list:", maxitem

This doesn’t have a few error checks that it should have, but it will find the mode without importing any functions and will print a message if all values appear only once. It will also detect multiple items sharing the same maximum count, although it wasn’t clear if you wanted that.


回答 12

该函数会返回一个或多个模式(无论有多少个),以及这些模式在数据集中出现的频率。如果没有模式(即所有项目都只出现一次),该函数会返回一个错误字符串。这类似于上面A_nagpal的函数,但以我的浅见更完整一些,而且我认为对阅读这个问题的任何Python新手(比如在下)来说更容易理解。

 def l_mode(list_in):
    count_dict = {}
    for e in (list_in):   
        count = list_in.count(e)
        if e not in count_dict.keys():
            count_dict[e] = count
    max_count = 0 
    for key in count_dict: 
        if count_dict[key] >= max_count:
            max_count = count_dict[key]
    corr_keys = [] 
    for corr_key, count_value in count_dict.items():
        if count_dict[corr_key] == max_count:
            corr_keys.append(corr_key)
    if max_count == 1 and len(count_dict) != 1: 
        return 'There is no mode for this data set. All values occur only once.'
    else: 
        corr_keys = sorted(corr_keys)
        return corr_keys, max_count

This function returns the mode or modes of a function no matter how many, as well as the frequency of the mode or modes in the dataset. If there is no mode (ie. all items occur only once), the function returns an error string. This is similar to A_nagpal's function above but is, in my humble opinion, more complete, and I think it's easier for any Python novices (such as yours truly) reading this question to understand.

 def l_mode(list_in):
    count_dict = {}
    for e in (list_in):   
        count = list_in.count(e)
        if e not in count_dict.keys():
            count_dict[e] = count
    max_count = 0 
    for key in count_dict: 
        if count_dict[key] >= max_count:
            max_count = count_dict[key]
    corr_keys = [] 
    for corr_key, count_value in count_dict.items():
        if count_dict[corr_key] == max_count:
            corr_keys.append(corr_key)
    if max_count == 1 and len(count_dict) != 1: 
        return 'There is no mode for this data set. All values occur only once.'
    else: 
        corr_keys = sorted(corr_keys)
        return corr_keys, max_count

回答 13

这将返回所有模式:

def mode(numbers):
    largestCount = 0
    modes = []
    for x in numbers:
        if x in modes:
            continue
        count = numbers.count(x)
        if count > largestCount:
            del modes[:]
            modes.append(x)
            largestCount = count
        elif count == largestCount:
            modes.append(x)
    return modes

This will return all modes:

def mode(numbers):
    largestCount = 0
    modes = []
    for x in numbers:
        if x in modes:
            continue
        count = numbers.count(x)
        if count > largestCount:
            del modes[:]
            modes.append(x)
            largestCount = count
        elif count == largestCount:
            modes.append(x)
    return modes

回答 14

无需任何导入即可查找列表模式的简单代码:

nums = #your_list_goes_here
nums.sort()
counts = dict()
for i in nums:
    counts[i] = counts.get(i, 0) + 1
mode = max(counts, key=counts.get)

在有多个模式的情况下,它应该返回最小的那个模式。

Simple code that finds the mode of the list without any imports:

nums = #your_list_goes_here
nums.sort()
counts = dict()
for i in nums:
    counts[i] = counts.get(i, 0) + 1
mode = max(counts, key=counts.get)

In case of multiple modes, it should return the minimum mode.


回答 15

def mode(inp_list):
    sort_list = sorted(inp_list)
    dict1 = {}
    for i in sort_list:        
            count = sort_list.count(i)
            if i not in dict1.keys():
                dict1[i] = count

    maximum = 0 #no. of occurences
    max_key = -1 #element having the most occurences

    for key in dict1:
        if(dict1[key]>maximum):
            maximum = dict1[key]
            max_key = key 
        elif(dict1[key]==maximum):
            if(key<max_key):
                maximum = dict1[key]
                max_key = key

    return max_key
def mode(inp_list):
    sort_list = sorted(inp_list)
    dict1 = {}
    for i in sort_list:        
            count = sort_list.count(i)
            if i not in dict1.keys():
                dict1[i] = count

    maximum = 0 #no. of occurences
    max_key = -1 #element having the most occurences

    for key in dict1:
        if(dict1[key]>maximum):
            maximum = dict1[key]
            max_key = key 
        elif(dict1[key]==maximum):
            if(key<max_key):
                maximum = dict1[key]
                max_key = key

    return max_key

回答 16

def mode(data):
    lst =[]
    hgh=0
    for i in range(len(data)):
        lst.append(data.count(data[i]))
    m= max(lst)
    ml = [x for x in data if data.count(x)==m ] #to find most frequent values
    mode = []
    for x in ml: #to remove duplicates of mode
        if x not in mode:
            mode.append(x)
    return mode
print mode([1,2,2,2,2,7,7,5,5,5,5])
def mode(data):
    lst =[]
    hgh=0
    for i in range(len(data)):
        lst.append(data.count(data[i]))
    m= max(lst)
    ml = [x for x in data if data.count(x)==m ] #to find most frequent values
    mode = []
    for x in ml: #to remove duplicates of mode
        if x not in mode:
            mode.append(x)
    return mode
print mode([1,2,2,2,2,7,7,5,5,5,5])

回答 17

这是一个简单的函数,它获取列表中出现的第一种模式。它使用列表元素作为键和出现次数来创建字典,然后读取字典值以获取模式。

def findMode(readList):
    numCount={}
    highestNum=0
    for i in readList:
        if i in numCount.keys(): numCount[i] += 1
        else: numCount[i] = 1
    for i in numCount.keys():
        if numCount[i] > highestNum:
            highestNum=numCount[i]
            mode=i
    if highestNum != 1: print(mode)
    elif highestNum == 1: print("All elements of list appear once.")

Here is a simple function that gets the first mode that occurs in a list. It makes a dictionary with the list elements as keys and number of occurrences and then reads the dict values to get the mode.

def findMode(readList):
    numCount={}
    highestNum=0
    for i in readList:
        if i in numCount.keys(): numCount[i] += 1
        else: numCount[i] = 1
    for i in numCount.keys():
        if numCount[i] > highestNum:
            highestNum=numCount[i]
            mode=i
    if highestNum != 1: print(mode)
    elif highestNum == 1: print("All elements of list appear once.")

回答 18

如果您想要一种清晰的、适合课堂教学的方法,并且只通过推导式使用列表和字典,可以这样做:

def mode(my_list):
    # Form a new list with the unique elements
    unique_list = sorted(list(set(my_list)))
    # Create a comprehensive dictionary with the uniques and their count
    appearance = {a:my_list.count(a) for a in unique_list} 
    # Calculate max number of appearances
    max_app = max(appearance.values())
    # Return the elements of the dictionary that appear that # of times
    return {k: v for k, v in appearance.items() if v == max_app}

If you want a clear approach, useful for classroom and only using lists and dictionaries by comprehension, you can do:

def mode(my_list):
    # Form a new list with the unique elements
    unique_list = sorted(list(set(my_list)))
    # Create a comprehensive dictionary with the uniques and their count
    appearance = {a:my_list.count(a) for a in unique_list} 
    # Calculate max number of appearances
    max_app = max(appearance.values())
    # Return the elements of the dictionary that appear that # of times
    return {k: v for k, v in appearance.items() if v == max_app}
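
A usage sketch with invented lists; the function returns a dict mapping each mode to its count, so a tie produces more than one key:

print(mode([1, 2, 2, 3, 3]))    # {2: 2, 3: 2} -> two modes, each appearing twice
print(mode(['a', 'b', 'a']))    # {'a': 2}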

回答 19

#function to find mode
def mode(data):  
    modecnt=0
#for count of number appearing
    for i in range(len(data)):
        icount=data.count(data[i])
#for storing count of each number in list will be stored
        if icount>modecnt:
#the loop activates if current count if greater than the previous count 
            mode=data[i]
#here the mode of number is stored 
            modecnt=icount
#count of the appearance of number is stored
    return mode
print mode(data1)
#function to find mode
def mode(data):  
    modecnt=0
#for count of number appearing
    for i in range(len(data)):
        icount=data.count(data[i])
#for storing count of each number in list will be stored
        if icount>modecnt:
#the loop activates if current count if greater than the previous count 
            mode=data[i]
#here the mode of number is stored 
            modecnt=icount
#count of the appearance of number is stored
    return mode
print mode(data1)

回答 20

您可以在这里找到列表的均值,中位数和众数:

import numpy as np
from scipy import stats

#to take input
size = int(input())
numbers = list(map(int, input().split()))

print(np.mean(numbers))
print(np.median(numbers))
print(int(stats.mode(numbers)[0]))

Here is how you can find mean,median and mode of a list:

import numpy as np
from scipy import stats

#to take input
size = int(input())
numbers = list(map(int, input().split()))

print(np.mean(numbers))
print(np.median(numbers))
print(int(stats.mode(numbers)[0]))

回答 21

import numpy as np
def get_mode(xs):
    values, counts = np.unique(xs, return_counts=True)
    max_count_index = np.argmax(counts) #return the index with max value counts
    return values[max_count_index]
print(get_mode([1,7,2,5,3,3,8,3,2]))
import numpy as np
def get_mode(xs):
    values, counts = np.unique(xs, return_counts=True)
    max_count_index = np.argmax(counts) #return the index with max value counts
    return values[max_count_index]
print(get_mode([1,7,2,5,3,3,8,3,2]))

回答 22

对于那些寻找最小模式的人(例如双峰分布的情况),可以使用numpy:

import numpy as np
mode = np.argmax(np.bincount(your_list))

For those looking for the minimum mode, e.g. the case of a bi-modal distribution, using numpy.

import numpy as np
mode = np.argmax(np.bincount(your_list))
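
A small usage sketch (values invented): np.bincount only accepts non-negative integers, and np.argmax returns the first index holding the maximum count, which is why the smallest mode is picked:

import numpy as np

your_list = [4, 4, 2, 2, 7]               # 2 and 4 are both modes
print(np.argmax(np.bincount(your_list)))  # 2 -> the smaller of the two modes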

回答 23

数据集的模式是该集中最常出现的成员。如果有两个成员最常出现且次数相同,则数据具有两种模式。这就是所谓的双峰

如果有两种以上的模式,那么数据将被称为multimodal。如果数据集中的所有成员都出现相同的次数,则数据集中根本没有模式

以下功能modes()可用于在给定的数据列表中查找模式

import numpy as np; import pandas as pd

def modes(arr):
    df = pd.DataFrame(arr, columns=['Values'])
    dat = pd.crosstab(df['Values'], columns=['Freq'])
    if len(np.unique((dat['Freq']))) > 1:
        mode = list(dat.index[np.array(dat['Freq'] == max(dat['Freq']))])
        return mode
    else:
        print("There is NO mode in the data set")

输出:

# For a list of numbers in x as
In [1]: x = [2, 3, 4, 5, 7, 9, 8, 12, 2, 1, 1, 1, 3, 3, 2, 6, 12, 3, 7, 8, 9, 7, 12, 10, 10, 11, 12, 2]
In [2]: modes(x)
Out[2]: [2, 3, 12]
# For a list of repeated numbers in y as
In [3]: y = [2, 2, 3, 3, 4, 4, 10, 10]
In [4]: modes(y)
There is NO mode in the data set
# For a list of strings/characters in z as
In [5]: z = ['a', 'b', 'b', 'b', 'e', 'e', 'e', 'd', 'g', 'g', 'c', 'g', 'g', 'a', 'a', 'c', 'a']
In [6]: modes(z)
Out[6]: ['a', 'g']

如果我们不想导入numpy或pandas来调用这些包中的任何函数,那么要获得相同的输出,可以将modes()函数编写为:

def modes(arr):
    cnt = []
    for i in arr:
        cnt.append(arr.count(i))
    uniq_cnt = []
    for i in cnt:
        if i not in uniq_cnt:
            uniq_cnt.append(i)
    if len(uniq_cnt) > 1:
        m = []
        for i in list(range(len(cnt))):
            if cnt[i] == max(uniq_cnt):
                m.append(arr[i])
        mode = []
        for i in m:
            if i not in mode:
                mode.append(i)
        return mode
    else:
        print("There is NO mode in the data set")

Mode of a data set is/are the member(s) that occur(s) most frequently in the set. If there are two members that appear most often with same number of times, then the data has two modes. This is called bimodal.

If there are more than 2 modes, then the data would be called multimodal. If all the members in the data set appear the same number of times, then the data set has no mode at all.

Following function modes() can work to find mode(s) in a given list of data:

import numpy as np; import pandas as pd

def modes(arr):
    df = pd.DataFrame(arr, columns=['Values'])
    dat = pd.crosstab(df['Values'], columns=['Freq'])
    if len(np.unique((dat['Freq']))) > 1:
        mode = list(dat.index[np.array(dat['Freq'] == max(dat['Freq']))])
        return mode
    else:
        print("There is NO mode in the data set")

Output:

# For a list of numbers in x as
In [1]: x = [2, 3, 4, 5, 7, 9, 8, 12, 2, 1, 1, 1, 3, 3, 2, 6, 12, 3, 7, 8, 9, 7, 12, 10, 10, 11, 12, 2]
In [2]: modes(x)
Out[2]: [2, 3, 12]
# For a list of repeated numbers in y as
In [3]: y = [2, 2, 3, 3, 4, 4, 10, 10]
In [4]: modes(y)
There is NO mode in the data set
# For a list of strings/characters in z as
In [5]: z = ['a', 'b', 'b', 'b', 'e', 'e', 'e', 'd', 'g', 'g', 'c', 'g', 'g', 'a', 'a', 'c', 'a']
In [6]: modes(z)
Out[6]: ['a', 'g']

If we do not want to import numpy or pandas to call any function from these packages, then to get this same output, modes() function can be written as:

def modes(arr):
    cnt = []
    for i in arr:
        cnt.append(arr.count(i))
    uniq_cnt = []
    for i in cnt:
        if i not in uniq_cnt:
            uniq_cnt.append(i)
    if len(uniq_cnt) > 1:
        m = []
        for i in list(range(len(cnt))):
            if cnt[i] == max(uniq_cnt):
                m.append(arr[i])
        mode = []
        for i in m:
            if i not in mode:
                mode.append(i)
        return mode
    else:
        print("There is NO mode in the data set")

将缺失的日期添加到熊猫数据框

问题:将缺失的日期添加到熊猫数据框

我的数据在给定日期可能有多个事件,也可能在某个日期完全没有事件。我获取这些事件,按日期统计数量并绘制出来。但是,当我绘制它们时,我的两个序列并不总是匹配。

idx = pd.date_range(df['simpleDate'].min(), df['simpleDate'].max())
s = df.groupby(['simpleDate']).size()

在上面的代码中,idx变成一个大约30天的日期范围(2013-09-01至2013-09-30)。但是S可能只有25或26天,因为某些日期没有事件发生。然后,当我尝试绘制时,由于大小不匹配,我得到一个AssertionError:

fig, ax = plt.subplots()    
ax.bar(idx.to_pydatetime(), s, color='green')

解决这个问题的正确方法是什么?我是应该从idx中删除没有值的日期,还是(我更倾向于这样做)把缺失的日期以计数0添加到序列中?我希望得到一张完整的30天图表(缺失日期的值为0)。如果这种方法是对的,关于如何着手有什么建议吗?我需要某种动态的reindex函数吗?

这是S(df.groupby(['simpleDate']).size())的一个片段,请注意其中没有04和05的条目。

09-02-2013     2
09-03-2013    10
09-06-2013     5
09-07-2013     1

My data can have multiple events on a given date or NO events on a date. I take these events, get a count by date and plot them. However, when I plot them, my two series don’t always match.

idx = pd.date_range(df['simpleDate'].min(), df['simpleDate'].max())
s = df.groupby(['simpleDate']).size()

In the above code idx becomes a range of say 30 dates. 09-01-2013 to 09-30-2013 However S may only have 25 or 26 days because no events happened for a given date. I then get an AssertionError as the sizes dont match when I try to plot:

fig, ax = plt.subplots()    
ax.bar(idx.to_pydatetime(), s, color='green')

What’s the proper way to tackle this? Do I want to remove dates with no values from IDX or (which I’d rather do) is add to the series the missing date with a count of 0. I’d rather have a full graph of 30 days with 0 values. If this approach is right, any suggestions on how to get started? Do I need some sort of dynamic reindex function?

Here’s a snippet of S ( df.groupby(['simpleDate']).size() ), notice no entries for 04 and 05.

09-02-2013     2
09-03-2013    10
09-06-2013     5
09-07-2013     1

回答 0

您可以使用Series.reindex

import pandas as pd

idx = pd.date_range('09-01-2013', '09-30-2013')

s = pd.Series({'09-02-2013': 2,
               '09-03-2013': 10,
               '09-06-2013': 5,
               '09-07-2013': 1})
s.index = pd.DatetimeIndex(s.index)

s = s.reindex(idx, fill_value=0)
print(s)

得到

2013-09-01     0
2013-09-02     2
2013-09-03    10
2013-09-04     0
2013-09-05     0
2013-09-06     5
2013-09-07     1
2013-09-08     0
...

You could use Series.reindex:

import pandas as pd

idx = pd.date_range('09-01-2013', '09-30-2013')

s = pd.Series({'09-02-2013': 2,
               '09-03-2013': 10,
               '09-06-2013': 5,
               '09-07-2013': 1})
s.index = pd.DatetimeIndex(s.index)

s = s.reindex(idx, fill_value=0)
print(s)

yields

2013-09-01     0
2013-09-02     2
2013-09-03    10
2013-09-04     0
2013-09-05     0
2013-09-06     5
2013-09-07     1
2013-09-08     0
...

回答 1

一个更快捷的解决方法是使用.asfreq()。它不需要像.reindex()那样先创建一个新索引再调用。

# "broken" (staggered) dates
dates = pd.Index([pd.Timestamp('2012-05-01'), 
                  pd.Timestamp('2012-05-04'), 
                  pd.Timestamp('2012-05-06')])
s = pd.Series([1, 2, 3], dates)

print(s.asfreq('D'))
2012-05-01    1.0
2012-05-02    NaN
2012-05-03    NaN
2012-05-04    2.0
2012-05-05    NaN
2012-05-06    3.0
Freq: D, dtype: float64

A quicker workaround is to use .asfreq(). This doesn’t require creation of a new index to call within .reindex().

# "broken" (staggered) dates
dates = pd.Index([pd.Timestamp('2012-05-01'), 
                  pd.Timestamp('2012-05-04'), 
                  pd.Timestamp('2012-05-06')])
s = pd.Series([1, 2, 3], dates)

print(s.asfreq('D'))
2012-05-01    1.0
2012-05-02    NaN
2012-05-03    NaN
2012-05-04    2.0
2012-05-05    NaN
2012-05-06    3.0
Freq: D, dtype: float64
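
If, as in the original question, zeroes are wanted instead of NaN, .asfreq() also takes a fill_value argument; a sketch continuing with the same series s defined above:

s_filled = s.asfreq('D', fill_value=0)
print(s_filled)   # the missing days (05-02, 05-03, 05-05) now show 0 instead of NaN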

回答 2

一个问题是,如果存在重复值,reindex操作将会失败。假设我们正在处理带时间戳的数据,并希望按日期对其建立索引:

df = pd.DataFrame({
    'timestamps': pd.to_datetime(
        ['2016-11-15 1:00','2016-11-16 2:00','2016-11-16 3:00','2016-11-18 4:00']),
    'values':['a','b','c','d']})
df.index = pd.DatetimeIndex(df['timestamps']).floor('D')
df

得到

            timestamps             values
2016-11-15  "2016-11-15 01:00:00"  a
2016-11-16  "2016-11-16 02:00:00"  b
2016-11-16  "2016-11-16 03:00:00"  c
2016-11-18  "2016-11-18 04:00:00"  d

由于2016-11-16日期重复,尝试重新编制索引:

all_days = pd.date_range(df.index.min(), df.index.max(), freq='D')
df.reindex(all_days)

失败与:

...
ValueError: cannot reindex from a duplicate axis

(这里的意思是索引中存在重复项,而不是说索引本身是一个重复项)

相反,我们可以使用.loc查找范围内所有日期的条目:

df.loc[all_days]

得到

            timestamps             values
2016-11-15  "2016-11-15 01:00:00"  a
2016-11-16  "2016-11-16 02:00:00"  b
2016-11-16  "2016-11-16 03:00:00"  c
2016-11-17  NaN                    NaN
2016-11-18  "2016-11-18 04:00:00"  d

如果需要,可以对该列的序列使用fillna来填充空缺。

One issue is that reindex will fail if there are duplicate values. Say we’re working with timestamped data, which we want to index by date:

df = pd.DataFrame({
    'timestamps': pd.to_datetime(
        ['2016-11-15 1:00','2016-11-16 2:00','2016-11-16 3:00','2016-11-18 4:00']),
    'values':['a','b','c','d']})
df.index = pd.DatetimeIndex(df['timestamps']).floor('D')
df

yields

            timestamps             values
2016-11-15  "2016-11-15 01:00:00"  a
2016-11-16  "2016-11-16 02:00:00"  b
2016-11-16  "2016-11-16 03:00:00"  c
2016-11-18  "2016-11-18 04:00:00"  d

Due to the duplicate 2016-11-16 date, an attempt to reindex:

all_days = pd.date_range(df.index.min(), df.index.max(), freq='D')
df.reindex(all_days)

fails with:

...
ValueError: cannot reindex from a duplicate axis

(by this it means the index has duplicates, not that it is itself a dup)

Instead, we can use .loc to look up entries for all dates in range:

df.loc[all_days]

yields

            timestamps             values
2016-11-15  "2016-11-15 01:00:00"  a
2016-11-16  "2016-11-16 02:00:00"  b
2016-11-16  "2016-11-16 03:00:00"  c
2016-11-17  NaN                    NaN
2016-11-18  "2016-11-18 04:00:00"  d

fillna can be used on the column series to fill blanks if needed.
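
A minimal sketch of that fillna step, reusing df and all_days from above; the fill value here is an arbitrary placeholder chosen for illustration:

full = df.loc[all_days].copy()
full['values'] = full['values'].fillna('missing')   # placeholder value, purely illustrative
print(full)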


回答 3

另一种方法是resample,除了缺少日期外,还可以处理重复的日期。例如:

df.resample('D').mean()

resample是一个与groupby类似的延迟操作,因此您需要在其后接另一个操作。在这种情况下mean效果很好,但您也可以使用许多其他的pandas方法,如max、sum等。

这是原始数据,但带有一条“2013-09-03”的额外条目:

             val
date           
2013-09-02     2
2013-09-03    10
2013-09-03    20    <- duplicate date added to OP's data
2013-09-06     5
2013-09-07     1

结果如下:

             val
date            
2013-09-02   2.0
2013-09-03  15.0    <- mean of original values for 2013-09-03
2013-09-04   NaN    <- NaN b/c date not present in orig
2013-09-05   NaN    <- NaN b/c date not present in orig
2013-09-06   5.0
2013-09-07   1.0

我将缺失的日期保留为NaN,以便清楚地展示其工作原理;但您可以按照OP的要求加上fillna(0)来用零替换NaN,也可以改用类似interpolate()的方法根据相邻行填充非零值。

An alternative approach is resample, which can handle duplicate dates in addition to missing dates. For example:

df.resample('D').mean()

resample is a deferred operation like groupby so you need to follow it with another operation. In this case mean works well, but you can also use many other pandas methods like max, sum, etc.

Here is the original data, but with an extra entry for ‘2013-09-03’:

             val
date           
2013-09-02     2
2013-09-03    10
2013-09-03    20    <- duplicate date added to OP's data
2013-09-06     5
2013-09-07     1

And here are the results:

             val
date            
2013-09-02   2.0
2013-09-03  15.0    <- mean of original values for 2013-09-03
2013-09-04   NaN    <- NaN b/c date not present in orig
2013-09-05   NaN    <- NaN b/c date not present in orig
2013-09-06   5.0
2013-09-07   1.0

I left the missing dates as NaNs to make it clear how this works, but you can add fillna(0) to replace NaNs with zeroes as requested by the OP or alternatively use something like interpolate() to fill with non-zero values based on the neighboring rows.
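
A sketch of the two fill options mentioned above, assuming the same df with a DatetimeIndex; which one is appropriate depends on the data:

daily = df.resample('D').mean()

print(daily.fillna(0))       # replace the NaN rows with 0, as the OP asked for
print(daily.interpolate())   # or fill them from the neighbouring values instead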


回答 4

这是一种将缺失日期填充进数据框的不错方法,您可以自行选择fill_value、要回填的天数days_back,以及对数据框排序所用的排序顺序(date_order):

import pandas as pd
from datetime import datetime, timedelta

def fill_in_missing_dates(df, date_col_name = 'date',date_order = 'asc', fill_value = 0, days_back = 30):

    df.set_index(date_col_name,drop=True,inplace=True)
    df.index = pd.DatetimeIndex(df.index)
    d = datetime.now().date()
    d2 = d - timedelta(days = days_back)
    idx = pd.date_range(d2, d, freq = "D")
    df = df.reindex(idx,fill_value=fill_value)
    df[date_col_name] = pd.DatetimeIndex(df.index)

    return df

Here’s a nice method to fill in missing dates into a dataframe, with your choice of fill_value, days_back to fill in, and sort order (date_order) by which to sort the dataframe:

import pandas as pd
from datetime import datetime, timedelta

def fill_in_missing_dates(df, date_col_name = 'date',date_order = 'asc', fill_value = 0, days_back = 30):

    df.set_index(date_col_name,drop=True,inplace=True)
    df.index = pd.DatetimeIndex(df.index)
    d = datetime.now().date()
    d2 = d - timedelta(days = days_back)
    idx = pd.date_range(d2, d, freq = "D")
    df = df.reindex(idx,fill_value=fill_value)
    df[date_col_name] = pd.DatetimeIndex(df.index)

    return df

“最终”是否总是在Python中执行?

问题:“最终”是否总是在Python中执行?

对于Python中任何可能的try-finally块,是否可以保证finally块总是会被执行?

例如,假设我在一个except块中返回:

try:
    1/0
except ZeroDivisionError:
    return
finally:
    print("Does this code run?")

或者,假设我重新引发一个Exception:

try:
    1/0
except ZeroDivisionError:
    raise
finally:
    print("What about this code?")

测试表明,在上述示例中finally确实会执行,但我想还有其他我没有想到的场景。

是否存在finally块无法在Python中执行的场景?

For any possible try-finally block in Python, is it guaranteed that the finally block will always be executed?

For example, let’s say I return while in an except block:

try:
    1/0
except ZeroDivisionError:
    return
finally:
    print("Does this code run?")

Or maybe I re-raise an Exception:

try:
    1/0
except ZeroDivisionError:
    raise
finally:
    print("What about this code?")

Testing shows that finally does get executed for the above examples, but I imagine there are other scenarios I haven’t thought of.

Are there any scenarios in which a finally block can fail to execute in Python?


回答 0

“保证”这个词比finally的任何实现所应得的都要强得多。可以保证的是:如果执行流程离开整个try-finally结构,它一定会经过finally才离开。不能保证的是:执行流程一定会离开try-finally。

  • 如果生成器或异步协程对象根本没有执行到结束,其中的finally可能永远不会运行。这可能以很多种方式发生,下面是一个例子:

    def gen(text):
        try:
            for line in text:
                try:
                    yield int(line)
                except:
                    # Ignore blank lines - but catch too much!
                    pass
        finally:
            print('Doing important cleanup')
    
    text = ['1', '', '2', '', '3']
    
    if any(n > 1 for n in gen(text)):
        print('Found a number')
    
    print('Oops, no cleanup.')
    

    请注意,这个示例有些棘手:当生成器被垃圾回收时,Python会通过抛入GeneratorExit异常来尝试运行finally块,但这里我们捕获了该异常,然后再次yield,此时Python会打印一条警告(“generator ignored GeneratorExit”)并放弃。有关详细信息,请参见PEP 342(通过增强型生成器实现协程)。

    生成器或协程可能不会执行到结束的其他情形包括:对象从未被垃圾回收(是的,即使在CPython中这也是可能的),或者async with在__aexit__中await,或者对象在finally块中await或yield。此列表并非详尽无遗。

  • 如果所有非守护线程都先退出,守护线程中的finally可能永远不会执行。

  • os._exit将立即停止该进程而不执行finally块。

  • os.fork可能导致finally块执行两次。除了事情发生两次所带来的常见问题之外,如果对共享资源的访问没有正确同步,这还可能导致并发访问冲突(崩溃、停顿等)。

    由于multiprocessing在使用fork启动方法(Unix上的默认值)时通过fork-without-exec创建工作进程,并在工作进程完成任务后调用os._exit,因此finally与multiprocessing的交互可能会出现问题(示例)。

  • C级别的段错误(segmentation fault)将阻止finally块运行。
  • kill -SIGKILL会阻止finally块运行。SIGTERM和SIGHUP同样会阻止finally块运行,除非您自己安装处理程序来控制关闭过程;默认情况下,Python不处理SIGTERM或SIGHUP。
  • finally中的异常会阻止清理完成。一个特别值得注意的情况是:用户恰好在我们开始执行finally块时按下Ctrl-C。Python会引发KeyboardInterrupt并跳过finally块中的每一行内容。(KeyboardInterrupt安全的代码非常难写。)
  • 如果计算机断电,或者进入休眠后无法唤醒,finally块将不会运行。

finally块不是事务系统;它不提供原子性保证,也不提供任何类似的保证。这些例子中有些可能看起来很明显,但人们很容易忘记这类事情可能发生,从而对finally寄予过多依赖。

“Guaranteed” is a much stronger word than any implementation of finally deserves. What is guaranteed is that if execution flows out of the whole try-finally construct, it will pass through the finally to do so. What is not guaranteed is that execution will flow out of the try-finally.

  • A finally in a generator or async coroutine might never run, if the object never executes to conclusion. There are a lot of ways that could happen; here’s one:

    def gen(text):
        try:
            for line in text:
                try:
                    yield int(line)
                except:
                    # Ignore blank lines - but catch too much!
                    pass
        finally:
            print('Doing important cleanup')
    
    text = ['1', '', '2', '', '3']
    
    if any(n > 1 for n in gen(text)):
        print('Found a number')
    
    print('Oops, no cleanup.')
    

    Note that this example is a bit tricky: when the generator is garbage collected, Python attempts to run the finally block by throwing in a GeneratorExit exception, but here we catch that exception and then yield again, at which point Python prints a warning (“generator ignored GeneratorExit”) and gives up. See PEP 342 (Coroutines via Enhanced Generators) for details.

    Other ways a generator or coroutine might not execute to conclusion include if the object is just never GC’ed (yes, that’s possible, even in CPython), or if an async with awaits in __aexit__, or if the object awaits or yields in a finally block. This list is not intended to be exhaustive.

  • A finally in a daemon thread might never execute if all non-daemon threads exit first (see the sketch at the end of this answer).

  • os._exit will halt the process immediately without executing finally blocks.

  • os.fork may cause finally blocks to execute twice. As well as just the normal problems you’d expect from things happening twice, this could cause concurrent access conflicts (crashes, stalls, …) if access to shared resources is not correctly synchronized.

    Since multiprocessing uses fork-without-exec to create worker processes when using the fork start method (the default on Unix), and then calls os._exit in the worker once the worker’s job is done, finally and multiprocessing interaction can be problematic (example).

  • A C-level segmentation fault will prevent finally blocks from running.
  • kill -SIGKILL will prevent finally blocks from running. SIGTERM and SIGHUP will also prevent finally blocks from running unless you install a handler to control the shutdown yourself; by default, Python does not handle SIGTERM or SIGHUP.
  • An exception in finally can prevent cleanup from completing. One particularly noteworthy case is if the user hits control-C just as we’re starting to execute the finally block. Python will raise a KeyboardInterrupt and skip every line of the finally block’s contents. (KeyboardInterrupt-safe code is very hard to write).
  • If the computer loses power, or if it hibernates and doesn’t wake up, finally blocks won’t run.

The finally block is not a transaction system; it doesn’t provide atomicity guarantees or anything of the sort. Some of these examples might seem obvious, but it’s easy to forget such things can happen and rely on finally for too much.
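
As an illustration of the daemon-thread bullet above, here is a small, hypothetical sketch; whether the cleanup line is ever printed depends on timing, because the interpreter may exit while the daemon thread is still sleeping:

import threading
import time

def worker():
    try:
        time.sleep(10)           # daemon thread is still sleeping when the main thread exits
    finally:
        print('daemon cleanup')  # usually never printed

t = threading.Thread(target=worker, daemon=True)
t.start()
print('main thread done')        # interpreter exits here, killing the daemon thread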


回答 1

是的。finally总是会赢。

克服它的唯一方法是在finally:有机会执行之前停止执行(例如,使解释器崩溃,关闭计算机,永远暂停生成器)。

我想还有其他我没想到的情况。

您可能还没有想到以下几点:

def foo():
    # finally always wins
    try:
        return 1
    finally:
        return 2

def bar():
    # even if he has to eat an unhandled exception, finally wins
    try:
        raise Exception('boom')
    finally:
        return 'no boom'

根据您退出解释器的方式,有时您可以“取消”finally,但不能用下面这种方式:

>>> import sys
>>> try:
...     sys.exit()
... finally:
...     print('finally wins!')
... 
finally wins!
$

使用不太可靠的os._exit(在我看来,这属于“使解释器崩溃”一类):

>>> import os
>>> try:
...     os._exit(1)
... finally:
...     print('finally!')
... 
$

我目前正在运行以下代码,以测试在宇宙热寂之后finally是否仍然会执行:

try:
    while True:
       sleep(1)
finally:
    print('done')

但是,我仍在等待结果,因此请稍后再检查。

Yes. Finally always wins.

The only way to defeat it is to halt execution before finally: gets a chance to execute (e.g. crash the interpreter, turn off your computer, suspend a generator forever).

I imagine there are other scenarios I haven’t thought of.

Here are a couple more you may not have thought about:

def foo():
    # finally always wins
    try:
        return 1
    finally:
        return 2

def bar():
    # even if he has to eat an unhandled exception, finally wins
    try:
        raise Exception('boom')
    finally:
        return 'no boom'

Depending on how you quit the interpreter, sometimes you can “cancel” finally, but not like this:

>>> import sys
>>> try:
...     sys.exit()
... finally:
...     print('finally wins!')
... 
finally wins!
$

Using the precarious os._exit (this falls under “crash the interpreter” in my opinion):

>>> import os
>>> try:
...     os._exit(1)
... finally:
...     print('finally!')
... 
$

I’m currently running this code, to test if finally will still execute after the heat death of the universe:

try:
    while True:
       sleep(1)
finally:
    print('done')

However, I’m still waiting on the result, so check back here later.


回答 2

根据Python文档

无论以前发生了什么,一旦代码块完成并处理了所有引发的异常,便会执行final块。即使异常处理程序或else块中存在错误,并且引发了新的异常,final块中的代码仍将运行。

还应注意,如果有多个return语句,包括finally块中的一个语句,则finally块返回是唯一将执行的语句。

According to the Python documentation:

No matter what happened previously, the final-block is executed once the code block is complete and any raised exceptions handled. Even if there’s an error in an exception handler or the else-block and a new exception is raised, the code in the final-block is still run.

It should also be noted that if there are multiple return statements, including one in the finally block, then the finally block return is the only one that will execute.


回答 3

好吧,既是也不是。

可以保证的是,Python将始终尝试执行finally块。如果您从该块返回或引发未捕获的异常,则在实际返回或引发异常之前执行finally块。

(这一点您本可以通过直接运行问题中的代码来自行验证)

我能想象的唯一情况是在Python解释器本身崩溃(例如在C代码内部或由于断电)时,将不会执行finally块。

Well, yes and no.

What is guaranteed is that Python will always try to execute the finally block. In the case where you return from the block or raise an uncaught exception, the finally block is executed just before actually returning or raising the exception.

(what you could have controlled yourself by simply running the code in your question)

The only case I can imagine where the finally block will not be executed is when the Python interpreter itself crashes, for example inside C code or because of a power outage.


回答 4

我没有使用生成器功能就发现了这一点:

import multiprocessing
import time

def fun(arg):
  try:
    print("tried " + str(arg))
    time.sleep(arg)
  finally:
    print("finally cleaned up " + str(arg))
  return foo

list = [1, 2, 3]
multiprocessing.Pool().map(fun, list)

这里的sleep可以替换为任何运行时间不一致的代码。

这里出现的情况是,第一个完成的并行处理成功地离开了try块,但随后尝试从该函数返回一个在任何地方都未定义的值(foo),这会导致异常。该异常会杀死映射,而不允许其他进程到达其finally块。

另外,如果您在try块中的sleep()调用之后添加bar = bazz这一行,那么第一个到达该行的进程会引发异常(因为bazz未定义),这会导致它自己的finally块运行;但随后它会杀死map,导致其他try块在没有到达各自finally块的情况下消失,而且第一个进程也不会到达它的return语句。

这对Python多进程处理的意义在于:只要有任何一个进程可能出现异常,您就不能指望异常处理机制去清理所有进程中的资源。您需要额外的信号处理,或者在multiprocessing的map调用之外管理资源。

I found this one without using a generator function:

import multiprocessing
import time

def fun(arg):
  try:
    print("tried " + str(arg))
    time.sleep(arg)
  finally:
    print("finally cleaned up " + str(arg))
  return foo

list = [1, 2, 3]
multiprocessing.Pool().map(fun, list)

The sleep can be any code that might run for inconsistent amounts of time.

What appears to be happening here is that the first parallel process to finish leaves the try block successfully, but then attempts to return from the function a value (foo) that hasn’t been defined anywhere, which causes an exception. That exception kills the map without allowing the other processes to reach their finally blocks.

Also, if you add the line bar = bazz just after the sleep() call in the try block. Then the first process to reach that line throws an exception (because bazz isn’t defined), which causes its own finally block to be run, but then kills the map, causing the other try blocks to disappear without reaching their finally blocks, and the first process not to reach its return statement, either.

What this means for Python multiprocessing is that you can’t trust the exception-handling mechanism to clean up resources in all processes if even one of the processes can have an exception. Additional signal handling or managing the resources outside the multiprocessing map call would be necessary.


回答 5

接受的答案的附录,只是为了帮助了解它的工作原理,并提供了一些示例:

  • 这个:

     try:
         1
     except:
         print 'except'
     finally:
         print 'finally'

    将输出

    finally

  •    try:
           1/0
       except:
           print 'except'
       finally:
           print 'finally'

    将输出

    except
    finally

Addendum to the accepted answer, just to help to see how it works, with a few examples:

  • This:

     try:
         1
     except:
         print 'except'
     finally:
         print 'finally'
    

    will output

    finally

  •    try:
           1/0
       except:
           print 'except'
       finally:
           print 'finally'
    

    will output

    except
    finally


Python中的字符串到字典

问题:Python中的字符串到字典

所以我花了很多时间在此上,在我看来,这应该是一个简单的修复。我正在尝试使用Facebook的身份验证在我的网站上注册用户,并且正在服务器端进行操作。我已经到了获取访问令牌的地步,并且当我去:

https://graph.facebook.com/me?access_token=MY_ACCESS_TOKEN

我得到的信息就是这样的字符串:

{"id":"123456789","name":"John Doe","first_name":"John","last_name":"Doe","link":"http:\/\/www.facebook.com\/jdoe","gender":"male","email":"jdoe\u0040gmail.com","timezone":-7,"locale":"en_US","verified":true,"updated_time":"2011-01-12T02:43:35+0000"}

似乎我应该可以直接对它使用dict(string),但我得到了这个错误:

ValueError: dictionary update sequence element #0 has length 1; 2 is required

所以我尝试使用Pickle,但收到此错误:

KeyError: '{'

我尝试使用django.serializers反序列化它,但结果相似。有什么想法吗?我觉得答案必须很简单,而且我很愚蠢。谢谢你的帮助!

So I’ve spent way to much time on this, and it seems to me like it should be a simple fix. I’m trying to use Facebook’s Authentication to register users on my site, and I’m trying to do it server side. I’ve gotten to the point where I get my access token, and when I go to:

https://graph.facebook.com/me?access_token=MY_ACCESS_TOKEN

I get the information I’m looking for as a string that’s like this:

{"id":"123456789","name":"John Doe","first_name":"John","last_name":"Doe","link":"http:\/\/www.facebook.com\/jdoe","gender":"male","email":"jdoe\u0040gmail.com","timezone":-7,"locale":"en_US","verified":true,"updated_time":"2011-01-12T02:43:35+0000"}

It seems like I should just be able to use dict(string) on this but I’m getting this error:

ValueError: dictionary update sequence element #0 has length 1; 2 is required

So I tried using Pickle, but got this error:

KeyError: '{'

I tried using django.serializers to de-serialize it but had similar results. Any thoughts? I feel like the answer has to be simple, and I’m just being stupid. Thanks for any help!


回答 0

此数据为JSON!如果您使用的是Python 2.6+,则可以使用内置json模块反序列化它,否则可以使用出色的第三方simplejson模块

import json    # or `import simplejson as json` if on Python < 2.6

json_string = u'{ "id":"123456789", ... }'
obj = json.loads(json_string)    # obj now contains a dict of the data

This data is JSON! You can deserialize it using the built-in json module if you’re on Python 2.6+, otherwise you can use the excellent third-party simplejson module.

import json    # or `import simplejson as json` if on Python < 2.6

json_string = u'{ "id":"123456789", ... }'
obj = json.loads(json_string)    # obj now contains a dict of the data

回答 1

使用ast.literal_eval可以求值Python字面量。但是,您拥有的是JSON(例如,请注意其中的“true”),因此请使用JSON反序列化器。

>>> import json
>>> s = """{"id":"123456789","name":"John Doe","first_name":"John","last_name":"Doe","link":"http:\/\/www.facebook.com\/jdoe","gender":"male","email":"jdoe\u0040gmail.com","timezone":-7,"locale":"en_US","verified":true,"updated_time":"2011-01-12T02:43:35+0000"}"""
>>> json.loads(s)
{u'first_name': u'John', u'last_name': u'Doe', u'verified': True, u'name': u'John Doe', u'locale': u'en_US', u'gender': u'male', u'email': u'jdoe@gmail.com', u'link': u'http://www.facebook.com/jdoe', u'timezone': -7, u'updated_time': u'2011-01-12T02:43:35+0000', u'id': u'123456789'}

Use ast.literal_eval to evaluate Python literals. However, what you have is JSON (note “true” for example), so use a JSON deserializer.

>>> import json
>>> s = """{"id":"123456789","name":"John Doe","first_name":"John","last_name":"Doe","link":"http:\/\/www.facebook.com\/jdoe","gender":"male","email":"jdoe\u0040gmail.com","timezone":-7,"locale":"en_US","verified":true,"updated_time":"2011-01-12T02:43:35+0000"}"""
>>> json.loads(s)
{u'first_name': u'John', u'last_name': u'Doe', u'verified': True, u'name': u'John Doe', u'locale': u'en_US', u'gender': u'male', u'email': u'jdoe@gmail.com', u'link': u'http://www.facebook.com/jdoe', u'timezone': -7, u'updated_time': u'2011-01-12T02:43:35+0000', u'id': u'123456789'}