Tag archive: string

How to sort the letters in a string alphabetically in Python

Question: How to sort the letters in a string alphabetically in Python

Is there an easy way to sort the letters in a string alphabetically in Python?

So for:

a = 'ZENOVW'

I would like to return:

'ENOVWZ'

Answer 0

You can do:

>>> a = 'ZENOVW'
>>> ''.join(sorted(a))
'ENOVWZ'
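
For reuse, the same idea can be wrapped in a small helper. This is a minimal sketch; the function name sort_letters is made up for illustration and does not come from the answer.

```python
def sort_letters(text):
    """Return the characters of text in ascending code point order."""
    # sorted() yields a list of single-character strings; join rebuilds a str.
    return ''.join(sorted(text))


print(sort_letters('ZENOVW'))  # ENOVWZ
```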

Answer 1

>>> a = 'ZENOVW'
>>> b = sorted(a)
>>> print(b)
['E', 'N', 'O', 'V', 'W', 'Z']

sorted returns a list, so you can turn it back into a string using join:

>>> c = ''.join(b)

which joins the items of b together with an empty string '' between each item.

>>> print(c)
ENOVWZ

Answer 2

The sorted() solution can give you some unexpected results with other strings.

List of other solutions:

Sort letters and make them distinct:

>>> s = "Bubble Bobble"
>>> ''.join(sorted(set(s.lower())))
' belou'

Sort letters and make them distinct while keeping caps:

>>> s = "Bubble Bobble"
>>> ''.join(sorted(set(s)))
' Bbelou'

Sort letters and keep duplicates:

>>> s = "Bubble Bobble"
>>> ''.join(sorted(s))
' BBbbbbeellou'

If you want to get rid of the space in the result, add a call to the strip() method in any of the cases above (the space sorts before every letter, so it ends up at the start of the result):

>>> s = "Bubble Bobble"
>>> ''.join(sorted(set(s.lower()))).strip()
'belou'
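
Note that strip() only trims whitespace at the ends of the result; it is enough here because whitespace sorts before every letter. A hedged alternative, assuming you would rather drop whitespace before sorting, is sketched below (the helper name distinct_letters is hypothetical):

```python
def distinct_letters(text):
    """Sorted, de-duplicated, lower-cased characters of text, without whitespace."""
    unique = set(text.lower())
    # Filter out spaces, tabs, and newlines before sorting.
    return ''.join(sorted(c for c in unique if not c.isspace()))


print(distinct_letters("Bubble Bobble"))     # belou
print(distinct_letters("Bubble\tBobble\n"))  # belou
```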

Answer 3

You can use reduce (on Python 3, import it from functools first):

>>> from functools import reduce
>>> a = 'ZENOVW'
>>> reduce(lambda x, y: x + y, sorted(a))
'ENOVWZ'

Answer 4

The Python function sorted orders the characters of a string by their ASCII (code point) values.

INCORRECT: In the example below, e and d come after H and W because of their ASCII values.

>>> a = "Hello World!"
>>> "".join(sorted(a))
' !HWdellloor'

CORRECT: To sort the string alphabetically without changing the case of the letters, use:

>>> a = "Hello World!"
>>> "".join(sorted(a, key=lambda x: x.lower()))
' !deHllloorW'

If you want to remove all punctuation, digits, and whitespace, use:

>>> a = "Hello World!"
>>> "".join(filter(lambda x: x.isalpha(), sorted(a, key=lambda x: x.lower())))
'deHllloorW'
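
One detail worth noting: with a lower-casing key alone, the upper- and lower-case forms of a letter compare equal, so their relative order is whatever it was in the input (sorted is stable). The sketch below makes the order deterministic with a tuple key, assuming you want the upper-case letter first among equals; the function name is made up for illustration.

```python
def sort_ignoring_case(text):
    """Sort characters alphabetically, ignoring case; ties broken by code point."""
    # (c.lower(), c) compares case-insensitively first, then by the raw character.
    return ''.join(sorted(text, key=lambda c: (c.lower(), c)))


print(repr(sort_ignoring_case("Hello World!")))  # ' !deHllloorW'
print(sort_ignoring_case("bBaA"))                # AaBb
```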

Answer 5

This code can be used to sort a string in alphabetical order without using Python's built-in sorted() function:

k = input("Enter any string again ")

# Copy the characters of the string into a list
li = []
x = len(k)
for i in range(0, x):
    li.append(k[i])

print("List is : ", li)

# Exchange sort: swap whenever li[i] is smaller than li[j]
for i in range(0, x):
    for j in range(0, x):
        if li[i] < li[j]:
            li[i], li[j] = li[j], li[i]

# Concatenate the sorted characters back into a string
result = ""
for i in range(0, x):
    result = result + li[i]

print("After sorting String is : ", result)

Answer 6

Really liked the answer with the reduce() function. Here's another way to sort the string using accumulate().

from itertools import accumulate
s = 'mississippi'
print(tuple(accumulate(sorted(s)))[-1])

sorted(s) -> ['i', 'i', 'i', 'i', 'm', 'p', 'p', 's', 's', 's', 's']

tuple(accumulate(sorted(s))) -> ('i', 'ii', 'iii', 'iiii', 'iiiim', 'iiiimp', 'iiiimpp', 'iiiimpps', 'iiiimppss', 'iiiimppsss', 'iiiimppssss')

We are selecting the last element (index -1) of the tuple.


Python: Using .format() on a Unicode-escaped string

Question: Python: Using .format() on a Unicode-escaped string

I am using Python 2.6.5. My code requires the use of the "more than or equal to" sign. Here it goes:

>>> s = u'\u2265'
>>> print s
≥
>>> print "{0}".format(s)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2265' in position 0: ordinal not in range(128)

Why do I get this error? Is there a right way to do this? I need to use the .format() function.


Answer 0

Just make the second string also a unicode string

>>> s = u'\u2265'
>>> print s
≥
>>> print "{0}".format(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2265' in position 0: ordinal not in range(128)
>>> print u"{0}".format(s)
≥
>>> 

Answer 1

Unicode strings need Unicode format strings.

>>> print u'{0}'.format(s)
≥

Answer 2

A bit more information on why that happens.

>>> s = u'\u2265'
>>> print s

works because print automatically uses the system encoding for your environment, which was likely set to UTF-8. (You can check by doing import sys; print sys.stdout.encoding)

>>> print "{0}".format(s)

fails because format tries to match the encoding of the type that it is called on (I couldn’t find documentation on this, but this is the behavior I’ve noticed). Since string literals are byte strings encoded as ASCII in python 2, format tries to encode s as ASCII, which then results in that exception. Observe:

>>> s = u'\u2265'
>>> s.encode('ascii')
Traceback (most recent call last):
  File "<input>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2265' in position 0: ordinal not in range(128)

So that is basically why these approaches work:

>>> s = u'\u2265'
>>> print u'{}'.format(s)
≥
>>> print '{}'.format(s.encode('utf-8'))
≥

The source character set is defined by the encoding declaration; it is ASCII if no encoding declaration is given in the source file (https://docs.python.org/2/reference/lexical_analysis.html#string-literals)
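
For comparison, here is a small sketch of how the same situation looks on Python 3 (an assumption; the question itself is about Python 2.6). There str is always Unicode, so .format() works directly, and the ASCII failure only appears when you encode explicitly.

```python
s = '\u2265'  # GREATER-THAN OR EQUAL TO

# Formatting a str into a str involves no implicit ASCII step on Python 3.
formatted = '{0}'.format(s)

# Encoding to ASCII still fails, for the same reason as in the Python 2 traceback.
try:
    s.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# UTF-8 can represent the character; it takes three bytes.
utf8_bytes = s.encode('utf-8')
```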


Python: Most idiomatic way to convert None to empty string?

Question: Python: Most idiomatic way to convert None to empty string?

What is the most idiomatic way to do the following?

def xstr(s):
    if s is None:
        return ''
    else:
        return s

s = xstr(a) + xstr(b)

update: I’m incorporating Tryptich’s suggestion to use str(s), which makes this routine work for other types besides strings. I’m awfully impressed by Vinay Sajip’s lambda suggestion, but I want to keep my code relatively simple.

def xstr(s):
    if s is None:
        return ''
    else:
        return str(s)

Answer 0

If you actually want your function to behave like the str() built-in, but return an empty string when the argument is None, do this:

def xstr(s):
    if s is None:
        return ''
    return str(s)

Answer 1

def xstr(s):
    return '' if s is None else str(s)

Answer 2

Probably the shortest would be str(s or '')

Because None is falsy, and "x or y" returns y if x is falsy. See Boolean Operators for a detailed explanation. It's short, but not very explicit.
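
The trade-off can be made concrete with a short sketch comparing an explicit is None check with the or shortcut. The names xstr_explicit and xstr_or are made up for this illustration.

```python
def xstr_explicit(s):
    """Empty string only for None; everything else goes through str()."""
    return '' if s is None else str(s)


def xstr_or(s):
    """Empty string for None and for every other falsy value (0, False, [], '')."""
    return str(s or '')


for value in (None, 'abc', 0, False, []):
    print(repr(value), repr(xstr_explicit(value)), repr(xstr_or(value)))
```

The two agree for None and for non-empty strings, but differ for values such as 0, which the or version silently turns into an empty string.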


Answer 3

If you know that the value will always either be a string or None:

xstr = lambda s: s or ""

print xstr("a") + xstr("b") # -> 'ab'
print xstr("a") + xstr(None) # -> 'a'
print xstr(None) + xstr("b") # -> 'b'
print xstr(None) + xstr(None) # -> ''

Answer 4

return s or '' will work just fine for your stated problem!


Answer 5

def xstr(s):
    return s or ""

Answer 6

Functional way (one-liner)

xstr = lambda s: '' if s is None else s

Answer 7

A neat one-liner to do this building on some of the other answers:

s = (lambda v: v or '')(a) + (lambda v: v or '')(b)

or even just:

s = (a or '') + (b or '')

Answer 8

def xstr(s):
    return {None: ''}.get(s, s)

Answer 9

I use the max function:

max(None, '')     # Returns '' (empty string)
max("Hello", '')  # Returns 'Hello'

Works like a charm ;) Just put your string in the first parameter of the function. Note that this relies on Python 2 ordering, where None compares smaller than any string; on Python 3, comparing None with a str raises TypeError.


Answer 10

Variation on the above if you need to be compatible with Python 2.4

xstr = lambda s: s is not None and s or ''

Answer 11

If it is about formatting strings, you can do the following:

from string import Formatter

class NoneAsEmptyFormatter(Formatter):
    def get_value(self, key, args, kwargs):
        v = super().get_value(key, args, kwargs)
        return '' if v is None else v

fmt = NoneAsEmptyFormatter()
s = fmt.format('{}{}', a, b)

Answer 12

def xstr(s):
    return s if s else ''

s = "%s%s" % (xstr(a), xstr(b))

Answer 13

We can avoid an unnecessary type cast in scenarios like the one below.

customer = "John"
name = str(customer)
if name is None:
    print("Name is blank")
else:
    print("Customer name : " + name)

In the example above, if the value of customer is None, it is cast to the string 'None' when assigned to name, so the comparison in the if clause never succeeds.

customer = "John"  # works properly even if customer is None
name = customer
if name is None:
    print("Name is blank")
else:
    print("Customer name : " + str(name))

The example above works properly. Such scenarios are very common when values are fetched from a URL, JSON, or XML, or when values need further type casting before they can be manipulated.


Answer 14

Use short-circuit evaluation (the parentheses are needed because + binds more tightly than or):

s = (a or '') + (b or '')

Since + is not a very good operation on strings, better use format strings:

s = "%s%s" % (a or '', b or '')

Answer 15

Use an f-string if you are using Python 3.6 or later. Note that f"{s}" on its own renders None as 'None', so combine it with or:

xstr = f"{s or ''}"

Print a string as hex bytes?

Question: Print a string as hex bytes?

I have this string: Hello world !! and I want to print it using Python as 48:65:6c:6c:6f:20:77:6f:72:6c:64:20:21:21.

hex() works only for integers.

How can it be done?


Answer 0

You can transform your string into a generator of ints, apply hex formatting to each element, and join them with a separator:

>>> s = "Hello world !!"
>>> ":".join("{:02x}".format(ord(c)) for c in s)
'48:65:6c:6c:6f:20:77:6f:72:6c:64:20:21:21'

Answer 1

':'.join(x.encode('hex') for x in 'Hello World!')  # Python 2 only

Answer 2

For Python 2.x:

':'.join(x.encode('hex') for x in 'Hello World!')

The code above will not work with Python 3.x, for 3.x, the code below will work:

':'.join(hex(ord(x))[2:] for x in 'Hello World!')

Answer 3

Another answer in two lines that some might find easier to read, and helps with debugging line breaks or other odd characters in a string:

For Python 2.7

for character in string:
    print character, character.encode('hex')

For Python 3.7 (not tested on all releases of 3)

for character in string:
    print(character, character.encode('utf-8').hex())

Answer 4

Some additions to Fedor Gogolev's answer:

First, if the string contains characters whose 'ASCII code' is below 16 (0x10), they will not be displayed as required with a plain {:x} format. In that case, the correct format is {:02x}:

>>> s = "Hello unicode \u0005 !!"
>>> ":".join("{0:x}".format(ord(c)) for c in s)
'48:65:6c:6c:6f:20:75:6e:69:63:6f:64:65:20:5:20:21:21'
                                           ^

>>> ":".join("{:02x}".format(ord(c)) for c in s)
'48:65:6c:6c:6f:20:75:6e:69:63:6f:64:65:20:05:20:21:21'
                                           ^^

Second, if your “string” is in reality a “byte string” — and since the difference matters in Python 3 — you might prefer the following:

>>> s = b"Hello bytes \x05 !!"
>>> ":".join("{:02x}".format(c) for c in s)
'48:65:6c:6c:6f:20:62:79:74:65:73:20:05:20:21:21'

Please note there is no need for conversion in the above code, as a bytes object is defined as "an immutable sequence of integers in the range 0 <= x < 256".


Answer 5

Print a string as hex bytes?

The accepted answer gives:

s = "Hello world !!"
":".join("{:02x}".format(ord(c)) for c in s)

returns:

'48:65:6c:6c:6f:20:77:6f:72:6c:64:20:21:21'

The accepted answer works only so long as you use bytes (mostly ascii characters). But if you use unicode, e.g.:

a_string = u"Привет мир!!" # "Prevyet mir", or "Hello World" in Russian.

You need to convert to bytes somehow.

If your terminal doesn’t accept these characters, you can decode from UTF-8 or use the names (so you can paste and run the code along with me):

a_string = (
    "\N{CYRILLIC CAPITAL LETTER PE}"
    "\N{CYRILLIC SMALL LETTER ER}"
    "\N{CYRILLIC SMALL LETTER I}"
    "\N{CYRILLIC SMALL LETTER VE}"
    "\N{CYRILLIC SMALL LETTER IE}"
    "\N{CYRILLIC SMALL LETTER TE}"
    "\N{SPACE}"
    "\N{CYRILLIC SMALL LETTER EM}"
    "\N{CYRILLIC SMALL LETTER I}"
    "\N{CYRILLIC SMALL LETTER ER}"
    "\N{EXCLAMATION MARK}"
    "\N{EXCLAMATION MARK}"
)

So we see that:

":".join("{:02x}".format(ord(c)) for c in a_string)

returns

'41f:440:438:432:435:442:20:43c:438:440:21:21'

a poor/unexpected result – these are the code points that combine to make the graphemes we see in Unicode, from the Unicode Consortium – representing languages all over the world. This is not how we actually store this information so it can be interpreted by other sources, though.

To allow another source to use this data, we would usually need to convert to UTF-8 encoding, for example, to save this string in bytes to disk or to publish to html. So we need that encoding to convert the code points to the code units of UTF-8 – in Python 3, ord is not needed because bytes are iterables of integers:

>>> ":".join("{:02x}".format(c) for c in a_string.encode('utf-8'))
'd0:9f:d1:80:d0:b8:d0:b2:d0:b5:d1:82:20:d0:bc:d0:b8:d1:80:21:21'

Or perhaps more elegantly, using the new f-strings (only available in Python 3):

>>> ":".join(f'{c:02x}' for c in a_string.encode('utf-8'))
'd0:9f:d1:80:d0:b8:d0:b2:d0:b5:d1:82:20:d0:bc:d0:b8:d1:80:21:21'

In Python 2, pass c to ord first, i.e. ord(c) – more examples:

>>> ":".join("{:02x}".format(ord(c)) for c in a_string.encode('utf-8'))
'd0:9f:d1:80:d0:b8:d0:b2:d0:b5:d1:82:20:d0:bc:d0:b8:d1:80:21:21'
>>> ":".join(format(ord(c), '02x') for c in a_string.encode('utf-8'))
'd0:9f:d1:80:d0:b8:d0:b2:d0:b5:d1:82:20:d0:bc:d0:b8:d1:80:21:21'
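
Putting the encode-then-format steps together, a minimal Python 3 sketch might look like this. The function name to_hex and its parameters are assumptions for illustration, not part of the answer.

```python
def to_hex(text, sep=":", encoding="utf-8"):
    """Encode text to bytes and render each byte as two lowercase hex digits."""
    data = text.encode(encoding)
    # Iterating over bytes yields ints in Python 3, so no ord() is needed.
    return sep.join(f"{byte:02x}" for byte in data)


print(to_hex("Hello world !!"))  # 48:65:6c:6c:6f:20:77:6f:72:6c:64:20:21:21
print(to_hex("\u2265"))          # e2:89:a5
```

On Python 3.8 and later, bytes.hex accepts a separator, so text.encode('utf-8').hex(':') should give the same result.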

Answer 6

You can use hexdump's dump function:

import hexdump
hexdump.dump("Hello World", sep=":")

(append .lower() if you require lower-case). This works for both Python 2 & 3.


Answer 7

Using map and a lambda function produces a list of hex values, which can be printed (or used for other purposes). On Python 3, wrap the call in list() because map returns an iterator.

>>> s = 'Hello 1 2 3 \x01\x02\x03 :)'

>>> list(map(lambda c: hex(ord(c)), s))
['0x48', '0x65', '0x6c', '0x6c', '0x6f', '0x20', '0x31', '0x20', '0x32', '0x20', '0x33', '0x20', '0x1', '0x2', '0x3', '0x20', '0x3a', '0x29']

Answer 8

This can be done in following ways:

from __future__ import print_function
str = "Hello World !!"
for char in str:
    mm = int(char.encode('hex'), 16)
    print(hex(mm), sep=':', end=' ' )

The output of this will be in hex as follows:

0x48 0x65 0x6c 0x6c 0x6f 0x20 0x57 0x6f 0x72 0x6c 0x64 0x20 0x21 0x21


Answer 9

A bit more general for those who don’t care about Python3 or colons:

from codecs import encode

data = open('/dev/urandom', 'rb').read(20)
print(encode(data, 'hex'))      # data

print(encode(b"hello", 'hex'))  # string

Answer 10

Using base64.b16encode in Python 2 (it's in the standard library):

>>> import base64
>>> s = 'Hello world !!'
>>> h = base64.b16encode(s)
>>> ':'.join([h[i:i+2] for i in xrange(0, len(h), 2)])
'48:65:6C:6C:6F:20:77:6F:72:6C:64:20:21:21'

Answer 11

Just for convenience, very simple.

def hexlify_byteString(byteString, delim="%"):
    ''' very simple way to hexlify a bytestring using delimiters '''
    retval = ""
    for intval in byteString:
        retval += ( '0123456789ABCDEF'[int(intval / 16)])
        retval += ( '0123456789ABCDEF'[int(intval % 16)])
        retval += delim
    return( retval[:-1])

hexlify_byteString(b'Hello World!', ":")
# Out[439]: '48:65:6C:6C:6F:20:57:6F:72:6C:64:21'

Answer 12

For something that offers more performance than ''.format(), you can use this:

>>> ':'.join('%02x' % (v if type(v) is int else ord(v)) for v in 'Hello World !!')
'48:65:6c:6c:6f:20:57:6f:72:6c:64:20:21:21'
>>> ':'.join('%02x' % (v if type(v) is int else ord(v)) for v in b'Hello World !!')
'48:65:6c:6c:6f:20:57:6f:72:6c:64:20:21:21'

Sorry this couldn't look nicer. It would be nice if one could simply do '%02x' % v, but that only takes an int. Iterating over a byte string b'' yields ints on Python 3 but one-character strings on Python 2, hence the logic that selects ord(v).


When splitting an empty string in Python, why does split() return an empty list while split('\n') returns ['']?

Question: When splitting an empty string in Python, why does split() return an empty list while split('\n') returns ['']?

I am using split('\n') to get lines in one string, and found that ''.split() returns an empty list, [], while ''.split('\n') returns ['']. Is there any specific reason for such a difference?

And is there any more convenient way to count lines in a string?


Answer 0

Question: I am using split('\n') to get lines in one string, and found that ''.split() returns empty list [], while ''.split('\n') returns [''].

The str.split() method has two algorithms. If no arguments are given, it splits on repeated runs of whitespace. However, if an argument is given, it is treated as a single delimiter with no repeated runs.

In the case of splitting an empty string, the first mode (no argument) will return an empty list because the whitespace is eaten and there are no values to put in the result list.

In contrast, the second mode (with an argument such as \n) will produce the first empty field. Consider if you had written '\n'.split('\n'), you would get two fields (one split, gives you two halves).

Question: Is there any specific reason for such a difference?

This first mode is useful when data is aligned in columns with variable amounts of whitespace. For example:

>>> data = '''\
Shasta      California     14,200
McKinley    Alaska         20,300
Fuji        Japan          12,400
'''
>>> for line in data.splitlines():
        print line.split()

['Shasta', 'California', '14,200']
['McKinley', 'Alaska', '20,300']
['Fuji', 'Japan', '12,400']

The second mode is useful for delimited data such as CSV where repeated commas denote empty fields. For example:

>>> data = '''\
Guido,BDFL,,Amsterdam
Barry,FLUFL,,USA
Tim,,,USA
'''
>>> for line in data.splitlines():
        print line.split(',')

['Guido', 'BDFL', '', 'Amsterdam']
['Barry', 'FLUFL', '', 'USA']
['Tim', '', '', 'USA']

Note, the number of result fields is one greater than the number of delimiters. Think of cutting a rope. If you make no cuts, you have one piece. Making one cut, gives two pieces. Making two cuts, gives three pieces. And so it is with Python’s str.split(delimiter) method:

>>> ''.split(',')       # No cuts
['']
>>> ','.split(',')      # One cut
['', '']
>>> ',,'.split(',')     # Two cuts
['', '', '']
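
The rope analogy can be checked directly: for a single-character separator such as ',', the number of fields is always the number of separator occurrences plus one. A small sketch, with sample strings chosen for illustration:

```python
samples = ['', ',', ',,', 'a,b', 'Tim,,,USA']

for text in samples:
    fields = text.split(',')
    # One more field than there are cuts.
    assert len(fields) == text.count(',') + 1
    print(repr(text), '->', fields)
```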

Question: And is there any more convenient way to count lines in a string?

Yes, there are a couple of easy ways. One uses str.count() and the other uses str.splitlines(). Both ways will give the same answer unless the final line is missing the \n. If the final newline is missing, the str.splitlines approach will give the accurate answer. A faster technique that is also accurate uses the count method but then corrects it for the final newline:

>>> data = '''\
Line 1
Line 2
Line 3
Line 4'''

>>> data.count('\n')                               # Inaccurate
3
>>> len(data.splitlines())                         # Accurate, but slow
4
>>> data.count('\n') + (not data.endswith('\n'))   # Accurate and fast
4    
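
One edge case the count-based formula does not cover is the empty string: it reports 1 line, while splitlines() reports 0. A hedged sketch that handles it, assuming an empty string should count as zero lines and that only '\n' ends a line (the name count_lines is hypothetical):

```python
def count_lines(text):
    """Count lines, treating a missing final newline as still ending a line."""
    if not text:
        return 0
    # bool is a subclass of int, so adding True contributes 1.
    return text.count('\n') + (not text.endswith('\n'))


print(count_lines(''))                  # 0
print(count_lines('Line 1\nLine 2'))    # 2
print(count_lines('Line 1\nLine 2\n'))  # 2
```

Keep in mind that str.splitlines() also breaks on other line boundaries such as '\r' and '\f', so the two approaches can disagree on text that contains them.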

Question from @Kaz: Why the heck are two very different algorithms shoe-horned into a single function?

The signature for str.split is about 20 years old, and a number of the APIs from that era are strictly pragmatic. While not perfect, the method signature isn’t “terrible” either. For the most part, Guido’s API design choices have stood the test of time.

The current API is not without advantages. Consider strings such as:

ps_aux_header  = "USER               PID  %CPU %MEM      VSZ"
patient_header = "name,age,height,weight"

When asked to break these strings into fields, people tend to describe both using the same English word, “split”. When asked to read code such as fields = line.split() or fields = line.split(','), people tend to correctly interpret the statements as “splits a line into fields”.

Microsoft Excel’s text-to-columns tool made a similar API choice and incorporates both splitting algorithms in the same tool. People seem to mentally model field-splitting as a single concept even though more than one algorithm is involved.


Answer 1

It seems to simply be the way it’s supposed to work, according to the documentation:

Splitting an empty string with a specified separator returns [''].

If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace. Consequently, splitting an empty string or a string consisting of just whitespace with a None separator returns [].

So, to make it clearer, the split() function implements two different splitting algorithms, and uses the presence of an argument to decide which one to run. This might be because it allows optimizing the one for no arguments more than the one with arguments; I don’t know.
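The two behaviours quoted from the documentation can be seen side by side in a short sketch. The sample strings are made up for illustration, and the syntax is Python 3:

```python
# Illustrative input with leading, repeated, and trailing spaces.
text = "  a  b "

# No separator: runs of whitespace act as one delimiter and
# leading/trailing whitespace produces no empty strings.
whitespace_fields = text.split()

# Explicit separator: every occurrence delimits a field, so
# repeated or leading/trailing separators yield empty strings.
explicit_fields = text.split(" ")

# The empty-string edge case differs between the two modes.
empty_default = "".split()
empty_explicit = "".split(",")

print(whitespace_fields)  # ['a', 'b']
print(explicit_fields)    # ['', '', 'a', '', 'b', '']
print(empty_default)      # []
print(empty_explicit)     # ['']
```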


Answer 2

.split() without parameters tries to be clever. It splits on any whitespace, tabs, spaces, line feeds etc, and it also skips all empty strings as a result of this.

>>> "  fii    fbar \n bopp ".split()
['fii', 'fbar', 'bopp']

Essentially, .split() without parameters is used to extract words from a string, as opposed to .split() with parameters, which just takes a string and splits it.

That’s the reason for the difference.

And yeah, counting lines by splitting is not an efficient way. Count the number of line feeds, and add one if the string doesn’t end with a line feed.


Answer 3

Use count():

s = "Line 1\nLine2\nLine3"
n_lines = s.count('\n') + 1

Answer 4

>>> print str.split.__doc__
S.split([sep [,maxsplit]]) -> list of strings

Return a list of the words in the string S, using sep as the
delimiter string.  If maxsplit is given, at most maxsplit
splits are done. If sep is not specified or is None, any
whitespace string is a separator and empty strings are removed
from the result.

Note the last sentence.

To count lines you can simply count how many \n there are:

line_count = some_string.count('\n') + (some_string[-1] != '\n')

The last part accounts for a final line that does not end with \n, even though this means that Hello, World! and Hello, World!\n have the same line count (which for me is reasonable); otherwise you can simply add 1 to the count of \n. Note the parentheses: without them, + binds more tightly than !=, so the expression would try to add an int to a str.
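A minimal sketch of that counting rule as a function. The name count_lines and the choice to return 0 for an empty string are assumptions made here, not part of the answer:

```python
def count_lines(text):
    """Count lines; a final line without a trailing newline still counts."""
    if not text:
        return 0  # assumption: an empty string has zero lines
    # bool is a subclass of int, so True adds 1 and False adds 0.
    return text.count("\n") + (not text.endswith("\n"))

print(count_lines("Hello, World!"))    # 1
print(count_lines("Hello, World!\n"))  # 1
print(count_lines("a\nb\nc"))          # 3
```

Using endswith rather than indexing with [-1] avoids an IndexError on an empty string.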


Answer 5

To count lines, you can count the number of line breaks:

n_lines = sum(1 for s in the_string if s == "\n") + 1 # add 1 for last line

Edit:

The other answer with built-in count is more suitable, actually


Find and replace string values in a list

Question: Find and replace string values in a list

I got this list:

words = ['how', 'much', 'is[br]', 'the', 'fish[br]', 'no', 'really']

What I would like is to replace [br] with some fantastic value similar to <br /> and thus get a new list:

words = ['how', 'much', 'is<br />', 'the', 'fish<br />', 'no', 'really']

Answer 0

words = [w.replace('[br]', '<br />') for w in words]

These are called List Comprehensions.


Answer 1

You can use, for example:

words = [word.replace('[br]','<br />') for word in words]

Answer 2

Besides a list comprehension, you can try map:

>>> map(lambda x: str.replace(x, "[br]", "<br/>"), words)
['how', 'much', 'is<br/>', 'the', 'fish<br/>', 'no', 'really']
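The interactive output above is what Python 2 prints. In Python 3, map returns a lazy iterator, so a sketch of the same idea needs an explicit list() call to get a list back:

```python
words = ['how', 'much', 'is[br]', 'the', 'fish[br]', 'no', 'really']

# list() forces the lazy map object into a concrete list (Python 3).
replaced = list(map(lambda w: w.replace('[br]', '<br/>'), words))

print(replaced)
# ['how', 'much', 'is<br/>', 'the', 'fish<br/>', 'no', 'really']
```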

Answer 3

In case you’re wondering about the performance of the different approaches, here are some timings:

In [1]: words = [str(i) for i in range(10000)]

In [2]: %timeit replaced = [w.replace('1', '<1>') for w in words]
100 loops, best of 3: 2.98 ms per loop

In [3]: %timeit replaced = map(lambda x: str.replace(x, '1', '<1>'), words)
100 loops, best of 3: 5.09 ms per loop

In [4]: %timeit replaced = map(lambda x: x.replace('1', '<1>'), words)
100 loops, best of 3: 4.39 ms per loop

In [5]: import re

In [6]: r = re.compile('1')

In [7]: %timeit replaced = [r.sub('<1>', w) for w in words]
100 loops, best of 3: 6.15 ms per loop

As you can see, for such simple patterns the accepted list comprehension is the fastest, but look at the following:

In [8]: %timeit replaced = [w.replace('1', '<1>').replace('324', '<324>').replace('567', '<567>') for w in words]
100 loops, best of 3: 8.25 ms per loop

In [9]: r = re.compile('(1|324|567)')

In [10]: %timeit replaced = [r.sub(r'<\1>', w) for w in words]
100 loops, best of 3: 7.87 ms per loop

This shows that for more complicated substitutions a pre-compiled reg-exp (as in 9-10) can be (much) faster. It really depends on your problem and the shortest part of the reg-exp.
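One detail worth checking when a replacement refers to a captured group: the backreference \1 only reaches the regex engine if the replacement is a raw string. The following sketch uses an illustrative input to show the difference:

```python
import re

pattern = re.compile(r'(1|324|567)')

# Raw string: \1 is passed through as a backreference to group 1.
with_backref = pattern.sub(r'<\1>', '1324')

# Non-raw string: '\1' is an octal escape for the single character chr(1),
# so every match is replaced by the same three-character string.
without_raw = pattern.sub('<\1>', '1324')

print(with_backref)       # <1><324>
print(repr(without_raw))  # '<\x01><\x01>'
```

A regex with alternation is also not guaranteed to give the same result as a chain of str.replace calls for every input, since the chain rescans text that earlier replacements produced.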


Answer 4

An example with a for loop (I prefer list comprehensions).

a, b = '[br]', '<br />'
for i, v in enumerate(words):
    if a in v:
        words[i] = v.replace(a, b)
print(words)
# ['how', 'much', 'is<br />', 'the', 'fish<br />', 'no', 'really']

Case-insensitive 'in'

Question: Case-insensitive 'in'

I love using the expression

if 'MICHAEL89' in USERNAMES:
    ...

where USERNAMES is a list.


Is there any way to match items with case insensitivity or do I need to use a custom method? Just wondering if there is a need to write extra code for this.


Answer 0

username = 'MICHAEL89'
if username.upper() in (name.upper() for name in USERNAMES):
    ...

Alternatively:

if username.upper() in map(str.upper, USERNAMES):
    ...

Or, yes, you can make a custom method.


Answer 1

I would make a wrapper so you can be non-invasive. Minimally, for example…:

class CaseInsensitively(object):
    def __init__(self, s):
        self.__s = s.lower()
    def __hash__(self):
        return hash(self.__s)
    def __eq__(self, other):
        # ensure proper comparison between instances of this class
        try:
           other = other.__s
        except (TypeError, AttributeError):
          try:
             other = other.lower()
          except:
             pass
        return self.__s == other

Now, if CaseInsensitively('MICHAEL89') in whatever: should behave as required (whether the right-hand side is a list, dict, or set). (It may require more effort to achieve similar results for string inclusion, avoid warnings in some cases involving unicode, etc).


Answer 2

Usually (in oop at least) you shape your object to behave the way you want. name in USERNAMES is not case insensitive, so USERNAMES needs to change:

class NameList(object):
    def __init__(self, names):
        self.names = names

    def __contains__(self, name): # implements `in`
        return name.lower() in (n.lower() for n in self.names)

    def add(self, name):
        self.names.append(name)

# now this works
usernames = NameList(USERNAMES)
print someone in usernames

The great thing about this is that it opens the path for many improvements, without having to change any code outside the class. For example, you could change the self.names to a set for faster lookups, or compute the (n.lower() for n in self.names) only once and store it on the class and so on …
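One of the improvements mentioned above, normalising the names once and storing them in a set, might look like the following sketch. It assumes Python 3 (for casefold), and the _folded attribute and the sample names are introduced here for illustration only:

```python
class NameList:
    """Hypothetical NameList variant that caches case-folded names in a set."""

    def __init__(self, names):
        self.names = list(names)
        # Fold each name once, so later lookups are a single set probe.
        self._folded = {n.casefold() for n in self.names}

    def __contains__(self, name):  # implements `in`
        return name.casefold() in self._folded

    def add(self, name):
        self.names.append(name)
        self._folded.add(name.casefold())


usernames = NameList(['Michael89', 'anna'])
print('MICHAEL89' in usernames)  # True
print('bob' in usernames)        # False
```

The trade-off is extra memory for the set in exchange for not re-lowercasing every name on every membership test.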


Answer 3

str.casefold is recommended for case-insensitive string matching. @nmichaels’s solution can trivially be adapted.

Use either:

if 'MICHAEL89'.casefold() in (name.casefold() for name in USERNAMES):

Or:

if 'MICHAEL89'.casefold() in map(str.casefold, USERNAMES):

As per the docs:

Casefolding is similar to lowercasing but more aggressive because it is intended to remove all case distinctions in a string. For example, the German lowercase letter ‘ß’ is equivalent to “ss”. Since it is already lowercase, lower() would do nothing to ‘ß’; casefold() converts it to “ss”.
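The difference between lower() and casefold() described in the docs can be checked with a small sketch. The helper name contains_ignoring_case and the sample data are assumptions for illustration:

```python
USERNAMES = ['Michael89', 'Straße']  # illustrative data


def contains_ignoring_case(name, names):
    """Return True if any entry equals `name` after case folding."""
    folded = name.casefold()
    return any(folded == candidate.casefold() for candidate in names)


print(contains_ignoring_case('MICHAEL89', USERNAMES))  # True
print(contains_ignoring_case('STRASSE', USERNAMES))    # True
print('STRASSE'.lower() == 'Straße'.lower())           # False
```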


Answer 4

Here’s one way:

if string1.lower() in string2.lower(): 
    ...

For this to work, both string1 and string2 objects must be of type string.


Answer 5

I think you have to write some extra code. For example:

if 'MICHAEL89' in map(lambda name: name.upper(), USERNAMES):
   ...

In this case we are forming a new list with all entries in USERNAMES converted to upper case and then comparing against this new list.

Update

As @viraptor says, it is even better to use a generator instead of map. See @Nathon’s answer.


Answer 6

You could do

matcher = re.compile('MICHAEL89', re.IGNORECASE)
filter(matcher.match, USERNAMES) 

Update: played around a bit and am thinking you could get a better short-circuit type approach using

matcher = re.compile('MICHAEL89', re.IGNORECASE)
if any( ifilter( matcher.match, USERNAMES ) ):
    #your code here

The ifilter function is from itertools, one of my favorite modules within Python. It’s faster than a generator but only creates the next item of the list when called upon.
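Two caveats apply to the regex approach. First, match only anchors at the start of the string, so longer names that begin with the pattern also match. Second, itertools.ifilter exists only in Python 2. A Python 3 sketch that short-circuits and requires the whole name to match could look like this (the sample names are illustrative):

```python
import re

USERNAMES = ['michael890', 'Michael89']  # illustrative data

# re.escape guards against regex metacharacters in the name being searched.
matcher = re.compile(re.escape('MICHAEL89'), re.IGNORECASE)

# match() accepts any string that merely starts with the pattern.
prefix_hits = [n for n in USERNAMES if matcher.match(n)]

# fullmatch() requires the entire string to match; any() stops at the first hit.
exact_hit = any(matcher.fullmatch(n) for n in USERNAMES)

print(prefix_hits)  # ['michael890', 'Michael89']
print(exact_hit)    # True
```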


Answer 7

My 5 (wrong) cents

'a' in "".join(['A']).lower()

UPDATE

Ouch, totally agree with @jpp; I’ll keep this as an example of bad practice :(


Answer 8

I needed this for a dictionary instead of a list. Jochen’s solution was the most elegant for that case, so I modded it a bit:

class CaseInsensitiveDict(dict):
    ''' requests special dicts are case insensitive when using the in operator,
     this implements a similar behaviour'''
    def __contains__(self, name): # implements `in`
        return name.casefold() in (n.casefold() for n in self.keys())

Now you can convert a dictionary like so: USERNAMESDICT = CaseInsensitiveDict(USERNAMESDICT), and then use if 'MICHAEL89' in USERNAMESDICT:


Answer 9

To have it in one line, this is what I did:

if any(([True if 'MICHAEL89' in username.upper() else False for username in USERNAMES])):
    print('username exists in list')

I didn’t test it time-wise though. I am not sure how fast/efficient it is.


In Python, how do I create a string of n characters in one line of code?

Question: In Python, how do I create a string of n characters in one line of code?

I need to generate a string with n characters in Python. Is there a one line answer to achieve this with the existing Python library? For instance, I need a string of 10 letters:

string_val = 'abcdefghij'

Answer 0

To simply repeat the same letter 10 times:

string_val = "x" * 10  # gives you "xxxxxxxxxx"

And if you want something more complex, like n random lowercase letters, it’s still only one line of code (not counting the import statements and defining n):

from random import choice
from string import ascii_lowercase
n = 10

string_val = "".join(choice(ascii_lowercase) for i in range(n))

Answer 1

The first ten lowercase letters are string.lowercase[:10] (if you have imported the standard library module string previously, of course;-).

Other ways to “make a string of 10 characters”: 'x'*10 (all the ten characters will be lowercase xs;-), ''.join(chr(ord('a')+i) for i in xrange(10)) (the first ten lowercase letters again), etc, etc;-).
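The names string.lowercase and xrange are Python 2 only. A sketch of the same two approaches in Python 3, where the equivalents are string.ascii_lowercase and range:

```python
import string

# Slice the first ten letters from the ASCII lowercase alphabet.
first_ten = string.ascii_lowercase[:10]

# Build the same string from code points, starting at 'a'.
same_again = ''.join(chr(ord('a') + i) for i in range(10))

print(first_ten)   # abcdefghij
print(same_again)  # abcdefghij
```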


Answer 2

If you just want any letters:

 'a'*10  # gives 'aaaaaaaaaa'

If you want consecutive letters (up to 26):

 ''.join(['%c' % x for x in range(97, 97+10)])  # gives 'abcdefghij'

Answer 3

Why “one line”? You can fit anything onto one line.

Assuming you want them to start with ‘a’, and increment by one character each time (with wrapping > 26), here’s a line:

>>> mkstring = lambda(x): "".join(map(chr, (ord('a')+(y%26) for y in range(x))))
>>> mkstring(10)
'abcdefghij'
>>> mkstring(30)
'abcdefghijklmnopqrstuvwxyzabcd'
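The lambda(x): form with a parenthesised parameter is Python 2 syntax and is rejected by Python 3. A sketch of the same wrapping behaviour as an ordinary function that runs on Python 3:

```python
def mkstring(n):
    """Return n consecutive lowercase letters, wrapping around after 'z'."""
    return "".join(chr(ord('a') + i % 26) for i in range(n))


print(mkstring(10))  # abcdefghij
print(mkstring(30))  # abcdefghijklmnopqrstuvwxyzabcd
```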

Answer 4

This might be a little off the question, but for those interested in the randomness of the generated string, my answer would be:

import os
import string

def _pwd_gen(size=16):
    chars = string.letters
    chars_len = len(chars)
    return str().join(chars[int(ord(c) / 256. * chars_len)] for c in os.urandom(size))

See these answers and random.py’s source for more insight.
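The function above targets Python 2: string.letters no longer exists in Python 3, and iterating over the bytes returned by os.urandom yields integers there, so ord(c) would fail. A hedged Python 3 sketch of the same idea, using the standard-library secrets module (available since Python 3.6) instead of scaling raw bytes; the name pwd_gen is chosen here for illustration:

```python
import secrets
import string


def pwd_gen(size=16):
    """Return `size` random ASCII letters drawn from the OS entropy source."""
    return ''.join(secrets.choice(string.ascii_letters) for _ in range(size))


print(len(pwd_gen()))  # 16
```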


Answer 5

If you can use repeated letters, you can use the * operator:

>>> 'a'*5

'aaaaa'

What is the most efficient string concatenation method in Python?

Question: What is the most efficient string concatenation method in Python?

Is there any efficient mass string concatenation method in Python (like StringBuilder in C# or StringBuffer in Java)? I found following methods here:

  • Simple concatenation using +
  • Using string list and join method
  • Using MutableString from the UserString module
  • Using character array and the array module
  • Using the cStringIO or StringIO module

But what do you experts use or suggest, and why?

[A related question here]


Answer 0

You may be interested in this: An optimization anecdote by Guido. Although it is worth remembering also that this is an old article and it predates the existence of things like ''.join (although I guess string.joinfields is more-or-less the same)

On the strength of that, the array module may be fastest if you can shoehorn your problem into it. But ''.join is probably fast enough and has the benefit of being idiomatic and thus easier for other python programmers to understand.

Finally, the golden rule of optimization: don’t optimize unless you know you need to, and measure rather than guessing.

You can measure different methods using the timeit module. That can tell you which is fastest, instead of random strangers on the internet making guesses.
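A minimal sketch of such a measurement with timeit, assuming Python 3. The function names and input sizes are arbitrary choices made here, and the timings will differ between machines and interpreter versions, so no expected numbers are shown:

```python
import timeit

# Illustrative workload: 1000 short strings to concatenate.
parts = [str(i) for i in range(1000)]


def with_join():
    return ''.join(parts)


def with_plus():
    result = ''
    for p in parts:
        result += p
    return result


# timeit.timeit accepts a callable and runs it `number` times.
join_time = timeit.timeit(with_join, number=200)
plus_time = timeit.timeit(with_plus, number=200)

print(f"join: {join_time:.4f}s  plus: {plus_time:.4f}s")
```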


Answer 1

''.join(sequenceofstrings) is what usually works best — simplest and fastest.


Answer 2

Python 3.6 changed the game for string concatenation of known components with Literal String Interpolation.

Given the test case from mkoistinen’s answer, having strings

domain = 'some_really_long_example.com'
lang = 'en'
path = 'some/really/long/path/'

The contenders are

  • f'http://{domain}/{lang}/{path}' – 0.151 µs

  • 'http://%s/%s/%s' % (domain, lang, path) – 0.321 µs

  • 'http://' + domain + '/' + lang + '/' + path – 0.356 µs

  • ''.join(('http://', domain, '/', lang, '/', path)) – 0.249 µs (notice that building a constant-length tuple is slightly faster than building a constant-length list).

Thus currently the shortest and the most beautiful code possible is also fastest.

In alpha versions of Python 3.6 the implementation of f'' strings was the slowest possible – actually the generated byte code is pretty much equivalent to the ''.join() case with unnecessary calls to str.__format__ which without arguments would just return self unchanged. These inefficiencies were addressed before 3.6 final.

The speed can be contrasted with the fastest method for Python 2, which is + concatenation on my computer; and that takes 0.203 µs with 8-bit strings, and 0.259 µs if the strings are all Unicode.
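As a quick sanity check, the four contenders can be confirmed to build the same string before their speed is compared. This sketch needs Python 3.6 or later for the f-string and makes no timing claims:

```python
domain = 'some_really_long_example.com'
lang = 'en'
path = 'some/really/long/path/'

candidates = [
    f'http://{domain}/{lang}/{path}',                     # f-string
    'http://%s/%s/%s' % (domain, lang, path),             # % interpolation
    'http://' + domain + '/' + lang + '/' + path,         # + concatenation
    ''.join(('http://', domain, '/', lang, '/', path)),   # join on a tuple
]

# All four expressions should produce the identical URL.
all_equal = len(set(candidates)) == 1
print(all_equal)  # True
```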


Answer 3

It depends on what you’re doing.

After Python 2.5, string concatenation with the + operator is pretty fast. If you’re just concatenating a couple of values, using the + operator works best:

>>> x = timeit.Timer(stmt="'a' + 'b'")
>>> x.timeit()
0.039999961853027344

>>> x = timeit.Timer(stmt="''.join(['a', 'b'])")
>>> x.timeit()
0.76200008392333984

However, if you’re putting together a string in a loop, you’re better off using the list joining method:

>>> join_stmt = """
... joined_str = ''
... for i in xrange(100000):
...   joined_str += str(i)
... """
>>> x = timeit.Timer(join_stmt)
>>> x.timeit(100)
13.278000116348267

>>> list_stmt = """
... str_list = []
... for i in xrange(100000):
...   str_list.append(str(i))
... ''.join(str_list)
... """
>>> x = timeit.Timer(list_stmt)
>>> x.timeit(100)
12.401000022888184

…but notice that you have to be putting together a relatively high number of strings before the difference becomes noticeable.


Answer 4

As per John Fouhy’s answer, don’t optimize unless you have to, but if you’re here and asking this question, it may be precisely because you have to. In my case, I needed to assemble some URLs from string variables… fast. I noticed no one (so far) seems to be considering the string format method, so I thought I’d try that and, mostly for mild interest, I thought I’d toss the string interpolation operator in there for good measure. To be honest, I didn’t think either of these would stack up to a direct '+' operation or a ''.join(). But guess what? On my Python 2.7.5 system, the string interpolation operator rules them all and string.format() is the worst performer:

# concatenate_test.py

from __future__ import print_function
import timeit

domain = 'some_really_long_example.com'
lang = 'en'
path = 'some/really/long/path/'
iterations = 1000000

def meth_plus():
    '''Using + operator'''
    return 'http://' + domain + '/' + lang + '/' + path

def meth_join():
    '''Using ''.join()'''
    return ''.join(['http://', domain, '/', lang, '/', path])

def meth_form():
    '''Using string.format'''
    return 'http://{0}/{1}/{2}'.format(domain, lang, path)

def meth_intp():
    '''Using string interpolation'''
    return 'http://%s/%s/%s' % (domain, lang, path)

plus = timeit.Timer(stmt="meth_plus()", setup="from __main__ import meth_plus")
join = timeit.Timer(stmt="meth_join()", setup="from __main__ import meth_join")
form = timeit.Timer(stmt="meth_form()", setup="from __main__ import meth_form")
intp = timeit.Timer(stmt="meth_intp()", setup="from __main__ import meth_intp")

plus.val = plus.timeit(iterations)
join.val = join.timeit(iterations)
form.val = form.timeit(iterations)
intp.val = intp.timeit(iterations)

min_val = min([plus.val, join.val, form.val, intp.val])

print('plus %0.12f (%0.2f%% as fast)' % (plus.val, (100 * min_val / plus.val), ))
print('join %0.12f (%0.2f%% as fast)' % (join.val, (100 * min_val / join.val), ))
print('form %0.12f (%0.2f%% as fast)' % (form.val, (100 * min_val / form.val), ))
print('intp %0.12f (%0.2f%% as fast)' % (intp.val, (100 * min_val / intp.val), ))

The results:

# python2.7 concatenate_test.py
plus 0.360787868500 (90.81% as fast)
join 0.452811956406 (72.36% as fast)
form 0.502608060837 (65.19% as fast)
intp 0.327636957169 (100.00% as fast)

If I use a shorter domain and shorter path, interpolation still wins out. The difference is more pronounced, though, with longer strings.

Now that I had a nice test script, I also tested under Python 2.6, 3.3 and 3.4, here’s the results. In Python 2.6, the plus operator is the fastest! On Python 3, join wins out. Note: these tests are very repeatable on my system. So, ‘plus’ is always faster on 2.6, ‘intp’ is always faster on 2.7 and ‘join’ is always faster on Python 3.x.

# python2.6 concatenate_test.py
plus 0.338213920593 (100.00% as fast)
join 0.427221059799 (79.17% as fast)
form 0.515371084213 (65.63% as fast)
intp 0.378169059753 (89.43% as fast)

# python3.3 concatenate_test.py
plus 0.409130576998 (89.20% as fast)
join 0.364938726001 (100.00% as fast)
form 0.621366866995 (58.73% as fast)
intp 0.419064424001 (87.08% as fast)

# python3.4 concatenate_test.py
plus 0.481188605998 (85.14% as fast)
join 0.409673971997 (100.00% as fast)
form 0.652010936996 (62.83% as fast)
intp 0.460400978001 (88.98% as fast)

# python3.5 concatenate_test.py
plus 0.417167026084 (93.47% as fast)
join 0.389929617057 (100.00% as fast)
form 0.595661019906 (65.46% as fast)
intp 0.404455224983 (96.41% as fast)

Lesson learned:

  • Sometimes, my assumptions are dead wrong.
  • Test against the system environment you’ll be running in production.
  • String interpolation isn’t dead yet!

tl;dr:

  • If you’re using 2.6, use the + operator.
  • If you’re using 2.7, use the '%' operator.
  • If you’re using 3.x, use ''.join().

Answer 5

It pretty much depends on the relative sizes of the new string after every new concatenation. With the + operator, a new string is made for every concatenation. If the intermediary strings are relatively long, + becomes increasingly slow because each new intermediary string has to be stored.

Consider this case:

from time import time
stri=''
a='aagsdfghfhdyjddtyjdhmfghmfgsdgsdfgsdfsdfsdfsdfsdfsdfddsksarigqeirnvgsdfsdgfsdfgfg'
l=[]
#case 1
t=time()
for i in range(1000):
    stri=stri+a+repr(i)
print time()-t

#case 2
t=time()
for i in xrange(1000):
    l.append(a+repr(i))
z=''.join(l)
print time()-t

#case 3
t=time()
for i in range(1000):
    stri=stri+repr(i)
print time()-t

#case 4
t=time()
for i in xrange(1000):
    l.append(repr(i))
z=''.join(l)
print time()-t

Results

1 0.00493192672729

2 0.000509023666382

3 0.00042200088501

4 0.000482797622681

In cases 1 and 2, we add a large string, and join() performs about 10 times faster. In cases 3 and 4, we add a small string, and '+' performs slightly faster.


Answer 6

I ran into a situation where I needed to have an appendable string of unknown size. These are the benchmark results (python 2.7.3):

$ python -m timeit -s 's=""' 's+="a"'
10000000 loops, best of 3: 0.176 usec per loop
$ python -m timeit -s 's=[]' 's.append("a")'
10000000 loops, best of 3: 0.196 usec per loop
$ python -m timeit -s 's=""' 's="".join((s,"a"))'
100000 loops, best of 3: 16.9 usec per loop
$ python -m timeit -s 's=""' 's="%s%s"%(s,"a")'
100000 loops, best of 3: 19.4 usec per loop

This seems to show that ‘+=’ is the fastest. The results from the skymind link are a bit out of date.

(I realize that the second example is not complete: the final list would still need to be joined. It does show, however, that simply preparing the list takes longer than the string concatenation.)
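To make the list case complete, the final join can be included in the measured work. The sketch below does that, and also adds io.StringIO as a third option for an appendable string of unknown size. StringIO is not part of the benchmark above, and the function names and sizes are assumptions of mine.

```python
import io
import timeit


def build_with_plus_equals(n=10000):
    """Append one character at a time with +=."""
    s = ''
    for _ in range(n):
        s += 'a'
    return s


def build_with_list(n=10000):
    """Append to a list and pay for the final join as well."""
    parts = []
    for _ in range(n):
        parts.append('a')
    return ''.join(parts)


def build_with_stringio(n=10000):
    """Write into an in-memory text buffer and read it back once."""
    buf = io.StringIO()
    for _ in range(n):
        buf.write('a')
    return buf.getvalue()


if __name__ == '__main__':
    for func in (build_with_plus_equals, build_with_list, build_with_stringio):
        print(func.__name__, timeit.timeit(func, number=100))
```

Which variant wins depends on the interpreter and on how many pieces are appended, so it is worth measuring with realistic sizes.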


Answer 7

One year later, let's test mkoistinen's answer with Python 3.4.3:

  • plus 0.963564149000 (95.83% as fast)
  • join 0.923408469000 (100.00% as fast)
  • form 1.501130934000 (61.51% as fast)
  • intp 1.019677452000 (90.56% as fast)

Nothing changed. Join is still the fastest method. Since intp is arguably the best choice in terms of readability, you might want to use intp nevertheless.


Answer 8

Inspired by @JasonBaker's benchmarks, here's a simple one comparing 10 "abcdefghijklmnopqrstuvxyz" strings. It shows that .join() is faster, even with this small number of operands:

Concatenation

>>> import timeit
>>> x = timeit.Timer(stmt='"abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz"')
>>> x.timeit()
0.9828147209324385

Join

>>> x = timeit.Timer(stmt='"".join(["abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz"])')
>>> x.timeit()
0.6114138159765048

Answer 9

For a small set of short strings (i.e. 2 or 3 strings of no more than a few characters), plus is still way faster. Using mkoistinen’s wonderful script in Python 2 and 3:

plus 2.679107467004 (100.00% as fast)
join 3.653773699996 (73.32% as fast)
form 6.594011374000 (40.63% as fast)
intp 4.568015249999 (58.65% as fast)

So when your code is doing a huge number of separate small concatenations, plus is the preferred way if speed is crucial.
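For reference, a comparison of this kind could look roughly like the sketch below. It is not mkoistinen's script; the labels plus, join, form and intp are taken from the output above, while the three short input strings and the loop count are hypothetical.

```python
import timeit

# Hypothetical short inputs; the original script's data is not shown here.
a, b, c = 'foo', 'bar', 'baz'


def plus():
    return a + b + c


def join():
    return ''.join((a, b, c))


def form():
    return '{}{}{}'.format(a, b, c)


def intp():
    return '%s%s%s' % (a, b, c)


if __name__ == '__main__':
    timings = {f.__name__: timeit.timeit(f, number=1000000)
               for f in (plus, join, form, intp)}
    fastest = min(timings.values())
    for name, seconds in timings.items():
        # Report each method relative to the fastest one.
        print(f'{name} {seconds:.6f} ({fastest / seconds:.2%} as fast)')
```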


Answer 10

The f-strings introduced in Python 3.6 are probably the most efficient way of concatenating strings.

Using %s

>>> import timeit
>>> timeit.timeit("""name = "Some"
... age = 100
... '%s is %s.' % (name, age)""", number = 10000)
0.0029734770068898797

Using .format

>>> timeit.timeit("""name = "Some"
... age = 100
... '{} is {}.'.format(name, age)""", number = 10000)
0.004015227983472869

Using f

>>> timeit.timeit("""name = "Some"
... age = 100
... f'{name} is {age}.'""", number = 10000)
0.0019175919878762215

Source: https://realpython.com/python-f-strings/


Why does `if None.__eq__("a")` seem to evaluate to True (but not quite)?

Question: Why does `if None.__eq__("a")` seem to evaluate to True (but not quite)?

If you execute the following statement in Python 3.7, it will (from my testing) print b:

if None.__eq__("a"):
    print("b")

However, None.__eq__("a") evaluates to NotImplemented.

Naturally, "a".__eq__("a") evaluates to True, and "b".__eq__("a") evaluates to False.

I initially discovered this when testing the return value of a function, but didn’t return anything in the second case — so, the function returned None.

What’s going on here?


Answer 0

This is a great example of why the __dunder__ methods should not be used directly: they are quite often not appropriate replacements for their equivalent operators. You should use the == operator for equality comparisons or, in this special case of checking for None, use is (skip to the bottom of the answer for more information).

You’ve done

None.__eq__('a')
# NotImplemented

Which returns NotImplemented since the types being compared are different. Consider another example where two objects with different types are being compared in this fashion, such as 1 and 'a'. Doing (1).__eq__('a') is also not correct, and will return NotImplemented. The right way to compare these two values for equality would be

1 == 'a'
# False

What happens here is

  1. First, (1).__eq__('a') is tried, which returns NotImplemented. This indicates that the operation is not supported, so
  2. 'a'.__eq__(1) is called, which also returns the same NotImplemented. So,
  3. The objects are treated as if they are not the same, and False is returned.

Here’s a nice little MCVE using some custom classes to illustrate how this happens:

class A:
    def __eq__(self, other):
        print('A.__eq__')
        return NotImplemented

class B:
    def __eq__(self, other):
        print('B.__eq__')
        return NotImplemented

class C:
    def __eq__(self, other):
        print('C.__eq__')
        return True

a = A()
b = B()
c = C()

print(a == b)
# A.__eq__
# B.__eq__
# False

print(a == c)
# A.__eq__
# C.__eq__
# True

print(c == a)
# C.__eq__
# True

Of course, that doesn’t explain why the operation returns true. This is because NotImplemented is actually a truthy value:

bool(None.__eq__("a"))
# True

Same as,

bool(NotImplemented)
# True

For more information on what values are considered truthy and falsy, see the docs section on Truth Value Testing, as well as this answer. It is worth noting here that NotImplemented is truthy, but it would have been a different story had the class defined a __bool__ or __len__ method that returned False or 0 respectively.
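A small sketch of that last point: a class becomes falsy once it defines __bool__ returning False or __len__ returning 0, while a class that defines neither is truthy by default. The class names are made up for illustration. The last line checks for NotImplemented by identity rather than by truth value, partly because, as far as I know, Python 3.9 and later emit a DeprecationWarning when NotImplemented is used in a boolean context.

```python
class AlwaysFalse:
    def __bool__(self):
        return False


class Empty:
    def __len__(self):
        return 0


class Plain:
    pass  # defines neither __bool__ nor __len__


print(bool(AlwaysFalse()))  # False
print(bool(Empty()))        # False
print(bool(Plain()))        # True

# Identity check instead of truth testing the result.
print(None.__eq__("a") is NotImplemented)  # True
```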


If you want the functional equivalent of the == operator, use operator.eq:

import operator
operator.eq(1, 'a')
# False

However, as mentioned earlier, for this specific scenario, where you are checking for None, use is:

var = 'a'
var is None
# False

var2 = None
var2 is None
# True

The functional equivalent of this is using operator.is_:

operator.is_(var2, None)
# True

None is a special object, and only one instance of it exists in memory at any point in time. In other words, it is the sole singleton of the NoneType class (but the same object may have any number of references). The PEP8 guidelines make this explicit:

Comparisons to singletons like None should always be done with is or is not, never the equality operators.

In summary, for singletons like None, a reference check with is is more appropriate, although both == and is will work just fine.
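One case where the two checks differ is an object whose __eq__ claims equality with everything. The class below is hypothetical and exists only to show that == can be overridden, whereas is cannot.

```python
class AlwaysEqual:
    def __eq__(self, other):
        return True  # claims to be equal to anything, including None


obj = AlwaysEqual()

print(obj == None)  # True  (obj.__eq__ is consulted)
print(None == obj)  # True  (None.__eq__ gives NotImplemented, then obj.__eq__ is tried)
print(obj is None)  # False (identity cannot be overridden)
```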


Answer 1

The result you are seeing is caused by the fact that

None.__eq__("a") # evaluates to NotImplemented

returns NotImplemented, and the truth value of NotImplemented is documented to be true:

https://docs.python.org/3/library/constants.html

Special value which should be returned by the binary special methods (e.g. __eq__(), __lt__(), __add__(), __rsub__(), etc.) to indicate that the operation is not implemented with respect to the other type; may be returned by the in-place binary special methods (e.g. __imul__(), __iand__(), etc.) for the same purpose. Its truth value is true.

If you call the __eq__() method manually rather than just using ==, you need to be prepared to deal with the possibility that it may return NotImplemented, and that its truth value is true.
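If a manual call really is needed, the NotImplemented case can be handled explicitly. The helper below is a hypothetical sketch, not how the interpreter implements ==: it tries the left operand, then the reflected call on the right operand, and finally falls back to identity. It leaves out the rule that gives a subclass on the right-hand side priority.

```python
def manual_eq(left, right):
    """Rough approximation of left == right using explicit __eq__ calls."""
    result = type(left).__eq__(left, right)
    if result is NotImplemented:
        # The left operand does not know how to compare; ask the right one.
        result = type(right).__eq__(right, left)
    if result is NotImplemented:
        # Neither side supports the comparison; fall back to identity.
        return left is right
    return bool(result)


print(manual_eq(None, "a"))  # False
print(manual_eq("a", "a"))   # True
print(manual_eq(1, 1.0))     # True
```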


Answer 2

As you already figured out, None.__eq__("a") evaluates to NotImplemented. However, if you try something like

if NotImplemented:
    print("Yes")
else:
    print("No")

the result is

Yes

This means that the truth value of NotImplemented is true.

Therefore the outcome in the question is clear:

None.__eq__(something) yields NotImplemented

And bool(NotImplemented) evaluates to True

So the condition in if None.__eq__("a") is always true


Answer 3

Why?

It returns a NotImplemented, yeah:

>>> None.__eq__('a')
NotImplemented
>>> 

But if you look at this:

>>> bool(NotImplemented)
True
>>> 

NotImplemented is actually a truthy value, which is why b is printed: anything truthy passes the if test, and anything falsy does not.

How to solve it?

You have to check explicitly whether the result is True, so be more suspicious. As you can see:

>>> NotImplemented == True
False
>>> 

So you would do:

>>> if None.__eq__('a') == True:
...     print('b')
...
>>> 

As you can see, nothing is printed.