Tag archive: lowercase

How do I lowercase a string in Python?

Question: How do I lowercase a string in Python?

Is there a way to convert a string from uppercase, or even partly uppercase, to lowercase?

For example, “Kilometers” → “kilometers”.


Answer 0

Use .lower(). For example:

s = "Kilometer"
print(s.lower())

The official 2.x documentation is here: str.lower()
The official 3.x documentation is here: str.lower()
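As a small illustration of the call above (a sketch for Python 3; the sample strings are made up for this example), .lower() handles fully and partly uppercase input alike and leaves digits and punctuation untouched:

```python
# Sample inputs chosen for illustration only.
samples = ["KILOMETERS", "KiloMeters", "kilometers", "42 KM/H"]

# str.lower() returns a new string with all cased characters lowercased.
lowered = [s.lower() for s in samples]

print(lowered)
```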


Answer 1

How to convert string to lowercase in Python?

Is there any way to convert an entire user inputted string from uppercase, or even part uppercase to lowercase?

E.g. Kilometers –> kilometers

The canonical Pythonic way of doing this is

>>> 'Kilometers'.lower()
'kilometers'

However, if the purpose is to do case insensitive matching, you should use case-folding:

>>> 'Kilometers'.casefold()
'kilometers'

Here’s why:

>>> "Maße".casefold()
'masse'
>>> "Maße".lower()
'maße'
>>> "MASSE" == "Maße"
False
>>> "MASSE".lower() == "Maße".lower()
False
>>> "MASSE".casefold() == "Maße".casefold()
True

casefold is a str method in Python 3; in Python 2, you'll want to look at PyICU or py2casefold. Several answers address this here.
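For case-insensitive matching in Python 3, casefold is often combined with Unicode normalization, so that composed and decomposed forms of the same letter also compare equal. The following is a minimal sketch along the lines of the pattern shown in the Python Unicode HOWTO; the function name compare_caseless is only illustrative.

```python
import unicodedata


def compare_caseless(s1: str, s2: str) -> bool:
    """Return True if s1 and s2 match when case is ignored."""

    def nfd(s: str) -> str:
        # Decompose characters so that 'é' and 'e' + combining accent agree.
        return unicodedata.normalize("NFD", s)

    # Normalize, casefold, then normalize again, because casefolding
    # can produce strings that are no longer in normalized form.
    return nfd(nfd(s1).casefold()) == nfd(nfd(s2).casefold())


print(compare_caseless("MASSE", "Maße"))
print(compare_caseless("\u00e9", "E\u0301"))
```

For plain ASCII input, comparing the results of .lower() or .casefold() directly gives the same answer; the normalization step only matters for text that may contain combining characters.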

Unicode in Python 3

Python 3 handles plain string literals as unicode:

>>> string = 'Километр'
>>> string
'Километр'
>>> string.lower()
'километр'

In Python 2, plain string literals are bytes

In Python 2, pasting the following into a shell encodes the literal as a string of bytes, using utf-8.

On such a byte string, lower only maps the ASCII letters it recognizes; it has no idea that these bytes encode Cyrillic letters, so we get the same string back.

>>> string = 'Километр'
>>> string
'\xd0\x9a\xd0\xb8\xd0\xbb\xd0\xbe\xd0\xbc\xd0\xb5\xd1\x82\xd1\x80'
>>> string.lower()
'\xd0\x9a\xd0\xb8\xd0\xbb\xd0\xbe\xd0\xbc\xd0\xb5\xd1\x82\xd1\x80'
>>> print string.lower()
Километр

In scripts, Python objects to non-ASCII bytes in a string literal when no source encoding is declared (an error as of Python 2.5, a warning in Python 2.4), since the intended encoding would be ambiguous. For more on that, see the Unicode how-to in the docs and PEP 263.
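The same effect can be reproduced in Python 3 with the bytes type, which plays the role of the Python 2 str here. This is a sketch for illustration, not part of the original answer: bytes.lower() only touches ASCII letters, so UTF-8 encoded Cyrillic comes back unchanged until it is decoded to text.

```python
# Encode the text to UTF-8 bytes, similar to a Python 2 str literal.
data = "Километр".encode("utf-8")

# bytes.lower() maps only ASCII letters; the Cyrillic bytes stay as they are.
print(data.lower() == data)

# ASCII letters inside the same byte string are still lowercased.
mixed = "Km Километр".encode("utf-8")
print(mixed.lower().decode("utf-8"))

# Decoding to str first gives a proper Unicode-aware lowercase.
print(data.decode("utf-8").lower())
```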

Use Unicode literals, not str literals

So we need a unicode string to handle this conversion, which is easily done with a unicode string literal, disambiguated by a u prefix (note that the u prefix also works in Python 3, from version 3.3 on):

>>> unicode_literal = u'Километр'
>>> print(unicode_literal.lower())
километр

Note that this representation is completely different from the str bytes: each letter is shown as the escape '\u' followed by four hex digits, the 16-bit code point of that Unicode letter:

>>> unicode_literal
u'\u041a\u0438\u043b\u043e\u043c\u0435\u0442\u0440'
>>> unicode_literal.lower()
u'\u043a\u0438\u043b\u043e\u043c\u0435\u0442\u0440'

Now if we only have it in the form of a str, we need to convert it to unicode. Python's unicode type holds text as Unicode code points rather than as bytes in one particular encoding, which has many advantages over working with encoded bytes. We can use either the unicode constructor or the str.decode method, naming the codec, to convert the str to unicode:

>>> unicode_from_string = unicode(string, 'utf-8') # decode the byte string into unicode
>>> print(unicode_from_string.lower())
километр
>>> string_to_unicode = string.decode('utf-8')
>>> print(string_to_unicode.lower())
километр
>>> unicode_from_string == string_to_unicode == unicode_literal
True

Both methods produce the unicode type, and the result is equal to unicode_literal.

Best practice: use Unicode

It is recommended that you always work with text in Unicode.

Software should only work with Unicode strings internally, converting to a particular encoding on output.

Can encode back when necessary

However, to get the lowercase result back as type str, encode the unicode string to utf-8 again:

>>> print string
Километр
>>> string
'\xd0\x9a\xd0\xb8\xd0\xbb\xd0\xbe\xd0\xbc\xd0\xb5\xd1\x82\xd1\x80'
>>> string.decode('utf-8')
u'\u041a\u0438\u043b\u043e\u043c\u0435\u0442\u0440'
>>> string.decode('utf-8').lower()
u'\u043a\u0438\u043b\u043e\u043c\u0435\u0442\u0440'
>>> string.decode('utf-8').lower().encode('utf-8')
'\xd0\xba\xd0\xb8\xd0\xbb\xd0\xbe\xd0\xbc\xd0\xb5\xd1\x82\xd1\x80'
>>> print string.decode('utf-8').lower().encode('utf-8')
километр

So in Python 2, unicode objects can be encoded into Python strings (str), and Python strings can be decoded into the unicode type.
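For comparison, here is the same decode, lowercase, encode round trip written for Python 3, where the byte string is a bytes object and the decoded text is a str. It is a sketch using the byte values shown in the session above.

```python
# UTF-8 bytes for 'Километр', as displayed in the Python 2 session.
raw = b'\xd0\x9a\xd0\xb8\xd0\xbb\xd0\xbe\xd0\xbc\xd0\xb5\xd1\x82\xd1\x80'

text = raw.decode('utf-8')     # bytes -> str (Unicode text)
lowered = text.lower()         # Unicode-aware lowercasing
out = lowered.encode('utf-8')  # str -> bytes again, for output or storage

print(lowered)
print(out)
```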


Answer 2

In Python 2, .lower() on a UTF-8 encoded str doesn't lowercase non-English (non-ASCII) letters. In this case decode('utf-8') can help:

>>> s='Километр'
>>> print s.lower()
Километр
>>> print s.decode('utf-8').lower()
километр

Answer 3

Also, you can store the result in a variable:

s = input('UPPER CASE')
lower = s.lower()

If you use it like this:

s = "Kilometer"
print(s.lower())     # kilometer
print(s)             # Kilometer

the lowercase form exists only in that call: s itself is unchanged, because lower() returns a new string.
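A short sketch of this point (the variable name unchanged is only for illustration): strings are immutable, so calling .lower() never modifies the original. To keep the lowercase version, assign the result back to the name.

```python
s = "Kilometer"

s.lower()        # returns a new string; the result is discarded here
unchanged = s    # s still refers to the original text

s = s.lower()    # rebind the name to keep the lowercase version

print(unchanged)
print(s)
```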


Answer 4

Don't do this; it is not recommended at all:

import string
s='ABCD'
print(''.join([string.ascii_lowercase[string.ascii_uppercase.index(i)] for i in s]))

Output:

abcd

Since no one has mentioned it yet: you can also use swapcase, which turns uppercase letters into lowercase and vice versa. Use it only when you actually want that swap (upper to lower and lower to upper):

s='ABCD'
print(s.swapcase())

Output:

abcd
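The two approaches above only match .lower() when the input is entirely uppercase. The following sketch uses a mixed-case input to show the difference; the helper index_lower is a hypothetical wrapper around the index-based expression shown earlier.

```python
import string

mixed = "AbCd"

swapped = mixed.swapcase()  # flips every letter's case
lowered = mixed.lower()     # lowercases every letter


def index_lower(text):
    # The index-based approach: look up each character among the
    # uppercase ASCII letters and take the lowercase letter at that index.
    return "".join(
        string.ascii_lowercase[string.ascii_uppercase.index(c)] for c in text
    )


try:
    index_lower(mixed)
    index_failed = False
except ValueError:
    # str.index raises ValueError for any character that is not A-Z.
    index_failed = True

print(swapped, lowered, index_failed)
```

For general lowercasing, .lower() remains the straightforward choice; swapcase fits only when inverting the case is the actual goal.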