What is the u prefix in Python strings?

Question: What is the u prefix in Python strings?

Like in:

u'Hello'

My guess is that it indicates “Unicode”. Is that correct?

If so, since when has it been available?


Answer 0

You’re right; see section 3.1.3, “Unicode Strings”, of the Python 2 tutorial.

It’s been the syntax since Python 2.0.

Python 3 made them redundant, as the default string type is Unicode. Versions 3.0 through 3.2 removed them, but they were re-added in 3.3+ for compatibility with Python 2, to aid the 2-to-3 transition.
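
As a quick illustration of that redundancy, here is a minimal sketch assuming Python 3.3 or later. The variable names are only for illustration.

```python
# Python 3.3+: the u prefix is accepted but changes nothing.
plain = 'Hello'
prefixed = u'Hello'

# Both literals produce the built-in str type and compare equal.
same_type = type(prefixed) is str
same_value = plain == prefixed

print(same_type)   # True
print(same_value)  # True
```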


Answer 1

The u in u'Some String' means that your string is a Unicode string.

Q: I’m in a terrible, awful hurry and I landed here from Google Search. I’m trying to write this data to a file, I’m getting an error, and I need the dead simplest, probably flawed, solution this second.

A: You should really read Joel’s essay “The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)”.

Q: sry no time code pls

A: Fine. Try str('Some String') or 'Some String'.encode('ascii', 'ignore'). But you should really read some of the answers and discussion on “Converting a Unicode string” and this excellent, excellent primer on character encoding.
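
To see why the quick fix is "probably flawed", here is a small sketch assuming Python 3. The sample text and the in-memory buffer are made up for illustration. Encoding with 'ignore' silently drops every non-ASCII character, while encoding to UTF-8 before writing keeps the text intact.

```python
import io

text = u'sch\u00f6ne Umlaute'  # "schöne Umlaute", sample text for illustration

# The quick fix: characters that do not fit in ASCII are silently dropped.
ascii_only = text.encode('ascii', 'ignore')

# A lossless alternative: encode to UTF-8 before writing the bytes out.
buffer = io.BytesIO()  # stands in for a file opened in binary mode
buffer.write(text.encode('utf-8'))
round_tripped = buffer.getvalue().decode('utf-8')

print(ascii_only)             # b'schne Umlaute'
print(round_tripped == text)  # True
```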


Answer 2

My guess is that it indicates “Unicode”. Is that correct?

Yes.

If so, since when has it been available?

Python 2.x.

In Python 3.x, strings are Unicode by default, so there is no need for the u prefix. Note: in Python 3.0-3.2, the u prefix is a syntax error. In Python 3.3+ it is legal again, to make it easier to write 2/3-compatible apps.


Answer 3

I came here because I had funny-char-syndrome on my requests output. I thought response.text would give me a properly decoded string, but in the output I found funny double-chars where German umlauts should have been.

It turns out response.encoding was somehow empty, so response did not know how to decode the content properly and just treated it as ASCII (I guess).

My solution was to get the raw bytes with response.content and manually apply decode('utf_8') to them. The result was schöne Umlaute.

The correctly decoded

für

vs. the improperly decoded

fĂźr
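
The effect can be reproduced without any HTTP request. This is a sketch under assumptions: the bytes below merely stand in for response.content, and ISO-8859-2 is a guess at the wrong codec, chosen because it happens to reproduce the garbled output shown above.

```python
# UTF-8 bytes for "für"; a stand-in for response.content, no request is made.
raw = u'f\u00fcr'.encode('utf-8')  # b'f\xc3\xbcr'

# Decoding with the codec the bytes were written in recovers the text.
correct = raw.decode('utf_8')

# Decoding the same bytes with a single-byte codec (ISO-8859-2 is an
# assumption here) turns the two-byte umlaut into two separate characters.
garbled = raw.decode('iso-8859-2')

print(len(correct))  # 3
print(len(garbled))  # 4
```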


Answer 4

All strings meant for humans should use u"".

I found that the following mindset helps a lot when dealing with Python strings: All Python manifest strings should use the u"" syntax. The "" syntax is for byte arrays, only.

Before the bashing begins, let me explain. Most Python programs start out using "" for strings. But then they need to support documentation off the Internet, so they start using "".decode and all of a sudden they are getting exceptions everywhere about decoding this and that – all because of the use of "" for strings. In this case, Unicode does act like a virus and will wreak havoc.

But, if you follow my rule, you won’t have this infection (because you will already be infected).
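
The rule above describes Python 2, where "" is a byte string. As a rough sketch of the same separation in Python 3 (an assumption about how the rule carries over, not something the answer states), text lives in str and bytes appear only at the boundary. The names label and payload are illustrative.

```python
# Text meant for humans: a Unicode string (the u prefix is optional in 3.3+).
label = u"Gr\u00fc\u00dfe"  # "Grüße"

# Bytes exist only at the boundary, for example just before writing to a socket.
payload = label.encode("utf-8")

print(isinstance(label, str))      # True
print(isinstance(payload, bytes))  # True
print(len(label), len(payload))    # 5 7

# Decoding the bytes gives back the original text.
restored = payload.decode("utf-8")
print(restored == label)           # True
```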


Answer 5

It’s Unicode.

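Just wrap the value in str(), and it will work fine as long as the text is ASCII-only.

But in case you have two lists like the following:

a = ['co32','co36']
b = [u'co32',u'co36']

If you check set(a)==set(b), it already comes out as True for ASCII-only values like these, because equal ASCII str and unicode strings compare and hash the same in Python 2. If you still want plain str items, convert each element rather than the whole list, since str(b) would turn the list into a single string:

b = [str(s) for s in b]
set(a)==set(b)

The result is still True.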
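
A short sketch of the difference between calling str() on the whole list and on each element, assuming Python 3. The names as_one_string and per_element are illustrative.

```python
a = ['co32', 'co36']
b = [u'co32', u'co36']

# str() on the list yields one string (the list's repr), so set() of it
# is a set of single characters, not a set of the two items.
as_one_string = str(b)

# Converting element by element keeps a list of two strings.
per_element = [str(item) for item in b]

print(set(a) == set(per_element))    # True
print(set(a) == set(as_one_string))  # False
```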