How to make a unicode string with Python 3

Question: How to make a unicode string with Python 3

I used this:

u = unicode(text, 'utf-8')

But I get an error with Python 3 (or maybe I just forgot to include something):

NameError: global name 'unicode' is not defined

Thank you.


Answer 0

String literals are unicode by default in Python 3.

Assuming that text is a bytes object, just use text.decode('utf-8').

The unicode type of Python 2 is equivalent to str in Python 3, so you can also write:

str(text, 'utf-8')

if you prefer.
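To make the two spellings concrete, here is a minimal sketch. It assumes text holds UTF-8 encoded bytes; the sample value is made up for illustration.

```python
# Assumption: `text` is a bytes object containing UTF-8 data
# (for example, something read from a socket or a file opened in binary mode).
text = b'caf\xc3\xa9'

# Both calls decode the bytes into a str (the Python 3 unicode type).
decoded = text.decode('utf-8')
also_decoded = str(text, 'utf-8')

print(decoded)
print(type(decoded))
print(decoded == also_decoded)
```

If text is already a str, neither call is needed, and text.decode does not exist on str in Python 3.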


Answer 1

"What's New in Python 3.0" says:

All text is Unicode; however encoded Unicode is represented as binary data

If you want to ensure you are working with UTF-8, here's an example from this page on Unicode in Python 3.0:

b'\x80abc'.decode("utf-8", "strict")

With the "strict" error handler, this call raises UnicodeDecodeError, because 0x80 is not a valid UTF-8 start byte.
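As a rough illustration of what the error-handler argument does, the sketch below decodes the same invalid bytes with three of the standard handlers. The variable names are only for this example.

```python
data = b'\x80abc'  # 0x80 cannot start a UTF-8 sequence

strict_error = None
try:
    # "strict" (the default) refuses invalid input.
    data.decode('utf-8', 'strict')
except UnicodeDecodeError as exc:
    strict_error = exc
    print('strict failed:', exc.reason)

# "replace" substitutes U+FFFD for each undecodable byte.
replaced = data.decode('utf-8', 'replace')

# "ignore" silently drops undecodable bytes.
ignored = data.decode('utf-8', 'ignore')

print(repr(replaced))
print(repr(ignored))
```

Using "strict" is usually the safer choice when you want to find out that the input is not valid UTF-8 instead of silently losing data.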

Answer 2

As a workaround, I've been using this:

# Compatibility shim for code written for Python 2.x:
# Python 3 has no built-in `unicode`, so fall back to `str` there.
try:
    UNICODE_EXISTS = bool(type(unicode))
except NameError:
    # Aliasing to str also keeps the two-argument form unicode(data, 'utf-8') working.
    unicode = str
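One thing to watch with a shim like this: calling str() on a bytes object with no encoding returns its repr (for example "b'abc'") instead of decoding it. A small helper that handles both cases might look like the sketch below; to_text is a hypothetical name, not something from the standard library, and UTF-8 is an assumed default.

```python
def to_text(value, encoding='utf-8'):
    """Return `value` as a str, decoding bytes with `encoding` (hypothetical helper)."""
    if isinstance(value, (bytes, bytearray)):
        # Decode instead of calling str(), which would give "b'...'".
        return bytes(value).decode(encoding)
    # str passes through unchanged; other objects use their normal str() form.
    return str(value)


print(to_text(b'caf\xc3\xa9'))
print(to_text('already text'))
print(to_text(42))
```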

Answer 3

This is how I solved my problem of converting escape sequences like \uFE0F and \u000A, as well as emoji written as UTF-16 surrogate pairs.

import codecs

example = 'raw vegan chocolate cocoa pie w chocolate & vanilla cream\\uD83D\\uDE0D\\uD83D\\uDE0D\\u2764\\uFE0F Present Moment Caf\\u00E8 in St.Augustine\\u2764\\uFE0F\\u2764\\uFE0F '
# Step 1: turn the literal \uXXXX escapes into characters. The emoji are still two separate surrogates each.
new_str = codecs.unicode_escape_decode(example)[0]
# Use repr() here: printing lone surrogates directly can fail on a UTF-8 terminal.
print(repr(new_str))
# 'raw vegan chocolate cocoa pie w chocolate & vanilla cream\ud83d\ude0d\ud83d\ude0d❤️ Present Moment Cafè in St.Augustine❤️❤️ '
# Step 2: round-trip through UTF-16 with 'surrogatepass' so each surrogate pair becomes one code point.
new_new_str = new_str.encode('utf-16', 'surrogatepass').decode('utf-16')
print(new_new_str)
# raw vegan chocolate cocoa pie w chocolate & vanilla cream😍😍❤️ Present Moment Cafè in St.Augustine❤️❤️ 
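The two steps above could be wrapped in a small function, roughly like this. It is only a sketch: decode_escapes is a hypothetical name, and it assumes the input contains nothing but ASCII characters plus backslash escapes, since the unicode_escape codec treats raw bytes as Latin-1.

```python
def decode_escapes(s):
    """Turn literal \\uXXXX escapes into characters, merging surrogate pairs.

    Hypothetical helper; assumes `s` is pure ASCII apart from the escapes.
    """
    # Step 1: interpret the backslash escapes. Surrogates stay as two code units.
    unescaped = s.encode('ascii').decode('unicode_escape')
    # Step 2: round-trip through UTF-16 so each surrogate pair becomes one code point.
    return unescaped.encode('utf-16', 'surrogatepass').decode('utf-16')


result = decode_escapes('Caf\\u00E8 \\uD83D\\uDE0D')
print(result)
print(len(result))
```

The encode('ascii') call fails loudly on non-ASCII input, which seems preferable to silently producing mojibake.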

Answer 4

In a Python 2 program that I used for many years there was this line:

ocd[i].namn=unicode(a[:b], 'utf-8')

This did not work in Python 3.

However, the program turned out to work with:

ocd[i].namn=a[:b]

I don't remember why I put unicode there in the first place, but I think it was because the name can contain the Swedish letters åäöÅÄÖ. But even they work without "unicode".
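A likely reason this works, assuming a is already a str in the Python 3 version (for example, read from a file opened in text mode): slicing a str operates on characters, not bytes, so non-ASCII letters survive intact. The sample data below is invented for illustration.

```python
# Assumption: the record is already a str, as it would be when read in text mode.
name_field = 'Åsa Öberg, Malmö'

# Slicing counts characters, so Å and Ö each take one position.
name = name_field[:9]

print(name)
print(len(name))                  # number of characters
print(len(name.encode('utf-8')))  # number of bytes once encoded as UTF-8
```

If a were bytes instead (file opened in binary mode), the slice would cut by byte offset and a.decode('utf-8') would still be needed.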


Answer 5

The easiest way in Python 3.x:

text = "hi , I'm text"
# text is already a unicode str; encode() returns its UTF-8 bytes.
text.encode('utf-8')
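Note the direction here: encode goes from str to bytes, and decode goes back. A short round trip, as a sketch using the same sample string:

```python
text = "hi , I'm text"

# str -> bytes
encoded = text.encode('utf-8')
print(type(encoded))

# bytes -> str
round_tripped = encoded.decode('utf-8')
print(type(round_tripped))
print(round_tripped == text)
```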