Question: UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)




I’m having problems dealing with unicode characters from text fetched from different web pages (on different sites). I am using BeautifulSoup.

The problem is that the error is not always reproducible; it sometimes works with some pages, and sometimes, it barfs by throwing a UnicodeEncodeError. I have tried just about everything I can think of, and yet I have not found anything that works consistently without throwing some kind of Unicode-related error.

One of the sections of code that is causing problems is shown below:

agent_telno = agent.find('div', 'agent_contact_number')
agent_telno = '' if agent_telno is None else agent_telno.contents[0]
p.agent_info = str(agent_contact + ' ' + agent_telno).strip()

Here is a stack trace produced on SOME strings when the snippet above is run:

Traceback (most recent call last):
  File "foobar.py", line 792, in <module>
    p.agent_info = str(agent_contact + ' ' + agent_telno).strip()
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)

I suspect that this is because some pages (or more specifically, pages from some of the sites) may be encoded, whilst others may be unencoded. All the sites are based in the UK and provide data meant for UK consumption – so there are no issues relating to internationalization or dealing with text written in anything other than English.

Does anyone have any ideas as to how to solve this so that I can CONSISTENTLY fix this problem?

Answer 0

You need to read the Python Unicode HOWTO. This error is the very first example.

Basically, stop using str to convert from unicode to encoded text / bytes.

Instead, properly use .encode() to encode the string:

p.agent_info = u' '.join((agent_contact, agent_telno)).encode('utf-8').strip()

or work entirely in unicode.
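
As a rough Python 3 sketch of what “work entirely in unicode” could look like (the snippet above is Python 2): keep the scraped values as text and encode only at the output boundary. The helper name build_agent_info and the sample values are hypothetical stand-ins for whatever BeautifulSoup returns.

```python
def build_agent_info(agent_contact, agent_telno):
    """Join two text values and strip surrounding whitespace, staying in text."""
    return " ".join((agent_contact, agent_telno)).strip()


# Hypothetical scraped values; \xa0 is the non-breaking space from the traceback.
agent_contact = "Jane Doe"
agent_telno = "020\xa07946\xa00000"

agent_info = build_agent_info(agent_contact, agent_telno)

# Encode only when bytes are really needed (file, socket, database driver).
agent_info_bytes = agent_info.encode("utf-8")
```

The non-breaking spaces survive as text and only become the two-byte UTF-8 sequence at the final encode step.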

Answer 1

This is a classic python unicode pain point! Consider the following:

a = u'bats\u00E0'
print a
 => batsà

All good so far, but if we call str(a), let’s see what happens:

str(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe0' in position 4: ordinal not in range(128)

Oh dip, that’s not gonna do anyone any good! To fix the error, encode the string explicitly with .encode and tell Python what codec to use:

a.encode('utf-8')
 => 'bats\xc3\xa0'
print a.encode('utf-8')
 => batsà

Voil\u00E0!

The issue is that when you call str(), Python uses the default character encoding (ASCII in Python 2) to try to encode the unicode string you gave it, which in your case sometimes contains non-ASCII characters. To fix the problem, you have to tell Python how to deal with the string by using .encode('whatever_encoding'). Most of the time, you should be fine using utf-8.
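
For readers on Python 3, here is a small sketch of the same failure and fix. It assumes Python 3, where str(a) no longer encodes anything, so the ASCII step is written out explicitly to reproduce the error.

```python
a = "bats\u00e0"

# Encoding to ASCII fails for the same reason str(a) fails in Python 2:
# the character at index 4 has no ASCII representation.
try:
    a.encode("ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc

# Naming a codec that can represent the character succeeds.
utf8_bytes = a.encode("utf-8")
round_tripped = utf8_bytes.decode("utf-8")
```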

For an excellent exposition on this topic, see Ned Batchelder’s PyCon talk here: http://nedbatchelder.com/text/unipain.html

Answer 2


I found an elegant workaround that removes the symbols and keeps the string as a string:

yourstring = yourstring.encode('ascii', 'ignore').decode('ascii')

It’s important to notice that using the ignore option is dangerous because it silently drops any unicode (and internationalization) support from the code that uses it, as seen here:

>>> u'City: Malmö'.encode('ascii', 'ignore').decode('ascii')
'City: Malm'
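
If dropping characters silently is too lossy, the other standard error handlers at least leave a visible trace. A short Python 3 sketch comparing them on the same input (the variable names are only for illustration):

```python
text = "City: Malm\u00f6"

# 'ignore' drops the character without a trace.
ignored = text.encode("ascii", "ignore").decode("ascii")

# 'replace' substitutes a question mark.
replaced = text.encode("ascii", "replace").decode("ascii")

# 'backslashreplace' keeps a reversible escape sequence.
escaped = text.encode("ascii", "backslashreplace").decode("ascii")

# 'xmlcharrefreplace' emits a numeric character reference, handy for HTML.
xml_refs = text.encode("ascii", "xmlcharrefreplace").decode("ascii")
```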

Answer 3

Well, I tried everything but it did not help. After googling around I figured out the following, and it helped. Python 2.7 is in use.

# encoding=utf8
import sys

Answer 4

A subtle problem causing even print to fail is having your environment variables set wrong, e.g. here LC_ALL is set to “C”. In Debian they discourage setting it: Debian wiki on Locale

$ echo $LANG
$ echo $LC_ALL 
$ python -c "print (u'voil\u00e0')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe0' in position 4: ordinal not in range(128)
$ export LC_ALL='en_US.utf8'
$ python -c "print (u'voil\u00e0')"
$ unset LC_ALL
$ python -c "print (u'voil\u00e0')"
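
To see what the interpreter actually picked up from the environment, something like the following Python 3 sketch may help. The function name describe_encoding_environment is made up for this example; the calls it uses are from the standard library.

```python
import locale
import os
import sys


def describe_encoding_environment():
    """Collect the settings that usually decide how print() encodes text."""
    return {
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "preferred_encoding": locale.getpreferredencoding(False),
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
    }


report = describe_encoding_environment()
for name, value in sorted(report.items()):
    print("{0}: {1}".format(name, value))
```

If stdout_encoding comes back as an ASCII variant, printing non-ASCII text is likely to fail until the locale variables are fixed.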

Answer 5




For me, what worked was:


Hope this helps someone.

Answer 6


I’ve actually found that in most of my cases, just stripping out those characters is much simpler:

s = mystring.decode('ascii', 'ignore')

Answer 7



The problem is that you’re trying to print a unicode character, but your terminal doesn’t support it.

You can try installing language-pack-en package to fix that:

sudo apt-get install language-pack-en

which provides English translation data updates for all supported packages (including Python). Install a different language pack if necessary (depending on which characters you’re trying to print).

On some Linux distributions it’s required in order to make sure that the default English locales are set up properly (so unicode characters can be handled by the shell/terminal). Sometimes it’s easier to install it than to configure it manually.

Then, when writing the code, make sure you use the right encoding.

For example:

open(foo, encoding='utf-8')
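
A slightly fuller Python 3 sketch of the same idea, writing and reading a file with an explicit encoding so the result does not depend on the locale. The temporary path is just a placeholder for this demonstration.

```python
import os
import tempfile

text = "trade mark: \u2122"

# Hypothetical temp file, used only for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Explicit encoding on write: no dependence on LANG / LC_ALL.
with open(path, "w", encoding="utf-8") as handle:
    handle.write(text)

# Explicit encoding on read gives the same text back.
with open(path, encoding="utf-8") as handle:
    read_back = handle.read()

# The raw bytes on disk are the UTF-8 form of the text.
with open(path, "rb") as handle:
    raw = handle.read()
```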

If you still have a problem, double-check your system configuration, such as:

  • Your locale file (/etc/default/locale), which should have e.g.



  • Value of LANG/LC_CTYPE in shell.

  • Check which locale your shell supports by:

    locale -a | grep "UTF-8"

Demonstrating the problem and solution in a fresh VM.

  1. Initialize and provision the VM (e.g. using vagrant):

    vagrant init ubuntu/trusty64; vagrant up; vagrant ssh

    See: available Ubuntu boxes.

  2. Printing unicode characters (such as the trademark sign ™):

    $ python -c 'print(u"\u2122");'
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode character u'\u2122' in position 0: ordinal not in range(128)
  3. Now installing language-pack-en:

    $ sudo apt-get -y install language-pack-en
    The following extra packages will be installed:
    Generating locales...
      en_GB.UTF-8... /usr/sbin/locale-gen: done
    Generation complete.
  4. Now the problem should be solved:

    $ python -c 'print(u"\u2122");'
  5. Otherwise, try the following command:

    $ LC_ALL=C.UTF-8 python -c 'print(u"\u2122");'

Answer 8


In shell:

  1. Find a supported UTF-8 locale with the following command:

    locale -a | grep "UTF-8"
  2. Export it before running the script, e.g. (taking the first match, since several locales may be listed):

    export LC_ALL=$(locale -a | grep UTF-8 | head -n 1)

    or manually, like:

    export LC_ALL=C.UTF-8
  3. Test it by printing a special character, e.g. ™:

    python -c 'print(u"\u2122");'

The above was tested on Ubuntu.

Answer 9


Add the line below at the beginning of your script (or as the second line):

# -*- coding: utf-8 -*-

That’s the definition of the Python source code encoding. More info in PEP 263.

Answer 10


Here’s a rehashing of some other so-called “cop out” answers. There are situations in which simply throwing away the troublesome characters/strings is a good solution, despite the protests voiced here.

def safeStr(obj):
    try: return str(obj)
    except UnicodeEncodeError:
        return obj.encode('ascii', 'ignore').decode('ascii')
    except: return ""

Testing it:

if __name__ == '__main__': 
    print safeStr( 1 ) 
    print safeStr( "test" ) 
    print u'98\xb0'
    print safeStr( u'98\xb0' )



Suggestion: you might want to name this function toAscii instead? That’s a matter of preference.

This was written for Python 2. For Python 3, I believe you’ll want to use bytes(obj,"ascii") rather than str(obj). I didn’t test this yet, but I will at some point and revise the answer.
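
One possible Python 3 take, offered only as a sketch and not as the tested revision mentioned above: in Python 3, str(obj) does not raise UnicodeEncodeError, so the fallback branch is unnecessary and the stripping step has to be applied directly. The name to_ascii follows the naming suggestion.

```python
def to_ascii(obj):
    """Return obj as text with every non-ASCII character dropped."""
    text = obj if isinstance(obj, str) else str(obj)
    # Round-trip through ASCII, discarding anything that does not fit.
    return text.encode("ascii", "ignore").decode("ascii")


print(to_ascii(1))
print(to_ascii("test"))
print(to_ascii("98\xb0"))
```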

Answer 11


I always put the code below in the first two lines of the python files:

# -*- coding: utf-8 -*-
from __future__ import unicode_literals

Answer 12


Simple helper functions found here.

def safe_unicode(obj, *args):
    """ return the unicode representation of obj """
    try:
        return unicode(obj, *args)
    except UnicodeDecodeError:
        # obj is byte string
        ascii_text = str(obj).encode('string_escape')
        return unicode(ascii_text)

def safe_str(obj):
    """ return the byte string representation of obj """
    try:
        return str(obj)
    except UnicodeEncodeError:
        # obj is unicode
        return unicode(obj).encode('unicode_escape')

Answer 13



Just add .encode('utf-8') to the variable.


Answer 14


Please open terminal and fire the below command:

export LC_ALL="en_US.UTF-8"

Answer 15


I just used the following:

import unicodedata
message = unicodedata.normalize("NFKD", message)

Check what documentation says about it:

unicodedata.normalize(form, unistr) Return the normal form form for the Unicode string unistr. Valid values for form are ‘NFC’, ‘NFKC’, ‘NFD’, and ‘NFKD’.

The Unicode standard defines various normalization forms of a Unicode string, based on the definition of canonical equivalence and compatibility equivalence. In Unicode, several characters can be expressed in various ways. For example, the character U+00C7 (LATIN CAPITAL LETTER C WITH CEDILLA) can also be expressed as the sequence U+0043 (LATIN CAPITAL LETTER C) U+0327 (COMBINING CEDILLA).

For each character, there are two normal forms: normal form C and normal form D. Normal form D (NFD) is also known as canonical decomposition, and translates each character into its decomposed form. Normal form C (NFC) first applies a canonical decomposition, then composes pre-combined characters again.

In addition to these two forms, there are two additional normal forms based on compatibility equivalence. In Unicode, certain characters are supported which normally would be unified with other characters. For example, U+2160 (ROMAN NUMERAL ONE) is really the same thing as U+0049 (LATIN CAPITAL LETTER I). However, it is supported in Unicode for compatibility with existing character sets (e.g. gb2312).

The normal form KD (NFKD) will apply the compatibility decomposition, i.e. replace all compatibility characters with their equivalents. The normal form KC (NFKC) first applies the compatibility decomposition, followed by the canonical composition.

Even if two unicode strings are normalized and look the same to a human reader, if one has combining characters and the other doesn’t, they may not compare equal.

Solves it for me. Simple and easy.
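
A few concrete cases may make the quoted documentation easier to follow. This Python 3 sketch uses made-up sample strings; note that NFKD turns the non-breaking space from the question into a plain space, but accented letters still decompose into a base letter plus a combining mark, so the result is not guaranteed to be pure ASCII.

```python
import unicodedata

# Compatibility decomposition maps NO-BREAK SPACE (U+00A0) to a normal space.
nbsp_normalized = unicodedata.normalize("NFKD", "agent\xa0name")

# Canonical forms: the composed and decomposed spellings of the same character.
composed = "\u00c7"
decomposed = unicodedata.normalize("NFD", composed)
recomposed = unicodedata.normalize("NFC", decomposed)

# Compatibility characters are replaced by their equivalents.
roman_one = unicodedata.normalize("NFKD", "\u2160")

# Accents become combining marks, which are still non-ASCII.
accent = unicodedata.normalize("NFKD", "Malm\u00f6")
accent_ascii = accent.encode("ascii", "ignore").decode("ascii")
```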

Answer 16




The solution below worked for me. I just added the u prefix, as in u"String" (representing the string as unicode), before my string.

result_html = result.to_html(col_space=1, index=False, justify={'right'})

text = u"""
Hello all, <br>
Here's weekly summary report.  Let me know if you have any questions. <br>
Data Summary <br>
<p>Data Team</p>

Answer 17

Alas, this works in Python 3 at least…

Python 3

Sometimes the error is in the environment variables and encoding, so:

import os
import locale
os.environ["PYTHONIOENCODING"] = "utf-8"
myLocale=locale.setlocale(category=locale.LC_ALL, locale="en_GB.UTF-8")
print(myText.encode('utf-8', errors='ignore'))

where errors are ignored in encoding.
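
One caveat worth keeping in mind: PYTHONIOENCODING is read when the interpreter starts, so assigning it in os.environ inside an already running script does not change the current sys.stdout. In Python 3.7+ a text stream can be re-encoded in place with reconfigure(), which sys.stdout normally supports as well. The sketch below demonstrates this on an in-memory stream rather than the real stdout, purely so the result can be inspected.

```python
import io

raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding="ascii")

# With an ASCII stream, writing the trademark sign fails.
try:
    stream.write("\u2122")
    stream.flush()
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Switch the same stream to UTF-8 and try again.
stream.reconfigure(encoding="utf-8")
stream.write("\u2122")
stream.flush()
written = raw.getvalue()
```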

Answer 18


I just had this problem, and Google led me here, so just to add to the general solutions here, this is what worked for me:

# 'value' contains the problematic data
unic = u''
unic += value
value = unic

I had this idea after reading Ned’s presentation.

I don’t claim to fully understand why this works, though. So if anyone can edit this answer or put in a comment to explain, I’ll appreciate it.

Answer 19

We struck this error when running manage.py migrate in Django with localized fixtures.

Our source contained the # -*- coding: utf-8 -*- declaration, MySQL was correctly configured for utf8 and Ubuntu had the appropriate language pack and values in /etc/default/locale.

The issue was simply that the Django container (we use docker) was missing the LANG env var.

Setting LANG to en_US.UTF-8 and restarting the container before re-running migrations fixed the problem.

Answer 20

Many answers here (@agf and @Andbdrew for example) have already addressed the most immediate aspects of the OP question.

However, I think there is one subtle but important aspect that has been largely ignored and that matters dearly for everyone who, like me, ended up here while trying to make sense of encodings in Python: the way Python 2 and Python 3 manage character representation is wildly different. I feel like a big chunk of the confusion out there has to do with people reading about encodings in Python without being version-aware.

I suggest that anyone interested in understanding the root cause of the OP’s problem begin by reading Spolsky’s introduction to character representations and Unicode, and then move on to Batchelder on Unicode in Python 2 and Python 3.
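
To make the version difference a bit more tangible, here is a small Python 3 sketch. In Python 3, text (str) and encoded data (bytes) are separate types that never mix implicitly, whereas Python 2 would try an implicit ASCII conversion at this point, which is where errors like the one in the question come from.

```python
text = "voil\u00e0"          # five characters of text
data = text.encode("utf-8")  # six bytes of encoded data

# Python 3 refuses to combine the two types instead of guessing an encoding.
try:
    text + data
    mixing_error = None
except TypeError as exc:
    mixing_error = exc
```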

Answer 21



Try to avoid converting the variable with str(variable). Sometimes it may cause the issue.

A simple tip to avoid it:

    data = data #Don't convert to String

The above example will also avoid the encode error.

Answer 22

If you have something like packet_data = "This is data", then do this on the next line, right after initializing packet_data:

unic = u''
unic += packet_data  # append the data; without this line packet_data would end up empty
packet_data = unic

Answer 23

Update for Python 3.0 and later. Try the following in the shell:

locale-gen en_US.UTF-8
export LANG=en_US.UTF-8 LANGUAGE=en_US.en

This sets the system’s default locale encoding to UTF-8.

More can be read in PEP 538: Coercing the legacy C locale to a UTF-8 based locale.

Answer 24



I had this issue trying to output Unicode characters to stdout, but with sys.stdout.write, rather than print (so that I could support output to a different file as well).

From BeautifulSoup’s own documentation, I solved this with the codecs library:

import sys
import codecs

def main(fIn, fOut):
    soup = BeautifulSoup(fIn)
    # Do processing, with data including non-ASCII characters

if __name__ == '__main__':
    with (sys.stdin) as fIn: # Don't think we need codecs.getreader here
        with codecs.getwriter('utf-8')(sys.stdout) as fOut:
            main(fIn, fOut)
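
The same codecs.getwriter pattern can be tried in isolation. This Python 3 sketch wraps an in-memory byte buffer instead of sys.stdout so the encoded output can be checked; write_report and the sample lines are hypothetical.

```python
import codecs
import io


def write_report(lines, out):
    """Write text lines to any object with a write() method."""
    for line in lines:
        out.write(line + "\n")


# A byte sink standing in for a file or a binary stdout.
sink = io.BytesIO()

# The writer encodes every text chunk as UTF-8 before passing it on.
writer = codecs.getwriter("utf-8")(sink)
write_report(["price: \u00a3100", "temp: 98\u00b0"], writer)

encoded = sink.getvalue()
```

Because the function only relies on write(), the same code can target stdout or a different file, which was the point of using sys.stdout.write in the first place.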

Answer 25

This problem often happens when a Django project is deployed with Apache, because Apache sets the environment variable LANG=C in /etc/sysconfig/httpd. Just open the file and comment out this setting (or change it to your preferred value). Alternatively, use the lang option of the WSGIDaemonProcess directive; in that case you will be able to set a different LANG environment variable for each virtual host.

Answer 26


The recommended solution did not work for me, and I could live with dumping all non ascii characters, so

s = s.encode('ascii',errors='ignore')

which left me with something stripped that doesn’t throw errors.

Answer 27


This will work:

 >>>print(unicodedata.normalize('NFD', re.sub("[\(\[].*?[\)\]]", "", "bats\xc3\xa0")).encode('ascii', 'ignore'))


