Question: How do I convert a string representation of a list into a list?
I was wondering what the simplest way is to convert a string representation of a list like the following to a list:
x = u'[ "A","B","C" , " D"]'
Even if the user puts spaces between the commas, or spaces inside the quotes, I need to handle that as well and end up with:
x = ["A", "B", "C", "D"]
in Python.
I know I can strip spaces with strip() and split the string with split(), then check for non-alphabetic characters, but the code was getting very kludgy. Is there a quick function that I'm not aware of?
Answer 0
>>> import ast
>>> x = u'[ "A","B","C" , " D"]'
>>> x = ast.literal_eval(x)
>>> x
['A', 'B', 'C', ' D']
>>> x = [n.strip() for n in x]
>>> x
['A', 'B', 'C', 'D']
ast.literal_eval:
With ast.literal_eval, you can safely evaluate an expression node or a string containing a Python expression. The string or node provided may only consist of the following Python literal structures: strings, numbers, tuples, lists, dicts, booleans, and None.
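As a minimal sketch of how the two steps above could be wrapped together, the helper below parses the text with ast.literal_eval, checks that the result really is a list, and strips only the string items. The name parse_list_literal is hypothetical and not part of the ast module.

```python
import ast

def parse_list_literal(text):
    """Parse a list literal and strip whitespace from its string items.

    parse_list_literal is a hypothetical helper name, not a standard function.
    """
    value = ast.literal_eval(text)  # raises ValueError or SyntaxError on bad input
    if not isinstance(value, list):
        raise TypeError("expected a list literal, got %s" % type(value).__name__)
    # Only strings have strip(); numbers, None, etc. are returned unchanged
    return [item.strip() if isinstance(item, str) else item for item in value]

result = parse_list_literal('[ "A","B","C" , " D"]')
print(result)  # ['A', 'B', 'C', 'D']
```

The type check is a design choice: literal_eval happily returns a dict, a number, or a plain string, so failing early keeps the caller from iterating over something unexpected.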
Answer 1
The json module is a better solution whenever there is a stringified list of dictionaries. The json.loads(your_data) function can be used to convert it to a list.
>>> import json
>>> x = u'[ "A","B","C" , " D"]'
>>> json.loads(x)
[u'A', u'B', u'C', u' D']
Similarly
>>> x = u'[ "A","B","C" , {"D":"E"}]'
>>> json.loads(x)
[u'A', u'B', u'C', {u'D': u'E'}]
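One thing worth keeping in mind: json.loads expects JSON syntax (double-quoted strings, null, true, false), while ast.literal_eval expects Python syntax (None, True, False, either quote style). The sketch below tries JSON first and falls back to a Python literal. The helper name parse_list_string is hypothetical, chosen only for illustration.

```python
import ast
import json

def parse_list_string(text):
    """Try JSON first, then fall back to a Python literal.

    parse_list_string is a hypothetical helper name used for illustration.
    """
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return ast.literal_eval(text)

from_json = parse_list_string('["A", {"D": null}, true]')
from_python = parse_list_string("['A', None, True]")
print(from_json)    # ['A', {'D': None}, True]
print(from_python)  # ['A', None, True]
```

Which parser to prefer probably depends on what produced the string: output of json.dumps suits json.loads, while output of str() or repr() on a Python list suits ast.literal_eval.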
Answer 2
eval is dangerous: you shouldn't execute user input.
If you have Python 2.6 or newer, use ast instead of eval:
>>> import ast
>>> ast.literal_eval('["A","B" ,"C" ," D"]')
['A', 'B', 'C', ' D']
Once you have that, strip the strings.
If you're on an older version of Python, you can get very close to what you want with a simple regular expression:
>>> import re
>>> x='[ "A", " B", "C","D "]'
>>> re.findall(r'"\s*([^"]*?)\s*"', x)
['A', 'B', 'C', 'D']
This isn't as good as the ast solution; for example, it doesn't correctly handle escaped quotes in strings. But it's simple, doesn't involve a dangerous eval, and might be good enough for your purpose if you're on an older Python without ast.
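To make the escaped-quote limitation concrete, here is a small comparison on an input whose first item contains escaped double quotes. The sample string is made up for illustration.

```python
import ast
import re

# Made-up sample: the first item contains escaped double quotes
text = r'["say \"hi\"", "B"]'

pattern = r'"\s*([^"]*?)\s*"'
regex_items = re.findall(pattern, text)
literal_items = ast.literal_eval(text)

print(regex_items)    # the regex breaks the first item apart at the escaped quotes
print(literal_items)  # ['say "hi"', 'B']
```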
Answer 3
import ast
l = ast.literal_eval('[ "A","B","C" , " D"]')
l = [i.strip() for i in l]
Answer 4
There is a quick solution:
x = eval('[ "A","B","C" , " D"]')
Unwanted whitespace in the list elements can be removed this way:
x = [x.strip() for x in eval('[ "A","B","C" , " D"]')]
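For comparison with the ast approach from the other answers, the sketch below shows what happens when the input is not a list literal at all. The untrusted string is a deliberately harmless stand-in for arbitrary code, and is_literal is a hypothetical helper name.

```python
import ast

# Harmless stand-in for arbitrary code arriving as "input"
untrusted = "__import__('os').getcwd()"

# eval runs the expression; here it returns the current directory as a string
evaluated = eval(untrusted)
print(type(evaluated))

def is_literal(text):
    """Return True if text is a plain Python literal (hypothetical helper)."""
    try:
        ast.literal_eval(text)
        return True
    except (ValueError, SyntaxError):
        return False

print(is_literal('[ "A","B","C" , " D"]'))  # True
print(is_literal(untrusted))                # False: function calls are rejected
```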
Answer 5
Inspired by some of the answers above that work with base Python packages, I compared the performance of a few (using Python 3.7.3):
Method 1: ast
import ast
list(map(str.strip, ast.literal_eval(u'[ "A","B","C" , " D"]')))
# ['A', 'B', 'C', 'D']
import timeit
timeit.timeit(stmt="list(map(str.strip, ast.literal_eval(u'[ \"A\",\"B\",\"C\" , \" D\"]')))", setup='import ast', number=100000)
# 1.292875313000195
Method 2: json
import json
list(map(str.strip, json.loads(u'[ "A","B","C" , " D"]')))
# ['A', 'B', 'C', 'D']
import timeit
timeit.timeit(stmt="list(map(str.strip, json.loads(u'[ \"A\",\"B\",\"C\" , \" D\"]')))", setup='import json', number=100000)
# 0.27833264000014424
Method 3: no import
list(map(str.strip, u'[ "A","B","C" , " D"]'.strip('][').replace('"', '').split(',')))
# ['A', 'B', 'C', 'D']
import timeit
timeit.timeit(stmt="list(map(str.strip, u'[ \"A\",\"B\",\"C\" , \" D\"]'.strip('][').replace('\"', '').split(',')))", number=100000)
# 0.12935059100027502
I was disappointed to see that the method I considered the least readable was also the fastest. There are tradeoffs to consider when going with the most readable option. For the type of workloads I use Python for, I usually value readability over a slightly more performant option, but as usual it depends.
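Speed is only part of the tradeoff. As a rough illustration, the no-import approach and the json approach can give different answers when an item contains a comma. The sample input is made up.

```python
import json

# Made-up sample: the first item contains a comma
text = '["A, B", "C"]'

# Method 3 (no import) splits on every comma, including the one inside the item
naive = list(map(str.strip, text.strip('][').replace('"', '').split(',')))

# Method 2 (json) respects the quotes
parsed = list(map(str.strip, json.loads(text)))

print(naive)   # ['A', 'B', 'C']
print(parsed)  # ['A, B', 'C']
```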
Answer 6
If it's only a one-dimensional list, this can be done without importing anything:
>>> x = u'[ "A","B","C" , " D"]'
>>> ls = x.strip('[]').replace('"', '').replace(' ', '').split(',')
>>> ls
['A', 'B', 'C', 'D']
Answer 7
Assuming that all your inputs are lists and that the double quotes in the input don't actually matter, this can be done with a simple regexp replace. It is a bit Perl-y but works like a charm. Note also that the output is now a list of unicode strings; you didn't specify that you needed that, but it seems to make sense given unicode input.
import re
x = u'[ "A","B","C" , " D"]'
# Character class of everything to discard: brackets, double quotes, spaces
junkers = re.compile(r'[\[" \]]')
result = junkers.sub('', x).split(',')
print(result)
---> [u'A', u'B', u'C', u'D']
The junkers variable contains a compiled regexp (for speed) of all the characters we don't want; using ] as a character required some backslash trickery.
The re.sub replaces all these characters with nothing, and we split the resulting string at the commas.
Note that this also removes spaces from inside entries: u'["oh no"]' ---> [u'ohno']. If this is not what you wanted, the regexp needs to be souped up a bit.
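One possible way to soup it up, sketched under the same assumptions (a flat list, quotes that don't matter, no commas inside items), is to remove only the brackets and quotes with the regexp and trim whitespace per item afterwards. The function name split_keep_inner_spaces is hypothetical.

```python
import re

def split_keep_inner_spaces(text):
    """Hypothetical variant: keep spaces inside items, trim only the edges."""
    # Remove brackets and double quotes, but leave spaces alone for now
    cleaned = re.sub(r'[\[\]"]', '', text)
    # Split at commas, then trim whitespace around each item
    return [item.strip() for item in cleaned.split(',')]

items = split_keep_inner_spaces('[ "A","B","C" , " D", "oh no"]')
print(items)  # ['A', 'B', 'C', 'D', 'oh no']
```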
Answer 8
If you know that your lists only contain quoted strings, this pyparsing example will give you your list of stripped strings (even preserving the original Unicode-ness).
>>> from pyparsing import *
>>> x =u'[ "A","B","C" , " D"]'
>>> LBR,RBR = map(Suppress,"[]")
>>> qs = quotedString.setParseAction(removeQuotes, lambda t: t[0].strip())
>>> qsList = LBR + delimitedList(qs) + RBR
>>> print(qsList.parseString(x).asList())
[u'A', u'B', u'C', u'D']
If your lists can have more datatypes, or even contain lists within lists, then you will need a more complete grammar, like the one on the pyparsing wiki, which handles tuples, lists, ints, floats, and quoted strings. It works with Python versions back to 2.4.
Answer 9
To further complete @Ryan's answer using json, one very convenient function for converting unicode is the one posted here: https://stackoverflow.com/a/13105359/7599285
Example with double or single quotes:
>>> print(byteify(json.loads(u'[ "A","B","C" , " D"]')))
>>> print(byteify(json.loads(u"[ 'A','B','C' , ' D']".replace('\'','"'))))
['A', 'B', 'C', ' D']
['A', 'B', 'C', ' D']
['A', 'B', 'C', ' D']
['A', 'B', 'C', ' D']
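A caveat about swapping single quotes for double quotes before calling json.loads: it stops working as soon as an item contains an apostrophe. The sketch below uses a made-up sample and compares the result with ast.literal_eval, which accepts either quote style.

```python
import ast
import json

# Made-up sample: repr() of a list whose first item contains an apostrophe
text = repr(["it's", "B"])  # the string ["it's", 'B']

try:
    json.loads(text.replace("'", '"'))
    swap_worked = True
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    swap_worked = False

literal_items = ast.literal_eval(text)
print(swap_worked)    # False: the apostrophe became a stray double quote
print(literal_items)  # ["it's", 'B']
```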
Answer 10
I would like to provide a more intuitive pattern-based solution using regex.
The function below takes as input a stringified list containing arbitrary strings.
Stepwise explanation:
First remove all whitespace, brackets, and value separators (provided they are not part of the values you want to extract; otherwise the regex needs to be more complex). Then split the cleaned string on single or double quotes and keep the non-empty values (or the odd-indexed values, whichever you prefer).
import re
def parse_strlist(sl):
    # Drop brackets, commas, and all whitespace
    clean = re.sub(r"[\[\],\s]", "", sl)
    # Split on either quote character
    splitted = re.split(r"['\"]", clean)
    # Keep only the non-empty pieces, which are the values
    values_only = [s for s in splitted if s != '']
    return values_only
Test sample: "['21',"foo" '6', '0', " A"]"
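Running the function on the test sample might look like the sketch below. The function body is repeated so the snippet is self-contained, and the second call is an extra made-up input showing that whitespace inside a value is removed as well.

```python
import re

def parse_strlist(sl):
    clean = re.sub(r"[\[\],\s]", "", sl)      # drop brackets, commas, whitespace
    splitted = re.split(r"['\"]", clean)      # split on either quote character
    return [s for s in splitted if s != '']   # keep the non-empty values

sample = """['21',"foo" '6', '0', " A"]"""
values = parse_strlist(sample)
print(values)                      # ['21', 'foo', '6', '0', 'A']
print(parse_strlist('["oh no"]'))  # ['ohno']
```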
Answer 11
And with pure Python, without importing any libraries:
[s for s in x.split('[')[1].split(']')[0].split('"')[1:-1] if s not in [',', ' , ', ', ']]
Answer 12
You may run into this problem when dealing with scraped data stored in a Pandas DataFrame.
This solution works like a charm if the list of values is present as text.
def textToList(hashtags):
    return hashtags.strip('[]').replace('\'', '').replace(' ', '').split(',')
hashtags = "[ 'A','B','C' , ' D']"
hashtags = textToList(hashtags)
Output: ['A', 'B', 'C', 'D']
No external library required.
Answer 13
So, following all the answers, I decided to time the most common methods:
from time import time
import re
import json
import ast  # needed for ast.literal_eval below
my_str = str(list(range(19)))
print(my_str)
reps = 100000
start = time()
for i in range(0, reps):
    re.findall(r"\w+", my_str)
print("Regex method:\t", (time() - start) / reps)
start = time()
for i in range(0, reps):
    json.loads(my_str)
print("json method:\t", (time() - start) / reps)
start = time()
for i in range(0, reps):
    ast.literal_eval(my_str)
print("ast method:\t\t", (time() - start) / reps)
start = time()
for i in range(0, reps):
    # Note: this iterates over the characters of my_str, not over list items
    [n.strip() for n in my_str]
print("strip method:\t", (time() - start) / reps)
regex method: 6.391477584838867e-07
json method: 2.535374164581299e-06
ast method: 2.4425282478332518e-05
strip method: 4.983267784118653e-06
So in the end, regex wins!
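When reading these timings, it may help to look at what each timed expression actually returns, since they are not interchangeable. The sketch below uses a shorter list than the benchmark purely to keep the output readable.

```python
import ast
import json
import re

my_str = str(list(range(5)))  # '[0, 1, 2, 3, 4]'

regex_result = re.findall(r"\w+", my_str)
json_result = json.loads(my_str)
ast_result = ast.literal_eval(my_str)
strip_result = [n.strip() for n in my_str]

print(regex_result)      # ['0', '1', '2', '3', '4']  (strings)
print(json_result)       # [0, 1, 2, 3, 4]            (ints)
print(ast_result)        # [0, 1, 2, 3, 4]            (ints)
print(strip_result[:4])  # ['[', '0', ',', '']        (single characters)
```

The regex returns strings rather than integers, and the strip comprehension walks the string character by character, so only the json and ast calls reconstruct the original list.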
Answer 14
You can save yourself the .strip() call by slicing off the first and last characters of the list's string representation (see the split(', ') line below):
>>> mylist=[1,2,3,4,5,'baloney','alfalfa']
>>> strlist=str(mylist)
>>> strlist[1:-1].split(',')
['1', ' 2', ' 3', ' 4', ' 5', " 'baloney'", " 'alfalfa'"]
>>> mylistfromstring=(strlist[1:-1].split(', '))
>>> mylistfromstring[3]
'4'
>>> for entry in mylistfromstring:
...     print(entry)
...     type(entry)
...
1
<class 'str'>
2
<class 'str'>
3
<class 'str'>
4
<class 'str'>
5
<class 'str'>
'baloney'
<class 'str'>
'alfalfa'
<class 'str'>