How do I split a string into a list?

Question: How do I split a string into a list?

I want my Python function to split a sentence (input) and store each word in a list. My current code splits the sentence, but does not store the words as a list. How do I do that?

def split_line(text):

    # split the text
    words = text.split()

    # for each word in the line:
    for word in words:

        # print the word
        print(words)

Answer 0

text.split()

This should be enough to store each word in a list. words is already a list of the words from the sentence, so there is no need for the loop.

Second, it might be a typo, but you have your loop a little messed up. If you really did want to use append, it would be:

words.append(word)

not

word.append(words)
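
As a minimal sketch of the point above, the function can simply return the result of split(), since that result is already a list. The sample sentence is made up for illustration:

```python
def split_line(text):
    """Return the words of `text` as a list, split on whitespace."""
    # str.split() already returns a list, so no loop or append() is needed.
    return text.split()


words = split_line("the quick brown fox")
print(words)       # ['the', 'quick', 'brown', 'fox']
print(len(words))  # 4
```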

Answer 1

This splits the string in text on any run of consecutive whitespace:

words = text.split()

This splits the string in text on the delimiter ",":

words = text.split(",")

The words variable will be a list and contain the words from text split on the delimiter.
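
One thing worth knowing when passing an explicit delimiter: unlike the no-argument form, it does not merge adjacent delimiters or trim spaces. The sketch below uses a made-up input string to show this, along with one possible way to clean up the pieces:

```python
text = "apple, banana,,cherry"

# Splitting on "," keeps surrounding spaces and yields an empty string
# for the two adjacent commas.
raw_fields = text.split(",")
print(raw_fields)  # ['apple', ' banana', '', 'cherry']

# Strip each field and drop the empty ones.
fields = [field.strip() for field in raw_fields if field.strip()]
print(fields)      # ['apple', 'banana', 'cherry']
```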


Answer 2

str.split()

Return a list of the words in the string, using sep as the delimiter … If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace.

>>> line="a sentence with a few words"
>>> line.split()
['a', 'sentence', 'with', 'a', 'few', 'words']
>>> 
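
To make the quoted behaviour concrete, here is a small comparison between the default algorithm and an explicit single-space separator. The input strings are made up for illustration:

```python
line = " a  b "

# Default (sep=None): runs of whitespace count as one separator and
# leading/trailing whitespace produces no empty strings.
print(line.split())     # ['a', 'b']

# Explicit sep=" ": every single space is a separator, so empty strings appear.
print(line.split(" "))  # ['', 'a', '', 'b', '']

# The default algorithm also treats tabs and newlines as whitespace.
print("a\tb\nc".split())  # ['a', 'b', 'c']
```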

Answer 3

Depending on what you plan to do with your sentence-as-a-list, you may want to look at the Natural Language Toolkit (NLTK). It deals heavily with text processing and evaluation. You can also use it to solve your problem:

import nltk
words = nltk.word_tokenize(raw_sentence)

This has the added benefit of splitting out punctuation.

Example:

>>> import nltk
>>> s = "The fox's foot grazed the sleeping dog, waking it."
>>> words = nltk.word_tokenize(s)
>>> words
['The', 'fox', "'s", 'foot', 'grazed', 'the', 'sleeping', 'dog', ',', 
'waking', 'it', '.']

This allows you to filter out any punctuation you don’t want and use only words.
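
One possible way to do that filtering is sketched below. To keep it runnable without NLTK installed, the token list is copied by hand from the output above rather than produced by word_tokenize; the rule used (drop tokens made up only of punctuation characters) is an assumption about what "unwanted" means here:

```python
import string

# Token list copied from the word_tokenize output shown above.
tokens = ['The', 'fox', "'s", 'foot', 'grazed', 'the', 'sleeping', 'dog', ',',
          'waking', 'it', '.']

# Keep tokens that are not made up solely of punctuation characters.
words_only = [
    token for token in tokens
    if not all(ch in string.punctuation for ch in token)
]
print(words_only)
# ['The', 'fox', "'s", 'foot', 'grazed', 'the', 'sleeping', 'dog', 'waking', 'it']
```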

Please note that the other solutions using str.split() are better if you don’t plan on doing any complex manipulation of the sentence.


Answer 4

How about this algorithm? Split text on whitespace, then trim punctuation. This carefully removes punctuation from the edge of words, without harming apostrophes inside words such as we're.

>>> text
"'Oh, you can't help that,' said the Cat: 'we're all mad here. I'm mad. You're mad.'"

>>> text.split()
["'Oh,", 'you', "can't", 'help', "that,'", 'said', 'the', 'Cat:', "'we're", 'all', 'mad', 'here.', "I'm", 'mad.', "You're", "mad.'"]

>>> import string
>>> [word.strip(string.punctuation) for word in text.split()]
['Oh', 'you', "can't", 'help', 'that', 'said', 'the', 'Cat', "we're", 'all', 'mad', 'here', "I'm", 'mad', "You're", 'mad']
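
A caveat with this approach: a token that consists only of punctuation, such as a standalone dash, is stripped down to an empty string. A small sketch that wraps the idea in a function and drops those empties (the name split_words and the sample sentence are made up for illustration):

```python
import string


def split_words(text):
    """Split on whitespace, trim edge punctuation, drop empty results."""
    stripped = (word.strip(string.punctuation) for word in text.split())
    return [word for word in stripped if word]


print(split_words("Wait - what? It's 'fine'..."))
# ['Wait', 'what', "It's", 'fine']
```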

Answer 5

I want my python function to split a sentence (input) and store each word in a list

The str.split() method does this: it takes a string and splits it into a list:

>>> the_string = "this is a sentence"
>>> words = the_string.split(" ")
>>> print(words)
['this', 'is', 'a', 'sentence']
>>> type(words)
<type 'list'> # or <class 'list'> in Python 3.0

The problem you’re having is caused by a typo: you wrote print(words) instead of print(word).

Renaming the word variable to current_word, this is what you had:

def split_line(text):
    words = text.split()
    for current_word in words:
        print(words)

...when you should have written:

def split_line(text):
    words = text.split()
    for current_word in words:
        print(current_word)

If for some reason you want to manually construct a list in the for loop, you would use the list append() method, perhaps because you want to lower-case all words (for example):

my_list = [] # make empty list
for current_word in words:
    my_list.append(current_word.lower())

Or, a bit more neatly, using a list comprehension:

my_list = [current_word.lower() for current_word in words]

Answer 6

shlex has a .split() function. It differs from str.split() in that it does not preserve quotes and treats a quoted phrase as a single word:

>>> import shlex
>>> shlex.split("sudo echo 'foo && bar'")
['sudo', 'echo', 'foo && bar']
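
For comparison, here is the same command string run through both functions. The sketch also shows a side effect of the shell-style parsing: an unbalanced quote, such as an apostrophe in ordinary prose, makes shlex.split() raise ValueError, so it is probably a better fit for command lines than for sentences:

```python
import shlex

command = "sudo echo 'foo && bar'"

# str.split() knows nothing about quotes and cuts the quoted phrase apart.
print(command.split())
# ['sudo', 'echo', "'foo", '&&', "bar'"]

# shlex.split() keeps the quoted phrase together and drops the quotes.
print(shlex.split(command))
# ['sudo', 'echo', 'foo && bar']

# An apostrophe is read as an opening quote that never closes.
try:
    shlex.split("don't panic")
except ValueError as error:
    print("shlex could not parse the text:", error)
```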

Answer 7

If you want all the chars of a word/sentence in a list, do this:

print(list("word"))
#  ['w', 'o', 'r', 'd']


print(list("some sentence"))
#  ['s', 'o', 'm', 'e', ' ', 's', 'e', 'n', 't', 'e', 'n', 'c', 'e']
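
Note that list() keeps the space as a character of its own. If only the non-whitespace characters are wanted, a list comprehension is one way to skip it; this is a small sketch rather than part of the original answer:

```python
sentence = "some sentence"

# Every character, including the space between the two words.
chars = list(sentence)

# Only the characters that are not whitespace.
letters = [ch for ch in sentence if not ch.isspace()]

print(len(chars))    # 13
print(len(letters))  # 12
```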

Answer 8

I think you are confused because of a typo.

Replace print(words) with print(word) inside your loop to have each word printed on its own line.