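Homepage: https://textblob.readthedocs.io/
TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.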
from textblob import TextBlob
text = '''
The titular threat of The Blob has always struck me as the ultimate movie
monster: an insatiably hungry, amoeba-like mass able to penetrate
virtually any safeguard, capable of--as a doomed doctor chillingly
describes it--"assimilating flesh on contact.
Snide comparisons to gelatin be damned, it's a concept with the most
devastating of potential consequences, not unlike the grey goo scenario
proposed by technological theorists fearful of
artificial intelligence run rampant.
'''
blob = TextBlob(text)
blob.tags # [('The', 'DT'), ('titular', 'JJ'),
# ('threat', 'NN'), ('of', 'IN'), ...]
blob.noun_phrases # WordList(['titular threat', 'blob',
# 'ultimate movie monster',
# 'amoeba-like mass', ...])
for sentence in blob.sentences:
    print(sentence.sentiment.polarity)
# 0.060
# -0.341
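TextBlob stands on the shoulders of NLTK and pattern, and plays nicely with both.
Features
- Noun phrase extraction
- Part-of-speech tagging
- Sentiment analysis
- Classification (Naive Bayes, Decision Tree)
- Tokenization (splitting text into words and sentences)
- Word and phrase frequencies
- Parsing
- n-grams
- Word inflection (pluralization and singularization) and lemmatization
- Spelling correction
- Add new models or languages through extensions
- WordNet integration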
Get it now
$ pip install -U textblob
$ python -m textblob.download_corpora
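Examples
See more examples in the Quickstart guide.
Documentation
Full documentation is available at https://textblob.readthedocs.io/.
Requirements
Project Links
License
MIT licensed. See the bundled LICENSE file for more details.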