LookupError:
**********************************************************************
  Resource 'tokenizers/punkt/english.pickle' not found.  Please use
  the NLTK Downloader to obtain the resource: nltk.download().
  Searched in:
    - 'C:\\Users\\Martinos/nltk_data'
    - 'C:\\nltk_data'
    - 'D:\\nltk_data'
    - 'E:\\nltk_data'
    - 'E:\\Python26\\nltk_data'
    - 'E:\\Python26\\lib\\nltk_data'
    - 'C:\\Users\\Martinos\\AppData\\Roaming\\nltk_data'
**********************************************************************
I had this same problem. Go into a python shell and type:
>>> import nltk
>>> nltk.download()
Then an installation window appears. Go to the ‘Models’ tab and select ‘punkt’ from under the ‘Identifier’ column. Then click Download and it will install the necessary files. Then it should work!
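If the downloader GUI cannot open (e.g. on a headless server), you can fetch the same package from the command line via NLTK's downloader module:

python -m nltk.downloader punkt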
import nltk
nltk.download('punkt')
from nltk import word_tokenize, sent_tokenize
You can download the tokenizers by passing punkt as an argument to the download function. The word and sentence tokenizers are then available on nltk.
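For example, once punkt is installed, both tokenizers work out of the box:

from nltk import word_tokenize, sent_tokenize

text = "Hello world. This is NLTK."
print(sent_tokenize(text))  # ['Hello world.', 'This is NLTK.']
print(word_tokenize(text))  # ['Hello', 'world', '.', 'This', 'is', 'NLTK', '.']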
If you want to download everything, i.e. chunkers, grammars, misc, sentiment, taggers, corpora, help, models, stemmers, and tokenizers, do not pass any arguments, like this:
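import nltk
nltk.download()

Calling download() with no arguments opens the interactive downloader; if you would rather grab the full collection non-interactively, nltk.download('all') does the same in one shot.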
# Do this in a separate python interpreter session, since you only have to do it once
import nltk
nltk.download('punkt')
# Do this in your ipython notebook or analysis script
from nltk.tokenize import word_tokenize
sentences = [
"Mr. Green killed Colonel Mustard in the study with the candlestick. Mr. Green is not a very nice fellow.",
"Professor Plum has a green plant in his study.",
"Miss Scarlett watered Professor Plum's green plant while he was away from his office last week."
]
sentences_tokenized = []
for s in sentences:
    sentences_tokenized.append(word_tokenize(s))
sentences_tokenized is a list of lists of tokens:
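Something like this (the exact tokens can vary with your NLTK version):

[['Mr.', 'Green', 'killed', 'Colonel', 'Mustard', 'in', 'the', 'study', 'with', 'the', 'candlestick', '.', 'Mr.', 'Green', 'is', 'not', 'a', 'very', 'nice', 'fellow', '.'],
 ['Professor', 'Plum', 'has', 'a', 'green', 'plant', 'in', 'his', 'study', '.'],
 ['Miss', 'Scarlett', 'watered', 'Professor', 'Plum', "'s", 'green', 'plant', 'while', 'he', 'was', 'away', 'from', 'his', 'office', 'last', 'week', '.']]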
nltk ships pre-trained tokenizer models. The models are downloaded from predefined web sources and stored under an nltk data path when you execute one of the following function calls.
E.g. 1
tokenizer = nltk.data.load('nltk:tokenizers/punkt/english.pickle')
E.g. 2
nltk.download('punkt')
If you call either of the above in your code, make sure you have an internet connection and no firewall blocking the download.
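A handy defensive pattern is to check for the resource first and download it only if it is missing, so the network is touched at most once (a minimal sketch using nltk.data.find):

import nltk

try:
    # look for the punkt tokenizer anywhere in the nltk_data search paths
    nltk.data.find('tokenizers/punkt')
except LookupError:
    # not found locally, so fetch it (requires network access)
    nltk.download('punkt')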
I would like to share an alternative way to resolve the above issue, along with a deeper understanding of what is going on.
Please follow these steps and enjoy English word tokenization using nltk.
Step 1: First download the "punkt.zip" archive, which contains the "english.pickle" model, from the NLTK data site.
Step 2: Extract the downloaded "punkt.zip" file, find the "english.pickle" file inside it, and place it on the C drive.
Step 3: Copy and paste the following code and execute it.
from nltk.data import load
from nltk.tokenize.treebank import TreebankWordTokenizer

sentences = [
    "Mr. Green killed Colonel Mustard in the study with the candlestick. Mr. Green is not a very nice fellow.",
    "Professor Plum has a green plant in his study.",
    "Miss Scarlett watered Professor Plum's green plant while he was away from his office last week."
]

# Load the sentence tokenizer directly from the pickle placed on the C drive
tokenizer = load('file:C:/english.pickle')
treebank_word_tokenize = TreebankWordTokenizer().tokenize

wordToken = []
for sent in sentences:
    subSentToken = []
    # First split each string into sentences, then tokenize each sentence into words
    for subSent in tokenizer.tokenize(sent):
        subSentToken.extend([token for token in treebank_word_tokenize(subSent)])
    wordToken.append(subSentToken)

for token in wordToken:
    print(token)
I came across this problem when I was trying to do POS tagging in NLTK.
The way I got it working was by making a new directory named "taggers" alongside the corpora directory and copying max_pos_tagger into it.
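If you are not sure where the corpora directory lives, you can print the folders NLTK searches and create taggers next to corpora in any one of them:

import nltk
print(nltk.data.path)  # every directory NLTK searches for corpora, taggers, tokenizers, ...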
Hope it works for you too. Best of luck with it!
In Spyder, go to your active shell and download nltk using the 2 commands below.
import nltk
nltk.download()
Then you should see the NLTK Downloader window open. Go to the 'Models' tab in this window, click on 'punkt', and download it.
The punkt tokenizer data is quite large at over 35 MB; this can be a big deal if, like me, you are running nltk in an environment such as AWS Lambda that has limited resources.
If you only need one or a few language tokenizers, you can drastically reduce the size of the data by including only those languages' .pickle files.
If all you need to support is English, your nltk data size can be reduced to 407 KB (for the Python 3 version).
Somewhere in your environment create the folders nltk_data/tokenizers/punkt; if using Python 3, add another folder PY3 so that your new directory structure looks like nltk_data/tokenizers/punkt/PY3. In my case I created these folders at the root of my project.
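For example, a quick way to create that layout from Python (assuming the project root is the current directory):

import os

# creates nltk_data/tokenizers/punkt/PY3, including any missing parents
os.makedirs(os.path.join('nltk_data', 'tokenizers', 'punkt', 'PY3'), exist_ok=True)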
Extract the zip and move the .pickle files for the languages you want to support into the punkt folder you just created. Note: Python 3 users should use the pickles from the PY3 folder.
Now you just need to add your nltk_data folder to the search paths, assuming your data is not in one of the pre-defined search paths. You can do this with the environment variable NLTK_DATA='path/to/your/nltk_data', or add a custom path at runtime in Python:
from nltk import data
data.path += ['/path/to/your/nltk_data']
NOTE: If you don’t need to load in the data at runtime or bundle the data with your code, it would be best to create your nltk_data folders at the built-in locations that nltk looks for.
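As a quick sanity check that the trimmed data is picked up (assuming the nltk_data folder sits in your current working directory and holds english.pickle):

from nltk import data
data.path += ['./nltk_data']

from nltk.tokenize import sent_tokenize

# if english.pickle is found on the added path, this prints the two sentences
print(sent_tokenize("The local punkt data was found. It works."))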
In Python 3.6 I can see the suggestion right in the traceback. That's quite helpful.
Hence I will say to you guys: pay attention to the error you got; most of the time the answer is within the problem itself ;).
And then, as suggested by other folks here, either using the Python terminal or using a command like python -c "import nltk; nltk.download('wordnet')", we can install the missing resources on the fly.
You just need to run that command once and then it will save the data locally in your home directory.
I had a similar issue when using an assigned folder for multiple downloads, and I had to append the data path manually:
A single download can be achieved as follows (works):
import os as _os
from nltk.corpus import stopwords
from nltk import download as nltk_download

# get_project_root_path() is a project-specific helper returning the repo root
nltk_download('stopwords', download_dir=_os.path.join(get_project_root_path(), 'temp'), raise_on_error=True)

stop_words: list = stopwords.words('english')
This code works, meaning that nltk remembers the download path passed in the download function. On the other hand, if I download a subsequent package, I get a similar error to the one described by the user:
Multiple downloads raise an error:
import os as _os
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk import download as nltk_download
nltk_download(['stopwords', 'punkt'], download_dir=_os.path.join(get_project_root_path(), 'temp'), raise_on_error=True)
print(stopwords.words('english'))
print(word_tokenize("I am trying to find the download path 99."))
Error:
Resource punkt not found.
Please use the NLTK Downloader to obtain the resource:
import nltk
nltk.download('punkt')
Now if I append my download path to the nltk data path, it works:
import os as _os
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk import download as nltk_download
from nltk.data import path as nltk_path
nltk_path.append(_os.path.join(get_project_root_path(), 'temp'))
nltk_download(['stopwords', 'punkt'], download_dir=_os.path.join(get_project_root_path(), 'temp'), raise_on_error=True)
print(stopwords.words('english'))
print(word_tokenize("I am trying to find the download path 99."))
This works... I'm not sure why it works in one case but not the other, but the error message seems to imply that it doesn't check the download folder the second time.
NB: using Windows 8.1 / Python 3.7 / NLTK 3.5