问题:找不到资源u’tokenizers / punkt / english.pickle’

我的代码:

import nltk.data
tokenizer = nltk.data.load('nltk:tokenizers/punkt/english.pickle')

错误信息:

[ec2-user@ip-172-31-31-31 sentiment]$ python mapper_local_v1.0.py
Traceback (most recent call last):
File "mapper_local_v1.0.py", line 16, in <module>

    tokenizer = nltk.data.load('nltk:tokenizers/punkt/english.pickle')

File "/usr/lib/python2.6/site-packages/nltk/data.py", line 774, in load

    opened_resource = _open(resource_url)

File "/usr/lib/python2.6/site-packages/nltk/data.py", line 888, in _open

    return find(path_, path + ['']).open()

File "/usr/lib/python2.6/site-packages/nltk/data.py", line 618, in find

    raise LookupError(resource_not_found)

LookupError:

Resource u'tokenizers/punkt/english.pickle' not found.  Please
use the NLTK Downloader to obtain the resource:

    >>>nltk.download()

Searched in:
- '/home/ec2-user/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- u''

我正在尝试在Unix机器上运行此程序:

根据错误消息,我从unix机器登录python shell,然后使用以下命令:

import nltk
nltk.download()

然后我使用d-down加载程序和l-list选项下载了所有可用的内容,但问题仍然存在。

我尽力在Internet中找到解决方案,但得到的解决方案与上述步骤中提到的解决方案相同。

My Code:

import nltk.data
tokenizer = nltk.data.load('nltk:tokenizers/punkt/english.pickle')

ERROR Message:

[ec2-user@ip-172-31-31-31 sentiment]$ python mapper_local_v1.0.py
Traceback (most recent call last):
File "mapper_local_v1.0.py", line 16, in <module>

    tokenizer = nltk.data.load('nltk:tokenizers/punkt/english.pickle')

File "/usr/lib/python2.6/site-packages/nltk/data.py", line 774, in load

    opened_resource = _open(resource_url)

File "/usr/lib/python2.6/site-packages/nltk/data.py", line 888, in _open

    return find(path_, path + ['']).open()

File "/usr/lib/python2.6/site-packages/nltk/data.py", line 618, in find

    raise LookupError(resource_not_found)

LookupError:

Resource u'tokenizers/punkt/english.pickle' not found.  Please
use the NLTK Downloader to obtain the resource:

    >>>nltk.download()

Searched in:
- '/home/ec2-user/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- u''

I’m trying to run this program in Unix machine:

As per the error message, I logged into python shell from my unix machine then I used the below commands:

import nltk
nltk.download()

and then I downloaded all the available things using d- down loader and l- list options but still the problem persists.

I tried my best to find the solution in internet but I got the same solution what I did as I mentioned in my above steps.


回答 0

要添加到alvas的答案中,您只能下载punkt语料库:

nltk.download('punkt')

all对我来说下载声音听起来像是过分杀了。除非那是您想要的。

To add to alvas’ answer, you can download only the punkt corpus:

nltk.download('punkt')

Downloading all sounds like overkill to me. Unless that’s what you want.


回答 1

如果您只想下载punkt模型:

import nltk
nltk.download('punkt')

如果不确定所需的数据/模型,可以从NLTK 安装流行的数据集,模型和标记器:

import nltk
nltk.download('popular')

使用上面的命令,无需使用GUI下载数据集。

If you’re looking to only download the punkt model:

import nltk
nltk.download('punkt')

If you’re unsure which data/model you need, you can install the popular datasets, models and taggers from NLTK:

import nltk
nltk.download('popular')

With the above command, there is no need to use the GUI to download the datasets.


回答 2

我得到了解决方案:

import nltk
nltk.download()

NLTK下载器启动后

d)下载l)列表u)更新c)配置h)帮助q)退出

下载器> d

下载哪个软件包(l = list; x = cancel)?标识符> punkt

I got the solution:

import nltk
nltk.download()

once the NLTK Downloader starts

d) Download l) List u) Update c) Config h) Help q) Quit

Downloader> d

Download which package (l=list; x=cancel)? Identifier> punkt


回答 3

您可以从外壳执行:

sudo python -m nltk.downloader punkt 

如果要安装流行的NLTK语料库/模型:

sudo python -m nltk.downloader popular

如果要安装所有 NLTK语料库/模型:

sudo python -m nltk.downloader all

列出您已下载的资源:

python -c 'import os; import nltk; print os.listdir(nltk.data.find("corpora"))'
python -c 'import os; import nltk; print os.listdir(nltk.data.find("tokenizers"))'

From the shell you can execute:

sudo python -m nltk.downloader punkt 

If you want to install the popular NLTK corpora/models:

sudo python -m nltk.downloader popular

If you want to install all NLTK corpora/models:

sudo python -m nltk.downloader all

To list the resources you have downloaded:

python -c 'import os; import nltk; print os.listdir(nltk.data.find("corpora"))'
python -c 'import os; import nltk; print os.listdir(nltk.data.find("tokenizers"))'

回答 4

import nltk
nltk.download('punkt')

打开Python提示符并运行以上语句。

sent_tokenize函数使用的一个实例PunktSentenceTokenizernltk.tokenize.punkt模块。该实例已经过培训,并适用于许多欧洲语言。因此,它知道哪些标点符号和字符标记了句子的结尾和新句子的开头。

import nltk
nltk.download('punkt')

Open the Python prompt and run the above statements.

The sent_tokenize function uses an instance of PunktSentenceTokenizer from the nltk.tokenize.punkt module. This instance has already been trained and works well for many European languages. So it knows what punctuation and characters mark the end of a sentence and the beginning of a new sentence.


回答 5

最近我也发生了同样的事情,您只需要下载“ punkt”软件包,它就可以工作。

在“下载所有可用内容”之后执行“列表”(l)时,所有内容是否都标记为以下行?:

[*] punkt............... Punkt Tokenizer Models

如果看到带有星星的这条线,则表示您已经拥有它,并且nltk应该能够加载它。

The same thing happened to me recently, you just need to download the “punkt” package and it should work.

When you execute “list” (l) after having “downloaded all the available things”, is everything marked like the following line?:

[*] punkt............... Punkt Tokenizer Models

If you see this line with the star, it means you have it, and nltk should be able to load it.


回答 6

通过键入转到python控制台

$Python

在您的终端中。然后,在python shell中键入以下2条命令以安装相应的软件包:

>> nltk.download(’punkt’)>> nltk.download(’averaged_perceptron_tagger’)

这为我解决了这个问题。

Go to python console by typing

$ python

in your terminal. Then, type the following 2 commands in your python shell to install the respective packages:

>> nltk.download(‘punkt’) >> nltk.download(‘averaged_perceptron_tagger’)

This solved the issue for me.


回答 7

我的问题是我叫nltk.download('all')root用户,但是最终使用nltk的进程是另一个用户,该用户无权访问下载内容的/ root / nltk_data。

因此,我只是以递归方式将所有内容从下载位置复制到NLTK希望找到的路径之一,如下所示:

cp -R /root/nltk_data/ /home/ubuntu/nltk_data

My issue was that I called nltk.download('all') as the root user, but the process that eventually used nltk was another user who didn’t have access to /root/nltk_data where the content was downloaded.

So I simply recursively copied everything from the download location to one of the paths where NLTK was looking to find it like this:

cp -R /root/nltk_data/ /home/ubuntu/nltk_data

回答 8

  1. 执行以下代码:

    import nltk
    nltk.download()
  2. 此后,将弹出NLTK下载器。

  3. 选择所有软件包。
  4. 下载punkt。
  1. Execute the following code:

    import nltk
    nltk.download()
    
  2. After this, NLTK downloader will pop out.

  3. Select All packages.
  4. Download punkt.

回答 9

尽管导入了以下内容,但还是出现错误,

import nltk
nltk.download()

但是对于谷歌colab,这解决了我的问题。

   !python3 -c "import nltk; nltk.download('all')"

I was getting an error despite importing the following,

import nltk
nltk.download()

but for google colab this solved my issue.

   !python3 -c "import nltk; nltk.download('all')"

回答 10

简单的nltk.download()无法解决此问题。我尝试了以下方法,它对我有用:

在nltk文件夹中,创建一个tokenizers文件夹,然后将您的punkt文件夹复制到tokenizers文件夹中。

这将起作用。 文件夹结构必须如图所示

Simple nltk.download() will not solve this issue. I tried the below and it worked for me:

in the nltk folder create a tokenizers folder and copy your punkt folder into tokenizers folder.

This will work.! the folder structure needs to be as shown in the picture


回答 11

您需要重新排列文件夹将tokenizers文件夹移到nltk_data文件夹中。如果您的nltk_data文件corpora夹中包含 tokenizers文件夹,则此方法不起作用

You need to rearrange your folders Move your tokenizers folder into nltk_data folder. This doesn’t work if you have nltk_data folder containing corpora folder containing tokenizers folder


回答 12

对我而言,上述方法均无效,因此我只是从网站http://www.nltk.org/nltk_data/手动下载了所有文件,并将它们也手动放置在“ nltk_data”内部的“ tokenizers”文件中”文件夹。不是一个漂亮的解决方案,但仍然是一个解决方案。

For me nothing of the above worked, so I just downloaded all the files by hand from the web site http://www.nltk.org/nltk_data/ and I put them also by hand in a file “tokenizers” inside of “nltk_data” folder. Not a pretty solution but still a solution.


回答 13

添加此行代码后,该问题将得到解决:

nltk.download('punkt')

After adding this line of code, the issue will be fixed:

nltk.download('punkt')

回答 14

我遇到了同样的问题。下载所有内容后,仍然存在“ punkt”错误。我在Windows机器上的C:\ Users \ vaibhav \ AppData \ Roaming \ nltk_data \ tokenizers中搜索了程序包,在那里可以看到“ punkt.zip”。我意识到,该zip尚未以某种方式提取到C:\ Users \ vaibhav \ AppData \ Roaming \ nltk_data \ tokenizers \ punk中。一旦我解压缩后,它就像音乐一样工作。

I faced same issue. After downloading everything, still ‘punkt’ error was there. I searched package on my windows machine at C:\Users\vaibhav\AppData\Roaming\nltk_data\tokenizers and I can see ‘punkt.zip’ present there. I realized that somehow the zip has not been extracted into C:\Users\vaibhav\AppData\Roaming\nltk_data\tokenizers\punk. Once I extracted the zip, it worked like music.


回答 15

只要确保您使用的是JupyterNotebook,然后在笔记本中执行以下操作:

import nltk

nltk.download()

然后将出现一个弹出窗口(显示信息https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml),您必须从中下载所有内容。

然后重新运行您的代码。

Just make sure you are using Jupyter Notebook and in a notebook, do the following:

import nltk

nltk.download()

Then one popup window will appear (showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml) From that you have to download everything.

Then rerun your code.


回答 16

对我来说,它可以通过使用“ nltk”来解决:

http://www.nltk.org/howto/data.html

使用nltk.data.load加载english.pickle失败

sent_tokenizer=nltk.data.load('nltk:tokenizers/punkt/english.pickle')

For me it got solved by using “nltk:”

http://www.nltk.org/howto/data.html

Failed loading english.pickle with nltk.data.load

sent_tokenizer=nltk.data.load('nltk:tokenizers/punkt/english.pickle')

声明:本站所有文章,如无特殊说明或标注,均为本站原创发布。任何个人或组织,在未征得本站同意时,禁止复制、盗用、采集、发布本站内容到任何网站、书籍等各类媒体平台。如若本站内容侵犯了原著者的合法权益,可联系我们进行处理。