I'm having the same problem from a Python command-line program. There is a very
explicit traceback pointing to the cause of the problem.
Traceback (most recent call last):
  File "classify.py", line 118, in <module>
    nb.train_from_data(data)
  File "classify.py", line 42, in train_from_data
    self.train(doc, category)
  File "classify.py", line 101, in train
    features = self.get_features(item)
  File "classify.py", line 57, in get_features
    all_words = [w for w in word_tokenize(document) if len(w) > 3 and len(w) < 16]
  File "/usr/local/lib/python2.7/dist-packages/nltk/tokenize/__init__.py", line 93, in word_tokenize
    return [token for sent in sent_tokenize(text)
  File "/usr/local/lib/python2.7/dist-packages/nltk/tokenize/__init__.py", line 81, in sent_tokenize
    tokenizer = load('tokenizers/punkt/english.pickle')
  File "/usr/local/lib/python2.7/dist-packages/nltk/data.py", line 774, in load
    opened_resource = _open(resource_url)
  File "/usr/local/lib/python2.7/dist-packages/nltk/data.py", line 888, in _open
    return find(path_, path + ['']).open()
  File "/usr/local/lib/python2.7/dist-packages/nltk/data.py", line 618, in find
    raise LookupError(resource_not_found)
LookupError:
  Resource u'tokenizers/punkt/english.pickle' not found.  Please
  use the NLTK Downloader to obtain the resource:  >>> nltk.download()
  Searched in:
    - '/home/funderburkjim/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - u''
So a particular tokenizer resource is required for tokenizing English.
word_tokenize is such a frequently used feature that its failure to work on
PythonAnywhere should be considered a bug in the PythonAnywhere installation of the
NLTK library. At least, that's my opinion and suggestion.
Incidentally, I didn't understand the solution mentioned above, namely
"downloading the nltk package using nltk.download() -> d -> book"
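If I understand it now, that instruction refers to NLTK's interactive downloader: calling `nltk.download()` with no arguments opens a text menu, where `d` selects "Download" and `book` is the identifier of a collection that (as far as I can tell) includes the Punkt models. The non-interactive equivalent would be:

```python
import nltk

# Same effect as the interactive "nltk.download() -> d -> book" sequence:
# download the 'book' collection by its identifier. Note this is a large
# download; 'punkt' alone is enough to fix the tokenizer error.
nltk.download('book')
```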