I'm looking for help with the following error: "LookupError(resource_not_found)" (full traceback below). From my research, this error is thrown when NLTK cannot find a downloaded resource, e.g. punkt.
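For context, my understanding is that nltk.data.find walks a list of candidate directories and raises LookupError when none of them contains the resource. Here is a simplified sketch of that behavior (my own approximation for illustration, not NLTK's actual code):

```python
import os

def find_resource(resource, search_dirs):
    """Return the first existing path for `resource` under `search_dirs`,
    roughly mimicking how nltk.data.find walks its search path."""
    for d in search_dirs:
        candidate = os.path.join(d, resource)
        if os.path.exists(candidate):
            return candidate
    # No directory contained the resource: this is the LookupError case
    raise LookupError(f"Resource {resource!r} not found; searched {search_dirs}")
```

In my case the directory list would be the "Searched in:" list shown in the traceback, starting with /home/my_username/nltk_data.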
2024-12-06 17:40:39,661: Exception on /submit/eh4RJ1TfyR2hilLTKEsxAMrHrrl3tp [POST]
Traceback (most recent call last):
File "/home/my_username/.local/lib/python3.10/site-packages/flask/app.py", line 1473, in wsgi_app
response = self.full_dispatch_request()
File "/home/my_username/.local/lib/python3.10/site-packages/flask/app.py", line 882, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/home/my_username/.local/lib/python3.10/site-packages/flask/app.py", line 880, in full_dispatch_request
rv = self.dispatch_request()
File "/home/my_username/.local/lib/python3.10/site-packages/flask/app.py", line 865, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args) # type: ignore[no-any-return]
File "/home/my_username/my_username_v1/embeddedAppEngine_v1/app.py", line 377, in submit
similarity_results = compare_sd(submitted_details)
File "/home/my_username/my_username_v1/embeddedAppEngine_v1/app.py", line 343, in compare_sd
processed_text1 = preprocess_text(sd)
File "/home/my_username/my_username_v1/embeddedAppEngine_v1/app.py", line 324, in preprocess_text
tokens = word_tokenize(text.lower())
File "/usr/local/lib/python3.10/site-packages/nltk/tokenize/__init__.py", line 129, in word_tokenize
sentences = [text] if preserve_line else sent_tokenize(text, language)
File "/usr/local/lib/python3.10/site-packages/nltk/tokenize/__init__.py", line 106, in sent_tokenize
tokenizer = load(f"tokenizers/punkt/{language}.pickle")
File "/usr/local/lib/python3.10/site-packages/nltk/data.py", line 750, in load
opened_resource = _open(resource_url)
File "/usr/local/lib/python3.10/site-packages/nltk/data.py", line 876, in _open
return find(path_, path + [""]).open()
File "/usr/local/lib/python3.10/site-packages/nltk/data.py", line 583, in find
raise LookupError(resource_not_found)
LookupError:
**********************************************************************
Resource punkt not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('punkt')
For more information see: https://www.nltk.org/data.html
Attempted to load tokenizers/punkt/PY3/english.pickle
Searched in:
- '/home/my_username/nltk_data'
- '/usr/local/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/local/lib/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- ''
**********************************************************************
As you can see, I've already downloaded both the 'punkt' and 'stopwords' resources used in my code:
[nltk_data] Downloading package punkt_tab to
[nltk_data] /home/my_username/nltk_data...
[nltk_data] Unzipping tokenizers/punkt_tab.zip.
[nltk_data] Downloading package stopwords to
[nltk_data] /home/my_username/nltk_data...
[nltk_data] Unzipping corpora/stopwords.zip.
I can verify that the data was downloaded to my home directory ('/home/my_username'):
17:02 ~ $ ls
README.txt nltk_data
17:51 ~ $
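One thing I'm trying to rule out is whether the Flask process computes a different search path than my shell, since NLTK's defaults depend on the process's home directory and sys.prefix. This is my approximation of the default list, inferred from the "Searched in:" output above (not a call into NLTK itself):

```python
import os
import sys

# Approximate NLTK's default data search path (see the "Searched in:" list
# in the traceback); the authoritative list lives in nltk.data.path.
home = os.path.expanduser("~")
default_paths = [os.path.join(home, "nltk_data")] + [
    os.path.join(prefix, sub)
    for prefix in (sys.prefix, "/usr")  # assumption: typical Linux prefixes
    for sub in ("nltk_data", "share/nltk_data", "lib/nltk_data")
]
# The first entry should match the traceback's /home/<user>/nltk_data
print(default_paths)
```

If the web app runs under a different user or virtualenv, its first entry would point at a different home directory than the one I downloaded to.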
Here is my code. Any insight would be much appreciated.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
# preprocess_text is defined earlier in app.py (it calls nltk.word_tokenize)

def compare_sd(submitted_details):
    # Read the training data
    df_trained_data = pd.read_csv('UpdateNotesTraining.csv')
    trained_data = df_trained_data[['Input', 'Output']].values.tolist()
    hs = 0
    for line in trained_data:
        sd = submitted_details
        td = line[0]
        sr = line[1]
        # Preprocess both texts
        processed_text1 = preprocess_text(sd)
        processed_text2 = preprocess_text(td)
        # Create a DataFrame
        df = pd.DataFrame({'text': [processed_text1, processed_text2]})
        # Create TF-IDF vectors
        vectorizer = TfidfVectorizer()
        tfidf_matrix = vectorizer.fit_transform(df['text'])
        # Calculate cosine similarity between the two rows
        cosine_sim = cosine_similarity(tfidf_matrix[0:1], tfidf_matrix[1:2])
        similarity_score = cosine_sim[0][0]
        if similarity_score > hs:
            hs = similarity_score
    return sr, hs
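One workaround I've been considering is a "download on first miss" guard at app startup. Sketched below with injected callables so the sketch doesn't assume NLTK is importable; with real NLTK, the probe would be nltk.data.find and the downloader nltk.download:

```python
def ensure_resource(name, probe, download):
    """Try to locate `name` via `probe`; on LookupError, fetch it with
    `download` and probe again. `probe` and `download` stand in for
    nltk.data.find and nltk.download in this sketch."""
    try:
        return probe(name)
    except LookupError:
        download(name)
        return probe(name)
```

With NLTK, that would look something like ensure_resource('tokenizers/punkt', nltk.data.find, lambda _: nltk.download('punkt')) before the first request, though I'd still like to understand why the downloaded data isn't being found.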