Forums

NLTK Error - "LookupError(resource_not_found)"

I'm looking for help with the following error, "LookupError(resource_not_found)"; the full traceback is below. From some research, it looks like this error is thrown when NLTK cannot find a downloaded resource, e.g. punkt.

2024-12-06 17:40:39,661: Exception on /submit/eh4RJ1TfyR2hilLTKEsxAMrHrrl3tp [POST]
Traceback (most recent call last):
  File "/home/my_username/.local/lib/python3.10/site-packages/flask/app.py", line 1473, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/my_username/.local/lib/python3.10/site-packages/flask/app.py", line 882, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/my_username/.local/lib/python3.10/site-packages/flask/app.py", line 880, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/my_username/.local/lib/python3.10/site-packages/flask/app.py", line 865, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
  File "/home/my_username/my_username_v1/embeddedAppEngine_v1/app.py", line 377, in submit
    similarity_results = compare_sd(submitted_details)
  File "/home/my_username/my_username_v1/embeddedAppEngine_v1/app.py", line 343, in compare_sd
    processed_text1 = preprocess_text(sd)
  File "/home/my_username/my_username_v1/embeddedAppEngine_v1/app.py", line 324, in preprocess_text
    tokens = word_tokenize(text.lower())
  File "/usr/local/lib/python3.10/site-packages/nltk/tokenize/__init__.py", line 129, in word_tokenize
    sentences = [text] if preserve_line else sent_tokenize(text, language)
  File "/usr/local/lib/python3.10/site-packages/nltk/tokenize/__init__.py", line 106, in sent_tokenize
    tokenizer = load(f"tokenizers/punkt/{language}.pickle")
  File "/usr/local/lib/python3.10/site-packages/nltk/data.py", line 750, in load
    opened_resource = _open(resource_url)
  File "/usr/local/lib/python3.10/site-packages/nltk/data.py", line 876, in _open
    return find(path_, path + [""]).open()
  File "/usr/local/lib/python3.10/site-packages/nltk/data.py", line 583, in find
    raise LookupError(resource_not_found)
LookupError: 
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')

  For more information see: https://www.nltk.org/data.html

  Attempted to load tokenizers/punkt/PY3/english.pickle
  Searched in:
    - '/home/my_username/nltk_data'
    - '/usr/local/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - ''
**********************************************************************

As you can see, I've already downloaded both the 'punkt' and 'stopwords' resources used in my code.

[nltk_data] Downloading package punkt_tab to
[nltk_data]     /home/my_username/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.
[nltk_data] Downloading package stopwords to
[nltk_data]     /home/my_username/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.

I can validate that the data is downloaded in my home directory ('/home/my_username').

17:02 ~ $ ls
README.txt  nltk_data
17:51 ~ $
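
For what it's worth, the "Searched in" list in the traceback is nltk.data.path, so the resource can also be checked directly from the same Python environment the app runs in:

import nltk
print(nltk.data.path)               # the same directory list shown in the traceback
nltk.data.find('tokenizers/punkt')  # raises LookupError if 'punkt' can't be resolved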

Here is my code. Any insight would be much appreciated.

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def compare_sd(submitted_details):
    # Read the training data and keep the Input/Output columns
    df_trained_data = pd.read_csv('UpdateNotesTraining.csv')
    trained_data = df_trained_data[['Input', 'Output']].values.tolist()
    hs = 0          # highest similarity score seen so far
    best_sr = None  # Output paired with that highest score

    for line in trained_data:
        sd = submitted_details
        td = line[0]
        sr = line[1]

        # Preprocess texts
        processed_text1 = preprocess_text(sd)
        processed_text2 = preprocess_text(td)

        # Create a DataFrame holding both processed texts
        df = pd.DataFrame({'text': [processed_text1, processed_text2]})

        # Create TF-IDF vectors
        vectorizer = TfidfVectorizer()
        tfidf_matrix = vectorizer.fit_transform(df['text'])

        # Calculate cosine similarity between the two vectors
        cosine_sim = cosine_similarity(tfidf_matrix[0:1], tfidf_matrix[1:2])

        similarity_score = cosine_sim[0][0]

        # Track the Output of the best match, not just the last row seen
        if similarity_score > hs:
            hs = similarity_score
            best_sr = sr
    return best_sr, hs
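
For context, preprocess_text (the function in the traceback) lowercases the text, tokenizes it with word_tokenize, and strips stopwords. Roughly, as a simplified sketch (the exact filtering below is illustrative):

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

def preprocess_text(text):
    # Lowercase and tokenize; this is the word_tokenize call in the traceback
    tokens = word_tokenize(text.lower())
    # Drop English stopwords and punctuation (details assumed for illustration)
    stop_words = set(stopwords.words('english'))
    filtered = [t for t in tokens if t.isalnum() and t not in stop_words]
    return ' '.join(filtered)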

If you look in the directory /home/my_username/nltk_data/tokenizers/, do you see the downloaded punkt directory that it appears to be looking for?
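
e.g. ls /home/my_username/nltk_data/tokenizers/ from a console. Per the traceback, it's specifically trying to load tokenizers/punkt/PY3/english.pickle from one of the searched directories, so a punkt subdirectory (not just punkt_tab) needs to be there.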

....oh my word... I spent 2 hours on this before posting... You are right, I downloaded "punkt_tab" instead of just "punkt". Thank you very much.
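
For anyone who lands on this thread later, the fix on this NLTK version is exactly what the error message says:

import nltk
nltk.download('punkt')

(Newer NLTK releases have moved sent_tokenize over to the punkt_tab package, which is presumably how I ended up downloading that one instead; the traceback shows this install wants the older punkt pickle.)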

Glad to hear that you found it!