Using NLTK in a file

Hi, everyone.

I'm new to PA and trying to figure out how this works. Please excuse my ignorance because I suspect this is a basic question.

If I start up a Python shell, I can use NLTK with no trouble. Specifically, I did

import nltk
nltk.download()
from nltk.book import *
text1.concordance("monstrous")

At the download step, I chose the "book" option for everything that goes with the NLTK book. This approach worked fine.

Now, moving over to a file, I'm trying to do the same thing but contained within nltk.py rather than from the python shell. So something like

import nltk

fname = raw_input('Enter file name: ')
fh = open(fname)

print fh.concordance("the")

But this method doesn't seem to connect to the NLTK commands, and I also don't know how to point it to the texts (text1, text2, etc.) that were downloaded with the first approach.

Any help would be much appreciated!

Hi, when you say it doesn't connect to the NLTK commands, what error are you seeing?

Thanks for your question.

If I run the code above, I get "AttributeError: 'file' object has no attribute 'concordance'." That means it's not finding the NLTK commands, I think?

That's when I enter a filename that I've uploaded myself. If I enter "text1" as the filename, as I would in the python shell version after downloading the sample files, I get "No such file or directory: 'text1'." In this case, I understand the error, in that I know I haven't gotten the text1 file ready for the program to use, but I don't know how to solve the problem.

Hi there, are you sure you're using the same version of Python in the file as in the console? There are some tips here: http://help.pythonanywhere.com/pages/SaveAndRunPythonVersion

Following those instructions, I changed the file to 2.7, which is what I am using in the console, but I get the same error.

Another clue? In the file, the "import nltk" line has a warning that "nltk is imported but not used." It seems not to be connecting the concordance command to nltk.

Ah, I think the line you're missing is this one:

from nltk.book import *

import * is a convenient shorthand in Python, but it's generally discouraged when you're actually writing scripts, because it's hard to tell what gets imported.

It might be better to use something like:

from nltk.book import text1

Also: don't name your file nltk.py, because that will confuse the import system. When Python runs your script, it searches the script's own directory first, so "import nltk" picks up your file instead of the installed nltk package.
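If you want to see that lookup order for yourself, here is a minimal sketch. It doesn't touch nltk at all: it uses the standard library's colorsys module as a stand-in, and both the function name and the choice of colorsys are just mine for illustration. It drops an empty colorsys.py into a fresh folder, puts that folder first on sys.path (which is where Python puts the folder of the script being run), and reports which file the import actually loads.

```python
import os
import shutil
import sys
import tempfile


def import_with_shadow(module_name):
    """Create an empty <module_name>.py in a fresh folder, put that folder
    first on sys.path, import the name, and return (local_file, loaded_file)
    so the caller can see which file won."""
    script_dir = os.path.realpath(tempfile.mkdtemp())
    local_file = os.path.join(script_dir, module_name + ".py")
    with open(local_file, "w") as handle:
        handle.write("# stand-in for a script named after a library\n")

    cached = sys.modules.pop(module_name, None)  # forget any earlier import
    sys.path.insert(0, script_dir)
    try:
        loaded = __import__(module_name)
        loaded_file = os.path.realpath(loaded.__file__)
    finally:
        # Undo everything so later imports see the real module again.
        sys.path.remove(script_dir)
        sys.modules.pop(module_name, None)
        if cached is not None:
            sys.modules[module_name] = cached
        shutil.rmtree(script_dir, ignore_errors=True)
    return local_file, loaded_file
```

Calling import_with_shadow("colorsys") should hand back the same path twice, meaning the local file was loaded rather than the real module. For your actual script, a quicker check is to print nltk.__file__ right after "import nltk": if it points into your own folder, your file is shadowing the package. On 2.7 it is also worth deleting any leftover nltk.pyc next to your script after you rename it, since Python 2 will happily import from that too.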

Finally, that fh.concordance bit isn't going to work: fh is a plain Python file object, and concordance is a method of NLTK's Text objects (like text1). You may need to read a couple more NLTK tutorials to understand how to use NLTK on a file that you're loading from disk.
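To give you a starting point, here is a rough sketch of one way to get from a file on disk to a concordance. The idea is that open() only gives you a file object, so you read the contents, split them into tokens, and wrap the token list in nltk.Text, which is the kind of object text1 is. The function names are made up for this example, I'm assuming the file is UTF-8, and the whitespace split is a deliberately crude tokenizer.

```python
import io


def load_tokens(path):
    """Read a text file and split it into whitespace-separated tokens."""
    # io.open accepts an encoding on both Python 2.7 and 3.
    with io.open(path, encoding="utf-8") as handle:
        return handle.read().split()


def show_concordance(path, word):
    """Wrap the file's tokens in nltk.Text and print a concordance for word."""
    # Imported here so load_tokens can be tried without nltk installed.
    import nltk

    text = nltk.Text(load_tokens(path))
    text.concordance(word)  # prints matching lines; it does not return them
```

With a file you've uploaded, something like show_concordance("mytext.txt", "the") should print the matches, where mytext.txt is a placeholder for your own filename. Note that concordance prints its results itself, so you don't need to put "print" in front of it. Because of the crude split, punctuation stays attached to words ("the," won't match "the"); nltk.word_tokenize does a better job but needs extra tokenizer data from nltk.download().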

Thanks a lot. The filename issue was a big part of the problem, I think. I'm not all the way there yet but getting a lot closer.

:) we're here if you get stuck again!