403 httperror in schedule but runs fine in django shell : Forums : PythonAnywhere

Hi,

This simple code works in the Django shell (python manage.py shell), but I get "HTTP Error 403: Forbidden" when I schedule it.

I have searched extensively but can't find a solution.

Here is the code:

import pandas as pd

df = pd.read_html('http://www.domanname.com')

The code runs in the shell without issue. I can also run it in Jupyter without issue. It was running fine with the scheduler a couple of days ago. I now get this error. I have tried changing the Python version and various suggestions on this forum, but it still doesn't work. The folder is also fine, because I can use the same code to connect to the database. It's just the read_html part that throws the 403 error. I have been a paying customer for more than 2 weeks, so whitelisting is not the issue. What else could cause this?

Michael

mcisse | 12 posts | July 28, 2017, 11:38 a.m. | permalink

If you also print out the content body, you can see whether the 403 is due to our proxy (which shouldn't be the case, since paying users should not be going through our proxy) or due to the endpoint (e.g. dominname.com is blocking your access).

conrad | 4232 posts | PythonAnywhere staff | July 28, 2017, 12:13 p.m. | permalink

Is it possible for domain.com to block the scheduler but not block the shell? I can run it in the console without any issue. The problem is with the scheduler.

mcisse | 12 posts | July 28, 2017, 1:09 p.m. | permalink

Sure. There are many ways they could be implementing the 403 that would block it from the task server, but not from a console server.

glenn | 10043 posts | PythonAnywhere staff | July 28, 2017, 2:22 p.m. | permalink

Is there any solution to this? If you work for PA, can you check my scheduled task log file?

mcisse | 12 posts | July 28, 2017, 2:35 p.m. | permalink

I have not been able to make any progress over the last 2 days because of this issue. Maybe some of my settings are off. I was having a timezone issue and tried to fix it by adding a snippet of code here and there, but I have since removed all of that code. Could that cause any issue?

mcisse | 12 posts | July 28, 2017, 2:38 p.m. | permalink

I doubt it could be the timezone, but it's not impossible. As for your log files, they just confirm that you're getting a 403. Have you tried getting the same URL using a different method in your scheduled task (like directly using requests or curl)?

glenn | 10043 posts | PythonAnywhere staff | July 28, 2017, 3:22 p.m. | permalink

I'm not very familiar with the scheduler. How can I put the code itself in the scheduler?

mcisse | 12 posts | July 28, 2017, 3:31 p.m. | permalink

Can you rewrite this code differently for the scheduler?

import pandas as pd

readthis = pd.read_html('http://domainename.com')

mcisse | 12 posts | July 28, 2017, 3:34 p.m. | permalink

So I would suggest using requests to send a request to that domain and printing out the response content to see what it says (e.g. it might have a body that is more descriptive than just the 403 status code).
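
A minimal sketch of that check, offered as an illustration rather than a definitive recipe. It swaps in the standard library's urllib in place of requests so that it needs no extra packages; the same idea works with requests. The function name fetch_status_and_body and the opener parameter are hypothetical names chosen for this example.

```python
import urllib.error
import urllib.request


def fetch_status_and_body(url, timeout=10, opener=urllib.request.urlopen):
    """Return (status_code, body_text) for url, even on a 4xx/5xx answer.

    `opener` is a hypothetical hook so the function can be exercised
    without touching the network; by default it is urllib's urlopen.
    """
    try:
        with opener(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return response.status, body
    except urllib.error.HTTPError as err:
        # An HTTPError still carries the server's response body, which
        # often says who refused the request (a proxy or the site itself).
        body = err.read().decode("utf-8", errors="replace")
        return err.code, body
```

In the scheduled script, something like status, body = fetch_status_and_body('http://www.domanname.com') followed by print(status, body[:500]) would put the first part of the response body into the task log.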

conrad | 4232 posts | PythonAnywhere staff | July 28, 2017, 3:52 p.m. | permalink

Do you always use the same server for the scheduler? If not, it is possible that one server is blocked and the other is not.

mcisse | 12 posts | July 28, 2017, 3:57 p.m. | permalink

Now it is working. I have not changed anything. That is troubling!

mcisse | 12 posts | July 28, 2017, 4 p.m. | permalink

How can I fix this issue if it ever comes up again?

mcisse | 12 posts | July 28, 2017, 4:05 p.m. | permalink

You could catch and handle the exception, and maybe try again when it fails.
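
A rough sketch of that retry idea, assuming the failure surfaces as urllib.error.HTTPError (which matches the "HTTP Error 403: Forbidden" message quoted earlier in the thread). The helper name retry_on_http_error and its parameters are hypothetical, and the attempt count and delay are arbitrary starting points.

```python
import time
import urllib.error


def retry_on_http_error(fetch, attempts=3, delay_seconds=60, sleep=time.sleep):
    """Call fetch() and return its result, retrying when it raises HTTPError.

    `fetch` is any zero-argument callable; `sleep` is injectable so the
    helper can be tested without actually waiting.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except urllib.error.HTTPError as err:
            if attempt == attempts:
                # Out of retries: re-raise so the task log shows the failure.
                raise
            print(f"Attempt {attempt} failed with HTTP {err.code}; "
                  f"retrying in {delay_seconds}s")
            sleep(delay_seconds)
```

With pandas imported as pd, the scheduled script could then call df = retry_on_http_error(lambda: pd.read_html('http://www.domanname.com')). If every attempt fails, the last exception is raised unchanged, so a persistent block still shows up in the log.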

glenn | 10043 posts | PythonAnywhere staff | July 28, 2017, 4:44 p.m. | permalink

Will try that.

Thanks, Glenn!

mcisse | 12 posts | July 28, 2017, 7:11 p.m. | permalink