
403 httperror in schedule but runs fine in django shell

Hi,

This simple code works in the Django shell (python manage.py shell), but I get "HTTP Error 403: Forbidden" when I schedule it.

I have searched extensively but can't find a solution.

Here is the code:

import pandas as pd

df = pd.read_html('http://www.domanname.com')

The code runs in the shell without issue. I can also run it in Jupyter without issue. It was running fine with the scheduler a couple of days ago, but I now get this error. I have tried changing the Python version and various suggestions on this forum, and it still doesn't work. The folder is also fine, because I can use the same code to connect to the database. It's just the read_html part that throws the 403 error. I have been a paying customer for more than 2 weeks, so whitelisting is not the issue. What else could cause this?

Michael

If you also print out the response body, you can see whether the 403 comes from our proxy (which shouldn't be the case, since paying users do not go through our proxy) or from the endpoint (e.g. dominname.com is blocking your access).

Is it possible for domain.com to block the scheduler but not block the shell? I can run it in the console without any issue. The problem is with the scheduler.

Sure. There are many ways they could be implementing the 403 that would block it from the task server, but not from a console server.

Is there any solution to this? If you work for PA, can you check my scheduled task log file?

I have not been able to make any progress over the last 2 days because of this issue. Maybe some of my settings are off. I was having a timezone issue and tried to fix it by adding snippets of code here and there, but I have since removed all of them. Could that cause any issue?

I doubt it could be the timezone, but it's not impossible. As for your log files, they just confirm that you're getting a 403. Have you tried getting the same URL using a different method in your scheduled task (like directly using requests or curl)?

I'm not very familiar with the scheduler. How can I put the code itself in the scheduler?

Can you rewrite this code differently for the scheduler?

import pandas as pd

readthis = pd.read_html('http://domainename.com')

So I would suggest using requests to send a request to that domain and printing out the response content to see what it says (e.g. it might have a body that is more descriptive than just the 403 status code).
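As a rough illustration of that suggestion, here is a minimal sketch that fetches a URL and returns the status code together with the response body, even when the server answers with an error such as 403. It uses the standard library's urllib instead of requests so it needs no extra package; with requests, the same information is available as response.status_code and response.text. The function name fetch_with_diagnostics and the opener parameter are made up for this example.

```python
import urllib.error
import urllib.request


def fetch_with_diagnostics(url, opener=urllib.request.urlopen, timeout=30):
    """Return (status_code, body_text) for url.

    On an HTTP error (e.g. 403) the error's status and body are returned
    instead of raising, so the body can be inspected in the task log.
    `opener` is a hypothetical hook so the function can be tried offline.
    """
    try:
        with opener(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return response.status, body
    except urllib.error.HTTPError as err:
        # HTTPError also behaves like a response, so its body can be read.
        body = err.read().decode("utf-8", errors="replace")
        return err.code, body


# Example use inside the scheduled script (URL is a placeholder):
# status, body = fetch_with_diagnostics("http://www.example.com")
# print(status)
# print(body[:500])
```

If the printed body mentions a proxy or a whitelist, the block is probably on the hosting side; if it looks like a page from the target site, the site itself is likely refusing the request.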

Do you always use the same server for the scheduler? If not, it is possible that one server is blocked and the other is not.

Now it is working. I have not changed anything. That is troubling!

How can I fix this issue if it ever comes up again?

You could catch and handle the exception, and maybe try again when it fails.
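A minimal sketch of that idea follows. It assumes the 403 surfaces as urllib.error.HTTPError, which matches the "HTTP Error 403: Forbidden" message quoted earlier in the thread. The helper name retry_on_http_error, the number of attempts, and the delay are all illustrative choices, not anything specific to the scheduler.

```python
import time
import urllib.error


def retry_on_http_error(func, attempts=3, delay_seconds=60, sleep=time.sleep):
    """Call func() and return its result.

    If func raises an HTTPError, wait delay_seconds and try again,
    up to `attempts` calls in total. The last error is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except urllib.error.HTTPError as err:
            print(f"Attempt {attempt} failed: HTTP {err.code} {err.reason}")
            if attempt == attempts:
                raise  # out of retries; let the task log record the failure
            sleep(delay_seconds)


# Example use with the code from the question (URL is a placeholder):
# import pandas as pd
# tables = retry_on_http_error(lambda: pd.read_html("http://www.example.com"))
```

Retrying only helps if the block is intermittent, as it appears to have been here. If every attempt fails, the error is still raised, so the failure stays visible in the scheduled task log.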

Will try that.

Thanks, Glenn!