Hi there, this site was so unbelievably cool I just had to sign up for an account and then try to find some tasks that justify my (work) time spent here...
Actually, I was looking for a quick way of logging response times (and downtimes) for some new web mapping services our agency has established. Without going into much detail, let's just say our IT department doesn't exactly embrace either open source or free data service policies. Our WMS geoserver is firmly locked behind a firewall (which is good) and then exposed through some XML gateway (which I'm highly sceptical about, both in terms of overhead (response time) and stability). So, I want to log the uptime, response codes (such as any occurrence of 401 Unauthorized) and response times from somewhere outside our corporate network. PythonAnywhere seems like a cool place to do that.
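To make the goal concrete, here's the kind of thing I'm hoping to schedule: hit the service, record the status code and elapsed time, and append a line to a log file. This is only a rough sketch with placeholder names (MONITOR_URL and uptime_log.csv are made up, not our real endpoint):

import time
import datetime
import requests

# Placeholder endpoint -- the real WMS sits behind our XML gateway.
MONITOR_URL = 'https://example.org/geoserver/wms?SERVICE=WMS&REQUEST=GetCapabilities'
LOG_FILE = 'uptime_log.csv'

def check_once():
    started = time.time()
    stamp = datetime.datetime.utcnow().isoformat()
    try:
        r = requests.get(MONITOR_URL, timeout=30)
        status = r.status_code                      # e.g. 200, 401, 503 ...
    except requests.exceptions.RequestException as exc:
        status = 'DOWN (%s)' % exc.__class__.__name__
    elapsed = time.time() - started                 # response time in seconds
    with open(LOG_FILE, 'a') as log:
        log.write('%s;%s;%.3f\n' % (stamp, status, elapsed))

if __name__ == '__main__':
    check_once()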
I've been reading and tinkering, and through these forums I've learned that the "requests" module is there to save me from the utter hell of using urllib2. Cool.
Anyway, I'm puzzled about why r = requests.get( "pythonanywhere.com/terms" ) returns a 501 (Not Implemented) HTTP response code. urllib2, wget and curl have no problems retrieving that. Changing the URL to somewhere else on the whitelist (such as wikipedia.org) produces the opposite effect: both curl and wget produce a "403 (Forbidden)" response from the proxy server. urllib2 also fails, I guess for the same reason (I have not bothered investigating the arcane API of urllib2 to get the actual response codes). However, the requests module retrieves wikipedia.org with flying colours....
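That said, from what I've skimmed the urllib2 status codes should be reachable after all: urlopen() raises an HTTPError for non-2xx responses, and the error object itself carries the code. Something like this (untested sketch):

import urllib2

try:
    response = urllib2.urlopen('https://www.pythonanywhere.com/terms/')
    print(response.getcode())
except urllib2.HTTPError as e:
    # HTTPError is raised for 401, 403, 5xx and friends; the status sits on the error
    print(e.code)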
I guess this has something to do with proxy setup and behaviour for non-paying clients. While I scrutinize the backup hard disk in my attic for my PayPal details (it's been a while and a computer change since I last used PayPal...), it would be nice if someone could shed some light on this.
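In case it is the proxy, I assume the fix is to point requests (and curl/wget) at whatever proxy the free accounts go through. The sketch below assumes the proxy is published via the standard http_proxy/https_proxy environment variables (that's an assumption on my part; requests normally picks those up by itself anyway):

import os
import requests

# Assumption: the free-account proxy address lives in the usual environment
# variables. Passing it explicitly just makes the behaviour visible.
proxies = {}
for scheme in ('http', 'https'):
    value = os.environ.get(scheme + '_proxy')
    if value:
        proxies[scheme] = value

r = requests.get('https://www.pythonanywhere.com/terms/', proxies=proxies)
print(r.status_code)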
Code example:
import requests
import urllib2
import subprocess
mylink = 'https://www.pythonanywhere.com/terms/'
# mylink = 'http://wikipedia.org/'
r = requests.get( mylink )
print 'REQUESTS HTTP response code for the url ', mylink, ' => ', r.status_code
# urllib2.urlopen() raises HTTPError for responses like 401 or 403, so catch it
# explicitly to get at the actual status code instead of swallowing everything.
try:
    response = urllib2.urlopen(mylink)
    print 'URLLIB2 HTTP Response code for the url ', mylink, response.getcode()
except urllib2.HTTPError as e:
    print 'URLLIB2 HTTP error code for the url ', mylink, ' => ', e.code
except urllib2.URLError as e:
    print 'Could not retrieve url ', mylink, ' using URLLIB2: ', e.reason
print 'Testing using wget..'
subprocess.call(["wget", '-O', 'wgetresult', '-S', mylink])
print 'Testing using curl...'
subprocess.call(["curl", '--head', mylink])