
Website is redirected from an unknown domain name

My website is www.tvseriesonline.in. Today I got a message from Google that my website is being redirected by an unknown website. For example, if I enter the domain name http://www.lastdaysofmanonearth.com in a browser, it shows the exact content of http://www.tvseriesonline.in/, which is weird, and I am pretty sure that I do not own the domain www.lastdaysofmanonearth.com. I believe that because of this my actual traffic is way lower than usual. Below are a few other domain names that show the exact same content: http://pokerjitu.com/ and hydroxel.com.

Please help me understand why this is happening and how I can resolve it.

That must be really annoying! I've taken a look at the sites that are copying your content, and they're using CloudFlare to mask where their site is hosted. That means there are two possibilities that come to mind (not an exhaustive list, just the things I can think of):

  • They've set up their domain using CloudFlare so that when they receive a request, it's sent to CloudFlare, and CloudFlare are querying your site and presenting the results.
  • They've set up their domain on some other web host, and are scraping your site for content and then serving the content from their own server via CloudFlare.

Now, CloudFlare are a respectable company and probably won't want to be associated with content theft like this. So if you contact them -- there will be some way to do that on their site -- then they may well be able to help.

Can we restrict it from PythonAnywhere? I believe it's not just web scraping, because the site is exactly the same: the layout, the CSS styles, and even the content that comes from my backend Python script. Can CloudFlare copy this level of content? From the server log I found one particularly suspicious IP, 138.201.19.77, which is frequently accessing the site. I tried to block it using the code below, but the attempt was unsuccessful.

@app.before_request
def limit_remote_addr():
    # Build the list of blocked IPs (one per line in the file) on every request
    block_ip = []
    with open("/home/atish9937/.virtualenvs/myvirenv1/webscraping/block_ip.txt", "r") as s:  # Contains the mentioned IP
        for i in s.readlines():
            block_ip.append(i.strip())
    if request.remote_addr in block_ip:
        abort(403)  # Forbidden

Is it feasible to block the IP from PythonAnywhere? Any suggestions on how I can block it?


Try using request.headers['X-Real-IP'] instead of request.remote_addr -- there's more information on this help page.
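A minimal sketch of that change, assuming the web app sits behind a proxy that puts the original client address in the X-Real-IP header, so request.remote_addr only ever shows the proxy's address. The helper name client_ip and the fallback address are hypothetical; in the Flask hook you would pass request.headers and request.remote_addr.

```python
def client_ip(headers, remote_addr):
    """Return the address from X-Real-IP, falling back to remote_addr if the header is absent."""
    return headers.get("X-Real-IP", remote_addr)


# Hypothetical use inside the existing Flask hook (not executed here):
#
# @app.before_request
# def limit_remote_addr():
#     if client_ip(request.headers, request.remote_addr) in block_ip:
#         abort(403)  # Forbidden

if __name__ == "__main__":
    # A plain dict stands in for request.headers in this standalone check.
    print(client_ip({"X-Real-IP": "138.201.19.77"}, "10.0.0.1"))
```

Using .get with a fallback rather than indexing the header directly means the check does not raise an error if the header is ever missing, for example when running the app locally.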

Thank you. I modified the code and it's now giving a 403 error for that IP in the server log. But a lot of requests are still coming from the same IP, which is killing the traffic, I guess. I will try changing the domain name and see if that works.

Maybe you could speed things up by only reading the blocked IP list once at web app startup? Of course, that would mean that you'd need to reload it every time you added an IP -- but it would be more efficient, so you could reject hits with less of an impact on app performance.

I am sorry, I could not understand the first line. Could you please explain how I can achieve that?

In your limit_remote_addr function, you're reading and parsing your block file for every request. Instead, read the file at import time so that it only gets read once by each web worker. The easiest way is probably to put the list into a global variable and use that in the function.
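Something along these lines might work as a starting point. It is only a sketch: the names BLOCK_FILE, load_blocked_ips, BLOCKED_IPS and is_blocked are made up for illustration, and the file path is a placeholder for wherever your block file lives. The Flask hook is shown in comments so the helpers can be run on their own.

```python
from pathlib import Path

# Hypothetical location; point this at your own block file.
BLOCK_FILE = Path("block_ip.txt")


def load_blocked_ips(path):
    """Read one IP per line into a set; a missing file gives an empty set."""
    path = Path(path)
    if not path.exists():
        return set()
    with path.open() as f:
        # Skip blank lines and strip surrounding whitespace.
        return {line.strip() for line in f if line.strip()}


# Module-level (global) value: filled once per web worker, at import time.
BLOCKED_IPS = load_blocked_ips(BLOCK_FILE)


def is_blocked(ip, blocked=None):
    """Set membership test; no file access happens per request."""
    return ip in (BLOCKED_IPS if blocked is None else blocked)


# Hypothetical use inside the Flask hook (not executed here):
#
# @app.before_request
# def limit_remote_addr():
#     if is_blocked(request.headers.get("X-Real-IP", request.remote_addr)):
#         abort(403)  # Forbidden
```

A set is used instead of a list because membership checks on a set do not slow down as the block list grows. As mentioned earlier in the thread, the web app has to be reloaded after the file changes, since the file is only read at import time.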