Gigabot Gigablast Downtime

September 29, 2005 by Tom M
Filed under: General 

All the downtime of this server appears to have been caused by Gigablast’s search crawler hammering my site. It was hitting pages at a phenomenal rate, from a large range of source IP addresses, so I’ve now blocked the IP range it’s using (64.62.*.*) by adding the following line to my .htaccess file:

deny from 64.62.

Yes, that’s denying access to all traffic which originated from any IP address starting 64.62 - it’s not a typo!
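For anyone unsure how much that partial address covers: Apache reads a partial IP like this as a match on the leading octets, so the rule should be equivalent to blocking the 64.62.0.0/16 network. Here is a small Python sketch for checking whether a given address falls inside it. The function name and the sample addresses are made up for illustration.

```python
import ipaddress

# "deny from 64.62." should cover the same addresses as 64.62.0.0/16.
BLOCKED = ipaddress.ip_network("64.62.0.0/16")


def is_blocked(addr: str) -> bool:
    """Return True if addr falls inside the blocked range."""
    return ipaddress.ip_address(addr) in BLOCKED


if __name__ == "__main__":
    print(BLOCKED.num_addresses)       # 65536 addresses in a /16
    print(is_blocked("64.62.168.10"))  # True: starts with 64.62
    print(is_blocked("64.63.0.1"))     # False: different second octet
```

That is a lot of addresses to shut out for the sake of one crawler, so it is worth checking your logs for legitimate visitors in the same range before copying the rule.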

Since I did so, the server loads have plummeted and the machine seems stable again. I advise everyone else to do the same.

It is still trying to hammer the server, but the new .htaccess rule is limiting its effect on the server loads, as each request is now turned away by the web server rather than causing a PHP process and several database retrievals. It’s still trying to retrieve pages every few seconds, and it’s ignored all changes to robots.txt and the META tags on these pages, both of which they claim it respects. A week of stress and downtime for an entire shared server and all the sites on it (which includes www.coasterclub.org, www.themeparks.ie, and www.ratscoasters.co.uk amongst others), all caused by one little crawler indexing just one site…

I only changed the robots.txt to exclude the “gigabot” once I worked out that was the cause of the huge increase in traffic - however it appears that their robot doesn’t re-read the robots.txt very often (if at all). First symptoms were in my photo gallery, which displays the last viewed images at the bottom of the page - I noticed something was hitting multiple galleries very fast. At the time I assumed someone was downloading my photo galleries - but it was gigabot trawling and retrawling the same pages over and over again.
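For reference, this is roughly what a well-behaved crawler should conclude from such an exclusion. The sketch below uses Python’s standard urllib.robotparser. The robots.txt content, the user-agent strings and the path are assumptions for illustration, not the actual file from this site.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: shut out Gigabot, allow everyone else.
ROBOTS_TXT = """\
User-agent: Gigabot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, url: str) -> bool:
    """Return True if a compliant robot named agent may fetch url."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(allowed("Gigabot/2.0", "/gallery/"))   # False
    print(allowed("SomeOtherBot", "/gallery/"))  # True
```

Of course this only helps if the robot re-reads robots.txt and honours it, which is exactly what did not seem to happen here.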

Gigablast - you should be ashamed of yourselves, this is tantamount to a denial of service attack! This little bug put so much load on my site that it brought the entire shared server down in a very short time.

If you use Gigablast for your searches, please stop - it’s having a devastating effect on servers all over the net.

Comments

3 Responses to “Gigabot Gigablast Downtime”

  1. James on September 30th, 2005 9:59 am

    There’s a few lessons in here for anybody who’s writing a robot:

    1. Before you release it into the wild, test it to make sure it works as you’d expect.
    2. Make sure your robot sleeps between page retrievals from the same domain to prevent hammering the target’s bandwidth.
    3. Make sure your robot leaves contact details - at least an e-mail address you can be contacted at.
    4. NEVER leave your robot to run unattended.

    It’s not hard to write a robot - but some people manage to make it look like it is!
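Lessons 2 and 3 in the list above can be sketched in a few lines of Python. This is only a minimal illustration, not how any real crawler is implemented: the class name, the 5 second delay and the user-agent string with its contact address are all hypothetical.

```python
import time
from urllib.parse import urlparse

# Hypothetical identifier: a real robot would give its own name and contact address.
USER_AGENT = "ExampleBot/0.1 (+mailto:botmaster@example.com)"


class DomainThrottle:
    """Enforce a minimum delay between requests to the same host."""

    def __init__(self, min_delay=5.0, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay  # seconds between hits on one host (arbitrary choice)
        self._clock = clock
        self._sleep = sleep
        self._last_hit = {}  # host -> time of the previous request

    def wait(self, url):
        """Sleep if needed before fetching url; return the seconds slept."""
        host = urlparse(url).netloc
        now = self._clock()
        last = self._last_hit.get(host)
        pause = 0.0
        if last is not None:
            pause = max(0.0, self.min_delay - (now - last))
        if pause > 0:
            self._sleep(pause)
        # Record when this request is actually made (after any sleep).
        self._last_hit[host] = now + pause
        return pause
```

A robot would call wait(url) before every request and send USER_AGENT in its request headers, so that a site owner reading the logs knows who to contact.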

  2. Tom on September 30th, 2005 10:55 am

    This one was running from at least 20 different IP addresses - and was making multiple page requests per second, from each of them.

    This robot did leave its “home” (gigablast.com) in its identifier tag - but part of the challenge was keeping the server running long enough whilst it was under attack to actually read the log!

    I’ve now blocked the entire IP range that Gigablast uses - and have done the same on the ECC website - I’d advise anyone else who doesn’t want to go through all this to do the same.

  3. Tom on January 23rd, 2006 8:18 am

    Gigablast has reappeared in my server logs, and it looks like they’ve got a new IP range… so I’ve got a new block:

    deny from 66.154.
