Bandwidth Problems? Search Engine Spiders May Be the Culprit
Recently, we’ve had a couple of clients come to us with a big problem: their websites were suddenly spiking in server load (which hosting companies typically measure as bandwidth and CPU seconds), causing site slowdowns. In some cases, their hosting companies were even taking their sites down completely until they either remedied the issue or coughed up significantly more money for hosting packages (we’re talking from $12 a month to $80!).
A spike in server traffic might not be a bad thing, if it were accompanied by a spike in page views. That would mean that your website went viral, or you launched a new product, and lots of people were paying attention. But for these clients, there was no corresponding spike in page views. They were being dinged for high bandwidth usage without getting the benefits of more traffic, more website visitors, and more paying customers.
So what was actually happening? One of our project managers and website gurus, Cyndi Fleming-Alton, did some sleuthing and was able to find the problem (as well as how to fix it). You probably know that major search engines like Google and Bing have automated processes – often called “spiders”, “crawlers”, or “bots” – to help them essentially take inventory of the internet.
These spiders check websites constantly, sometimes thousands of times a day, to ensure that the search engine always has up-to-date content to provide in its search results. That’s incredibly valuable if you’re CNN.com or FoxNews.com and need to have breaking news updates appearing in search results pretty much instantly. But if you’re a typical small or medium business owner who is only updating your site once a day or so, it may well be overkill.
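One way to check whether crawlers really are behind a bandwidth spike is to total the bytes served per visitor type from your server's access log. The sketch below is a minimal illustration, assuming your host gives you logs in the common "combined" format; the sample lines, the `BOT_MARKERS` list, and the function name `bytes_by_client` are made up for the example, not part of any tool mentioned here.

```python
import re
from collections import Counter

# Hypothetical sample lines in the "combined" access log format.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.5 - - [10/Oct/2023:13:55:40 +0000] "GET /about HTTP/1.1" 200 2048 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '198.51.100.9 - - [10/Oct/2023:13:56:01 +0000] "GET / HTTP/1.1" 200 1024 "-" '
    '"Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
    '192.0.2.44 - - [10/Oct/2023:13:57:12 +0000] "GET /blog HTTP/1.1" 200 3000 "-" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]

# Pulls the status code, response size, and user agent out of one log line.
LINE_RE = re.compile(
    r'"[A-Z]+ \S+ [^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-) "[^"]*" "(?P<agent>[^"]*)"'
)

# Assumption: substrings that identify the crawlers we care about.
BOT_MARKERS = ("googlebot", "bingbot")


def bytes_by_client(lines):
    """Return total bytes served, grouped by crawler name or 'other'."""
    totals = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        size = 0 if match["bytes"] == "-" else int(match["bytes"])
        agent = match["agent"].lower()
        label = next((bot for bot in BOT_MARKERS if bot in agent), "other")
        totals[label] += size
    return totals


if __name__ == "__main__":
    for client, total in bytes_by_client(SAMPLE_LOG).most_common():
        print(f"{client}: {total} bytes")
```

If the crawler totals dwarf the "other" total, that points to bots rather than human visitors as the source of the load. Keep in mind that a user agent string can be faked, so treat the result as a hint rather than proof.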
There are a couple of ways to get around this issue. Depending on your goals and the nature of your website, you may need to use one or more of these options to help solve your bandwidth problems.
- Manually adjust Google’s crawl rate
Google checks all the content on your website with the adorably-named “Googlebot”. But while it has a lot of algorithms that are designed to determine the optimal rate to crawl your site without overwhelming your server, it doesn’t always get it perfectly right. If you’re finding that you’re getting tons of server traffic from Google crawling your site, it’s time to set a limit. Google’s Webmaster Tools (now called Google Search Console) allow you to set a maximum crawl rate for your website. Note: these limits are only good for 90 days, so if you notice the problem cropping up again in a couple of months, you’ll have to repeat this step.
- Configure your robots.txt file
A robots.txt file is an instruction file that spiders and bots can check before crawling the rest of your website. You can configure your robots.txt to disallow all bots from checking anything on your site at all; to only exclude them from parts of the server; to allow all bots complete access; and to allow or disallow specific robots. Well-behaved crawlers honor these rules, although badly behaved bots may simply ignore them.
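To make those options concrete, here is a minimal sketch of a robots.txt file, checked with Python's standard `urllib.robotparser` module so you can see how a rule-following bot would read it. The paths, the `example.com` domain, and the bot name `ExampleBadBot` are hypothetical. Note that `Crawl-delay` is a non-standard directive: some crawlers such as Bingbot respect it, while Googlebot ignores it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every bot is kept out of /wp-admin/ and asked to
# wait 10 seconds between requests; one specific bot is blocked entirely.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Crawl-delay: 10

User-agent: ExampleBadBot
Disallow: /
"""


def load_rules(text):
    """Parse robots.txt text and return a ready-to-query parser."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rules = load_rules(ROBOTS_TXT)
    checks = [
        ("Bingbot", "https://example.com/blog/post"),
        ("Bingbot", "https://example.com/wp-admin/options.php"),
        ("ExampleBadBot", "https://example.com/blog/post"),
    ]
    for agent, url in checks:
        print(agent, url, "allowed" if rules.can_fetch(agent, url) else "blocked")
    print("Crawl delay for Bingbot:", rules.crawl_delay("Bingbot"))
```

On a real site, the text in `ROBOTS_TXT` would be saved as a file named robots.txt in the root folder of your website.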
- Check your visitor IP addresses
Finally, you may be getting a large number of bot visits from scammers and spammers overseas. Even if you have a well-configured plug-in suite that takes care of spam comments, the sheer number of visits can seriously impact your server load. If you see a whole lot of IP addresses from places that don’t make sense (for example, non-English speaking countries sending a ton of traffic to your English-only website), you may want to try blocking specific IPs or IP ranges.
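As a rough illustration of that check, the sketch below counts visits per IP address and tests each address against a list of blocked ranges using Python's standard `ipaddress` module. The addresses come from ranges reserved for documentation, and the names `REQUEST_IPS`, `BLOCKED_RANGES`, `busiest_ips`, and `is_blocked` are invented for the example. The actual blocking would normally be done in your server configuration, firewall, or a security plug-in, not in a script like this.

```python
import ipaddress
from collections import Counter

# Hypothetical visitor IPs, e.g. pulled from the first column of an access log.
REQUEST_IPS = [
    "198.51.100.7",
    "198.51.100.7",
    "198.51.100.9",
    "203.0.113.20",
    "192.0.2.44",
]

# Assumption: ranges you have decided to block, written in CIDR notation.
BLOCKED_RANGES = [ipaddress.ip_network("198.51.100.0/24")]


def busiest_ips(ips, top=3):
    """Return the most frequent IP addresses with their visit counts."""
    return Counter(ips).most_common(top)


def is_blocked(ip, ranges=BLOCKED_RANGES):
    """Return True if the IP address falls inside any blocked range."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in ranges)


if __name__ == "__main__":
    for ip, count in busiest_ips(REQUEST_IPS):
        status = "blocked" if is_blocked(ip) else "allowed"
        print(f"{ip}: {count} visits ({status})")
```

Blocking a whole range is a blunt tool, so it is worth confirming that no real customers come from it before you apply the block.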
Feeling a bit overwhelmed by all the bot talk? Don’t worry. If you think the spiders might be running amok on your website, WPBlogsites can help. You can book a call with us right now using our online calendar, or drop us a line via our contact form and we will get in touch within 24 hours to see how we can help.