Activity log
The first thing to do is to check the activity log of your installation. Are you seeing real search traffic? I.e. a random assortment of terms being searched for, from lots of different IP addresses? Or is there a pattern to it? Look for the same terms being searched for again and again, or for searches coming from identical or similar IP addresses.
If you don't see a pattern, then you are indeed dealing with real search traffic, and you have to increase server capacity or look for different solutions for handling search on your site. But if you are seeing a pattern, you are probably dealing with search engine spiders crawling all over your site.
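To make that pattern check less of an eyeball exercise, here is a minimal Python sketch. It assumes you have already pulled (IP address, search term) pairs out of your activity log; the function name `summarize_searches`, the record layout and the sample data are all hypothetical, and grouping IPv4 addresses by their first three octets is just one simple way to catch "similar" addresses.

```python
from collections import Counter

def summarize_searches(records, top_n=3):
    """Summarize (ip, term) pairs taken from the activity log.

    The record layout is an assumption; adapt the extraction step
    to whatever your own log export looks like.
    """
    records = list(records)
    ip_counts = Counter(ip for ip, _ in records)
    # Group IPv4 addresses by their first three octets ("1.2.3.4" -> "1.2.3")
    # so that requests from neighbouring addresses are counted together.
    prefix_counts = Counter(ip.rsplit(".", 1)[0] for ip, _ in records)
    term_counts = Counter(term.lower() for _, term in records)
    return {
        "total": len(records),
        "top_ips": ip_counts.most_common(top_n),
        "top_prefixes": prefix_counts.most_common(top_n),
        "top_terms": term_counts.most_common(top_n),
    }

# Made-up sample data, for illustration only.
sample = [
    ("1.2.3.4", "perl"),
    ("1.2.3.4", "apache"),
    ("1.2.3.5", "perl"),
    ("1.2.3.4", "templates"),
    ("9.8.7.6", "holiday photos"),
]
summary = summarize_searches(sample)
```

If a handful of addresses or prefixes account for most of the total, or the same terms keep coming back, you are most likely looking at a crawler rather than at real visitors.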
What is happening?
Most likely you are publishing blog entries that have lots of different tags associated with them. Each tag will be visible below the published entry as a link. A few hundred entries can easily mean a few thousand tag links on your site. Each one of these links points to the search results page for that tag...
Now imagine Google or Bing crawling your site: thousands of juicy links waiting to be visited... And each one of these links puts load on your server when it is followed... ouch! And the worst part is, the search results pages that are being indexed probably will have changed by the time the first search engine traffic starts arriving on them, so visitors will be frustrated because they are not even finding what they were looking for. Double ouch!
So now what?
Ideally you want search engines indexing your article pages and maybe your archives. They really have no business indexing search results pages. You can try setting up a robots.txt file that tells search engine spiders to stay away from the mt-search.cgi script, or even from the entire cgi-bin/mt folder.
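As a rough illustration, the Python sketch below holds a two-line robots.txt in a string and uses the standard library's `urllib.robotparser` to check what a well-behaved crawler would be allowed to fetch. The `/cgi-bin/mt/` path prefix, the `example.com` URLs and the `allowed` helper are assumptions for the example; use the paths of your own installation.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: keep all crawlers away from the search script.
# To block the whole folder instead, use "Disallow: /cgi-bin/mt/".
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/mt/mt-search.cgi
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(url, agent="Googlebot"):
    """Return True if the rules above let the given crawler fetch the URL."""
    return parser.can_fetch(agent, url)

# Hypothetical URLs: an article page and a tag search.
entry_ok = allowed("https://example.com/2010/05/an-entry.html")
search_ok = allowed("https://example.com/cgi-bin/mt/mt-search.cgi?tag=perl")
```

Keep in mind that robots.txt is only a request. Spiders that respect it will stay away; others will carry on regardless.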
If that doesn't help, simply look up the IP addresses that make the most requests in your activity log, then add lines like these to your Apache configuration:
<Directory /usr/lib/cgi-bin/>
Deny from 1.2.3.4
...
</Directory>
This will just block the search engine spiders from your entire cgi-bin folder, leading to an immediate drop in server load.


hey, thanks for addressing this. we had major problems with mt-search.cgi load -- in fact, it's still not working, as we had to remove it for crashing our (shared) server. I'll see if I can work around the robots.txt and give it a try.
and thanks for the blog! rare place covering the MT world :)
An improvement on using IP addresses, which may change or grow into a very long, difficult-to-manage list, is to use the referrer information in requests. You could do this (see below) in the appropriate config file. It has proved extremely effective in my case.