In some ways, you can never have enough pages in the search index, because every extra page that sneaks in there is a lottery ticket in the search sweepstakes: you’ve got to be in it to win it. So, the more pages you have in the search index, the more chances you have to be found. But clearly there is some number of pages that means you are doing OK and a different number that means trouble (zero, for example, would be bad). How do you figure out how many pages you have in the search index, and how do you know if that number is OK?
First off, you need to understand that there is no single search index. Each search engine, whether Google, Bing, or any of the many others, maintains its own. So, you need to know which search engines are worth worrying about. In the U.S., it’s Google and Bing.
So how do you find out how many pages are in Google’s index and how many are in Bing’s?
Both Google and Bing support a search operator called the “site:” command. In each search engine, just type “site:” followed immediately by your domain name (such as “site:biznology.com”). For some sites, this handy command works just fine and you can see roughly how many pages are stored in each index. If your results look right, great. But sometimes the results just look nuts. For example, “site:ibm.com” yields 2.8 million pages on Bing but a crazy 12.2 million pages on Google.
To avoid such inaccuracies, use each search engine’s Webmaster Tools site. Both Google and Bing will tell your webmaster how many pages are in the index and will even let you know which pages they are having trouble grabbing. It’s possible that the IBM webmaster is aware that there really is a big discrepancy between Google and Bing, which might be just fine or might be something they are working on.
I’ve spoken to a few experts and they have varying theories. One told me that Bing stops crawling when more than 1% of the pages get errors–the Bing Webmaster site will clue you in on this. Another speculated that Bing is only returning counts of pages that get search visits, not every page in their index. No one I spoke with knew for sure why this is happening, but it shows you the importance of checking your numbers.
Likewise, a big swing in indexed pages (1,000 pages indexed in Google today vs. 5,000 yesterday) is a sign that you should investigate. And, in general, an inclusion ratio (pages indexed divided by actual pages) below 70% is something that should give you pause, although with these Bing errors, who knows what a good inclusion ratio is for Bing right now.
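If you track these numbers in a spreadsheet or a script, the two checks above are simple arithmetic. Here is a minimal Python sketch of them. The function names are made up for illustration, the 6,000-page site size in the usage note is hypothetical, and the 50% cutoff for a "big swing" is purely an assumption (only the 70% inclusion ratio comes from the rule of thumb above), so adjust both thresholds to fit your own site.

```python
def inclusion_ratio(indexed_pages: int, actual_pages: int) -> float:
    """Pages indexed divided by the actual number of pages on the site."""
    if actual_pages <= 0:
        raise ValueError("actual_pages must be a positive number")
    return indexed_pages / actual_pages


def index_warnings(
    indexed_today: int,
    indexed_yesterday: int,
    actual_pages: int,
    min_ratio: float = 0.70,  # rule of thumb: below 70% should give you pause
    max_swing: float = 0.50,  # assumed cutoff for a "big swing"; not a standard
) -> list[str]:
    """Return a list of reasons to investigate (empty if nothing looks off)."""
    warnings = []

    ratio = inclusion_ratio(indexed_today, actual_pages)
    if ratio < min_ratio:
        warnings.append(f"inclusion ratio is {ratio:.0%}, below {min_ratio:.0%}")

    # Relative day-over-day change; skipped when there is no baseline.
    if indexed_yesterday > 0:
        swing = abs(indexed_today - indexed_yesterday) / indexed_yesterday
        if swing > max_swing:
            warnings.append(f"indexed pages changed by {swing:.0%} since yesterday")

    return warnings
```

For a hypothetical 6,000-page site, calling `index_warnings(1000, 5000, 6000)` with the counts from the example above returns two warnings: one for the low inclusion ratio and one for the drop from 5,000 to 1,000 indexed pages.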
Regardless, knowing how many pages are indexed is the first step to seeing if you have a problem.