Google Caffeine: Towards a Fresher Google Index

Wednesday, 4 August 2010

Back in December 2009, we wrote a post on the issues that Google faced with real-time search. At that time, the integration of real-time search, and especially of Twitter updates, into Google's search engine results pages showed some weaknesses.

Since then, things have changed: Google has reviewed the way it indexes documents and built a completely new search index called Google Caffeine. On June 8th, Google officially announced and explained Caffeine.

What is new with Caffeine


The original Google search index was built some time ago, when the Internet was much smaller and very different. Over the last couple of years, and even months, the amount of content on the Internet has increased dramatically, and the need for instantaneous, fresh information has put a lot of pressure on the Google search index, resulting in long delays before the latest information is displayed.

The 'old' Google search index was built around layers. Each layer was updated at a different rate (some layers, such as the one for Google News, were updated more frequently than others), but the main layer, or main index, required crawling a very large number of web pages before it could be fully updated. As a result, it could take up to 30 days for newly created pages to be integrated into the index.

Figure: Google Caffeine search index schema

Although Google does not give a lot of information on what is radically different, Caffeine appears to have higher crawling capabilities. The crawling process has also been reviewed: rather than crawling the entire web at once, Caffeine crawls smaller portions and updates the index on a continuous basis, which should improve the freshness of information.
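To make the difference between the two approaches more concrete, here is a toy Python sketch. It is only an illustration and not Google's implementation, which has not been published in this level of detail. The names `IncrementalIndex`, `update`, `search` and `rebuild_batch`, as well as the example URLs, are made up for this post. The sketch assumes a very simple inverted index (word to set of URLs) and shows how a single page can be refreshed on its own, without waiting for every other page to be crawled again.

```python
from collections import defaultdict


class IncrementalIndex:
    """Toy inverted index that can be refreshed one page at a time (hypothetical)."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> set of URLs containing it
        self.terms_by_url = {}            # URL -> words currently indexed for it

    def update(self, url, text):
        """Index (or re-index) a single page, touching only the words that changed."""
        new_terms = set(text.lower().split())
        old_terms = self.terms_by_url.get(url, set())
        # Remove the page from words it no longer contains.
        for term in old_terms - new_terms:
            self.postings[term].discard(url)
            if not self.postings[term]:
                del self.postings[term]
        # Add the page to words that are new for it.
        for term in new_terms - old_terms:
            self.postings[term].add(url)
        self.terms_by_url[url] = new_terms

    def search(self, term):
        """Return the URLs containing the word, in alphabetical order."""
        return sorted(self.postings.get(term.lower(), set()))


def rebuild_batch(pages):
    """Batch style: nothing is searchable until every page has been processed."""
    index = IncrementalIndex()
    for url, text in pages.items():
        index.update(url, text)
    return index


# Incremental style: each page becomes searchable as soon as it is processed.
index = IncrementalIndex()
index.update("example.com/a", "caffeine search index")
index.update("example.com/b", "fresh blog post")
index.update("example.com/a", "caffeine fresh index")  # page /a was edited
print(index.search("fresh"))   # ['example.com/a', 'example.com/b']
print(index.search("search"))  # []
```

In the batch version, a new page only shows up once the whole rebuild has finished, which is roughly the kind of delay described above for the old main index. In the incremental version, the cost of an update depends on the page that changed rather than on the size of the whole collection.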

At this stage, it is still too early to see the difference and measure the real effect of the changes.

Update:
After publishing a few posts since the Google Caffeine release, I have noticed that most of them were indexed within 30 minutes of being published. Not sure if it is because we use Blogger, but it seems to be working quite well so far!

2 Comments:

Clinics Of World said...

Google and other search engines should do more to update things quickly, not just with Caffeine, but with their regular search as well.
When I search for stuff on SEO, I would like to see the freshest content, rather than something that's 2 years old and holds no value today.

Alban G. said...

Hi,

Thanks for your comment. I do agree it is quite frustrating to see outdated content in SERPs.
From experience, all the new posts I published were indexed within 30 minutes of being published, so I must say there is a significant improvement.

To get fresher content in your SERPs, you might want to try the Google functionality on the left-hand side, which lets you sort the content by date (e.g. Last Week).