Google digitizes the world with Google News Archive

Wednesday, 10 September 2008

Earlier this week, Google officially announced on its blog a new initiative to digitize old newspapers and make their content searchable through the Google News Archive program.

As mentioned on its blog, more than 200 years of quality content has been distributed in print and is not available online. This represents billions of printed pages whose content currently cannot be searched. The amount of content on the internet grows every year, but Google does not seem to want to wait for someone else to put this material online, so it has decided to do it itself.
As part of this program, Google is partnering with newspaper publishers to get access to their archives, "digitize" (scan) the content, and make it indexable and searchable using OCR. All this at no cost to the publishers!

Such an initiative is great for newspaper publishers that want to increase the reach of their content and monetize it! Each piece of content will be able to rank in Google News Archive. Each page will be served contextual ads as part of the AdSense program, and each click will bring revenue to the publisher! The downside, however, is that publishers lose the "hosting" of the content. Google does all the work for free, but hosts the content on its own servers and therefore gets all the traffic and part of the advertising revenue (fair enough!).

However, two hypothetical issues are search ranking cannibalization and duplicate content. If a publisher decides to publish online one of its old articles that is already in the Google News Archive, who will be penalised for duplicate content? Or will anyone be penalised at all?
Moreover, Google News Archive pages are highly likely to rank well for the keywords contained in the digitized articles, which could at some point "compete" with existing content on the publisher's website.

In summary, Google News Archive is great for publishers, as they can publish and monetize old content at no cost. It is also great for Google, because it will be able to exclusively serve advertising against a very large range of highly searched content. The initiative still raises a few questions, which I look forward to seeing answered!


Anonymous said...

This will be a big breakthrough if Google succeeds in doing this.