Inside the Google Books Algorithm

Google is famous for the brilliance of its algorithm for searching web pages. While the company weighs dozens of factors in deciding which results to display, the heart of the search engine is its use of links between pages to rank their relevance. We have come to depend on Google to give us exactly what we want.

But what happens when the company has to reach beyond the web? The printed volumes on Google Books pose a completely different kind of problem. Google's famous algorithm can't be used to search books because books don't link to each other the way web pages do. There is no perfect BookRank counterpart to PageRank.
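To make the dependence on links concrete, here is a minimal sketch of the textbook PageRank power iteration. It shows only the published idea, not Google's production ranking, which the article notes weighs dozens of factors. The function `pagerank` and the toy graphs `web` and `books` are hypothetical illustrations. On a small linked graph the scores separate. On a set of books that do not link to one another, every book ends up with the same score, so link analysis alone cannot tell them apart.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration (illustrative sketch, not Google's system).

    `links` maps each page to the list of pages it links to.
    """
    # Collect every page, including ones that only appear as link targets.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with a uniform score for every page.
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share, regardless of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # Pass this page's score along its outbound links.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outbound links spreads its score evenly.
                share = damping * rank[p] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical toy graphs for illustration only.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
books = {"book1": [], "book2": [], "book3": []}  # no links between books

web_rank = pagerank(web)
book_rank = pagerank(books)

print(max(web_rank, key=web_rank.get))  # c (most linked-to page)
print(book_rank)  # every book gets roughly the same score
```

With no links to follow, each iteration returns the uniform starting distribution, which is why a search over books has to lean on other signals.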

All of which made me wonder: How does Google Books work? What makes it tick? It turns out that it's actually a great place for the company's engineers to learn how to function in a linkless, physical world.

Article from The Atlantic.

Comments


why not the same algorithm?

re: "Google's famous algorithm can't be deployed to search through books because they don't link to each other in the way that webpages do. "

Why not? Google could run its own Internet on its servers, and each book could be hosted as text on this ghost-web intranet.

Then when someone searches for a title on the ghost web, Google builds search results from the book texts. I don't see why digitized books can't exist in an index like any other web page. And when we click on search results, we help make the algorithm smarter: we are the link between the search and the book (IMO).

Once something is digitized, it becomes part of the Internet somehow, even if it's invisible to the rest of us.
