As complex as Google's PageRank may be, search experts at Yahoo seem to think it's not complex enough. Based on patent filings, Yahoo is dabbling in ranking algorithms that incorporate more user behavior data in advance of the company's next run at toppling Google's haloed relevance.
Seeing will be believing when it happens, of course, as Google is highly secretive about how its search engine calculates PageRank. If history is any indication, they're already way ahead on behavioral factoring.
Nonetheless, Yahoo can afford the best search engineers in the business (if they can get them before Google does, anyway) and the patent filings shed some light on how PageRank is currently calculated and ways it might be improved in the future.
Bill Slawski, Director of Search Marketing at KeyRelevance, goes into painstaking detail on Yahoo's user data challenges at his SEObytheSea blog. Patent language, especially when dealing with algorithms, can be confusing and dense, so we'll just highlight a few interesting points and leave the lexicographical deciphering to you.
Some of Yahoo's assumptions about PageRank and its associated flaws:
- Internal and external links are often weighted equally even though internal links can be less reliable and more self-promotional. Some links, like disclaimer links, are rarely followed.
- PageRank ignores that webpages are often purchased and repurposed, and that they decay or become less valuable over time at variable rates.
- Current calculations, like TrustRank, are engineered more to combat webspam than to reflect actual user behavior.
- Sometimes PageRank deals with links in bulk, aggregating them according to host or domain, an approach also known as blocked PageRank.
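To picture the bulk aggregation in the last point, here is a minimal Python sketch that collapses page-to-page links into host-to-host link counts. The function name `aggregate_links_by_host` and the example URLs are made up for illustration; the filing's actual aggregation method isn't spelled out here, so treat this as one plausible reading rather than Yahoo's or Google's implementation.

```python
from collections import Counter
from urllib.parse import urlsplit


def aggregate_links_by_host(page_links):
    """Collapse (source_url, target_url) pairs into host-to-host link counts.

    Hypothetical helper for illustration only.
    """
    host_links = Counter()
    for source_url, target_url in page_links:
        source_host = urlsplit(source_url).hostname
        target_host = urlsplit(target_url).hostname
        host_links[(source_host, target_host)] += 1
    return host_links


# Made-up page-level links between two hosts.
page_links = [
    ("http://a.example/home", "http://a.example/about"),   # internal link
    ("http://a.example/home", "http://b.example/post"),    # external link
    ("http://a.example/about", "http://b.example/index"),  # external link
    ("http://b.example/post", "http://a.example/home"),    # external link
]

host_links = aggregate_links_by_host(page_links)
for (source_host, target_host), count in sorted(host_links.items()):
    kind = "internal" if source_host == target_host else "external"
    print(f"{source_host} -> {target_host}: {count} ({kind})")
```

Grouping this way also makes the internal-versus-external distinction from the first point easy to see: a pair whose two hosts match is an internal link.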
What Yahoo plans to do about it:
- Measure link weight – influenced by the frequency with which users follow a link
- Note when links are ignored and users leave (teleport) to another page of their choosing
- Calculate the probability that a user stops and reads a webpage rather than views it and moves on
- Incorporate user data into the algorithm – User Sensitive PageRank could reflect "the navigational behavior of the user population with regard to documents, pages, sites, and domains visited, and links selected"
- Personalize PageRank based on demographic information (age, gender, income, user location)
- Emphasize recent information
- Weigh anchor text more heavily – the patent filing calls anchor text "one of the most useful features used in ranking retrieved Web search results"
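The first two ideas in this list, weighting links by how often users follow them and accounting for teleports, can be sketched as a small variation on the usual PageRank power iteration. The Python below is a rough illustration, not the formula from Yahoo's filing: the function name `user_weighted_pagerank`, the click counts, and the page names are all invented, and the damping factor of 0.85 is simply the conventional value from the original PageRank work, with `1 - damping` standing in for the chance that a user teleports.

```python
def user_weighted_pagerank(click_counts, damping=0.85, iterations=50):
    """PageRank-style scores where each link's weight is its share of observed clicks.

    click_counts maps source page -> {target page: number of times users
    followed that link}. Hypothetical data shape, for illustration only.
    """
    # Collect every page that appears as a source or a target.
    pages = set(click_counts)
    for targets in click_counts.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Teleport term: with probability (1 - damping) the user ignores
        # the links and jumps to a page chosen uniformly at random.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for source in pages:
            targets = click_counts.get(source, {})
            total_clicks = sum(targets.values())
            if total_clicks == 0:
                # No followed links recorded: spread this page's rank evenly.
                for page in pages:
                    new_rank[page] += damping * rank[source] / n
                continue
            for target, clicks in targets.items():
                # A link's weight is the fraction of clicks it receives.
                new_rank[target] += damping * rank[source] * clicks / total_clicks
        rank = new_rank
    return rank


# Made-up click data: users almost never follow the disclaimer link.
click_counts = {
    "home": {"products": 90, "disclaimer": 1},
    "products": {"home": 10},
    "disclaimer": {"home": 1},
}

rank = user_weighted_pagerank(click_counts)
for page, score in sorted(rank.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

With equal link weights, "products" and "disclaimer" would each inherit half of what "home" passes along. With click-based weights, the rarely followed disclaimer link passes on almost nothing, which is the behavior the filing appears to be after.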