Monday, February 4, 2008

Google Reader API

Problem: To perform social network analysis on blog data, you need consistent data over a period of time. Periodically retrieving content directly from a blog's feed is limiting because a feed exposes only the blog's current content. Thus, if you decide to begin retrieving content from a specific blog, you have no way of getting at its archived content.

Solution: Use the unofficial Google Reader API to retrieve archived feed content. The API was first documented two years ago on Niall Kennedy's blog, and its existence was confirmed by several Google employees associated with the project. Little information has been published since about an official release of the API, but the unofficial API still works well for retrieving archived feed content.

In our research, the framework we use for interacting with the API is pyrfeed. The creators of pyrfeed also wrote additional documentation on the capabilities of the API. The project's Google Code site has two downloadable files. The Google Reader stand-alone package is a simple interface to the API for basic actions such as feed retrieval. The other file, the full pyrfeed release, also provides GUI and command-line interfaces for interacting with the API, along with automated blog content storage in a SQLite 3 database. An example of how to interact with the Google Reader stand-alone package can be seen below.
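
To give a feel for what a feed-retrieval call involves, here is a minimal Python sketch of the request pattern described in the unofficial documentation. It does not use pyrfeed's own classes. The endpoint path, the `n` (item count) and `c` (continuation token) parameters, and the `gr:continuation` element are taken from that unofficial documentation and should be treated as assumptions, since the API was never officially published. The helper names `build_archive_url` and `parse_page` and the sample XML are hypothetical, and authentication and the actual HTTP request are left out.

```python
from urllib.parse import quote, urlencode
import xml.etree.ElementTree as ET

# Assumed values from the unofficial API documentation (not an official spec).
READER_ATOM_BASE = "http://www.google.com/reader/atom/feed/"
ATOM_NS = "http://www.w3.org/2005/Atom"
READER_NS = "http://www.google.com/schemas/reader/atom/"


def build_archive_url(feed_url, count=20, continuation=None):
    """Build the URL for one page of archived items from a feed."""
    params = {"n": count}
    if continuation:
        # Token returned by the previous page; asks for the next, older page.
        params["c"] = continuation
    # The feed URL is percent-encoded and appended to the endpoint path.
    return READER_ATOM_BASE + quote(feed_url, safe="") + "?" + urlencode(params)


def parse_page(atom_xml):
    """Return (entry titles, continuation token or None) for one Atom page."""
    root = ET.fromstring(atom_xml)
    titles = [
        entry.findtext("{%s}title" % ATOM_NS, default="")
        for entry in root.findall("{%s}entry" % ATOM_NS)
    ]
    continuation = root.findtext("{%s}continuation" % READER_NS)
    return titles, continuation


# Hypothetical response body, used here only to exercise the parser.
SAMPLE_PAGE = (
    '<feed xmlns="http://www.w3.org/2005/Atom" '
    'xmlns:gr="http://www.google.com/schemas/reader/atom/">'
    "<gr:continuation>CJ</gr:continuation>"
    "<entry><title>Post A</title></entry>"
    "<entry><title>Post B</title></entry>"
    "</feed>"
)

if __name__ == "__main__":
    print(build_archive_url("http://example.com/feed", count=5))
    print(parse_page(SAMPLE_PAGE))
```

To walk back through a blog's archive, you would request a page, store its entries, and then request the next page with the returned continuation token until no token comes back.
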
In summary, if you are looking for a simple way to retrieve archived blog content, the Google Reader API and pyrfeed framework are cheap and easy tools for doing so. The blogosphere is at your fingertips.
