Sprote Rsrch. (sprote) wrote in lj_dev,

How to efficiently aggregate friend feeds?

Assume an LJ client wants to download and display all friend posts, as if it were a news aggregator. There are two basic ways to do this, and I'm wondering which mechanism is friendliest to the LJ servers (which is why I'm asking here and not on lj_clients).

(1) Fetch the user's friends page from its regular URL. Directly scraping the HTML doesn't work very well, but I've got a custom style that formats the contents as XML, which the client can then parse. Current versions of Journalert use this to determine the author and subject of the latest post, to show in the new-post notification alert.

(2) Iterate over all friends and fetch the RSS or Atom feed for each journal.

In either case the client would, as usual, keep polling the checkfriends command and perform the above fetches only when checkfriends indicates new friend posts.
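A rough sketch of that gating step, in Python purely for illustration. The `check_friends` and `fetch_posts` callables are hypothetical stand-ins for the client's own network code, and the reply keys `new`, `lastupdate` and `interval` are assumed to mirror what checkfriends returns; adjust them if the real response differs.

```python
def poll_once(check_friends, fetch_posts, last_update=""):
    """Run one polling cycle.

    check_friends(last_update) is assumed to return a dict with the keys
    'new' (0 or 1), 'lastupdate' and 'interval' (seconds to wait).
    fetch_posts() is whichever of Method 1 or Method 2 the client uses.
    """
    reply = check_friends(last_update)
    # Only do the expensive fetch when the cheap check reports new posts.
    posts = fetch_posts() if int(reply["new"]) else []
    return posts, reply["lastupdate"], int(reply["interval"])
```

The caller would sleep for the returned interval and pass the returned `lastupdate` value into the next cycle.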

Method 1 seems more efficient on the client side since it generates only one HTTP request instead of n, where n may be several hundred in the case of Some People We Could Name.

But Method 2 might be more efficient on the server side, since generating a newsfeed for a single journal is a lot cheaper than aggregating all the friends' recent posts and running them through the style system. And in general, as long as the client sends conditional requests, all but one of those queries will return a quick 304 Not Modified result with no extra data. Right?
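For Method 2 to get those cheap 304 responses, the client has to send the validators it saved from the previous fetch. Here is a minimal sketch using Python's standard `urllib`; the function names and the `validators` dict are my own, the URL is whatever feed address the client already uses, and whether the feed server actually honours `If-None-Match` / `If-Modified-Since` is an assumption to verify.

```python
import urllib.error
import urllib.request


def build_conditional_request(url, etag=None, last_modified=None):
    """Build a GET request carrying whatever validators we saved last time."""
    req = urllib.request.Request(url)
    if etag:
        req.add_header("If-None-Match", etag)
    if last_modified:
        req.add_header("If-Modified-Since", last_modified)
    return req


def fetch_if_changed(url, validators, opener=urllib.request.urlopen):
    """Return (body, validators). body is None when the server answers 304."""
    req = build_conditional_request(
        url, validators.get("etag"), validators.get("last_modified")
    )
    try:
        with opener(req) as resp:
            body = resp.read()
            fresh = {
                "etag": resp.headers.get("ETag"),
                "last_modified": resp.headers.get("Last-Modified"),
            }
            return body, fresh
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Not modified: keep the old validators, nothing to parse.
            return None, validators
        raise
```

The client would keep one `validators` dict per friend and loop over the friends list, parsing only the feeds for which `body` is not `None`.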

Still, it seems like neither of these is exactly optimal. It ought to be possible for the server to keep a cached friends-page feed for each user and only regenerate it when a friend posts. That would result in one page-load per client fetch. With the increasing popularity of RSS news aggregators, and since I'm told that most of the server load comes from people endlessly reloading their friend pages, this seems like it would be a good thing for LJ to implement.

I don't want to re-open the whole can of worms about event-based notification or serving IMAP or whatever -- but this seems like it would be a relatively straightforward thing to do. Whenever a user posts, look up their friend-of list and delete each such user's cached friend-feed. Then when a friend-feed is requested, rebuild the cache if necessary, then send back the data. If I knew any Perl or SQL I'd be offering to do it myself...
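To make the idea concrete, here is a toy in-memory sketch of that invalidate-on-post scheme, in Python rather than the Perl and SQL the real servers use. Everything in it is hypothetical: the class name, the `friend_of` mapping, and the `build_feed` callback that stands in for the expensive aggregation step. It says nothing about how LJ actually stores friend lists or feeds.

```python
class FriendFeedCache:
    """Cache one friends-feed per reader; drop it when any friend posts."""

    def __init__(self, friend_of, build_feed):
        # friend_of: dict mapping a poster to the set of users who list
        # that poster as a friend (the "friend-of" list).
        self._friend_of = friend_of
        # build_feed(reader): expensive aggregation, called only on a miss.
        self._build_feed = build_feed
        self._cache = {}

    def on_post(self, poster):
        """A user posted: delete the cached feed of everyone watching them."""
        for reader in self._friend_of.get(poster, ()):
            self._cache.pop(reader, None)

    def get_feed(self, reader):
        """Serve the cached feed, rebuilding it first if it was invalidated."""
        if reader not in self._cache:
            self._cache[reader] = self._build_feed(reader)
        return self._cache[reader]
```

With this shape, repeated reloads between posts cost one dictionary lookup each, and a post costs one deletion per friend-of entry, with the rebuild deferred until someone actually asks for that feed.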
