Chris Schmidt (crschmidt) wrote in lj_dev,
Chris Schmidt

Memcaching Friends List

Currently, userinfo is largely unmemcached. Specifically, the friends list functions are totally unmemcached, and this is one of the things preventing further work on bug 1321, supporting FOAF.

This query was worked on a lot to make it not overload for communities with huge friends lists - similar work was also done in supporting the ability to turn off the friends list for certain communities like "news" and "paidmembers". So, these journals were causing a large drain on the database, thus the mess of code to prevent these functions from overloading.

First question, which has to be answered by someone who has experience in how costly these queries actually are, is whether if this query is memcached, we can drop some of these limits.

Currently, there are 6 possible SELECTS from the friends table involved in creating the userinfo page. Three of thse 6 selects are counts, and only occur when the number of friends or friends of is over a certain value. Although these functions should probably be APIified, I'm not sure that memcaching them is crucial, and since these counts are never displayed in FOAF data, they aren't at all neccesary for FOAF output.

Otherwise, there are three queries. One is the query that determines who on this userinfo page you have friended. This list is used for bolding when generating the friends lists:

my $sth = $dbr->prepare("SELECT friendid FROM friends WHERE userid=? LIMIT $MAX_FRIENDS_SHOW");

Another is the query that loads the actual friends:

my $sth = $dbr->prepare("SELECT u.userid, u.user, u.journaltype, u.statusvis ". 
              "FROM friends f, user u WHERE u.userid=f.friendid AND f.userid=? ".
              "LIMIT $limit");

The last SELECT gets the friend ofs:

my $sth = $dbr->prepare("SELECT u.userid, u.user, u.journaltype, u.statusvis ".
              "FROM friends f, user u WHERE u.userid=f.friendid AND f.userid=? ".
              "LIMIT $limit");

Memcaching the first query is going to save a lot for pretty much everyone: assuming who you friend doesn't change often, this data can be stored for a long time (with a memcache_delete called from any places where you can edit friends: protocol and web interface come to mind), and currently it's reloaded every time you hit a userinfo page. Even with fast, indexed queries, this has to be sucking some kind of database power. It is a simple query, but it's commonly called, so caching it should help speed things up.

The second and third queries are more complex, and require more work on the database. A match has to be made between two fields, which is obviously more complex than just a simple WHERE clause. This means that caching them has a more obvious affect per query. This data can be cached based on f.userid, and store all the information selected (userid, username, journaltype, statusvis). I'm not sure exactly how this data is stored, mostly because my knowledge of how memcache works is decidedly limited.

The first bit is easy.

New function in ljlib: "load_remote_friendid".

sub load_remote_friendid($userid, $max) {
  my $memkey = [$userid, "remotefriends:$userid"];
  my $remote_friendid = LJ::MemCache::get($memkey);
  unless (defined $remote_friendid) {
    my $dbr = LJ::get_db_reader();
    my $sth = $dbr->prepare("SELECT friendid FROM friends WHERE userid=? LIMIT $max");
    $remote_friendid{$_} = 1 while $_ = $sth->fetchrow_array;
    LJ::MemCache::set($memkey, $remote_friendid);
  return $remote_friendid;

Then, calls to load this data in userinfo.bml (lines 241-243) change to:

} else { 
    $remote_friendid = LJ::load_remote_friendid($remote->{'userid'}, $MAX_FRIENDS_SHOW); 

Now, the code to drop the memkey (LJ::Memcache::delete([$userid,"remotefriends:$userid"]);) has to be added to the protocol. It doesn't need to be anywhere else, because the code in /friends/edit.bml and friends/add.bml (ew, capitalized html tags), simply calls the protocol based editfriends routines. Because it's such a lightweight query, dropping the memkey is better than updating it, as far as I can understand it.

However, this kind of select leaves the memecache data to be rather limited - in this state, it can only be used on userinfo, because it needs the $MAX. In order to use it in other places, it might be better to change how it works slightly. However, it appears that the protocol gets different data, as do the friends adding and editing pages, so perhaps this key is only useful on the userinfo page, meaning that argument is incorrect.

This is getting kind of long, and my laptop battery has said "1%" for the past 25 minutes, so I'm going to wrap it up here. Basically, I'm examining userinfo, looking for ways to helps speed it up, and wondering if my thoughts on how this would work make any sense. If I've made a mistake in my code or in understanding how memcache works, or anything like that, please feel free to let me know. If this looks right, I'll be happy to write up a patch doing exactly as I described and then doing the same for the other two queries on userinfo.bml.

Thanks in advance for any help or thoughts.
  • Post a new comment


    Anonymous comments are disabled in this journal

    default userpic

    Your reply will be screened

    Your IP address will be recorded