Brad Fitzpatrick (bradfitz) wrote in lj_dev:

friends view cookie caching

The other day evan linked to a post about using continuations to model web applications, and it got me thinking about how to speed up the slower parts of the site, like friends view generation. It'd be awesome to treat the friends view page as a generator that only generates what's new.

How often do you reload your friends page and get no new entries, or just a few, out of the 25-50 shown? I suppose I could write some tool to figure this out from the logs (though it might be difficult), but instead I'll just assume that most people reload way too often. I know I do. All day.

So, how do we make it faster? We've tried various forms of server-side caching, but that just turns what would otherwise be lots of reads into fewer reads followed by a slow write. We could cache to memory instead, but then if the memory cache goes down, we lose up to 2 weeks of data (the default max age of a friends view item).

Unfortunately, Perl can't do continuations, and server-side caching doesn't work... but it got me thinking about the root problem: how do we make a generator without more server-side I/O? Let's store the state of the friends view algorithm (the "Not-A-Continuation") in a client-side cookie. No extra disk I/O for us. A little more network I/O in and out, but negligible. When we get a subsequent friends page request (from a logged-in user on their own friends page), we can start the construction algorithm where we left off the previous time, saving tons of db queries, while still returning the full set to the user (including the entries they've already seen).
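To make that concrete, the request flow would look something like this. A rough sketch, not real code; all the function names here are placeholders:

    sub friends_view_request {
        my ($remote, $cookie) = @_;

        # Parse the state cookie; returns undef if missing or malformed.
        my $state = parse_fv_cookie($cookie);

        my @entries;
        if ($state && $state->{userid} == $remote->{userid}) {
            # Warm start: fetch only entries newer than the per-cluster
            # logtimes in the cookie, then merge in the cached jitemids.
            @entries = resume_friends_view($remote, $state);
        } else {
            # Cold start: the full, expensive friends view query.
            @entries = full_friends_view($remote);
        }

        set_fv_cookie($remote, \@entries);   # re-issue updated state
        return render_friends_view($remote, @entries);
    }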

I started thinking about the cookie format, borrowing some things from the (buggy) fv_caching code already in the codebase from the server-side caching experiment.

Cookie format, with number of bytes:

1: version (if different from server's version, ignore)
4: userid of journal
4: time (if less than n minutes old, no re-check)
2: number of db clusters seen last time
foreach cluster {
.. 2: clusterid
.. 4: last logtime seen on that cluster
}+
2: number of entries in client-side cache
foreach entry {
.. 4: userid of friend
.. 4: jitemid (3) & anum (1)
}

Encoded in hex for the cookie, that's 2 bytes for each byte above. Assuming, say, 6 clusters and a 100-entry cache limit, that's:

2 * (1 + 4 + 4 + 2 + 6 * (2 + 4) + 2 + 100 * (4 + 4)) = 2 * 849 = 1,698 bytes, about 1.7K

Totally within the constraints of the 4K max cookie size.
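Here's a rough sketch of what the encode/decode could look like with pack/unpack, following the layout above. The function and field names are illustrative, not final:

    use constant FVC_VERSION => 1;

    # Pack the state into the binary layout above, then hex-encode it.
    sub make_fv_cookie {
        my ($userid, $time, $clusters, $entries) = @_;
        # $clusters: arrayref of [ clusterid, last_logtime ]
        # $entries:  arrayref of [ friend_userid, jitemid, anum ]

        my $bin = pack("C N N", FVC_VERSION, $userid, $time);
        $bin .= pack("n", scalar @$clusters);
        $bin .= pack("n N", @$_) foreach @$clusters;
        $bin .= pack("n", scalar @$entries);
        $bin .= pack("N N", $_->[0], ($_->[1] << 8) | $_->[2]) foreach @$entries;

        return unpack("H*", $bin);   # hex: 2 chars per byte
    }

    # Reverse of the above; returns undef on any malformed input.
    sub parse_fv_cookie {
        my ($hex) = @_;
        return undef unless $hex && $hex =~ /^(?:[0-9a-f]{2})+$/i;
        my $bin = pack("H*", $hex);
        return undef if length($bin) < 11;

        my ($ver, $userid, $time, $nclust) = unpack("C N N n", $bin);
        return undef unless $ver == FVC_VERSION;

        my $off = 11;
        my @clusters;
        foreach (1 .. $nclust) {
            return undef if $off + 6 > length($bin);
            push @clusters, [ unpack("n N", substr($bin, $off, 6)) ];
            $off += 6;
        }

        return undef if $off + 2 > length($bin);
        my ($nent) = unpack("n", substr($bin, $off, 2));
        $off += 2;

        my @entries;
        foreach (1 .. $nent) {
            return undef if $off + 8 > length($bin);
            my ($uid, $packed) = unpack("N N", substr($bin, $off, 8));
            push @entries, [ $uid, $packed >> 8, $packed & 0xff ];
            $off += 8;
        }

        return { userid => $userid, time => $time,
                 clusters => \@clusters, entries => \@entries };
    }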

We'd set the cookie to expire in 2 weeks, since that's the max age an item can exist on a friends view.

We send a new cookie every time a default friends view is loaded for a logged-in user.

If the cookie is sent and well-formed, we use it to start the existing algorithm mid-way through.

Entries are still validated for security before being returned.

The per-cluster last logtime field is there to guard against possible database replication lag.
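Concretely, the resume step could be one bounded query per cluster, something like the sketch below. Table and column names are illustrative, and $friend_in stands for the remote user's friend list on that cluster:

    my @new_entries;
    foreach my $c (@{ $state->{clusters} }) {
        my ($clusterid, $last_logtime) = @$c;
        my $dbr = get_cluster_reader($clusterid);   # hypothetical helper

        # Only entries newer than what the client saw last time.
        my $sth = $dbr->prepare("SELECT journalid, jitemid, anum, logtime ".
                                "FROM log WHERE journalid IN ($friend_in) ".
                                "AND logtime > ? ORDER BY logtime DESC");
        $sth->execute($last_logtime);
        while (my $e = $sth->fetchrow_hashref) {
            push @new_entries, $e;
        }
    }

    # Merge @new_entries with the jitemids cached in the cookie, then run
    # the usual security checks on everything before rendering.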

Any thoughts or comments before I start implementing?

(I'll probably fix the ?skip= URL problem at the same time, changing them all to ?before=date, since we now have the metadata to do that efficiently.)
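The reason ?before=date is cheaper, roughly: ?skip= makes the database scan and discard rows before returning any, while a date bound lets it seek straight to the right rows via an index. Illustrative queries:

    ?skip=50:       SELECT ... ORDER BY logtime DESC LIMIT 50, 25
    ?before=<date>: SELECT ... WHERE logtime < ? ORDER BY logtime DESC LIMIT 25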