Brad Fitzpatrick (bradfitz) wrote in lj_dev,
Brad Fitzpatrick
bradfitz
lj_dev

friends of friends of friends, etc

I'm sure you've all seen this already, right?

http://www.livejournal.com/tools/sixdegrees.bml

But it only does 4 degrees at the moment. I'm working on fixing that, and making it even faster. (it's already pretty fast)

I'm playing with pre-calculating friends of friends (friends^2), and friends of friends of friends (friends^3). This includes visible, person accounts only (no community/syn):

$ dev/soc-network.pl evan
Friends : 210
Friends^2: 8550
Friends^3: 154885
Friend-ofs: 457

$ dev/soc-network.pl brad
Friends : 160
Friends^2: 6361
Friends^3: 135705
Friend-ofs: 626

After the first run (the friends^1 are memcached for all users), computing the two things above only takes 44 seconds, but that's without any caching of friends^2 to compute friends^3, which is the timely part.

People with less friends take less time (5.6 seconds)

$ dev/soc-network.pl daveman692
Friends : 26
Friends^2: 911
Friends^3: 35262
Friend-ofs: 15

Friend-ofs^2 and ^3 are equally easy to calculate, but it was boring while I was just tinkering, so I haven't done it yet.

Anyway, the plan is to cache these sets, so six degrees actually works, and does so with a max of two DB calls. (an async process will update the sets)

The caching plan is:

friends^1: only in memcache. if miss, requery the DB. (which involves then loading each friend from memcache/db and filtering out the non-people. but we already have APIs for this, and the existing six-degree code uses it.... not a problem)

friends^2: memcache for short duration, but also stored as a blob in the database (4 byte userids packed) so that's like 34k-100k. database version will be updated as needed by an async job which we can run on lots of web nodes. (which are a lot more idle at night and in need of work)

friends^3: only stored in db. not memcached. for evan, this is 605 KB to put in the database, and his friend-ofs are probably equally or a bit larger. so ~1.0-1.5 MB to store all that. but that seems acceptable. 2 million users * 1.5 MB = few GB. not bad at all TB, er, so maybe we won't back this in the database. just memcache. should still work well.

It'd probably make sense to cap the f^3 set sizes, but I'm not sure where..... maybe 250,000 users? (double evan)

So once it's quick to find a user's f^2/f^2/fo^2/fo^3 sets, a lot of cool shit is possible.

Discuss.
Subscribe

  • Post a new comment

    Error

    Anonymous comments are disabled in this journal

    default userpic

    Your reply will be screened

    Your IP address will be recorded 

  • 20 comments