|friends of friends of friends, etc
||[Jan. 29th, 2004|06:43 pm]
I'm sure you've all seen this already, right?|
But it only does 4 degrees at the moment. I'm working on fixing that, and making it even faster. (it's already pretty fast)
I'm playing with pre-calculating friends of friends (friends^2), and friends of friends of friends (friends^3). This includes visible, person accounts only (no community/syn):
$ dev/soc-network.pl evan
Friends : 210
$ dev/soc-network.pl brad
Friends : 160
After the first run (the friends^1 are memcached for all users), computing the two things above only takes 44 seconds, but that's without any caching of friends^2 to compute friends^3, which is the timely part.
People with less friends take less time (5.6 seconds)
$ dev/soc-network.pl daveman692
Friends : 26
Friend-ofs^2 and ^3 are equally easy to calculate, but it was boring while I was just tinkering, so I haven't done it yet.
Anyway, the plan is to cache these sets, so six degrees actually works, and does so with a max of two DB calls. (an async process will update the sets)
The caching plan is:
friends^1: only in memcache. if miss, requery the DB. (which involves then loading each friend from memcache/db and filtering out the non-people. but we already have APIs for this, and the existing six-degree code uses it.... not a problem)
friends^2: memcache for short duration, but also stored as a blob in the database (4 byte userids packed) so that's like 34k-100k. database version will be updated as needed by an async job which we can run on lots of web nodes. (which are a lot more idle at night and in need of work)
friends^3: only stored in db. not memcached. for evan, this is 605 KB to put in the database, and his friend-ofs are probably equally or a bit larger. so ~1.0-1.5 MB to store all that. but that seems acceptable. 2 million users * 1.5 MB = few
GB. not bad at all TB, er, so maybe we won't back this in the database. just memcache. should still work well.
It'd probably make sense to cap the f^3 set sizes, but I'm not sure where..... maybe 250,000 users? (double evan)
So once it's quick to find a user's f^2/f^2/fo^2/fo^3 sets, a lot of cool shit is possible.