Brad Fitzpatrick (bradfitz) wrote in lj_dev,
Brad Fitzpatrick

Directory status

In the old directory code (you'll have to go find it in an old tarball), we cache directory seach sub-queries ... if you did a search for 21 year-old females in seattle, it'd do 3 searches, save each part, then join them (in perl... fast, since mysql sucked at it), then show it to the user.

On a subsequent load (paging) it was pretty fast... only had to do the join.

However, the table that stored the query results used to crash MySQL (and still might). Its schema was:

dsid INT (index)
userid INT

The new directory code had each sub-query function return a data structure representing what tables it needed to select from to get its information, then the caller would get all those datastructures and form a massive SQL query that did everything. I thought running this on a slave would be okay.

This second method then cached the final search results (limited to 1000 results).

Its schema was:

dsrid INT (index)
userids BLOB (comma-separated)

And also, as a key there was a md5(sql-query) ... so the SQL would be made everytime, but then it'd see if it knew the results already.

The directory code needs to be re-written to incorporate methods 1 & 2 together.... cache both the search parts and the final join. However, when storing the search parts, don't use the schema I originally used (although ideal) ... join it all together comma-separated and split it in perl land. Philosophy being: web server CPU cheap, db CPU expensive. Also, don't run into db crashing code... making so many damn rows makes MySQL do a ton of work maintaining indexes. So much so that they're very often behind so queries don't benefit anyway, especially given the rate of DELETE FROMs we do (while cleaning the tables up).

So... want to fix? :-)

----- Original Message -----
From: <>
To: <>
Sent: Thursday, July 12, 2001 11:19 PM
Subject: Stuff with the .. er, d thing..

> Hey,
> sigh, so, what did you have to do to the directory before you put it back
> up again? Adding duplock support and what else? I'll hack in the duplock
> support for you, if you want. just have to get it going so people will
> stop harassing us... I'd rather not see a "where's the directory!?"
> comment on an announcement of S2 or similar.
> hell, if it's not too much I'll fix the whole thing, I should be a lot
> busier than I am.
> well, maybe.
> -Alan

  • Post a new comment


    Anonymous comments are disabled in this journal

    default userpic

    Your reply will be screened

    Your IP address will be recorded