Brad Fitzpatrick (bradfitz) wrote in lj_dev,


I've been running the script which modifies the database for clusters on a copy of the real LJ data. I knew it'd take a long time, but it's painful actually watching it run. I've been making it faster, but it's still going to take hours to convert.

Which means when we're ready to run it, there's going to be a lot of downtime.

We need two versions of the code very soon here:

Old code/schema + Read-only aware
Anywhere we write to the database, first check whether $LJ::READONLY is set and complain if it is, or simply don't offer the option to write anything in the first place. I figure that instead of showing a "Site down" message for a few (or many) hours, we can offer a read-only copy of LJ.
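A minimal sketch of the read-only check described above, written in Python purely for illustration (the LJ code itself is Perl). The `config["READONLY"]` flag stands in for $LJ::READONLY, and `require_writable`, `post_entry`, and `read_entries` are hypothetical names, not functions from the LJ codebase:

```python
# Illustration only: config["READONLY"] plays the role of $LJ::READONLY.
# All function names here are hypothetical.
config = {"READONLY": True}


class ReadOnlyError(Exception):
    """Raised when a write is attempted while the site is read-only."""


def require_writable():
    # Call this at the top of every code path that writes to the database.
    if config["READONLY"]:
        raise ReadOnlyError("The site is temporarily read-only; try again later.")


def post_entry(store, user, text):
    # A write path: refuse early, before touching any data.
    require_writable()
    store.append((user, text))
    return len(store)


def read_entries(store):
    # A read path: no check needed, so it keeps working in read-only mode.
    return list(store)
```

Putting the check in one helper means each write path needs a single call, and the flag can be flipped back once the conversion is done. Hiding the write options in the UI would sit on top of this, not replace it.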

New code/schema
Once the db conversion script is done, we replace all the old code with the new code which knows about the new schema, then sit back and relax. This new code could optionally also support $LJ::READONLY, though that won't be as critical. Yet another advantage of clustering the databases is that any necessary ALTER TABLEs in the future are fast: instead of doing one operation on n units of data, you do the operation on s clusters in parallel, each with n/s units of data. If we keep each cluster small (as we will), alters are relatively painless.
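To make the n/s point concrete, here is a rough Python sketch that splits n rows across s small databases and runs the same ALTER TABLE on each one in parallel. It uses in-memory SQLite rather than MySQL, the round-robin split is only an assumption for the demo, and the `log` table and `security` column are made-up names. SQLite's ADD COLUMN is cheap, so this shows the shape of the idea rather than the actual timing:

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def split_into_clusters(rows, s):
    # Assumption for the demo: rows are spread round-robin over s clusters.
    return [rows[i::s] for i in range(s)]


def alter_cluster(rows):
    # Each cluster is its own database holding roughly n/s rows.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, body TEXT)")
    db.executemany("INSERT INTO log (id, body) VALUES (?, ?)", rows)
    # The schema change only has to touch this cluster's share of the data.
    db.execute("ALTER TABLE log ADD COLUMN security TEXT")
    columns = [info[1] for info in db.execute("PRAGMA table_info(log)")]
    row_count = db.execute("SELECT COUNT(*) FROM log").fetchone()[0]
    db.close()
    return columns, row_count


def alter_all_clusters(rows, s):
    # Run the same alter on every cluster at once, one worker per cluster.
    clusters = split_into_clusters(rows, s)
    with ThreadPoolExecutor(max_workers=s) as pool:
        return list(pool.map(alter_cluster, clusters))
```

With 10 rows and 5 clusters, each worker alters a table of 2 rows, which is the n/s figure from the paragraph above.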

I'm putting cluster patches at:

The first one did most of the work in Perl-space. clusterfun.2 is faster, doing more on the DB side, but MySQL or DBI drops the connection in the middle of the alter on the 17GB logtext.MYD, so I'm switching that part back to be more Perl-ish and doing it in chunks.
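The chunked approach could look roughly like the following. This is a Python and SQLite sketch, not the clusterfun script: the table names `logtext_old` and `logtext_new`, the column names, and the function name are all assumptions made for the example. The idea is to walk the primary key in ranges, so that no single statement has to run over the whole table:

```python
import sqlite3


def convert_in_chunks(db, chunk_size):
    """Copy rows from logtext_old to logtext_new one key range at a time.

    Returns the number of chunks processed. Table and column names are
    hypothetical stand-ins for the real schema.
    """
    last_id = 0
    chunks = 0
    while True:
        # Fetch the next batch of rows after the last key already copied.
        rows = db.execute(
            "SELECT itemid, body FROM logtext_old "
            "WHERE itemid > ? ORDER BY itemid LIMIT ?",
            (last_id, chunk_size),
        ).fetchall()
        if not rows:
            break
        db.executemany(
            "INSERT INTO logtext_new (itemid, body) VALUES (?, ?)", rows
        )
        # Commit per chunk so a dropped connection loses at most one batch.
        db.commit()
        last_id = rows[-1][0]
        chunks += 1
    return chunks
```

Because progress is tracked by the last key copied, a run that dies partway through can in principle resume from the highest itemid already present in the new table instead of starting over.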

If you're interested in working on either the $LJ::READONLY stuff or converting areas of the site to the new DB schema, let me know.
