Brad Fitzpatrick (bradfitz) wrote in lj_dev,

Conversion: Take 3


Plan 1: I knew conversion of the data would take a while, but didn't know how long. I wrote the conversion code and ran it on my test machine. A few seconds and it was done. Ran it on a copy of the real database... whoa, this is going to take weeks.

Plan 2: Did a billion things to make the conversion quicker. Kept working on this for about a week, each day making it faster. Timed it on a fraction of the real database: a full conversion, without even the logtext/talktext tables, would take several days. I figured we'd do the *text tables on demand, as previously mentioned. Even with LiveJournal running read-only while the database converted, people would get angry after 76 hours of read-only downtime.

Third time's a charm: We convert nothing immediately. I'll have to alter user (a few minutes), memorable (a few seconds), and topic_map (a few milliseconds). A new column in user says what cluster they're on (this was already planned). But instead of cluster "0" meaning "the first cluster", it'll now mean "no cluster... the old tables". The tables that clustered data is stored in will be called log2, logtext2, talk2, talktext2, talkprop2, etc. Those can coexist in the same database as the old tables.
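The routing rule above can be sketched roughly like so. This is a minimal illustration, not LiveJournal's actual code; the function and table names are assumptions:

```python
# Hypothetical sketch of the routing rule: clusterid 0 means "no cluster,
# use the old tables"; any other clusterid maps to the new *2 tables,
# which coexist in the same database.

OLD_TABLES = {"log": "log", "logtext": "logtext", "talk": "talk"}
NEW_TABLES = {"log": "log2", "logtext": "logtext2", "talk": "talk2"}

def table_for(clusterid: int, base: str) -> str:
    """Return the table a query for this user should hit."""
    if clusterid == 0:
        return OLD_TABLES[base]   # unconverted user: old schema
    return NEW_TABLES[base]       # converted user: clustered *2 tables
```

Because both table families live side by side, queries only need this one dispatch point; nothing has to be converted up front.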

In the config where we define clusters, cluster 1 will be on the same host for now, and we'll convert one user at a time. Here's how that'll happen: a new capability class named "in conversion" is allocated (bit 6 is free?), with a class capability of "readonly = 1". All the code should check if that user's readonly cap is set. If so, don't allow writes, because some other process is in the middle of moving the user's stuff all around. At the end of that user's move, the clusterid is changed, then the capability class bit is turned off, and then all the old stuff is deleted from the previous cluster (or cluster 0, in the case of a conversion from non-cluster).
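The per-user move described above might look like this. A minimal sketch, assuming bit 6 really is the free capability bit and an integer caps bitmask on the user row; `copy_rows`/`delete_rows` and the dict-based storage stand in for the real SQL:

```python
# "in conversion" capability class: readonly = 1 while a user is being moved.
READONLY_BIT = 1 << 6  # assumes bit 6 is free, per the plan above

def is_readonly(caps: int) -> bool:
    """All write paths should check this before touching a user's data."""
    return bool(caps & READONLY_BIT)

def copy_rows(user, src, dst):
    # Stand-in for copying the user's journal rows to the new cluster.
    dst[user["id"]] = list(src.get(user["id"], []))

def delete_rows(user, src):
    # Stand-in for purging the user's rows from the old location.
    src.pop(user["id"], None)

def move_user(user, src, dst, new_clusterid):
    user["caps"] |= READONLY_BIT         # 1. block writes for this user only
    copy_rows(user, src, dst)            # 2. copy data to destination cluster
    user["clusterid"] = new_clusterid    # 3. flip reads to the new cluster
    user["caps"] &= ~READONLY_BIT        # 4. writes allowed again
    delete_rows(user, src)               # 5. delete the old copy last
```

The ordering matters: the clusterid flip happens before the readonly bit clears, so no write ever lands in the old location after the copy, and the delete is safe to do last.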

  • No global downtime. Each user will have a bit of downtime (proportional to their amount of data), but everything else still works during that time.
  • No code fork. We won't need the pre-cluster code and post-cluster code. (I don't need to learn more CVS crap! hooray!)
  • No double copying. Before we would've needed to 1) convert everything, and then 2) copy everything to their destination cluster. Now we can do the conversion and copying to destination cluster at the same time.
Questions / Objections / Brilliant foresight / Fellatio offers?
