Brad Fitzpatrick (bradfitz) wrote in lj_dev,

TODO

Want to get involved and help out? Here's a list of things we need help on... it is by no means exhaustive. These are just the first things I thought of. We really need help, but we only want help from people with experience. If we have to hold your hand the whole way through, we might as well do it ourselves. That said, we'll help you out with any LiveJournal-related stuff, but we won't teach you Perl, Unix, or SQL... you should have the skillset necessary (sans knowledge of LJ inner workings) to take on one of these projects before you volunteer for it.

So, here's the list....

Maintain the lj_dev todo list -- we have a great todo system but don't use it. We need somebody to keep it up to date, reliably.

Finish the todo system -- lots of the todo system was never implemented. If you know Perl & SQL, talk to me... I'll explain what else needs to be done.

InnoDB tables -- have experience with using MySQL and InnoDB tables? Cool. And with Debian? Even cooler. We need InnoDB running on the slaves for better concurrency... right now it's shit. This is very important, which should be obvious by the double accents I did on the word 'very'.

More slave usage -- all of the code needs to use the new LJ APIs... stuff in LJ::*, not the old global crap with a global $dbh. While you're at it, get rid of connect_db() and instead do:

my $dbs = LJ::get_dbs(); # db handle set
my $dbh = $dbs->{'dbh'}; # writing (master)
my $dbr = $dbs->{'reader'}; # reading (some slave)

And change the code below to use $dbr when possible, instead of $dbh.

The other convention we use is $dbarg, which is often the first parameter in a function. A $dbarg can be either a single handle (a $dbh, or a $dbr if the function does no writing) or a $dbs. The function make_dbs_from_arg (or something, I forget the exact name) then takes a $dbarg and returns a set. If the arg was a single handle, then the master & reader are both that handle, but if it was a $dbs already, it just returns the set.

So often in a function you'll see:

sub foo {
    my $dbarg = shift;
    my $dbs = make_dbs_from_arg($dbarg);  # helper name as recalled above; the exact name may differ
    my $dbh = $dbs->{'dbh'};              # writing (master)
    my $dbr = $dbs->{'reader'};           # reading (some slave)
    ...
}

That convention makes it possible to slowly upgrade the code over time to use multiple databases without doing it all at once.

Then, in the rest of foo(), use $dbr for read-only operations. If foo() is called with only a master, it'll just end up reading from the master like before. Now the job is to go fix all the callers to pass in a $dbs instead of a $dbh, so the function actually benefits.
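To make the $dbarg convention concrete, here's a minimal sketch of what the normalizing helper might look like. The name dbset_from_arg and the FakeHandle class are made up for illustration; the real helper in ljlib.pl may be named and written differently, and real handles would come from DBI rather than bless.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real helper described above.
sub dbset_from_arg {
    my $dbarg = shift;

    # An unblessed hash ref is taken to be a db handle set already.
    return $dbarg if ref($dbarg) eq 'HASH';

    # Anything else is assumed to be a single handle, so the master
    # and the reader end up being the same connection.
    return { 'dbh' => $dbarg, 'reader' => $dbarg };
}

# Fake handles so the sketch runs without a database.
my $master = bless { 'name' => 'master' }, 'FakeHandle';
my $slave  = bless { 'name' => 'slave'  }, 'FakeHandle';

# Old-style caller passes one handle; new-style caller passes a set.
my $old_style = dbset_from_arg($master);
my $new_style = dbset_from_arg({ 'dbh' => $master, 'reader' => $slave });

print "old caller reads from: $old_style->{'reader'}{'name'}\n";
print "new caller reads from: $new_style->{'reader'}{'name'}\n";
```

The check relies on real database handles being blessed objects, so ref() on them is a class name and never 'HASH'. That's why an old caller keeps working unchanged and only starts hitting a slave once it passes a set.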

Squid -- any squid mastas in the house? We want to put a squid cache in front of the userpic process (that gets userpics from the database). We currently spread these out amongst all the slaves, modulo the userpicid to get better cache hits on each process, but there are so many userpics now that the in-memory cache is pointless. And if we restart the webserver, bye bye cache. So, time to use squid. We'll have the load balancer send all requests matching "^/userpic/*" to the squid daemon, which will then proxy them on to a real process if they're not in the cache. (what the hell am I explaining web caching for?) Anyway, we'll set this up on two machines and have the load balancer always give requests to the primary unless the primary goes down. (better cache hits than spreading it over two)
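The routing rule above could be sketched roughly like this. Everything named here is an assumption for illustration: the backend names, the health-check hash, and the fallback to plain web processes when both caches are down are not from the actual load balancer config. The sketch also anchors on the plain prefix ^/userpic/, since read as a Perl regex the trailing * would only make the final slash optional.

```perl
use strict;
use warnings;

# Hypothetical backend names; the real hosts aren't given in the post.
my @SQUIDS = ('squid-primary', 'squid-secondary');

# $is_up is a hash ref of backend name => boolean (assumed to come
# from whatever health checks the load balancer already does).
sub pick_backend {
    my ($path, $is_up) = @_;

    # Only userpic requests go through the cache.
    return 'web' unless $path =~ m{^/userpic/};

    # Always prefer the primary, so one cache stays warm.
    for my $squid (@SQUIDS) {
        return $squid if $is_up->{$squid};
    }

    # Assumption: with both caches down, fall through to the web processes.
    return 'web';
}

my %all_up       = ('squid-primary' => 1, 'squid-secondary' => 1);
my %primary_down = ('squid-primary' => 0, 'squid-secondary' => 1);

print pick_backend('/userpic/123', \%all_up), "\n";
print pick_backend('/userpic/123', \%primary_down), "\n";
print pick_backend('/users/brad', \%all_up), "\n";
```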

Directory -- know Perl & SQL? The directory needs a'fixin. But it's not all that bad... I have a new design which I'm pretty damn confident will work. And the databases are totally ready for it now too. Dormando was going to do this, but he might be busy. If Dormando doesn't want to do this, somebody else needs to talk to me and I'll explain the new plan.

<LJDEP> -- every file in the tree is getting dependency documentation inline, so we can generate cool flow graphs. this is halkeye's side project. i've made it an informal rule that any patches sent to me must include LJDEP info if it's not already there. more people working on this the better... it's an easy one too and a good way to learn the code.

Auditing -- speaking of learning the code, go read it. find bugs. find stupid things. change old APIs to new APIs. let's start removing the old APIs from ljlib.pl that just wrap the new APIs.

Documentation -- document the new APIs, in a similar way as LJDEP is done... I'd suggest POD, but I don't like it too much. I want more structure.

effective/actual userids -- we run ljrpc lj:syncsoon, which sets a flag on the slaves. Every minute the slaves check the flag, then rsync from the master. Why can't we just say ljrpc lj:syncnow? Because ljrpcd runs as root (the lj: prefix makes the ljmaint command run as lj), and when we change userids the root userid must still be hanging around somehow: rsync (over ssh) then fails, because it looks up the wrong ssh private key files or something. Somebody with two boxes should set this up, make it work, and let me know how. No SQL experience necessary... just Perl & Unix. Go read bin/ljrpc[d].
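For whoever picks this up, a small debugging aid might help narrow it down. This is only a sketch: id_report is a made-up helper, not part of ljrpc or ljmaint, and it doesn't fix anything. It just prints the real and effective userids along with the home directory each one maps to, so you can see what the process looks like right before rsync is spawned.

```perl
use strict;
use warnings;

# Hypothetical debugging helper, not part of ljrpc or ljmaint.
sub id_report {
    my %report = (
        'real_uid'      => $<,            # real ("actual") userid
        'effective_uid' => $>,            # effective userid
        'env_home'      => $ENV{'HOME'},  # what the environment claims
    );

    # Home directories as recorded in the passwd database for each uid
    # (undef if the uid has no passwd entry).
    $report{'real_home'}      = (getpwuid($<))[7];
    $report{'effective_home'} = (getpwuid($>))[7];

    return \%report;
}

my $report = id_report();
for my $key (sort keys %$report) {
    my $value = defined $report->{$key} ? $report->{$key} : '(undef)';
    print "$key: $value\n";
}
```

If the real and effective ids (or their home directories) differ after the switch to lj, that would at least be consistent with the guess that ssh is looking for key files in the wrong place, but that remains a guess until somebody tests it on two boxes.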

status.livejournal.com -- so, uh... where is it? put it up on a host somewhere and I'll set up DNS to point to it. then give some people FTP access or something. (the point here is to have static file status information totally outside of our network)

Net::FTPServer photo server identity --- we need a new identity module for Net::FTPServer to do LJ photo server work. Again, this was Dormando's job, but I think he has enough on his plate.

ljrpc --now -- ljrpc needs a --now option where it sends the command out immediately, rather than pinging, waiting 5 seconds, and listening to pongs. the fileedit tool should then invoke system("$LJ::HOME/bin/ljrpc", "--now", "lj:syncsoon"); (or better, lj:syncweb ... see above)
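The option parsing for this could be as small as the sketch below, which uses the standard Getopt::Long module. The function name parse_ljrpc_args and the usage message are assumptions; how the real bin/ljrpc handles its arguments today isn't shown in this post, so treat this as a starting point and not a patch.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical argument parser for an ljrpc with a --now flag.
sub parse_ljrpc_args {
    my @args = @_;
    my $now  = 0;

    # --now: send the command immediately, skipping the ping/pong wait.
    GetOptionsFromArray(\@args, 'now' => \$now)
        or die "usage: ljrpc [--now] <command>\n";

    # Whatever is left after option parsing is the command to send.
    my $command = shift @args;
    die "usage: ljrpc [--now] <command>\n" unless defined $command;

    return { 'now' => ($now ? 1 : 0), 'command' => $command };
}

my $immediate = parse_ljrpc_args('--now', 'lj:syncsoon');
my $normal    = parse_ljrpc_args('lj:syncsoon');

print "immediate: now=$immediate->{'now'} command=$immediate->{'command'}\n";
print "normal:    now=$normal->{'now'} command=$normal->{'command'}\n";
```

The list form of system() shown above passes each argument straight to the program without going through a shell, so nothing in the command string needs quoting.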

Debian packaging -- evan's setting up debian packages for livejournal, both for use by others, and for our own use internally... we should be able to say soon, "apt-get install task-ljcom-webslave" and nothing else and put the thing on the network. This will rule. Evan might need help? If there are pro debian packagers out there, let Evan know.

Installation documentation/tools -- the installation documentation & tools in CVS are pretty good, but one vital step is missing... the bin/upgrading/update-db.pl sets up the db schema, but doesn't populate data. alanj was going to look into fixing this, but I haven't heard back from him. we need to basically ship a livejournal-data-replace.sql like we did before, but we need a tool to autogenerate that for releases, like the old ljcom::bin/release.pl did, but better. And the installation documentation needs to be proof-read and tested.
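As a rough idea of what the autogenerating tool would emit, here's a sketch that turns one row of data into a REPLACE statement. The table and column names are invented for the example, and sql_quote is a bare-bones stand-in: a real tool should pull rows from the database and use DBI's $dbh->quote for the literals.

```perl
use strict;
use warnings;

# Minimal literal quoting for this sketch only; a real tool should
# use DBI's $dbh->quote instead.
sub sql_quote {
    my $value = shift;
    return 'NULL' unless defined $value;
    $value =~ s/([\\'])/\\$1/g;   # backslash-escape backslashes and quotes
    return "'$value'";
}

# Build one REPLACE statement from a table name and a row hash ref.
# Columns are sorted so the output is stable between releases.
sub replace_statement {
    my ($table, $row) = @_;
    my @cols = sort keys %$row;
    return sprintf("REPLACE INTO %s (%s) VALUES (%s);",
                   $table,
                   join(', ', @cols),
                   join(', ', map { sql_quote($row->{$_}) } @cols));
}

# Hypothetical row; not a real LiveJournal table.
my $statement = replace_statement('example_table', {
    'type' => 'color',
    'code' => 'red',
    'item' => "Brad's red",
});
print "$statement\n";
```

Using REPLACE rather than INSERT means the generated file can be loaded over an existing install without tripping on rows that are already there, which fits the livejournal-data-replace.sql name.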