December 15th, 2003

  • novasoy

introducing myself

Hello. I am a web developer for a sewer company in Kentucky. I've been programming using Microsoft and Oracle tools for about five years now. In the new year, I want to learn to program using open tools and methods, and to that end, I wanted to lurk around this community and see how this project evolves.

I can help around with beta testing, if that's needed. I've added lj_test to my friends list, so I'll look for testing opportunities there.

See you around!

XML API and security

To get a user's friends list you must log in with that user's password. However, that restriction isn't completely general: I can get the same information much of the time on a user's info page.

Obviously there are good reasons to allow a user to hide their friends, but if they're permitting it, why not make it available through the API?

shrinking the user table

Over LiveJournal's lifetime the user table keeps shrinking, little by little. I propose we finish it off and get it down to its bare minimum.

Currently it's 226 bytes wide, with the following columns:

`userid` int(10) unsigned NOT NULL auto_increment,
`user` char(15) default NULL,
`caps` smallint(5) unsigned NOT NULL default '0',
`clusterid` tinyint(3) unsigned NOT NULL default '0',
`dversion` tinyint(3) unsigned NOT NULL default '0',
`email` char(50) default NULL,
`password` char(30) default NULL,
`status` char(1) NOT NULL default 'N',
`statusvis` char(1) NOT NULL default 'V',
`statusvisdate` datetime default NULL,
`name` char(80) NOT NULL default '',
`bdate` date default NULL,
`themeid` int(11) NOT NULL default '1',
`moodthemeid` int(10) unsigned NOT NULL default '1',
`opt_forcemoodtheme` enum('Y','N') NOT NULL default 'N',
`allow_infoshow` char(1) NOT NULL default 'Y',
`allow_contactshow` char(1) NOT NULL default 'Y',
`allow_getljnews` char(1) NOT NULL default 'N',
`opt_showtalklinks` char(1) NOT NULL default 'Y',
`opt_whocanreply` enum('all','reg','friends') NOT NULL default 'all',
`opt_gettalkemail` char(1) NOT NULL default 'Y',
`opt_htmlemail` enum('Y','N') NOT NULL default 'Y',
`opt_mangleemail` char(1) NOT NULL default 'N',
`useoverrides` char(1) NOT NULL default 'N',
`defaultpicid` int(10) unsigned default NULL,
`has_bio` enum('Y','N') NOT NULL default 'N',
`txtmsg_status` enum('none','on','off') NOT NULL default 'none',
`is_system` enum('Y','N') NOT NULL default 'N',
`journaltype` char(1) NOT NULL default 'P',
`lang` char(2) NOT NULL default 'EN',
`oldenc` tinyint(4) NOT NULL default '0',
PRIMARY KEY (`userid`),
UNIQUE KEY `user` (`user`),
KEY `email` (`email`),
KEY `status` (`status`),
KEY `statusvis` (`statusvis`),
KEY `idxcluster` (`clusterid`),
KEY `idxversion` (`dversion`)

I'm thinking that all it needs to be is:

clusterid tinyint -> smallint (256 clusters too small, when doing 10/cluster)
caps smallint -> int (16 classes looking limiting)

This is 14 bytes per user. (on the global cluster)

The clusterid would get bigger, since is hitting the 256 cluster limit soon. (We have 10 clusterids per physical database cluster....) Also, caps would grow from 16 bit capability classes to 32.

Now, where does everything else go? Most goes to userproplite2, which is clustered.

Currently is seeing an extreme amount of writes to the user table, combined with an extreme number of reads. Now, the reads are tolerable because they cache well, but the writes are hurting performance and they're just annoying, for things like user.defaultpicid, which should be clustered.

Things which wouldn't be going to userproplite2:

-- user: this would go into a new table that maps a domainid+user to a userid. domainid=0 would be the local domain (, but future domains could be other LJ sites, or separate namespaces within LJ. or even multiple usernames in one domain pointing at the same userid. (think adding to your friends list, without him getting a dorky dj_bob username.... or adding RSS feeds without them being lj-username-itized.... the could just not have a username, and their unique ID is their userid only, and for display we use the RSS feed name or something)

-- statusvisdate: we only care about this in the case of 30 day expirations for deleted accounts, so we'll move it to its own table, and the row is only there when the journal is in deleted state. and we could index it too, then.

-- email: this would probably go in userprop (global, indexed) and not userproplite2 (clustered, not indexed) so users could still do by-email searches in the multisearch.bml form.


Feedster searching

Most of feedster's stored RSS data is stored using the /users/user/rss URL, while LiveJournal is currently searching /users/user/data/rss. The result is that feedster searching basically doesn't pull up anything useful, instead ignoring the URL and pulling up random information on any feed.

Should this be changed, or should LiveJournal just live with broken searching until feedsters databases become more useful?

Example Results:


Edit: After I made this post, Scott from Feedster IMed me and we hashed this out. He let me know that newer results were using the canonical URL, but older results still had the non /data/ URL. He's currently running something to update them. According to him, the changes should be all fixed up within the next day or so. (Thanks, Scott!)

Result: No need to update the code.