Greg Connor (gconnor) wrote in lj_dev,

LJ as an anti-spam engine... (Topic system, high volumes, pgp question)

Hello all,

I am considering LiveJournal for use as a spam tracking tool. The idea is that users would post the spam email they receive, and the reports could then be searched and sorted by IP address, domain name, and a few other key data points.
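To make "search by IP and domain" concrete, here is a rough sketch of pulling those fields out of a raw spam message before storing it. This is not anything LiveJournal provides. It is a standalone illustration using Python's standard `email` package, and the function name `extract_report_keys` and the returned field names are hypothetical. It collects IPv4 addresses from `Received:` headers and link hostnames from the text body. Keep in mind that spammers can forge `Received:` headers below the first hop your own mail server added, so a real tool would probably trust only the top few.

```python
import email
import ipaddress
import re
from email import policy

# Loose patterns: anything that looks like a dotted quad is re-checked below.
DOTTED_QUAD = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
URL_HOST = re.compile(r"https?://([A-Za-z0-9.-]+)", re.IGNORECASE)


def _is_ipv4(text: str) -> bool:
    """True if text is a well-formed IPv4 address (rejects e.g. 999.1.1.1)."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ValueError:
        return False


def extract_report_keys(raw: bytes) -> dict:
    """Pull searchable fields out of one raw spam message (hypothetical helper)."""
    msg = email.message_from_bytes(raw, policy=policy.default)

    # Relay IPs in the order the Received: headers appear (newest hop first).
    ips = []
    for header in msg.get_all("Received", []):
        for candidate in DOTTED_QUAD.findall(str(header)):
            if _is_ipv4(candidate) and candidate not in ips:
                ips.append(candidate)

    # Hostnames of http(s) links in the preferred text body part.
    body = msg.get_body(preferencelist=("plain", "html"))
    text = body.get_content() if body is not None else ""
    domains = sorted({host.lower().rstrip(".") for host in URL_HOST.findall(text)})

    return {
        "ips": ips,
        "domains": domains,
        "from": str(msg.get("From", "")),
        "subject": str(msg.get("Subject", "")),
    }
```

Each returned field would map naturally onto an indexed column, which is what makes the sorting and searching cheap later.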

I have downloaded and installed LiveJournal from the tarball (2003-08 or so), and it seems to be running OK without much trouble, which is cool. I am using Perl 5.8 instead of the recommended Perl 5.6, but so far there have been only a couple of minor issues.

Regarding Topics, I would really like to use this feature to create a "library" of documents sorted into categories. But it doesn't seem to work out of the box, and the docs are a bit bare on that feature. I will need to decide whether to jump in and fix it or implement my own thing from scratch. Any pointers to more docs or other information would be helpful.
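If the "from scratch" route wins, a category library can be as small as two tables: categories that point at their parent, and documents that point at a category. The sketch below is an assumption about how such a thing could look, not a description of how LJ's Topics system is built; the table names, column names, and `docs_under` helper are all hypothetical, and it uses SQLite's recursive CTE only so the example runs on its own.

```python
import sqlite3


def create_library(db: sqlite3.Connection) -> None:
    """Hypothetical schema: a category tree plus documents filed under it."""
    db.executescript("""
        CREATE TABLE category (
            cat_id    INTEGER PRIMARY KEY,
            parent_id INTEGER REFERENCES category(cat_id),  -- NULL = top level
            name      TEXT NOT NULL
        );
        CREATE TABLE document (
            doc_id INTEGER PRIMARY KEY,
            cat_id INTEGER NOT NULL REFERENCES category(cat_id),
            title  TEXT NOT NULL
        );
    """)


def docs_under(db: sqlite3.Connection, cat_id: int) -> list:
    """Titles of documents in cat_id or any of its descendant categories."""
    rows = db.execute("""
        WITH RECURSIVE sub(cat_id) AS (
            SELECT ?
            UNION ALL
            SELECT c.cat_id FROM category c JOIN sub ON c.parent_id = sub.cat_id
        )
        SELECT d.title FROM document d JOIN sub ON d.cat_id = sub.cat_id
        ORDER BY d.title
    """, (cat_id,))
    return [row[0] for row in rows]
```

The parent-pointer layout keeps inserts and re-filing trivial; the cost is that listing a whole subtree needs a recursive query (or, on databases without recursive queries, a loop in application code).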

Regarding large volumes of messages, I would like the system to be able to process about a million "spam reports" every 30 days. The messages would drop off after a time and be replaced daily with new ones. I thought about simply creating a regular community and making every report a new post, but I'm concerned that the high churn rate would hurt performance, or that deleting the old posts wouldn't reclaim all the space. The alternative is to track the reports in a separate table of their own; if anyone wants to comment on a report, the comment could be a regular post. Any feedback on this is appreciated.
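For scale, a million reports per 30 days works out to about 33,300 per day, roughly 1,400 per hour, or around 0.4 per second on average, so the sustained insert rate is modest; the harder part is the matching volume of daily deletes. Below is a minimal sketch of the "separate table" option, assuming a 30-day retention window and a daily purge job. The table and function names are hypothetical, and `sqlite3` is used only so the sketch runs standalone (LiveJournal itself runs on MySQL, where space reclamation after bulk deletes depends on the storage engine).

```python
import sqlite3
import time

RETENTION_DAYS = 30            # assumption: reports age out after ~30 days
SECONDS_PER_DAY = 24 * 60 * 60

SCHEMA = """
CREATE TABLE IF NOT EXISTS spam_report (
    report_id   INTEGER PRIMARY KEY,
    received_at INTEGER NOT NULL,   -- Unix seconds
    source_ip   TEXT,
    domain      TEXT,
    raw_message TEXT
);
CREATE INDEX IF NOT EXISTS idx_report_ip     ON spam_report (source_ip);
CREATE INDEX IF NOT EXISTS idx_report_domain ON spam_report (domain);
CREATE INDEX IF NOT EXISTS idx_report_time   ON spam_report (received_at);
"""


def open_reports(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db


def add_report(db, source_ip, domain, raw_message, now=None) -> int:
    """Store one report and return its id (hypothetical helper)."""
    now = int(time.time()) if now is None else now
    cur = db.execute(
        "INSERT INTO spam_report (received_at, source_ip, domain, raw_message) "
        "VALUES (?, ?, ?, ?)",
        (now, source_ip, domain, raw_message),
    )
    db.commit()
    return cur.lastrowid


def purge_expired(db, now=None) -> int:
    """Delete reports older than RETENTION_DAYS; return how many were removed."""
    now = int(time.time()) if now is None else now
    cutoff = now - RETENTION_DAYS * SECONDS_PER_DAY
    cur = db.execute("DELETE FROM spam_report WHERE received_at < ?", (cutoff,))
    db.commit()
    return cur.rowcount


def reports_by_ip(db, source_ip) -> list:
    """Report ids from one source IP, oldest first (uses idx_report_ip)."""
    rows = db.execute(
        "SELECT report_id FROM spam_report WHERE source_ip = ? ORDER BY received_at",
        (source_ip,),
    )
    return [row[0] for row in rows]
```

The index on `received_at` keeps the daily purge from scanning the whole table. In SQLite, pages freed by `DELETE` are reused by later inserts and `VACUUM` shrinks the file; another common pattern for this kind of churn is one table per day, dropped whole once it ages out.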

Finally, regarding signatures: we might export the data to other servers (which may have only limited trust in one another), so I would like each data item to be signed by its author using PGP or something similar. Is there any built-in support for PGP that I could leverage? Ideally, users would upload their public key; then, when posting a message, they would be prompted to sign it, and the server would check the signature.
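Whether LiveJournal has anything built in is the open question here, but the server-side check itself is fairly small. The sketch below assumes the author uploads an ASCII-armored public key and submits a detached, armored signature alongside each report; it shells out to the `gpg` command-line tool with a throwaway home directory so that only that author's key can produce a good result. The function name is hypothetical, and this is one possible approach rather than an existing LJ feature.

```python
import os
import subprocess
import tempfile


def verify_detached_signature(public_key_armored: str, message: bytes,
                              signature_armored: str) -> bool:
    """True only if signature_armored is a good OpenPGP detached signature
    over message, made by a key in public_key_armored (hypothetical helper).
    Returns False if the gpg binary is not installed."""
    try:
        # Throwaway GnuPG home (created with mode 0700): the check sees only
        # the author's uploaded key, never the server's own keyring.
        with tempfile.TemporaryDirectory() as home:
            base = ["gpg", "--homedir", home, "--batch", "--no-tty"]
            imported = subprocess.run(
                base + ["--import"],
                input=public_key_armored.encode("utf-8"),
                capture_output=True,
            )
            if imported.returncode != 0:
                return False  # not a usable public key

            msg_path = os.path.join(home, "report.txt")
            sig_path = os.path.join(home, "report.txt.asc")
            with open(msg_path, "wb") as f:
                f.write(message)
            with open(sig_path, "w", encoding="utf-8") as f:
                f.write(signature_armored)

            # --status-fd 1 writes machine-readable "[GNUPG:] ..." lines to stdout.
            result = subprocess.run(
                base + ["--status-fd", "1", "--verify", sig_path, msg_path],
                capture_output=True,
                text=True,
            )
            return result.returncode == 0 and "[GNUPG:] GOODSIG" in result.stdout
    except FileNotFoundError:
        return False  # gpg binary not installed
```

Requiring the `GOODSIG` status token (rather than just a zero exit code) means signatures reported as coming from expired or revoked keys, which gpg flags with different tokens, are not accepted. Downstream servers could rerun the same check on exported items, provided the author's public key travels with the data.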

Thanks for any feedback. I realize it's going to be a lot of work, and I think I'm ready. I'm not fishing for help; I'm mostly just looking for docs and info so that I don't have to reinvent the wheel (much).

Thanks. gregc

