January 11th, 2005

amused, happy
  • mart

Efficient Linking of Weblog/Journal Sites

It would be very cool if certain aspects of the social networks created by the different LJ-based sites could be absorbed together. Right now we have the rather lame solution of adding RSS feeds from other sites, which taken to the logical extreme means that all of the sites end up sharing one namespace, but all of the journals have different names depending on where you are. This is a mess. RSS works okay for pulling in content from outside, but when it comes to other LJ-based sites we could do so much better, and do it much more efficiently. I should be able to add scsi@deadjournal.com to my friends (or "watch"!) list on LiveJournal and have the entries from that journal appear on my friends page. We talked about this a few years back, but the situation has got a lot better since then as other people have done some of the things we would have had to do.

On the surface this doesn't seem too hard: we need some efficient mass-transfer protocol so that the sites can pull updates (create, edit, delete) from each other in batches, and some way to express which journals each site wants. Sending the list of journals over each time would be a bit lame, so each site could instead maintain subscription lists for the other sites it can link with. Of course, this requires co-operation between the different sites, so it's not brilliant. Also, I'm not entirely sure if we have the right data to be able to distinguish the create, edit and delete operations... but syncitems does this for a single journal, right? Since the special accounts that stand in for remote journals won't have any of their own journal views, their entries can be safely deleted once they're too old to appear on a friends view.
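To make the idea a bit more concrete, here is a rough sketch of the batch pull with per-site subscription lists. Everything in it is made up for illustration: the names (PublishingSite, Update, apply_batch), the sequence-numbered change log and the cursor are my assumptions, not anything LiveJournal or syncitems actually provides.

```python
from dataclasses import dataclass, field
from enum import Enum


class Op(Enum):
    CREATE = "create"
    EDIT = "edit"
    DELETE = "delete"


@dataclass(frozen=True)
class Update:
    seq: int        # position in the publishing site's change log (assumed)
    journal: str    # journal name on the publishing site
    entry_id: int
    op: Op
    body: str = ""  # left empty for deletes


@dataclass
class PublishingSite:
    """Hypothetical publishing side: a change log plus subscription lists."""
    log: list = field(default_factory=list)
    subscriptions: dict = field(default_factory=dict)  # remote site -> journals

    def subscribe(self, remote, journal):
        # The remote site registers interest once instead of resending
        # its whole list of journals on every pull.
        self.subscriptions.setdefault(remote, set()).add(journal)

    def record(self, journal, entry_id, op, body=""):
        self.log.append(Update(len(self.log) + 1, journal, entry_id, op, body))

    def pull(self, remote, since, limit=100):
        # Return the next batch of changes this remote site asked for.
        wanted = self.subscriptions.get(remote, set())
        batch = [u for u in self.log if u.seq > since and u.journal in wanted]
        return batch[:limit]


def apply_batch(cache, batch, since):
    """Apply a batch to the consuming site's local copy; return the new cursor."""
    for u in batch:
        key = (u.journal, u.entry_id)
        if u.op is Op.DELETE:
            cache.pop(key, None)
        else:
            cache[key] = u.body  # create and edit both store the latest body
    return batch[-1].seq if batch else since
```

The consuming site would keep the returned cursor and pass it as `since` on its next pull, so each batch only carries what changed in the journals it subscribed to.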

By now the Atom folks might well have something we could use for this. Back when we originally discussed it I was pushing for creating our own XML format, but that was before the Atom folks came along and did basically what I was proposing. However, last I was tracking Atom they were still trying to decide on the XML format and not really near any kind of API for pushing content around in an organised fashion.

Obvious caveats: need to be able to comment on other sites with a LiveJournal account and vice-versa (decentralised TypeKey-style auth fixes this), handling security-locked entries securely (can't without a user-trusts-site relationship), knowing how many comments there are on the remote entry (or just make it say "Read Comments" for those)... and do we let (say) DeadJournal users post in LiveJournal communities? Should communities be shared too?

Yeah, this entry doesn't have any concrete stuff in it. I think it's worth working towards a proper design for this, though, as it's something we've talked about forever.

amused, happy
  • mart

Content Distribution Network

While thinking about the problem I talked about in my previous entry, it occurred to me that it is quite wasteful for every site to have to talk to every other site. Instead, we can borrow from the USENET model and create a structured distribution network. For example:

A completely hypothetical network layout, of course. The basic principle here is that each node has a set of peers and keeps track of which of those peers are interested in each journal. Subscription control messages as well as entry state changes are passed around the network through these channels, and since the links are created through co-operation between two nodes they can either be persistent sockets or pull-type connections depending on the needs of the two peers. Nodes must also track which journals should be forwarded on to neighbours, to avoid redundant forwarding and ensure that smaller sites don't get overwhelmed with data.
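Here is a minimal in-memory sketch of that bookkeeping, just to show the shape of it. The Node class and its method names are invented, direct method calls stand in for whatever sockets or pull connections two peers would really agree on, and I'm assuming subscription messages are simply flooded to every peer that hasn't been told yet.

```python
from collections import defaultdict


class Node:
    """Hypothetical network node; direct calls stand in for real links."""

    def __init__(self, name):
        self.name = name
        self.peers = {}                    # peer name -> Node
        self.interest = defaultdict(set)   # journal -> peers that want it
        self.announced = defaultdict(set)  # journal -> peers we already told
        self.local_subs = set()            # journals our own users read
        self.seen = set()                  # (journal, entry_id) already handled
        self.delivered = []                # entries shown to local readers

    def link(self, other):
        self.peers[other.name] = other
        other.peers[self.name] = self

    def subscribe(self, journal):
        # A local user starts watching a journal hosted somewhere else.
        self.local_subs.add(journal)
        self._announce(journal)

    def _announce(self, journal, skip=None):
        # Pass the subscription on, once per peer, never back to the sender.
        for name, peer in self.peers.items():
            if name == skip or name in self.announced[journal]:
                continue
            self.announced[journal].add(name)
            peer._on_subscribe(journal, self.name)

    def _on_subscribe(self, journal, from_peer):
        self.interest[journal].add(from_peer)
        self._announce(journal, skip=from_peer)

    def publish(self, journal, entry_id, body):
        self._on_entry(journal, entry_id, body, from_peer=None)

    def _on_entry(self, journal, entry_id, body, from_peer):
        key = (journal, entry_id)
        if key in self.seen:
            return  # drop duplicates instead of forwarding them again
        self.seen.add(key)
        if journal in self.local_subs:
            self.delivered.append(key)
        # Forward only to peers that asked for this journal.
        for name in sorted(self.interest[journal]):
            if name != from_peer:
                self.peers[name]._on_entry(journal, entry_id, body, self.name)
```

With a hub linked to LJ, DJ and some small site, an LJ subscription to scsi@deadjournal.com means entries published at DJ travel DJ, hub, LJ, and the small site never receives them. A real design would presumably route subscriptions towards the journal's home site rather than flooding them, but that needs routing information this sketch doesn't have.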

All of the nodes need to know about all nodes which produce content. To stop nodes tampering with the data as it passes through, the data is signed, and each content-producing site has its own keypair. Key exchange is the tricky part, as it is the only part of the process where every node must connect to every other node directly so that everyone has everyone else's keys.

As you can imagine, this is a closed network as it requires co-operation between nodes. This is much like USENET, but the network will be a lot smaller. The obvious question is "What's in it for the sites?", and that is a good question. Big sites benefit from reciprocal links because they are trading valuable content, but bigger players have no real reason to let the little players in. As distasteful as it may seem, someone has to pay for these things, and so the worst case is that the USENET model is followed where a peer pays an upstream provider to let it feed from them. This isn't really that bad, as a bunch of smaller services can co-operate together to get a single link to the main network and share costs between them.

I think this, really, is the only feasible model for now. If we design it right it could be general enough to later let in producers and consumers that aren't LiveJournal-based: for example, TypePad pushing content into the network via LiveJournal, aggregator peers which suck data from a bunch of RSS feeds and republish it onto the network, and user-oriented aggregators which only consume content and provide something not unlike a LiveJournal friends page for those who don't have any wish to publish but want to read. That's for the future, though... for now, it'll probably just start as a small network between LJ and DJ and perhaps Plogs.net. What do you think?