January 7th, 2005

  • vasaki

multi-dimensional categorization

I know that this topic has already been discussed several times here, but nevertheless.

I don't use LJ or any other popular blogging service for publishing. Let me explain why. Personally, I could use a blog for one reason only: I view a blog as a knowledge base with the option of making certain parts of it accessible to other people. The concept of a "knowledge base" is very much open to debate; I see it as a collection of carefully classified information snippets. The key thing here is "carefully classified". A simple categorization is not enough: each snippet has to be classified along several dimensions. For instance, I might have certain pieces related to IBM and other pieces related to HPC (high-performance computing). I would then assign an article about "Deep Blue" to both classes, IBM and HPC, to make future reference easier.

This is a very fashionable area of research nowadays: ontologies, the Semantic Web, semantic blogging and the like. However, I don't think blogging needs all that technology now; all it needs is multi-dimensional classification, as I described above. A central structure of available classes would also be nice to have. For instance, LJ could additionally maintain a central list of keywords with unambiguous meanings, which people could use as classes for their postings. For example, the central directory could contain a tree-like list like this:

- Companies
	- IT
		- IBM
		- Sun Microsystems
		- Microsoft
		- SAP
		- Nestle
		- Kraft
		- P&G
- IT area
	- ERP
	- Web

Thus, I could list, for instance, everything classified as both "Microsoft" and "ERP" to find any activity by Microsoft in the ERP market. This is closer to so-called "semantic blogging", which I tend to think is quite a utopian idea and not urgent at all. Simple multi-dimensional, user-specific classification would be enough to begin with.
Of course, to make any use of such classifications, we need to add some sort of query language.
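
To make this concrete, here is a minimal sketch of what such a query could look like, assuming each post simply carries a flat set of class names. The `Post` type, the `query` function, and the two Microsoft entries are made up for illustration; none of this is existing LJ code, and a real query language would need more than AND/NOT.

```python
from dataclasses import dataclass


@dataclass
class Post:
    title: str
    tags: frozenset  # every class the post is filed under, across all dimensions


def query(posts, all_of=(), none_of=()):
    """Return posts that carry every tag in all_of and no tag in none_of."""
    required = set(all_of)
    excluded = set(none_of)
    return [
        p for p in posts
        if required <= p.tags and not (excluded & p.tags)
    ]


# Hypothetical sample data; only the "Deep Blue" example comes from the post above.
posts = [
    Post("Deep Blue", frozenset({"IBM", "HPC"})),
    Post("Hypothetical ERP product note", frozenset({"Microsoft", "ERP"})),
    Post("Hypothetical web server note", frozenset({"Microsoft", "Web"})),
]

# "Everything classified as Microsoft and ERP"
microsoft_erp = [p.title for p in query(posts, all_of=["Microsoft", "ERP"])]
```

The same call covers the other cases: `query(posts, all_of=["IBM", "HPC"])` finds the "Deep Blue" article from either dimension, and `none_of` excludes a class.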

http://nudnik.ru - a Russian blog engine that implements exactly the concept I'm talking about (check the small keywords after the # sign; some posts have several keywords).

http://www.livejournal.com/community/lj_style/237565.html - simple one-dimensional categorization in LJ. The number of entries in each category is huge, which makes searching very difficult.

http://www.livejournal.com/community/lj_dev/664279.html - parts of the discussion are relevant; my post isn't concerned with the "Friends" concept at all.

http://jena.hpl.hp.com:3030/blojsom-hp/blog/ - Semantic Blogging demonstrator, scroll to the bottom to see the categories.

partial ack tcp corruption bug

Thanks to all your bug reports and especially the tcpdumps, F5's been able to find and fix (in only 3 hours!) the TCP corruption issues you've all seen.

The new code is running now and should fix the problems you've been seeing.

Please report any problems that happen past this point. And with pcap files, ideally... thanks!

(FYI: we're running some pre-release BIG-IP code because we do some bizarre HTTP and load balancing stuff and they wanted us to test it...)

But it's totally worth it. We can do things like:
amused, happy
  • mart

Corrupt Data

Something is corrupting data between the web servers and my client. The most obvious symptom of this is that userpics and picpix/pics.lj images are getting distorted, but it's also corrupting the gzip-encoded data streams on journal views and producing pages that have all of the right characters but in the wrong order. It's intermittent, though; refreshing usually fixes it or at least changes the nature of the corruption. Several other people have mentioned this, too.

Given the timing, I'm guessing the blame lies with the new network configuration.

Six Apart and ownership of code

I know some of you are still apprehensive about the Six Apart acquisition, so I wanted to give you guys some updates.

Six Apart doesn't want to own your code.
They don't even care too much about owning Danga's code. They just want a right to use it, and they want to make sure somebody is able to legally go after any company that tries to violate the GPL'ed code by selling server software based off LiveJournal code without releasing their modifications. (no, this does not mean attacking DeadJournal, because DeadJournal doesn't sell/distribute their changes, and DeadJournal even gives stuff back from time to time....)

We're in talks now with the Free Software Foundation to figure out how to best do this. (well, they sent us some mail offering help and we replied saying that'd rock, but that was just today, so "in talks" might be overstating it... but we're working to start talks)

Our old pre-SixApart TOS said something like we own all your contributions. That was just boilerplate stuff that I personally never wanted in there, but I wasn't good at legal documents, so I left it. When SixApart bought the company they left that in, but their lawyers changed some words a little to make it clearer. So while technically SixApart could arguably own your contributions now (if you agreed to the TOS), they don't want them. We want you guys to own them, as long as we have a license to use them. (under the GPL, Artistic, BSD, whatever....)

I'll give you guys an update when we're further along in these talks. Probably next week ... not this weekend.

Future LiveJournal contributions
People think we're ditching this codebase and moving to TypePad or Movable Type. You know how hard that'd be? Think how many LiveJournal features and formats we'd have to port. We love the LiveJournal code and associated codebases and you're still going to see us hacking on them, adding new stuff, fixing stuff, etc.

To come..
More announcements when we know more. Feel free to flood us with questions and we'll answer. We don't want anybody left in the dark about stuff.