Arnhem (arnhem) wrote in lj_dev,

I'm in the process of making a fair number of changes to a local copy of LJ, and thought it was worth checking the sense of it all here.

Essentially, my purpose is to produce something usable by researchers (a kind of online lab book), but for that to make sense it really needs a clean way of including maths (errm, US: "math") content.

This splits into two problems:
  • a clean way of inputting maths
  • how to render it well

One thing I'm aware of is that having this single focus makes me likely to make changes that don't mesh well with parts of LJ that I'm not interested in - I'll be very happy to be told about these 8-)

My current approach to the former has been to invent an <lj-math> tag, and implement an expand_embedded hook that looks for the opening and closing tags and replaces them, along with the content between them, with appropriate text. My intention is that the content of such tags should be iTeX (i.e. a useful subset of LaTeX maths). I'll shortly be attempting to bolt itex2mml in to do the conversion to MathML, although that's not necessarily the only appropriate conversion to do.
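As a rough illustration of the tag-expansion idea (in Python rather than LJ's own Perl, and not the actual expand_embedded hook), something like the following would do. The names expand_lj_math and as_plain_tex are made up for this sketch, and the converter is passed in as a parameter so that an itex2mml wrapper could be swapped in later.

```python
import html
import re

# <lj-math> is the tag proposed in this post. The match is non-greedy so
# that several math spans in one entry are handled separately.
LJ_MATH_RE = re.compile(r"<lj-math>(.*?)</lj-math>", re.DOTALL | re.IGNORECASE)


def expand_lj_math(text, convert):
    """Replace each <lj-math>...</lj-math> span with convert(source)."""
    return LJ_MATH_RE.sub(lambda m: convert(m.group(1).strip()), text)


def as_plain_tex(source):
    """Hypothetical fallback converter: show the escaped iTeX source as-is."""
    return '<code class="lj-math">' + html.escape(source) + "</code>"


entry = "Euler: <lj-math>e^{i\\pi} + 1 = 0</lj-math>, and <lj-math>a < b</lj-math>."
print(expand_lj_math(entry, as_plain_tex))
```

Escaping the source before output matters here, since iTeX routinely contains characters such as < and & that would otherwise break a strict XHTML page.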

The rendering problem is more complicated - the Mozilla family can render MathML in XHTML pages served as application/xhtml+xml, and I believe that IE can do so with an appropriate (free) plugin.

However, this imposes a need to make all the pages thus rendered strictly XHTML-compliant. I've spent a happy day ploughing through much of the htdocs/ , cgi-bin/scheme/bml/ and bin/updating/ directories, finding all the upper-case tags, unquoted attribute values, <br> instead of <br />, unclosed <p>, and so on.
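A hunt of this kind can be partly automated. The sketch below is only a line-by-line heuristic, not the method actually used for the clean-up described above: the function name find_relics and the three patterns are assumptions, it will produce some false positives, and it makes no attempt to find unclosed <p> elements, which need a real parser.

```python
import re

# Each check is a (label, pattern) pair; all three are rough heuristics.
CHECKS = [
    ("upper-case tag name", re.compile(r"</?[a-z0-9]*[A-Z][A-Za-z0-9]*[\s/>]")),
    ("unquoted attribute value", re.compile(r"<[^<>]*\s[\w:-]+=[^\s\"'<>]")),
    ("<br> without self-closing slash", re.compile(r"<br\s*>", re.IGNORECASE)),
]


def find_relics(markup):
    """Return (line_number, label) pairs for lines that look pre-XHTML."""
    hits = []
    for lineno, line in enumerate(markup.splitlines(), start=1):
        for label, pattern in CHECKS:
            if pattern.search(line):
                hits.append((lineno, label))
    return hits


sample = '<P>Hello<br>\n<td width=100>x</td>\n<p class="ok">fine<br /></p>'
for lineno, label in find_relics(sample):
    print(lineno, label)
```

Run over a directory of templates, the output gives a list of lines to inspect by hand; it is a starting point for review rather than a validator.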

It seems uncontroversial to me that it would be good to clean up the lj cvs to remove the remaining relics of pre-xhtml usage, and I can provide some hopefully useful diffs ...

However, it's occurred to me that unilaterally changing all of lj to strict XHTML is likely to break badly, since people might well be depending on non-strictness in their own edited styles. This seems to be a difficult area - but I wonder if generation of the relevant headers (<?xml ...?> and DOCTYPE) could be done on a style-by-style basis, thus giving both authors and readers control over the situation?

In that case, a flag in each style that indicated whether it was strict XHTML or not (and whether it should be delivered with content type application/xhtml+xml) would enable the hook that parses the (hypothetical/proposed-by-me) <lj-math> tag to decide whether to render to MathML or to some other representation (just dumping the pseudo-LaTeX as-is would provide a pretty good fall-back, in fact).
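To make the per-style flag concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the Style class, its strict_xhtml field, and the to_mathml parameter (a stand-in for whatever wraps itex2mml) are invented for illustration and do not correspond to existing LJ code.

```python
import html
from dataclasses import dataclass


@dataclass
class Style:
    name: str
    strict_xhtml: bool = False  # hypothetical per-style flag


def content_type_for(style):
    """Pick the content type to serve a page rendered with this style."""
    return "application/xhtml+xml" if style.strict_xhtml else "text/html"


def render_math(source, style, to_mathml):
    """Render one <lj-math> body: MathML for strict styles, raw iTeX otherwise."""
    if style.strict_xhtml:
        return to_mathml(source)
    # Fall-back: dump the escaped pseudo-LaTeX as-is.
    return "<code>" + html.escape(source) + "</code>"


legacy = Style("legacy")
print(content_type_for(legacy))
print(render_math("a < b", legacy, to_mathml=lambda s: "<math/>"))
```

Keeping the decision in one place means that styles which have not been cleaned up carry on being served as before, and only styles whose authors opt in get MathML.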

Apologies for the length of this - in writing it, I'm horribly aware that I may be barking up completely the wrong tree in the ways that I'm thinking about this problem - I'd very much welcome feedback!
