Karl (supersat) wrote in lj_dev,

Minor HTML cleaner fix to produce valid HTML

As pointed out in this request, LiveJournal's HTML cleaner code has a minor bug: it converts the &amp;amp; entity into a plain ampersand inside any tag attribute, which makes the output invalid HTML. I tracked the source of this behavior down to HTML::TokeParser, which decodes certain HTML entities before returning a token. That can be useful for some applications, but not for the HTML cleaner, which writes the attribute values back out. This patch fixes the problem by re-encoding the HTML entities inside attributes.
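To illustrate the idea without reproducing the patch itself, here is a minimal sketch in Python rather than Perl. It relies on the fact that Python's standard `html.parser.HTMLParser` also hands back attribute values with entities already decoded, so the same fix applies: re-encode each value before writing the tag out again. The names `AttrReencoder` and `clean` are hypothetical and are not part of LiveJournal's cleaner or of HTML::TokeParser.

```python
from html import escape
from html.parser import HTMLParser


class AttrReencoder(HTMLParser):
    """Hypothetical cleaner: rebuilds tags and re-encodes attribute values."""

    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        # The parser has already decoded entities in each value,
        # so "&amp;" arrives here as a bare "&".
        parts = [tag]
        for name, value in attrs:
            if value is None:
                # Attribute without a value, e.g. <input disabled>.
                parts.append(name)
            else:
                # Re-encode &, <, > and quotes before emitting the value.
                parts.append('%s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<" + " ".join(parts) + ">")

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Text content is decoded too, so it is re-encoded as well.
        self.out.append(escape(data, quote=False))


def clean(html_text):
    parser = AttrReencoder()
    parser.feed(html_text)
    parser.close()  # flush any buffered text
    return "".join(parser.out)
```

With this sketch, a link such as `<a href="http://example.com/?a=1&amp;b=2">link</a>` passes through `clean` with its `&amp;` intact. Without the `escape` call, the attribute would be written out with a bare `&`. The sketch ignores comments, declarations, and self-closing syntax, which a real cleaner would need to handle.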

I know Brad isn't going to look at this yet, but I thought that I'd post it anyway and see what other people think of it first.

Get the patch here.
