Brad Fitzpatrick (bradfitz) wrote in lj_dev,

The great English-removal project


Summary

We're removing all the hard-coded English text from all files on the site.

Why? Two reasons:

  1. So we can have LiveJournal in other languages:
    • Set Language -- this is unlinked until the project is done and the translators are done
    • Translate Area -- where the translators go to work
  2. So sites like DeadJournal can customize more easily, while still running the latest code.

How it works

There are different instances of text domains, each with its own namespace. Domain types include: general (used by BML), faq, and journal (which takes an argument naming the journal to translate, like news or lj_maintenance).

The translation system is tied into everything, so translators need only visit one URL to translate all text in all domains, and the translation system manages revision control and publishing, based on the domain type.

Future domain types currently unimplemented include:

  • moods (there isn't a 1:1 mapping of moods from English to whatever, so each language needs its own tree structure)
  • fileedit (things that other users have permission to edit), but in fact these will probably just die and be moved into the general domain; then we get revision control for free, which fileedit currently doesn't provide

Parent Types

Each language has a parent language from which it's derived, and the parent is classified as either similar or different. Further, each domain has a master language.

The master language for the 'general' domain is 'en', and is what we redistribute to other sites. However, since the FAQ and site journals are site-specific (think: "ljcom", not "livejournal" in CVS), the master language for those is en_LJ.

en_LJ is the "ljcom" (site-local to livejournal.com) language. Its parent language is 'en', with a type of similar, so anything we don't override from 'en' we get for free.

All the other languages on the site are derived from en_LJ. If we were sick enough, we could have en_GB (Great Britain) derive from en_LJ, with parent type similar, so en_GB would only have to override translation items containing words that are spelled differently in British English.
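Here's a rough sketch of how lookup through similar parents could work. This is not the real implementation: the LANGS and TEXT tables, the lookup function, the "es" language, and the example.color item are all made up for illustration, and stopping at a "different" parent is my assumption about what "different" implies.

```python
# Hypothetical language table: code -> (parent code, parent type).
# Only en, en_LJ and en_GB come from the post; "es" is an invented example.
LANGS = {
    "en": (None, None),
    "en_LJ": ("en", "similar"),
    "en_GB": ("en_LJ", "similar"),
    "es": ("en_LJ", "different"),
}

# Hypothetical item store: language -> {item code: text}.
TEXT = {
    "en": {"/friends/add.bml.title": "Add Friends", "example.color": "Color"},
    "en_LJ": {"dystopia.searchsite": "Search LiveJournal:"},
    "en_GB": {"example.color": "Colour"},
    "es": {},
}


def lookup(lang, code):
    """Return the text for code, inheriting across 'similar' parents only."""
    while lang is not None:
        text = TEXT.get(lang, {}).get(code)
        if text is not None:
            return text
        parent, parent_type = LANGS[lang]
        if parent_type != "similar":
            return None  # assumption: a 'different' child must translate it
        lang = parent
    return None
```

With these tables, en_GB overrides only example.color and picks up everything else from en_LJ and en.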

Translation items

In the new system, there are bunches of translation items which range in size from a word to a sentence to many paragraphs. Sometimes they'll contain HTML tags, BML tags, or variable insertion tags, but never code.

When a language editor changes something, and that language has children, they can mark the change as typo (don't notify children), minor (notify, but don't require a change), or major (requires a translation update, provided the child language's parent type is different and not similar).
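The three change levels boil down to a small decision table. A minimal sketch, assuming a hypothetical child_action function (not part of the real code); treating a major change as "notify only" for a similar child is my guess, since the rule above only says when an update is required.

```python
def child_action(severity, parent_type):
    """Return (notify, update_required) for one child language."""
    if severity not in ("typo", "minor", "major"):
        raise ValueError("unknown severity: %r" % severity)
    if parent_type not in ("similar", "different"):
        raise ValueError("unknown parent type: %r" % parent_type)
    if severity == "typo":
        return (False, False)  # children never hear about typo fixes
    if severity == "minor":
        return (True, False)   # notify, but the old translation stands
    # major: an update is required only when the child is 'different'
    return (True, parent_type == "different")
```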

DeadJournal

So DeadJournal creates en_DJ, which derives from en, and they modify whatever they want. When we add new features/areas to the livejournal general code, they update from CVS and get the base English text, which they can override later.

In CVS, look at livejournal's bin/upgrading/text.dat and ljcom's bin/upgrading/text-local.dat, which define the language structure.

The command bin/upgrading/texttool.pl does all the magic, and will be documented shortly. Basically, the "load" command is all end-users (er, end-site-admins) need. It runs all the necessary commands in order, and even if they're redundant, it can't fuck anything up.

What we need to do

We need to start removing hard-coded English from the site and putting it into en.dat and en_LJ.dat so it makes its way into the translation database so translators can work on it.

Look at ljcom's cgi-bin/bml/scheme/dystopia/generic.look for examples there (search for _ML and BML::ml). The BML _ML tag inserts a general item. If the code starts with a period, and the file is a normal BML file (not an include file, a library, etc.), then the item is prefixed with the URI.

So, say in htdocs/friends/add.bml you said: (=_ML .title _ML=) that would expand to code: /friends/add.bml.title, and that's what you'd want to put in en.dat.

Why not en_LJ.dat? Only use en_LJ.dat when the code you're using is ONLY in an ljcom page. If the code is ever in a general page, put it in en.dat.
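The period-prefix rule for _ML codes can be sketched like this. The expand_code function and its page_uri parameter are made up for illustration; passing None stands in for include files and libraries, where no URI prefix applies.

```python
def expand_code(code, page_uri=None):
    """Expand a page-relative item code (leading period) to its full code.

    page_uri is the URI of a normal BML page, e.g. "/friends/add.bml".
    Pass None for include files or libraries (an assumption of this sketch).
    """
    if code.startswith(".") and page_uri is not None:
        return page_uri + code
    return code
```

So expand_code(".title", "/friends/add.bml") gives /friends/add.bml.title, while a code like dystopia.nav.createjournal passes through untouched.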

Keep in mind:

When making new phrase codes, you want to re-use things as much as possible, but not to the extent that you don't give translators a chance to change things based on context.

For instance, the "Create Journal" text might be used in several places, but on the left side-bar of the dystopia scheme, I used the code "dystopia.nav.createjournal" just in case they want to use something that's less ideal but shorter and thus more fitting for the left side-bar. When in doubt, use two translation items.

Also, don't split sentences. For instance, some places on the site we say "Back to the FAQ." or "Back to the support area." Somebody who has never studied languages might make "Back to" one translation item and "the FAQ" and "the support area" others. But that ignores gender and number, which matter a lot more in other languages than they do in English.

See the text at the top of the dystopia scheme which says "Hello, username!". A naive translation item would have been just "Hello", with the comma and exclamation mark fixed. But Spanish requires the leading upside-down exclamation mark, and the Japanese translators have already put the syllable "san" after the username. So give them room to work. Don't split crap up too much. Don't force structure or punctuation on them.

Think you can handle it?

If you've got a grasp of what it takes, I'd love patches. But a dozen people sending me patches to en.dat and en_LJ.dat won't fly, because they won't all apply. So, I'm making texttool.pl support a command-line option to read in a specified file and insert items from there.

So, when you send me a patch, send me the diffs of the BML file you chose to work on, and also a file of this format:

# this is a comment
# this next line switches destination file language:
==LANG: en

/friends/add.bml.title=Add Friends

/friends/add.bml.intro1<<
This is my multi-line text.
If a line begins with a period, it needs two periods:
..This line starts with one period only!
This is how SMTP works, btw.
And a line with a single period ends the item:
.

# now we're inserting text for livejournal.com-only strings:
==LANG: en_LJ

dystopia.navhead.legal=Legal
dystopia.searchsite=Search LiveJournal:

/paidaccounts/index.bml.title=Paid Accounts
/paidaccounts/index.bml.why.head=Why get a paid account? 
/paidaccounts/index.bml.why.text<<
Because they're rad!
You should get one
.

Easy enough? Name that file something unique, maybe your username and a dash and some number you increment. "bradfitz-34.trans". Whatever.
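If you want to sanity-check a file before sending it, here's a minimal sketch of a reader for the format above. It is not texttool.pl: parse_trans is a made-up name, and the error handling (rejecting items before the first ==LANG: line and unterminated multi-line items) is my own assumption.

```python
def parse_trans(text):
    """Return {language: {item code: text}} from a .trans-style string."""
    items = {}
    lang = None
    lines = iter(text.splitlines())
    for line in lines:
        # Blank lines and comments are skipped between items.
        if not line.strip() or line.startswith("#"):
            continue
        if line.startswith("==LANG:"):
            lang = line[len("==LANG:"):].strip()
            items.setdefault(lang, {})
            continue
        if lang is None:
            raise ValueError("item before any ==LANG: line")
        if line.endswith("<<") and "=" not in line:
            # Multi-line item: read until a line holding a single period.
            code = line[:-2]
            body = []
            for body_line in lines:
                if body_line.rstrip() == ".":
                    break
                if body_line.startswith(".."):
                    body_line = body_line[1:]  # undo the doubled period
                body.append(body_line)
            else:
                raise ValueError("unterminated item: " + code)
            items[lang][code] = "\n".join(body)
        else:
            # Single-line item: everything after the first "=" is the text.
            code, sep, value = line.partition("=")
            if not sep:
                raise ValueError("not an item line: " + line)
            items[lang][code] = value
    return items
```

Feed it the contents of your .trans file and check that every code you expect shows up under the right language.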

Coordination

In the interest of toe safety, please email me (bradfitz@livejournal.com) and ask for an area to work on, and I'll give you a range of BML pages to do.

Questions?

I'll update this area with common questions and answers as they come in.
