February 25th, 2004

Bot policy

Barring major, well-founded objections, I'm going to start enforcing a new policy on bots that scrape the site (userinfo/FOAF/fdata/etc).

The policy is:

If you're scraping, your user-agent string must include a contact email (and ideally a URL for the project).

For instance:

http://fooland.com/ljtoy.html; bob@fooland.com
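
As a minimal sketch of what that looks like from the bot's side, here is one way to send such a user-agent string using Python's standard urllib. The project URL and email are the placeholder values from the example above, and `build_request` is a hypothetical helper name, not anything the site provides.

```python
import urllib.request

# Placeholder values taken from the example above; substitute your own.
PROJECT_URL = "http://fooland.com/ljtoy.html"
CONTACT_EMAIL = "bob@fooland.com"

# Project URL first, then the contact email, separated by "; ".
USER_AGENT = f"{PROJECT_URL}; {CONTACT_EMAIL}"


def build_request(url):
    """Return a Request that identifies the bot and how to reach its owner."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
```

Passing the result to `urllib.request.urlopen` sends the header along with every fetch, so whoever is reading the logs knows who to email.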

I'd like to be very bot-friendly, but that requires bots be friendly back.

I'd also like to set up a URL ($LJHOME/bots/ ?) which explains:

-- the rules (user agent, rates)
-- what we provide in machine-readable format
-- who to contact for other access

Then when we block a new bot that hasn't read the rules, the block message will include:

"You've been banned because you seem to be a new bot. Please read the bot rules at $LJHOME/bots/ and contact us to get unblocked."

Sound acceptable?
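
For anyone wondering what the check behind that block might look like, here is a rough sketch. It is an illustration under assumptions, not the site's actual code: the email pattern is deliberately loose, the function names are hypothetical, and it assumes something else has already decided the request comes from a scraper rather than a browser.

```python
import re

# Assumption: a loose pattern for "looks like an email address" is good enough here.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Block message as quoted in the post; $LJHOME is left as the literal placeholder.
BLOCK_MESSAGE = (
    "You've been banned because you seem to be a new bot. "
    "Please read the bot rules at $LJHOME/bots/ and contact us to get unblocked."
)


def has_contact_email(user_agent):
    """True if the user-agent string contains something that looks like an email."""
    return bool(EMAIL_RE.search(user_agent or ""))


def check_bot(user_agent):
    """Return None if the bot follows the rule, else the block message."""
    if has_contact_email(user_agent):
        return None
    return BLOCK_MESSAGE
```

With this sketch, `check_bot("http://fooland.com/ljtoy.html; bob@fooland.com")` returns `None`, while a user-agent with no contact email gets the block message back.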

Please pass word in relevant communities: lj_clients, lj_research, etc. (I don't follow everything.)