LiveJournal Development


October 18th, 2001

Friends Groups [Oct. 18th, 2001|04:32 am]
lj_dev

[thorshammer]
[mood |Other...]
[music |The Beauty Shop: Lies]

OK, this problem has been thwarting me for some time. I can't figure out how to derive the bits used for Friends Group posting. At first I thought they came from the sort order, but I was mistaken. I have no idea; could someone help?

Thanks:)
~Chris
3 comments

(no subject) [Oct. 18th, 2001|01:21 pm]

[halkeye]
[music |Macross Plus - Voices (japanese)]

Is the field on the personal info page for who directed you to LJ (or whatever it's called) ever actually used for anything?
1 comment

crawling lj [Oct. 18th, 2001|06:26 pm]

[confuseme]
So, I'm writing an lj crawler for a research project. I think it should be fairly polite, but I'd like to get some comments first to make sure I'm not missing anything. I don't want to hit the lj servers any harder than an ordinary user might, and I don't think I will. Let me know if you think otherwise, or if you have any suggestions. Here's how it works:

First, it hits the "random user" URL (http://www.livejournal.com/random.bml) and gets a username from the resulting redirect. It closes the connection as soon as it has the username, and never follows the redirect to request the actual user page.
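
A minimal sketch of this step in Python, not the crawler's actual code. It relies on the fact that `http.client` does not follow redirects on its own, so the `Location` header can be read and the connection closed without ever requesting the user page. The shape of the redirect target (`/users/<name>/` or `<name>.livejournal.com`) is an assumption, as are the function names `username_from_location` and `random_username`.

```python
import http.client
from urllib.parse import urlsplit


def username_from_location(location):
    """Pull a username out of a redirect target.

    Assumption (not stated in the post): the target looks like
    /users/<name>/ or http://<name>.livejournal.com/.
    Returns None when neither shape matches.
    """
    parts = urlsplit(location)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) >= 2 and segments[0] == "users":
        return segments[1]
    host = parts.hostname or ""
    if host.endswith(".livejournal.com") and not host.startswith("www."):
        return host.split(".")[0]
    return None


def random_username(host="www.livejournal.com", path="/random.bml"):
    """Request the random-user URL and read only the redirect header."""
    conn = http.client.HTTPConnection(host, timeout=30)
    try:
        conn.request("GET", path)
        response = conn.getresponse()  # headers only; body is never read
        location = response.getheader("Location")
    finally:
        conn.close()  # close right away, never following the redirect
    return username_from_location(location) if location else None
```

Keeping the parsing in its own function means it can be checked against sample redirect targets without touching the servers at all.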

Then, it figures out the URL for the user's info page, requests that page, and parses it for some information about the user (Birthdate and Location). It also closes that connection as soon as it has the data it needs. If that data doesn't meet certain criteria, the crawler stops here.

If the user does meet the criteria, it figures out the URL for the user's calendar page, requests it and parses it to get a list of journal entry links.
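
One way to pull the entry links out of a calendar page is sketched below, using only Python's standard `html.parser`. The post does not say what an entry link looks like, so the caller supplies that rule as a predicate; `LinkCollector`, `entry_links`, and the `itemid=` test in the usage note are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def entry_links(html, base_url, is_entry):
    """Return unique links, in page order, for which is_entry(url) is true.

    is_entry is a caller-supplied rule; what an entry URL really
    looks like is an assumption left to the caller.
    """
    collector = LinkCollector(base_url)
    collector.feed(html)
    seen = set()
    result = []
    for link in collector.links:
        if is_entry(link) and link not in seen:
            seen.add(link)
            result.append(link)
    return result
```

For example, `entry_links(page, calendar_url, lambda url: "itemid=" in url)` would keep only links whose query string carries an item id, if that is how entries turn out to be addressed.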

Finally, it requests the journal entries one by one, with a pause between requests. I'm thinking 10 seconds or so; suggestions are welcome.

I plan to run the script in a loop, with a significant pause between each run -- maybe something like 20 seconds.
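
The two levels of pacing described above (a pause between entry requests, and a longer pause between runs) can be sketched as follows. The 10 and 20 second defaults come from the figures floated in the post; the function names, the `fetch` callable, and the injectable `sleep` parameter are assumptions made so the timing logic can be exercised without waiting or hitting the network.

```python
import time


def fetch_politely(urls, fetch, pause_seconds=10, sleep=time.sleep):
    """Fetch each URL in turn, sleeping between (not before) requests."""
    pages = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(pause_seconds)
        pages.append(fetch(url))
    return pages


def run_crawler(crawl_one_user, runs, pause_between_runs=20, sleep=time.sleep):
    """Call crawl_one_user `runs` times, sleeping between runs."""
    for run in range(runs):
        if run > 0:
            sleep(pause_between_runs)
        crawl_one_user()
```

With these defaults, a user with N entries costs N requests spread over roughly 10 * (N - 1) seconds, plus the info and calendar page requests made before them.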

I don't think that should be particularly different from an ordinary user browsing lj. Is there anything I'm missing here?
70 comments
