I'm a Mac software engineer with about 15 years in 'the biz'. I've been trying to write some LJ-centric data collection and analysis tools (aimed more at the meme crowd). Several times, however, this has led me back to looking at the code of interest.bml.
My current curiosity stems from the fact that for very popular interests (take lotr, for example), the list tops out at 500 matches. While my Perl is not quite at the level I'd like it to be, I think I've found the segment of the code that handles this.
The code in CVS (which I'm aware may not match the code in production) suggests that the search lists are throttled to 1000. What I'm curious about is how the search routine chooses which n users to display. As far as I can tell, the database search always returns the same throttled selection; the list does not seem to rotate through a random sampling of the available pool.
My issue with this mechanism is that if the selection follows, for example, the order of registration, then the list is heavily biased toward older users, and new users become less and less likely to be found in a search. Further, since attrition tends to be highest among the oldest accounts, the likelihood that the listed users are active, current users is bound to keep decreasing.
This is similar to the situation with the "match the score" function, which was coded in a way that biased results toward people at the top of the alphabet.
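To make the registration-order concern concrete, here is a rough sketch in Python (not Perl, and not the interest.bml code). The function names are made up for illustration, and it assumes that user IDs increase with registration order, which I have not verified against the actual schema. It contrasts a fixed "first n" cutoff with a uniform random sample of the same pool.

```python
import random

def first_n(user_ids, limit):
    """Fixed cutoff: always returns the same `limit` lowest IDs.

    Assumption: lower ID == earlier registration.
    """
    return sorted(user_ids)[:limit]

def random_n(user_ids, limit, rng=random):
    """Uniform random sample without replacement.

    Every user in the pool has the same chance of being listed,
    and the selection changes from one search to the next.
    """
    pool = list(user_ids)
    if len(pool) <= limit:
        return pool
    return rng.sample(pool, limit)

if __name__ == "__main__":
    pool = range(1, 10001)           # hypothetical pool of 10,000 matching users
    fixed = first_n(pool, 500)
    sampled = random_n(pool, 500)
    print("fixed cutoff, highest ID shown:", max(fixed))
    print("random sample, highest ID shown:", max(sampled))
```

With the fixed cutoff, nobody with an ID above 500 is ever shown no matter how many times the search is run, whereas the random sample draws from the whole pool each time.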
Any further information on this would be appreciated, as I'm glad to try to suggest algorithms that could make this functionality more useful.
Standard Disclaimer: This seemed like the most appropriate place to post this query. If this was wrong please direct as appropriate with my apologies.