Gaal Yahas (gaal) wrote in lj_dev,

getcomments

Hi! I'm a contributor to logjam, the GTK+ client. I've been on LJ for just over a year and have been interested in its technical side since I joined. I've had jobs as a programmer (Perl, C, databases, unix) in the past, but I find that as long as I can get away with it, programming is much more fun when I'm not paid to do it. Now, to business.

This is a proposal for getting comment sync into the protocol. I know this has been discussed before but has not been implemented yet, primarily because it's hard to get right. I'm sure my proposal isn't complete, but it tries to take into account the existing protocol style, load demands on the server, and client functionality. Please feel very much invited to discuss this here; I want to get it off the ground.

Purpose

To give LJ clients a well-defined and efficient way to archive comments. Today clients can archive entries with syncitems and getevents(selecttype=syncitems); here I am concerned with extending this in a similar fashion to user comments.


Server considerations

The main consideration on the server side is to keep the response from getting too big, while making sure the limit still leaves the client with something useful when it is hit. There are three mechanisms that I know of to tackle this problem; I adopted two of them. The mechanism this proposal does not use is chunking the response into equal-sized pieces (like search results on Google, for instance, or viewing the last n entries in your friends list, or the multi-page view of huge discussions in news). What it does use is stateful sync (like getevents does) and folding threads, sending only information about the thread parent (like talkread.bml does). Assuming the hard limit on response count remains what it is today (500), this will probably always allow the server to give the client a complete response, even if all the data it sends is folded. In the unlikely case where a really huge post gets lots of direct replies, there's the lightweight foldthread_n_parentid protocol response field, for which I believe we can increase the limit substantially.
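
To make the folding idea concrete, here is a rough Python sketch of one policy a server could follow. It is only an illustration: the proposal leaves the choice of what to fold to the server's discretion, and the function name fold_threads, the children mapping, and the greedy order are all assumptions of mine, not existing LJ code.

```python
from typing import Dict, List, Tuple


def fold_threads(children: Dict[int, List[int]], root: int,
                 limit: int) -> Tuple[List[int], List[dict]]:
    """Pick which comments under `root` go out in full and which threads fold.

    `children` maps an item id to the ids of its direct replies
    (hypothetical in-memory shape, not the real database layout).
    """
    def subtree(node: int) -> List[int]:
        # The comment itself followed by all of its descendants.
        out = [node]
        for child in children.get(node, []):
            out.extend(subtree(child))
        return out

    full: List[int] = []
    folds: List[dict] = []
    for top in children.get(root, []):
        thread = subtree(top)
        remaining = limit - len(full)
        if len(thread) <= remaining:
            # The whole thread fits: send every comment in it.
            full.extend(thread)
        elif remaining >= 1:
            # Only the thread leader fits: send it and mark the thread folded.
            full.append(top)
            folds.append({"threadid": top})
        else:
            # No room even for the leader: send just the fold and its parent.
            folds.append({"threadid": top, "parentid": root})
    return full, folds
```

With a limit of 500 and mostly small threads, a policy like this would rarely fold anything; the last branch is the case that foldthread_n_parentid is meant to cover.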

Issue: since the databases currently do not maintain comment sync information, this information will have to be fed into the tables at some point. There's no way around this, I'm afraid; I know Brad very recently commented that he won't consider suggestions that affect very large tables, so the only workaround I can offer is to populate these tables lazily, that is, to generate them the first time a client requests a comment sync. This will probably mean the very first request a user makes will fail with a timeout. This needs to be discussed; I'll leave it aside for now.

Issue #2: this feature will likely be limited to the user's own journal and communities he or she owns, for the same reasons getevents is.


syncitems

The syncitems protocol mode already defines a comment type ("C") to be returned for comment items in the sync_n_item response field, so almost nothing needs to change in it. The only addition is one request field:

synctype - An optional single value that specifies the server should only return items of the given type (either "L", "C", or "T"). If this is not set, the server sends items of all types.

The rationale behind this field is that if you are syncing only entries from a journal with 100 entries and 800 comments, the comments might clog your syncitems requests. There is a limit, currently 500, on how many syncitems the server returns. In this scenario, since you're not syncing comments, you might have the server tell you there are 500 comments and no more, and that's useless to you, since you'll never see the entries.
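
The clogging scenario can be played out with a toy model in Python. This is a sketch under assumptions: the function syncitems below is a stand-in for the server, items are written as "Type-Number" strings, and timestamps are plain integers to keep it short.

```python
from typing import List, Optional, Tuple

SYNC_LIMIT = 500  # the per-response cap mentioned above


def syncitems(items: List[Tuple[str, int]],
              synctype: Optional[str] = None,
              limit: int = SYNC_LIMIT) -> List[Tuple[str, int]]:
    """Return up to `limit` (item, time) pairs, oldest first.

    If `synctype` is given ("L", "C" or "T"), only items of that
    type are considered, as the proposed request field would ask.
    """
    ordered = sorted(items, key=lambda pair: pair[1])
    selected = [
        (item, when) for item, when in ordered
        if synctype is None or item.startswith(synctype + "-")
    ]
    return selected[:limit]


# 800 comments that happen to be older than the 100 entries.
journal = [("C-%d" % n, n) for n in range(800)]
journal += [("L-%d" % n, 800 + n) for n in range(100)]

clogged = syncitems(journal)                 # 500 items, all comments
entries = syncitems(journal, synctype="L")   # the 100 entries
```

Without synctype the first response is filled entirely with comments; with synctype set to "L" the same request returns every entry in one go.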


getcomments

This is the proposed new mode for fetching comments, which mimics the existing getevents mode. The major difference between the two modes (apart from this one being for comments and not journal entries) is that getcomments will sometimes return folded threadids instead of the actual data, if the response would otherwise be too big. These threadids should be kept by the client and used on demand, e.g. when the user clicks on "expand thread".

Description
Download comments on parts of the user's journal.

Request

mode - The protocol request mode: getcomments
user, password, hpassword, truncate, prefersubject, noprops, lastupdate, beforedate, lineendings, year, month, day, lastn, usejournal - akin to the same-name fields in the getevents mode.
setsync - Optional. If present and set to "1", marks downloaded information as archived by the client, even if it is selected with selecttypes other than "syncitems". The only purpose of this field is to prevent future syncs from downloading this data again. It need not be used with the "syncitems" selection type, where it is always implicitly on.
selecttype - Here you choose how you want to select your comments. The possible values here fall into four groups:

  1. one - gets exactly one specific comment by itemid, without replies to it. If you use this selecttype and the request succeeds, you are guaranteed to get all the information that you asked for.
  2. thread, entry - using itemid as parent, return the child comments of that itemid. (In the case of thread, return also the parent item itself since it is a comment; but not in the case of entry, where the entry itself was presumably already sent.) This selecttype returns a bunch of comments and/or threadids (see below), depending on the size of the output. That is, it does not guarantee to give you all the information you asked for, but it provides you with handles for getting more of it if you need it.
  3. day, lastn - using the information in year/month/day (for day) or lastn (for lastn) to select a bunch of entries you're interested in, the server will respond as if you had made several getcomments requests of selecttype entry. Note that since this potentially selects a large number of comments, your chances of receiving threadids rather than actual comments increase, so in lastn you should keep howmany small.
  4. syncitems - get some number of items (which the server decides) that have changed since a given time (specified in the lastsync parameter). Note that because the server decides what items to send, you may or may not be getting everything that's changed. But unlike all of the selecttypes above except one, this one is guaranteed to give you comments and not folded threads.
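
Here is how a client might put together the request fields for each selecttype, as a small Python sketch. The helper build_getcomments_request and the REQUIRED_FIELDS table are hypothetical; in particular, using howmany for lastn is borrowed from getevents and is my assumption. Authentication fields (password or hpassword) are left out for brevity.

```python
from urllib.parse import urlencode

# Which extra fields each selecttype needs, as I read the proposal.
# "howmany" for lastn is an assumption carried over from getevents.
REQUIRED_FIELDS = {
    "one": ("itemid",),
    "thread": ("itemid",),
    "entry": ("itemid",),
    "day": ("year", "month", "day"),
    "lastn": ("howmany",),
    "syncitems": ("lastsync",),
}


def build_getcomments_request(user, selecttype, **fields):
    """Return the request as a dict of strings, ready to be form-encoded."""
    if selecttype not in REQUIRED_FIELDS:
        raise ValueError("unknown selecttype: %r" % selecttype)
    missing = [name for name in REQUIRED_FIELDS[selecttype]
               if name not in fields]
    if missing:
        raise ValueError("selecttype %s needs: %s"
                         % (selecttype, ", ".join(missing)))
    request = {"mode": "getcomments", "user": user, "selecttype": selecttype}
    request.update((name, str(value)) for name, value in fields.items())
    return request


# Form-encoded body for unfolding the thread led by comment 1234.
body = urlencode(build_getcomments_request("gaal", "thread", itemid=1234))
```

Optional fields such as setsync or usejournal can be passed as extra keyword arguments in the same way.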

Response

success, errmsg - as in getevents.
events_count, events_n_itemid, events_n_eventtime, events_n_event, events_n_subject - as in getevents.
events_n_parentid - The id of the entry or comment to which this event is a reply.
events_n_from - Optional. The userid of the author of this comment. Not sent if the comment is anonymous.
prop_count, prop_n_itemid, prop_n_name, prop_n_value - Used as in getevents (but used for: poster userpic, time, mood icon, ip address, etc.)
foldthread_count - Optional. If the server detected it was about to send too many comments in full, it "folds" some of the threads (at its own discretion) and sends only the leading comment in them, together with a note that this threadid is not "complete". If the user is interested in the thread, another request must be made to get it. (For very long threads this process has the potential of taking several iterations.)
foldthread_n_threadid - The itemid of a comment that leads a folded thread. Use this in a later getcomments/selecttype=thread request to unfold the thread.
foldthread_n_parentid - Optional. If the response was so loaded that the server decided not to even send the full event of the thread leader, it will supply the parentid of this thread so that the client at least knows where to place the fold.
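
On the client side, handling these fields could look roughly like the Python sketch below. It assumes the response arrives as a flat dict of strings with 1-based indexes in place of n, and that send is some callable that performs the request and returns such a dict; both function names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Flat = Dict[str, str]


def parse_getcomments_response(resp: Flat) -> Tuple[List[dict], List[dict]]:
    """Split a flat response into full comments and folded threads."""
    comments = []
    for n in range(1, int(resp.get("events_count", "0")) + 1):
        comments.append({
            "itemid": resp["events_%d_itemid" % n],
            "parentid": resp.get("events_%d_parentid" % n),
            "from": resp.get("events_%d_from" % n),  # None: anonymous
            "subject": resp.get("events_%d_subject" % n, ""),
            "event": resp.get("events_%d_event" % n, ""),
        })
    folds = []
    for n in range(1, int(resp.get("foldthread_count", "0")) + 1):
        folds.append({
            "threadid": resp["foldthread_%d_threadid" % n],
            "parentid": resp.get("foldthread_%d_parentid" % n),
        })
    return comments, folds


def fetch_entry_comments(send: Callable[[Flat], Flat],
                         entry_itemid: str) -> Dict[str, dict]:
    """Fetch an entry's comments, unfolding threads until none are left."""
    comments: Dict[str, dict] = {}
    requested = set()  # threadids already asked for, to avoid looping
    pending = [{"mode": "getcomments", "selecttype": "entry",
                "itemid": entry_itemid}]
    while pending:
        found, folds = parse_getcomments_response(send(pending.pop()))
        for comment in found:
            comments[comment["itemid"]] = comment
        for fold in folds:
            threadid = fold["threadid"]
            if threadid not in requested:
                requested.add(threadid)
                pending.append({"mode": "getcomments",
                                "selecttype": "thread",
                                "itemid": threadid})
    return comments
```

A real client would more likely unfold on demand, when the user asks for a thread, rather than eagerly as this loop does; the eager version is just the shortest way to show the several iterations mentioned above.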

That's it! Thanks for reading so far, and if you have any comments, please let me know about them! (So that I can one day download them? :-)
