Brad Fitzpatrick (bradfitz) wrote in lj_dev,
Brad Fitzpatrick
bradfitz
lj_dev

Help needed: profiling LJ

Anybody good at Perl profiling?

I sped up the HTML cleaner a bunch today, using the Benchmark module. That works when you already know a major area that's a hot spot and slow, which I knew the HTML cleaner was.

But, when you don't know where to look, you turn to profilers, like Devel::DProf, which apparently works okay in Perl 5.8, but not Perl 5.6 (where it just segfaults).

Anyway, I obtained the following from Apache::DProf on a Perl 5.8 development machine:

http://www.danga.com/temp/tmon.out

(just haphazardly clicking around my dev install trying to emphasize hits on where we get the most: lastn, friends, comments, userinfo, etc)

# dprofpp -u -O 20
Apache::LiveJournal::journal_content has -1 unstacked calls in outer
Exporter::import has -1 unstacked calls in outer
AutoLoader::__ANON__ has 1 unstacked calls in outer
Compress::Zlib::__ANON__ has 1 unstacked calls in outer
IO::Socket::import has -1 unstacked calls in outer
AutoLoader::AUTOLOAD has -1 unstacked calls in outer
Exporter::Heavy::heavy_export has 2 unstacked calls in outer
Total Elapsed Time = 98.84301 Seconds
         User Time = 3.843011 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 31.7   1.219  0.047     20   0.0610 0.0024  Apache::LiveJournal::journal_content
 31.3   1.204  1.709     10   0.1204 0.1709  Apache::BML::handler
 6.25   0.240  0.322   1464   0.0002 0.0002  BML::__ANON__
 5.85   0.225  0.232    785   0.0003 0.0003  DBD::mysql::db::prepare
 2.84   0.109  0.139     56   0.0020 0.0025  Apache::LiveJournal::db_logger
 2.84   0.109  0.127     56   0.0019 0.0023  Apache::LiveJournal::trans
 2.81   0.108  0.125      8   0.0135 0.0156  LJ::Talk::load_comments
 1.04   0.040  0.050      8   0.0050 0.0062  LJ::load_codes
 1.01   0.039  0.049     16   0.0025 0.0030  LJ::load_user_props
 1.01   0.039  0.038     86   0.0005 0.0004  DBD::mysql::dr::connect
 0.96   0.037  0.037    842   0.0000 0.0000  DBI::st::fetch
 0.78   0.030  0.030     68   0.0004 0.0004  BML::Cookie::FETCH
 0.78   0.030  0.030     76   0.0004 0.0004  LJ::img
 0.75   0.029  0.037     57   0.0005 0.0007  LJ::get_remote
 0.75   0.029  0.037     56   0.0005 0.0007  LJ::end_request
 0.70   0.027  0.027    871   0.0000 0.0000  DBD::_mem::common::DESTROY
 0.52   0.020  0.020     19   0.0010 0.0010  Apache::BML::reset_codeblock
 0.52   0.020  0.020      6   0.0033 0.0033  LJ::get_bio
 0.52   0.020  0.020      1   0.0198 0.0196  LJ::Talk::Post::init
 0.52   0.020  0.039     14   0.0014 0.0028  LJ::load_userpics


Now, why are Apache::LiveJournal::journal_content and Apache::BML::handler reporting so much CPU usage? They hardly do anything. They do, however, call the XS-based gzip code, which I believe Devel::DProf can't trace, so it's reported in the caller instead. (bleh) So I guess we have to ignore both those, which is 60% of the runtime.

Unless: I wonder if a Debian package of the Compress::Zlib built for 686 would help at all. Anybody able to profile the difference for me, or give me pointers on the best Debian way to go about that?

Moving on, we can't get at the profiling of BML::__ANON__ stuff (which is all BML pages). But... we for profiling have a special BML mode that compiles all BML pages to named subs instead of anonymous subs. That might prove interesting.

DBI stuff: ignore. I wasn't profiling with memcache. That stuff mostly goes away.

Apache::LiveJournal::db_logger/::trans: again, those just call a lot of XS stuff, they're not actually slow. (::trans might help to improve, but db_logger is simple)

load_comments.... it is big and perhaps CPU heavy, but I don't see any obvious way to improve it, and it's fragile, so I don't want to really mess with it.

I think the big thing to do here is investigate how to make gzip stuff faster, if possible, and break up the BML reporting so we can see where we really need to work on.

The good thing is that the HTML cleaner doesn't even show up in that list anymore.

Curious: why don't I see any S1 or S2 function calls in the dprofpp trace? (dprofpp -T shows no match for "create_view" or "LJ::S2", which it should?) Does Devel::Dprof exclude certain things? I don't see the pattern. Oh, probably because all that S1/S2/"make_journal" stuff is called from a closure within Apache::LiveJournal::journal_content .... ah, hell. I think Devel::Dprof is pretty much useless for us, then?

Update: Rebuilt libcompress-zlib-perl with apt-build to see if gcc could make faster code for it if it didn't have to be 386-safe. Benchmarks, unfortunately, are exactly the same.
Subscribe
  • Post a new comment

    Error

    Anonymous comments are disabled in this journal

    default userpic

    Your reply will be screened

    Your IP address will be recorded 

  • 19 comments