Friday, August 04, 2006

Profile and ruby-prof

I've been playing with profile (from the Standard Library) and ruby-prof (the profile replacement written by Shugo Maeda and Charlie Savage. I really like ruby-prof — it's faster (from 5x on some short runs to 100x on longer runs), it provides more detail, and it will generate call graphs. I've noticed some differences in the way they run. I'll show you what I mean.

ruby-prof doesn't sort it's output in terms of total time used, so you'll need to sort it. After a bit of munging, the (cropped) output of running ruby-prof call_rotator test_file looks something like this:


%self  cumulative total  self  children  calls self/call total/call  name
 23.64     0.13   0.13   0.13  0.00     10002   0.00     0.00     String#split
 21.82     0.30   0.39   0.12  0.27     10002   0.00     0.00     CallRot#read_line
 18.18     0.55   0.54   0.10  0.44         1   0.10     0.54     #foreach
 12.73     0.42   0.07   0.07  0.00     10002   0.00     0.00     #local
  5.45     0.33   0.05   0.03  0.02     10002   0.00     0.00     CallRot#line_is_future?
  3.64     0.44   0.02   0.02  0.00     10003   0.00     0.00     #allocate
  3.64     0.35   0.02   0.02  0.00     20004   0.00     0.00     Array#pop
  3.64     0.17   0.02   0.02  0.00     10002   0.00     0.00     Comparable#>
  3.64     0.15   0.02   0.02  0.00     10002   0.00     0.00     OnCall#initialize
  1.82     0.45   0.01   0.01  0.00         1   0.01     0.01     #open

Since ruby-prof sends it's output to stdout, it's easy to munge with standard tools (at least on Unix/Linus). It does mean that you'll need to be careful about comingling any output from your original program though. I used ruby-prof call-rotator test_file | tail +3 | sort -rn to print the methods in order of time spent in them. Since the header doesn't change, you can add that back in to make reading the table easier if you like.

Interestingly, ruby-prof runs so quickly that system activity can interfere quite a bit with the recorded timings (even to the point of changing the order of the sorted methods). Over the course of a longer running program, this should even out. I've not seen it change the order of methods actually defined in your program, but that doesn't mean it won't.

Here's the output of ruby -r profile call_rotator test_file:


  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 34.79     2.31      2.31    10002     0.23     0.42  CallRot#read_line
 13.40     3.20      0.89        1   890.00  6630.00  IO#foreach
 12.95     4.06      0.86    10002     0.09     0.15  CallRot#line_is_future?
 11.14     4.80      0.74    10003     0.07     0.11  Class#new
  7.98     5.33      0.53    10002     0.05     0.07  Comparable.>
  5.42     5.69      0.36    10002     0.04     0.04  OnCall#initialize
  4.67     6.00      0.31    20004     0.02     0.02  Array#pop
  3.61     6.24      0.24    10002     0.02     0.02  String#split
  3.46     6.47      0.23    10002     0.02     0.02  Time#local
  2.41     6.63      0.16    10002     0.02     0.02  Time#<=>

You can see that the ruby-prof output has two columns that are missing from the stock profile output. self and children break the total time into more discrete measures. self shows how much time is spent in the method itself, exclusive calls out to child methods. children shows how much time is spent in calls to those child methods. (total is the same as self in the stock output.

It's also interesting that the two methods of profiling don't pick up the same set of method calls in the program being profiled, nor is one a superset of the other. Here's a side by side comparison of the methods found by each in profiling runs of another program:


  ruby-prof                  profile
Array#<<                   Array#<<     
Array#[]     Array#[]     
Array#each     Array#each     
<Class::IO>#allocate    
<Class::IO>#open    
<Class::Time>#allocate    
<Class::Time>#now    
Enumerable#each_with_index Enumerable.each_with_index   
File#initialize     File#initialize     
Fixnum#*     Fixnum#*        
Fixnum#%     Fixnum#%        
Fixnum#+     Fixnum#+        
Fixnum#to_s     Fixnum#to_s     
Integer#downto     Integer#downto     
Integer#upto     Integer#upto     
                           IO#open   
IO#puts      IO#puts       
IO#write     IO#write       
Kernel#load     Time#-         
Time#-      Time#+       
Time#+      Time#day       
Time#day     Time#hour        
Time#hour     Time#initialize      
Time#initialize     Time#min       
Time#min     Time#mon         
Time#mon     Time#now         
Time#year     Time#year        
#toplevel     #toplevel         

The four methods not caught by profile take up significant amounts of time, but won't change your basic profiling approach. Likewise the one method missed by ruby-prof isn't a game breaker, but it's good to be aware of things like this.

In both sets of output, you can see that the read_line is the more expensive of the methods, and the one where you can focus if you need to speed things up. The ruby-prof output also shows you that more time is spent in its children than locally, so you gains will be minimized. You'll get more information from a call graph, but that's whole 'nother blog entry — I'll get around to it next time.

If you found this post helpful, you might want to look at my ruby-prof post collection.

2 comments:

Charlie said...

Hey Pat - Great article. Ruby-prof sorts by total-time for a method, not self time, which it looks like you are after. Total time is used since it makes more sense in the call graph reports.

It would be easy to change this though. For flat profiles, take a look at line 53 in flat_printer.rb:

methods.sort.reverse.each do |pair|

Would change to something like this:

methods.sort_by {|method| method.self_time} ... etc.

Since sorting by self time or total time is useful, I think the best solution might be to add a command line parameter? Patches gladly accepted :)

Also interesting that IO.open doesn't get profiled. Will have to look into that.

Anonymous said...

GraphHtmlPrinter is pretty awesome. Unless you need to programmatically process the output, html printer is a much better report format than the flat printer.