I've been playing with profile (from the Standard Library) and ruby-prof (the profile replacement written by Shugo Maeda and Charlie Savage. I really like ruby-prof — it's faster (from 5x on some short runs to 100x on longer runs), it provides more detail, and it will generate call graphs. I've noticed some differences in the way they run. I'll show you what I mean.
ruby-prof doesn't sort it's output in terms of total time used, so you'll need to sort it. After a bit of munging, the (cropped) output of running ruby-prof call_rotator test_file looks something like this:
%self cumulative total self children calls self/call total/call name
23.64 0.13 0.13 0.13 0.00 10002 0.00 0.00 String#split
21.82 0.30 0.39 0.12 0.27 10002 0.00 0.00 CallRot#read_line
18.18 0.55 0.54 0.10 0.44 1 0.10 0.54 #foreach
12.73 0.42 0.07 0.07 0.00 10002 0.00 0.00 #local
5.45 0.33 0.05 0.03 0.02 10002 0.00 0.00 CallRot#line_is_future?
3.64 0.44 0.02 0.02 0.00 10003 0.00 0.00 #allocate
3.64 0.35 0.02 0.02 0.00 20004 0.00 0.00 Array#pop
3.64 0.17 0.02 0.02 0.00 10002 0.00 0.00 Comparable#>
3.64 0.15 0.02 0.02 0.00 10002 0.00 0.00 OnCall#initialize
1.82 0.45 0.01 0.01 0.00 1 0.01 0.01 #open
Since ruby-prof sends it's output to stdout, it's easy to munge with standard tools (at least on Unix/Linus). It does mean that you'll need to be careful about comingling any output from your original program though. I used ruby-prof call-rotator test_file | tail +3 | sort -rn to print the methods in order of time spent in them. Since the header doesn't change, you can add that back in to make reading the table easier if you like.
Interestingly, ruby-prof runs so quickly that system activity can interfere quite a bit with the recorded timings (even to the point of changing the order of the sorted methods). Over the course of a longer running program, this should even out. I've not seen it change the order of methods actually defined in your program, but that doesn't mean it won't.
Here's the output of ruby -r profile call_rotator test_file:
% cumulative self self total
time seconds seconds calls ms/call ms/call name
34.79 2.31 2.31 10002 0.23 0.42 CallRot#read_line
13.40 3.20 0.89 1 890.00 6630.00 IO#foreach
12.95 4.06 0.86 10002 0.09 0.15 CallRot#line_is_future?
11.14 4.80 0.74 10003 0.07 0.11 Class#new
7.98 5.33 0.53 10002 0.05 0.07 Comparable.>
5.42 5.69 0.36 10002 0.04 0.04 OnCall#initialize
4.67 6.00 0.31 20004 0.02 0.02 Array#pop
3.61 6.24 0.24 10002 0.02 0.02 String#split
3.46 6.47 0.23 10002 0.02 0.02 Time#local
2.41 6.63 0.16 10002 0.02 0.02 Time#<=>
You can see that the ruby-prof output has two columns that are missing from the stock profile output. self and children break the total time into more discrete measures. self shows how much time is spent in the method itself, exclusive calls out to child methods. children shows how much time is spent in calls to those child methods. (total is the same as self in the stock output.
It's also interesting that the two methods of profiling don't pick up the same set of method calls in the program being profiled, nor is one a superset of the other. Here's a side by side comparison of the methods found by each in profiling runs of another program:
ruby-prof profile
Array#<< Array#<<
Array#[] Array#[]
Array#each Array#each
<Class::IO>#allocate
<Class::IO>#open
<Class::Time>#allocate
<Class::Time>#now
Enumerable#each_with_index Enumerable.each_with_index
File#initialize File#initialize
Fixnum#* Fixnum#*
Fixnum#% Fixnum#%
Fixnum#+ Fixnum#+
Fixnum#to_s Fixnum#to_s
Integer#downto Integer#downto
Integer#upto Integer#upto
IO#open
IO#puts IO#puts
IO#write IO#write
Kernel#load Time#-
Time#- Time#+
Time#+ Time#day
Time#day Time#hour
Time#hour Time#initialize
Time#initialize Time#min
Time#min Time#mon
Time#mon Time#now
Time#year Time#year
#toplevel #toplevel
The four methods not caught by profile take up significant amounts of time, but won't change your basic profiling approach. Likewise the one method missed by ruby-prof isn't a game breaker, but it's good to be aware of things like this.
In both sets of output, you can see that the read_line is the more expensive of the methods, and the one where you can focus if you need to speed things up. The ruby-prof output also shows you that more time is spent in its children than locally, so you gains will be minimized. You'll get more information from a call graph, but that's whole 'nother blog entry — I'll get around to it next time.
If you found this post helpful, you might want to look at my ruby-prof post collection.
Hey Pat - Great article. Ruby-prof sorts by total-time for a method, not self time, which it looks like you are after. Total time is used since it makes more sense in the call graph reports.
ReplyDeleteIt would be easy to change this though. For flat profiles, take a look at line 53 in flat_printer.rb:
methods.sort.reverse.each do |pair|
Would change to something like this:
methods.sort_by {|method| method.self_time} ... etc.
Since sorting by self time or total time is useful, I think the best solution might be to add a command line parameter? Patches gladly accepted :)
Also interesting that IO.open doesn't get profiled. Will have to look into that.
GraphHtmlPrinter is pretty awesome. Unless you need to programmatically process the output, html printer is a much better report format than the flat printer.
ReplyDelete