DIY Ruby CPU profiling - Part I

At Codemancers, we're building Rbkit, a fresh code profiler for the Ruby language with tonnes of cool features. I'm currently working on implementing a CPU profiler inside rbkit gem which would help rbkit UI to reconstruct the call graph of the profiled ruby process and draw useful visualizations on the screen. I learned a bunch of new things along the way and I'd love to share them with you in this series of blog posts.

We're going to start from the fundamentals and step by step, we're going to write a rudimentary CPU profiler for Ruby ourselves! By the time we finish we would learn :

What's CPU profiling
Profiling modes - Instrumentation and Sampling
CPU time and Wall time - what they mean and how to measure them
Writing a simple C extension and using it in Ruby land
Ruby tracepoints - call and return
Signal handling in C
Pausing Ruby process using a signal and peeking at the call stack
Some useful and silly experiments using profiling data

Part I. An Introduction to CPU Profiling

By doing a CPU profile of your program, you can find out how expensive your program is with respect to CPU usage. In order to profile your program, you'll need to use a profiling tool and follow the following steps :

Start CPU profiling
Execute the code you want to profile
Stop CPU profiling and get profiling result
Analyze result

By analyzing the profiling result, you can find the bottlenecks which slow down your whole program.

Profiling modes

CPU profiling is done in broadly 2 ways:

1. Instrumentation

In this mode, the profiling tool makes use of some hooks, either provided by the interpreter or inserted into the program, to understand the call graph and measure the execution time of each method in the call graph.

For example, consider the following piece of Ruby code :

def main
  3.times do
    find_many_square_roots
    find_many_squares
  end
end
 
def find_many_square_roots
  5000.times{|i| Math.sqrt(i)}
end
 
def find_many_squares
  5000.times{|i| i**2 }
end
 
main

I've inserted some comments to help understand how the hooks would get executed if the Ruby interpreter gave us method call and return hooks:

def main
  # method call hook gets executed
  3.times do
    find_many_square_roots
    find_many_squares
  end
  # method end hook gets executed
end
 
def find_many_square_roots
  # method call hook gets executed
  5000.times{|i| Math.sqrt(i)}
  # method end hook gets executed
end
 
def find_many_squares
  # method call hook gets executed
  5000.times{|i| i**2 }
  # method end hook gets executed
end
 
main

Now if we could print the current time and the name of the current method inside these hooks, we'd get an output which looks somewhat like this :

sec:00 usec:201007	called  	main
sec:00 usec:201108	called  	find_many_square_roots
sec:00 usec:692123	returned	find_many_square_roots
sec:00 usec:692178	called  	find_many_squares
sec:00 usec:846540	returned	find_many_squares
sec:00 usec:846594	called  	find_many_square_roots
sec:01 usec:336166	returned	find_many_square_roots
sec:01 usec:336215	called  	find_many_squares
sec:01 usec:484880	returned	find_many_squares
sec:01 usec:484945	called  	find_many_square_roots
sec:01 usec:959254	returned	find_many_square_roots
sec:01 usec:959315	called  	find_many_squares
sec:02 usec:106474	returned	find_many_squares
sec:02 usec:106526	returned	main

As you can see, this output can tell us how much time was spent inside each method. It also tells us how many times each method was called. This is roughly how instrumentation profiling works.

Pros :

High accuracy
We get method invocation count
Easy to implement

Cons :

Extra overhead of executing hooks for each and every method that's profiled

2. Sampling

In this mode of profiling, the profiler interrupts the program execution once in every x unit of time, and takes a peek into the call stack and records what it sees(called a "sample"). Once the program finishes running, the profiler collects all the samples and finds out the number of times each method appears across all the samples.

Hard to visualize? Let's look at the same example code and see how different the output would be, if we used a sampling profiler.

Output from a sampling profiler would look like this :

Call stack at 0.5sec: main/find_many_square_roots
Call stack at 1.0sec: main/find_many_square_roots
Call stack at 1.5sec: main/find_many_square_roots
Call stack at 2.0sec: main/find_many_squares

In this example, the process was interrupted every 0.5 second and the call stack was recorded. Thus we got 4 samples over the lifetime of the program and out of those 4 samples, find_many_square_roots is present in 3 and find_many_squares is present in only one sample. From this sampling, we say that find_many_square_roots took 75% of the CPU where as find_many_squares took only 25%. That's roughly how sampling profilers work.

Pros :

Negligible overhead compared to instrumentation profiling
Easy to find slow/long running methods

Cons :

Not good at measuring short running methods
We don't get method invocation count
Not easy to write a sampling profiler on your own

Recap

We just looked into what CPU profiling means and the 2 common strategies of CPU profiling. In part 2, we'll explore the 2 units of measuring CPU usage - CPU time and Wall time. We'll also get our hands dirty and write some code to get these measurements. Thanks for reading!