DIY Ruby CPU profiling - Part II

Emil Soman  - March 12, 2015  |   , ,

In Part I we learned what CPU profiling means and also about the two modes of CPU profiling. In this part we’re going to explore CPU time and Wall time, the units used to measure execution cost. We’ll also write some code to get these measurements as the first step towards building our CPU profiler.

Part II. CPU time and Wall time

Wall time

Wall time is just the regular real world time that elapses between a method call and return. So if you were to measure the “Wall clock time” taken for a method to run, it would be theoretically possible to do so with a stopwatch. Just start the stopwatch when the method starts, and stop when the method returns. This is also called real time.

One important point about wall time is that it’s unpredictable and you may get different results every time you try to measure the same piece of code. This is because wall time is affected by the number of processes running in the background. When the CPU has to work on a bunch of processes at the same time, the operating system does a scheduling of the processes which are running at the time and tries to give a fair share of CPU to each of them. This means the total time spent by the CPU is divided into many slices and our method gets only some of these slices and not all of them. So while the wall clock ticks away, our process may be sitting idle and giving way for other processes which are running in parallel. This means the time spent on other processes will add to our Wall time too!

CPU time

CPU time is the time for which the CPU is dedicated to run the method. CPU time is measured in terms of CPU cycles(or ticks) which are used to execute the method. We can convert this into time if we know the frequency of the CPU in units of cycles per second aka Hertz. So if the CPU took x ticks to execute a method, and the frequency of the CPU is y hertz, the time taken by the CPU to execute the method = x/y seconds. Sometimes the OS does this conversion for us so we don’t have to do this calculation by ourselves.

CPU time will not be equal to Wall time. The difference depends on the type of instructions in our method. We can categorize the instructions into broadly 2 types: CPU bound and I/O bound. When I/O instructions are being executed, the CPU becomes idle and can move on to process other CPU bound instructions. So if our method has a time consuming I/O instruction, the CPU stops spending time on our method and moves on to something else until the I/O operation is completed. During this time the Wall time keeps ticking but the CPU time stops and lags behind Wall time.

Let’s say a very slow running method took 5 minutes on your clock to finish running. If you were to ask how much time was spent on the method, your wall clock would say “It took 5 minutes to run this method”, but the CPU would say “I spent 3 minutes of my time on this method”. So who are you going to listen to? Which time more accurately measures the cost of executing the method?

The answer is, it depends™. It depends on the kind of method you want to measure. If the method spends most of its time doing I/O operations, or it doesn’t deal with CPU bound instructions directly, the cost of execution depicted by CPU time is going to be grossly inaccurate. For these types of methods, it makes more sense to use Wall time as the measurement. For all other cases, it’s safe to stick with CPU time.

Measuring CPU time and Wall time

Since we’re going to write a CPU profiler, we’ll need a way to measure CPU time and wall time. Let’s take a look at the code in Ruby’s Benchmark module which already measures CPU time and Wall time.

def measure(label = "") # :yield:
  t0, r0 = Process.times, Process.clock_gettime(BENCHMARK_CLOCK)
  yield
  t1, r1 = Process.times, Process.clock_gettime(BENCHMARK_CLOCK)
  Benchmark::Tms.new(t1.utime  - t0.utime,
                     t1.stime  - t0.stime,
                     t1.cutime - t0.cutime,
                     t1.cstime - t0.cstime,
                     r1 - r0,
                     label)
end

So Ruby uses 2 methods from the Process class to measure time :

  1. times to measure CPU time
  2. clock_gettime to measure real time aka Wall time

But unfortunately the resolution of time returned by the times method is 1 second which means if we use times to measure CPU time in our profiler, we’ll only be able to profile methods that take a few seconds to complete. clock_gettime looks interesting, though.

clock_gettime

Process::clock_gettime is a method added in Ruby 2.1 and it uses POSIX clock_gettime() function and falls back to OS specific emulations to get the value of time in case clock_gettime is not available in the OS or when the type of clock we are measuring with clock_gettime is not implemented in the OS. This function accepts a clock_id and the time resolution as arguments. There are a bunch of clock_ids you could pass in to pick the kind of clock to use, but the ones we’re interested in are:

  1. CLOCK_MONOTONIC: This clock measures the elapsed wall clock time since an arbitrary point in the past and is not affected by changes in the system clock. Perfect for measuring Wall time.
  2. CLOCK_PROCESS_CPUTIME_ID: This clock measures per-process CPU time, ie the time consumed by all threads in the process. We can use this to measure CPU time.

Let’s make use of this and write some code:

module DiyProf
  # These methods make use of `clock_gettime` method introduced in Ruby 2.1
  # to measure CPU time and Wall clock time.

  def self.cpu_time
    Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID, :microsecond)
  end

  def self.wall_time
    Process.clock_gettime(Process::CLOCK_MONOTONIC, :microsecond)
  end
end

We could use these methods to benchmark code:

puts "****CPU Bound****"
c1, w1 = DiyProf::cpu_time, DiyProf::wall_time
10000.times do |i|
  Math.sqrt(i)
end
c2, w2 = DiyProf::cpu_time, DiyProf::wall_time
puts "CPU time\t=\t#{c2-c1}\nWall time\t=\t#{w2-w1}"

puts "\n****IO Bound****"
require 'tempfile'

c1, w1 = DiyProf::cpu_time, DiyProf::wall_time
1000.times do |i|
  Tempfile.create('file') do |f|
    f.puts(i)
  end
end
c2, w2 = DiyProf::cpu_time, DiyProf::wall_time
puts "CPU time\t=\t#{c2-c1}\nWall time\t=\t#{w2-w1}"

Running this code would give an output similar to this:

****CPU Bound****
CPU time	=	5038
Wall time	=	5142

****IO Bound****
CPU time	=	337898
Wall time	=	475864

This clearly shows that on a single CPU core, CPU time and Wall time are nearly equal when running purely CPU bound instructions whereas CPU time is always less that Wall time when running I/O bound instructions.

Recap

We’ve learned what CPU time and Wall time mean, their differences, and when to use which. We also wrote some Ruby code to measure CPU time and Wall time which would help us measure time in the CPU profiler we’re building. In part 3 we’ll take a look at Ruby’s tracepoint API and use it to build an instrumentation profiler. Thanks for reading! If you would like to get updates about subsequent blog posts in this DIY CPU profiler series, do follows us on twitter @codemancershq.