Profile Ruby 2.0 apps using DTrace - part 2

Hemant  - May 27, 2013  |   , , ,

This is Part - 2 in series of articles we are writing on Dtrace. If you haven’t already - do read Part 1.

Running Dtrace scripts

Before we delve deeper into guts of Dtrace, one of the questions which I often get asked is, How can I profile one off programs (i.e not a application running in background) with Dtrace.

There are basically couple of ways you can fire Dtrace.

~> sudo dtrace -qs <dtrace_script>.d
~> sudo dtrace -s count.d -c '/Users/gnufied/.rbenv/versions/20/bin/ruby leaky_app.rb'

What is a predicate?

In last part of tutorial we described what a probe description looks like. For a refresh:

probe description

Predicate is a way to filter the number of probes fired based on certain critirea. For example, in last tutorial we wrote a Dtrace script to keep track of all system calls being made by all currently running applications.

Lets say, now we want to restrict those probes only to system calls that are being made by Ruby applications. We can do this by introducing a predicate in our Dtrace script.

	printf("%s(%d) called %s\n", execname, pid, probefunc);

Or you can use pid predicate:

 * Print system calls made by pid 61465
/pid == 61465/
	printf("%s(%d) called %s\n", execname, pid, probefunc);

Now when I was learning Dtrace, I was mightly confused from where these things called execname, pid, probefunc are coming from? Turns out, Dtrace automatically provides built-in variable definitions for inspecting processes. They make Dtrace scripts quite powerful.

Built-in Variables

Here is a list of important built-in variables available in your DTrace scripts:

Now, I believe previous scripts would have started making sense. Above list does not include all built-in variables though.

Lets do an exercise and try to print number of bytes being read (from network or file system) by process with pid 61465. The script will look like:

/pid == 61465/
    printf("%s read %d\n", execname, arg2);

If you crack open man 2 read you will find out that read system call’s third argument contains number of bytes being read.

Having just ability to print numbers as probes fire is useful but you can easily imagine it would be nice, if we can somehow aggregate the data. For example, how much of your program’s memory is taken by string allocation or how much time is being spent in total in the function foo. Ability to aggregate such data would be really useful.

Aggregates and Collections

Agrregates, Collections and variables are most important tools DTrace has to offer for data gathering, analyzing and printing. We will start with different types of variables available to us.

Variable types

D scripting language has 3 types of variables which you can use to keep track of things.

Block local

You can define a block local variable with :

  this->currentTime = timestamp;

Scope of a block local variable is limited to the block in which it is defined.

Global scalar variables

You can define a global variable like:

  startTime = timestamp;

Variables defined like this are global and they are available in all the probe descriptions. Also, if you haven’t guessed it - BEGIN and END[2] are two special probes that fire at the beginning of a DTrace script and at the end of it. They can be very useful for initializing things or printing summary results at the end of execution.

Thread local variables

If you are trying to profile a multi-threaded application you will want to keep DTrace variables separate for different threads. You can do that with thread local variables:

  self->currentTime = timestamp;

Variables declared like this won’t be shared between threads and hence can be used reliably.

Associative Arrays

If you are coming from Ruby, think of associative array as a Hash where that keys can be a single value or a tuple of values. In either case once declared type of key shouldn’t change. For example, following script will calcuate time spent by each process in a system call.

  callTime[execname, probefunc] = timestamp;

syscall:::return {
    this->timeElapsed = timestamp - callTime[execname,probefun];

You can learn more about these different kind of variables at[1].


Armed with knowledge of variables and collections, we can finally dip our toes in aggregates. Aggregates is a way of gathering data from individual probes without you the programmer doing actual work of summing things up,calculating the median, frequency of occurence etc.

For example, lets write a DTrace script that prints number of top 20 objects Rails creates when it boots up. You are obviously going to need Ruby 2.0 for following script to work.

/* rails_boot.d
 * object-create is the name of the probe that fires 
 * when any new ruby object is created. 
 * Corresponding function called is - rb_obj_alloc
   /* If you look up code of rb_obj_alloc, first argument to 
    * function contains class which is being instantiated
   @objects[copyinstr(arg0)] = count();

  /* Truncate the aggregate to top 20 results */
  /* Print the aggregate */

The output should look something like:

  Mime::Type                                                       22
  Psych::Nodes::Scalar                                             27
  Bundler::LazySpecification                                       39
  Set                                                              54
  Sass::Script::Functions::Signature                               55
  Rails::Paths::Path                                               71
  Class                                                            73
  Pathname                                                         74
  Bundler::DepProxy                                                77
  Rails::Initializable::Collection                                 93
  Regexp                                                           95
  Rails::Initializable::Initializer                               143
  Range                                                           332
  Gem::Specification                                              410
  Hash                                                            579
  Gem::Dependency                                                1660
  Gem::Version                                                   2416
  Gem::Requirement                                               3061
  String                                                         4723
  Array                                                          4948

Keep in mind, running above script isn’t as simple as running sudo dtrace -qs rails_boot.d -c 'rails s' . Depending on whether you are using RVM or rbenv all those shell scripts shims create a problem when running dtrace scripts. What I ended up doing is, created a file called load_env.rb that had just one line require_relative config/environment and then ran the Dtrace script with:

sudo dtrace -qs rails_boot.d -c "/Users/gnufied/.rbenv/versions/20/bin/ruby load_env.rb" 

Now there are some new keywords in above script and we will to get them in a bit, but first let us talk about aggregates. DTrace allows you to store result of aggregate functions in objects called aggregates, they are defined like:

@name[ keys ] = aggfunc ( args );

Aggregate object name starts with @ and behave very similar to associative arrays and hence can use tuples as keys. Using aggregate functions frees you up from keeping counter yourself. In above script we are using count function which keeps track of number of times that particular probe fired. There are other aggergate functions available as well:

You can find out more about aggregates from [3].

That about sums up content for part 2 of DTrace tutorial. In the next part we are going to cover in built functions, printing and profiling of real world Ruby applications. You should also follow us on twitter @codemancershq if you don’t want to miss out Part 3.


Profile Ruby 2.0 apps using DTrace - Part 1

Hemant  - April 16, 2013  |   , , ,

This is a series of blog posts we are going to do in tutorial format, this is Part 1 of our tutorial on Dtrace.

DTrace is awesome. If your operating system supports it, using DTrace can give incredible insight into processes that are running on your system. If you are still not convinced about its usefullness you should checkout - How Dtrace helped twitter identify bottlenecks, Profiling Rails startup with Dtrace.

So what exactly is DTrace? According to Wikipedia - “DTrace is a dynamic tracing framework that allows users to tune and troubleshoot applications and the OS itself.” It is available on Solaris, Mac OSX, FreeBSD and recent versions of Oracle Linux. If you are running any of the above operating systems, there is nothing to install - just hop in.

Dtrace allows you to instrument a live process by defining trace points called probes, which can be turned on or off dynamically. The probes fire when a particular condition is met and it requires no change in application code, no special flags and there is very little penalty of running a production app with probes turned on.

What is a probe?

At its core a probe is like a light bulb that switches on when a condition is met. The condition could be entering a C function defined within a module, exiting a C function, entering a system call provided by Kernel etc. A probe is made of 5 parts,

All the processes currently running on operating system (including the kernel itself) have some probes defined. As an example, lets try and list all the probes defined by Ruby 2.0

~> sudo dtrace -l | grep ruby

Ruby 2.0 probes

Running Dtrace

Dtrace probes can be set and run via the command line itself or can be run via Dtrace scripts written in a programming language called D.

Some simple command line examples:

~> sudo dtrace -f write

Above probe will start firing each time any application makes write system call.

Dtrace scripting capabilities

The real power of Dtrace lies in ability to script - tracing of running applications, gathering results and then printing or plotting them.

A DTrace script is essentially made of 3 parts:

probe description

D language has some similarity with C but at its core it is a scripting language, probably closer to awk etc.

For example, lets write a D script for printing all system calls being made by currently running applications.

~> cat > dtrace_scripts/syscall.d
  printf("%s(%d) called %s\n", execname, pid, probefunc)

Above script when ran via sudo dtrace -qs syscall.d will start printing name,pid of processes that are making system calls and what system calls they are making. The script does not have predicate, but only probe description.

Probe description

A probe description can be written as:


Few simple examples:

# A probe that will fire each time a system call called write is entered
# This probe will fire each time someone creates a new ruby array

As you can see, you can skip parts of the probe and can use wildcards such as * or ?.

That about sums up content for part 1 of DTrace tutorial. In the next part we are going to cover predicates, aggregates and we will do profiling of real Ruby applications.I will recommend you to play with various probes and stay tuned for Part 2 which is coming up very soon. You should also follow us on twitter @codemancershq if you don’t want to miss out Part 2.