Understanding Exit Signals in Erlang/Elixir

"Process linking and how processes send and handle exit signals" is a very important topic to understand in order to build robust apps in Erlang or Elixir, but it is also a source of a lot of confusion for beginners. In this post, we'll cover this topic and understand it well, once and for all.

Note: Processes are the same in both Erlang and Elixir, so everything below is equally applicable to both languages.

Processes

Processes in Erlang are like threads that don't share any data. These processes are VM-level, so don't confuse them with OS processes. The Erlang VM schedules them for execution, much like the OS schedules OS-level processes. Since each Erlang process owns its data, processes can be scheduled freely on all available CPUs, and this is how Erlang makes concurrency easy for developers.

But a bunch of processes that run in isolation are rarely of any use. To build anything useful, processes need to work together by communicating with each other.

Messages

In Erlang, processes communicate among themselves by message passing. Every process has one mailbox. A process can read messages that appear in its mailbox and can also send messages to mailboxes that belong to other processes. This way, processes can communicate with each other without having to share any data. This frees developers from writing locks around data access when writing code that may run concurrently.
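
As a minimal sketch of message passing, the snippet below starts a process that waits for a single message and replies to the sender. The module name MessagingDemo and its functions are hypothetical names chosen for illustration; only spawn, send, and receive come from Elixir itself.

```elixir
defmodule MessagingDemo do
  # Spawns a process that waits for one {:ping, from} message
  # and replies to the sender with :pong.
  def start_ponger do
    spawn(fn ->
      receive do
        {:ping, from} -> send(from, :pong)
      end
    end)
  end

  # Sends :ping to the ponger and waits up to one second for the reply.
  def ping(ponger) do
    send(ponger, {:ping, self()})

    receive do
      :pong -> :pong
    after
      1_000 -> :timeout
    end
  end
end

ponger = MessagingDemo.start_ponger()
IO.inspect(MessagingDemo.ping(ponger))
# prints :pong
```

Neither process touches the other's data; the only thing they exchange is the message itself.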

Apart from these regular messages, processes also communicate using "exit signals", a special kind of signal that is delivered separately from ordinary messages.

Exit Signals

Processes have a signalling mechanism by which they can let other processes know that they are exiting. These "exit signals" also contain an exit reason, which helps other processes decide how to react to the signal.

A process can terminate for three reasons:

A normal exit - This happens when a process is done with its job and ends execution. Since these exits are normal, usually nothing needs to be done when they happen. So these signals are usually ignored, but they are emitted anyway for the sake of interested processes. The exit reason for this kind of exit is the atom :normal.
Because of unhandled errors - This happens when an exception is raised inside the process and not caught. A failed pattern match is one example, and Erlang programmers often rely on such failures to "fail fast". The exit reason for this kind of exit carries the exception details: the exception and a stack trace.
Forcefully killed - This happens when another process sends an exit signal with the reason :kill, which forces the receiving process to terminate. The processes linked to the killed process then see the exit reason :killed.

A process can subscribe to another process's exit signal by establishing a "link" with that process. When a process terminates, all the linked processes receive the exit signal from the terminating process.
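
The effect of a link can be sketched as follows. A "parent" process links to a child that exits with the reason :boom; because the parent is linked and does nothing special with exit signals, it terminates with the same reason. LinkDemo is a hypothetical name, and spawn_monitor is used here only as a convenient way to observe the parent's exit reason from the outside; monitors are otherwise outside the scope of this post.

```elixir
defmodule LinkDemo do
  # Starts an unlinked "parent" that links to a child which exits with :boom.
  # Returns the exit reason observed for the parent, or :still_alive.
  def run do
    {parent, ref} =
      spawn_monitor(fn ->
        spawn_link(fn -> exit(:boom) end)
        # Without the link, the parent would sleep here forever.
        Process.sleep(:infinity)
      end)

    receive do
      {:DOWN, ^ref, :process, ^parent, reason} -> reason
    after
      1_000 -> :still_alive
    end
  end
end

IO.inspect(LinkDemo.run())
# prints :boom
```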

The force-kill signals, the ones sent with the exit reason :kill, terminate the receiving process no matter what. The other kinds of exit signals, those with the reason :normal or any other reason, affect the receiving process differently depending on whether it is trapping exits. Let's see how trapping of exit signals works.

Trapping exits

When a process traps exit signals, the exit signals it receives from its links are converted into messages and placed in its mailbox. Here's how a process can trap exits in Elixir:

Process.flag(:trap_exit, true)

receive do
  {:EXIT, from, reason} ->
    # Handle the exit as needed; `from` is the pid of the exiting process
    {from, reason}
end

When this process receives an exit signal other than a :kill signal, the signal is converted into a message that is then picked up by the receive block. In Erlang/Elixir, this is what makes supervision trees possible.
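
A runnable sketch of the same idea: the function below traps exits, links to a child that exits with a given reason, and returns the reason carried by the resulting message. TrapDemo and observe/1 are hypothetical names used only for this illustration.

```elixir
defmodule TrapDemo do
  # Traps exits, links to a child that exits with `reason`, and returns
  # the reason carried by the resulting {:EXIT, pid, reason} message.
  def observe(reason) do
    # Process.flag/2 returns the old value, so it can be restored afterwards.
    previous = Process.flag(:trap_exit, true)
    child = spawn_link(fn -> exit(reason) end)

    result =
      receive do
        {:EXIT, ^child, exit_reason} -> exit_reason
      after
        1_000 -> :timeout
      end

    Process.flag(:trap_exit, previous)
    result
  end
end

IO.inspect(TrapDemo.observe(:normal))
# prints :normal
IO.inspect(TrapDemo.observe(:boom))
# prints :boom
```

Without the trap_exit flag, the :boom signal would have terminated the calling process instead of showing up in its mailbox.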

A supervisor is a process whose responsibility is to start child processes and keep them alive by restarting them if necessary. Let's see how a supervisor does that. If you look at the source code of supervisor in Erlang, you can see that the first thing that happens in the init function is trapping exit signals:

init(...) ->
    process_flag(trap_exit, true),
    ...

This means that exit signals from child processes will be converted into messages. The supervisor then handles these messages by restarting the child processes, based on the restart strategy of the supervisor. This is how you write fault-tolerant apps in Elixir or Erlang: you let your processes fail fast, and the supervisor that spawned these processes makes sure they are restarted.
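
To make the mechanism concrete, here is a deliberately tiny sketch of a supervisor-like process. It is not how the real supervisor module is implemented (which also handles restart strategies, restart limits, and more); TinySupervisor, its functions, and the {:started, pid} notification are all hypothetical and exist only to show the trap-and-restart loop.

```elixir
defmodule TinySupervisor do
  # Starts a process that traps exits and keeps one worker running.
  # `worker_fun` is a zero-arity function; `notify` receives
  # {:started, pid} each time a worker is started or restarted.
  def start(worker_fun, notify) do
    spawn(fn ->
      Process.flag(:trap_exit, true)
      loop(worker_fun, notify, start_worker(worker_fun, notify))
    end)
  end

  defp start_worker(worker_fun, notify) do
    pid = spawn_link(worker_fun)
    send(notify, {:started, pid})
    pid
  end

  defp loop(worker_fun, notify, worker) do
    receive do
      # The worker finished normally: nothing to restart.
      {:EXIT, ^worker, :normal} ->
        :ok

      # The worker exited abnormally: start a replacement.
      {:EXIT, ^worker, _reason} ->
        loop(worker_fun, notify, start_worker(worker_fun, notify))
    end
  end
end

# A worker that waits until it is told to crash.
worker = fn ->
  receive do
    :crash -> exit(:boom)
  end
end

TinySupervisor.start(worker, self())

first =
  receive do
    {:started, pid} -> pid
  after
    1_000 -> nil
  end

send(first, :crash)

second =
  receive do
    {:started, pid} -> pid
  after
    1_000 -> nil
  end

IO.inspect(is_pid(second) and first != second)
# prints true
```

The worker never has to defend itself against failure; it simply exits, and the process that is trapping exits decides what happens next.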

So if processes can trap exit signals, how is it possible to kill them? Using the :kill exit signal, of course. The exit reason :kill is used to forcefully terminate the receiving process even if it's trapping exits.

In Elixir, this is how you kill a process using its pid:

Process.exit(pid, :kill)
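
The difference between a trappable signal and :kill can be sketched like this. The target process traps exits and reports every exit message it receives; an ordinary :shutdown signal is trapped, while :kill terminates the target regardless. KillDemo and the :ready and {:trapped, reason} messages are hypothetical names made up for this example.

```elixir
defmodule KillDemo do
  # Returns {reason_trapped_by_target, reason_target_died_with}.
  def run do
    # Trap exits here only so this process can observe the target's death.
    previous = Process.flag(:trap_exit, true)
    me = self()

    target =
      spawn_link(fn ->
        Process.flag(:trap_exit, true)
        send(me, :ready)
        report_exits(me)
      end)

    # Wait until the target is trapping exits before signalling it.
    receive do
      :ready -> :ok
    end

    # An ordinary exit signal is trapped and turned into a message.
    Process.exit(target, :shutdown)

    trapped =
      receive do
        {:trapped, reason} -> reason
      end

    # :kill cannot be trapped; the target dies and its link reports :killed.
    Process.exit(target, :kill)

    died_with =
      receive do
        {:EXIT, ^target, reason} -> reason
      end

    Process.flag(:trap_exit, previous)
    {trapped, died_with}
  end

  # Forwards the reason of every trapped exit signal to `report_to`.
  defp report_exits(report_to) do
    receive do
      {:EXIT, _from, reason} ->
        send(report_to, {:trapped, reason})
        report_exits(report_to)
    end
  end
end

IO.inspect(KillDemo.run())
# prints {:shutdown, :killed}
```

Note that the linked observer sees the reason :killed rather than :kill.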

A process can also send an exit signal to itself using:

Process.exit(self(), <reason>)

The process responds to this signal from itself in the same way it would respond to an exit signal from another process, with one exception. If a process that is not trapping exits sends itself an exit signal with the reason :normal, the signal is not ignored: the process terminates and, when it does, sends a :normal exit signal to all linked processes.
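
The following sketch contrasts the two cases for a process that is not trapping exits: a :normal signal sent by the process to itself, and a :normal signal sent by another process. NormalSignalDemo and its helper are hypothetical names; the calling process traps exits only so that it can observe what happens to the children.

```elixir
defmodule NormalSignalDemo do
  # Returns {outcome_when_sent_by_self, outcome_when_sent_by_other}.
  def run do
    previous = Process.flag(:trap_exit, true)

    # Case 1: the child sends itself a :normal exit signal.
    self_sender =
      spawn_link(fn ->
        Process.exit(self(), :normal)
        # Only reached if the signal above were ignored.
        Process.sleep(:infinity)
      end)

    from_self = await_exit(self_sender)

    # Case 2: another process (this one) sends the :normal exit signal.
    sleeper = spawn_link(fn -> Process.sleep(:infinity) end)
    Process.exit(sleeper, :normal)
    from_other = await_exit(sleeper)

    # Clean up the survivor and restore the caller's flag.
    Process.exit(sleeper, :kill)
    await_exit(sleeper)
    Process.flag(:trap_exit, previous)

    {from_self, from_other}
  end

  # Waits briefly for the exit message of `pid`.
  defp await_exit(pid) do
    receive do
      {:EXIT, ^pid, reason} -> {:exited, reason}
    after
      200 -> :still_alive
    end
  end
end

IO.inspect(NormalSignalDemo.run())
# prints {{:exited, :normal}, :still_alive}
```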

Recap

:normal exit signals are harmless. The receiving process ignores them unless it is trapping exits, in which case they are received as messages. The exception is a :normal signal that a process sends to itself: if the process is not trapping exits, it terminates with the exit reason :normal.
:kill exit signals always result in the termination of the receiving process.
Exit signals with other reasons terminate the receiving process unless it is trapping exits, in which case they are received as messages.

Here's a cheatsheet for your reference: [Figure: exit signals cheatsheet]

Thanks for reading! If you would like to get updates about subsequent blog posts about Elixir from Codemancers, do follow us on Twitter: @codemancershq.