Process registration

Published October 25, 2018 by Toran Billups

In part 4 of the url shortener adventure we add another layer to the architecture and learn about the role of process registration.

Part 4: The case for process registration

In the previous post we bolted on a Supervisor that would restart the worker process when it blew up. One problem we found was that our process lost all state when it was restarted.

    [{id, pid, type, module}] = Supervisor.which_children(EX.Supervisor)
    EX.Worker.put(pid, "x", "google.com")
    EX.Worker.get(pid, "x")
    Process.exit(pid, :kill)
    [{id, pid, type, module}] = Supervisor.which_children(EX.Supervisor)
    # notice the worker state is reset
    EX.Worker.get(pid, "x")
  

In a future post this state will be persisted to the file system, but today we will add a new process to solve the immediate problem. Start by adding a module called `EX.Cache`. This process will cache the state of our application and provide it to the `EX.Worker` process on startup.

    defmodule EX.Cache do
      use GenServer
    
      def start_link(args) do
        GenServer.start_link(__MODULE__, :ok, args)
      end
    
      @impl GenServer
      def init(:ok) do
        {:ok, %{}}
      end
    
      def all(pid) do
        GenServer.call(pid, {:all})
      end
    
      def put(pid, key, value) do
        GenServer.cast(pid, {:put, key, value})
      end
    
      @impl GenServer
      def handle_call({:all}, _from, state) do
        {:reply, state, state}
      end
    
      @impl GenServer
      def handle_cast({:put, key, value}, state) do
        {:noreply, Map.put(state, key, value)}
      end
    end
  

To read more about caching process state in Elixir with ETS, check out my question about it on elixirforum.com. Thanks to @peerreynders for the ETS resource he linked.

To see this module in action we next write a unit test to exercise the cache process. You can run this from the command line using `mix test`.

    defmodule CacheTest do
      use ExUnit.Case, async: true
    
      setup do
        %{pid: start_supervised!(EX.Cache)}
      end
    
      test "all returns state and put updates it", %{pid: pid} do
        assert EX.Cache.all(pid) === %{}
    
        EX.Cache.put(pid, "x", "google.com")
    
        assert EX.Cache.all(pid) === %{"x" => "google.com"}
      end
    end
  

Before we spin up the application with IEx, first consider how this new cache process will hydrate `EX.Worker`.

    defmodule EX.Worker do
      use GenServer
    
      @impl GenServer
      def init(:ok) do
        state = EX.Cache.all(:cache)
        {:ok, state, {:continue, :init}}
      end
    
      @impl GenServer
      def handle_continue(:init, state) do
        {:noreply, state}
      end
    
    end
  

A handful of new things are worth talking about here. First, it seems that before OTP 21 lazy initialization was tricky: `start_link` blocks until `init` returns, so any long-running or otherwise expensive operation in `init` would hold up the process that called `start_link`. I found a concise blog post on the subject detailing how you can use the new `handle_continue` callback to avoid blocking on initialization.
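
To make the idea concrete, here is a minimal sketch of deferred initialization with `handle_continue`. The module `LazyInitSketch`, its `status/1` function and the `Process.sleep/1` stand-in for slow work are all hypothetical and not part of the url shortener.

```elixir
defmodule LazyInitSketch do
  use GenServer

  # Hypothetical module used only to illustrate handle_continue/2.
  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, :ok, opts)
  end

  def status(pid), do: GenServer.call(pid, :status)

  @impl GenServer
  def init(:ok) do
    # Return right away so the caller of start_link is not blocked,
    # and defer the slow part to handle_continue/2.
    {:ok, %{status: :starting, data: %{}}, {:continue, :load}}
  end

  @impl GenServer
  def handle_continue(:load, state) do
    # Stand-in for an expensive operation (disk read, network call, ...).
    Process.sleep(50)
    {:noreply, %{state | status: :ready, data: %{"x" => "google.com"}}}
  end

  @impl GenServer
  def handle_call(:status, _from, state) do
    {:reply, state.status, state}
  end
end

{:ok, pid} = LazyInitSketch.start_link()

# The continue callback runs before any message in the mailbox is handled,
# so this call only sees the state after the load has finished.
:ready = LazyInitSketch.status(pid)
```

The process is started and linked as soon as `init` returns, while callers that talk to it still wait until the deferred work is done.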

Next, you may have noticed I'm not using a `pid` to identify the cache process when calling the function `EX.Cache.all`. The short version is that from `EX.Worker` we don't know the `pid` of `EX.Cache` at runtime. And if we decide to add a supervision tree in the future, that `pid` value becomes a moving target. This represents a big shift in thinking and opens the door to a wider topic centered around process registration.

From a high level the concept is fairly straightforward. Instead of relying on the `pid` value we will use a name, type or some other human-friendly key to uniquely identify each process. So what is required to adopt this new process lookup, and how can we remain productive in Elixir as we do it?

As luck would have it, Elixir v1.4 added a Registry that we can bolt on without much effort. In its simplest form this Registry is nothing more than a key-value store. Typically it handles all the bookkeeping involved in mapping names to `pid` values. And as `pid` values change over time, because of restarts for example, the Registry monitors each registered process, drops the entry when that process dies and lets the replacement register under the same name.
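
A small sketch of that bookkeeping, using only the standard `Registry` and `Agent` modules. The registry name `RegistrySketch` is hypothetical, and an `Agent` stands in for any process that wants to be found by name.

```elixir
# Start a registry with unique keys. RegistrySketch is a hypothetical name.
{:ok, _} = Registry.start_link(keys: :unique, name: RegistrySketch)

# Register a process under the key :cache by naming it with a :via tuple.
name = {:via, Registry, {RegistrySketch, :cache}}
{:ok, first_pid} = Agent.start(fn -> %{} end, name: name)

# The registry maps the key to the current pid.
[{^first_pid, _value}] = Registry.lookup(RegistrySketch, :cache)

# Kill the process and wait until it is really gone.
ref = Process.monitor(first_pid)
Process.exit(first_pid, :kill)

receive do
  {:DOWN, ^ref, :process, _pid, _reason} -> :ok
end

# A replacement process registers under the same name ...
{:ok, second_pid} = Agent.start(fn -> %{} end, name: name)
[{^second_pid, _value}] = Registry.lookup(RegistrySketch, :cache)

# ... so callers that use the name never need to learn the new pid.
Agent.get(name, & &1)
```

In the application a Supervisor does the restarting, but the lookup side works the same way: the name stays stable while the `pid` behind it changes.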

Circling all the way back to `EX.Cache.all`: behind the scenes the Registry looks up the cache process by the name `:cache`, allowing us to be ignorant of the actual `pid` value. The first step to use this involves updating the `EX.Supervisor` module to ensure we start up and link the Registry itself.

    defmodule EX.Supervisor do
      use Supervisor
    
      def start_link(opts) do
        Supervisor.start_link(__MODULE__, :ok, opts)
      end
    
      def init(:ok) do
        children = [
          {Registry, keys: :unique, name: EX.Registry},
          EX.Cache,
          EX.Worker
        ]
        Supervisor.init(children, strategy: :one_for_one)
      end
    end
  

Next add a module called `EX.Registry` that will act as an extension point. The `via` function returns a `:via` tuple, the standard form that `GenServer` and other OTP behaviours accept as a process name, which many Elixir developers use to track and look up processes. Today we accept any arbitrary `name` but you can just as easily imagine a combination of values that help you identify a given process.

    defmodule EX.Registry do
      def via(name) do
        {:via, Registry, {__MODULE__, {name}}}
      end
    end
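
As an illustration of the "combination of values" idea, here is a sketch that keys each process by a `{type, id}` pair. The module `ViaSketch`, the registry name `ViaSketch.Registry` and the key shape are assumptions for illustration, not code from the url shortener.

```elixir
defmodule ViaSketch do
  # Hypothetical helper; the registry name and the {type, id} key shape
  # are assumptions, not part of the url shortener.
  @registry ViaSketch.Registry

  def registry, do: @registry

  def via(type, id) do
    {:via, Registry, {@registry, {type, id}}}
  end
end

{:ok, _} = Registry.start_link(keys: :unique, name: ViaSketch.registry())

# Two processes of the same type, told apart by id.
{:ok, _a} = Agent.start_link(fn -> :a end, name: ViaSketch.via(:worker, 1))
{:ok, _b} = Agent.start_link(fn -> :b end, name: ViaSketch.via(:worker, 2))

:a = Agent.get(ViaSketch.via(:worker, 1), & &1)
:b = Agent.get(ViaSketch.via(:worker, 2), & &1)
```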
  

The next change is in the `EX.Cache` module itself. As we link up it's important to explicitly set the name so other processes can find it without having to know the `pid`.

    defmodule EX.Cache do
      use GenServer
    
      def start_link(_args) do
        GenServer.start_link(__MODULE__, :ok, name: via(:cache))
      end
    
      defp via(name), do: EX.Registry.via(name)
    end
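
For a call like `EX.Cache.all(:cache)` to reach the registered process, the client functions need to resolve the name through the same `:via` tuple. The snippet above does not show that part, so the following is only a sketch of one way to do it, using a hypothetical `CacheSketch` module and a hypothetical `CacheSketch.Registry` registry rather than the actual `EX.Cache` code.

```elixir
defmodule CacheSketch do
  use GenServer

  # Assumption: a registry named CacheSketch.Registry is already running.
  @registry CacheSketch.Registry

  def start_link(_args) do
    GenServer.start_link(__MODULE__, :ok, name: via(:cache))
  end

  # Client functions take a name and resolve it through the registry.
  def all(name), do: GenServer.call(via(name), {:all})
  def put(name, key, value), do: GenServer.cast(via(name), {:put, key, value})

  @impl GenServer
  def init(:ok), do: {:ok, %{}}

  @impl GenServer
  def handle_call({:all}, _from, state), do: {:reply, state, state}

  @impl GenServer
  def handle_cast({:put, key, value}, state) do
    {:noreply, Map.put(state, key, value)}
  end

  defp via(name), do: {:via, Registry, {@registry, {name}}}
end

{:ok, _} = Registry.start_link(keys: :unique, name: CacheSketch.Registry)
{:ok, _} = CacheSketch.start_link([])

CacheSketch.put(:cache, "x", "google.com")
CacheSketch.all(:cache)
# => %{"x" => "google.com"}
```

Passing a bare atom such as `:cache` straight to `GenServer.call/2` would look for a locally registered name instead of a Registry entry, which is why the sketch wraps the name with `via/1` first.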
  

The next change is in the `EX.Worker` module. Similar to the `EX.Cache` module when this process links up we set an explicit name.

    defmodule EX.Worker do
      use GenServer
    
      alias EX.Shortener
    
      def start_link(_args) do
        GenServer.start_link(__MODULE__, :ok, name: via(:worker))
      end
    
      defp via(name), do: EX.Registry.via(name)
    end
  

Now that all the infrastructure is functional we can use `EX.Cache` from the `EX.Worker` module with ease. In the `handle_cast` callback we push every hash/url pair to the cache, so if `EX.Worker` restarts, its state is rehydrated from the cache in `init`.

    defmodule EX.Worker do
      use GenServer
    
      @impl GenServer
      def init(:ok) do
        state = EX.Cache.all(:cache)
        {:ok, state, {:continue, :init}}
      end
    
      @impl GenServer
      def handle_continue(:init, state) do
        {:noreply, state}
      end
    
      @impl GenServer
      def handle_cast({:put, hash, url}, state) do
        new_state = Shortener.create_short_url(state, hash, url)
        EX.Cache.put(:cache, hash, url)
        {:noreply, new_state}
      end
    end
  
Note: clearly we don't get a ton of value pushing this state into the `EX.Cache` module just yet but I wanted to understand the conceptual model at its simplest before we start expanding out.

To play around with this module in IEx, run this command: `iex -S mix run`

    [{id, pid, type, module}, _, _] = Supervisor.which_children(EX.Supervisor)
    EX.Worker.put(:worker, "x", "google.com")
    EX.Worker.get(:worker, "x")
    Process.exit(pid, :kill)
    # notice the worker state was rehydrated
    EX.Worker.get(:worker, "x")
  

I also did a refactor of the test suite to accommodate this, but because I'm still not happy with it I'll save the details for another post. You can see the updated tests in the repository below for reference. Running `mix test` does produce a green build, but I'm unsure how predictable or repeatable this is at the moment.

You can track my progress on GitHub commit by commit. If you just want the code for this post, check out this commit.

