Functions, Programs, Workers and the supervisor

Requirements

We want flexibility regarding parallelism, ideally the full spectrum of:

The output of a function is sent to another function running in a separate machine;
The output of a function is sent to another function running in the same machine but another process;
The output of a function is sent to another function running in the same process but another thread;
The output of a function is sent to another function running in the same thread, asynchronously;
The output of a function is sent to another function running in the same thread, synchronously.

In all these cases we want back-pressure in case of overcapacity.

The first case is required to spread Ramen over several machines. The last is the fastest one possible as we could go as far as directly calling the next operation as we call a local function.

Individual ring buffers per function has been chosen as a communication mechanism as it suits all of the above with little or no adjustment.

Programs and Functions

Although a stream processor usually consists of a tree of continuous queries, we wanted to be able to express local loops, despite it makes type checking and compilation harder.

So Ramen has both the notion of functions, that are individual operations turning a given stream of input tuples into a stream of output, and programs, which are a set of functions that are compiled, ran and stopped altogether. Within a program, loops are allowed and functions can be defined in any order. This is quite similar to compiling a normal program.

For a compiled program to be able to run though, all of its parents (programs which output are used anywhere in the new program being run) must be running already. This check is a bit similar to the linking stage of process that's being loaded.

To sum it up, users write Ramen programs, that are composed of several functions, and then run them. Each of the functions have its own input, output and name, which is the name of the program itself followed by a slash followed by the name of the function.

For instance, here is a short program demoing many of Ramen's features:

-- Single line comments start with a double dash as in SQL
PARAMETERS max_response_time DEFAULT 1.5s
       AND time_step DEFAULT 1m;
  -- Parameters can be overridden when starting a program.
  -- "1.5s" and "1m" mean "1.5 second" and "1 minute".

DEFINE web_resp_times AS
  SELECT
    min start AS start,
    max stop AS stop,
      -- By default, any field named start/stop are the start and stop unix
      -- timestamps of the described event.
    hostname,
    stop - start AS response_time
  FROM services/web/http_logs
    -- services/web/http_logs is the name of the function http_logs that's
    -- part of another program named services/web
  WHERE ip_server IN 192.168.42.0/24
  GROUP BY hostname
  COMMIT WHEN in.start > group.start + time_step + 15s;
    -- Assuming all events start timestamps are soft-ordered with max 15s
    -- of jitter, we can commit a group when we receive an input which
    -- start time is later than the end of the group by 15s.

DEFINE alert_on_slow_resp_times AS
  FROM web_resp_times
  WHERE response_time > max_response_time
  NOTIFY "Slow Response";

Compilation

To compile the above example program, Ramen needs to be given the name of the program (or it will have to infer it from the file name) and also where to find the binary of the program services/web where it will find the output type of the function http_logs, that it needs in order to type-check this program. It could either look for them in the running configuration or from a given path in the file-system.

The result of the compilation is going to be a single binary executable that embeds the code for all individual functions, as well as a few additional entry points to print various information such as the output type of the functions.

The running configuration is then nothing but a text file that specifies which program is supposed to be running (with what values for the parameters). ramen run merely edits this file. ramen supervisor is a daemon that will spawn one instance of the compiled binary for every of its functions, and connect them with the required ringbuffers.

Each instance of a compiled binary implementing a particular function is called a worker. So in the above example two workers will be run, one for each of the functions, each with its own input ringbuffer.

The supervisor daemon will also kill the workers that are no longer needed (again, ramen kill merely edits the running configuration).