Initial requirements

Delivery Guarantees and Scale

The system is designed to run on a single server, so the woes of networking have not been designed around. Besides, the incoming flow is made of many small events, and the individual contribution of each one to the final outcome is assumed to be negligible. In case of overload, we want back-pressure to be applied: incoming messages may be delayed or rejected, but never lost once accepted.

Single or few servers only.

Simplicity

Ramen should be programmable through a data-manipulation language that is as declarative (as opposed to procedural) and as familiar as possible.

It was initially considered that the best trade-off between simplicity and efficiency would be to use an actual programming language with a syntax and a library of functions tailored toward stream processing (as Riemann does, for instance, with Clojure), but the prototype proved too limited: first, speed would have to be sacrificed (regardless of the language Ramen would be embedded in); second, it constrained how processing could be distributed amongst several processes or servers.

Eventually it was decided to implement a SQL-like language that is less demanding on users, more adaptable to our ever-changing requirements, and that Ramen is free to compile into whatever combination of programs, threads and functions is deemed desirable.

Performance

The system must be able to handle about 500k ops/sec/server.

  • The operations must be compiled down to machine code rather than being interpreted;
  • Operations must run in parallel in different threads/processes;
  • Event transmission along the stream must be as direct as possible; in particular, there must be no central message broker;
  • Direct function calls alone cannot be relied upon, since different operations will run in different threads or processes most of the time;
  • (de)serialization of events must not be required;
  • In case of overcapacity, back-pressure should slow down the input flow rather than lead to loss of events.

Therefore we need some kind of per-operation lock-free input queue in shared memory; Ramen uses ring-buffers.
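
As an illustration, here is a minimal sketch in OCaml (the language Ramen is implemented in) of a bounded single-producer/single-consumer queue that applies back-pressure on write. It is an in-process illustration only: the actual ring-buffers are fixed-size buffers living in shared memory, accessed by distinct processes, which additionally requires atomic operations on the head and tail indices.

    (* Sketch of a bounded, mutex-free single-producer/single-consumer queue,
       illustrating the idea behind Ramen's ring-buffers. The real buffers live
       in shared memory and are shared between processes; that also requires
       atomic reads/writes of head and tail, omitted here. *)
    type 'a ringbuf = {
      slots : 'a option array ;  (* fixed capacity, as in Ramen today *)
      mutable head : int ;       (* next slot to read, owned by the consumer *)
      mutable tail : int }       (* next slot to write, owned by the producer *)

    let create capacity =
      { slots = Array.make capacity None ; head = 0 ; tail = 0 }

    let is_full rb =
      (rb.tail + 1) mod Array.length rb.slots = rb.head

    (* Back-pressure: when the buffer is full the write is refused rather than
       the event dropped; the producer is expected to retry, thus slowing down. *)
    let enqueue rb x =
      if is_full rb then false
      else begin
        rb.slots.(rb.tail) <- Some x ;
        rb.tail <- (rb.tail + 1) mod Array.length rb.slots ;
        true
      end

    let dequeue rb =
      if rb.head = rb.tail then None
      else begin
        let x = rb.slots.(rb.head) in
        rb.slots.(rb.head) <- None ;
        rb.head <- (rb.head + 1) mod Array.length rb.slots ;
        x
      end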

Currently Ramen generates native code via the OCaml compiler; the generated code therefore relies on garbage collection and on boxed values that need to be serialized into and deserialized out of the ring-buffers.
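
To make that cost concrete, the sketch below shows the kind of flattening currently required for a hypothetical (time, host, value) tuple: boxed, GC-managed values are copied into a contiguous byte buffer on the way into the ring-buffer and rebuilt on the way out. The field layout is illustrative only, not Ramen's actual wire format.

    (* Hypothetical encoding: 8 bytes of time, 4 bytes of host length, the host
       string, then 8 bytes of value; little-endian throughout. *)
    let serialize (time : float) (host : string) (value : int) : Bytes.t =
      let host_len = String.length host in
      let b = Bytes.create (8 + 4 + host_len + 8) in
      Bytes.set_int64_le b 0 (Int64.bits_of_float time) ;
      Bytes.set_int32_le b 8 (Int32.of_int host_len) ;
      Bytes.blit_string host 0 b 12 host_len ;
      Bytes.set_int64_le b (12 + host_len) (Int64.of_int value) ;
      b

    let deserialize (b : Bytes.t) : float * string * int =
      let time = Int64.float_of_bits (Bytes.get_int64_le b 0) in
      let host_len = Int32.to_int (Bytes.get_int32_le b 8) in
      let host = Bytes.sub_string b 12 host_len in
      let value = Int64.to_int (Bytes.get_int64_le b (12 + host_len)) in
      time, host, value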

The ring-buffers should be dynamically sized but are still of fixed size at the moment.

Direct function calls are not yet supported as a message-passing mechanism.

Running over more than one machine is not supported yet.

All of the above is to be addressed in the future.

Versatility

Although the focus is on monitoring and alerting, as few constraints as possible should be imposed on the input stream. In particular, incoming events can describe anything, might have undergone some aggregation already, and may come with several time stamps attached.

  • Events can be of any type reachable from the base types and the compound types;
  • Events can have a start and end time, which are both taken into account when extracting time series;
  • How to derive these event times from an actual event is part of the schema, so they incur no extra cost (see the sketch below);

Ramen supports tuples, arrays and lists as compound types but is still missing records (basically just tuples with syntactic sugar for named fields). Also, events are constrained to be tuples, but that constraint will be lifted at a later stage.
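
The sketch below illustrates the event-time idea with hypothetical field names: the schema states how to compute the start and stop times from the event's own fields, so no dedicated timestamp layout is imposed on the input, and both times are consulted when extracting a time series.

    (* Hypothetical event carrying its own timing information. *)
    type event = {
      capture_start : float ;  (* seconds since the epoch *)
      duration : float ;       (* seconds *)
      bytes : int }

    (* "Part of the schema" means a pure computation over the event's own
       fields, so nothing extra has to be stored. *)
    let event_times e =
      e.capture_start, e.capture_start +. e.duration

    (* When extracting a time series over [t1; t2), an event contributes
       whenever its own time span overlaps the requested range. *)
    let overlaps t1 t2 e =
      let start, stop = event_times e in
      start < t2 && stop >= t1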

Remembering past values

We also want to be able to use Ramen for troubleshooting, capacity planning and so on, so it must be able to answer possibly new queries on past data.

  • Given the total storage space available and a desired retention for some key stages in the stream processing, Ramen allocates the storage space in order to optimize the processing of future queries (see the sketch after this list);
  • Every new query can be run either on the live stream of data, on a past time range, or on the recent history first and then on live data;
  • Data is stored either in uncompressed form (which incurs little additional processing cost) or in compressed form (ORC files);
  • Any new query branched off the query tree after a save point can be sent archived data;
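
The following sketch shows the allocation principle only, under assumed inputs (a desired retention duration and an observed output rate per key function); it is not Ramen's actual algorithm. When the total need exceeds the available space, every allocation is scaled down proportionally.

    type retention = { func : string ; duration : float (* s *) ; rate : float (* bytes/s *) }

    let allocate total_bytes retentions =
      let need r = r.duration *. r.rate in
      let total_need = List.fold_left (fun s r -> s +. need r) 0. retentions in
      let scale = if total_need <= total_bytes then 1. else total_bytes /. total_need in
      List.map (fun r -> r.func, need r *. scale) retentions

    (* Example: share 1 GiB between two archived functions (names are made up). *)
    let () =
      allocate 1073741824. [
        { func = "web/requests" ; duration = 86400. ; rate = 10e3 } ;
        { func = "net/flows" ; duration = 3600. ; rate = 500e3 } ] |>
      List.iter (fun (f, b) -> Printf.printf "%s: %.0f bytes\n" f b)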

Transitioning from past to live data is yet to be implemented, and ORC support is still to be done.

Batteries included

Ramen should come with all the necessary components required to build a small monitoring solution.

  • Ramen should accept mere CSV files as input, in addition to fancier interfaces.
  • For dashboarding, it should be easy to use Grafana, by implementing the Graphite API (see the sketch after this list).
  • Ramen should be able to connect to an external alert management system such as Prometheus's Alertmanager, but must also be able to perform "the last mile" of alert delivery out of the box, for simplicity.
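
As a sketch of what "implementing the Graphite API" entails for the dashboarding case: Grafana's Graphite data source queries a /render endpoint that answers with a JSON array of targets, each carrying [value, timestamp] pairs (null for missing points). Only the payload formatting is shown here; the HTTP server and target-name globbing are left out.

    (* Format one target's datapoints the way the Graphite render API expects.
       OCaml's %S escaping is good enough for plain ASCII metric names. *)
    let render_target name (datapoints : (float option * int) list) =
      let datapoint (v, t) =
        match v with
        | None -> Printf.sprintf "[null, %d]" t
        | Some v -> Printf.sprintf "[%f, %d]" v t in
      Printf.sprintf "{\"target\": %S, \"datapoints\": [%s]}"
        name
        (String.concat ", " (List.map datapoint datapoints))

    let render targets =
      "[" ^ String.concat ", " (List.map (fun (n, dps) -> render_target n dps) targets) ^ "]"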

The only possible ways to inject data at the moment are CSV files, the collectd and netflow protocols, and direct ring-buffer writes. More obviously need to be implemented, popular message queues being at the top of the list.
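
For the CSV path, a minimal sketch of what ingestion amounts to is given below: each line is split according to the declared schema, converted into a typed tuple and pushed downstream. Quoting, escaping and error handling are omitted, and the column names and types are illustrative.

    (* Parse one CSV line into a hypothetical (time, host, value) tuple. *)
    let parse_line line : float * string * int =
      match String.split_on_char ',' line with
      | [ time ; host ; value ] ->
          float_of_string (String.trim time),
          String.trim host,
          int_of_string (String.trim value)
      | _ -> invalid_arg ("cannot parse CSV line: " ^ line)

    (* Feed every line of a file to the downstream [push] function. *)
    let read_csv push file =
      let ic = open_in file in
      try
        while true do push (parse_line (input_line ic)) done
      with End_of_file -> close_in ic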