Ramen Documentation
The system is designed to run on a single server, so the woes of networking have not been designed around. Besides, the incoming flow is made of many small events, and the individual contribution of each to the final outcome is assumed to be negligible. In case of overload we want back-pressure to be applied, so that incoming messages are delayed or rejected, but never lost once accepted.
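The back-pressure behaviour described above can be illustrated with a bounded queue. This is only a minimal Python sketch of the idea, not Ramen's implementation: the class name BoundedInbox and its methods are hypothetical, and the capacity is arbitrary. A full inbox rejects the new event (the sender may retry later), while anything already accepted stays queued until it is consumed.

```python
import queue


class BoundedInbox:
    """Hypothetical bounded inbox: rejects new events when full,
    but never drops an event that has already been accepted."""

    def __init__(self, capacity):
        self._q = queue.Queue(maxsize=capacity)

    def offer(self, event):
        """Try to accept an event; return False (back-pressure) when full."""
        try:
            self._q.put_nowait(event)
            return True
        except queue.Full:
            return False

    def take(self):
        """Remove and return the oldest accepted event."""
        return self._q.get_nowait()


inbox = BoundedInbox(capacity=2)
results = [inbox.offer(e) for e in ("a", "b", "c")]  # third offer is rejected
drained = [inbox.take(), inbox.take()]               # accepted events survive, in order
```

Rejecting at the door rather than evicting old entries is what keeps the "not lost once accepted" guarantee: the producer learns immediately that it has to slow down.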
Single or few servers only.
Ramen should be programmable through a data manipulation language as declarative (as opposed to procedural) and as familiar as possible.
It was initially considered that the best trade-off between simplicity and efficiency would be to use an actual programming language with a syntax and a library of functions tailored toward stream processing (as Riemann does with Clojure, for instance), but the prototype proved too limiting. First, speed would have to be sacrificed (regardless of which language Ramen was embedded in); second, it constrained how processing could be distributed amongst several processes or servers.
Eventually it was decided to implement an SQL-like language that is less demanding of users, more flexible in the face of our ever-changing requirements, and that Ramen is free to compile into any combination of programs/threads/functions as deemed desirable.
The system must be able to handle about 500k ops/sec/server.
Therefore we need some kind of per-operation lockless input queue in shared memory; Ramen uses ringbuffers.
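To make the ringbuffer idea concrete, here is a minimal Python sketch of the index logic of a single-producer/single-consumer ring. It is an illustration under stated assumptions, not Ramen's code: the class and method names are hypothetical, the slots live in an ordinary Python list rather than in shared memory, and a real lockless implementation would additionally need atomic counters and memory barriers, which Python does not model. The point shown is that the producer only ever advances the tail and the consumer only ever advances the head, so neither side needs a lock.

```python
class RingBuffer:
    """Illustrative single-producer/single-consumer ring of fixed capacity.
    Items must not be None, since None is used to signal 'empty'."""

    def __init__(self, capacity):
        self._slots = [None] * capacity
        self._capacity = capacity
        self._head = 0  # count of items read so far; advanced only by the consumer
        self._tail = 0  # count of items written so far; advanced only by the producer

    def try_push(self, item):
        """Producer side: return False when the ring is full."""
        if self._tail - self._head == self._capacity:
            return False
        self._slots[self._tail % self._capacity] = item
        self._tail += 1  # publish only after the slot has been written
        return True

    def try_pop(self):
        """Consumer side: return None when the ring is empty."""
        if self._head == self._tail:
            return None
        item = self._slots[self._head % self._capacity]
        self._head += 1  # release the slot only after it has been read
        return item


rb = RingBuffer(capacity=2)
pushed = [rb.try_push(1), rb.try_push(2), rb.try_push(3)]  # ring is full at the third push
first = rb.try_pop()          # frees one slot
wrapped = rb.try_push(3)      # reuses slot 0 (wrap-around)
rest = [rb.try_pop(), rb.try_pop(), rb.try_pop()]
```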
Currently Ramen generates native code via the OCaml compiler; the generated code therefore relies on garbage collection and on boxed values that need to be serialized into, and deserialized out of, the ringbuffers.
The ringbuffers should be dynamically sized but are still of constant size at the moment.
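The serialization cost mentioned above comes from converting in-memory values to a flat byte layout and back on every hop. The following Python sketch shows that round trip with the standard struct module. The event layout (a float64 timestamp, a uint32 host identifier and a float64 value) is purely an assumption chosen for illustration; it is not Ramen's wire format.

```python
import struct

# Hypothetical event layout, little-endian without padding:
#   timestamp (float64), host_id (uint32), value (float64)
EVENT_FORMAT = "<dId"
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)  # 8 + 4 + 8 bytes


def serialize(event):
    """Pack an in-memory tuple into the flat bytes stored in a ringbuffer."""
    return struct.pack(EVENT_FORMAT, *event)


def deserialize(buf):
    """Unpack flat bytes read from a ringbuffer back into a tuple."""
    return struct.unpack(EVENT_FORMAT, buf)


event = (1500000000.5, 42, 0.25)
raw = serialize(event)
roundtrip = deserialize(raw)
```

Every message pays for one pack on the writer side and one unpack on the reader side, which is the overhead that unboxed values or direct function calls would avoid.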
Direct function calls are not supported yet as a message passing mechanism.
Running on more than one machine is not supported yet.
All of the above is to be addressed in the future.
Although the focus is on monitoring and alerting, as few constraints as possible should be imposed on the input stream. In particular, incoming events can describe anything, might have undergone some aggregation already, and may come with several timestamps attached.
Ramen supports tuples, arrays and lists as compound types but is still missing records (basically just tuples with syntactic sugar for named fields). Also, events are currently constrained to be tuples, but that restriction will be lifted at a later stage.
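As an analogy for "tuples with syntactic sugar for named fields", Python's namedtuple behaves exactly that way. The sketch below is not Ramen syntax, and the field names (start, stop, host, count) are hypothetical; they were chosen to show a pre-aggregated event carrying two timestamps.

```python
from collections import namedtuple

# Hypothetical event type: an aggregated count over a [start, stop] interval.
Event = namedtuple("Event", ["start", "stop", "host", "count"])

e = Event(start=10.0, stop=20.0, host="web1", count=3)

is_plain_tuple = (e == (10.0, 20.0, "web1", 3))  # a record is still a tuple underneath
same_field = (e.host == e[2])                    # named access is sugar for positional access
duration = e.stop - e.start                      # fields read more clearly by name
```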
We also want to be able to use Ramen for troubleshooting, capacity planning, etc., so it must be able to answer possibly new queries on past data.
Transitioning from past to live is yet to be implemented. Also ORC support is still to be done.
Ramen should come with all the necessary components required to build a small monitoring solution.
The only ways to inject data at the moment are CSV files, the collectd and netflow protocols, and direct ringbuffer writes. More obviously need to be implemented, with popular message queues at the top of the list.
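For illustration only, this is roughly what turning CSV input into typed event tuples involves. The sketch uses Python's standard csv module; the column layout (timestamp, host, count) and the function name read_events are assumptions made for the example and say nothing about how Ramen itself declares or parses CSV sources.

```python
import csv
import io


def read_events(csv_text):
    """Parse CSV text into (timestamp, host, count) tuples.
    The three-column layout is a hypothetical example."""
    reader = csv.reader(io.StringIO(csv_text))
    return [(float(ts), host, int(count)) for ts, host, count in reader]


sample = "1500000000.0,web1,3\n1500000060.0,web2,5\n"
events = read_events(sample)
```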