Reservoir sampling

This is an aggregate function. As such, it accepts a single operand, which can be either a scalar, in which case it operates in turn on each item of the group, or an array or a vector, in which case it operates on each value in sequence and returns the result. (In practice, for performance reasons, this process is delayed until the group is submitted.)

Users can choose to skip NULL values or to include them in the computation, using one of two modifiers: SKIP NULLS to skip NULL values (the default) or KEEP NULLS to include them.

In the first case the result will still be NULL if all input values are NULL; in the latter case any NULL value will make the result NULL.
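
The two NULL policies can be illustrated outside the query language. The following is a minimal Python sketch, not the engine's implementation: `aggregate`, `fold` and `keep_nulls` are hypothetical names, Python's `None` stands in for NULL, and a plain sum is used as the aggregate for simplicity.

```python
from typing import Callable, List, Optional, Sequence


def aggregate(
    values: Sequence[Optional[int]],
    fold: Callable[[List[int]], int],
    keep_nulls: bool = False,
) -> Optional[int]:
    """Apply `fold` to `values` under one of the two NULL policies.

    keep_nulls=False mimics SKIP NULLS: None values are ignored, and the
    result is None only when no non-None value remains.
    keep_nulls=True mimics KEEP NULLS: a single None makes the result None.
    """
    non_null = [v for v in values if v is not None]
    if keep_nulls and len(non_null) != len(values):
        return None  # at least one NULL was seen, so the result is NULL
    if not non_null:
        return None  # nothing left to aggregate
    return fold(non_null)


skipped = aggregate([1, None, 3], sum)                 # NULL ignored
kept = aggregate([1, None, 3], sum, keep_nulls=True)   # NULL propagates
all_null = aggregate([None, None], sum)                # nothing but NULLs
```

Here `skipped` is 4, while `kept` and `all_null` are both `None`.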

The other modifier tells whether the state used to compute the aggregate must be local (each group has its own independent state) or global (all groups share a single state). In general when using a GROUP-BY clause the former behavior is intended, and it is thus the default when an explicit GROUP-BY clause is present. Otherwise, the default is to use only one global state.

One can choose between the two with the modifiers LOCALLY, to force a group-wise state, and GLOBALLY, to force a global state.

This choice of the state lifespan is only meaningful when the operation is applied to a single scalar value, since the state required to compute the end result over a literal array or vector lives only as long as that computation.
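
The difference between a group-wise and a global state can be sketched in Python with a running sum. This is only an illustration under assumed names (`running_sums`, `locally`); it says nothing about how the engine actually stores its states.

```python
from collections import defaultdict
from typing import Hashable, Iterable, List, Tuple

_GLOBAL_SLOT = object()  # single key shared by every group in global mode


def running_sums(
    rows: Iterable[Tuple[Hashable, int]], locally: bool = True
) -> List[Tuple[Hashable, int]]:
    """Return, for each (group_key, value) row, the running sum so far.

    locally=True keeps one independent state per group key;
    locally=False makes all groups update the same single state.
    """
    states = defaultdict(int)
    out = []
    for key, value in rows:
        slot = key if locally else _GLOBAL_SLOT
        states[slot] += value
        out.append((key, states[slot]))
    return out


rows = [("a", 1), ("b", 10), ("a", 2)]
per_group = running_sums(rows, locally=True)
shared = running_sums(rows, locally=False)
```

With a local state, group "a" only ever sees its own values (1, then 3); with a global state, every row adds to the same total (1, 11, 13).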

Syntax

SAMPLE …unsigned-int-expr… …expr…

Typing

N:uint, t -> t[<=N]

Description

Build a random set of values of a given maximum size.

The first operand, k (written N in the typing above), is the maximum size of the set, and the second is the value to be sampled.

If fewer than k values are received then the result will contain them all. If more than k values are received then the result will have exactly k values (not taking into consideration skipping over NULL values). Every received value has the same probability of being part of the resulting set.
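
The behavior described above matches classic reservoir sampling. The sketch below uses the textbook variant often called Algorithm R. It is an assumption that this variant illustrates the function faithfully: the text does not say which variant is actually used, and `reservoir_sample` and `rng` are hypothetical names.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    values: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Return at most k items from `values`, each kept with equal probability.

    The input is read once and never stored in full, so memory use is
    bounded by k whatever the length of the stream.
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, value in enumerate(values):
        if i < k:
            # The first k values fill the reservoir unconditionally.
            reservoir.append(value)
        else:
            # Value number i+1 replaces a random slot with probability k/(i+1).
            j = rng.randint(0, i)  # both bounds are inclusive
            if j < k:
                reservoir[j] = value
    return reservoir


short = reservoir_sample([1, 2, 3], 5)   # fewer than k values: all are kept
long = reservoir_sample(range(1000), 5)  # more than k values: exactly k kept
```

Here `short` is `[1, 2, 3]`, and `long` always holds exactly 5 values drawn from the input, although which ones varies from run to run.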

This function comes in handy to reduce the input size of a memory-expensive operation such as a percentile computation.