Detect the top contributors

This is an aggregate function. As such, it accepts a single operand, which can be either a scalar, in which case it operates in turn on each item of the group, or an array or a vector, in which case it operates on each value in sequence and returns the result. (In practice, for performance reasons, this processing is delayed until the group is submitted.)

Users can choose either to skip NULL values or to include them in the computation, using one of two modifiers: SKIP NULLS (the default) skips NULL values, and KEEP NULLS includes them.

In the first case, the result will still be NULL if all input values are NULL; in the second case, any NULL value will make the result NULL.
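The two NULL policies can be illustrated outside the language itself. The following Python sketch is only an illustration of the rule stated above, using a plain sum as a stand-in aggregate and None in place of NULL; the function name aggregate_sum and its skip_nulls flag are hypothetical and not part of the language.

```python
from typing import Iterable, Optional


def aggregate_sum(values: Iterable[Optional[float]],
                  skip_nulls: bool = True) -> Optional[float]:
    """Sum the values, with None standing in for NULL (hypothetical helper)."""
    total: Optional[float] = None
    for v in values:
        if v is None:
            if skip_nulls:
                continue      # SKIP NULLS: ignore this input
            return None       # KEEP NULLS: a single NULL makes the result NULL
        total = v if total is None else total + v
    # Still None when every input was NULL (or there was no input at all).
    return total


print(aggregate_sum([1.0, None, 2.0]))                    # 3.0
print(aggregate_sum([1.0, None, 2.0], skip_nulls=False))  # None
print(aggregate_sum([None, None]))                        # None
```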

The other modifier tells whether the state used to compute the aggregate must be local (each group has its own independent state) or global (all groups share a single state). In general when using a GROUP-BY clause the former behavior is intended, and it is thus the default when an explicit GROUP-BY clause is present. Otherwise, the default is to use only one global state.

One can choose between the two with the modifiers LOCALLY, to force a group-wise state, and GLOBALLY, to force a global state.

This choice of the state lifespan is only meaningful when the operation is applied to a single scalar value, since the state required to compute the end result over a literal array or vector lives only as long as that computation.
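The difference between a group-wise and a global state can be sketched in Python, again with a running sum as a stand-in aggregate. This is a sketch under assumptions, not a description of the actual implementation: running_sum and its locally flag are hypothetical names, and rows are modelled as (group key, value) pairs.

```python
from collections import defaultdict
from typing import Hashable, Iterable, List, Tuple


def running_sum(rows: Iterable[Tuple[Hashable, float]],
                locally: bool = True) -> List[Tuple[Hashable, float]]:
    """Return, for each input row, the aggregate value as seen by its group."""
    states = defaultdict(float)
    out = []
    for key, value in rows:
        # Local: one state per group key. Global: a single shared state.
        state_key = key if locally else None
        states[state_key] += value
        out.append((key, states[state_key]))
    return out


rows = [("a", 1.0), ("b", 10.0), ("a", 2.0)]
print(running_sum(rows, locally=True))   # [('a', 1.0), ('b', 10.0), ('a', 3.0)]
print(running_sum(rows, locally=False))  # [('a', 1.0), ('b', 11.0), ('a', 13.0)]
```

With a local state, group "b" never sees the values of group "a"; with a global state, every group observes the same accumulating total.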

Syntax

LIST TOP …int-expr… [ OVER …int-expr… ] …expr… [ BY …num-expr… ] [ FOR THE LAST …float-expr… ] [ ABOVE …num-expr… SIGMAS ]

IS …expr… IN TOP …int-expr… [ OVER …int-expr… ] [ BY …num-expr… ] [ FOR THE LAST …float-expr… ] [ ABOVE …num-expr… SIGMAS ]

RANK OF …expr… IN TOP …int-expr… [ OVER …int-expr… ] [ BY …num-expr… ] [ FOR THE LAST …float-expr… ] [ ABOVE …num-expr… SIGMAS ]

Typing

int, int, t, num, FLOAT, num -> t[]

t, int, int, num, FLOAT, num -> BOOL

t, int, int, num, FLOAT, num -> uint

Description

The TOP operation has several use cases. In each of them, the operator computes an estimate of the top N contributors C according to some metric W (weight).

In its simplest form, the operator merely returns that list of contributors.

But oftentimes one just wants to know whether some value is in the top, in which case the operator returns a single boolean.

Finally, one might also want to know what rank, if any, some contributor occupies in the top, and in this last form the operator returns a nullable unsigned integer.
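The three forms can be compared on a toy example. The Python sketch below computes an exact top from fully known weights, whereas the operator only estimates it; the function names list_top, is_in_top and rank_of are hypothetical, and the choice of 1-based ranks is an assumption, since the text above does not say where ranks start.

```python
from collections import Counter
from typing import Hashable, List, Optional


def list_top(weights: Counter, n: int) -> List[Hashable]:
    """LIST TOP form: the n heaviest contributors, heaviest first."""
    return [contributor for contributor, _ in weights.most_common(n)]


def is_in_top(weights: Counter, candidate: Hashable, n: int) -> bool:
    """IS ... IN TOP form: a single boolean."""
    return candidate in list_top(weights, n)


def rank_of(weights: Counter, candidate: Hashable, n: int) -> Optional[int]:
    """RANK OF ... IN TOP form: a rank, or None (NULL) when not in the top.

    Ranks are 1-based here; that is an assumption of this sketch.
    """
    top = list_top(weights, n)
    return top.index(candidate) + 1 if candidate in top else None


weights = Counter({"alice": 50, "bob": 30, "carol": 20, "dave": 5})
print(list_top(weights, 3))            # ['alice', 'bob', 'carol']
print(is_in_top(weights, "dave", 3))   # False
print(rank_of(weights, "bob", 3))      # 2
print(rank_of(weights, "dave", 3))     # None
```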

Parameters control not only the accuracy of the top approximation but also how quickly top contributors fade to make room for newer ones in long-running aggregations.
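One way to picture this fading is an exponential decay of the weights with age. The Python sketch below is purely illustrative: the text above does not specify the decay law, so the exponential form, the decayed_weight helper and the interpretation of duration as the time after which a weight is divided by e are all assumptions.

```python
import math


def decayed_weight(weight: float, age: float, duration: float) -> float:
    """Weight remaining after `age` seconds, assuming exponential decay.

    Assumption of this sketch: a weight shrinks by a factor e every
    `duration` seconds. The actual decay law may differ.
    """
    return weight * math.exp(-age / duration)


# A once-heavy contributor seen 10 minutes ago versus a light one seen 6 s ago.
old_heavy = decayed_weight(100.0, age=600.0, duration=60.0)
recent_light = decayed_weight(5.0, age=6.0, duration=60.0)
print(recent_light > old_heavy)  # True
```

Under such a scheme, a shorter duration makes former top contributors drop out sooner, while a longer one keeps them in the top for longer.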

Such a TOP operation may return insignificant contributors if there are not that many really big contributors. To filter those out, a last parameter sets a threshold, expressed as a multiple of the standard deviation of all the weights, that any contributor must meet to be considered noteworthy.
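The effect of such a threshold can be sketched in Python. This is a minimal sketch under stated assumptions: it keeps a contributor only if its weight is at least the given multiple of the population standard deviation of all weights. Whether the operator compares against that product alone or against the mean plus that product is not stated above, and the name significant_top is hypothetical.

```python
import statistics
from typing import Dict, Hashable, List


def significant_top(weights: Dict[Hashable, float], n: int,
                    sigmas: float) -> List[Hashable]:
    """Top n contributors whose weight reaches `sigmas` standard deviations.

    Assumption of this sketch: threshold = sigmas * stddev(all weights).
    """
    sigma = statistics.pstdev(list(weights.values()))
    threshold = sigmas * sigma
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:n]
    return [contributor for contributor, w in ranked if w >= threshold]


weights = {"a": 100.0, "b": 90.0, "c": 2.0, "d": 1.0, "e": 1.0}
print(significant_top(weights, 3, sigmas=0.0))  # ['a', 'b', 'c']
print(significant_top(weights, 3, sigmas=1.0))  # ['a', 'b']
```

Here the standard deviation is about 46, so with a threshold of one sigma the third entry "c" is dropped even though a top 3 was requested.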