Aggregate functions
This page describes the available functions to assist with performing aggregate calculations.
#
avgavg(value)
calculates simple average of values ignoring missing data (e.g
null
values).
Arguments:
value
is any numeric value.
Return value:
Return value type is double
.
Examples:
avg |
---|
22.4 |
cash_or_card | avg |
---|---|
cash | 22.1 |
card | 27.4 |
null | 18.02 |
#
countcount()
or count(*)
- counts rows irrespective of underlying data.
Arguments:
count
does not require arguments.
Return value:
Return value type is long
.
Examples:
- Count of rows in the transactions table.
count |
---|
100 |
- Count of rows in the transactions table aggregated by
payment_type
value.
cash_or_card | count |
---|---|
cash | 25 |
card | 70 |
null | 5 |
note
null
values are aggregated with count()
.
#
count_distinctcount_distinct(column_name)
- counts distinct values in string
, symbol
,
long256
, long
, or int
columns.
Return value:
Return value type is long
.
Examples:
- Count of distinct sides in the transactions table. Side column can either be
BUY
orSELL
ornull
count_distinct |
---|
2 |
- Count of distinct counterparties in the transactions table aggregated by
payment_type
value.
cash_or_card | count_distinct |
---|---|
cash | 3 |
card | 23 |
null | 5 |
note
null
values are not counted in the count_distinct
function.
#
first/lastfirst(column_name)
- returns the first value of a column.last(column_name)
- returns the last value of a column.
Supported column datatype: double
, float
, integer
, character
, short
,
byte
, timestamp
, date
, long
, geohash
.
If a table has a designated timestamp,
then the first row is always the row with the lowest timestamp (oldest) and the
last row is always the one with the highest (latest) timestamp. For a table
without a designated timestamp column, first
returns the first row and last
returns the last inserted row, regardless of any timestamp column.
Return value:
Return value type is string
.
Examples:
Given a table sensors
, which has a designated timestamp column:
device_id | temperature | ts |
---|---|---|
arduino-01 | 12 | 2021-06-02T14:33:19.970258Z |
arduino-02 | 10 | 2021-06-02T14:33:21.703934Z |
arduino-03 | 18 | 2021-06-02T14:33:23.707013Z |
The following query returns oldest value for the device_id
column:
first |
---|
arduino-01 |
The following query returns the latest symbol value for the device_id
column:
last |
---|
arduino-03 |
Without selecting a designated timestamp column, the table may be unordered and
the query may return different result. Given an unordered table
sensors_unordered
:
device_id | temperature | ts |
---|---|---|
arduino-01 | 12 | 2021-06-02T14:33:19.970258Z |
arduino-03 | 18 | 2021-06-02T14:33:23.707013Z |
arduino-02 | 10 | 2021-06-02T14:33:21.703934Z |
The following query returns the first record for the device_id
column:
first |
---|
arduino-01 |
The following query returns the last record for the device_id
column:
last |
---|
arduino-02 |
#
haversine_dist_deghaversine_dist_deg(lat, lon, ts)
- calculates the traveled distance for a
series of latitude and longitude points.
Arguments:
lat
is the latitude expressed as degrees in decimal format (double
)lon
is the longitude expressed as degrees in decimal format (double
)ts
is thetimestamp
for the data point
Return value:
Return value type is double
.
Examples:
#
ksumksum(value)
- adds values ignoring missing data (e.g null
values). Values
are added using the
Kahan compensated sum algorithm.
This is only beneficial for floating-point values such as float
or double
.
Arguments:
value
is any numeric value.
Return value:
Return value type is the same as the type of the argument.
Examples:
ksum |
---|
52.79143968514029 |
#
maxmax(value)
- returns the highest value ignoring missing data (e.g null
values).
Arguments:
value
is any numeric value
Return value:
Return value type is the same as the type of the argument.
Examples:
max |
---|
55.3 |
cash_or_card | amount |
---|---|
cash | 31.5 |
card | 55.3 |
null | 29.2 |
#
minmin(value)
- returns the lowest value ignoring missing data (e.g null
values).
Arguments:
value
is any numeric value
Return value:
Return value type is the same as the type of the argument.
Examples:
min |
---|
12.5 |
cash_or_card | min |
---|---|
cash | 12.5 |
card | 15.3 |
null | 22.2 |
#
nsumnsum(value)
- adds values ignoring missing data (e.g null
values). Values
are added using the
Neumaier sum algorithm.
This is only beneficial for floating-point values such as float
or double
.
Arguments:
value
is any numeric value.
Return value:
Return value type is double
.
Examples:
nsum |
---|
49.5442334742831 |
#
stddev_sampstddev_samp(value)
- calculates the sample standard deviation of values
ignoring missing data (e.g null
values).
Arguments:
value
is any numeric value.
Return value:
Return value type is double
.
Examples:
stddev_samp |
---|
29.011491975882 |
#
sumsum(value)
- adds values ignoring missing data (e.g null
values).
Arguments:
value
is any numeric value.
Return value:
Return value type is the same as the type of the argument.
Examples:
sum |
---|
100 |
item | count |
---|---|
apple | 53 |
orange | 47 |
#
Overflowsum
does not perform overflow check. To avoid overflow, you can cast the
argument to wider type.