Add docs for aggs filtering (#116681)

Add documentation for aggs filtering (the WHERE in STATS command).

Fixes: #115083
Bogdan Pintea 2024-11-22 13:26:30 +01:00 committed by GitHub
parent d26a772f6e
commit 1fe3ed1e85
2 changed files with 75 additions and 10 deletions

@ -1,16 +1,18 @@
[discrete]
[[esql-stats-by]]
=== `STATS ... BY`
=== `STATS`
The `STATS ... BY` processing command groups rows according to a common value
The `STATS` processing command groups rows according to a common value
and calculates one or more aggregated values over the grouped rows.
**Syntax**
[source,esql]
----
STATS [column1 =] expression1[, ..., [columnN =] expressionN]
[BY grouping_expression1[, ..., grouping_expressionN]]
STATS [column1 =] expression1 [WHERE boolean_expression1][,
...,
[columnN =] expressionN [WHERE boolean_expressionN]]
[BY grouping_expression1[, ..., grouping_expressionN]]
----
*Parameters*
@ -28,14 +30,18 @@ An expression that computes an aggregated value.
An expression that outputs the values to group by.
If its name coincides with one of the computed columns, that column will be ignored.
`boolean_expressionX`::
The condition that must be met for a row to be included in the evaluation of `expressionX`.
NOTE: Individual `null` values are skipped when computing aggregations.
*Description*
The `STATS ... BY` processing command groups rows according to a common value
and calculate one or more aggregated values over the grouped rows. If `BY` is
omitted, the output table contains exactly one row with the aggregations applied
over the entire dataset.
The `STATS` processing command groups rows according to a common value
and calculates one or more aggregated values over the grouped rows. For the
calculation of each aggregated value, the rows in a group can be filtered with
`WHERE`. If `BY` is omitted, the output table contains exactly one row with
the aggregations applied over the entire dataset.
The following <<esql-agg-functions,aggregation functions>> are supported:
@ -90,6 +96,29 @@ include::{esql-specs}/stats.csv-spec[tag=statsCalcMultipleValues]
include::{esql-specs}/stats.csv-spec[tag=statsCalcMultipleValues-result]
|===
To filter the rows that go into an aggregation, use the `WHERE` clause:
[source.merge.styled,esql]
----
include::{esql-specs}/stats.csv-spec[tag=aggFiltering]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/stats.csv-spec[tag=aggFiltering-result]
|===
Aggregations with and without a filter can be mixed, and grouping is optional
as well:
[source.merge.styled,esql]
----
include::{esql-specs}/stats.csv-spec[tag=aggFilteringNoGroup]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/stats.csv-spec[tag=aggFilteringNoGroup-result]
|===
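As a rough illustration of the semantics described above, the following Python sketch models how a per-aggregation `WHERE` restricts the rows seen by a single aggregation while leaving the others untouched. It is not how ES|QL is implemented; the helper `per_agg_stats`, the sample rows, and the column names are all hypothetical and are not taken from the `employees` test dataset.

```python
from collections import defaultdict

# Hypothetical sample rows (assumption: not the real employees dataset).
rows = [
    {"gender": "F", "salary": 30000},
    {"gender": "F", "salary": 50000},
    {"gender": "M", "salary": 70000},
    {"gender": "M", "salary": None},
]


def per_agg_stats(rows, aggs, by=None):
    """Toy model of STATS with an optional filter per aggregation.

    aggs maps an output column name to (function, column, where), where
    `where` is a row predicate or None when the aggregation is unfiltered.
    """
    if by is None:
        # Without BY there is exactly one group covering the whole dataset.
        groups = {None: list(rows)}
    else:
        groups = defaultdict(list)
        for row in rows:
            groups[row[by]].append(row)

    result = {}
    for key, members in groups.items():
        out = {}
        for name, (func, column, where) in aggs.items():
            # The filter applies to this aggregation only.
            selected = [r for r in members if where is None or where(r)]
            # Individual null values are skipped when aggregating.
            values = [r[column] for r in selected if r[column] is not None]
            out[name] = func(values)
        result[key] = out
    return result


aggs = {
    # Roughly: low = COUNT(salary) WHERE salary < 40000
    "low": (len, "salary",
            lambda r: r["salary"] is not None and r["salary"] < 40000),
    # Roughly: total = COUNT(salary), with no filter
    "total": (len, "salary", None),
}

ungrouped = per_agg_stats(rows, aggs)
grouped = per_agg_stats(rows, aggs, by="gender")
```

In this model, `ungrouped` holds a single entry keyed by `None`, mirroring the one-row output when `BY` is omitted, and `grouped` holds one entry per `gender` value. The unfiltered `total` is computed over every row of the group regardless of the filter attached to `low`.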
[[esql-stats-mv-group]]
If the grouping key is multivalued then the input row is in all groups:
@ -109,7 +138,7 @@ It's also possible to group by multiple values:
include::{esql-specs}/stats.csv-spec[tag=statsGroupByMultipleValues]
----
If the all grouping keys are multivalued then the input row is in all groups:
If all the grouping keys are multivalued then the input row is in all groups:
[source.merge.styled,esql]
----
@ -121,7 +150,7 @@ include::{esql-specs}/stats.csv-spec[tag=multi-mv-group-result]
|===
Both the aggregating functions and the grouping expressions accept other
functions. This is useful for using `STATS...BY` on multivalue columns.
functions. This is useful for using `STATS` on multivalue columns.
For example, to calculate the average salary change, you can use `MV_AVG` to
first average the multiple values per employee, and use the result with the
`AVG` function:

@ -2348,6 +2348,42 @@ v:integer | job_positions:keyword
10094 | Accountant
;
docsStatsWithSimpleFiltering
required_capability: per_agg_filtering
// tag::aggFiltering[]
FROM employees
| STATS avg50s = AVG(salary)::LONG WHERE birth_date < "1960-01-01",
avg60s = AVG(salary)::LONG WHERE birth_date >= "1960-01-01"
BY gender
| SORT gender
// end::aggFiltering[]
| WHERE gender IS NOT NULL
;
// tag::aggFiltering-result[]
avg50s:long |avg60s:long |gender:keyword
55462 |46637 |F
48279 |44879 |M
// end::aggFiltering-result[]
;
docsStatsWithFilteringNoGroups
required_capability: per_agg_filtering
// tag::aggFilteringNoGroup[]
FROM employees
| EVAL Ks = salary / 1000 // thousands
| STATS under_40K = COUNT(*) WHERE Ks < 40,
inbetween = COUNT(*) WHERE 40 <= Ks AND Ks < 60,
over_60K = COUNT(*) WHERE 60 <= Ks,
total = COUNT(*)
// end::aggFilteringNoGroup[]
;
// tag::aggFilteringNoGroup-result[]
under_40K:long |inbetween:long |over_60K:long |total:long
36 |39 |25 |100
// end::aggFilteringNoGroup-result[]
;
statsWithFiltering
required_capability: per_agg_filtering