Document special behaviour of ignore_malformed for geo_point mappings (#125692)
With `geo_point` fields, there is the special case of values that have a syntactically valid format, but the numerical values for `latitude` and `longitude` are out of range. If `ignore_malformed` is `false`, an exception will be thrown as usual. But if it is `true`, the document will be indexed correctly, by normalizing the latitude and longitude values into the valid range. The special `_ignored` field will not be set. The original source document will remain as before, but indexed values, doc-values and stored fields will all be normalized.
This commit is contained in:
parent
24909ca9df
commit
1f6518f371
|
@ -9,14 +9,23 @@ mapped_pages:
|
|||
|
||||
Fields of type `geo_point` accept latitude-longitude pairs, which can be used:
|
||||
|
||||
* to find geopoints within a [bounding box](/reference/query-languages/query-dsl/query-dsl-geo-bounding-box-query.md), within a certain [distance](/reference/query-languages/query-dsl/query-dsl-geo-distance-query.md) of a central point, or within a [`geo_shape` query](/reference/query-languages/query-dsl/query-dsl-geo-shape-query.md) (for example, points in a polygon).
|
||||
* to find geopoints within a [bounding box](/reference/query-languages/query-dsl/query-dsl-geo-bounding-box-query.md),
|
||||
within a certain [distance](/reference/query-languages/query-dsl/query-dsl-geo-distance-query.md) of a central point,
|
||||
or within a [`geo_shape` query](/reference/query-languages/query-dsl/query-dsl-geo-shape-query.md) (for example, points in a polygon).
|
||||
* to aggregate documents by [distance](/reference/aggregations/search-aggregations-bucket-geodistance-aggregation.md) from a central point.
|
||||
* to aggregate documents by geographic grids: either [`geo_hash`](/reference/aggregations/search-aggregations-bucket-geohashgrid-aggregation.md), [`geo_tile`](/reference/aggregations/search-aggregations-bucket-geotilegrid-aggregation.md) or [`geo_hex`](/reference/aggregations/search-aggregations-bucket-geohexgrid-aggregation.md).
|
||||
* to aggregate geopoints into a track using the metrics aggregation [`geo_line`](/reference/aggregations/search-aggregations-metrics-geo-line.md).
|
||||
* to aggregate documents by geographic grids: either
|
||||
[`geo_hash`](/reference/aggregations/search-aggregations-bucket-geohashgrid-aggregation.md),
|
||||
[`geo_tile`](/reference/aggregations/search-aggregations-bucket-geotilegrid-aggregation.md) or
|
||||
[`geo_hex`](/reference/aggregations/search-aggregations-bucket-geohexgrid-aggregation.md).
|
||||
* to aggregate geopoints into a track using the metrics aggregation
|
||||
[`geo_line`](/reference/aggregations/search-aggregations-metrics-geo-line.md).
|
||||
* to integrate distance into a document’s [relevance score](/reference/query-languages/query-dsl/query-dsl-function-score-query.md).
|
||||
* to [sort](/reference/elasticsearch/rest-apis/sort-search-results.md#geo-sorting) documents by distance.
|
||||
|
||||
As with [geo_shape](/reference/elasticsearch/mapping-reference/geo-shape.md) and [point](/reference/elasticsearch/mapping-reference/point.md), `geo_point` can be specified in [GeoJSON](http://geojson.org) and [Well-Known Text](https://docs.opengeospatial.org/is/12-063r5/12-063r5.html) formats. However, there are a number of additional formats that are supported for convenience and historical reasons. In total there are six ways that a geopoint may be specified, as demonstrated below:
|
||||
As with [geo_shape](/reference/elasticsearch/mapping-reference/geo-shape.md) and [point](/reference/elasticsearch/mapping-reference/point.md), `geo_point` can be specified in [GeoJSON](http://geojson.org)
|
||||
and [Well-Known Text](https://docs.opengeospatial.org/is/12-063r5/12-063r5.html) formats.
|
||||
However, there are a number of additional formats that are supported for convenience and historical reasons.
|
||||
In total there are six ways that a geopoint may be specified, as demonstrated below:
|
||||
|
||||
```console
|
||||
PUT my-index-000001
|
||||
|
@ -103,15 +112,28 @@ GET my-index-000001/_search
|
|||
::::{admonition} Geopoints expressed as an array or string
|
||||
:class: important
|
||||
|
||||
Please note that string geopoints are ordered as `lat,lon`, while array geopoints, GeoJSON and WKT are ordered as the reverse: `lon,lat`.
|
||||
Please note that string geopoints are ordered as `lat,lon`, while array
|
||||
geopoints, GeoJSON and WKT are ordered as the reverse: `lon,lat`.
|
||||
|
||||
The reasons for this are historical. Geographers traditionally write `latitude` before `longitude`, while recent formats specified for geographic data like [GeoJSON](https://geojson.org/) and [Well-Known Text](https://docs.opengeospatial.org/is/12-063r5/12-063r5.html) order `longitude` before `latitude` (easting before northing) in order to match the mathematical convention of ordering `x` before `y`.
|
||||
The reasons for this are historical. Geographers traditionally write `latitude`
|
||||
before `longitude`, while recent formats specified for geographic data like
|
||||
[GeoJSON](https://geojson.org/) and [Well-Known Text](https://docs.opengeospatial.org/is/12-063r5/12-063r5.html)
|
||||
order `longitude` before `latitude` (easting before northing) in order to match
|
||||
the mathematical convention of ordering `x` before `y`.
|
||||
|
||||
::::
|
||||
|
||||
|
||||
::::{note}
|
||||
A point can be expressed as a [geohash](https://en.wikipedia.org/wiki/Geohash). Geohashes are [base32](https://en.wikipedia.org/wiki/Base32) encoded strings of the bits of the latitude and longitude interleaved. Each character in a geohash adds additional 5 bits to the precision. So the longer the hash, the more precise it is. For the indexing purposed geohashs are translated into latitude-longitude pairs. During this process only first 12 characters are used, so specifying more than 12 characters in a geohash doesn’t increase the precision. The 12 characters provide 60 bits, which should reduce a possible error to less than 2cm.
|
||||
A point can be expressed as a [geohash](https://en.wikipedia.org/wiki/Geohash).
|
||||
Geohashes are [base32](https://en.wikipedia.org/wiki/Base32) encoded strings of
|
||||
the bits of the latitude and longitude interleaved. Each character in a geohash
|
||||
adds 5 more bits to the precision. So the longer the hash, the more
|
||||
precise it is. For indexing purposes, geohashes are translated into
|
||||
latitude-longitude pairs. During this process only the first 12 characters are
|
||||
used, so specifying more than 12 characters in a geohash doesn’t increase the
|
||||
precision. The 12 characters provide 60 bits, which should reduce a possible
|
||||
error to less than 2cm.
|
||||
::::
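
The precision figure in the note above can be checked with a short worked calculation. This is only a sketch under two assumptions that are not stated in the note: that the 60 bits are split evenly between latitude and longitude (30 each), and that one degree corresponds to roughly 111.32 km on the ground (for longitude, at the equator).

```latex
\begin{align*}
\text{total bits} &= 12 \times 5 = 60 \quad (30 \text{ for latitude},\ 30 \text{ for longitude}) \\
\Delta_{\text{lat}} &= \frac{180}{2^{30}}\ \text{degrees} \approx 1.68 \times 10^{-7}\ \text{degrees} \approx 1.9\ \text{cm} \\
\Delta_{\text{lon}} &= \frac{360}{2^{30}}\ \text{degrees} \approx 3.35 \times 10^{-7}\ \text{degrees} \approx 3.7\ \text{cm at the equator}
\end{align*}
```

Here $\Delta$ is the full size of a geohash cell along each axis. The distance from the centre of the cell to its edge is half of that, about 0.9 cm in latitude and about 1.9 cm in longitude, which is consistent with the "less than 2cm" figure.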
|
||||
|
||||
|
||||
|
@ -120,27 +142,54 @@ A point can be expressed as a [geohash](https://en.wikipedia.org/wiki/Geohash).
|
|||
The following parameters are accepted by `geo_point` fields:
|
||||
|
||||
[`ignore_malformed`](/reference/elasticsearch/mapping-reference/ignore-malformed.md)
|
||||
: If `true`, malformed geopoints are ignored. If `false` (default), malformed geopoints throw an exception and reject the whole document. A geopoint is considered malformed if its latitude is outside the range -90 ⇐ latitude ⇐ 90, or if its longitude is outside the range -180 ⇐ longitude ⇐ 180. Note that this cannot be set if the `script` parameter is used.
|
||||
: If `true`, malformed geopoints are ignored.
|
||||
If `false` (default), malformed geopoints throw an exception and reject the whole document.
|
||||
A geopoint is considered malformed if its latitude is outside the range -90 ≤ latitude ≤ 90,
|
||||
or if its longitude is outside the range -180 ≤ longitude ≤ 180.
|
||||
When set to `true`, if the format is valid, but the values are out of range,
|
||||
the values will be normalized into the valid range, and the document will be indexed.
|
||||
This is a special case that [behaves differently](/reference/elasticsearch/mapping-reference/ignore-malformed.md#_ignore_malformed_geo_point) from the usual `ignore_malformed` handling.
|
||||
Note that this cannot be set if the `script` parameter is used.
|
||||
|
||||
`ignore_z_value`
|
||||
: If `true` (default) three dimension points will be accepted (stored in source) but only latitude and longitude values will be indexed; the third dimension is ignored. If `false`, geopoints containing any more than latitude and longitude (two dimensions) values throw an exception and reject the whole document. Note that this cannot be set if the `script` parameter is used.
|
||||
: If `true` (default), three-dimensional points will be accepted (stored in source)
|
||||
but only latitude and longitude values will be indexed; the third dimension is
|
||||
ignored. If `false`, geopoints containing any more than latitude and longitude
|
||||
(two dimensions) values throw an exception and reject the whole document. Note
|
||||
that this cannot be set if the `script` parameter is used.
|
||||
|
||||
[`index`](/reference/elasticsearch/mapping-reference/mapping-index.md)
|
||||
: Should the field be quickly searchable? Accepts `true` (default) and `false`. Fields that only have [`doc_values`](/reference/elasticsearch/mapping-reference/doc-values.md) enabled can still be queried, albeit slower.
|
||||
: Should the field be quickly searchable? Accepts `true` (default) and
|
||||
`false`. Fields that only have [`doc_values`](/reference/elasticsearch/mapping-reference/doc-values.md)
|
||||
enabled can still be queried, albeit slower.
|
||||
|
||||
[`null_value`](/reference/elasticsearch/mapping-reference/null-value.md)
|
||||
: Accepts an geopoint value which is substituted for any explicit `null` values. Defaults to `null`, which means the field is treated as missing. Note that this cannot be set if the `script` parameter is used.
|
||||
: Accepts a geopoint value which is substituted for any explicit `null` values.
|
||||
Defaults to `null`, which means the field is treated as missing. Note that this
|
||||
cannot be set if the `script` parameter is used.
|
||||
|
||||
`on_script_error`
|
||||
: Defines what to do if the script defined by the `script` parameter throws an error at indexing time. Accepts `fail` (default), which will cause the entire document to be rejected, and `continue`, which will register the field in the document’s [`_ignored`](/reference/elasticsearch/mapping-reference/mapping-ignored-field.md) metadata field and continue indexing. This parameter can only be set if the `script` field is also set.
|
||||
: Defines what to do if the script defined by the `script` parameter
|
||||
throws an error at indexing time. Accepts `fail` (default), which
|
||||
will cause the entire document to be rejected, and `continue`, which
|
||||
will register the field in the document’s [`_ignored`](/reference/elasticsearch/mapping-reference/mapping-ignored-field.md) metadata field and continue
|
||||
indexing. This parameter can only be set if the `script` field is
|
||||
also set.
|
||||
|
||||
`script`
|
||||
: If this parameter is set, then the field will index values generated by this script, rather than reading the values directly from the source. If a value is set for this field on the input document, then the document will be rejected with an error. Scripts are in the same format as their [runtime equivalent](docs-content://manage-data/data-store/mapping/map-runtime-field.md), and should emit points as a pair of (lat, lon) double values.
|
||||
: If this parameter is set, then the field will index values generated
|
||||
by this script, rather than reading the values directly from the
|
||||
source. If a value is set for this field on the input document, then
|
||||
the document will be rejected with an error.
|
||||
Scripts are in the same format as their [runtime equivalent](docs-content://manage-data/data-store/mapping/map-runtime-field.md), and should emit points
|
||||
as a pair of (lat, lon) double values.
|
||||
|
||||
|
||||
## Using geopoints in scripts [_using_geopoints_in_scripts]
|
||||
|
||||
When accessing the value of a geopoint in a script, the value is returned as a `GeoPoint` object, which allows access to the `.lat` and `.lon` values respectively:
|
||||
When accessing the value of a geopoint in a script, the value is returned as
|
||||
a `GeoPoint` object, which allows access to the `.lat` and `.lon` values
|
||||
respectively:
|
||||
|
||||
```painless
|
||||
def geopoint = doc['location'].value;
|
||||
|
@ -159,11 +208,17 @@ def lon = doc['location'].lon;
|
|||
## Synthetic source [geo-point-synthetic-source]
|
||||
|
||||
::::{important}
|
||||
Synthetic `_source` is Generally Available only for TSDB indices (indices that have `index.mode` set to `time_series`). For other indices synthetic `_source` is in technical preview. Features in technical preview may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.
|
||||
Synthetic `_source` is Generally Available only for TSDB indices
|
||||
(indices that have `index.mode` set to `time_series`). For other indices
|
||||
synthetic `_source` is in technical preview. Features in technical preview may
|
||||
be changed or removed in a future release. Elastic will work to fix
|
||||
any issues, but features in technical preview are not subject to the support SLA
|
||||
of official GA features.
|
||||
::::
|
||||
|
||||
|
||||
Synthetic source may sort `geo_point` fields (first by latitude and then longitude) and reduces them to their stored precision. For example:
|
||||
Synthetic source may sort `geo_point` fields (first by latitude and then
|
||||
longitude) and reduces them to their stored precision. For example:
|
||||
|
||||
$$$synthetic-source-geo-point-example$$$
|
||||
|
||||
|
|
|
@ -59,7 +59,7 @@ The `ignore_malformed` setting is currently supported by the following [mapping
|
|||
: `date_nanos`
|
||||
|
||||
[Geopoint](/reference/elasticsearch/mapping-reference/geo-point.md)
|
||||
: `geo_point` for lat/lon points
|
||||
: `geo_point` for lat/lon points, although there is a [special case](#_ignore_malformed_geo_point) for out-of-range values
|
||||
|
||||
[Geoshape](/reference/elasticsearch/mapping-reference/geo-shape.md)
|
||||
: `geo_shape` for complex shapes like polygons
|
||||
|
@ -103,8 +103,21 @@ PUT my-index-000001
|
|||
|
||||
## Dealing with malformed fields [_dealing_with_malformed_fields]
|
||||
|
||||
Malformed fields are silently ignored at indexing time when `ignore_malformed` is turned on. Whenever possible it is recommended to keep the number of documents that have a malformed field contained, or queries on this field will become meaningless. Elasticsearch makes it easy to check how many documents have malformed fields by using `exists`,`term` or `terms` queries on the special [`_ignored`](/reference/elasticsearch/mapping-reference/mapping-ignored-field.md) field.
|
||||
Malformed fields are silently ignored at indexing time when `ignore_malformed` is turned on.
|
||||
Whenever possible, it is recommended to keep the number of documents that have a malformed field small,
|
||||
or queries on this field will become meaningless.
|
||||
Elasticsearch makes it easy to check how many documents have malformed fields by using `exists`,
|
||||
`term` or `terms` queries on the special [`_ignored`](/reference/elasticsearch/mapping-reference/mapping-ignored-field.md) field.
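
As a rough illustration of the check described above, the following requests sketch an `exists` query and a `term` query on `_ignored`. The index name `my-index-000001` is borrowed from the other examples, and the field name `my_field` is a hypothetical placeholder, not something defined in this page.

```console
# Count-style check: all documents in which at least one field was ignored
GET my-index-000001/_search
{
  "query": {
    "exists": {
      "field": "_ignored"
    }
  }
}

# Only documents in which the (hypothetical) field `my_field` was ignored
GET my-index-000001/_search
{
  "query": {
    "term": {
      "_ignored": "my_field"
    }
  }
}
```

The `term` form is useful when several fields in the mapping have `ignore_malformed` enabled and only one of them is of interest.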
|
||||
|
||||
## The special case of `geo_point` fields [_ignore_malformed_geo_point]
|
||||
|
||||
With [`geo_point`](/reference/elasticsearch/mapping-reference/geo-point.md) fields,
|
||||
there is the special case of values that have a syntactically valid format,
|
||||
but the numerical values for `latitude` and `longitude` are out of range.
|
||||
If `ignore_malformed` is `false`, an exception will be thrown as usual. But if it is `true`,
|
||||
the document will be indexed correctly, by normalizing the latitude and longitude values into the valid range.
|
||||
The special [`_ignored`](/reference/elasticsearch/mapping-reference/mapping-ignored-field.md) field will not be set.
|
||||
The original source document will remain as before, but indexed values, doc-values and stored fields will all be normalized.
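
A minimal sketch of this special case, assuming a hypothetical `location` field in an index named `my-index-000001`; the out-of-range coordinates are chosen only for illustration, and the exact normalized values returned are not asserted here.

```console
# Mapping with ignore_malformed enabled on a geo_point field
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "location": {
        "type": "geo_point",
        "ignore_malformed": true
      }
    }
  }
}

# Syntactically valid, but latitude and longitude are both out of range
PUT my-index-000001/_doc/1?refresh
{
  "location": { "lat": 95.0, "lon": 190.0 }
}

# Compare the unchanged _source with the doc-values of the same field
GET my-index-000001/_search
{
  "docvalue_fields": ["location"]
}
```

Based on the description above, the hit's `_source` should still contain the original out-of-range values, while the value returned under `fields` should fall within the valid latitude and longitude ranges. Since `_ignored` is not set in this case, an `exists` query on `_ignored` should not match this document.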
|
||||
|
||||
## Limits for JSON Objects [json-object-limits]
|
||||
|
||||
|
|