GH-77265: Document NaN handling in statistics functions that sort or count (#94676)

* Document NaN handling in functions that sort or count

* Update Doc/library/statistics.rst

Co-authored-by: Erlend Egeberg Aasland <erlend.aasland@protonmail.com>

* Update Doc/library/statistics.rst

Co-authored-by: Erlend Egeberg Aasland <erlend.aasland@protonmail.com>

* Fix trailing whitespace and rewrap text

Co-authored-by: Erlend Egeberg Aasland <erlend.aasland@protonmail.com>
Raymond Hettinger 2022-07-10 02:40:27 -05:00 committed by GitHub
parent 264b3ddfd5
commit ef61b259e3
GPG Key ID: 4AEE18F83AFDEB23
1 changed file with 29 additions and 0 deletions


@@ -35,6 +35,35 @@ and implementation-dependent. If your input data consists of mixed types,
you may be able to use :func:`map` to ensure a consistent result, for
example: ``map(float, input_data)``.
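
As an illustration of the sentence above, here is a small sketch built on a hypothetical ``input_data`` list that mixes ``Fraction``, ``Decimal``, and ``int`` values. In current CPython, ``mean()`` cannot coerce ``Fraction`` and ``Decimal`` to a common type and raises ``TypeError``. Converting every value with ``map(float, input_data)`` first gives a single, consistent float result.

```python
from decimal import Decimal
from fractions import Fraction
from statistics import mean

# Hypothetical mixed-type input: Fraction, Decimal and int values together.
input_data = [Fraction(1, 2), Decimal("0.75"), 2]

try:
    mean(input_data)
    mixed_error = None
except TypeError as exc:
    # CPython's statistics module cannot find a common type for
    # Fraction and Decimal, so the mixed call fails.
    mixed_error = exc

# Converting every value to float first gives one consistent type.
consistent = mean(map(float, input_data))
print(mixed_error)
print(consistent)
```

The float conversion gives up the exactness of ``Fraction`` and ``Decimal`` arithmetic in exchange for a consistent result type.
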

Some datasets use ``NaN`` (not a number) values to represent missing data.
Since a NaN compares unequal to every value, including itself, and every
ordering comparison involving a NaN (such as ``<``) returns ``False``, NaNs
cause surprising or undefined behavior in the statistics functions that sort
data or that count occurrences. The functions affected are ``median()``,
``median_low()``, ``median_high()``, ``median_grouped()``, ``mode()``,
``multimode()``, and ``quantiles()``. The ``NaN`` values should be stripped
before calling these functions::

    >>> from statistics import median
    >>> from math import isnan
    >>> from itertools import filterfalse

    >>> data = [20.7, float('NaN'), 19.2, 18.3, float('NaN'), 14.4]
    >>> sorted(data)  # This has surprising behavior
    [20.7, nan, 14.4, 18.3, 19.2, nan]
    >>> median(data)  # This result is unexpected
    16.35

    >>> sum(map(isnan, data))  # Number of missing values
    2
    >>> clean = list(filterfalse(isnan, data))  # Strip NaN values
    >>> clean
    [20.7, 19.2, 18.3, 14.4]
    >>> sorted(clean)  # Sorting now works as expected
    [14.4, 18.3, 19.2, 20.7]
    >>> median(clean)  # This result is now well defined
    18.75

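
The doctest above shows a single ordering of the data. The sketch below suggests why the behavior is called undefined, assuming CPython's implementation: its sort treats a list in which no ``<`` comparison is true as already ordered, and its ``multimode()`` counts values with ``collections.Counter``. Under those assumptions, sort-based functions depend on where the NaNs sit in the input, and counting functions depend on whether the NaNs are the same object.

```python
import math
from statistics import median, multimode

nan = float("nan")

# NaN is unordered: every ordering comparison with it is False,
# and it does not even compare equal to itself.
print(nan < 1.0, nan > 1.0, nan == nan)  # False False False

# Sort-based functions: the same four values in two different orders.
# Because no comparison involving NaN is True, sorted() treats each list
# as already in order, so a different pair ends up in the middle.
nan_last = median([1.0, 2.0, 3.0, nan])
nan_first = median([nan, 1.0, 2.0, 3.0])
print(nan_last, nan_first)  # 2.5 1.5

# Counting functions: separate NaN objects never compare equal,
# so each one is counted as a distinct value ...
distinct = multimode([float("nan"), float("nan"), 1.0])
print(distinct)  # [nan, nan, 1.0]

# ... but repeats of one NaN object are grouped, because dict lookups
# (which Counter relies on) check identity before equality.
same_object = multimode([nan, nan, 1.0])
print(same_object)  # [nan]

# Stripping NaNs first makes the result independent of input order.
clean = [x for x in [nan, 1.0, 2.0, 3.0] if not math.isnan(x)]
print(median(clean))  # 2.0
```

The list comprehension with ``math.isnan`` does the same job as the ``filterfalse`` approach in the doctest; which one to use is a matter of taste.
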
Averages and measures of central location
-----------------------------------------