Pelage: Defensive analysis for Polars
column_is_within_n_std

checks.column_is_within_n_std(data, items, *args)

Asserts that all values in the given columns lie within a given number of standard deviations (STD) of the column mean, thus ensuring the absence of outliers.
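As a rough illustration of the rule behind this check, the sketch below flags values outside mean ± n_std × std using only the Python standard library. It is not the library's implementation: the helper name `values_outside_n_std` is hypothetical, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def values_outside_n_std(values, n_std):
    """Return the values lying outside mean ± n_std * std (hypothetical helper)."""
    center = mean(values)
    # Sample standard deviation (ddof=1) is an assumption; the library may differ.
    spread = stdev(values)
    lower = center - n_std * spread
    upper = center + n_std * spread
    return [v for v in values if not (lower <= v <= upper)]


# Same data as columns "a" and "c" in the Examples section.
a = list(range(0, 11))
c = list(range(0, 10)) + [5000]

outliers_a = values_outside_n_std(a, 2)
outliers_c = values_outside_n_std(c, 2)
print(outliers_a)  # []
print(outliers_c)  # [5000]
```

With this reading, column "a" passes a 2-STD check, while the single value 5000 in column "c" falls outside the range, which matches the behaviour shown in the Examples section.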

Parameters

data: PolarsLazyOrDataFrame

Polars DataFrame or LazyFrame containing data to check.

items: Tuple[PolarsColumnType, int]

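A column name (or other PolarsColumnType value) paired with the number of standard deviations allowed for its values. Must be of the form (col_name, n_std). Further pairs of the same form can be passed as additional positional arguments (*args), as in the examples.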

Returns

PolarsLazyOrDataFrame

The original Polars DataFrame or LazyFrame when the check passes.

Examples

>>> import polars as pl
>>> import pelage as plg
>>> df = pl.DataFrame(
...     {
...         "a": list(range(0, 11)),
...         "b": list(range(0, 11)),
...         "c": list(range(0, 10)) + [5000],
...     }
... )
>>> df.pipe(plg.column_is_within_n_std, ("a", 2), ("b", 3))
shape: (11, 3)
┌─────┬─────┬──────┐
│ a   ┆ b   ┆ c    │
│ --- ┆ --- ┆ ---  │
│ i64 ┆ i64 ┆ i64  │
╞═════╪═════╪══════╡
│ 0   ┆ 0   ┆ 0    │
│ 1   ┆ 1   ┆ 1    │
│ 2   ┆ 2   ┆ 2    │
│ 3   ┆ 3   ┆ 3    │
│ 4   ┆ 4   ┆ 4    │
│ …   ┆ …   ┆ …    │
│ 6   ┆ 6   ┆ 6    │
│ 7   ┆ 7   ┆ 7    │
│ 8   ┆ 8   ┆ 8    │
│ 9   ┆ 9   ┆ 9    │
│ 10  ┆ 10  ┆ 5000 │
└─────┴─────┴──────┘
>>> df.pipe(plg.column_is_within_n_std, ("b", 2), ("c", 2))
Traceback (most recent call last):
...
pelage.checks.PolarsAssertError: Details
shape: (1, 1)
┌──────┐
│ c    │
│ ---  │
│ i64  │
╞══════╡
│ 5000 │
└──────┘
Error with the DataFrame passed to the check function:
--> There are some outliers outside the specified mean±std range
Impacted columns: ['c']