Pelage: Defensive analysis for Polars

not_null_proportion

checks.not_null_proportion(data, items, group_by=None)

Checks that the proportion of non-null values in a column is within a specified range [at_least, at_most], where at_most is optional and defaults to 1.0.
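To make the rule concrete, here is a minimal pure-Python sketch of the condition being checked. It is not pelage's implementation: the helper names `not_null_fraction` and `within_range` are hypothetical, and the bounds are assumed to be inclusive, as the bracket notation above suggests.

```python
from typing import Optional, Sequence


def not_null_fraction(values: Sequence[Optional[object]]) -> float:
    """Fraction of entries that are not None (hypothetical helper, not part of pelage)."""
    if not values:
        raise ValueError("cannot compute a proportion on an empty column")
    return sum(v is not None for v in values) / len(values)


def within_range(
    values: Sequence[Optional[object]], at_least: float, at_most: float = 1.0
) -> bool:
    """True when at_least <= non-null fraction <= at_most (inclusive bounds assumed)."""
    return at_least <= not_null_fraction(values) <= at_most


column_a = [1, None, None]  # one non-null value out of three
passes_low_bound = within_range(column_a, 0.33)
passes_high_bound = within_range(column_a, 0.7)
print(passes_low_bound, passes_high_bound)
```

With one non-null value out of three, the fraction is roughly 0.333, so a lower bound of 0.33 is satisfied while a lower bound of 0.7 is not.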

Parameters

data: PolarsLazyOrDataFrame

The Polars DataFrame or LazyFrame to check.

items: Dict[str, float | Tuple[float, float]]

Allowed ranges for the proportion of non-null values, keyed by column name.

Any of the following formats is valid:

{
    "column_name_a" : 0.33,
    "column_name_b" : (0.25, 0.44),
    "column_name_c" : (0.25, 1.0),
    ...
}

When a single float is given, the upper bound of the range is automatically set to 1.0, i.e. (given_float, 1.0).
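The two accepted formats can be reduced to a single (lower, upper) form. The sketch below illustrates that normalization in plain Python. `normalize_items` is a hypothetical helper written for this illustration, not a function exposed by pelage.

```python
from typing import Dict, Tuple, Union

Bound = Union[float, Tuple[float, float]]


def normalize_items(items: Dict[str, Bound]) -> Dict[str, Tuple[float, float]]:
    """Expand single floats to (given_float, 1.0); keep tuples as they are."""
    normalized = {}
    for column, bound in items.items():
        if isinstance(bound, tuple):
            low, high = bound
        else:
            # A bare float only sets the lower bound; the upper bound is 1.0.
            low, high = bound, 1.0
        normalized[column] = (float(low), float(high))
    return normalized


ranges = normalize_items(
    {
        "column_name_a": 0.33,
        "column_name_b": (0.25, 0.44),
    }
)
print(ranges)
```

After normalization every column maps to an explicit pair, so the range test can be written once regardless of which format the caller used.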

group_by: Optional[PolarsOverClauseInput] = None

When specified, the check is performed per group instead of on the whole column. Defaults to None.
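The effect of grouping can be illustrated without Polars. The following sketch computes the non-null fraction per group in plain Python, using the same values as the grouped example in the Examples section. The helper `not_null_fraction_per_group` is hypothetical and only mirrors the idea, not pelage's internals.

```python
from collections import defaultdict
from typing import Dict, Hashable, Optional, Sequence


def not_null_fraction_per_group(
    values: Sequence[Optional[object]], groups: Sequence[Hashable]
) -> Dict[Hashable, float]:
    """Non-null fraction of `values` within each group (illustrative helper)."""
    counts = defaultdict(lambda: [0, 0])  # group -> [non_null_count, total_count]
    for value, group in zip(values, groups):
        counts[group][0] += int(value is not None)
        counts[group][1] += 1
    return {group: non_null / total for group, (non_null, total) in counts.items()}


values = [1, 1, None, None]
groups = ["A", "A", "B", "B"]

fractions = not_null_fraction_per_group(values, groups)
# Groups whose fraction falls outside the range [0.5, 1.0].
failing = {group: frac for group, frac in fractions.items() if not 0.5 <= frac <= 1.0}
print(fractions, failing)
```

Taken as a whole, the column is half non-null and would satisfy a lower bound of 0.5. Split by group, group "B" contains only nulls and falls below that bound, which is why a grouped check can fail where the ungrouped one passes.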

Returns

Type: PolarsLazyOrDataFrame

Description: The original Polars DataFrame or LazyFrame when the check passes

Examples

>>> import polars as pl
>>> import pelage as plg
>>> df = pl.DataFrame(
...     {
...         "a": [1, None, None],
...         "b": [1, 2, None],
...     }
... )
>>> df.pipe(plg.not_null_proportion, {"a": 0.33, "b": 0.66})
shape: (3, 2)
┌──────┬──────┐
│ a    ┆ b    │
│ ---  ┆ ---  │
│ i64  ┆ i64  │
╞══════╪══════╡
│ 1    ┆ 1    │
│ null ┆ 2    │
│ null ┆ null │
└──────┴──────┘
>>> df.pipe(plg.not_null_proportion, {"a": 0.7})
Traceback (most recent call last):
...
pelage.checks.PolarsAssertError: Details
shape: (1, 4)
┌────────┬───────────────────┬──────────┬──────────┐
│ column ┆ not_null_fraction ┆ min_prop ┆ max_prop │
│ ---    ┆ ---               ┆ ---      ┆ ---      │
│ str    ┆ f64               ┆ f64      ┆ i64      │
╞════════╪═══════════════════╪══════════╪══════════╡
│ a      ┆ 0.333333          ┆ 0.7      ┆ 1        │
└────────┴───────────────────┴──────────┴──────────┘
Error with the DataFrame passed to the check function:
--> Some columns contains a proportion of nulls beyond specified limits

The following example shows how to perform this check per group:

>>> group_df = pl.DataFrame(
...     {
...         "a": [1, 1, None, None],
...         "group": ["A", "A", "B", "B"],
...     }
... )
>>> group_df.pipe(plg.not_null_proportion, {"a": 0.5})
shape: (4, 2)
┌──────┬───────┐
│ a    ┆ group │
│ ---  ┆ ---   │
│ i64  ┆ str   │
╞══════╪═══════╡
│ 1    ┆ A     │
│ 1    ┆ A     │
│ null ┆ B     │
│ null ┆ B     │
└──────┴───────┘
>>> group_df.pipe(plg.not_null_proportion, {"a": 0.5}, group_by="group")
Traceback (most recent call last):
...
pelage.checks.PolarsAssertError: Details
shape: (1, 5)
┌───────┬────────┬───────────────────┬──────────┬──────────┐
│ group ┆ column ┆ not_null_fraction ┆ min_prop ┆ max_prop │
│ ---   ┆ ---    ┆ ---               ┆ ---      ┆ ---      │
│ str   ┆ str    ┆ f64               ┆ f64      ┆ i64      │
╞═══════╪════════╪═══════════════════╪══════════╪══════════╡
│ B     ┆ a      ┆ 0.0               ┆ 0.5      ┆ 1        │
└───────┴────────┴───────────────────┴──────────┴──────────┘
Error with the DataFrame passed to the check function:
--> Some columns contains a proportion of nulls beyond specified limits