Pelage: Defensive analysis for Polars

has_mandatory_values

checks.has_mandatory_values(data, items, group_by=None)

Ensure that all specified values are present in their respective columns.

Parameters

data: PolarsLazyOrDataFrame

Polars DataFrame or LazyFrame containing data to check.

items: Dict[str, list]

A dictionary where the keys are column names and the values are lists containing all the required values for the given column.

group_by: Optional[PolarsOverClauseInput] = None

When specified, perform the check per group instead of on the whole column. Defaults to None.
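
To make the contract of `items` concrete, here is a minimal plain-Python sketch of the whole-column check. It is not pelage's actual implementation: the helper name `find_missing_values` and the dict of lists standing in for a DataFrame are assumptions made purely for illustration.

```python
from typing import Any, Dict, List


def find_missing_values(
    columns: Dict[str, List[Any]], items: Dict[str, List[Any]]
) -> Dict[str, List[Any]]:
    """Return, per column, the required values that never appear in it.

    Hypothetical helper: `columns` is a plain dict of lists standing in
    for a DataFrame; it is not part of pelage.
    """
    missing = {}
    for name, required in items.items():
        present = set(columns[name])
        # Keep the required values that are absent from the column.
        absent = [value for value in required if value not in present]
        if absent:
            missing[name] = absent
    return missing


data = {"a": [1, 2]}
print(find_missing_values(data, {"a": [1, 2]}))  # {}
print(find_missing_values(data, {"a": [3, 4]}))  # {'a': [3, 4]}
```

An empty result corresponds to a passing check; a non-empty one mirrors the kind of per-column detail reported in the error message shown in the examples below.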

Returns

PolarsLazyOrDataFrame

The original Polars DataFrame or LazyFrame when the check passes.

Examples

>>> import polars as pl
>>> import pelage as plg
>>> df = pl.DataFrame({"a": [1, 2]})
>>> df.pipe(plg.has_mandatory_values, {"a": [1, 2]})
shape: (2, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 1   │
│ 2   │
└─────┘
>>> df.pipe(plg.has_mandatory_values, {"a": [3, 4]})
Traceback (most recent call last):
...
pelage.checks.PolarsAssertError: Details
Error with the DataFrame passed to the check function:
--> Missing mandatory values in the following columns: {'a': [3, 4]}

The following example shows how to perform this check per group:

>>> group_df_example = pl.DataFrame(
...     {
...         "a": [1, 1, 1, 2],
...         "group": ["G1", "G1", "G2", "G2"],
...     }
... )
>>> group_df_example.pipe(plg.has_mandatory_values, {"a": [1, 2]})
shape: (4, 2)
┌─────┬───────┐
│ a   ┆ group │
│ --- ┆ ---   │
│ i64 ┆ str   │
╞═════╪═══════╡
│ 1   ┆ G1    │
│ 1   ┆ G1    │
│ 1   ┆ G2    │
│ 2   ┆ G2    │
└─────┴───────┘
>>> group_df_example.pipe(plg.has_mandatory_values, {"a": [1, 2]}, group_by="group")
Traceback (most recent call last):
...
pelage.checks.PolarsAssertError: Details
shape: (1, 3)
┌───────┬───────────┬────────────────┐
│ group ┆ a         ┆ a_expected_set │
│ ---   ┆ ---       ┆ ---            │
│ str   ┆ list[i64] ┆ list[i64]      │
╞═══════╪═══════════╪════════════════╡
│ G1    ┆ [1]       ┆ [1, 2]         │
└───────┴───────────┴────────────────┘
Error with the DataFrame passed to the check function:
--> Some groups are missing mandatory values
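
The grouped behaviour above can be reasoned about with a small plain-Python sketch. This is only an illustration of the semantics, assuming that every group must contain every required value; the helper name `groups_missing_values` is hypothetical and is not pelage's code.

```python
from collections import defaultdict
from typing import Any, Dict, List


def groups_missing_values(
    values: List[Any], groups: List[Any], required: List[Any]
) -> Dict[Any, List[Any]]:
    """Map each failing group to the values it actually contains.

    Hypothetical helper illustrating the per-group check.
    """
    seen = defaultdict(set)
    for value, group in zip(values, groups):
        seen[group].add(value)
    # A group fails when the required values are not a subset of its values.
    return {
        group: sorted(present)
        for group, present in seen.items()
        if not set(required) <= present
    }


failing = groups_missing_values(
    values=[1, 1, 1, 2],
    groups=["G1", "G1", "G2", "G2"],
    required=[1, 2],
)
print(failing)  # {'G1': [1]}
```

As in the error details above, only `G1` is reported: it contains `[1]` while `[1, 2]` is expected, whereas `G2` holds both required values.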