Pelage: Defensive analysis for Polars
unique_combination_of_columns

checks.unique_combination_of_columns(data, columns=None)

Ensure that the selected columns have a unique combination per row.

This check is particularly helpful for establishing the granularity of a dataframe: it is row-oriented and asserts that each combination of values in the selected columns appears in at most one row.

Parameters

data: PolarsLazyOrDataFrame

The Polars DataFrame or LazyFrame to check.

columns: Optional[PolarsColumnType] = None

Columns to consider for row uniqueness. By default, all columns are checked.

Returns

Type                   Description
PolarsLazyOrDataFrame  The original Polars DataFrame or LazyFrame when the check passes

Examples

>>> import polars as pl
>>> import pelage as plg
>>> df = pl.DataFrame({"a": ["a", "a"], "b": [1, 2]})
>>> df.pipe(plg.unique_combination_of_columns, ["a", "b"])
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ a   ┆ 1   │
│ a   ┆ 2   │
└─────┴─────┘
>>> bad = pl.DataFrame({"a": ["X", "X"]})
>>> bad.pipe(plg.unique_combination_of_columns, "a")
Traceback (most recent call last):
...
pelage.checks.PolarsAssertError: Details
shape: (1, 2)
┌─────┬─────┐
│ a   ┆ len │
│ --- ┆ --- │
│ str ┆ u32 │
╞═════╪═════╡
│ X   ┆ 2   │
└─────┴─────┘
Error with the DataFrame passed to the check function:
--> Some combinations of columns are not unique. See above, selected: col("a")