Pelage: Defensive analysis for Polars

unique

unique(data, columns=None, group_by=None)

Check that none of the selected columns contains duplicated values.

This is a column-oriented check: each column is searched independently for duplicated values. For a row-oriented check, see unique_combination_of_columns.

Parameters

data: PolarsLazyOrDataFrame

The input DataFrame or LazyFrame to check for unique values.

columns: Optional[PolarsColumnType] = None

Columns to consider for uniqueness check. By default, all columns are checked.

group_by: Optional[PolarsOverClauseInput] = None

Use this option to check uniqueness within each group rather than across the whole DataFrame. Defaults to None.

Returns

PolarsLazyOrDataFrame

The original Polars DataFrame or LazyFrame when the check passes.

Examples

>>> import polars as pl
>>> import pelage as plg
>>> df = pl.DataFrame({"a": [1, 2]})
>>> df.pipe(plg.unique, "a")  # Can also use ["a", ...] or pl.col("a")
shape: (2, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 1   │
│ 2   │
└─────┘
>>> df = pl.DataFrame({"a": [1, 1, 2]})
>>> df.pipe(plg.unique, "a")
Traceback (most recent call last):
...
pelage.types.PolarsAssertError: Details
shape: (2, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 1   │
│ 1   │
└─────┘
Error with the DataFrame passed to the check function:
--> Somes values are duplicated within the specified columns

Below are examples with the group_by option:

>>> df = pl.DataFrame(
...     [
...         [1, 1, 1],
...         [1, 1, 2],
...     ],
...     schema=["col1", "col2", "group"],
...     orient="row",
... )
>>> df.pipe(plg.unique, ["col1", "col2"], group_by="group")
shape: (2, 3)
┌──────┬──────┬───────┐
│ col1 ┆ col2 ┆ group │
│ ---  ┆ ---  ┆ ---   │
│ i64  ┆ i64  ┆ i64   │
╞══════╪══════╪═══════╡
│ 1    ┆ 1    ┆ 1     │
│ 1    ┆ 1    ┆ 2     │
└──────┴──────┴───────┘
>>> df = pl.DataFrame(
...     [
...         [1, 1, 1],
...         [1, 1, 1],
...         [1, 1, 2],
...     ],
...     schema=["col1", "col2", "group"],
...     orient="row",
... )
>>> df.pipe(plg.unique, ["col1", "col2"], group_by="group")
Traceback (most recent call last):
...
pelage.types.PolarsAssertError: Details
shape: (2, 3)
┌──────┬──────┬───────┐
│ col1 ┆ col2 ┆ group │
│ ---  ┆ ---  ┆ ---   │
│ i64  ┆ i64  ┆ i64   │
╞══════╪══════╪═══════╡
│ 1    ┆ 1    ┆ 1     │
│ 1    ┆ 1    ┆ 1     │
└──────┴──────┴───────┘
Error with the DataFrame passed to the check function:
--> Somes values are duplicated within the specified columns