is_monotonic
checks.is_monotonic(data, column, decreasing=False, strict=True, interval=None, group_by=None)
Verify that values in a column are consecutively increasing or decreasing.
Parameters
data: PolarsLazyOrDataFrame
-
Polars DataFrame or LazyFrame containing data to check.
column: str
-
Name of the column that should be monotonic.
decreasing: bool = False
-
Whether the column should be decreasing, by default False
strict: bool = True
-
The series must be strictly increasing or decreasing; no consecutive equal values are allowed, by default True
interval: Optional[Union[int, float, str, pl.Duration]] = None
-
For time-based columns, the interval can be specified as a string, as in the functions dt.offset_by or pl.DataFrame().rolling. It can also be specified more explicitly with the pl.duration() function.
When using a string, the interval is dictated by the following string language:
- 1ns (1 nanosecond)
- 1us (1 microsecond)
- 1ms (1 millisecond)
- 1s (1 second)
- 1m (1 minute)
- 1h (1 hour)
- 1d (1 calendar day)
- 1w (1 calendar week)
- 1mo (1 calendar month)
- 1q (1 calendar quarter)
- 1y (1 calendar year)
- 1i (1 index count)
By “calendar day”, we mean the corresponding time on the next day (which may not be 24 hours, due to daylight saving time). Similarly for “calendar week”, “calendar month”, “calendar quarter”, and “calendar year”.
By default None
group_by: Optional[PolarsOverClauseInput] = None
-
When specified, monotonicity and intervals are checked for each group independently.
By default None
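
As a rough illustration of how the `decreasing` and `strict` flags combine, the following pure-Python sketch applies the same idea to a plain list. The helper name `is_monotonic_list` is hypothetical and not part of pelage; the library itself works on Polars frames and may differ in details such as null handling.

```python
import operator


def is_monotonic_list(values, decreasing=False, strict=True):
    """Hypothetical helper (not pelage's code): check consecutive pairs of a list."""
    # Choose the comparison that each (previous, current) pair must satisfy.
    if decreasing:
        compare = operator.gt if strict else operator.ge
    else:
        compare = operator.lt if strict else operator.le
    return all(compare(prev, curr) for prev, curr in zip(values, values[1:]))


print(is_monotonic_list([1, 2, 3]))                   # True
print(is_monotonic_list([1, 2, 2]))                   # False: equal neighbours, strict
print(is_monotonic_list([1, 2, 2], strict=False))     # True
print(is_monotonic_list([3, 2, 1], decreasing=True))  # True
print(is_monotonic_list([1, 2, 1]))                   # False
```

With `strict=True`, a repeated value is enough to fail the check; with `strict=False`, only a move in the wrong direction fails.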
Returns
Type | Description |
---|---|
PolarsLazyOrDataFrame | The original polars DataFrame or LazyFrame when the check passes |
Examples
>>> import polars as pl
>>> import pelage as plg
>>> df = pl.DataFrame({"int": [1, 2, 3], "str": ["x", "y", "z"]})
>>> df.pipe(plg.is_monotonic, "int")
shape: (3, 2)
┌─────┬─────┐
│ int ┆ str │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪═════╡
│ 1   ┆ x   │
│ 2   ┆ y   │
│ 3   ┆ z   │
└─────┴─────┘
>>> bad = pl.DataFrame({"int": [1, 2, 1], "str": ["x", "y", "z"]})
>>> bad.pipe(plg.is_monotonic, "int")
Traceback (most recent call last):
...
pelage.checks.PolarsAssertError: Details
Error with the DataFrame passed to the check function:
-->Column "int" expected to be monotonic but is not, try .sort("int")
The following example details how to perform this check for groups:
>>> given = pl.DataFrame(
...     [
...         ("2020-01-01 01:42:00", "A"),
...         ("2020-01-01 01:43:00", "A"),
...         ("2020-01-01 01:44:00", "A"),
...         ("2021-12-12 01:43:00", "B"),
...         ("2021-12-12 01:44:00", "B"),
...     ],
...     schema=["dates", "group"],
... ).with_columns(pl.col("dates").str.to_datetime())
>>> given.pipe(plg.is_monotonic, "dates", interval="1m", group_by="group")
shape: (5, 2)
┌─────────────────────┬───────┐
│ dates               ┆ group │
│ ---                 ┆ ---   │
│ datetime[μs]        ┆ str   │
╞═════════════════════╪═══════╡
│ 2020-01-01 01:42:00 ┆ A     │
│ 2020-01-01 01:43:00 ┆ A     │
│ 2020-01-01 01:44:00 ┆ A     │
│ 2021-12-12 01:43:00 ┆ B     │
│ 2021-12-12 01:44:00 ┆ B     │
└─────────────────────┴───────┘
>>> given.pipe(plg.is_monotonic, "dates", interval="3m", group_by="group")
Traceback (most recent call last):
...
pelage.checks.PolarsAssertError: Details
Error with the DataFrame passed to the check function:
-->Intervals differ from the specified 3m interval. Unexpected: {datetime.timedelta(seconds=60)}
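
The grouped interval check above can be pictured with a small pure-Python sketch: collect the differences between consecutive values within each group and report any that differ from the expected step. The helper `unexpected_intervals` is hypothetical, assumes the rows are already ordered within each group, and only handles a fixed `timedelta` step (not calendar-aware strings such as `1mo`). It is an illustration of the idea, not how pelage is implemented.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def unexpected_intervals(rows, expected):
    """Hypothetical helper (not pelage's code): rows is a list of (timestamp, group)."""
    # Bucket timestamps by group, keeping their original order.
    by_group = defaultdict(list)
    for timestamp, group in rows:
        by_group[group].append(timestamp)

    # Compare consecutive timestamps inside each group only.
    unexpected = set()
    for timestamps in by_group.values():
        for prev, curr in zip(timestamps, timestamps[1:]):
            if curr - prev != expected:
                unexpected.add(curr - prev)
    return unexpected


rows = [
    (datetime(2020, 1, 1, 1, 42), "A"),
    (datetime(2020, 1, 1, 1, 43), "A"),
    (datetime(2020, 1, 1, 1, 44), "A"),
    (datetime(2021, 12, 12, 1, 43), "B"),
    (datetime(2021, 12, 12, 1, 44), "B"),
]

print(unexpected_intervals(rows, timedelta(minutes=1)))  # set()
print(unexpected_intervals(rows, timedelta(minutes=3)))  # {datetime.timedelta(seconds=60)}
```

Because differences are taken within each group, the jump from the last `A` row to the first `B` row is never compared; without grouping, that gap would be reported as an unexpected interval.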