Pelage: Defensive analysis for Polars
  • Get started
  • API Reference
  • Examples
  • Coming from dbt
  • Git

On this page

  • dbt interoperability
  • Context

dbt interoperability

One of the primary objectives of pelage is to facilitate the rewriting of data pipelines from python to SQL, and the inverse. This is why most of the checks are based on the concept of SQL tests proposed by dbt.

dbt core test functions
dbt Available in pelage group_by option
unique ✅ -
not_null has_no_nulls ✅ -
accepted_values ✅ -
relationship maintains_relationship ✅ -
Implementation of dbt-utils tests
dbt-utils Available in pelage group_by option
equal_rowcount has_shape ✅ ✅
fewer_rows_than ❌ ❌
equality ✅ -
expression_is_true custom_check ✅ -
recency ❌ ❌
at_least_one ✅ -
not_constant ✅ ✅
not_empty_string ❌ ❌
cardinality_equality ✅ ✅
not_null_proportion ✅ -
not_accepted_values ✅ -
relationships_where ❌ ❌
mutually_exclusive_ranges ✅ ✅
sequential_values is_monotonic ✅ ✅
unique_combination_of_columns ✅ -
accepted_range ✅ -

Some functions that are also coming from other defensive analysis tools in python have been implemented, even though they are not available in dbt:

Other defensive functions
Name Available in pelage group_by option
has_columns ✅ -
has_dtypes ✅ -
has_no_infs ✅ -
has_mandatory_values ✅ ✅

Context

pelage was designed in order to reduce the gap between data exploration and production. Working on data related use-cases implies facing many different challenges, one the majors are data quality, data drift.

  • One of the best frameworks to test data pipelines is provided by dbt.
  • It’s difficult to write tests after the business logic has been implemented.
  • During EDA, data visualization plays a crucial role to identify relevant data or identify quality problems.
  • SQL transformations are a major component of production-ready data pipelines.