dbt interoperability

One of the primary objectives of pelage is to make it easier to rewrite data pipelines from Python to SQL and vice versa. This is why most of the checks are modeled on the SQL tests proposed by dbt.

dbt core test functions
dbt	Available in `pelage`	`group_by` option
unique	✅	-
not_null	has_no_nulls ✅	-
accepted_values	✅	-
relationships	maintains_relationship ✅	-
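
To make the mapping above concrete, here is a minimal, library-free sketch of what the four dbt core tests assert. It is not pelage's implementation or API: the function names only mirror the table, while the signatures, the list-of-dicts row format, and the sample data are assumptions made for illustration. Each function returns its input unchanged when the check passes and raises otherwise.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical row format for this sketch: one dict per row.
Rows = List[Dict[str, Any]]


def unique(rows: Rows, column: str) -> Rows:
    """dbt `unique`: no value of `column` may appear twice."""
    values = [row[column] for row in rows]
    if len(values) != len(set(values)):
        raise AssertionError(f"Duplicate values found in column {column!r}")
    return rows


def has_no_nulls(rows: Rows, column: str) -> Rows:
    """dbt `not_null`: `column` may not contain None."""
    if any(row[column] is None for row in rows):
        raise AssertionError(f"Null values found in column {column!r}")
    return rows


def accepted_values(rows: Rows, column: str, allowed: Iterable[Any]) -> Rows:
    """dbt `accepted_values`: every value of `column` must be in `allowed`."""
    unexpected = {row[column] for row in rows} - set(allowed)
    if unexpected:
        raise AssertionError(f"Unexpected values in {column!r}: {unexpected}")
    return rows


def maintains_relationship(
    rows: Rows, column: str, parent_rows: Rows, parent_column: str
) -> Rows:
    """dbt `relationships`: every non-null child key must exist in the parent."""
    parent_keys = {row[parent_column] for row in parent_rows}
    child_keys = {row[column] for row in rows if row[column] is not None}
    orphans = child_keys - parent_keys
    if orphans:
        raise AssertionError(f"Keys missing from parent table: {orphans}")
    return rows


# Illustrative data (made up for this sketch).
customers = [{"id": 1}, {"id": 2}]
orders = [
    {"order_id": 10, "customer_id": 1, "status": "placed"},
    {"order_id": 11, "customer_id": 2, "status": "shipped"},
]

# Each check returns its input, so checks can be chained like pipeline steps.
validated = unique(orders, "order_id")
validated = has_no_nulls(validated, "customer_id")
validated = accepted_values(validated, "status", ["placed", "shipped"])
validated = maintains_relationship(validated, "customer_id", customers, "id")
```

Returning the input on success is the design choice that lets a check sit in the middle of a transformation chain without changing the data flowing through it.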

Implementation of dbt-utils tests
dbt-utils	Available in `pelage`	`group_by` option
equal_rowcount	has_shape ✅	✅
fewer_rows_than	❌	❌
equality	✅	-
expression_is_true	custom_check ✅	-
recency	❌	❌
at_least_one	✅	-
not_constant	✅	✅
not_empty_string	❌	❌
cardinality_equality	✅	✅
not_null_proportion	✅	-
not_accepted_values	✅	-
relationships_where	❌	❌
mutually_exclusive_ranges	✅	✅
sequential_values	is_monotonic ✅	✅
unique_combination_of_columns	✅	-
accepted_range	✅	-
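
The `group_by` column in the table indicates that a check can be evaluated separately within each group rather than once over the whole dataset. The sketch below illustrates that idea for `not_constant` in plain Python. It assumes a list-of-dicts row format and a hypothetical signature; it is not pelage's implementation, only an illustration of how a grouped check can fail where the global one passes.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional

# Hypothetical row format for this sketch: one dict per row.
Rows = List[Dict[str, Any]]


def not_constant(rows: Rows, column: str, group_by: Optional[str] = None) -> Rows:
    """Fail if `column` holds a single distinct value, overall or per group."""
    distinct_per_group = defaultdict(set)
    for row in rows:
        # Without `group_by`, every row falls into one global group (key None).
        key = row[group_by] if group_by is not None else None
        distinct_per_group[key].add(row[column])

    constant_groups = [
        key for key, values in distinct_per_group.items() if len(values) == 1
    ]
    if constant_groups:
        raise AssertionError(
            f"Column {column!r} is constant for groups: {constant_groups}"
        )
    return rows


# Illustrative data (made up for this sketch).
prices = [
    {"store": "A", "price": 1.0},
    {"store": "A", "price": 2.0},
    {"store": "B", "price": 3.0},
    {"store": "B", "price": 3.0},
]

# Globally, `price` takes three distinct values, so the check passes.
checked = not_constant(prices, "price")

# Per store, `price` never changes within store "B", so the grouped check fails.
try:
    not_constant(prices, "price", group_by="store")
    grouped_check_failed = False
except AssertionError:
    grouped_check_failed = True
```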

Some checks inspired by other defensive analysis tools in Python have also been implemented, even though they have no equivalent in dbt:

Other defensive functions
Name	Available in `pelage`	`group_by` option
has_columns	✅	-
has_dtypes	✅	-
has_no_infs	✅	-
has_mandatory_values	✅	✅

Context

pelage was designed to narrow the gap between data exploration and production. Working on data-related use cases means facing many challenges, chief among them data quality and data drift.

One of the best frameworks to test data pipelines is provided by dbt.
It’s difficult to write tests after the business logic has been implemented.
During EDA, data visualization plays a crucial role in identifying relevant data and spotting quality problems.
SQL transformations are a major component of production-ready data pipelines.