dbt interoperability
One of the primary objectives of pelage is to make it easier to rewrite data pipelines from Python to SQL, and vice versa. This is why most of the checks are based on the SQL tests proposed by dbt.
| dbt | Available in pelage | group_by option |
|---|---|---|
| unique | ✅ | - |
| not_null | has_no_nulls ✅ | - |
| accepted_values | ✅ | - |
| relationships | maintains_relationship ✅ | - |
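To show how the mapping works in practice, here is a minimal sketch of how a dbt not_null test translates to pelage, assuming the usual pattern of chaining checks with polars' .pipe(); the DataFrame and column names are hypothetical:

```python
import polars as pl
import pelage as plg

# Hypothetical data; "customer_id" stands in for the column that a dbt
# not_null test would target in a model's schema.yml.
orders = pl.DataFrame(
    {
        "customer_id": [1, 2, 3],
        "status": ["shipped", "pending", "shipped"],
    }
)

# Equivalent of dbt's not_null test: raises an error if any null is found,
# otherwise returns the DataFrame unchanged so the pipeline can continue.
validated = orders.pipe(plg.has_no_nulls)
```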
| dbt-utils | Available in pelage | group_by option |
|---|---|---|
| equal_rowcount | has_shape ✅ | ✅ |
| fewer_rows_than | ❌ | ❌ |
| equality | ✅ | - |
| expression_is_true | custom_check ✅ | - |
| recency | ❌ | ❌ |
| at_least_one | ✅ | - |
| not_constant | ✅ | ✅ |
| not_empty_string | ❌ | ❌ |
| cardinality_equality | ✅ | ✅ |
| not_null_proportion | ✅ | - |
| not_accepted_values | ✅ | - |
| relationships_where | ❌ | ❌ |
| mutually_exclusive_ranges | ✅ | ✅ |
| sequential_values | is_monotonic ✅ | ✅ |
| unique_combination_of_columns | ✅ | - |
| accepted_range | ✅ | - |
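The group_by option column indicates which checks can also be applied per group instead of on the whole frame. Below is a hedged sketch using not_constant, assuming it accepts a column name and a group_by keyword as suggested by the table; refer to the API reference for the exact signature:

```python
import polars as pl
import pelage as plg

sales = pl.DataFrame(
    {
        "store": ["a", "a", "b", "b"],
        "price": [9.99, 10.49, 5.00, 5.50],
    }
)

# Whole-frame check: "price" must not be constant across the entire DataFrame.
sales.pipe(plg.not_constant, "price")

# Assumed group_by usage: "price" must not be constant within any single store.
sales.pipe(plg.not_constant, "price", group_by="store")
```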
Some checks inspired by other defensive analysis tools in Python have also been implemented, even though they are not available in dbt:
| Name | Available in pelage | group_by option |
|---|---|---|
| has_columns | ✅ | - |
| has_dtypes | ✅ | - |
| has_no_infs | ✅ | - |
| has_mandatory_values | ✅ | ✅ |
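For example, has_columns and has_dtypes can guard the schema at the boundary of a pipeline. The sketch below assumes has_columns takes a list of required column names and has_dtypes a mapping from column name to polars dtype; only the check names come from the table above, the signatures are assumptions:

```python
import polars as pl
import pelage as plg

users = pl.DataFrame(
    {
        "name": ["Alice", "Bob"],
        "age": [20, 30],
    }
)

# Assumed signatures: a list of required columns, then a column -> dtype mapping.
(
    users.pipe(plg.has_columns, ["name", "age"])
    .pipe(plg.has_dtypes, {"name": pl.Utf8, "age": pl.Int64})
)
```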
Context
pelage was designed to reduce the gap between data exploration and production. Working on data-related use cases means facing many different challenges, among the most important being data quality and data drift.
- One of the best frameworks for testing data pipelines is provided by dbt.
- It is difficult to write tests after the business logic has been implemented.
- During EDA, data visualization plays a crucial role in identifying relevant data and spotting quality problems.
- SQL transformations are a major component of production-ready data pipelines.