This notebook contains a few examples of how to use pelage. The idea is to illustrate its main features through a succession of checks and transformations. We use a simple example here: the MPG dataset, loaded with seaborn's load_dataset utility function.
Imports
import polars as pl
import seaborn as sns
import pelage as plg

data = pl.DataFrame(sns.load_dataset("mpg"))
data.head()
shape: (5, 9)
mpg   cylinders  displacement  horsepower  weight  acceleration  model_year  origin  name
f64   i64        f64           f64         i64     f64           i64         str     str
18.0  8          307.0         130.0       3504    12.0          70          "usa"   "chevrolet chev…
15.0  8          350.0         165.0       3693    11.5          70          "usa"   "buick skylark …
18.0  8          318.0         150.0       3436    11.0          70          "usa"   "plymouth satel…
16.0  8          304.0         150.0       3433    12.0          70          "usa"   "amc rebel sst"
17.0  8          302.0         140.0       3449    10.5          70          "usa"   "ford torino"
Basic data transformations
In the following example, we perform some basic checks, apply a simple data transformation, and finally check for the presence of outliers.
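The general shape of such a sequence is that each check validates the data and hands it back unchanged, so checks and transformations can be chained one after another. The pure-Python sketch below illustrates that flow under simplifying assumptions: rows are plain dicts rather than a Polars DataFrame, and pipe, has_no_nulls, add_hp_per_klb, and within_n_std are hypothetical helpers written for this illustration, not pelage functions.

```python
from statistics import mean, pstdev
from typing import Callable, Optional

Row = dict[str, Optional[float]]
Step = Callable[[list[Row]], list[Row]]


def pipe(data: list[Row], *steps: Step) -> list[Row]:
    """Run each step in order; every step receives and returns the data."""
    for step in steps:
        data = step(data)
    return data


def has_no_nulls(columns: list[str]) -> Step:
    """Hypothetical check: fail if any listed column contains None."""
    def check(data: list[Row]) -> list[Row]:
        for col in columns:
            if any(row[col] is None for row in data):
                raise AssertionError(f"column {col!r} contains nulls")
        return data  # returned unchanged so the chain can continue
    return check


def add_hp_per_klb(data: list[Row]) -> list[Row]:
    """Hypothetical transformation: horsepower per 1000 lb of weight."""
    return [{**row, "hp_per_klb": row["horsepower"] / row["weight"] * 1000} for row in data]


def within_n_std(column: str, n: float) -> Step:
    """Hypothetical outlier check: fail if a value lies more than n population std from the mean."""
    def check(data: list[Row]) -> list[Row]:
        values = [row[column] for row in data]
        mu, sigma = mean(values), pstdev(values)
        outliers = [v for v in values if abs(v - mu) > n * sigma]
        if outliers:
            raise AssertionError(f"{len(outliers)} outlier(s) in {column!r}")
        return data
    return check


# The five rows shown by data.head() above (subset of columns)
rows = [
    {"mpg": 18.0, "horsepower": 130.0, "weight": 3504},
    {"mpg": 15.0, "horsepower": 165.0, "weight": 3693},
    {"mpg": 18.0, "horsepower": 150.0, "weight": 3436},
    {"mpg": 16.0, "horsepower": 150.0, "weight": 3433},
    {"mpg": 17.0, "horsepower": 140.0, "weight": 3449},
]

checked = pipe(
    rows,
    has_no_nulls(["mpg", "horsepower", "weight"]),  # check
    add_hp_per_klb,                                 # transformation
    within_n_std("horsepower", 2),                  # outlier check
)
```

Passing the data through unchanged is what lets validation sit inline in a transformation pipeline instead of in separate assertion blocks.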
When a check fails, a PolarsAssertError exception is raised. Its error message provides a summarized view of the problem detected during the check.
To help the user better understand the root cause of the failure, the error object also has a df attribute, which can contain the values identified as causing the check to fail.
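To make the idea of an error that carries its offending rows concrete, here is a minimal pure-Python sketch. CheckError and within_bounds are hypothetical names chosen for illustration, not pelage's implementation; where pelage's error exposes the values through its df attribute, this sketch stores them as a list of dicts.

```python
from typing import Optional


class CheckError(AssertionError):
    """Hypothetical stand-in for a check failure that keeps the offending rows."""

    def __init__(self, message: str, df: Optional[list[dict]] = None) -> None:
        super().__init__(message)
        self.df = df  # rows that caused the check to fail


def within_bounds(data: list[dict], column: str, low: float, high: float) -> list[dict]:
    """Hypothetical range check: raise CheckError listing rows outside [low, high]."""
    bad = [row for row in data if not low <= row[column] <= high]
    if bad:
        raise CheckError(
            f"{len(bad)} row(s) with {column!r} outside [{low}, {high}]", df=bad
        )
    return data  # unchanged when the check passes


# Displacement and horsepower of the five rows shown by data.head()
rows = [
    {"displacement": 307.0, "horsepower": 130.0},
    {"displacement": 350.0, "horsepower": 165.0},
    {"displacement": 318.0, "horsepower": 150.0},
    {"displacement": 304.0, "horsepower": 150.0},
    {"displacement": 302.0, "horsepower": 140.0},
]

try:
    within_bounds(rows, "displacement", 300.0, 320.0)
except CheckError as err:
    print(err)     # summarized message
    print(err.df)  # the offending rows themselves
```

Keeping the offending rows on the exception means the caller can inspect them directly instead of recomputing the failing condition.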
Here is how to retrieve it without adding a try/except block, using the last uncaught exception that the interactive session stores in sys.last_value. Printing it shows the error message as a string.
import sys

# Last exception left unhandled in this interactive session
error = sys.last_value
print(error)
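Because sys.last_value is only set after an unhandled exception in an interactive session (the standard REPL, IPython, Jupyter), it is normally undefined in a plain script, where the line above would raise AttributeError. Python 3.12 also deprecates it in favor of sys.last_exc. A small helper that tolerates both situations could look like the following; last_uncaught_error is a hypothetical name, not part of pelage or the standard library.

```python
import sys
from typing import Optional


def last_uncaught_error() -> Optional[BaseException]:
    """Return the last exception left unhandled in an interactive session, if any.

    Prefers sys.last_exc (Python 3.12+) and falls back to the older
    sys.last_value; returns None when neither is set (e.g. in a script).
    """
    err = getattr(sys, "last_exc", None)
    if err is None:
        err = getattr(sys, "last_value", None)
    return err


error = last_uncaught_error()
if error is None:
    print("No uncaught exception recorded in this session.")
else:
    print(error)                       # summarized message
    print(getattr(error, "df", None))  # offending values, when the error provides them
```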
You can then work with the subset dataframe containing the elements that triggered the exception. Here we do a few manipulations to determine which values fall outside the specified boundaries and how significant they are within the dataset.
(
    pl.DataFrame(error.df)  # This is only here to obtain syntax highlighting
    .select(pl.col("displacement", "horsepower"))
    .describe()
)
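The describe() call summarizes the flagged values themselves. To also express their relative importance as a share of the full dataset, one option is to compare the number of flagged rows with the total (in Polars, roughly the ratio of the flagged frame's height to data.height). The pure-Python sketch below shows that calculation; flagged_summary is a hypothetical helper, and the upper bound of 310.0 is an assumed threshold applied to the displacement values of the five rows shown by data.head(), not a result produced by pelage.

```python
from statistics import mean


def flagged_summary(all_values: list[float], flagged: list[float]) -> dict[str, float]:
    """Summarize flagged values and their share of the full column (hypothetical helper)."""
    if not all_values:
        raise ValueError("all_values must not be empty")
    summary = {
        "count": float(len(flagged)),
        "share": len(flagged) / len(all_values),  # fraction of rows flagged
    }
    if flagged:
        summary.update(min=min(flagged), max=max(flagged), mean=mean(flagged))
    return summary


displacement = [307.0, 350.0, 318.0, 304.0, 302.0]
# Suppose a check with an (assumed) upper bound of 310.0 flagged these values:
flagged = [v for v in displacement if v > 310.0]
print(flagged_summary(displacement, flagged))
```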