Regression use case - dragons data

To illustrate how auditor can be applied to regression problems, we will use dragons, an artificial dataset available in the DALEX2 package. Our goal is to predict the length of life of dragons (the life_length variable).

library(DALEX2)
data("dragons")
head(dragons)
##   year_of_birth   height   weight scars colour year_of_discovery
## 1         -1291 59.40365 15.32391     7    red              1700
## 2          1589 46.21374 11.80819     5    red              1700
## 3          1528 49.17233 13.34482     6    red              1700
## 4          1645 48.29177 13.27427     5  green              1700
## 5            -8 49.99679 13.08757     1    red              1700
## 6           915 45.40876 11.48717     2    red              1700
##   number_of_lost_teeth life_length
## 1                   25   1368.4331
## 2                   28   1377.0474
## 3                   38   1603.9632
## 4                   33   1434.4222
## 5                   18    985.4905
## 6                   20    969.5682

Models

Linear model

lm_model <- lm(life_length ~ ., data = dragons)

Random forest

library("randomForest")
set.seed(59)
rf_model <- randomForest(life_length ~ ., data = dragons)

Preparation for error analysis

Each analysis begins with creating a modelAudit object with the audit() function. This object is then passed to the auditing functions used in the rest of the analysis.

library("auditor")

lm_audit <- audit(lm_model, label = "lm", data = dragons, y = dragons$life_length)
rf_audit <- audit(rf_model, label = "rf", data = dragons, y = dragons$life_length)
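Auditing a model relies on the observed response (the y argument) and the model's predictions on data. Most of the measures below are computed from residuals, that is, observed minus predicted values. The base R sketch below shows that computation on a small hypothetical data frame rather than dragons. It illustrates the idea only and is not the internal code of audit().

```r
# Small hypothetical data frame standing in for a real dataset
toy <- data.frame(x = c(1, 2, 3, 4, 5),
                  y = c(2.1, 3.9, 6.2, 7.8, 10.1))
fit <- lm(y ~ x, data = toy)

# Residuals: observed values minus model predictions
res <- toy$y - predict(fit, newdata = toy)

# For lm, these coincide with the residuals stored in the fitted model
all.equal(unname(res), unname(residuals(fit)))
```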

Model Performance Audit

Model Ranking radar plot

Model performance measures can be plotted together so that the performance of several models is easy to compare.

The modelPerformance() function computes the chosen model performance measures. On the radar plot, a result further from the center indicates better model performance.
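Two of the requested scores, MAE (mean absolute error) and MSE (mean squared error), follow directly from the residuals. The sketch below computes them by hand in base R on hypothetical values, as a reference for what these scores measure. It is not the auditor implementation.

```r
# Hypothetical observed values and predictions
y    <- c(10, 12, 15, 20)
yhat <- c(11, 11, 18, 18)
res  <- y - yhat            # residuals: -1, 1, -3, 2

mae <- mean(abs(res))       # (1 + 1 + 3 + 2) / 4 = 1.75
mse <- mean(res^2)          # (1 + 1 + 9 + 4) / 4 = 3.75
c(MAE = mae, MSE = mse)
```

MSE squares each residual, so it penalizes the single large error (-3) much more heavily than MAE does.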

lm_mp <- modelPerformance(lm_audit, scores = c("MAE", "MSE", "REC", "RROC"))
rf_mp <- modelPerformance(rf_audit, scores = c("MAE", "MSE", "REC", "RROC"))

lm_mp
##          score label name
## 1 3.334652e+01    lm  MAE
## 2 1.656454e+03    lm  MSE
## 3 3.330139e+01    lm  REC
## 4 3.312907e+09    lm RROC
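In the table above, the REC score of the linear model (33.30) is close to its MAE (33.35). One likely explanation, assuming the REC score is the area over the REC (Regression Error Characteristic) curve, is this. The REC curve plots, for each error tolerance, the fraction of observations whose absolute residual lies within that tolerance. The area over this step curve equals the mean absolute error. The base R sketch below checks this on hypothetical residuals. The function rec_aoc() is illustrative and not part of auditor. The small gap in the table may come from how auditor discretizes the curve, which this sketch does not attempt to reproduce.

```r
# Hypothetical helper: exact area over the empirical REC curve
rec_aoc <- function(res) {
  a <- sort(abs(res))
  n <- length(a)
  # widths of the tolerance intervals between consecutive sorted absolute errors, starting at 0
  widths <- diff(c(0, a))
  # on the k-th interval, (k - 1) of the n observations are already within tolerance
  not_covered <- 1 - (seq_len(n) - 1) / n
  sum(widths * not_covered)
}

res <- c(-1, 1, -3, 2)
rec_aoc(res)      # 1.75
mean(abs(res))    # 1.75, the MAE
```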

The results of modelPerformance() for multiple models can be plotted together on one plot. The table parameter controls whether a table with the scores is generated alongside the plot.

On the plot, scores are inverted and scaled to [0, 1], so that lower errors appear further from the center.

plot(lm_mp, rf_mp, table = TRUE)
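The exact transformation used by auditor is not given here. The sketch below shows one plausible scheme, stated as an assumption: invert each error score (1 / score) so that larger means better, then divide by the largest inverted value for that measure. Under this scheme the best model on each measure lands at 1, the outer edge of the radar. The scores are hypothetical, not the dragons results.

```r
# Hypothetical error scores (lower is better) for two models and two measures
scores <- rbind(model_a = c(MAE = 2, MSE = 8),
                model_b = c(MAE = 4, MSE = 5))

# Assumed scheme: invert, then scale each column by its maximum
inv    <- 1 / scores
scaled <- sweep(inv, 2, apply(inv, 2, max), "/")
scaled
```

This is equivalent to dividing the best (smallest) score of each measure by every model's score, which keeps all values in (0, 1].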

It is also possible to define custom model performance measures as functions.

new_score <- function(object) sum(sqrt(abs(object$residuals)))
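The custom score above takes the audit object and reads its residuals element. The sketch below checks the function's behavior on a hypothetical stand-in list that contains only a residuals vector, rather than on a real modelAudit object.

```r
new_score <- function(object) sum(sqrt(abs(object$residuals)))

# Hypothetical stand-in for an audit object: only the residuals element is used
mock_audit <- list(residuals = c(-4, 1, 9))
new_score(mock_audit)  # sqrt(4) + sqrt(1) + sqrt(9) = 6
```

Because the square root grows slowly, this score penalizes large residuals far less than MSE does. It is therefore less sensitive to a few badly predicted observations.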

lm_mp <- modelPerformance(lm_audit,  
                          scores = c("MAE", "MSE", "REC", "RROC"), 
                          new.score = new_score)

rf_mp <- modelPerformance(rf_audit,  
                          scores = c("MAE", "MSE", "REC", "RROC"), 
                          new.score = new_score)

plotModelRanking(lm_mp, rf_mp, table = TRUE)

Other methods

Other methods and plots are described in vignettes: