Introduction

Have you ever wanted a single place that gathers the information explaining your model? Or a tool that shows how several models differ on the same dataset? That is what the modelDown package is for. Building on the DALEX package, it generates an HTML page with plots and information about the model(s) you want to analyze.

To see an example website generated with modelDown, check out this link (along with the script that was used to create the HTML). Read on to see how to use the package for your own models and what features it provides.

The examples presented here were generated for the HR_data dataset from the breakDown package (available on CRAN). The dataset contains various information about employees (for example, their satisfaction with work or their salary). The value we predict is whether they left the company.

Installation

First things first - how can you use this package? Install it from GitHub:

devtools::install_github("MI2DataLab/modelDown")

Once the package is installed, you need to create DALEX explainers for your models. Here is a simple example. Please refer to the DALEX package documentation to learn more.

# assuming you have two models: glm_model and ranger_model for HR_data
explainer_glm <- DALEX::explain(glm_model, data=HR_data, y=HR_data$left)
explainer_ranger <- DALEX::explain(ranger_model, data=HR_data, y=HR_data$left)
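
The snippet above assumes that glm_model and ranger_model already exist. As a rough sketch of how they might be fitted (the formula, the number of trees and the helper name predict_left are illustrative assumptions, not taken from the package documentation):

```r
library(breakDown)  # assumed source of HR_data, as stated in the introduction
library(ranger)

# Logistic regression on all predictors; `left` is the 0/1 target.
glm_model <- glm(left ~ ., data = HR_data, family = "binomial")

# Probability forest, so predictions are class probabilities rather than labels.
ranger_model <- ranger(as.factor(left) ~ ., data = HR_data,
                       num.trees = 500, probability = TRUE)

# Hypothetical helper: extract the probability of the second class (left = 1).
predict_left <- function(model, newdata) {
  predict(model, data = newdata)$predictions[, 2]
}
```

Depending on the DALEX version, the default prediction function for a ranger model may or may not return probabilities. If it does not, a helper like this can be passed to DALEX::explain through its predict_function argument.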

Next, just pass all created explainers to the modelDown function. For example:

modelDown::modelDown(explainer_ranger, explainer_glm)

That’s it! Now you should have your HTML page generated with default options.

Features

Let’s quickly describe the sections of your page. If you want to know more about how the plots are generated, check out the DALEX package documentation.

Index page

Always know your data before you analyze the model - the index page helps you do exactly that.

You can see basic information about your data, such as its dimensions and a summary of all variables. For numerical variables, summary statistics are presented; for categorical ones, you see how many observations fall into each category.
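
The same kind of overview can be reproduced by hand with base R. This is only a sketch of the idea, not the code modelDown uses to build the index page:

```r
library(breakDown)  # assumed source of HR_data

dim(HR_data)      # number of rows and columns
summary(HR_data)  # quartiles and means for numeric columns, counts for factors

# Number of observations in each category of one categorical variable
salary_counts <- table(HR_data$salary)
salary_counts
```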

Model Performance

The most general information about how accurate the predictions were.

For our two models, the ranger model clearly has lower residual values, which suggests it performs better on this dataset.
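
To make "lower residual values" concrete, here is a minimal hand-rolled comparison of absolute residuals. It is a sketch under assumptions: both models are refitted here with an illustrative formula, and the residuals are computed on the training data, which tends to flatter a flexible model such as a random forest.

```r
library(breakDown)  # assumed source of HR_data
library(ranger)

glm_model <- glm(left ~ ., data = HR_data, family = "binomial")
ranger_model <- ranger(as.factor(left) ~ ., data = HR_data, probability = TRUE)

# Predicted probability of leaving for every employee
p_glm <- predict(glm_model, newdata = HR_data, type = "response")
p_ranger <- predict(ranger_model, data = HR_data)$predictions[, 2]

# Absolute residual: distance between the observed 0/1 value and the prediction
abs_res <- data.frame(glm = abs(HR_data$left - p_glm),
                      ranger = abs(HR_data$left - p_ranger))

summary(abs_res)
boxplot(abs_res, ylab = "absolute residual")
```

With explainers already created, DALEX::model_performance() provides this kind of residual summary directly.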

Variable Importance

The variable importance plot is extremely useful when you want to see how removing a single variable impacts the response - basically, how important each variable is.

Here, it is clear that for the linear model the two most important variables are number_project and satisfaction_level. For the ranger model, four variables stand out as most important. Also, each model picked a different variable as the most important one.
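
One common way to approximate "removing" a variable without refitting the model is to shuffle its values and measure how much the loss grows. The sketch below does this by hand for the glm model only. The loss function (RMSE) and the single shuffle per variable are simplifying assumptions, and this is an illustration of the idea rather than the DALEX implementation:

```r
library(breakDown)  # assumed source of HR_data

set.seed(1)
glm_model <- glm(left ~ ., data = HR_data, family = "binomial")

# Loss: root mean squared error between the 0/1 target and the predicted probability
rmse <- function(model, data) {
  sqrt(mean((data$left - predict(model, newdata = data, type = "response"))^2))
}

baseline <- rmse(glm_model, HR_data)
predictors <- setdiff(names(HR_data), "left")

# For each predictor: shuffle it, recompute the loss, record the increase
importance <- sapply(predictors, function(v) {
  shuffled <- HR_data
  shuffled[[v]] <- sample(shuffled[[v]])  # breaks the link between v and the target
  rmse(glm_model, shuffled) - baseline
})

sort(importance, decreasing = TRUE)
```

A larger increase in loss means the model relies more heavily on that variable. Averaging over several shuffles would give more stable numbers.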

Variable Response

In the variable response plot you can see how a single variable impacts the response.

For example, for the variable average_monthly_hours and the glm model, there is a linear dependency: the more hours someone works, the greater the chance they will leave the company. For the ranger model, the relationship is not so simple: the chance of leaving increases drastically for people working more than 270 hours a month. By default the plots are generated for every variable, so you can draw similar conclusions for all variables in the model.
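
A response curve of this kind can be sketched by hand as a partial dependence profile: fix the variable at a grid of values for every employee and average the predictions. The grid size is an arbitrary choice, and the hours column is looked up by pattern because its exact spelling in the dataset may differ from the name used in the text:

```r
library(breakDown)  # assumed source of HR_data

glm_model <- glm(left ~ ., data = HR_data, family = "binomial")

# Find the monthly-hours column by pattern instead of hard-coding its name
hours_col <- grep("hours", names(HR_data), value = TRUE)[1]

grid <- seq(min(HR_data[[hours_col]]), max(HR_data[[hours_col]]),
            length.out = 20)

# Average predicted probability of leaving when everyone works h hours
pd <- sapply(grid, function(h) {
  modified <- HR_data
  modified[[hours_col]] <- h
  mean(predict(glm_model, newdata = modified, type = "response"))
})

plot(grid, pd, type = "l", xlab = hours_col,
     ylab = "average predicted probability of leaving")
```

For a logistic regression the curve is monotone, and it is exactly linear only on the log-odds scale, so on the probability scale it may look slightly curved.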

Prediction BreakDown

Prediction breakdown shows detailed information for particular observations in a model. By default, for each model the one observation with the worst predicted value is presented.

In the example, for the ranger model the value of satisfaction_level contributed most to the final response. So even though this particular employee’s satisfaction level was below the midpoint of the scale, the employee still didn’t leave the company. The model prediction was not correct in this case.

Prediction breakdown makes it easier to understand how the model behaved. It can be useful for tuning your model and improving its capabilities.
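
For a linear model the idea can be illustrated without any extra packages. The sketch below picks the observation with the largest absolute residual and splits its linear predictor into one term per coefficient. This is a simplified stand-in: the contributions are raw coefficient-times-value terms on the log-odds scale, not the centered contributions that a dedicated breakdown tool reports, and the choice of "worst" observation is an assumption about how it could be defined.

```r
library(breakDown)  # assumed source of HR_data

glm_model <- glm(left ~ ., data = HR_data, family = "binomial")

# Observation whose predicted probability is furthest from the observed outcome
p <- predict(glm_model, newdata = HR_data, type = "response")
worst <- which.max(abs(HR_data$left - p))

# Additive pieces of the linear predictor (log-odds scale) for that observation
x <- model.matrix(glm_model)[worst, ]
contrib <- coef(glm_model) * x

# Largest contributions first, regardless of sign
contrib[order(abs(contrib), decreasing = TRUE)]
```

The terms add up to the model's log-odds prediction for that employee, so the largest ones show which variables pushed the prediction furthest.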

Summary

The idea behind the package is to help you understand your models in a condensed and easy way. We hope that using this package will make your models’ performance clear to you. Feel free to use it and share your feedback.