Compute variance-based variable importance (VI) scores using a simple feature importance ranking measure (FIRM) approach; for details, see Greenwell et al. (2018) and Scholbeck et al. (2019).
Arguments
- object
A fitted model object (e.g., a randomForest object).
- ...
Additional arguments to be passed on to the pdp::partial() function (e.g., ice = TRUE, prob = TRUE, or a prediction wrapper via the pred.fun argument); see ?pdp::partial for details on these and other useful arguments (a short sketch follows this list).
- feature_names
Character string giving the names of the predictor variables (i.e., features) of interest. If NULL (the default), the internal get_feature_names() function will be called to try to extract them automatically. It is good practice to always specify this argument.
- train
A matrix-like R object (e.g., a data frame or matrix) containing the training data. If NULL (the default), the internal get_training_data() function will be called to try to extract it automatically. It is good practice to always specify this argument.
- var_fun
Deprecated; use var_continuous and var_categorical instead.
- var_continuous
Function used to quantify the variability of effects for continuous features. Defaults to the sample standard deviation (i.e., stats::sd()).
- var_categorical
Function used to quantify the variability of effects for categorical features. Defaults to the range divided by four; that is, function(x) diff(range(x)) / 4.
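Since arguments in ... are forwarded to pdp::partial(), FIRM can, for example, be based on ICE curves rather than partial dependence. A minimal sketch (assuming the vip package is loaded and reusing the ppr model from the Examples section below):

mtcars.ppr <- ppr(mpg ~ ., data = mtcars, nterms = 1)
# ice = TRUE is passed through to pdp::partial(), so the variability of each
# feature's effect is assessed over ICE curves instead of a single PD curve
vi_firm(mtcars.ppr, train = mtcars, ice = TRUE)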
Value
A tidy data frame (i.e., a tibble object) with two columns:
- Variable - the corresponding feature name;
- Importance - the associated importance, computed as described in Greenwell et al. (2018).
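A quick structural check (a sketch, again assuming the ppr model fit in the Examples section below):

res <- vi_firm(mtcars.ppr, train = mtcars)
names(res)  # should be "Variable" and "Importance", per the description above
res         # one row per feature in the model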
Details
This approach is based on quantifying the relative "flatness" of the effect of each feature and assumes the user has some familiarity with the pdp::partial() function. Feature effects can be assessed using partial dependence (PD) plots (Friedman, 2001) or individual conditional expectation (ICE) plots (Goldstein et al., 2015). These methods are model-agnostic and can be applied to any supervised learning algorithm. By default, relative "flatness" is quantified by computing the standard deviation of the y-axis values of each feature effect plot for numeric features; for categorical features, the default is the range of the y-axis values divided by four. These defaults can be changed via the var_continuous and var_categorical arguments. See Greenwell et al. (2018) for details and additional examples. A minimal by-hand sketch of the computation follows.
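To make the default computation concrete, here is a by-hand sketch of the FIRM idea for a single numeric feature (an illustration only, not vip's internal implementation; it assumes the pdp package is installed):

mtcars.ppr <- ppr(mpg ~ ., data = mtcars, nterms = 1)
# Partial dependence of mpg on wt; pdp::partial() returns a data frame with
# the grid of feature values and the averaged predictions in column "yhat"
pd <- pdp::partial(mtcars.ppr, pred.var = "wt", train = mtcars)
stats::sd(pd$yhat)  # flatter curve => smaller SD => lower importance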
Note
This approach can provide misleading results in the presence of interaction effects (akin to interpreting main-effect coefficients in a linear model that includes higher-order interaction terms); see the sketch below.
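The following small simulated illustration (hypothetical data, not from the package) shows the caveat in action:

set.seed(101)  # for reproducibility
d <- data.frame(x1 = runif(500, min = -1, max = 1),
                x2 = runif(500, min = -1, max = 1))
d$y <- d$x1 * d$x2  # pure interaction; no main effects
fit <- lm(y ~ x1 * x2, data = d)
pfun <- function(object, newdata) mean(predict(object, newdata = newdata))
# Both FIRM scores come out near zero because the partial dependence of each
# feature is flat, even though x1 and x2 jointly determine the response
vi_firm(fit, feature_names = c("x1", "x2"), train = d, pred.fun = pfun)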
References
Friedman, J. H. Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29: 1189-1232, 2001.
Goldstein, A., Kapelner, A., Bleich, J., and Pitkin, E. Peeking Inside the Black Box: Visualizing Statistical Learning With Plots of Individual Conditional Expectation. Journal of Computational and Graphical Statistics, 24(1): 44-65, 2015.
Greenwell, B. M., Boehmke, B. C., and McCarthy, A. J. A Simple and Effective Model-Based Variable Importance Measure. arXiv preprint arXiv:1805.04755, 2018.
Scholbeck, C. A., Molnar, C., Heumann, C., Bischl, B., and Casalicchio, G. Sampling, Intervention, Prediction, Aggregation: A Generalized Framework for Model-Agnostic Interpretations. arXiv preprint arXiv:1904.03959, 2019.
Examples
if (FALSE) {
#
# A projection pursuit regression example
#
# Load the sample data
data(mtcars)
# Fit a projection pursuit regression model
mtcars.ppr <- ppr(mpg ~ ., data = mtcars, nterms = 1)
# Compute variable importance scores using the FIRM method; note that the pdp
# package knows how to work with a "ppr" object, so there's no need to pass
# the training data or a prediction wrapper, but it's good practice.
vi_firm(mtcars.ppr, train = mtcars)
# For unsupported models, you need to define a prediction wrapper; this
# approach will work for ANY model (supported or unsupported), so it's better
# to just always define and pass one
pfun <- function(object, newdata) {
# To use partial dependence, this function needs to return the AVERAGE
# prediction (for ICE, simply omit the averaging step)
mean(predict(object, newdata = newdata))
}
# Equivalent to the previous results (but would work if this type of model
# was not explicitly supported)
vi_firm(mtcars.ppr, pred.fun = pfun, train = mtcars)
# Equivalent VI scores, but the output is sorted by default
vi(mtcars.ppr, method = "firm")
# Use MAD to estimate variability of the partial dependence values
vi_firm(mtcars.ppr, var_continuous = stats::mad)
# Plot VI scores
vip(mtcars.ppr, method = "firm", train = mtcars, pred.fun = pfun)
}