Compute variable importance scores for the predictors in a model.

vi(object, ...)

# S3 method for default
vi(
  object,
  method = c("model", "pdp", "ice", "permute", "shap"),
  feature_names = NULL,
  FUN = NULL,
  var_fun = NULL,
  abbreviate_feature_names = NULL,
  sort = TRUE,
  decreasing = TRUE,
  scale = FALSE,
  rank = FALSE,
  ...
)

# S3 method for model_fit
vi(object, ...)

Arguments

object

A fitted model object (e.g., a "randomForest" object) or an object that inherits from class "vi".

...

Additional optional arguments to be passed on to vi_model, vi_pdp, vi_ice, or vi_permute.

method

Character string specifying the type of variable importance (VI) to compute. Current options are "model" (the default), for model-specific VI scores (see vi_model for details), "pdp", for PDP-based VI scores (see vi_pdp for details), "ice", for ICE-based VI scores (see vi_ice for details), "permute", for permutation-based VI scores (see vi_permute for details), or "shap", for SHAP-based VI scores. For more details on the PDP/ICE-based methods, see the reference below.

feature_names

Character string giving the names of the predictor variables (i.e., features) of interest.

FUN

Deprecated. Use var_fun instead.

var_fun

List with two components, "cat" and "con", containing the functions to use to quantify the variability of the feature effects (e.g., partial dependence values) for categorical and continuous features, respectively. If NULL, the standard deviation is used for continuous features. For categorical features, the range statistic is used (i.e., (max - min) / 4). Only used when method = "pdp" or method = "ice".
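As a rough illustration of the defaults described above, the sketch below applies the two statistics to made-up effect values using base R only. The helper names default_con and default_cat and the numeric values are hypothetical and are not part of the package; the list at the end only shows the shape that var_fun expects.

```r
# Hypothetical helpers mirroring the documented defaults (not package code)
default_con <- function(x) sd(x)               # continuous: standard deviation
default_cat <- function(x) diff(range(x)) / 4  # categorical: (max - min) / 4

# Made-up partial dependence values, for illustration only
pd_continuous  <- c(18.2, 19.5, 21.0, 23.4, 24.1)
pd_categorical <- c(20.1, 22.7, 25.3)

default_con(pd_continuous)
default_cat(pd_categorical)

# A custom list in the shape var_fun expects: MAD for continuous features,
# the range statistic for categorical features
custom_var_fun <- list("con" = mad, "cat" = default_cat)
custom_var_fun$con(pd_continuous)
```

Swapping in a robust statistic such as mad for "con" may be useful when a few extreme effect values would otherwise dominate the standard deviation.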

abbreviate_feature_names

Integer specifying the length at which to abbreviate feature names. Default is NULL, which results in no abbreviation (i.e., the full name of each feature will be printed).

sort

Logical indicating whether or not to sort the variable importance scores. Default is TRUE.

decreasing

Logical indicating whether or not the variable importance scores should be sorted in descending (TRUE) or ascending (FALSE) order of importance. Default is TRUE.

scale

Logical indicating whether or not to scale the variable importance scores so that the largest is 100. Default is FALSE.
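A minimal base R sketch of what "scale so that the largest is 100" amounts to, assuming a simple division by the maximum; the exact computation used internally is not shown on this page, and the scores below are illustrative only.

```r
# Illustrative importance scores (not computed here)
imp <- c(wt = 3.44, hp = 2.57, gear = 1.85)

# Assumed rescaling: divide by the largest score, then multiply by 100
scaled <- imp / max(imp) * 100
scaled
```

Under this assumption the ordering of the features is unchanged; only the units of the scores differ.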

rank

Logical indicating whether or not to rank the variable importance scores (i.e., convert to integer ranks). Default is FALSE. Potentially useful when comparing variable importance scores across different models using different methods.
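To see why ranks can help when comparing methods whose scores are on different scales, consider the base R sketch below. The two score vectors are made up, and both the direction of the ranking (1 = most important) and the tie handling are assumptions rather than documented behavior.

```r
# Made-up scores from two hypothetical methods on very different scales
imp_a <- c(wt = 3.44, hp = 2.57, gear = 1.85)
imp_b <- c(wt = 0.52, hp = 0.61, gear = 0.10)

# Assumed convention: rank 1 is the most important feature
rank_a <- rank(-imp_a, ties.method = "min")
rank_b <- rank(-imp_b, ties.method = "min")

# Side-by-side comparison of the integer ranks
data.frame(Variable = names(imp_a), rank_a = rank_a, rank_b = rank_b)
```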

Value

A tidy data frame (i.e., a "tibble" object) with at least two columns: Variable and Importance. For "lm"/"glm"-like objects, an additional column, Sign, gives the sign (i.e., POS/NEG) of the original coefficient. If method = "permute" and nsim > 1, then an additional column, StDev, gives the standard deviation of the permutation-based variable importance scores.

References

Greenwell, B. M., Boehmke, B. C., and McCarthy, A. J. A Simple and Effective Model-Based Variable Importance Measure. arXiv preprint arXiv:1805.04755 (2018).

Examples

#
# A projection pursuit regression example
#

# Load the sample data
data(mtcars)

# Fit a projection pursuit regression model
mtcars.ppr <- ppr(mpg ~ ., data = mtcars, nterms = 1)

# Compute variable importance scores
vi(mtcars.ppr, method = "ice")
#> # A tibble: 10 x 2
#>    Variable Importance
#>    <chr>         <dbl>
#>  1 wt           3.44
#>  2 hp           2.57
#>  3 gear         1.85
#>  4 qsec         1.56
#>  5 cyl          0.743
#>  6 am           0.690
#>  7 vs           0.448
#>  8 drat         0.245
#>  9 carb         0.0870
#> 10 disp         0.0248

vi(mtcars.ppr, method = "ice",
   var_fun = list("con" = mad, "cat" = function(x) diff(range(x)) / 4))
#> # A tibble: 10 x 2
#>    Variable Importance
#>    <chr>         <dbl>
#>  1 wt           3.87
#>  2 hp           2.85
#>  3 gear         2.14
#>  4 qsec         1.71
#>  5 cyl          0.949
#>  6 am           0.723
#>  7 vs           0.470
#>  8 drat         0.297
#>  9 carb         0.102
#> 10 disp         0.0317

# Plot variable importance scores
vip(mtcars.ppr, method = "ice")