Compute model-specific variable importance scores for the predictors in a model. (This function is meant for internal use only.)

    vi_model(object, ...)

    # S3 method for default
    vi_model(object, ...)

    # S3 method for C5.0
    vi_model(object, type = c("usage", "splits"), ...)

    # S3 method for train
    vi_model(object, ...)

    # S3 method for cubist
    vi_model(object, ...)

    # S3 method for earth
    vi_model(object, type = c("nsubsets", "rss", "gcv"), ...)

    # S3 method for gbm
    vi_model(object, type = c("relative.influence", "permutation"), ...)

    # S3 method for glmnet
    vi_model(object, ...)

    # S3 method for cv.glmnet
    vi_model(object, ...)

    # S3 method for H2OBinomialModel
    vi_model(object, ...)

    # S3 method for H2OMultinomialModel
    vi_model(object, ...)

    # S3 method for H2ORegressionModel
    vi_model(object, ...)

    # S3 method for nn
    vi_model(object, type = c("olden", "garson"), ...)

    # S3 method for nnet
    vi_model(object, type = c("olden", "garson"), ...)

    # S3 method for RandomForest
    vi_model(object, type = c("accuracy", "auc"), ...)

    # S3 method for constparty
    vi_model(object, ...)

    # S3 method for cforest
    vi_model(object, ...)

    # S3 method for mvr
    vi_model(object, ...)

    # S3 method for randomForest
    vi_model(object, ...)

    # S3 method for ranger
    vi_model(object, ...)

    # S3 method for rpart
    vi_model(object, ...)

    # S3 method for mlp
    vi_model(object, type = c("olden", "garson"), ...)

    # S3 method for ml_model_decision_tree_regression
    vi_model(object, ...)

    # S3 method for ml_model_decision_tree_classification
    vi_model(object, ...)

    # S3 method for ml_model_gbt_regression
    vi_model(object, ...)

    # S3 method for ml_model_gbt_classification
    vi_model(object, ...)

    # S3 method for ml_model_generalized_linear_regression
    vi_model(object, ...)

    # S3 method for ml_model_linear_regression
    vi_model(object, ...)

    # S3 method for ml_model_random_forest_regression
    vi_model(object, ...)

    # S3 method for ml_model_random_forest_classification
    vi_model(object, ...)

    # S3 method for lm
    vi_model(object, type = c("stat", "raw"), ...)

    # S3 method for xgb.Booster
    vi_model(object, type = c("gain", "cover", "frequency"), ...)

Argument | Description |
---|---|
`object` | A fitted model object. |
`...` | Additional optional arguments. |
`type` | Character string specifying the type of variable importance to return (only used for some models). See details for which methods this argument applies to. |

A tidy data frame (i.e., a `"tibble"` object) with two columns: `Variable` and `Importance`. For `"lm"/"glm"`-like objects, an additional column, called `Sign`, is also included; it gives the sign (i.e., POS/NEG) of the original coefficient.

Computes model-specific variable importance scores depending on the class of `object`:

`C5.0`

Variable importance is measured by determining the percentage of training set samples that fall into all the terminal nodes after the split. For example, the predictor in the first split automatically has an importance measurement of 100 percent since all samples are affected by this split. Other predictors may be used frequently in splits, but if the terminal nodes cover only a handful of training set samples, the importance scores may be close to zero. The same strategy is applied to rule-based models and boosted versions of the model. The underlying function can also return the number of times each predictor was involved in a split by using the option `metric = "splits"` (here, `type = "splits"`). See `C5imp` for details.

`cubist`

The Cubist output contains variable usage statistics. It gives the percentage of times each variable was used in a condition and/or a linear model. Note that this output will probably be inconsistent with the rules shown in the output from `summary.cubist`. At each split of the tree, Cubist saves a linear model (after feature selection) that is allowed to have terms for each variable used in the current split or any split above it. Quinlan (1992) discusses a smoothing algorithm where each model prediction is a linear combination of the parent and child models along the tree. As such, the final prediction is a function of all the linear models from the initial node to the terminal node. The percentages shown in the Cubist output reflect all the models involved in prediction (as opposed to the terminal models shown in the output). The variable importance used here is a linear combination of the usage in the rule conditions and the model. See `summary.cubist` and `varImp.cubist` for details.

`glmnet`

Similar to (generalized) linear models, the absolute values of the coefficients are returned for a specific model. It is important that the features (and hence, the estimated coefficients) be standardized prior to fitting the model. You can specify which coefficients to return by passing the specific value of the penalty parameter via the `...` argument. See `coef.glmnet` for details. By default, the coefficients corresponding to the final penalty value in the sequence are returned; in other words, you should ALWAYS SPECIFY THIS VALUE! For `"cv.glmnet"` objects, the largest value of lambda such that the error is within one standard error of the minimum is used by default. For `"multnet"` objects, the coefficients corresponding to the first class are used; that is, the first component of `coef.glmnet`.

`cforest`

Variable importance is measured in a way similar to that computed by `importance`. Besides the standard version, a conditional version is available that adjusts for correlations between predictor variables. If `conditional = TRUE`, the importance of each variable is computed by permuting within a grid defined by the predictors that are associated (with 1 - *p*-value greater than threshold) with the variable of interest. The resulting variable importance score is conditional in the sense of beta coefficients in regression models, but represents the effect of a variable in both main effects and interactions. See Strobl et al. (2008) for details. Note, however, that all random forest results are subject to random variation. Thus, before interpreting the importance ranking, check whether the same ranking is achieved with a different random seed, or otherwise increase the number of trees `ntree` in `ctree_control`. Note that in the presence of missing values in the predictor variables, the procedure described in Hapfelmeier et al. (2012) is performed. See `varimp` for details.

`earth`

The `earth` package uses three criteria for estimating the variable importance in a MARS model (see `evimp` for details):

The `nsubsets` criterion (`type = "nsubsets"`) counts the number of model subsets that include each feature. Variables that are included in more subsets are considered more important. This is the criterion used by `summary.earth` to print variable importance. By "subsets" we mean the subsets of terms generated by `earth()`'s backward pass. There is one subset for each model size (from one to the size of the selected model) and the subset is the best set of terms for that model size. (These subsets are specified in the `$prune.terms` component of `earth()`'s return value.) Only subsets that are smaller than or equal in size to the final model are used for estimating variable importance. This is the default method used by **vip**.

The `rss` criterion (`type = "rss"`) first calculates the decrease in the RSS for each subset relative to the previous subset during `earth()`'s backward pass. (For multiple response models, RSSs are calculated over all responses.) Then for each variable it sums these decreases over all subsets that include the variable. Finally, for ease of interpretation the summed decreases are scaled so the largest summed decrease is 100. Variables which cause larger net decreases in the RSS are considered more important.

The `gcv` criterion (`type = "gcv"`) is similar to the `rss` approach, but uses the GCV statistic instead of the RSS. Note that adding a variable can sometimes increase the GCV. (Adding the variable has a deleterious effect on the model, as measured in terms of its estimated predictive power on unseen data.) If that happens often enough, the variable can have a negative total importance, and thus appear less important than unused variables.
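The `nsubsets` and `rss` criteria described above can be sketched in a few lines of base R. This is only an illustration of the arithmetic, not the code used by `earth` or **vip**: the `subsets` list and the `rss` vector are made-up values standing in for the backward-pass output, and the alignment of each decrease with the larger of two consecutive subsets is an assumption.

```r
# Hypothetical backward-pass summary (values invented for illustration):
# subsets[[i]] holds the variables used by the best model of size i,
# and rss[i] is that model's residual sum of squares.
subsets <- list(
  character(0),           # size 1: intercept only
  c("x1"),                # size 2
  c("x1", "x2"),          # size 3
  c("x1", "x2", "x3")     # size 4 (the selected model)
)
rss  <- c(100, 60, 45, 42)
vars <- c("x1", "x2", "x3")

# Logical matrix: does subset i (row) include variable v (column)?
uses <- sapply(vars, function(v) sapply(subsets, function(s) v %in% s))

# nsubsets criterion: number of subsets that include each variable
nsubsets <- colSums(uses)

# rss criterion: decrease in RSS of each subset relative to the previous one,
# summed over the subsets that include the variable, then scaled to max 100
decrease   <- c(0, -diff(rss))
rss_raw    <- colSums(uses * decrease)
rss_scaled <- 100 * rss_raw / max(rss_raw)

nsubsets
round(rss_scaled, 1)
```

With these made-up numbers, `x1` appears in three subsets and accounts for the largest summed decrease, so both criteria rank it first.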

`gbm`

Variable importance is computed using one of two approaches (see `summary.gbm` for details):

The standard approach (`type = "relative.influence"`) described in Friedman (2001). When `distribution = "gaussian"` this returns the reduction of squared error attributable to each variable. For other loss functions this returns the reduction attributable to each variable in the sum of squared error in predicting the gradient on each iteration. It describes the *relative influence* of each variable in reducing the loss function. This is the default method used by **vip**.

An experimental permutation-based approach (`type = "permutation"`). This method randomly permutes one predictor variable at a time and computes the associated reduction in predictive performance. This is similar to the variable importance measures Leo Breiman uses for random forests, but **gbm** currently computes it using the entire training dataset (not the out-of-bag observations).
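The permutation idea can be illustrated with a minimal base R sketch. To keep it self-contained, it uses an ordinary `lm` fit on the built-in `mtcars` data instead of a **gbm** model, and mean squared error as the performance measure; both choices are assumptions made for illustration and do not reproduce what `summary.gbm` does internally.

```r
set.seed(101)  # permutations are random, so fix the seed for reproducibility

# Stand-in model (an assumption): a linear model rather than a gbm fit
predictors <- c("wt", "hp", "qsec")
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Mean squared error of the fitted model on a given data set
mse <- function(model, data) {
  mean((data$mpg - predict(model, newdata = data))^2)
}
baseline <- mse(fit, mtcars)

# Permute one predictor at a time (on the full training data) and record
# how much the error increases relative to the unpermuted baseline
perm_importance <- sapply(predictors, function(v) {
  permuted <- mtcars
  permuted[[v]] <- sample(permuted[[v]])
  mse(fit, permuted) - baseline
})

sort(perm_importance, decreasing = TRUE)
```

A larger increase in error after permuting a predictor suggests the model relies on it more heavily. Because the scores depend on the random permutation, repeating the procedure with different seeds gives a sense of their variability.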

`H2OModel`

See `h2o.varimp` or visit http://docs.h2o.ai/h2o/latest-stable/h2o-docs/variable-importance.html for details.

`nnet`

Two popular methods for constructing variable importance scores with neural networks are the Garson algorithm (Garson 1991), later modified by Goh (1995), and the Olden algorithm (Olden et al. 2004). For both algorithms, the basis of these importance scores is the network's connection weights. The Garson algorithm determines variable importance by identifying all weighted connections between the nodes of interest. Olden's algorithm, on the other hand, uses the product of the raw connection weights between each input and output neuron and sums the product across all hidden neurons. This has been shown to outperform the Garson method in various simulations. For DNNs, a similar method due to Gedeon (1997) considers the weights connecting the input features to the first two hidden layers (for simplicity and speed), but this method can be slow for large networks. To implement the Olden and Garson algorithms, use `type = "olden"` and `type = "garson"`, respectively. See `garson` and `olden` for details.

`lm`

In (generalized) linear models, variable importance is typically based on the absolute value of the corresponding *t*-statistics. For such models, the sign of the original coefficient is also returned. By default, `type = "stat"` is used; however, if the inputs have been appropriately standardized then the raw coefficients can be used with `type = "raw"`.

`ml_feature_importances`

The Spark ML library provides standard variable importance for tree-based methods (e.g., random forests). See `ml_feature_importances` for details.

`randomForest`

Random forests typically provide two measures of variable importance. The first measure is computed from permuting out-of-bag (OOB) data: for each tree, the prediction error on the OOB portion of the data is recorded (error rate for classification and MSE for regression). Then the same is done after permuting each predictor variable. The differences between the two are then averaged over all trees in the forest, and normalized by the standard deviation of the differences. If the standard deviation of the differences is equal to 0 for a variable, the division is not done (but the average is almost always equal to 0 in that case). See `importance` for details, including additional arguments that can be passed via the `...` argument.

The second measure is the total decrease in node impurities from splitting on the variable, averaged over all trees. For classification, the node impurity is measured by the Gini index. For regression, it is measured by the residual sum of squares. See `importance` for details.

`cforest`

Same approach described in `cforest` above. See `varimp` and `varimpAUC` (if `type = "auc"`) for details.

`ranger`

Variable importance for `ranger` objects is computed in the usual way for random forests. The approach used depends on the `importance` argument provided in the initial call to `ranger`. See `importance` for details.

`rpart`

As stated in one of the **rpart** vignettes: A variable may appear in the tree many times, either as a primary or a surrogate variable. An overall measure of variable importance is the sum of the goodness of split measures for each split for which it was the primary variable, plus "goodness" * (adjusted agreement) for all splits in which it was a surrogate. Imagine two variables which were essentially duplicates of each other; if we did not count surrogates, they would split the importance with neither showing up as strongly as it should. See `rpart` for details.

`train`

Various model-specific and model-agnostic approaches that depend on the learning algorithm employed in the original call to `train`. See `varImp` for details.

`xgboost`

For linear models, the variable importance is the absolute magnitude of the estimated coefficients. For that reason, in order to obtain a meaningful ranking by importance for a linear model, the features need to be on the same scale (which you also would want to do when using either L1 or L2 regularization). Otherwise, the approach described in Friedman (2001) for `gbm`s is used. If `type = NULL` (the default), `"Gain"` is used. See `xgb.importance` for details.
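The point about scaling can be made concrete with a small base R sketch. It uses a plain `lm` fit on the built-in `mtcars` data as a stand-in for a linear booster (an assumption made for illustration), standardizes the predictors, and ranks them by the absolute value of their coefficients. The column names mirror the `Variable`, `Importance`, and `Sign` columns described in the return value, but the code is not taken from **vip** or **xgboost**.

```r
predictors <- c("wt", "hp", "qsec")

# Put all predictors on the same scale (mean 0, standard deviation 1)
scaled <- mtcars
scaled[predictors] <- as.data.frame(scale(mtcars[predictors]))

# Stand-in linear model (an assumption): ordinary least squares
fit   <- lm(mpg ~ wt + hp + qsec, data = scaled)
coefs <- unname(coef(fit)[predictors])

# Importance is the absolute coefficient; the sign is kept separately
vi <- data.frame(
  Variable   = predictors,
  Importance = abs(coefs),
  Sign       = ifelse(coefs > 0, "POS", "NEG"),
  stringsAsFactors = FALSE
)

vi[order(vi$Importance, decreasing = TRUE), ]
```

Without the `scale()` step, the coefficients would be expressed in the original units of each predictor, so their magnitudes would not be directly comparable.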

Inspired by the `varImp` function.
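As a rough illustration of the Olden calculation described in the `nnet` entry above, the sketch below computes, for each input, the sum over hidden neurons of the product of the input-to-hidden and hidden-to-output weights. The weights are invented, the network is assumed to have a single hidden layer and a single output, and bias terms are ignored; it is not the implementation behind `type = "olden"`.

```r
# Hypothetical weights: 3 inputs, 2 hidden neurons, 1 output (values invented)
W_in <- matrix(
  c( 0.5, -1.0,
     2.0,  0.5,
    -1.5,  1.0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("x1", "x2", "x3"), c("h1", "h2"))
)
w_out <- c(h1 = 2, h2 = -1)  # hidden-to-output weights

# Olden-style score: for each input, multiply its weight into each hidden
# neuron by that neuron's output weight, then sum over the hidden neurons
olden <- drop(W_in %*% w_out)

olden
```

Unlike scores built from absolute values, these scores keep their sign, so they indicate the direction as well as the magnitude of each input's contribution.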