Project Overview

The vip (Variable Importance Plots) package is a comprehensive R framework for constructing variable importance plots from various machine learning models. It provides both model-specific and model-agnostic approaches to feature importance, serving as a critical tool for interpretable machine learning (IML).

Key Features

  • Unified Interface: Single API (vi() and vip()) for 40+ different ML model types
  • Multiple VI Methods: Model-specific, permutation-based, SHAP-based, and variance-based approaches
  • Extensive Model Support: Integration with major R ML ecosystems (tidymodels, caret, mlr, etc.)
  • Professional Visualization: ggplot2-based plotting with customizable aesthetics
  • Academic Rigor: Peer-reviewed methodology published in The R Journal (2020)
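
To make the permutation-based approach listed above concrete, here is a minimal base-R sketch. It is not vip's implementation: the model, the `rmse()` helper, and the choice of `mtcars` are assumptions made for illustration. The idea is to shuffle one feature at a time and record how much the prediction error grows.

```r
# Base-R sketch of permutation importance; not vip's implementation.
set.seed(101)  # shuffling is random, so fix the seed for reproducibility

fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Root mean squared error of the model on a data frame (illustrative helper)
rmse <- function(model, data) {
  sqrt(mean((data$mpg - predict(model, newdata = data))^2))
}

baseline <- rmse(fit, mtcars)
features <- c("wt", "hp", "qsec")

# For each feature: shuffle its column, re-score, and record the error increase
perm_importance <- vapply(features, function(feature) {
  shuffled <- mtcars
  shuffled[[feature]] <- sample(shuffled[[feature]])
  rmse(fit, shuffled) - baseline
}, numeric(1))

# Arrange the scores in the Variable/Importance layout used throughout this guide
vi_scores <- data.frame(
  Variable = names(perm_importance),
  Importance = unname(perm_importance),
  stringsAsFactors = FALSE
)
vi_scores[order(vi_scores$Importance, decreasing = TRUE), ]
```

A single shuffle per feature is noisy; averaging over several repetitions gives more stable scores.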

Architecture

vip/
├── R/                          # Source code (12 files, ~3,735 lines)
│   ├── vi.R                   # Main VI computation interface
│   ├── vip.R                  # Main plotting interface
│   ├── vi_model.R             # Model-specific methods (42 S3 methods)
│   ├── vi_permute.R           # Permutation-based importance
│   ├── vi_shap.R              # Shapley-based importance
│   ├── vi_firm.R              # Variance-based importance
│   └── ...                    # Utilities and helpers
├── inst/tinytest/             # Test suite (28 files, ~1,581 lines)
├── man/                       # Documentation (11 .Rd files)
├── vignettes/                 # Package vignette
└── data/                      # Example datasets
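
The variance-based importance that `vi_firm.R` covers can be illustrated with a small base-R sketch. This is one common formulation, assumed here for illustration rather than taken from the package source: a feature's score is the standard deviation of its partial dependence curve. The `partial_dependence()` helper and the grid size are hypothetical.

```r
# Base-R sketch of a variance-based importance score; not vip's implementation.
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
features <- c("wt", "hp", "qsec")

# Average prediction when `feature` is fixed at each value of an even grid
partial_dependence <- function(model, data, feature, grid_size = 10) {
  grid <- seq(min(data[[feature]]), max(data[[feature]]), length.out = grid_size)
  vapply(grid, function(value) {
    modified <- data
    modified[[feature]] <- value
    mean(predict(model, newdata = modified))
  }, numeric(1))
}

# A flat curve (small standard deviation) means the feature has little effect
firm_scores <- vapply(features, function(feature) {
  sd(partial_dependence(fit, mtcars, feature))
}, numeric(1))

firm_scores
```

For a linear model the curve is a straight line, so each score reduces to the absolute coefficient times the spread of the grid, which makes the sketch easy to sanity-check.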

Development Workflow

Test-Driven Development (TDD) Framework

This project follows strict test-driven development using the tinytest framework, chosen for its:

  • Zero dependencies: Lightweight testing without external deps
  • CRAN compatibility: Seamless integration with R package ecosystem
  • Conditional testing: Graceful handling of optional dependencies
  • Clear output: Simple, readable test results

TDD Cycle for vip Development

  1. Write Tests First

    # Example: Adding new model support
    # File: inst/tinytest/test_pkg_newmodel.R
    
    # Check dependencies first
    exit_if_not(requireNamespace("newmodel", quietly = TRUE))
    
    # Load test data
    data("test_dataset")
    
    # Fit model
    model <- newmodel::fit_model(target ~ ., data = test_dataset)
    
    # Test vi() method
    vi_scores <- vi(model)
    expect_inherits(vi_scores, c("vi", "tbl_df", "tbl", "data.frame"))
    expect_equal(nrow(vi_scores), ncol(test_dataset) - 1L)
    expect_true(all(c("Variable", "Importance") %in% names(vi_scores)))

  2. Implement S3 Methods

    # File: R/vi_model.R
    #' @export
    vi_model.newmodel <- function(object, ...) {
      # Extract importance scores
      importance <- newmodel::variable_importance(object)
    
      # Convert to standard format
      tibble::tibble(
        Variable = names(importance),
        Importance = as.numeric(importance)
      )
    }

  3. Run Tests and Refactor

    # Run specific tests
    tinytest::run_test_file("inst/tinytest/test_pkg_newmodel.R")
    
    # Run full suite
    tinytest::test_package("vip")

Core Testing Patterns

1. Conditional Testing for Optional Dependencies

# Always check dependencies before running tests
exit_if_not(
  requireNamespace("randomForest", quietly = TRUE),
  requireNamespace("pdp", quietly = TRUE)
)
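
The condition guarding a test file boils down to "can every optional package be loaded?". The sketch below shows that check on its own in base R. The helper name `all_installed()` is hypothetical, and the second call uses a made-up package name that is assumed not to exist.

```r
# TRUE only if every named package can be loaded (illustrative helper)
all_installed <- function(pkgs) {
  all(vapply(pkgs, function(pkg) {
    requireNamespace(pkg, quietly = TRUE)
  }, logical(1)))
}

# Packages shipped with R are always available
all_installed(c("stats", "utils"))

# One missing package is enough to make the whole check fail
all_installed(c("stats", "notARealPackage12345"))
```

Because `requireNamespace()` returns `FALSE` instead of raising an error, the check is safe to run on machines where the optional packages are absent.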

2. Standardized Test Structure

# Standard expectation function for VI objects
expectations <- function(object, n_features) {
  # Check class
  expect_identical(class(object), 
                   target = c("vi", "tbl_df", "tbl", "data.frame"))
  
  # Check dimensions
  expect_identical(n_features, target = nrow(object))
  
  # Check required columns
  expect_true(all(c("Variable", "Importance") %in% names(object)))
  
  # Check for valid importance scores (is.numeric() tests the whole column)
  expect_true(is.numeric(object$Importance))
  expect_true(all(is.finite(object$Importance)))
}

3. Model-Specific Test Patterns

# Pattern for testing model-specific implementations
test_model_vi <- function(model, expected_features) {
  # Test basic vi() call
  vi_result <- vi(model)
  expectations(vi_result, length(expected_features))
  
  # Test with different methods
  for (method in c("model", "permute", "shap", "firm")) {
    if (supports_method(model, method)) {
      vi_method <- vi(model, method = method)
      expectations(vi_method, length(expected_features))
    }
  }
  
  # Test vip() plotting
  p <- vip(model)
  expect_inherits(p, "ggplot")
}

R Best Practices Implementation

1. Package Structure

  • DESCRIPTION: Proper metadata, versioning, and dependency management
  • NAMESPACE: Clean exports using roxygen2 @export tags
  • Imports: Minimal dependencies (5 core imports)
  • S3 Methods: Consistent dispatch system for 40+ model types
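
The S3 dispatch idea behind the last bullet can be sketched with a toy generic. The names `toy_vi()`, `toy_vi.lm()`, and `toy_vi.default()` are hypothetical and stand in for the package's real generics. Using absolute t-statistics for linear models is an illustrative choice here.

```r
# Hypothetical generic; dispatches on the class of `object`
toy_vi <- function(object, ...) {
  UseMethod("toy_vi")
}

# Method for linear models: absolute t-statistics of the non-intercept terms
toy_vi.lm <- function(object, ...) {
  coefs <- summary(object)$coefficients
  coefs <- coefs[rownames(coefs) != "(Intercept)", , drop = FALSE]
  data.frame(
    Variable = rownames(coefs),
    Importance = abs(coefs[, "t value"]),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Fallback for classes without a dedicated method
toy_vi.default <- function(object, ...) {
  stop("No importance method for class '", class(object)[1L], "'", call. = FALSE)
}

fit <- lm(mpg ~ wt + hp, data = mtcars)
toy_vi(fit)
```

Supporting another model type then means adding one more `toy_vi.<class>()` method, with no change to the generic or to existing methods.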

2. Code Style and Documentation

# Roxygen2 documentation standard
#' Variable importance
#'
#' Compute variable importance scores for the predictors in a model.
#'
#' @param object A fitted model object
#' @param method Character string specifying VI type ("model", "permute", "shap", "firm")
#' @param feature_names Character vector of feature names to compute
#' @param sort Logical indicating whether to sort results
#' @param ... Additional arguments passed to specific methods
#'
#' @return A tibble with Variable and Importance columns
#'
#' @examples
#' \dontrun{
#' library(randomForest)
#' rf <- randomForest(Species ~ ., data = iris)
#' vi_scores <- vi(rf)
#' }
#'
#' @export
vi <- function(object, ...) {
  UseMethod("vi")
}

3. Error Handling and Validation

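# Input validation pattern
vi_validate_inputs <- function(object, method, ...) {
  # Look for a registered S3 predict() method on any class of 'object'.
  # (Checking is.function(predict) would not help: it is TRUE for any input.)
  has_predict <- any(vapply(class(object), function(cls) {
    !is.null(utils::getS3method("predict", cls, optional = TRUE))
  }, logical(1)))
  if (!inherits(object, "list") && !has_predict) {
    stop("'object' must be a fitted model with a predict method")
  }

  # Validate method: must be a single string from the supported set
  valid_methods <- c("model", "permute", "shap", "firm")
  if (!is.character(method) || length(method) != 1L || !method %in% valid_methods) {
    stop("'method' must be one of: ", paste(valid_methods, collapse = ", "))
  }
}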
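
For the method check specifically, base R's `match.arg()` is a compact alternative. The sketch below uses a hypothetical wrapper, `resolve_method()`, and is not taken from the package source.

```r
# The default vector doubles as the list of valid choices
resolve_method <- function(method = c("model", "permute", "shap", "firm")) {
  match.arg(method)
}

resolve_method()            # no argument: falls back to the first choice
resolve_method("permute")   # exact match
resolve_method("perm")      # partial matching is allowed by match.arg()

# An unsupported value raises an error listing the valid choices
result <- tryCatch(resolve_method("bogus"), error = function(e) conditionMessage(e))
```

One design consideration: `match.arg()` accepts unambiguous prefixes, which may or may not be desirable compared with the strict `%in%` comparison.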

4. Performance Considerations

# Efficient parallel processing with foreach
vi_permute_parallel <- function(object, train, metric, nsim, parallel = FALSE, ...) {
  # Pick the parallel or sequential operator once. Both are referenced through
  # the foreach namespace, so the function works without attaching foreach.
  `%op%` <- if (parallel && foreach::getDoParRegistered()) {
    foreach::`%dopar%`
  } else {
    foreach::`%do%`
  }

  # One permutation run per iteration; rows are stacked into a single result
  foreach::foreach(i = seq_len(nsim), .combine = rbind) %op% {
    compute_permutation_importance(object, train, metric)
  }
}

5. ggplot2 Compatibility (Important!)

With ggplot2’s transition to S7 classes, testing ggplot object classes requires updated approaches:

# ❌ DEPRECATED: Direct class testing (will break with ggplot2 S7)
expect_identical(class(p), c("gg", "ggplot"))

# ✅ RECOMMENDED: Use is_ggplot() for forward compatibility
expect_true(ggplot2::is_ggplot(p))

# Alternative approaches:
expect_true(inherits(p, "ggplot"))  # fallback option

Key Points:

  • Always use ggplot2::is_ggplot() rather than class() for ggplot objects in tests
  • This ensures compatibility with both current and future ggplot2 versions
  • Updated in vip 0.4.1 to address issue #162
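
Where tests may also run against an older ggplot2 that does not export `is_ggplot()`, a small wrapper can choose the check at run time. This is a sketch under that assumption. The helper name `is_ggplot_compat()` is hypothetical and not part of vip or ggplot2.

```r
# Use ggplot2::is_ggplot() when it is available; otherwise fall back to inherits()
is_ggplot_compat <- function(x) {
  has_is_ggplot <- requireNamespace("ggplot2", quietly = TRUE) &&
    "is_ggplot" %in% getNamespaceExports("ggplot2")
  if (has_is_ggplot) {
    ggplot2::is_ggplot(x)
  } else {
    inherits(x, "ggplot")
  }
}

# Non-plot objects are rejected on either path
is_ggplot_compat(1)
is_ggplot_compat(mtcars)
```

In a tinytest file this would be used as `expect_true(is_ggplot_compat(p))`.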

6. Documentation and README Style Guidelines

Sentence Case Requirements:

  • ALWAYS use sentence case for all headings, bullet points, and descriptions
  • Examples:
    • ✅ “Key features” (not “Key Features”)
    • ✅ “Model-specific variable importance” (not “Model-Specific Variable Importance”)
    • ✅ “Adding model support” (not “Adding Model Support”)

Emoji Usage Guidelines:

  • Section headers: Emojis are encouraged (🚀, ✨, 🛠️, etc.)
  • Tables and content: Use sparingly, only when they add clear value
  • Lists and bullets: Avoid emojis in favor of clean, readable text
  • General rule: When in doubt, leave it out

Examples:

# ✅ GOOD
## 🚀 Quick start
- **Universal interface**: Works with 40+ model types
- **Multiple methods**: Model-specific, permutation, SHAP

# ❌ AVOID
## 🚀 Quick Start  # Title case
- **🎯 Universal Interface**: Works with 40+ model types  # Emoji in bullet + title case
- **🔬 Multiple Methods**: Model-specific, permutation, SHAP  # Same issues

Development Commands

Essential R CMD Commands

# Build package (produces vip_<version>.tar.gz)
R CMD build .

# Check package (run against the tarball produced by the build step)
R CMD check --as-cran vip_*.tar.gz

# Install package
R CMD INSTALL .

# Generate documentation
Rscript -e "roxygen2::roxygenise()"

Testing Commands

# Run all tests
tinytest::test_package("vip")

# Run specific test file
tinytest::run_test_file("inst/tinytest/test_vi_firm.R")

# Test with coverage
covr::package_coverage()

# Test examples in documentation (shell command, run against the built tarball)
R CMD check --run-donttest --run-dontrun vip_*.tar.gz

Linting and Style

# Check code style
lintr::lint_package()

# Format code (if using styler)
styler::style_pkg()

Adding New Model Support

Step-by-Step TDD Process

  1. Create Test File

    # File: inst/tinytest/test_pkg_NEWMODEL.R
    exit_if_not(requireNamespace("NEWMODEL", quietly = TRUE))
    
    # Load test data and fit model
    data("test_data")  # or create synthetic data
    model <- NEWMODEL::fit_function(formula, data = test_data)
    
    # Define expectations
    expectations <- function(object) {
      expect_inherits(object, c("vi", "tbl_df", "tbl", "data.frame"))
      expect_equal(nrow(object), ncol(test_data) - 1L)
      expect_true(all(c("Variable", "Importance") %in% names(object)))
    }
    
    # Test vi_model method
    vi_result <- vi(model, method = "model")
    expectations(vi_result)
    
    # Test vip plotting
    p <- vip(model)
    expect_inherits(p, "ggplot")

  2. Implement S3 Method

    # File: R/vi_model.R
    #' @export
    vi_model.NEWMODEL <- function(object, type = NULL, ...) {
      # Extract variable importance
      imp <- NEWMODEL::importance_function(object, type = type)
    
      # Convert to standard tibble format
      tibble::tibble(
        Variable = names(imp),
        Importance = as.numeric(imp)
      )
    }

  3. Update Documentation

    • Add to vi_model.R details section
    • Update DESCRIPTION Enhances field if needed
    • Add example to package vignette

  4. Run Tests

    # Test new implementation
    tinytest::run_test_file("inst/tinytest/test_pkg_NEWMODEL.R")
    
    # Run full test suite
    tinytest::test_package("vip")
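
Step 1 mentions creating synthetic data when no suitable dataset exists. The following base-R sketch shows one way to do that. The generator `make_toy_data()`, its coefficients, and the column names are all assumptions chosen for illustration.

```r
# Hypothetical generator: y depends strongly on x1, weakly on x2, not at all on x3
make_toy_data <- function(n = 100, seed = 42) {
  set.seed(seed)  # note: this resets the session's random number stream
  x1 <- rnorm(n)
  x2 <- rnorm(n)
  x3 <- rnorm(n)
  y <- 3 * x1 + 1 * x2 + rnorm(n, sd = 0.5)
  data.frame(y = y, x1 = x1, x2 = x2, x3 = x3)
}

test_data <- make_toy_data()

# Any model with a formula interface can stand in for the new model type
fit <- lm(y ~ ., data = test_data)

# Number of predictors, matching the `ncol(test_data) - 1L` expectation above
n_features <- ncol(test_data) - 1L
```

Knowing in advance which features matter makes it easier to judge whether a new importance method returns plausible rankings.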

Quality Assurance Checklist

Before Committing

Before Release

Known Technical Debt

Current FIXME items to address:

  1. get_feature_names.R:58 - Component location verification
  2. vi_model.R:588,615,642 - Extra row handling in model outputs
  3. vi_permute.R:443 - Yardstick integration optimization

Resources


This guide ensures consistent, high-quality development following R package best practices and test-driven development methodology.