Calculates the Mahalanobis distance between each observation in a data set and the mean vector of the data, for use in outlier detection. The resulting distances do not depend on the scale of the individual variables.

mahalanobis_distance(data, output = c("md", "bd", "both"),
  normalize = FALSE)

# S3 method for matrix
mahalanobis_distance(data, output = c("md", "bd", "both"),
  normalize = FALSE)

# S3 method for data.frame
mahalanobis_distance(data, output = c("md", "bd",
  "both"), normalize = FALSE)

Arguments

data

A matrix or data frame. Data frames will be converted to matrices via data.matrix.

output

Character string specifying which distance metric(s) to compute. Current options include: "md" for Mahalanobis distance (default); "bd" for absolute breakdown distance (used to see which columns drive the Mahalanobis distance); and "both" to return both distance metrics.

normalize

Logical indicating whether or not to normalize the breakdown distances within each column (so that breakdown distances across columns can be compared).

Value

If output = "md", a vector containing the Mahalanobis distances is returned. Otherwise, a matrix is returned.
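
The scale independence mentioned in the description can be checked with base R alone. The following is a minimal sketch, not this function's implementation: it assumes the distance is taken from the column means using the sample covariance, and it relies on stats::mahalanobis, which returns squared distances. The object names (x, x_scaled, md_sq, md_sq_scaled) are illustrative only.

```r
# Simulated data; all object names here are illustrative
set.seed(1)
x <- matrix(rnorm(300), ncol = 3)

# Squared Mahalanobis distance of each row from the column means,
# using the sample covariance matrix (base R, stats package)
md_sq <- stats::mahalanobis(x, center = colMeans(x), cov = stats::cov(x))

# Rescale one column, as if its unit of measurement had changed
x_scaled <- x
x_scaled[, 1] <- x_scaled[, 1] * 1000

# Recompute with the mean and covariance of the rescaled data
md_sq_scaled <- stats::mahalanobis(
  x_scaled,
  center = colMeans(x_scaled),
  cov = stats::cov(x_scaled)
)

# Compare the two sets of distances (up to floating-point tolerance)
all.equal(md_sq, md_sq_scaled)
```

Because stats::mahalanobis works on the squared scale, take sqrt() of its result if unsquared distances are needed. This page does not state which scale mahalanobis_distance itself uses, so compare the two before relying on either.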

References

W. Wang and R. Battiti, "Identifying Intrusions in Computer Networks with Principal Component Analysis," in First International Conference on Availability, Reliability and Security, 2006.

Examples

# NOT RUN {
# Attach the pipe operator (%>%) in case no loaded package already provides it
library(magrittr)

# Simulate some data
x <- data.frame(C1 = rnorm(100), C2 = rnorm(100), C3 = rnorm(100))

# Add Mahalanobis distances
x %>% dplyr::mutate(MD = mahalanobis_distance(x))

# Add Mahalanobis and breakdown distances
x %>% cbind(mahalanobis_distance(x, output = "both"))

# Add Mahalanobis and normalized breakdown distances
x %>% cbind(mahalanobis_distance(x, output = "both", normalize = TRUE))
# }