Calculates the distance between each observation (row) in a data set and the mean vector of the data, for use in outlier detection. The resulting values do not depend on the relative scale of the variables.
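To make the scale-independence claim concrete, here is a minimal base R sketch of the same idea. It is not this function's implementation: `md_sketch` is a hypothetical helper built on `stats::mahalanobis()`, which returns squared distances, so the sketch takes the square root. Whether `mahalanobis_distance()` itself reports the squared or unsquared form is not stated on this page.

```r
# Hypothetical helper for illustration only; not part of the documented package.
md_sketch <- function(data) {
  m <- as.matrix(data)
  # stats::mahalanobis() returns squared distances, so take the square root
  sqrt(stats::mahalanobis(m, center = colMeans(m), cov = stats::cov(m)))
}

set.seed(1)
x <- data.frame(C1 = rnorm(100), C2 = rnorm(100), C3 = rnorm(100))
d1 <- md_sketch(x)

# Rescale one column by a large factor and recompute the distances
x_scaled <- transform(x, C2 = C2 * 1000)
d2 <- md_sketch(x_scaled)

# Compare the two sets of distances up to numerical tolerance
same <- isTRUE(all.equal(d1, d2))
```

Because the covariance matrix absorbs any per-column rescaling, the two distance vectors should agree up to floating-point error.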
    mahalanobis_distance(data, output = c("md", "bd", "both"), normalize = FALSE)

    # S3 method for matrix
    mahalanobis_distance(data, output = c("md", "bd", "both"), normalize = FALSE)

    # S3 method for data.frame
    mahalanobis_distance(data, output = c("md", "bd", "both"), normalize = FALSE)
Argument | Description |
---|---|
data | A matrix or data frame. Data frames will be converted to matrices. |
output | Character string specifying which distance metric(s) to compute. Current options include: "md" (Mahalanobis distances), "bd" (breakdown distances), and "both". |
normalize | Logical indicating whether to normalize the breakdown distances within each column (so that breakdown distances across columns can be compared). |
If output = "md", then a vector containing the Mahalanobis distances is returned. Otherwise, a matrix.
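One common way to turn such a distance vector into outlier flags is a chi-squared cutoff. The sketch below is an assumption-laden illustration, not something this page prescribes: it computes squared distances directly with `stats::mahalanobis()` so that it is self-contained, and it relies on the standard result that squared Mahalanobis distances of roughly multivariate normal data approximately follow a chi-squared distribution with as many degrees of freedom as there are columns. The 0.999 quantile is an arbitrary choice.

```r
set.seed(42)
x <- matrix(rnorm(300), ncol = 3)
x[1, ] <- c(8, -8, 8)  # plant one obvious outlier in row 1

# Squared Mahalanobis distances of each row from the column means
d2 <- stats::mahalanobis(x, center = colMeans(x), cov = stats::cov(x))

# Flag rows whose squared distance exceeds the chosen chi-squared quantile
cutoff <- stats::qchisq(0.999, df = ncol(x))
flagged <- which(d2 > cutoff)
```

If the distances you have are unsquared, square them before comparing against `cutoff`.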
W. Wang and R. Battiti, "Identifying Intrusions in Computer Networks with Principal Component Analysis," in First International Conference on Availability, Reliability and Security, 2006.
    library(magrittr)  # provides the %>% pipe used below

    # Simulate some data
    x <- data.frame(C1 = rnorm(100), C2 = rnorm(100), C3 = rnorm(100))

    # Add Mahalanobis distances
    x %>% dplyr::mutate(MD = mahalanobis_distance(x))

    # Add Mahalanobis and breakdown distances
    x %>% cbind(mahalanobis_distance(x, output = "both"))

    # Add Mahalanobis and normalized breakdown distances
    x %>% cbind(mahalanobis_distance(x, output = "both", normalize = TRUE))