Display a histogram matrix for visually inspecting anomalous observations. The color of each block represents how anomalous that block is, with a lighter blue indicating a more anomalous block. The size of the points indicates which values are driving the anomaly, with larger points representing more anomalous values.

hmat(data, input = "data", top = 20, order = "numeric",
  block_length = NULL, level_limit = 50, level_keep = 10,
  partial_block = TRUE, na.rm = FALSE, min_var = 0.1, max_cor = 0.9,
  action = "exclude", output = "both", normalize = FALSE)

Arguments

data

the data set (data frame or matrix)

input

the type of input data being passed to the function: "data" for a raw categorical data set, "SV" for a state vector input, and "MD" if the input has already had the Mahalanobis distances calculated (default "data")

top

how many of the most anomalous blocks you would like to display (default 20)

order

whether to show the anomalous blocks in numeric order or in order of most anomalous to least anomalous (default is "numeric", other choice is "anomaly")

block_length

argument fed into tabulate_state_vector; required if input = "data" (default NULL)

level_limit

argument fed into tabulate_state_vector; if the number of unique categories for a variable exceeds this number, only a limited number of the most popular values are kept (default 50)

level_keep

argument fed into tabulate_state_vector; if level_limit is exceeded, keep this many of the most popular values (default 10)

partial_block

argument fed into tabulate_state_vector; if the number of entries is not divisible by block_length, this logical decides whether to keep the smaller last block (default TRUE)

na.rm

whether to keep track of missing values as part of the analysis or ignore them (default FALSE)

min_var

argument fed into mc_adjust; if a column in the state vector has variance less than this value, remove it (default 0.1)

max_cor

argument fed into mc_adjust; if a column in the state vector has correlation greater than this value, remove it (default 0.9)

action

argument fed into mc_adjust; determines what to do with a column that does not fall in the specified range (default "exclude")

output

argument fed into mahalanobis_distance that decides whether to add a column for the Mahalanobis distance ("MD"), the breakdown distances ("BD"), or both (default "both")

normalize

argument fed into mahalanobis_distance that decides whether to normalize the values by column (default FALSE)
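
As a rough illustration of how block_length and partial_block interact, the base R sketch below assigns rows to consecutive blocks. It is not the package's implementation: assign_blocks is a hypothetical helper written only for this example, and tabulate_state_vector may handle details differently.

```r
# Hypothetical helper (not part of the package): assign each of n rows
# to a consecutive block of block_length rows.
assign_blocks <- function(n, block_length, partial_block = TRUE) {
  block_id <- ceiling(seq_len(n) / block_length)
  if (!partial_block && n %% block_length != 0) {
    # Drop the rows that fall in the incomplete final block
    block_id <- block_id[block_id <= n %/% block_length]
  }
  block_id
}

# Rows per block when the smaller last block is kept
table(assign_blocks(10, block_length = 4))

# Rows per block when the smaller last block is dropped
table(assign_blocks(10, block_length = 4, partial_block = FALSE))
```

With 10 rows and block_length = 4, keeping the partial block leaves a final block of only 2 rows; dropping it discards those rows from the analysis.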

Examples

# NOT RUN {
# Data set input
hmat(security_logs, block_length = 8)

# Data set input with top 10 blocks displayed
hmat(security_logs, top = 10, block_length = 5)

# State Vector Input
tabulate_state_vector(security_logs, block_length = 6, level_limit = 20) %>%
  hmat(input = "SV")
# }
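
The top and order arguments can be pictured with a small base R sketch. It assumes a made-up state vector (the counts below are illustrative, not real data) and uses stats::mahalanobis rather than the package's mahalanobis_distance, so it only approximates what hmat does internally.

```r
# Toy state vector: 6 blocks (rows) by 2 count columns; values are made up.
sv <- matrix(c(5, 4, 6, 5, 4, 20,
               3, 4, 3, 5, 4, 3), ncol = 2)

# stats::mahalanobis() returns squared distances, so take the square root
md <- sqrt(mahalanobis(sv, center = colMeans(sv), cov = cov(sv)))

# Indices of the 3 most anomalous blocks, most anomalous first
# (comparable to top = 3, order = "anomaly")
top_blocks <- order(md, decreasing = TRUE)[1:3]
top_blocks

# The same blocks shown in numeric order
# (comparable to top = 3, order = "numeric")
sort(top_blocks)
```

Both orderings select the same blocks; only the sequence in which they would be displayed changes.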