tabulate_state_vector transforms security log data into counts of unique data attributes per time block. Taking a contingency table approach, the function separates each variable of type character or factor into its unique levels and counts the occurrences of each level within each block. Because some variables, such as IP addresses, can have a very large number of unique levels, the user can choose how many levels to investigate; for those variables, only the most popular levels are tabulated.

tabulate_state_vector(data, block_length, level_limit = 50L,
  level_keep = 10L, partial_block = FALSE, na.rm = FALSE)

Arguments

data

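security log data to tabulate; variables of type character or factor are separated into their levels and counted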

block_length

integer number of log entries per block; the data is divided into consecutive blocks of this length

level_limit

integer threshold on the number of unique levels in a column; a column with more levels than this is reduced to its level_keep most popular levels (default is 50)

level_keep

integer number of most popular factor levels to retain when a column has more than level_limit levels (default is 10)

partial_block

a logical which determines whether incomplete blocks are kept in the analysis when the number of log entries isn't evenly divisible by block_length (default is FALSE)

na.rm

a logical indicating whether to ignore missing values or keep track of them as part of the analysis (default is FALSE)
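The interplay of block_length, partial_block, level_limit and level_keep can be illustrated with a minimal base R sketch. The helpers block_ids and levels_to_tabulate are hypothetical names introduced only for this illustration; they are not part of the package, and the actual implementation may differ (for example in how ties between equally popular levels are broken).

```r
# Hypothetical helper (not part of the package): assign each log entry to a
# block of `block_length` consecutive rows. When `partial_block` is FALSE,
# rows in a trailing incomplete block get NA and would be dropped.
block_ids <- function(n_rows, block_length, partial_block = FALSE) {
  ids <- (seq_len(n_rows) - 1L) %/% block_length + 1L
  if (!partial_block) {
    n_full <- n_rows %/% block_length
    ids[ids > n_full] <- NA
  }
  ids
}

# Hypothetical helper (not part of the package): return the levels of a
# column that would be tabulated. If the column has more than `level_limit`
# unique levels, only the `level_keep` most frequent ones are returned.
levels_to_tabulate <- function(x, level_limit = 50L, level_keep = 10L) {
  counts <- sort(table(as.character(x)), decreasing = TRUE)
  if (length(counts) > level_limit) {
    counts <- counts[seq_len(min(level_keep, length(counts)))]
  }
  names(counts)
}

# Seven entries in blocks of three: the seventh entry forms a partial block.
ids_full_only    <- block_ids(7, 3)
ids_with_partial <- block_ids(7, 3, partial_block = TRUE)

# A column with three levels and a limit of two keeps its two top levels.
ip <- c("a", "a", "a", "b", "b", "c")
kept <- levels_to_tabulate(ip, level_limit = 2L, level_keep = 2L)
```

Under these assumptions, raising level_limit above the number of unique levels leaves a column untouched, while partial_block = TRUE adds one extra, shorter block at the end.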

Value

A data frame where each row represents one block and each column counts the occurrences of one character/factor level within that block

Examples

tabulate_state_vector(security_logs, 30)
#> Some variables contain more than 50 levels. Only the 10 most popular levels of these variables will be tabulated.
#> # A tibble: 10 x 54
#>      ASA Attempt Bytes_TRF_102 Bytes_TRF_120 Bytes_TRF_160 Bytes_TRF_200
#>    <dbl>   <dbl>         <dbl>         <dbl>         <dbl>         <dbl>
#>  1     4      12             1             1             2             1
#>  2    10       7             0             1             0             0
#>  3     5      13             1             1             0             1
#>  4     6      11             1             0             0             2
#>  5     5      12             0             1             0             0
#>  6     6      12             0             0             0             0
#>  7     4      11             0             0             0             0
#>  8     5      10             0             0             2             0
#>  9     3       7             1             0             0             0
#> 10     1       9             0             1             0             0
#> # … with 48 more variables: Bytes_TRF_208 <dbl>, Bytes_TRF_60 <dbl>,
#> #   Bytes_TRF_64 <dbl>, Bytes_TRF_70 <dbl>, Bytes_TRF_72 <dbl>,
#> #   Bytes_TRF_80 <dbl>, Bytes_TRF_90 <dbl>, China <dbl>, CISCO <dbl>,
#> #   Dst_IP_145.114.4.203 <dbl>, Dst_IP_151.194.233.198 <dbl>,
#> #   Dst_IP_219.142.109.8 <dbl>, Dst_IP_32.73.26.223 <dbl>,
#> #   Dst_IP_56.137.121.203 <dbl>, Dst_Port_20000 <dbl>, Dst_Port_25 <dbl>,
#> #   Dst_Port_593 <dbl>, Dst_Port_80 <dbl>, Dst_Port_90 <dbl>, ePO <dbl>,
#> #   Failure <dbl>, Firewall <dbl>, IBM <dbl>, India <dbl>, Juniper <dbl>,
#> #   Korea <dbl>, McAfee <dbl>, Netherlands <dbl>, NSP <dbl>, `Palo Alto
#> #   Networks` <dbl>, Russia <dbl>, SNIPS <dbl>, Src_IP_174.110.206.174 <dbl>,
#> #   Src_IP_223.70.128.61 <dbl>, Src_IP_227.12.127.87 <dbl>,
#> #   Src_IP_28.9.24.154 <dbl>, Src_IP_89.130.69.91 <dbl>, Src_Port_113 <dbl>,
#> #   Src_Port_135 <dbl>, Src_Port_21 <dbl>, Src_Port_25 <dbl>,
#> #   Src_Port_80 <dbl>, SRX <dbl>, Success <dbl>, TCP <dbl>, UDP <dbl>, `United
#> #   Kingdom` <dbl>, US <dbl>
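
To make the contingency table approach concrete, the following base R sketch builds the same kind of block-by-level count table on a small toy data set. The data, column names and values are made up for illustration, and the code only approximates what tabulate_state_vector might do internally; it is not the package's implementation.

```r
# Toy log data; columns and values are invented for this illustration.
logs <- data.frame(
  Protocol = c("TCP", "UDP", "TCP", "TCP", "UDP", "UDP"),
  Result   = c("Success", "Failure", "Success", "Failure", "Failure", "Success"),
  stringsAsFactors = FALSE
)
block_length <- 3L

# Assign each row to a block of `block_length` consecutive log entries.
block <- (seq_len(nrow(logs)) - 1L) %/% block_length + 1L

# One contingency table per column: rows are blocks, columns are levels.
per_column <- lapply(logs, function(col) unclass(table(block, col)))

# Bind the per-column tables side by side into one block-by-level matrix.
counts <- do.call(cbind, unname(per_column))
counts
```

Each row of counts corresponds to one block and each column to one level, mirroring the shape of the tibble shown above. Because every log entry contributes exactly one level per column, each row here sums to block_length times the number of columns.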