tabulate_state_vector
employs a tabulated vector approach to transform
security log data into counts of data attributes within time blocks.
Taking a contingency table approach, this function separates variables of type
character or factor into their unique levels and counts the number of occurrences
of each level within each block. Because security logs can contain a large number
of unique IP addresses, the user can limit how many are tabulated: when a column
has more than level_limit unique levels, only the level_keep most frequent levels
(for example, the most popular IP addresses) are retained.
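The contingency table idea described above can be sketched in base R. This is only an illustration of block-wise counting, not the package's implementation: the toy `logs` data frame, its column names, and the helper `count_levels` are invented for the example, and blocks are assumed to be runs of `block_length` consecutive rows.

```r
# Toy log data; column names and values are invented for illustration.
logs <- data.frame(
  Protocol = c("TCP", "UDP", "TCP", "TCP", "UDP", "TCP"),
  Result   = c("Success", "Failure", "Success", "Failure", "Failure", "Success"),
  stringsAsFactors = FALSE
)

block_length <- 3L

# Assumption: a block is a run of `block_length` consecutive rows.
block_id <- (seq_len(nrow(logs)) - 1L) %/% block_length + 1L

# Hypothetical helper: cross-tabulate block membership against one column,
# giving a matrix with one row per block and one column per level.
count_levels <- function(column) {
  unclass(table(block_id, column))
}

# Bind the per-column count matrices side by side.
state_vector <- as.data.frame(do.call(cbind, lapply(logs, count_levels)))
print(state_vector)
```

Each row of `state_vector` corresponds to one block and each column to one level, which mirrors the shape of the value the function returns.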
tabulate_state_vector(data, block_length, level_limit = 50L, level_keep = 10L, partial_block = FALSE, na.rm = FALSE)
Argument | Description |
---|---|
data | security log data to tabulate |
block_length | integer value giving the number of log entries in each block |
level_limit | integer cutoff for the number of unique levels in a column; a column with more levels than this is reduced to its top level_keep levels (default is 50) |
level_keep | integer value indicating the top number of factor levels to retain if a column has more than the level limit (default is 10) |
partial_block | a logical which determines whether incomplete blocks are kept in the analysis in the case where the number of log entries isn't evenly divisible by the block length (default is FALSE) |
na.rm | a logical which determines whether missing values are tracked as part of the analysis or ignored (default is FALSE) |
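As a rough illustration of how level_limit, level_keep, and partial_block interact, the sketch below reduces a high-cardinality column to its most frequent values and works out how many complete blocks a log yields. The helper `keep_top_levels` is hypothetical, and replacing the discarded values with NA is an assumption; the package may drop or group them differently.

```r
# Hypothetical helper, not the package's code: when a column has more than
# `level_limit` unique values, keep only the `level_keep` most frequent ones.
keep_top_levels <- function(x, level_limit = 50L, level_keep = 10L) {
  x <- as.character(x)
  counts <- sort(table(x), decreasing = TRUE)
  if (length(counts) <= level_limit) {
    return(x)
  }
  top <- names(counts)[seq_len(min(level_keep, length(counts)))]
  # Assumption: values outside the retained levels become NA.
  ifelse(x %in% top, x, NA_character_)
}

# Invented source addresses: four unique values, which exceeds level_limit = 3.
src_ip <- c("10.0.0.1", "10.0.0.1", "10.0.0.1",
            "10.0.0.2", "10.0.0.2", "10.0.0.3", "10.0.0.4")
reduced <- keep_top_levels(src_ip, level_limit = 3L, level_keep = 2L)
print(table(reduced, useNA = "ifany"))

# Block arithmetic: 7 entries with block_length = 3 give two complete blocks
# and one leftover entry that forms an incomplete block.
block_length <- 3L
complete_blocks <- length(src_ip) %/% block_length
leftover <- length(src_ip) %% block_length
```

Whether the leftover entries form an extra row in the result is what partial_block controls.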
A data frame in which each row represents one block and each column counts the number of occurrences of one character/factor level within that block
tabulate_state_vector(security_logs, 30)
#> # A tibble: 10 x 54
#>      ASA Attempt Bytes_TRF_102 Bytes_TRF_120 Bytes_TRF_160 Bytes_TRF_200
#>    <dbl>   <dbl>         <dbl>         <dbl>         <dbl>         <dbl>
#>  1     4      12             1             1             2             1
#>  2    10       7             0             1             0             0
#>  3     5      13             1             1             0             1
#>  4     6      11             1             0             0             2
#>  5     5      12             0             1             0             0
#>  6     6      12             0             0             0             0
#>  7     4      11             0             0             0             0
#>  8     5      10             0             0             2             0
#>  9     3       7             1             0             0             0
#> 10     1       9             0             1             0             0
#> # … with 48 more variables: Bytes_TRF_208 <dbl>, Bytes_TRF_60 <dbl>,
#> #   Bytes_TRF_64 <dbl>, Bytes_TRF_70 <dbl>, Bytes_TRF_72 <dbl>,
#> #   Bytes_TRF_80 <dbl>, Bytes_TRF_90 <dbl>, China <dbl>, CISCO <dbl>,
#> #   Dst_IP_145.114.4.203 <dbl>, Dst_IP_151.194.233.198 <dbl>,
#> #   Dst_IP_219.142.109.8 <dbl>, Dst_IP_32.73.26.223 <dbl>,
#> #   Dst_IP_56.137.121.203 <dbl>, Dst_Port_20000 <dbl>, Dst_Port_25 <dbl>,
#> #   Dst_Port_593 <dbl>, Dst_Port_80 <dbl>, Dst_Port_90 <dbl>, ePO <dbl>,
#> #   Failure <dbl>, Firewall <dbl>, IBM <dbl>, India <dbl>, Juniper <dbl>,
#> #   Korea <dbl>, McAfee <dbl>, Netherlands <dbl>, NSP <dbl>, `Palo Alto
#> #   Networks` <dbl>, Russia <dbl>, SNIPS <dbl>, Src_IP_174.110.206.174 <dbl>,
#> #   Src_IP_223.70.128.61 <dbl>, Src_IP_227.12.127.87 <dbl>,
#> #   Src_IP_28.9.24.154 <dbl>, Src_IP_89.130.69.91 <dbl>, Src_Port_113 <dbl>,
#> #   Src_Port_135 <dbl>, Src_Port_21 <dbl>, Src_Port_25 <dbl>,
#> #   Src_Port_80 <dbl>, SRX <dbl>, Success <dbl>, TCP <dbl>, UDP <dbl>, `United
#> #   Kingdom` <dbl>, US <dbl>