TY - GEN
T1 - Hypersparse Network Flow Analysis of Packets with GraphBLAS
AU - Trigg, Tyler
AU - Meiners, Chad
AU - Pisharody, Sandeep
AU - Jananthan, Hayden
AU - Jones, Michael
AU - Michaleas, Adam
AU - Davis, Timothy
AU - Welch, Erik
AU - Arcand, William
AU - Bestor, David
AU - Bergeron, William
AU - Byun, Chansup
AU - Gadepally, Vijay
AU - Houle, Michael
AU - Hubbell, Matthew
AU - Klein, Anna
AU - Michaleas, Peter
AU - Milechin, Lauren
AU - Mullen, Julie
AU - Prout, Andrew
AU - Reuther, Albert
AU - Rosa, Antonio
AU - Samsi, Siddharth
AU - Stetson, Doug
AU - Yee, Charles
AU - Kepner, Jeremy
N1 - Funding Information:
This material is based upon work supported by the Assistant Secretary of Defense for Research and Engineering under Air Force Contract No. FA8702-15-D-0001, National Science Foundation CCF-1533644, and United States Air Force Research Laboratory and Artificial Intelligence Accelerator Cooperative Agreement Number FA8750-19-2-1000. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Assistant Secretary of Defense for Research and Engineering, the National Science Foundation, or the United States Air Force. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.
Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Internet analysis is a major challenge due to the volume and rate of network traffic. In lieu of analyzing traffic as raw packets, network analysts often rely on compressed network flows (netflows) that contain the start time, stop time, source, destination, and number of packets in each direction. However, many traffic analyses benefit from temporal aggregation of multiple simultaneous netflows, which can be computationally challenging. To alleviate this concern, a novel netflow compression and resampling method has been developed leveraging GraphBLAS hypersparse traffic matrices that preserve anonymization while enabling subrange analysis. Standard multi-temporal spatial analyses are then performed on each subrange to generate detailed statistical aggregates of the source packets, source fan-out, unique links, destination fan-in, and destination packets of each subrange, which can then be used for background modeling and anomaly detection. A simple file format based on GraphBLAS sparse matrices is developed for storing these statistical aggregates. This method is scale tested on the MIT SuperCloud using a 50 trillion packet netflow corpus from several hundred sites collected over several months. The resulting compression achieved is significant (<0.1 bit per packet), enabling extremely large netflow analyses to be stored and transported. The single node parallel performance is analyzed in terms of both processors and threads, showing that a single node can perform hundreds of simultaneous analyses at over a million packets/sec (roughly equivalent to a 10 Gigabit link).
AB - Internet analysis is a major challenge due to the volume and rate of network traffic. In lieu of analyzing traffic as raw packets, network analysts often rely on compressed network flows (netflows) that contain the start time, stop time, source, destination, and number of packets in each direction. However, many traffic analyses benefit from temporal aggregation of multiple simultaneous netflows, which can be computationally challenging. To alleviate this concern, a novel netflow compression and resampling method has been developed leveraging GraphBLAS hypersparse traffic matrices that preserve anonymization while enabling subrange analysis. Standard multi-temporal spatial analyses are then performed on each subrange to generate detailed statistical aggregates of the source packets, source fan-out, unique links, destination fan-in, and destination packets of each subrange, which can then be used for background modeling and anomaly detection. A simple file format based on GraphBLAS sparse matrices is developed for storing these statistical aggregates. This method is scale tested on the MIT SuperCloud using a 50 trillion packet netflow corpus from several hundred sites collected over several months. The resulting compression achieved is significant (<0.1 bit per packet), enabling extremely large netflow analyses to be stored and transported. The single node parallel performance is analyzed in terms of both processors and threads, showing that a single node can perform hundreds of simultaneous analyses at over a million packets/sec (roughly equivalent to a 10 Gigabit link).
KW - compression
KW - hypersparse matrices
KW - network analyses
KW - streaming graphs
UR - http://www.scopus.com/inward/record.url?scp=85142236749&partnerID=8YFLogxK
U2 - 10.1109/HPEC55821.2022.9926320
DO - 10.1109/HPEC55821.2022.9926320
M3 - Conference contribution
AN - SCOPUS:85142236749
T3 - 2022 IEEE High Performance Extreme Computing Conference, HPEC 2022
BT - 2022 IEEE High Performance Extreme Computing Conference, HPEC 2022
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 19 September 2022 through 23 September 2022
ER -