TY - GEN
T1 - Hyperscaling Internet Graph Analysis with D4M on the MIT SuperCloud
AU - Gadepally, Vijay
AU - Kepner, Jeremy
AU - Milechin, Lauren
AU - Arcand, William
AU - Bestor, David
AU - Bergeron, Bill
AU - Byun, Chansup
AU - Hubbell, Matthew
AU - Houle, Michael
AU - Jones, Michael
AU - Michaleas, Peter
AU - Mullen, Julie
AU - Prout, Andrew
AU - Rosa, Antonio
AU - Yee, Charles
AU - Samsi, Siddharth
AU - Reuther, Albert
N1 - Funding Information:
Vijay Gadepally is the corresponding author and can be reached at vijayg [at] ll.mit.edu. This material is based upon work supported by the Assistant Secretary of Defense for Research and Engineering under Air Force Contract No. FA8721-05-C-0002 and/or FA8702-15-D-0001. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Assistant Secretary of Defense for Research and Engineering.
Publisher Copyright:
© 2018 IEEE.
PY - 2018/11/26
Y1 - 2018/11/26
N2 - Detecting anomalous behavior in network traffic is a major challenge due to the volume and velocity of network traffic. For example, a 10 Gigabit Ethernet connection can generate over 50 MB/s of packet headers. For global network providers, this challenge can be amplified by many orders of magnitude. Development of novel computer network traffic analytics requires high-level programming environments, massive amounts of packet capture (PCAP) data, and diverse data products for 'at scale' algorithm pipeline development. D4M (Dynamic Distributed Dimensional Data Model) combines the power of sparse linear algebra, associative arrays, parallel processing, and distributed databases (such as SciDB and Apache Accumulo) to provide a scalable data and computation system that addresses the big data problems associated with network analytics development. Combining D4M with the MIT SuperCloud manycore processors and parallel storage system enables network analysts to interactively process massive amounts of data in minutes. To demonstrate these capabilities, we have implemented a representative analytics pipeline in D4M and benchmarked it on 96 hours of Gigabit PCAP data on the MIT SuperCloud. The entire pipeline, from uncompressing the raw files to database ingest, was implemented in 135 lines of D4M code and achieved speedups of over 20,000.
AB - Detecting anomalous behavior in network traffic is a major challenge due to the volume and velocity of network traffic. For example, a 10 Gigabit Ethernet connection can generate over 50 MB/s of packet headers. For global network providers, this challenge can be amplified by many orders of magnitude. Development of novel computer network traffic analytics requires high-level programming environments, massive amounts of packet capture (PCAP) data, and diverse data products for 'at scale' algorithm pipeline development. D4M (Dynamic Distributed Dimensional Data Model) combines the power of sparse linear algebra, associative arrays, parallel processing, and distributed databases (such as SciDB and Apache Accumulo) to provide a scalable data and computation system that addresses the big data problems associated with network analytics development. Combining D4M with the MIT SuperCloud manycore processors and parallel storage system enables network analysts to interactively process massive amounts of data in minutes. To demonstrate these capabilities, we have implemented a representative analytics pipeline in D4M and benchmarked it on 96 hours of Gigabit PCAP data on the MIT SuperCloud. The entire pipeline, from uncompressing the raw files to database ingest, was implemented in 135 lines of D4M code and achieved speedups of over 20,000.
UR - http://www.scopus.com/inward/record.url?scp=85060098013&partnerID=8YFLogxK
U2 - 10.1109/HPEC.2018.8547552
DO - 10.1109/HPEC.2018.8547552
M3 - Conference contribution
AN - SCOPUS:85060098013
T3 - 2018 IEEE High Performance Extreme Computing Conference, HPEC 2018
BT - 2018 IEEE High Performance Extreme Computing Conference, HPEC 2018
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 25 September 2018 through 27 September 2018
ER -