Hyperscaling Internet Graph Analysis with D4M on the MIT SuperCloud

Vijay Gadepally, Jeremy Kepner, Lauren Milechin, William Arcand, David Bestor, Bill Bergeron, Chansup Byun, Matthew Hubbell, Micheal Houle, Micheal Jones, Peter Michaleas, Julie Mullen, Andrew Prout, Antonio Rosa, Charles Yee, Siddharth Samsi, Albert Reuther

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

11 Scopus citations

Abstract

Detecting anomalous behavior in network traffic is a major challenge due to the volume and velocity of network traffic. For example, a 10 Gigabit Ethernet connection can generate over 50 MB/s of packet headers. For global network providers, this challenge can be amplified by many orders of magnitude. Development of novel computer network traffic analytics requires: high level programming environments, massive amount of packet capture (PCAP) data, and diverse data products for 'at scale' algorithm pipeline development. D4M (Dynamic Distributed Dimensional Data Model) combines the power of sparse linear algebra, associative arrays, parallel processing, and distributed databases (such as SciDB and Apache Accumulo) to provide a scalable data and computation system that addresses the big data problems associated with network analytics development. Combining D4M with the MIT SuperCloud manycore processors and parallel storage system enables network analysts to interactively process massive amounts of data in minutes. To demonstrate these capabilities, we have implemented a representative analytics pipeline in D4M and benchmarked it on 96 hours of Gigabit PCAP data with MIT SuperCloud. The entire pipeline from uncompressing the raw files to database ingest was implemented in 135 lines of D4M code and achieved speedups of over 20,000.

Original languageEnglish
Title of host publication2018 IEEE High Performance Extreme Computing Conference, HPEC 2018
PublisherInstitute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)9781538659892
DOIs
StatePublished - Nov 26 2018
Externally publishedYes
Event2018 IEEE High Performance Extreme Computing Conference, HPEC 2018 - Waltham, United States
Duration: Sep 25 2018Sep 27 2018

Publication series

Name2018 IEEE High Performance Extreme Computing Conference, HPEC 2018

Conference

Conference2018 IEEE High Performance Extreme Computing Conference, HPEC 2018
Country/TerritoryUnited States
CityWaltham
Period09/25/1809/27/18

Fingerprint

Dive into the research topics of 'Hyperscaling Internet Graph Analysis with D4M on the MIT SuperCloud'. Together they form a unique fingerprint.

Cite this