Driving big data with big compute

Chansup Byun, William Arcand, David Bestor, Bill Bergeron, Matthew Hubbell, Jeremy Kepner, Andrew McCabe, Peter Michaleas, Julie Mullen, David O'Gwynn, Andrew Prout, Albert Reuther, Antonio Rosa, Charles Yee

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

28 Scopus citations

Abstract

Big Data (as embodied by Hadoop clusters) and Big Compute (as embodied by MPI clusters) provide unique capabilities for storing and processing large volumes of data. Hadoop clusters make distributed computing readily accessible to the Java community, and MPI clusters provide high parallel efficiency for compute-intensive workloads. Bringing the big data and big compute communities together is an active area of research. The LLGrid team has developed and deployed a number of technologies that aim to provide the best of both worlds. LLGrid MapReduce allows the map/reduce parallel programming model to be used quickly and efficiently in any language on any compute cluster. D4M (Dynamic Distributed Dimensional Data Model) provides a high-level distributed arrays interface to the Apache Accumulo database. The accessibility of these technologies is assessed by measuring the effort required to use these tools, which is typically a few lines of code. The performance is assessed by measuring the insert rate into the Accumulo database. Using these tools, a database insert rate of 4M inserts/second has been achieved on an 8-node cluster.

Original language: English
Title of host publication: 2012 IEEE Conference on High Performance Extreme Computing, HPEC 2012
State: Published - 2012
Externally published: Yes
Event: 2012 IEEE Conference on High Performance Extreme Computing, HPEC 2012 - Waltham, MA, United States
Duration: Sep 10 2012 to Sep 12 2012

Keywords

  • LLGridMapReduce
  • concurrent query
  • d4m
  • hdfs
  • parallel ingestion
  • parallel matlab
  • scheduler
