Achieving 100,000,000 database inserts per second using Accumulo and D4M

Jeremy Kepner, William Arcand, David Bestor, Bill Bergeron, Chansup Byun, Vijay Gadepally, Matthew Hubbell, Peter Michaleas, Julie Mullen, Andrew Prout, Albert Reuther, Antonio Rosa, Charles Yee

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

53 Scopus citations

Abstract

The Apache Accumulo database is an open-source, relaxed-consistency database that is widely used for government applications. Accumulo is designed to deliver high performance on unstructured data such as graphs of network data. This paper tests the performance of Accumulo using data from the Graph500 benchmark. The Dynamic Distributed Dimensional Data Model (D4M) software is used to implement the benchmark on a 216-node cluster running the MIT SuperCloud software stack. A peak performance of over 100,000,000 database inserts per second was achieved, which is 100× higher than the highest previously published value for any other database. Performance scales linearly with the number of ingest clients, the number of database servers, and the data size. This performance was achieved by adapting several supercomputing techniques to this application: distributed arrays, domain decomposition, adaptive load balancing, and single-program-multiple-data (SPMD) programming.
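To illustrate how domain decomposition and SPMD programming combine in a parallel ingest, the sketch below is a minimal Python illustration under stated assumptions; it is not the paper's D4M (MATLAB/Octave) code. Every ingest client runs the same function `spmd_ingest` with its own `rank`, computes its block of the global Graph500-style edge list with the hypothetical helper `block_range`, generates only those edges, and groups them into batches. The Kronecker initiator probabilities (A=0.57, B=0.19, C=0.19) follow the Graph500 specification, but the per-rank seeding, batch size, and helper names are assumptions. Adaptive load balancing, vertex-label permutation, and the actual Accumulo writer are omitted.

```python
import random

# Graph500 Kronecker initiator probabilities (D = 1 - A - B - C)
A, B, C = 0.57, 0.19, 0.19


def block_range(n_items: int, n_procs: int, rank: int) -> tuple[int, int]:
    """Hypothetical helper: the [start, stop) slice of a global index range owned by `rank`.

    Items are split as evenly as possible; the first n_items % n_procs ranks
    each receive one extra item.
    """
    base, extra = divmod(n_items, n_procs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def kronecker_edge(scale: int, rng: random.Random) -> tuple[int, int]:
    """Draw one (row, col) edge of a 2**scale-vertex Kronecker graph."""
    u = v = 0
    for _ in range(scale):
        r = rng.random()
        u <<= 1
        v <<= 1
        if r < A:
            pass  # top-left quadrant: both bits stay 0
        elif r < A + B:
            v |= 1  # top-right quadrant
        elif r < A + B + C:
            u |= 1  # bottom-left quadrant
        else:
            u |= 1  # bottom-right quadrant
            v |= 1
    return u, v


def spmd_ingest(rank: int, n_procs: int, scale: int,
                edge_factor: int = 16, batch_size: int = 1000) -> list[list[tuple[int, int]]]:
    """Program body run identically by every ingest client (SPMD).

    Each rank generates only its own block of the global edge list and
    groups it into batches. A real client would hand each batch to a
    database writer, which is not modeled here.
    """
    n_edges = edge_factor * 2**scale
    start, stop = block_range(n_edges, n_procs, rank)
    rng = random.Random(rank)  # assumption: one independent random stream per rank
    batches, batch = [], []
    for _ in range(start, stop):
        batch.append(kronecker_edge(scale, rng))
        if len(batch) == batch_size:
            batches.append(batch)
            batch = []
    if batch:
        batches.append(batch)
    return batches


if __name__ == "__main__":
    # Simulate 4 clients sequentially; in practice each would be a separate process.
    n_procs, scale = 4, 10
    total = sum(len(b) for r in range(n_procs) for b in spmd_ingest(r, n_procs, scale))
    print(total)  # 16384 = 16 * 2**10 edges in total
```

Because no client needs data owned by another, adding clients shrinks each block without adding coordination, which is one plausible reading of why ingest can scale with the number of clients.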

Original language: English
Title of host publication: 2014 IEEE High Performance Extreme Computing Conference, HPEC 2014
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781479962334
State: Published - Feb 11 2014
Externally published: Yes
Event: 2014 IEEE High Performance Extreme Computing Conference, HPEC 2014 - Waltham, United States
Duration: Sep 9 2014 to Sep 11 2014

Publication series

Name: 2014 IEEE High Performance Extreme Computing Conference, HPEC 2014

Conference

Conference: 2014 IEEE High Performance Extreme Computing Conference, HPEC 2014
Country/Territory: United States
City: Waltham
Period: 09/9/14 to 09/11/14

Keywords

  • Accumulo
  • Big Data
  • D4M
  • Graph500
  • Hadoop
  • MIT SuperCloud
