Lustre, Hadoop, Accumulo

Jeremy Kepner, William Arcand, David Bestor, Bill Bergeron, Chansup Byun, Lauren Edwards, Vijay Gadepally, Matthew Hubbell, Peter Michaleas, Julie Mullen, Andrew Prout, Antonio Rosa, Charles Yee, Albert Reuther

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

8 Scopus citations

Abstract

Data processing systems impose multiple views on data as it is processed by the system. These views include spreadsheets, databases, matrices, and graphs. A wide variety of technologies can be used to store and process data through these different steps. The Lustre parallel file system, the Hadoop distributed file system, and the Accumulo database are all designed to address the largest and most challenging data storage problems. There have been many ad hoc comparisons of these technologies. This paper describes the foundational principles of each technology, provides simple models for assessing their capabilities, and compares the various technologies on a hypothetical common cluster. These comparisons indicate that Lustre provides 2x more storage capacity, is less likely to lose data during 3 simultaneous drive failures, and provides higher bandwidth on general purpose workloads. Hadoop can provide 4x greater read bandwidth on special purpose workloads. Accumulo provides 10^5x lower latency on random lookups than either Lustre or Hadoop, but Accumulo's bulk bandwidth is 10x less. Significant recent work has been done to enable mix-and-match solutions that allow Lustre, Hadoop, and Accumulo to be combined in different ways.

Original language: English
Title of host publication: 2015 IEEE High Performance Extreme Computing Conference, HPEC 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781467392860
State: Published - Nov 9 2015
Externally published: Yes
Event: IEEE High Performance Extreme Computing Conference, HPEC 2015 - Waltham, United States
Duration: Sep 15 2015 to Sep 17 2015

Publication series

Name: 2015 IEEE High Performance Extreme Computing Conference, HPEC 2015

Conference

Conference: IEEE High Performance Extreme Computing Conference, HPEC 2015
Country/Territory: United States
City: Waltham
Period: 09/15/15 to 09/17/15

Keywords

  • Accumulo
  • Big Data
  • Hadoop
  • Insider
  • Lustre
  • Parallel Performance
