Interactive Supercomputing on 40,000 Cores for Machine Learning and Data Analysis

Albert Reuther, Jeremy Kepner, Chansup Byun, Siddharth Samsi, William Arcand, David Bestor, Bill Bergeron, Vijay Gadepally, Michael Houle, Matthew Hubbell, Michael Jones, Anna Klein, Lauren Milechin, Julia Mullen, Andrew Prout, Antonio Rosa, Charles Yee, Peter Michaleas

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

92 Scopus citations

Abstract

Interactive massively parallel computations are critical for machine learning and data analysis. These computations are a staple of the MIT Lincoln Laboratory Supercomputing Center (LLSC) and have required the LLSC to develop unique interactive supercomputing capabilities. Scaling interactive machine learning frameworks, such as TensorFlow, and data analysis environments, such as MATLAB/Octave, to tens of thousands of cores presents many technical challenges, in particular rapidly dispatching many tasks through a scheduler, such as Slurm, and starting many instances of applications with thousands of dependencies. Careful tuning of launches and prepositioning of applications overcome these challenges and allow the launching of thousands of tasks in seconds on a 40,000-core supercomputer. Specifically, this work demonstrates launching 32,000 TensorFlow processes in 4 seconds and launching 262,000 Octave processes in 40 seconds. These capabilities allow researchers to rapidly explore novel machine learning architectures and data analysis algorithms.
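To put the two headline figures in perspective, the average launch rates they imply can be worked out directly. The sketch below uses only the numbers quoted in the abstract; the helper name `launch_rate` is hypothetical, and the per-core ratio is a plain division that assumes all 262,000 Octave processes share the 40,000 cores, which the abstract does not state explicitly.

```python
def launch_rate(processes: int, seconds: float) -> float:
    """Average number of processes started per second (hypothetical helper)."""
    return processes / seconds


# Figures quoted in the abstract.
CORES = 40_000
TENSORFLOW_PROCESSES, TENSORFLOW_SECONDS = 32_000, 4
OCTAVE_PROCESSES, OCTAVE_SECONDS = 262_000, 40

tensorflow_rate = launch_rate(TENSORFLOW_PROCESSES, TENSORFLOW_SECONDS)
octave_rate = launch_rate(OCTAVE_PROCESSES, OCTAVE_SECONDS)

# Assumption: every Octave process runs on the same 40,000 cores.
octave_per_core = OCTAVE_PROCESSES / CORES

print(f"TensorFlow: {tensorflow_rate:.0f} launches/s")
print(f"Octave:     {octave_rate:.0f} launches/s")
print(f"Octave processes per core: {octave_per_core:.2f}")
```

Under these assumptions the averages come to about 8,000 TensorFlow launches per second and 6,550 Octave launches per second, with roughly 6.5 Octave processes per core.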

Original language: English
Title of host publication: 2018 IEEE High Performance Extreme Computing Conference, HPEC 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781538659892
State: Published - Nov 26 2018
Externally published: Yes
Event: 2018 IEEE High Performance Extreme Computing Conference, HPEC 2018 - Waltham, United States
Duration: Sep 25 2018 → Sep 27 2018

Publication series

Name: 2018 IEEE High Performance Extreme Computing Conference, HPEC 2018

Conference

Conference: 2018 IEEE High Performance Extreme Computing Conference, HPEC 2018
Country/Territory: United States
City: Waltham
Period: 09/25/18 → 09/27/18

Keywords

  • Data analytics
  • High performance computing
  • Interactive
  • Machine learning
  • Manycore
  • Scheduler
