Node-Based Job Scheduling for Large Scale Simulations of Short Running Jobs

Chansup Byun, William Arcand, David Bestor, Bill Bergeron, Vijay Gadepally, Michael Houle, Matthew Hubbell, Michael Jones, Anna Klein, Peter Michaleas, Lauren Milechin, Julie Mullen, Andrew Prout, Albert Reuther, Antonio Rosa, Siddharth Samsi, Charles Yee, Jeremy Kepner

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

1 Scopus citation

Abstract

Diverse workloads such as interactive supercomputing, big data analysis, and large-scale AI algorithm development require a high-performance scheduler. This paper presents a novel node-based scheduling approach for large-scale simulations of short-running jobs on MIT SuperCloud systems. The approach allows resources to be fully utilized by long-running batch jobs while simultaneously providing fast launch and release of large-scale short-running jobs. It has demonstrated up to 100 times faster scheduler performance than other state-of-the-art systems.

Original language: English
Title of host publication: 2021 IEEE High Performance Extreme Computing Conference, HPEC 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781665423694
State: Published - 2021
Externally published: Yes
Event: 2021 IEEE High Performance Extreme Computing Conference, HPEC 2021 - Virtual, Online, United States
Duration: Sep 20 2021 to Sep 24 2021

Publication series

Name: 2021 IEEE High Performance Extreme Computing Conference, HPEC 2021

Conference

Conference: 2021 IEEE High Performance Extreme Computing Conference, HPEC 2021
Country/Territory: United States
City: Virtual, Online
Period: 09/20/21 to 09/24/21

Keywords

  • cluster utilization
  • fast scheduling
  • job management
  • scheduling performance
