TY - GEN
T1 - Performance measurements of supercomputing and cloud storage solutions
AU - Jones, Michael
AU - Kepner, Jeremy
AU - Arcand, William
AU - Bestor, David
AU - Bergeron, Bill
AU - Gadepally, Vijay
AU - Houle, Michael
AU - Hubbell, Matthew
AU - Michaleas, Peter
AU - Prout, Andrew
AU - Reuther, Albert
AU - Samsi, Siddharth
AU - Monticciolo, Paul
N1 - Publisher Copyright:
© 2017 IEEE.
PY - 2017/10/30
Y1 - 2017/10/30
N2 - Increasing amounts of data from varied sources, particularly in the fields of machine learning and graph analytics, are causing storage requirements to grow rapidly. A variety of technologies exist for storing and sharing these data, ranging from parallel file systems used by supercomputers to distributed block storage systems found in clouds. Relatively few comparative measurements exist to inform decisions about which storage systems are best suited for particular tasks. This work provides these measurements for two of the most popular storage technologies: Lustre and Amazon S3. Lustre is an open-source, high-performance parallel file system used by many of the largest supercomputers in the world. Amazon's Simple Storage Service, or S3, is part of the Amazon Web Services offering and provides a scalable, distributed option to store and retrieve data from anywhere on the Internet. Parallel processing is essential for achieving high performance on modern storage systems. The performance tests used span the gamut of parallel I/O scenarios, ranging from single-client, single-node Amazon S3 and Lustre performance to a large-scale, multi-client test designed to demonstrate the capabilities of a modern storage appliance under heavy load. These results show that, when parallel I/O is used correctly (i.e., many simultaneous read or write processes), full network bandwidth performance is achievable, ranging from 10 gigabits/s over a 10 GigE S3 connection to 0.35 terabits/s using Lustre on a 1200-port 10 GigE switch. These results demonstrate that S3 is well-suited to sharing vast quantities of data over the Internet, while Lustre is well-suited to processing large quantities of data locally.
KW - Amazon Simple Storage Service
KW - High Performance Computing
KW - High Performance Storage
KW - Lustre
KW - MIT SuperCloud
UR - http://www.scopus.com/inward/record.url?scp=85041214009&partnerID=8YFLogxK
U2 - 10.1109/HPEC.2017.8091073
DO - 10.1109/HPEC.2017.8091073
M3 - Conference contribution
AN - SCOPUS:85041214009
T3 - 2017 IEEE High Performance Extreme Computing Conference, HPEC 2017
BT - 2017 IEEE High Performance Extreme Computing Conference, HPEC 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2017 IEEE High Performance Extreme Computing Conference, HPEC 2017
Y2 - 12 September 2017 through 14 September 2017
ER -