TY - GEN
T1 - D4M
T2 - IEEE High Performance Extreme Computing Conference, HPEC 2015
AU - Gadepally, Vijay
AU - Kepner, Jeremy
AU - Arcand, William
AU - Bestor, David
AU - Bergeron, Bill
AU - Byun, Chansup
AU - Edwards, Lauren
AU - Hubbell, Matthew
AU - Michaleas, Peter
AU - Mullen, Julie
AU - Prout, Andrew
AU - Rosa, Antonio
AU - Yee, Charles
AU - Reuther, Albert
N1 - Funding Information:
This material is based upon work supported by the National Science Foundation under Grant No. DMS-1312831. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Publisher Copyright:
© 2015 IEEE.
PY - 2015/11/9
Y1 - 2015/11/9
N2 - Collecting and analyzing large amounts of data is a growing problem within the scientific community. The growing gap between data and users calls for innovative tools that address the challenges posed by big data volume, velocity and variety. Numerous tools exist that allow users to store, query and index these massive quantities of data. Each storage or database engine comes with the promise of dealing with complex data. Scientists and engineers who wish to use these systems often quickly find that no single technology offers a panacea for the complexity of information. When using multiple technologies, however, it is difficult to design the movement of information between storage and database engines to support an end-to-end application, and there is a steep learning curve associated with the nuances of each underlying technology. In this article, we present the Dynamic Distributed Dimensional Data Model (D4M) as a potential tool to unify database and storage engine operations. Previous articles on D4M have showcased its ability to interact with the popular NoSQL Accumulo database. However, D4M now operates on a variety of backend storage or database engines while providing a federated look to the end user through the use of associative arrays. To showcase how new databases may be supported by D4M, we describe the process of building the D4M-SciDB connector and present the performance of this connection.
KW - Big Data
KW - Data Analytics
KW - Dimensional Analysis
KW - Federated Databases
UR - http://www.scopus.com/inward/record.url?scp=84964822753&partnerID=8YFLogxK
U2 - 10.1109/HPEC.2015.7322472
DO - 10.1109/HPEC.2015.7322472
M3 - Conference contribution
AN - SCOPUS:84964822753
T3 - 2015 IEEE High Performance Extreme Computing Conference, HPEC 2015
BT - 2015 IEEE High Performance Extreme Computing Conference, HPEC 2015
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 15 September 2015 through 17 September 2015
ER -