TY - GEN
T1 - Scaling up stochastic dual coordinate ascent
AU - Tran, Kenneth
AU - Hosseini, Saghar
AU - Xiao, Lin
AU - Finley, Thomas
AU - Bilenko, Mikhail
N1 - Publisher Copyright:
© 2015 ACM.
PY - 2015/8/10
Y1 - 2015/8/10
N2 - Stochastic Dual Coordinate Ascent (SDCA) has recently emerged as a state-of-the-art method for solving large-scale supervised learning problems formulated as minimization of convex loss functions. It performs iterative, random-coordinate updates to maximize the dual objective. Due to the sequential nature of the iterations, it is typically implemented as a single-threaded algorithm limited to in-memory datasets. In this paper, we introduce an asynchronous parallel version of the algorithm, analyze its convergence properties, and propose a solution for primal-dual synchronization required to achieve convergence in practice. In addition, we describe a method for scaling the algorithm to out-of-memory datasets via multi-threaded deserialization of block-compressed data. This approach yields sufficient pseudo-randomness to provide the same convergence rate as random-order in-memory access. Empirical evaluation demonstrates the efficiency of the proposed methods and their ability to fully utilize computational resources and scale to out-of-memory datasets.
AB - Stochastic Dual Coordinate Ascent (SDCA) has recently emerged as a state-of-the-art method for solving large-scale supervised learning problems formulated as minimization of convex loss functions. It performs iterative, random-coordinate updates to maximize the dual objective. Due to the sequential nature of the iterations, it is typically implemented as a single-threaded algorithm limited to in-memory datasets. In this paper, we introduce an asynchronous parallel version of the algorithm, analyze its convergence properties, and propose a solution for primal-dual synchronization required to achieve convergence in practice. In addition, we describe a method for scaling the algorithm to out-of-memory datasets via multi-threaded deserialization of block-compressed data. This approach yields sufficient pseudo-randomness to provide the same convergence rate as random-order in-memory access. Empirical evaluation demonstrates the efficiency of the proposed methods and their ability to fully utilize computational resources and scale to out-of-memory datasets.
UR - http://www.scopus.com/inward/record.url?scp=84954173914&partnerID=8YFLogxK
U2 - 10.1145/2783258.2783412
DO - 10.1145/2783258.2783412
M3 - Conference contribution
AN - SCOPUS:84954173914
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 1185
EP - 1194
BT - KDD 2015 - Proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining
PB - Association for Computing Machinery
T2 - 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2015
Y2 - 10 August 2015 through 13 August 2015
ER -