TY - CHAP
T1 - Communication-efficient distributed optimization of self-concordant empirical loss
AU - Zhang, Yuchen
AU - Xiao, Lin
N1 - Publisher Copyright: © 2018, Springer Nature Switzerland AG.
PY - 2018
Y1 - 2018
AB - We consider distributed convex optimization problems originating from sample average approximation of stochastic optimization, or empirical risk minimization in machine learning. We assume that each machine in the distributed computing system has access to a local empirical loss function, constructed with i.i.d. data sampled from a common distribution. We propose a communication-efficient distributed algorithm to minimize the overall empirical loss, which is the average of the local empirical losses. The algorithm is based on an inexact damped Newton method, where the inexact Newton steps are computed by a distributed preconditioned conjugate gradient method. We analyze its iteration complexity and communication efficiency for minimizing self-concordant empirical loss functions, and discuss the results for ridge regression, logistic regression, and binary classification with a smoothed hinge loss. In a standard setting for supervised learning where the condition number of the problem grows with the square root of the sample size, the required number of communication rounds of the algorithm does not increase with the sample size, and grows only slowly with the number of machines.
KW - Distributed optimization
KW - Empirical risk minimization
KW - Inexact Newton methods
KW - Self-concordant functions
UR - http://www.scopus.com/inward/record.url?scp=85056653139&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-97478-1_11
DO - 10.1007/978-3-319-97478-1_11
M3 - Chapter
AN - SCOPUS:85056653139
T3 - Lecture Notes in Mathematics
SP - 289
EP - 341
BT - Lecture Notes in Mathematics
PB - Springer Verlag
ER -