Dual averaging methods for regularized stochastic learning and online optimization

Lin Xiao

Research output: Contribution to journalArticlepeer-review

457 Scopus citations


We consider regularized stochastic learning and online optimization problems, where the objective function is the sum of two convex terms: one is the loss function of the learning task, and the other is a simple regularization term such as ℓ1-norm for promoting sparsity. We develop extensions of Nesterov's dual averaging method, that can exploit the regularization structure in an online setting. At each iteration of these methods, the learning variables are adjusted by solving a simple minimization problem that involves the running average of all past subgradients of the loss function and the whole regularization term, not just its subgradient. In the case of ℓ1 -regularization, our method is particularly effective in obtaining sparse solutions. We show that these methods achieve the optimal convergence rates or regret bounds that are standard in the literature on stochastic and online convex optimization. For stochastic learning problems in which the loss functions have Lipschitz continuous gradients, we also present an accelerated version of the dual averaging method.

Original languageEnglish
Pages (from-to)2543-2596
Number of pages54
JournalJournal of Machine Learning Research
StatePublished - Oct 2010
Externally publishedYes


  • Accelerated gradient methods
  • Dual averaging methods
  • Online optimization
  • Stochastic learning
  • Structural convex optimization
  • ℓ-regularization


Dive into the research topics of 'Dual averaging methods for regularized stochastic learning and online optimization'. Together they form a unique fingerprint.

Cite this