Stochastic variance reduction methods for policy evaluation

Simon S. Du, Jianshu Chen, Lihong Li, Lin Xiao, Dengyong Zhou

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

18 Scopus citations

Abstract

Policy evaluation is concerned with estimating the value function that predicts long-term values of states under a given policy. It is a crucial step in many reinforcement-learning algorithms. In this paper, we focus on policy evaluation with linear function approximation over a fixed dataset. We first transform the empirical policy evaluation problem into a (quadratic) convex-concave saddle-point problem, and then present a primal-dual batch gradient method, as well as two stochastic variance reduction methods, for solving the problem. These algorithms scale linearly in both sample size and feature dimension. Moreover, they achieve linear convergence even when the saddle-point problem has only strong concavity in the dual variables but no strong convexity in the primal variables. Numerical experiments on benchmark problems demonstrate the effectiveness of our methods.
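The abstract does not spell out the saddle-point reformulation it refers to. The sketch below is a plausible reconstruction under standard assumptions (features φ(s), discount γ, empirical averages over the fixed dataset); the symbols Â, b̂, Ĉ, θ, and w are assumed notation for illustration, not taken from the paper.

```latex
% Hedged sketch of the (quadratic) convex-concave saddle-point reformulation
% of linear policy evaluation described in the abstract. All matrices are
% empirical averages over the fixed dataset; the notation is an assumption.
\documentclass{article}
\usepackage{amsmath}
\begin{document}
Given samples $(s_t, r_t, s_{t+1})$, features $\phi_t = \phi(s_t)$ and
$\phi'_t = \phi(s_{t+1})$, and discount $\gamma$, define
\[
  \hat{A} = \frac{1}{n}\sum_{t=1}^{n} \phi_t(\phi_t - \gamma \phi'_t)^\top,
  \qquad
  \hat{b} = \frac{1}{n}\sum_{t=1}^{n} r_t \phi_t,
  \qquad
  \hat{C} = \frac{1}{n}\sum_{t=1}^{n} \phi_t \phi_t^\top .
\]
Minimizing the empirical MSPBE
$\tfrac{1}{2}\,\|\hat{A}\theta - \hat{b}\|_{\hat{C}^{-1}}^{2}$
is equivalent, by conjugate duality of the quadratic term, to the
saddle-point problem
\[
  \min_{\theta} \; \max_{w} \;\;
  w^\top\!\bigl(\hat{b} - \hat{A}\theta\bigr) - \tfrac{1}{2}\, w^\top \hat{C}\, w ,
\]
which is linear (hence only convex, not strongly convex) in the primal
variable $\theta$ and strongly concave in the dual variable $w$ whenever
$\hat{C} \succ 0$, matching the structure described in the abstract.
\end{document}
```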

Original language: English
Title of host publication: 34th International Conference on Machine Learning, ICML 2017
Publisher: International Machine Learning Society (IMLS)
Pages: 1744-1761
Number of pages: 18
ISBN (Electronic): 9781510855144
State: Published - 2017
Externally published: Yes
Event: 34th International Conference on Machine Learning, ICML 2017 - Sydney, Australia
Duration: Aug 6, 2017 to Aug 11, 2017

Publication series

Name: 34th International Conference on Machine Learning, ICML 2017
Volume: 3

Conference

Conference: 34th International Conference on Machine Learning, ICML 2017
Country/Territory: Australia
City: Sydney
Period: 08/6/17 to 08/11/17
