SBEED: Convergent reinforcement learning with nonlinear function approximation

Bo Dai, Albert Shaw, Lihong Li, Lin Xiao, Niao He, Zhen Liu, Jianshu Chen, Le Song

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

34 Scopus citations

Abstract

When function approximation is used, solving the Bellman optimality equation with stability guarantees has remained a major open problem in reinforcement learning for decades. The fundamental difficulty is that the Bellman operator may become an expansion in general, resulting in oscillating and even divergent behavior of popular algorithms like Q-learning. In this paper, we revisit the Bellman equation, and reformulate it into a novel primal-dual optimization problem using Nesterov's smoothing technique and the Legendre-Fenchel transformation. We then develop a new algorithm, called Smoothed Bellman Error Embedding, to solve this optimization problem where any differentiable function class may be used. We provide what we believe to be the first convergence guarantee for general nonlinear function approximation, and analyze the algorithm's sample complexity. Empirically, our algorithm compares favorably to state-of-the-art baselines in several benchmark control problems.
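
The two tools named in the abstract can be illustrated numerically. The sketch below is not the paper's algorithm. It assumes the smoothing term is an entropy regularizer, a common choice under which the hard maximum over actions becomes a log-sum-exp, and it uses the standard Legendre-Fenchel form of the square, x^2 = max over nu of (2*nu*x - nu^2), which is the kind of identity that turns a squared-error objective into a primal-dual (saddle-point) problem. All names here (`smoothed_max`, `square_via_conjugate`, `lam`, `q_values`) are hypothetical and chosen for illustration only.

```python
import math


def smoothed_max(values, lam):
    """Entropy-smoothed maximum: lam * log(sum(exp(v / lam))).

    Assumption: entropy is used as the smoothing term. The result lies
    between max(values) and max(values) + lam * log(len(values)).
    The max is subtracted before exponentiating for numerical stability.
    """
    m = max(values)
    return m + lam * math.log(sum(math.exp((v - m) / lam) for v in values))


def square_via_conjugate(x, nu):
    """Lower bound on x**2 from x**2 = max_nu (2*nu*x - nu**2).

    The bound is tight when the dual variable nu equals x.
    """
    return 2.0 * nu * x - nu * nu


# Hypothetical action values for a single state.
q_values = [1.0, 2.0, 0.5]
lam = 0.1  # smoothing temperature (illustrative value)

smooth = smoothed_max(q_values, lam)
gap_bound = lam * math.log(len(q_values))

print("hard max:     ", max(q_values))
print("smoothed max: ", smooth)
print("max gap bound:", gap_bound)
print("dual bound at nu = x:  ", square_via_conjugate(1.5, 1.5))
print("dual bound at nu = 0.3:", square_via_conjugate(1.5, 0.3))
```

Shrinking `lam` brings the smoothed value closer to the hard maximum, at the cost of a less smooth objective. This trade-off is the usual reason for introducing a temperature parameter in this kind of smoothing.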

Original language: English
Title of host publication: 35th International Conference on Machine Learning, ICML 2018
Editors: Andreas Krause, Jennifer Dy
Publisher: International Machine Learning Society (IMLS)
Pages: 1809-1818
Number of pages: 10
ISBN (Electronic): 9781510867963
State: Published - 2018
Externally published: Yes
Event: 35th International Conference on Machine Learning, ICML 2018 - Stockholm, Sweden
Duration: Jul 10 2018 → Jul 15 2018

Publication series

Name: 35th International Conference on Machine Learning, ICML 2018
Volume: 3

Conference

Conference: 35th International Conference on Machine Learning, ICML 2018
Country/Territory: Sweden
City: Stockholm
Period: 07/10/18 → 07/15/18
