Optimal algorithms for online convex optimization with multi-point bandit feedback

Alekh Agarwal, Ofer Dekel, Lin Xiao

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

149 Scopus citations

Abstract

Bandit convex optimization is a special case of online convex optimization with partial information. In this setting, a player attempts to minimize a sequence of adversarially generated convex loss functions, while only observing the value of each function at a single point. In some cases, the minimax regret of these problems is known to be strictly worse than the minimax regret in the corresponding full information setting. We introduce the multi-point bandit setting, in which the player can query each loss function at multiple points. When the player is allowed to query each function at two points, we prove regret bounds that closely resemble bounds for the full information case. This suggests that knowing the value of each loss function at two points is almost as useful as knowing the value of each function everywhere. When the player is allowed to query each function at d + 1 points (d being the dimension of the space), we prove regret bounds that are exactly equivalent to full information bounds for smooth functions.
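The two-point feedback model described in the abstract admits a simple randomized gradient estimator: querying the loss at two symmetric perturbations of the current point yields an estimate whose expectation is the gradient of a smoothed surrogate of the loss. The following is a minimal sketch of that idea (not the paper's exact algorithm), assuming NumPy; the function name and parameters here are illustrative.

```python
import numpy as np

def two_point_gradient_estimate(f, x, delta, rng):
    """Estimate the gradient of f at x from two queries.

    Uses the classic two-point estimator
        g = (d / (2 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u,
    where u is drawn uniformly from the unit sphere in R^d.
    In expectation, g equals the gradient of a delta-smoothed version of f.
    """
    d = x.shape[0]
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)  # uniform random direction on the unit sphere
    return (d / (2 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u
```

For a quadratic loss f(x) = ||x||^2 the estimator is unbiased for the true gradient 2x, so averaging many independent estimates recovers it; in an online algorithm a single such estimate per round would be fed to a gradient-descent-style update.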

Original language: English
Title of host publication: COLT 2010 - The 23rd Conference on Learning Theory
Pages: 28-40
Number of pages: 13
State: Published - 2010
Externally published: Yes
Event: 23rd Conference on Learning Theory, COLT 2010 - Haifa, Israel
Duration: Jun 27, 2010 - Jun 29, 2010

Publication series

Name: COLT 2010 - The 23rd Conference on Learning Theory

Conference

Conference: 23rd Conference on Learning Theory, COLT 2010
Country/Territory: Israel
City: Haifa
Period: 06/27/10 - 06/29/10
