TY  - CPAPER

T1 - Optimal algorithms for online convex optimization with multi-point bandit feedback

AU - Agarwal, Alekh

AU - Dekel, Ofer

AU - Xiao, Lin

PY - 2010

Y1 - 2010

AB - Bandit convex optimization is a special case of online convex optimization with partial information. In this setting, a player attempts to minimize a sequence of adversarially generated convex loss functions, while only observing the value of each function at a single point. In some cases, the minimax regret of these problems is known to be strictly worse than the minimax regret in the corresponding full information setting. We introduce the multi-point bandit setting, in which the player can query each loss function at multiple points. When the player is allowed to query each function at two points, we prove regret bounds that closely resemble bounds for the full information case. This suggests that knowing the value of each loss function at two points is almost as useful as knowing the value of each function everywhere. When the player is allowed to query each function at d + 1 points (d being the dimension of the space), we prove regret bounds that are exactly equivalent to full information bounds for smooth functions.

UR - http://www.scopus.com/inward/record.url?scp=84860610530&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84860610530

SN - 9780982252925

T3 - COLT 2010 - The 23rd Conference on Learning Theory

SP - 28

EP - 40

BT - COLT 2010 - The 23rd Conference on Learning Theory

Y2 - 27 June 2010 through 29 June 2010

ER -