TY  - CPAPER

T1 - Optimal algorithms for online convex optimization with multi-point bandit feedback

AU - Agarwal, Alekh

AU - Dekel, Ofer

AU - Xiao, Lin

PY - 2010

Y1 - 2010

AB - Bandit convex optimization is a special case of online convex optimization with partial information. In this setting, a player attempts to minimize a sequence of adversarially generated convex loss functions, while only observing the value of each function at a single point. In some cases, the minimax regret of these problems is known to be strictly worse than the minimax regret in the corresponding full information setting. We introduce the multi-point bandit setting, in which the player can query each loss function at multiple points. When the player is allowed to query each function at two points, we prove regret bounds that closely resemble bounds for the full information case. This suggests that knowing the value of each loss function at two points is almost as useful as knowing the value of each function everywhere. When the player is allowed to query each function at d + 1 points (d being the dimension of the space), we prove regret bounds that are exactly equivalent to full information bounds for smooth functions.

UR - http://www.scopus.com/inward/record.url?scp=84860610530&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84860610530

SN - 9780982252925

T3 - COLT 2010 - The 23rd Conference on Learning Theory

SP - 28

EP - 40

BT - COLT 2010 - The 23rd Conference on Learning Theory

Y2 - 27 June 2010 through 29 June 2010

ER -