
Statistica Sinica 32 (2022), 1683-1699

INFINITE-ARMS BANDIT: OPTIMALITY VIA CONFIDENCE BOUNDS

Hock Peng Chan and Shouri Hu

National University of Singapore

Abstract: The literature on the infinite-arms bandit problem includes a regret lower bound for all allocation strategies under Bernoulli rewards with a uniform prior, as well as strategies based on success runs. Furthermore, a two-target algorithm has been proposed that achieves this regret lower bound, and its optimality has been extended to Bernoulli rewards with general priors. We present a confidence-bound target (CBT) algorithm that achieves optimality for rewards that are bounded above. For each arm, we construct a confidence bound and compare it against those of other arms and a target value to determine whether the arm should be sampled further. The target value depends on the assumed prior of the arm means. In the absence of information on the prior, the target value is determined empirically. Numerical studies show that the CBT algorithm is versatile and outperforms its competitors.
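To illustrate the flavor of a confidence-bound target rule, the following is a minimal sketch, not the paper's exact CBT algorithm: each arm is played as long as an upper confidence bound on its mean stays above a fixed target value, and is discarded in favor of a freshly drawn arm once the bound falls below the target. The constant `c`, the specific bound, and the fixed `target` are illustrative assumptions; in the paper the target depends on the prior of the arm means or is set empirically.

```python
import math


def cbt_sketch(draw_new_arm, horizon, target, c=1.0):
    """Hypothetical confidence-bound-target allocation (illustrative only).

    draw_new_arm() returns a callable that samples a reward in [0, 1]
    from a freshly drawn arm.  `target` plays the role of the target
    value that a prior-informed analysis would supply.
    """
    total = 0.0  # cumulative reward
    n = 0        # total number of pulls so far
    while n < horizon:
        arm = draw_new_arm()   # draw a new arm from the (infinite) pool
        s, k = 0.0, 0          # reward sum and pull count for this arm
        while n < horizon:
            r = arm()
            s += r
            k += 1
            total += r
            n += 1
            # illustrative upper confidence bound on the arm mean
            ucb = s / k + c * math.sqrt(math.log(max(n, 2)) / k)
            if ucb < target:   # arm looks worse than the target: discard it
                break
    return total / max(n, 1)   # average reward per pull
```

With a perfect arm (reward always 1) the rule never discards, so the average reward is 1; with a worthless arm (reward always 0) each arm is dropped once its bound decays below the target, and new arms are drawn until the horizon is exhausted.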

Key words and phrases: MAB, optimal allocation, sequential analysis, UCB.
