# Differentially Private Multi-Armed Bandits in the Shuffle Model. (arXiv:2106.02900v1 [cs.LG])

We give an $(\varepsilon,\delta)$-differentially private algorithm for the
multi-armed bandit (MAB) problem in the shuffle model with a
distribution-dependent regret of
$O\left(\left(\sum_{a\in[k]:\Delta_a>0}\frac{\log T}{\Delta_a}\right)+\frac{k\sqrt{\log\frac{1}{\delta}}\log T}{\varepsilon}\right)$,
and a distribution-independent regret of
$O\left(\sqrt{kT\log T}+\frac{k\sqrt{\log\frac{1}{\delta}}\log T}{\varepsilon}\right)$,
where $T$ is the number of rounds, $\Delta_a$ is the suboptimality gap of the
arm $a$, and $k$ is the total number of arms. Our upper bound almost matches
the regret of the best known algorithms for the centralized model, and
significantly outperforms the best known algorithm in the local model.
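
To get a feel for how the two regret bounds stated above behave, the following minimal sketch evaluates the expressions inside the $O(\cdot)$ numerically. It is only an illustration of the formulas, not the paper's algorithm: the hidden constant is assumed to be 1, logarithms are taken as natural logs, and the function names (`privacy_term`, `distribution_dependent_bound`, `distribution_independent_bound`) are hypothetical.

```python
import math


def privacy_term(k, T, eps, delta):
    """Additive privacy cost k * sqrt(log(1/delta)) * log(T) / eps.

    Assumption: the constant hidden by O(.) is taken to be 1.
    """
    return k * math.sqrt(math.log(1.0 / delta)) * math.log(T) / eps


def distribution_dependent_bound(gaps, T, eps, delta):
    """Sum of log(T)/gap over arms with positive gap, plus the privacy term.

    `gaps` lists the suboptimality gap of every arm; optimal arms
    (gap == 0) are skipped, matching the sum's index set.
    """
    k = len(gaps)
    non_private = sum(math.log(T) / g for g in gaps if g > 0)
    return non_private + privacy_term(k, T, eps, delta)


def distribution_independent_bound(k, T, eps, delta):
    """sqrt(k * T * log(T)) plus the same additive privacy term."""
    return math.sqrt(k * T * math.log(T)) + privacy_term(k, T, eps, delta)


if __name__ == "__main__":
    # Illustrative values only; they do not come from the paper.
    gaps = [0.0, 0.1, 0.2, 0.3]
    T, eps, delta = 10**6, 1.0, 1e-6
    print(distribution_dependent_bound(gaps, T, eps, delta))
    print(distribution_independent_bound(len(gaps), T, eps, delta))
```

In both expressions the privacy cost enters as a separate additive term that grows only logarithmically in $T$ and scales as $1/\varepsilon$, so under these assumptions halving $\varepsilon$ doubles that term while leaving the non-private part unchanged.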