Differentially Private Histograms in the Shuffle Model from Fake Users. (arXiv:2104.02739v1 [cs.CR])

There has been much recent work in the shuffle model of differential privacy,
particularly for approximate $d$-bin histograms. While these protocols achieve
low error, the number of messages sent by each user — the message complexity
— has so far scaled with $d$ or the privacy parameters. This is problematic
for off-the-shelf implementations of the shuffler. Here, we propose a protocol
whose message complexity is two when there are sufficiently many users. The
protocol essentially pairs each row in the dataset with a fake row and performs
a simple randomization on all rows. Experiments on real-world language confirm
that the protocol’s error is small. We also bound the impact corrupt users have
on the estimates and show that it compares favorably to prior state-of-the-art.