Improving Differentially Private SGD via Randomly Sparsified Gradients. (arXiv:2112.00845v2 [cs.LG] UPDATED)

Differentially private stochastic gradient descent (DP-SGD) has been widely
adopted in deep learning to provide rigorously defined privacy. It requires
clipping each individual gradient to bound its norm and adding isotropic
Gaussian noise to the result. By analyzing the convergence rate of
DP-SGD in a non-convex setting, we reveal that randomly sparsifying gradients
before clipping and noisification adjusts a trade-off between internal
components of the convergence bound and leads to a smaller upper bound when the
noise is dominant. Additionally, our theoretical analysis and extensive
empirical evaluations show that the trade-off is nontrivial and possibly
unique to DP-SGD, since removing either noisification or gradient clipping
eliminates the trade-off in the bound. Based on the analysis, we propose
an efficient and lightweight random sparsification (RS) approach for DP-SGD.
Applying RS across various DP-SGD frameworks improves performance, and the
sparse gradients that RS produces offer advantages in reducing communication
cost and strengthening security against reconstruction attacks, both of which
are also key problems in private machine learning.
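
The order of operations described above (sparsify at random, then clip, then
add Gaussian noise) can be illustrated with a minimal NumPy sketch. It is not
the authors' implementation. The function name rs_dp_sgd_step, the keep_rate
parameter, the single coordinate mask shared across the batch, and the choice
to add noise only on the retained coordinates are all assumptions made for
illustration; the abstract does not specify these details.

```python
import numpy as np

def rs_dp_sgd_step(per_example_grads, clip_norm, noise_multiplier, keep_rate, rng):
    """Return a noisy averaged gradient and the random mask that was used.

    Hypothetical sketch: sparsify -> clip per example -> add Gaussian noise.
    """
    batch_size, dim = per_example_grads.shape

    # Assumption: one data-independent random mask shared by the whole batch.
    mask = rng.random(dim) < keep_rate
    sparse = per_example_grads * mask

    # Clip each sparsified per-example gradient to L2 norm at most clip_norm.
    norms = np.linalg.norm(sparse, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = sparse * scale

    # Assumption: isotropic Gaussian noise restricted to retained coordinates,
    # so the returned gradient stays sparse.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=dim) * mask

    return (clipped.sum(axis=0) + noise) / batch_size, mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grads = rng.normal(size=(8, 50))  # toy per-example gradients
    estimate, mask = rs_dp_sgd_step(
        grads, clip_norm=1.0, noise_multiplier=1.1, keep_rate=0.5, rng=rng
    )
    print("retained coordinates:", int(mask.sum()), "of", mask.size)
    print("nonzero entries in estimate:", int(np.count_nonzero(estimate)))
```

Because the mask in this sketch is drawn independently of the data, the dropped
coordinates are zero for every example. That is what lets the noise be
restricted to the retained coordinates here and keeps the output sparse.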