Adversarial Transfer Attacks With Unknown Data and Class Overlap. (arXiv:2109.11125v1 [cs.LG])

The ability to transfer adversarial attacks from one model (the surrogate) to
another model (the victim) has been an issue of concern within the machine
learning (ML) community. The ability to successfully evade unseen models
makes attacks uncomfortably easy to mount. In this work we note that, as
currently studied, transfer attacks give the attacker an unrealistic
advantage: the attacker has exactly the same training data as the victim. We
present the first study of transferring
adversarial attacks that focuses on the data available to the attacker and the
victim under imperfect settings: the attacker does not query the victim, and
there is a variable level of overlap in the exact data used or in the classes
learned by each model. This threat model is relevant to applications in
medicine, malware, and
others. Under this new threat model, attack success rate is not correlated with
data or class overlap in the way one would expect, and it varies with the
dataset.
This makes it difficult for attacker and defender to reason about each other
and contributes to the broader study of model robustness and security. We
remedy this by developing a masked version of Projected Gradient Descent that
simulates class disparity, which enables the attacker to reliably estimate a
lower bound on their attack’s success.
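
The abstract does not say how the mask is applied, so the following is only a
minimal sketch of one plausible reading: an L-infinity PGD attack on a
surrogate whose logits for a chosen subset of classes are hidden before the
loss is computed, imitating a victim that never learned those classes. The
linear softmax surrogate, the function names (`masked_logits`, `masked_xent`,
`masked_xent_grad`, `masked_pgd`), the mask layout, and the values of `eps`,
`step`, and `iters` are all assumptions made for illustration and are not
taken from the paper.

```python
import numpy as np

def masked_logits(W, b, x, class_mask):
    # Hypothetical masking rule: classes outside the mask get a very low logit,
    # so they carry no probability mass and contribute no gradient.
    return np.where(class_mask, x @ W + b, -1e9)

def masked_xent(W, b, x, y, class_mask):
    # Mean cross-entropy over the unmasked classes (log-sum-exp for stability).
    z = masked_logits(W, b, x, class_mask)
    z = z - z.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(y)), y].mean()

def masked_xent_grad(W, b, x, y, class_mask):
    # Per-example gradient of the masked cross-entropy with respect to x
    # for a linear softmax surrogate: (softmax - onehot) @ W^T.
    z = masked_logits(W, b, x, class_mask)
    z = z - z.max(axis=1, keepdims=True)
    probs = np.exp(z)
    probs /= probs.sum(axis=1, keepdims=True)
    onehot = np.eye(W.shape[1])[y]
    return (probs - onehot) @ W.T

def masked_pgd(W, b, x, y, class_mask, eps=0.1, step=0.02, iters=20):
    # Standard L-infinity PGD loop; only the loss is changed by the mask.
    x_adv = x.copy()
    for _ in range(iters):
        g = masked_xent_grad(W, b, x_adv, y, class_mask)
        x_adv = x_adv + step * np.sign(g)          # ascend the masked loss
        x_adv = np.clip(x_adv, x - eps, x + eps)   # project into the eps-ball
    return x_adv

# Toy usage with synthetic data: 5 features, 4 surrogate classes, last one hidden.
rng = np.random.default_rng(0)
W = rng.normal(size=(5, 4))
b = np.zeros(4)
x = rng.normal(size=(8, 5))
class_mask = np.array([True, True, True, False])
y = masked_logits(W, b, x, class_mask).argmax(axis=1)  # labels inside the mask
x_adv = masked_pgd(W, b, x, y, class_mask)
```

The sketch omits clipping to a valid input range because the toy inputs have
no natural bounds; an image-domain attack would typically also clip `x_adv` to
the pixel range after each step. Varying `class_mask` is one way an attacker
could probe how sensitive an attack is to classes the victim may not share.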