One Detector to Rule Them All: Towards a General Deepfake Attack Detection Framework. (arXiv:2105.00187v1 [cs.CV])

Deep learning-based video manipulation methods have become widely accessible.
With little to no effort, people can quickly learn how to
generate deepfake (DF) videos. While deep learning-based detection methods have
been proposed to identify specific types of DFs, their performance suffers for
other types of deepfake methods, including real-world deepfakes, on which they
are not sufficiently trained. In other words, most of the proposed deep
learning-based detection methods lack transferability and generalizability.
Beyond detecting a single type of DF from benchmark deepfake datasets, we focus
on developing a generalized approach to detect multiple types of DFs, including
deepfakes from unknown generation methods such as DeepFake-in-the-Wild (DFW)
videos. To better cope with unknown and unseen deepfakes, we introduce a
Convolutional LSTM-based Residual Network (CLRNet), which adopts a unique model
training strategy and explores both the spatial and the temporal information in
deepfakes. Through extensive experiments, we show that existing defense methods
are not ready for real-world deployment, whereas our defense method (CLRNet)
achieves far better generalization when detecting various benchmark deepfake
methods (97.57% on average). Furthermore, we evaluate our approach on a
high-quality DeepFake-in-the-Wild dataset collected from the Internet, which
contains numerous videos totaling more than 150,000 frames. Our CLRNet
model generalizes well to high-quality DFW videos,
achieving 93.86% detection accuracy and outperforming existing state-of-the-art
defense methods by a considerable margin.
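
The abstract does not give CLRNet's layer configuration, so the following is only a minimal sketch of the general idea it names: a convolutional LSTM that processes a sequence of frames (temporal information) with convolutional gates (spatial information), wrapped in a residual connection. Everything here is an illustrative assumption rather than the authors' model: the class name `ConvLSTMResidualBlock`, the helper `fake_score`, the single-channel frames, the 3x3 kernels, and the untrained random weights are all hypothetical.

```python
import numpy as np


def conv2d_same(x, k):
    """Zero-padded 'same' cross-correlation of a single-channel map x (H, W) with kernel k."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * xp[i:i + x.shape[0], j:j + x.shape[1]]
    return out


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class ConvLSTMResidualBlock:
    """Hypothetical single-channel ConvLSTM cell with a skip connection (not the paper's exact block)."""

    def __init__(self, rng, ksize=3):
        # One input kernel and one hidden-state kernel per gate:
        # index 0 = input gate, 1 = forget gate, 2 = output gate, 3 = candidate.
        self.wx = rng.normal(0.0, 0.1, size=(4, ksize, ksize))
        self.wh = rng.normal(0.0, 0.1, size=(4, ksize, ksize))
        self.b = np.zeros(4)

    def __call__(self, frames):
        """frames: array of shape (T, H, W). Returns an array of the same shape."""
        _, height, width = frames.shape
        h = np.zeros((height, width))  # hidden state carried across frames
        c = np.zeros((height, width))  # cell state carried across frames
        outputs = []
        for x in frames:
            pre = [
                conv2d_same(x, self.wx[g]) + conv2d_same(h, self.wh[g]) + self.b[g]
                for g in range(4)
            ]
            i, f, o = sigmoid(pre[0]), sigmoid(pre[1]), sigmoid(pre[2])
            g = np.tanh(pre[3])
            c = f * c + i * g
            h = o * np.tanh(c)
            outputs.append(x + h)  # residual connection: input frame plus ConvLSTM output
        return np.stack(outputs)


def fake_score(frames, block, w=1.0, b=0.0):
    """Hypothetical classification head: global average pooling followed by a sigmoid."""
    feats = block(frames)
    return float(sigmoid(w * feats.mean() + b))


rng = np.random.default_rng(0)
block = ConvLSTMResidualBlock(rng)
clip = rng.random((5, 8, 8))  # 5 synthetic 8x8 frames standing in for a face-crop sequence
print(fake_score(clip, block))
```

With untrained weights the printed score carries no meaning; the sketch only shows how spatial convolution and temporal recurrence can be combined in one block, and how the skip connection lets each frame pass through unchanged when the recurrent branch contributes nothing.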