A Continuous Liveness Detection System for Text-independent Speaker Verification. (arXiv:2106.01840v1 [cs.CR])

Voice authentication is drawing increasing attention and becomes an
attractive alternative to passwords for mobile authentication. Recent advances
in mobile technology further accelerate the adoption of voice biometrics in an
array of diverse mobile applications. However, recent studies show that voice
authentication is vulnerable to replay attacks, where an adversary can spoof a
voice authentication system using a pre-recorded voice sample collected from
the victim. In this paper, we propose VoiceLive, a liveness detection system
for both text-dependent and text-independent voice authentication on
smartphones. VoiceLive detects a live user by leveraging the user’s unique
vocal system and the stereo recording of smartphones. In particular, utilizing
the built-in gyroscope, loudspeaker, and microphone, VoiceLive first measures
the smartphone’s distance and angle from the user, then it captures the
position-specific time-difference-of-arrival (TDoA) changes in a sequence of
phoneme sounds to the two microphones of the phone, and uses such unique TDoA
dynamic which doesn’t exist under replay attacks for liveness detection.
VoiceLive is practical as it doesn’t require additional hardware but
two-channel stereo recording that is supported by virtually all smartphones.
Our experimental evaluation with 12 participants and different types of phones
shows that VoiceLive achieves over 99% detection accuracy at around 1% Equal
Error Rate (EER) on the text-dependent system and around 99% accuracy and 2%
EER on the text-independent one. Results also show that VoiceLive is robust to
different phone positions, i.e. the user is free to hold the smartphone with
distinct distances and angles.