AI-enabled Automation for Completeness Checking of Privacy Policies. (arXiv:2106.05688v1 [cs.CR])

Technological advances in information sharing have raised concerns about data
protection. Privacy policies contain privacy-related requirements about how the
personal data of individuals will be handled by an organization or a software
system (e.g., a web service or an app). In Europe, privacy policies are subject
to compliance with the General Data Protection Regulation (GDPR). A
prerequisite for GDPR compliance checking is to verify whether the content of a
privacy policy is complete according to the provisions of GDPR. Incomplete
privacy policies might result in large fines on violating organization as well
as incomplete privacy-related software specifications. Manual completeness
checking is both time-consuming and error-prone. In this paper, we propose
AI-based automation for the completeness checking of privacy policies. Through
systematic qualitative methods, we first build two artifacts to characterize
the privacy-related provisions of GDPR, namely a conceptual model and a set of
completeness criteria. Then, we develop an automated solution on top of these
artifacts by leveraging a combination of natural language processing and
supervised machine learning. Specifically, we identify the GDPR-relevant
information content in privacy policies and subsequently check them against the
completeness criteria. To evaluate our approach, we collected 234 real privacy
policies from the fund industry. Over a set of 48 unseen privacy policies, our
approach detected 300 of the total of 334 violations of some completeness
criteria correctly, while producing 23 false positives. The approach thus has a
precision of 92.9% and recall of 89.8%. Compared to a baseline that applies
keyword search only, our approach results in an improvement of 24.5% in
precision and 38% in recall.