Plinius: Secure and Persistent Machine Learning Model Training. (arXiv:2104.02987v1 [cs.CR])

With the increasing popularity of cloud-based machine learning (ML)
techniques, there comes a need for privacy and integrity guarantees for ML data.
In addition, the significant scalability challenges faced by DRAM, coupled with
the high access latency of secondary storage, represent a major performance
bottleneck for ML systems. While solutions exist to tackle the security aspect,
performance remains an issue. Persistent memory (PM) is resilient to power loss
(unlike DRAM), provides fast and fine-grained access to memory (unlike disk
storage), and has latency and bandwidth close to those of DRAM (on the order of
ns and GB/s, respectively). We present PLINIUS, an ML framework using Intel SGX
enclaves for secure training of ML models and PM for fault tolerance
guarantees. PLINIUS uses a novel mirroring mechanism to create and maintain
(i) encrypted mirror copies of ML models on PM, and (ii) encrypted training
data in byte-addressable PM, for near-instantaneous data recovery after a
system failure. Compared to disk-based checkpointing systems, PLINIUS is 3.2x
and 3.7x faster for saving and restoring models, respectively, on real PM
hardware, achieving robust and secure ML model training in SGX enclaves.
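
The mirroring idea described above can be illustrated with a minimal sketch. This is not the PLINIUS implementation: a memory-mapped file stands in for byte-addressable PM, a toy SHA-256 keystream with an HMAC tag stands in for the authenticated encryption an enclave would use, and the names ModelMirror, save, restore, seal, and unseal are hypothetical. The sketch also makes no attempt at crash-atomic updates, which a real PM system would need.

```python
import hashlib
import hmac
import mmap
import os
import struct

NONCE_LEN, TAG_LEN = 16, 32
HEADER = struct.Struct("<QQ")  # (training iteration, sealed blob length)


def _keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    """Toy SHA-256 counter-mode keystream; a placeholder, NOT a secure cipher."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + struct.pack("<Q", counter)).digest()
        counter += 1
    return bytes(out[:n])


def seal(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt, then MAC. Returns nonce || tag || ciphertext."""
    nonce = os.urandom(NONCE_LEN)
    stream = _keystream(key, nonce, len(plaintext))
    ct = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(key, nonce + ct, hashlib.sha256).digest()
    return nonce + tag + ct


def unseal(key: bytes, blob: bytes) -> bytes:
    """Verify the tag, then decrypt. Raises ValueError if the blob was altered."""
    nonce = blob[:NONCE_LEN]
    tag = blob[NONCE_LEN:NONCE_LEN + TAG_LEN]
    ct = blob[NONCE_LEN + TAG_LEN:]
    expected = hmac.new(key, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("mirror copy failed integrity check")
    stream = _keystream(key, nonce, len(ct))
    return bytes(c ^ s for c, s in zip(ct, stream))


class ModelMirror:
    """Encrypted model mirror kept in a memory-mapped region (PM stand-in)."""

    def __init__(self, path: str, key: bytes, capacity: int = 1 << 20):
        self._key = key
        self._fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        if os.fstat(self._fd).st_size < capacity:
            os.ftruncate(self._fd, capacity)  # new bytes read back as zeros
        self._map = mmap.mmap(self._fd, capacity)

    def save(self, iteration: int, weights: bytes) -> None:
        """Seal the serialized weights and copy them into the mirror region."""
        blob = seal(self._key, weights)
        if HEADER.size + len(blob) > len(self._map):
            raise ValueError("model does not fit in the mirror region")
        self._map[HEADER.size:HEADER.size + len(blob)] = blob
        self._map[:HEADER.size] = HEADER.pack(iteration, len(blob))
        self._map.flush()  # push the mapped pages to the backing file

    def restore(self):
        """Return (iteration, weights), or None if nothing was mirrored yet."""
        iteration, length = HEADER.unpack(self._map[:HEADER.size])
        if length == 0:
            return None
        blob = self._map[HEADER.size:HEADER.size + length]
        return iteration, unseal(self._key, blob)

    def close(self) -> None:
        self._map.close()
        os.close(self._fd)
```

In this sketch, a training loop would call save(iteration, weights) every few iterations and call restore() once at startup to resume from the last mirrored state; only sealed bytes ever leave the (simulated) trusted side.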