Privacy-Preserving Federated Analytics using Multiparty Homomorphic Encryption
David Jules Froelicher
Ph.D. thesis advised by Jean-Pierre Hubaux and Bryan Ford
October 1, 2021
Abstract:
Analyzing and processing data that are siloed and dispersed among multiple
distrustful stakeholders is difficult and can even become impossible when the
data are sensitive or confidential. Current data-protection and privacy
regulations (e.g., the GDPR) severely restrict the sharing and outsourcing of
personal information among stakeholders in different jurisdictions.
Sharing data is, however, required in many domains such as finance and
medicine. The medical sector is a paradigmatic example: Privacy is paramount
and data sharing is needed in numerous applications where data are scarce (e.g.,
patients with rare diseases) and scattered among multiple stakeholders around
the world. Existing privacy-preserving solutions for federated analytics rely
either (1) on data centralization or outsourcing to a limited number of
entities, which raises multiple security and trust issues, or (2) on the
iterative exchange of cleartext aggregated and optionally obfuscated data,
which can leak personal information or introduce bias in the final result. In
this thesis, our goal is (1) to propose privacy-preserving federated solutions
for exploration, and for statistical and machine-learning analyses on data held
by multiple distrustful stakeholders, and (2) to analyze and evaluate the
proposed systems, thus showing that they provide an efficient, secure,
scalable, and accurate alternative to existing solutions for federated
analysis, and to demonstrate their utility in real-world, state-of-the-art
biomedical studies. In
order to do this, we rely on multiparty homomorphic encryption (MHE). MHE
combines secure multiparty computation (SMC) techniques with homomorphic
encryption (HE): it pools the advantages of both, i.e., interactivity and
flexibility, while minimizing their respective disadvantages, i.e., the
difficulty of scaling to a large number of parties and the computational
complexity.
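As a concrete toy illustration of the N-out-of-N trust model that MHE enables, the following sketch implements a multiparty variant of exponential ElGamal: each party holds a secret-key share, the collective public key combines all shares, and decryption requires a partial decryption from every party. All parameters and names here are illustrative, and the scheme is not secure as written; the systems in this thesis rely on production-grade constructions.

```python
import random

# Toy N-of-N multiparty exponential ElGamal over Z_p* (illustrative
# parameters only; NOT secure as written).
p = 2**61 - 1   # a Mersenne prime, used here as a toy modulus
g = 5           # fixed base for illustration

N = 3
# Each party generates its own secret-key share locally.
shares = [random.randrange(2, p - 1) for _ in range(N)]
# The collective public key is g^(x1 + ... + xN); no party knows the full key.
pk = 1
for x in shares:
    pk = (pk * pow(g, x, p)) % p

def encrypt(m):
    r = random.randrange(2, p - 1)
    return pow(g, r, p), (pow(g, m, p) * pow(pk, r, p)) % p

def add(ct1, ct2):
    # Componentwise product of ciphertexts encrypts the sum of plaintexts.
    return (ct1[0] * ct2[0]) % p, (ct1[1] * ct2[1]) % p

def collective_decrypt(ct, max_m=1 << 20):
    c1, c2 = ct
    # Every party contributes a partial decryption c1^(-x_i);
    # no single party can decrypt alone.
    for x in shares:
        c2 = (c2 * pow(c1, p - 1 - x, p)) % p
    gm, acc = c2, 1
    # Recover m from g^m by small-range search (fine for aggregate counts).
    for m in range(max_m + 1):
        if acc == gm:
            return m
        acc = (acc * g) % p
    raise ValueError("plaintext out of range")

cts = [encrypt(v) for v in (17, 25, 8)]   # e.g., per-DP counts
agg = cts[0]
for ct in cts[1:]:
    agg = add(agg, ct)
print(collective_decrypt(agg))  # 50
```

Note that the aggregation happens entirely on ciphertexts: the per-DP counts are never decrypted individually, only their sum.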

First, we design UNLYNX, a system that enables privacy-preserving federated
data exploration on a distributed dataset held by multiple data providers
(DPs), where N-1 out of N nodes performing the computations can be
malicious. To achieve this, we build interactive protocols by relying on
ElGamal additive homomorphic encryption (AHE) and ensure that each
untrusted-node operation can be publicly verified by means of zero-knowledge
proofs (ZKPs). We then explore how statistics, e.g., standard deviation and
variance, can be computed by relying on AHE and ZKPs through the design of
another system named DRYNX. In DRYNX, we also explore how to limit the
influence of an entity that inputs incorrect data into the system, and we propose an
efficient federated solution for correctness verification.
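To see why additively homomorphic aggregation suffices for such statistics, note that the mean, variance, and standard deviation of a distributed dataset can all be derived from each DP's local count, sum, and sum of squares. The following cleartext sketch (illustrative structure only, not DRYNX's actual interface) shows the aggregates involved:

```python
import math

# Each data provider's private values (never shared in the real system;
# only encrypted aggregates would leave a DP).
dp_data = [
    [4.0, 8.0, 6.0],
    [5.0, 3.0],
    [7.0, 9.0, 2.0, 6.0],
]

# Per-DP local aggregates: (count, sum, sum of squares). These are the
# only quantities that need to be encrypted and summed homomorphically.
local = [(len(xs), sum(xs), sum(x * x for x in xs)) for xs in dp_data]

# Homomorphic addition of the encrypted tuples reduces to this sum.
n = sum(t[0] for t in local)
s = sum(t[1] for t in local)
s2 = sum(t[2] for t in local)

mean = s / n
variance = s2 / n - mean * mean   # E[x^2] - E[x]^2
std_dev = math.sqrt(variance)
print(round(mean, 4), round(variance, 4))
```

Since only the three aggregate totals are ever decrypted, no DP's individual values or even its local statistics are revealed.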

We also propose SPINDLE, a solution for secure cooperative gradient descent
on federated data that we instantiate for the privacy-preserving training and
oblivious evaluation of generalized linear models. SPINDLE covers the entire
machine-learning workflow, as it enables oblivious predictions to be performed
on a trained model that remains secret. It ensures both data and model
confidentiality in a passive adversarial model in which N-1 out of N DPs can
collude.

Finally, we demonstrate that the solutions proposed in this thesis can
be efficient enablers for large-scale, highly sensitive, multi-site biomedical
studies. We design and test, by replicating recent state-of-the-art medical
studies, secure workflows for the federated execution of computations that
span from analyses with low computational complexity, such as the
non-parametric survival analyses often used in oncology, to computationally
intensive analyses, such as genome-wide association studies (GWAS), one of the
key tools of genomic research, on millions of variants.
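The cooperative gradient-descent workflow that SPINDLE secures can be sketched in cleartext as follows: each DP computes a gradient on its local partition, and only aggregated gradients are combined to update the shared model. This is an illustrative analogue only; in SPINDLE, the exchanged gradients and the model itself remain encrypted.

```python
import random

random.seed(0)
true_w = [2.0, -1.0]   # ground-truth model used to generate toy data

# Three data providers, each holding a private horizontal partition.
partitions = []
for _ in range(3):
    X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(100)]
    y = [sum(wj * xj for wj, xj in zip(true_w, x)) + random.gauss(0, 0.1)
         for x in X]
    partitions.append((X, y))

def local_gradient(w, X, y):
    # Least-squares gradient computed on one DP's local data only.
    n = len(y)
    g = [0.0, 0.0]
    for x, yi in zip(X, y):
        err = sum(wj * xj for wj, xj in zip(w, x)) - yi
        for j in range(2):
            g[j] += err * x[j] / n
    return g

w = [0.0, 0.0]
lr = 0.1
for _ in range(200):
    # Each DP computes its gradient locally; only the aggregate is used
    # to update the shared model (encrypted in the real protocol).
    grads = [local_gradient(w, X, y) for X, y in partitions]
    w = [wj - lr * sum(g[j] for g in grads) / len(grads)
         for j, wj in enumerate(w)]

print([round(wj, 2) for wj in w])  # approximately [2.0, -1.0]
```

The key structural point is that raw records never leave a DP: the only cross-DP traffic is the (in SPINDLE, encrypted) gradient aggregate per iteration.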
Ph.D. Thesis:
PDF
Private Defense Slides:
PDF
Public Defense Slides:
PDF