Wildest Dreams: Reproducible Research in Privacy-preserving Neural
Network Training
- URL: http://arxiv.org/abs/2403.03592v1
- Date: Wed, 6 Mar 2024 10:25:36 GMT
- Title: Wildest Dreams: Reproducible Research in Privacy-preserving Neural
Network Training
- Authors: Tanveer Khan, Mindaugas Budzys, Khoa Nguyen, Antonis Michalas
- Abstract summary: This work focuses on the ML model's training phase, where maintaining user data privacy is of utmost importance.
We provide a solid theoretical background that eases the understanding of current approaches.
We reproduce the results of several papers and examine to what extent existing works in the field support open science.
- Score: 2.853180143237022
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine Learning (ML) addresses a multitude of complex issues in multiple
disciplines, including social sciences, finance, and medical research. ML
models require substantial computing power and are only as powerful as the data
utilized. Due to the high computational cost of ML methods, data scientists
frequently use Machine Learning-as-a-Service (MLaaS) to outsource computation
to external servers. However, when working with private information, like
financial data or health records, outsourcing the computation might result in
privacy issues. Recent advances in Privacy-Preserving Techniques (PPTs) have
enabled ML training and inference over protected data through the use of
Privacy-Preserving Machine Learning (PPML). However, these techniques are still
at a preliminary stage, and applying them in real-world situations remains
challenging. To understand the discrepancy between theoretical research
proposals and actual applications, this work examines the past and present of
PPML, focusing on Homomorphic Encryption (HE) and Secure Multi-party
Computation (SMPC) applied to ML. This work primarily focuses on the ML model's
training phase, where maintaining user data privacy is of utmost importance. We
provide a solid theoretical background that eases the understanding of current
approaches and their limitations. In addition, we present a SoK of the most
recent PPML frameworks for model training and provide a comprehensive
comparison of their unique properties and performance on standard
benchmarks. We also reproduce the results of several of the papers and examine
to what extent existing works in the field support open science. We
believe our work serves as a valuable contribution by raising awareness about
the current gap between theoretical advancements and real-world applications in
PPML, specifically regarding open-source availability, reproducibility, and
usability.
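To make the two cryptographic building blocks named above concrete, the following is a minimal sketch (ours, not code from the paper) of additive secret sharing, the primitive underlying most SMPC-based PPML frameworks: a private value is split into uniformly random shares so that no single server learns anything, yet linear operations can be carried out share-by-share.

```python
import secrets

PRIME = 2**61 - 1  # all arithmetic happens in the field Z_PRIME

def share(value: int, n_parties: int) -> list[int]:
    """Split `value` into n additive shares; each share alone is
    uniformly random, but together they sum to `value` mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares: list[int]) -> int:
    """Recombine shares; all of them are required."""
    return sum(shares) % PRIME

# Two data owners secret-share their private inputs across three servers.
x_shares = share(42, 3)
y_shares = share(100, 3)

# Each server adds the shares it holds locally; no server ever
# observes 42, 100, or their sum in the clear.
z_shares = [(x + y) % PRIME for x, y in zip(x_shares, y_shares)]

assert reconstruct(z_shares) == 142
```

Addition is essentially free in this setting, but secure multiplication (needed throughout neural-network training) requires extra interaction between the servers, e.g. via Beaver triples, while HE instead performs expensive computation over ciphertexts on a single server. These costs are at the heart of the theory-versus-practice gap the paper studies.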
Related papers
- Context is Key: A Benchmark for Forecasting with Essential Textual Information [87.3175915185287]
"Context is Key" (CiK) is a time series forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context.
We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters.
Our experiments highlight the importance of incorporating contextual information, demonstrate surprising performance when using LLM-based forecasting models, and also reveal some of their critical shortcomings.
arXiv Detail & Related papers (2024-10-24T17:56:08Z) - LLM-PBE: Assessing Data Privacy in Large Language Models [111.58198436835036]
Large Language Models (LLMs) have become integral to numerous domains, significantly advancing applications in data management, mining, and analysis.
Despite the critical nature of this issue, no existing literature offers a comprehensive assessment of data privacy risks in LLMs.
Our paper introduces LLM-PBE, a toolkit crafted specifically for the systematic evaluation of data privacy risks in LLMs.
arXiv Detail & Related papers (2024-08-23T01:37:29Z) - LLM-based Knowledge Pruning for Time Series Data Analytics on Edge-computing Devices [23.18319883190927]
We propose Knowledge Pruning (KP), a novel paradigm for time series learning.
Unlike other methods, KP aims to prune redundant knowledge and distill only the pertinent knowledge into the target model.
With KP, a lightweight network can effectively learn the pertinent knowledge, achieving satisfactory performance at low cost.
arXiv Detail & Related papers (2024-06-13T02:51:18Z) - GuardML: Efficient Privacy-Preserving Machine Learning Services Through
Hybrid Homomorphic Encryption [2.611778281107039]
Privacy-Preserving Machine Learning (PPML) methods have been introduced to safeguard the privacy and security of Machine Learning models.
A modern cryptographic scheme, Hybrid Homomorphic Encryption (HHE), has recently emerged.
We develop and evaluate an HHE-based PPML application for classifying heart disease based on sensitive ECG data.
arXiv Detail & Related papers (2024-01-26T13:12:52Z) - Federated Fine-Tuning of LLMs on the Very Edge: The Good, the Bad, the Ugly [62.473245910234304]
This paper takes a hardware-centric approach to explore how Large Language Models can be brought to modern edge computing systems.
We provide a micro-level hardware benchmark, compare the model FLOP utilization to a state-of-the-art data center GPU, and study the network utilization in realistic conditions.
arXiv Detail & Related papers (2023-10-04T20:27:20Z) - Privacy Adhering Machine Un-learning in NLP [66.17039929803933]
In the real world, industry uses Machine Learning to build models on user data.
Privacy regulations mandate that user data be deleted on request; such mandates require effort in terms of both data removal and model retraining.
Continuously removing data and retraining the model does not scale.
We propose Machine Unlearning to tackle this challenge.
arXiv Detail & Related papers (2022-12-19T16:06:45Z) - A Survey of Machine Unlearning [56.017968863854186]
Recent regulations now require that, on request, private information about a user must be removed from computer systems.
ML models often 'remember' the old data.
Recent works on machine unlearning have not been able to completely solve the problem.
arXiv Detail & Related papers (2022-09-06T08:51:53Z) - SoK: Privacy Preserving Machine Learning using Functional Encryption:
Opportunities and Challenges [1.2183405753834562]
We focus on Inner-product-FE and Quadratic-FE-based machine learning models for the privacy-preserving machine learning (PPML) applications.
To the best of our knowledge, this is the first work to systematize FE-based PPML approaches.
arXiv Detail & Related papers (2022-04-11T14:15:36Z) - Privacy-Preserving Machine Learning: Methods, Challenges and Directions [4.711430413139393]
Well-designed privacy-preserving machine learning (PPML) solutions have attracted increasing research interest from academia and industry.
This paper systematically reviews existing privacy-preserving approaches and proposes a PGU model to guide evaluation for various PPML solutions.
arXiv Detail & Related papers (2021-08-10T02:58:31Z) - Privacy-Preserving XGBoost Inference [0.6345523830122165]
A major barrier to adoption is the sensitive nature of predictive queries.
One central goal of privacy-preserving machine learning (PPML) is to enable users to submit encrypted queries to a remote ML service.
We propose a privacy-preserving XGBoost prediction algorithm, which we have implemented and evaluated empirically on AWS SageMaker.
arXiv Detail & Related papers (2020-11-09T21:46:07Z) - Machine Learning Force Fields [54.48599172620472]
Machine Learning (ML) has enabled numerous advances in computational chemistry.
One of the most promising applications is the construction of ML-based force fields (FFs).
This review gives an overview of applications of ML-FFs and the chemical insights that can be obtained from them.
arXiv Detail & Related papers (2020-10-14T13:14:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.