Wildest Dreams: Reproducible Research in Privacy-preserving Neural
Network Training
- URL: http://arxiv.org/abs/2403.03592v1
- Date: Wed, 6 Mar 2024 10:25:36 GMT
- Title: Wildest Dreams: Reproducible Research in Privacy-preserving Neural
Network Training
- Authors: Tanveer Khan, Mindaugas Budzys, Khoa Nguyen, Antonis Michalas
- Abstract summary: This work focuses on the ML model's training phase, where maintaining user data privacy is of utmost importance.
We provide a solid theoretical background that eases the understanding of current approaches.
We reproduce the results of several papers and examine to what extent existing works in the field support open science.
- Score: 2.853180143237022
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine Learning (ML) addresses a multitude of complex issues in multiple
disciplines, including social sciences, finance, and medical research. ML
models require substantial computing power and are only as powerful as the data
utilized. Due to the high computational cost of ML methods, data scientists
frequently use Machine Learning-as-a-Service (MLaaS) to outsource computation
to external servers. However, when working with private information, like
financial data or health records, outsourcing the computation might result in
privacy issues. Recent advances in Privacy-Preserving Techniques (PPTs) have
enabled ML training and inference over protected data through the use of
Privacy-Preserving Machine Learning (PPML). However, these techniques are still
at a preliminary stage, and applying them in real-world situations remains
challenging. To understand the discrepancy between theoretical research
proposals and actual applications, this work examines the past and present of
PPML, focusing on Homomorphic Encryption (HE) and Secure Multi-party
Computation (SMPC) applied to ML. This work primarily focuses on the ML model's
training phase, where maintaining user data privacy is of utmost importance. We
provide a solid theoretical background that eases the understanding of current
approaches and their limitations. In addition, we present a SoK of the most
recent PPML frameworks for model training and provide a comprehensive
comparison of their unique properties and performance on standard
benchmarks. We also reproduce the results of several of the papers and examine
to what extent existing works in the field support open science. We
believe our work serves as a valuable contribution by raising awareness about
the current gap between theoretical advancements and real-world applications in
PPML, specifically regarding open-source availability, reproducibility, and
usability.
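To make the two cryptographic building blocks named above concrete, the following is a minimal sketch (ours, not code from the paper) of additive secret sharing, the primitive underlying most SMPC-based PPML frameworks: a private value is split into uniformly random shares so that no single server learns anything, yet linear operations can be carried out share-by-share.

```python
import secrets

PRIME = 2**61 - 1  # all arithmetic happens in the field Z_PRIME

def share(value: int, n_parties: int) -> list[int]:
    """Split `value` into n additive shares; each share alone is
    uniformly random, but together they sum to `value` mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares: list[int]) -> int:
    """Recombine shares; all of them are required."""
    return sum(shares) % PRIME

# Two data owners secret-share their private inputs across three servers.
x_shares = share(42, 3)
y_shares = share(100, 3)

# Each server adds the shares it holds locally; no server ever
# observes 42, 100, or their sum in the clear.
z_shares = [(x + y) % PRIME for x, y in zip(x_shares, y_shares)]

assert reconstruct(z_shares) == 142
```

Addition is essentially free in this setting, but secure multiplication (needed throughout neural-network training) requires extra interaction between the servers, e.g. via Beaver triples, while HE instead performs expensive computation over ciphertexts on a single server. These costs are at the heart of the theory-versus-practice gap the paper studies.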
Related papers
- Context is Key: A Benchmark for Forecasting with Essential Textual Information [87.3175915185287]
"Context is Key" (CiK) is a time series forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context.
We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters.
Our experiments highlight the importance of incorporating contextual information, demonstrate surprising performance when using LLM-based forecasting models, and also reveal some of their critical shortcomings.
arXiv Detail & Related papers (2024-10-24T17:56:08Z) - LLM-PBE: Assessing Data Privacy in Large Language Models [111.58198436835036]
Large Language Models (LLMs) have become integral to numerous domains, significantly advancing applications in data management, mining, and analysis.
Despite the critical nature of this issue, no existing literature offers a comprehensive assessment of data privacy risks in LLMs.
Our paper introduces LLM-PBE, a toolkit crafted specifically for the systematic evaluation of data privacy risks in LLMs.
arXiv Detail & Related papers (2024-08-23T01:37:29Z) - LLM-based Knowledge Pruning for Time Series Data Analytics on Edge-computing Devices [23.18319883190927]
We propose Knowledge Pruning (KP), a novel paradigm for time series learning.
Unlike other methods, KP aims to prune redundant knowledge and distill only the pertinent knowledge into the target model.
With KP, a lightweight network can effectively learn the pertinent knowledge, achieving satisfactory performance at low cost.
arXiv Detail & Related papers (2024-06-13T02:51:18Z) - GuardML: Efficient Privacy-Preserving Machine Learning Services Through
Hybrid Homomorphic Encryption [2.611778281107039]
Privacy-Preserving Machine Learning (PPML) methods have been introduced to safeguard the privacy and security of Machine Learning models.
A modern cryptographic scheme, Hybrid Homomorphic Encryption (HHE), has recently emerged.
We develop and evaluate an HHE-based PPML application for classifying heart disease based on sensitive ECG data.
arXiv Detail & Related papers (2024-01-26T13:12:52Z) - Federated Fine-Tuning of LLMs on the Very Edge: The Good, the Bad, the Ugly [62.473245910234304]
This paper takes a hardware-centric approach to explore how Large Language Models can be brought to modern edge computing systems.
We provide a micro-level hardware benchmark, compare the model FLOP utilization to a state-of-the-art data center GPU, and study the network utilization in realistic conditions.
arXiv Detail & Related papers (2023-10-04T20:27:20Z) - Privacy Adhering Machine Un-learning in NLP [66.17039929803933]
In the real world, industry uses Machine Learning to build models on user data.
Privacy regulations mandate that user data be deleted on request; such mandates require effort in terms of both data removal and model retraining.
Continuously removing data and retraining the model does not scale.
We propose Machine Unlearning to tackle this challenge.
arXiv Detail & Related papers (2022-12-19T16:06:45Z) - A Survey of Machine Unlearning [56.017968863854186]
Recent regulations now require that, on request, private information about a user must be removed from computer systems.
ML models often 'remember' the old data.
Recent works on machine unlearning have not been able to completely solve the problem.
arXiv Detail & Related papers (2022-09-06T08:51:53Z) - SoK: Privacy Preserving Machine Learning using Functional Encryption:
Opportunities and Challenges [1.2183405753834562]
We focus on Inner-product-FE and Quadratic-FE-based machine learning models for the privacy-preserving machine learning (PPML) applications.
To the best of our knowledge, this is the first work to systematize FE-based PPML approaches.
arXiv Detail & Related papers (2022-04-11T14:15:36Z) - Privacy-Preserving Machine Learning: Methods, Challenges and Directions [4.711430413139393]
Well-designed privacy-preserving machine learning (PPML) solutions have attracted increasing research interest from academia and industry.
This paper systematically reviews existing privacy-preserving approaches and proposes a PGU model to guide evaluation for various PPML solutions.
arXiv Detail & Related papers (2021-08-10T02:58:31Z) - Privacy-Preserving XGBoost Inference [0.6345523830122165]
A major barrier to adoption is the sensitive nature of predictive queries.
One central goal of privacy-preserving machine learning (PPML) is to enable users to submit encrypted queries to a remote ML service.
We propose a privacy-preserving XGBoost prediction algorithm, which we have implemented and evaluated empirically on AWS SageMaker.
arXiv Detail & Related papers (2020-11-09T21:46:07Z) - Machine Learning Force Fields [54.48599172620472]
Machine Learning (ML) has enabled numerous advances in computational chemistry.
One of the most promising applications is the construction of ML-based force fields (FFs).
This review gives an overview of applications of ML-FFs and the chemical insights that can be obtained from them.
arXiv Detail & Related papers (2020-10-14T13:14:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.