ezDPS: An Efficient and Zero-Knowledge Machine Learning Inference Pipeline
- URL: http://arxiv.org/abs/2212.05428v1
- Date: Sun, 11 Dec 2022 06:47:28 GMT
- Title: ezDPS: An Efficient and Zero-Knowledge Machine Learning Inference Pipeline
- Authors: Haodi Wang and Thang Hoang
- Abstract summary: We propose ezDPS, a new efficient and zero-knowledge Machine Learning inference scheme.
ezDPS is a zkML pipeline in which the data is processed in multiple stages for high accuracy.
We show that ezDPS is one to three orders of magnitude more efficient than the generic circuit-based approach in all metrics.
- Score: 2.0813318162800707
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Machine Learning as a service (MLaaS) permits resource-limited clients to
access powerful data analytics services ubiquitously. Despite its merits, MLaaS
poses significant concerns regarding the integrity of delegated computation and
the privacy of the server's model parameters. To address these issues, Zhang et
al. (CCS'20) initiated the study of zero-knowledge Machine Learning (zkML). A
few zkML schemes have been proposed since; however, they focus on single ML
classification algorithms that may not offer satisfactory accuracy or that
require large-scale training data and model parameters, which may be
undesirable for some applications. We propose ezDPS, a new efficient and zero-knowledge ML
inference scheme. Unlike prior works, ezDPS is a zkML pipeline in which the
data is processed in multiple stages for high accuracy. Each stage of ezDPS is
instantiated with an established ML algorithm shown to be effective in
various applications, including Discrete Wavelet Transformation, Principal
Component Analysis, and Support Vector Machine. We design new gadgets to prove
ML operations effectively. We fully implemented ezDPS and assessed its
performance on real datasets. Experimental results showed that ezDPS is one to
three orders of magnitude more efficient than the generic circuit-based
approach in all metrics, while maintaining higher accuracy than single ML
classification approaches.
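For intuition, the sketch below runs the plaintext version of the three pipeline stages (DWT, PCA, SVM) on synthetic signals. The zero-knowledge proof layer that ezDPS adds on top is not modeled here, and pywt/scikit-learn are illustrative stand-ins, not the paper's implementation.

```python
# Minimal plaintext sketch of the ezDPS pipeline stages: DWT -> PCA -> SVM.
import numpy as np
import pywt
from sklearn.decomposition import PCA
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def dwt_features(signal, wavelet="db4", level=3):
    # Stage 1: Discrete Wavelet Transformation; concatenate all coefficient bands.
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    return np.concatenate(coeffs)

# Synthetic two-class 1-D signals standing in for a real dataset.
n, length = 200, 256
labels = rng.integers(0, 2, n)
signals = rng.normal(size=(n, length)) + labels[:, None] * np.sin(
    np.linspace(0, 8 * np.pi, length))

X = np.stack([dwt_features(s) for s in signals])

pca = PCA(n_components=10).fit(X)                       # Stage 2: dimensionality reduction
clf = SVC(kernel="rbf").fit(pca.transform(X), labels)   # Stage 3: SVM classification

query = signals[0]
pred = clf.predict(pca.transform(dwt_features(query)[None, :]))
print("predicted class:", pred[0])
```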
Related papers
- Advancing Anomaly Detection: Non-Semantic Financial Data Encoding with LLMs [49.57641083688934]
We introduce a novel approach to anomaly detection in financial data using Large Language Model (LLM) embeddings.
Our experiments demonstrate that LLMs contribute valuable information to anomaly detection as our models outperform the baselines.
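The abstract does not specify the detector. As a hedged illustration of the general recipe, the sketch below pairs LLM embeddings of transaction records with a standard outlier detector; `get_llm_embedding` is a hypothetical stand-in (with random placeholder vectors, the scores are meaningless; a real embedding model is what makes semantically odd records stand out).

```python
import zlib
import numpy as np
from sklearn.ensemble import IsolationForest

def get_llm_embedding(record_text: str) -> np.ndarray:
    # Hypothetical stand-in: in practice, call an LLM embedding model here.
    # A CRC-seeded random vector keeps the sketch runnable offline.
    rng = np.random.default_rng(zlib.crc32(record_text.encode()))
    return rng.normal(size=384)

records = [
    "2024-01-03 card 1234 grocery  $42.10",
    "2024-01-04 card 1234 grocery  $39.95",
    "2024-01-06 card 1234 grocery  $44.80",
    "2024-01-07 card 1234 fuel     $51.00",
    "2024-01-08 card 1234 wire-out $9,800.00 to new payee",
]
X = np.stack([get_llm_embedding(r) for r in records])
scores = IsolationForest(random_state=0).fit(X).decision_function(X)
print(scores)  # lower score = flagged as more anomalous
```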
arXiv Detail & Related papers (2024-06-05T20:19:09Z)
- Don't Push the Button! Exploring Data Leakage Risks in Machine Learning and Transfer Learning [0.0]
This paper addresses a critical issue in Machine Learning (ML) where unintended information contaminates the training data, impacting model performance evaluation.
The discrepancy between evaluated and actual performance on new data is a significant concern.
It explores the connection between data leakage and the specific task being addressed, investigates its occurrence in Transfer Learning, and compares standard inductive ML with transductive ML frameworks.
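A concrete, common instance of such leakage: fitting a preprocessing step on the full dataset before the train/test split, so test-set statistics contaminate training. The hedged sketch below contrasts the leaky recipe with the clean pipeline-based one (illustrative, not the paper's experiments).

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
y = (X[:, 0] + 0.5 * rng.normal(size=300) > 0).astype(int)

# LEAKY: the scaler sees the test rows before the split.
X_scaled = StandardScaler().fit_transform(X)
Xtr, Xte, ytr, yte = train_test_split(X_scaled, y, random_state=0)
leaky = LogisticRegression().fit(Xtr, ytr).score(Xte, yte)

# CLEAN: all fitting happens inside the pipeline, on training data only.
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
clean = make_pipeline(StandardScaler(), LogisticRegression()).fit(Xtr, ytr)
print(leaky, clean.score(Xte, yte))
```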
arXiv Detail & Related papers (2024-01-24T20:30:52Z)
- MLLM-DataEngine: An Iterative Refinement Approach for MLLM [62.30753425449056]
We propose a novel closed-loop system that bridges data generation, model training, and evaluation.
Within each loop, the MLLM-DataEngine first analyzes the weaknesses of the model based on the evaluation results.
For targeting, we propose an Adaptive Bad-case Sampling module, which adjusts the ratio of different types of data.
For quality, we resort to GPT-4 to generate high-quality data with each given data type.
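The abstract only states that the module adjusts per-type data ratios from evaluation results; the sketch below is one plausible (assumed) weighting rule, not the paper's exact algorithm.

```python
# Hedged sketch of the idea behind Adaptive Bad-case Sampling: give data
# types where the model fails more often a larger share of the next
# incremental dataset. The weighting rule below is an assumption.
def abs_ratios(error_rates: dict[str, float], floor: float = 0.05) -> dict[str, float]:
    # Weight each data type by its error rate, with a floor so no type vanishes.
    weights = {t: max(e, floor) for t, e in error_rates.items()}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}

# Hypothetical per-type error rates from the evaluation stage of one loop.
print(abs_ratios({"OCR-QA": 0.40, "grounding": 0.25, "captioning": 0.05}))
```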
arXiv Detail & Related papers (2023-08-25T01:41:04Z)
- From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning [52.257422715393574]
We introduce a self-guided methodology for Large Language Models (LLMs) to autonomously discern and select cherry samples from open-source datasets.
Our key innovation, the Instruction-Following Difficulty (IFD) metric, identifies discrepancies between a model's expected responses and its intrinsic generation capability.
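Following that description, IFD can be read as the ratio between the model's loss on a response conditioned on its instruction and its loss on the response alone. The sketch below approximates this with a generic Hugging Face causal LM; tokenization and masking details are assumptions, not the paper's exact procedure.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def answer_loss(prompt: str, answer: str) -> float:
    # Average cross-entropy over the answer tokens only.
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    full_ids = tok(prompt + answer, return_tensors="pt").input_ids
    labels = full_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100  # ignore the prompt tokens
    with torch.no_grad():
        return model(full_ids, labels=labels).loss.item()

def ifd(instruction: str, answer: str) -> float:
    # Ratios near or above 1 suggest the instruction barely helps the
    # model produce the answer, i.e., a harder/more informative sample.
    return answer_loss(instruction, answer) / answer_loss("", answer)

print(ifd("Translate to French: Hello.", " Bonjour."))
```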
arXiv Detail & Related papers (2023-08-23T09:45:29Z)
- Data Debugging with Shapley Importance over End-to-End Machine Learning Pipelines [27.461398584509755]
DataScope is the first system that efficiently computes Shapley values of training examples over an end-to-end machine learning pipeline.
Our results show that DataScope is up to four orders of magnitude faster than state-of-the-art Monte Carlo-based methods.
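For context, the Monte Carlo baseline that DataScope is compared against estimates each training example's Shapley value as its average marginal contribution to validation accuracy over random permutations; the sketch below shows why it is expensive (one model fit per prefix). DataScope's own efficient KNN-based computation is not reproduced here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def shapley_mc(Xtr, ytr, Xval, yval, n_perm=20, seed=0):
    rng = np.random.default_rng(seed)
    n = len(Xtr)
    values = np.zeros(n)
    base_acc = np.mean(yval == np.bincount(ytr).argmax())  # majority-class baseline
    for _ in range(n_perm):
        order = rng.permutation(n)
        prev_acc = base_acc
        for k in range(1, n + 1):
            idx = order[:k]
            if len(np.unique(ytr[idx])) < 2:
                acc = prev_acc  # cannot fit a classifier on a single class
            else:
                # One model fit per prefix: the Monte Carlo bottleneck.
                acc = LogisticRegression().fit(Xtr[idx], ytr[idx]).score(Xval, yval)
            values[order[k - 1]] += acc - prev_acc
            prev_acc = acc
    return values / n_perm

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))
y = (X[:, 0] > 0).astype(int)
print(shapley_mc(X[:20], y[:20], X[20:], y[20:]).round(3))
```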
arXiv Detail & Related papers (2022-04-23T19:29:23Z)
- FreaAI: Automated extraction of data slices to test machine learning models [2.475112368179548]
We show the feasibility of automatically extracting feature models that result in explainable data slices over which the ML solution under-performs.
Our novel technique, IBM FreaAI, extracts such slices from structured ML test data or any other labeled data.
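FreaAI's exact heuristics are not given in the abstract; a hedged approximation of the slice-finding idea is to fit a shallow interpretable model to the primary model's errors and read weak slices off its leaves, as sketched below.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(Xtr, ytr)

errors = (model.predict(Xte) != yte).astype(int)  # 1 = misclassified
# A shallow tree over the test features: each leaf is an explainable
# feature-range slice; leaves predicting 1 are candidate weak slices.
slicer = DecisionTreeClassifier(max_depth=2, random_state=0).fit(Xte, errors)
print(export_text(slicer))
```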
arXiv Detail & Related papers (2021-08-12T09:21:16Z)
- Memory-Based Optimization Methods for Model-Agnostic Meta-Learning and Personalized Federated Learning [56.17603785248675]
Model-agnostic meta-learning (MAML) has become a popular research area.
Existing MAML algorithms rely on the 'episode' idea by sampling a few tasks and data points to update the meta-model at each iteration.
This paper proposes memory-based algorithms for MAML that converge with vanishing error.
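For reference, the episode-based MAML baseline the summary alludes to looks roughly like the sketch below (toy 1-D regression tasks); the paper's memory-based variant, which avoids this per-iteration task sampling, is not shown.

```python
import torch

# Toy 1-D regression: each task is y = a * x with a task-specific slope a.
def sample_task():
    a = torch.randn(1)
    def draw(n=10):
        x = torch.randn(n, 1)
        return x, a * x
    return draw

meta_w = torch.zeros(1, requires_grad=True)   # meta-parameters
opt = torch.optim.SGD([meta_w], lr=0.01)
inner_lr = 0.1

for step in range(100):
    opt.zero_grad()
    meta_loss = 0.0
    for _ in range(4):                         # one "episode" of sampled tasks
        draw = sample_task()
        x, y = draw()                          # support set
        loss = ((x * meta_w - y) ** 2).mean()
        (g,) = torch.autograd.grad(loss, meta_w, create_graph=True)
        w_adapted = meta_w - inner_lr * g      # inner adaptation step
        xq, yq = draw()                        # query set from the same task
        meta_loss = meta_loss + ((xq * w_adapted - yq) ** 2).mean()
    meta_loss.backward()                       # second-order meta-gradient
    opt.step()
```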
arXiv Detail & Related papers (2021-06-09T08:47:58Z)
- Exploring Opportunistic Meta-knowledge to Reduce Search Spaces for Automated Machine Learning [8.325359814939517]
This paper investigates whether, based on previous experience, a pool of available classifiers/regressors can be preemptively culled ahead of initiating a pipeline composition/optimisation process.
arXiv Detail & Related papers (2021-05-01T15:25:30Z)
- A Survey on Large-scale Machine Learning [67.6997613600942]
Machine learning can provide deep insights into data, allowing machines to make high-quality predictions.
Most sophisticated machine learning approaches suffer from huge time costs when operating on large-scale data.
Large-scale Machine Learning aims to learn patterns from big data efficiently while maintaining comparable performance.
arXiv Detail & Related papers (2020-08-10T06:07:52Z)
- Transfer Learning without Knowing: Reprogramming Black-box Machine Learning Models with Scarce Data and Limited Resources [78.72922528736011]
We propose a novel approach, black-box adversarial reprogramming (BAR), that repurposes a well-trained black-box machine learning model.
Using zeroth order optimization and multi-label mapping techniques, BAR can reprogram a black-box ML model solely based on its input-output responses.
BAR outperforms state-of-the-art methods and yields comparable performance to the vanilla adversarial reprogramming method.
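The core enabling trick is zeroth-order optimization: estimating gradients from input-output queries alone. The sketch below shows a generic random-direction finite-difference estimator on a toy loss; BAR's actual reprogramming objective and multi-label mapping are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Stand-in for the black-box model's loss on a reprogrammed input;
    # only its output values are available, never its gradients.
    return np.sum((x - 1.0) ** 2)

def zo_gradient(x, q=20, mu=1e-3):
    # Average of q random-direction finite differences.
    g = np.zeros_like(x)
    for _ in range(q):
        u = rng.normal(size=x.shape)
        g += (f(x + mu * u) - f(x)) / mu * u
    return g / q

delta = np.zeros(8)
for step in range(200):
    delta -= 0.05 * zo_gradient(delta)  # descend using only f's outputs
print(np.round(delta, 2))  # approaches the minimizer at 1.0
```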
arXiv Detail & Related papers (2020-07-17T01:52:34Z)
- Insights into Performance Fitness and Error Metrics for Machine Learning [1.827510863075184]
Machine learning (ML) is the field of training machines to achieve a high level of cognition and perform human-like analysis.
This paper examines a number of the most commonly-used performance fitness and error metrics for regression and classification algorithms.
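To make a few of the surveyed metrics concrete, the sketch below computes common regression and classification metrics directly from their definitions (scikit-learn provides the same via sklearn.metrics).

```python
import numpy as np

# Regression metrics on toy predictions.
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])
mae = np.mean(np.abs(y_true - y_pred))            # mean absolute error
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))   # root mean squared error
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot                          # coefficient of determination

# Classification metrics on toy labels.
c_true = np.array([1, 0, 1, 1, 0])
c_pred = np.array([1, 0, 0, 1, 1])
accuracy = np.mean(c_true == c_pred)
tp = np.sum((c_pred == 1) & (c_true == 1))
precision = tp / np.sum(c_pred == 1)
recall = tp / np.sum(c_true == 1)
f1 = 2 * precision * recall / (precision + recall)
print(mae, rmse, r2, accuracy, f1)
```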
arXiv Detail & Related papers (2020-05-17T22:59:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.