SeSeMI: Secure Serverless Model Inference on Sensitive Data
- URL: http://arxiv.org/abs/2412.11640v1
- Date: Mon, 16 Dec 2024 10:37:30 GMT
- Title: SeSeMI: Secure Serverless Model Inference on Sensitive Data
- Authors: Guoyu Hu, Yuncheng Wu, Gang Chen, Tien Tuan Anh Dinh, Beng Chin Ooi
- Abstract summary: Existing cloud-based model inference systems are costly, not easy to scale, and must be trusted in handling the models and user request data.
Our goal is to design a serverless model inference system that protects models and user request data from untrusted cloud providers.
We present SeSeMI, a secure, efficient, and cost-effective serverless model inference system.
- Score: 14.820151992047089
- Abstract: Model inference systems are essential for implementing end-to-end data analytics pipelines that deliver the benefits of machine learning models to users. Existing cloud-based model inference systems are costly, not easy to scale, and must be trusted in handling the models and user request data. Serverless computing presents a new opportunity, as it provides elasticity and fine-grained pricing. Our goal is to design a serverless model inference system that protects models and user request data from untrusted cloud providers. It offers high performance and low cost, while requiring no intrusive changes to the current serverless platforms. To realize our goal, we leverage trusted hardware. We identify and address three challenges in using trusted hardware for serverless model inference. These challenges arise from the high-level abstraction of serverless computing, the performance overhead of trusted hardware, and the characteristics of model inference workloads. We present SeSeMI, a secure, efficient, and cost-effective serverless model inference system. It adds three novel features non-intrusively to the existing serverless infrastructure and nothing else. The first feature is a key service that establishes secure channels between the user and the serverless instances and provides access control to models and users' data. The second is an enclave runtime that allows one enclave to process multiple concurrent requests. The final feature is a model packer that allows multiple models to be executed by one serverless instance. We build SeSeMI on top of Apache OpenWhisk and conduct extensive experiments with three popular machine learning models. The results show that SeSeMI achieves low latency and low cost at scale for realistic workloads.
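To make the request flow described in the abstract concrete, here is a minimal Python sketch under stated assumptions: `KeyService`, `enclave_handler`, and `run_model` are hypothetical placeholders rather than SeSeMI's actual API, Fernet stands in for the attested secure channel, and attestation verification is stubbed out.

```python
# Hypothetical sketch of the key-service flow from the abstract.
# KeyService, enclave_handler, and run_model are illustrative names,
# not the paper's actual API.
from cryptography.fernet import Fernet


class KeyService:
    """Issues per-session keys after verifying the enclave's attestation
    and checking that this user may access the requested model."""

    def __init__(self):
        self._sessions = {}

    def establish_channel(self, user_id: str, model_id: str, attestation: bytes) -> bytes:
        # A real system would verify a remote-attestation quote here;
        # we simply accept a fixed value for illustration.
        assert attestation == b"enclave-measurement-ok"
        key = Fernet.generate_key()
        self._sessions[(user_id, model_id)] = key
        return key  # delivered to both user and enclave over attested channels


def run_model(request: bytes) -> bytes:
    # Placeholder for actual model inference inside the enclave.
    return b"label=" + request[:8]


def enclave_handler(key: bytes, encrypted_request: bytes) -> bytes:
    """Runs inside the enclave of a serverless instance: decrypt, infer, encrypt."""
    channel = Fernet(key)
    request = channel.decrypt(encrypted_request)
    prediction = run_model(request)  # plaintext exists only inside the enclave
    return channel.encrypt(prediction)


# Client side: obtain a session key, then send only ciphertext to the cloud.
ks = KeyService()
session_key = ks.establish_channel("alice", "resnet50", b"enclave-measurement-ok")
client = Fernet(session_key)
ciphertext = client.encrypt(b"sensitive input features")
response = enclave_handler(session_key, ciphertext)
print(client.decrypt(response))
```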
Related papers
- Label Privacy in Split Learning for Large Models with Parameter-Efficient Training [51.28799334394279]
We search for a way to fine-tune models over an API while keeping the labels private.
We propose P$^3$EFT, a multi-party split learning algorithm that takes advantage of existing PEFT properties to maintain privacy at a lower performance overhead.
arXiv Detail & Related papers (2024-12-21T15:32:03Z)
- Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective [64.04617968947697]
We introduce a novel data-model co-design perspective to promote superior weight sparsity.
Specifically, customized visual prompts are mounted to upgrade neural network sparsification in our proposed VPNs framework.
arXiv Detail & Related papers (2023-12-03T13:50:24Z)
- Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z)
- Scalable Collaborative Learning via Representation Sharing [53.047460465980144]
Federated learning (FL) and split learning (SL) are two frameworks that enable collaborative learning while keeping the data private (on device).
In FL, each data holder trains a model locally and releases it to a central server for aggregation.
In SL, the clients must release individual cut-layer activations (smashed data) to the server and wait for its response (during both inference and backpropagation).
In this work, we present a novel approach for privacy-preserving machine learning, where the clients collaborate via online knowledge distillation using a contrastive loss.
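As a concrete reference point for the FL aggregation step summarized above, here is a minimal FedAvg-style sketch (illustrative only; the paper itself replaces such weight sharing with online knowledge distillation):

```python
# Minimal sketch of the FL aggregation step described above (FedAvg-style):
# clients release only locally trained weights, never raw data.
import numpy as np


def federated_average(client_weights: list[np.ndarray],
                      client_sizes: list[int]) -> np.ndarray:
    """Weighted average of local models, proportional to local dataset size."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))


# Three clients train locally; the server aggregates their weights.
clients = [np.array([0.9, 1.1]), np.array([1.0, 1.0]), np.array([1.2, 0.8])]
sizes = [100, 300, 600]
print(federated_average(clients, sizes))  # server-side aggregate model
```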
arXiv Detail & Related papers (2022-11-20T10:49:22Z)
- Partially Oblivious Neural Network Inference [4.843820624525483]
We show that for neural network models, like CNNs, some information leakage can be acceptable.
We experimentally demonstrate that in a CIFAR-10 network we can leak up to $80\%$ of the model's weights with practically no security impact.
arXiv Detail & Related papers (2022-10-27T05:39:36Z)
- DualCF: Efficient Model Extraction Attack from Counterfactual Explanations [57.46134660974256]
Cloud service providers have launched Machine-Learning-as-a-Service platforms that allow users to access large-scale cloud-based models via APIs.
These APIs often return extra information, such as counterfactual explanations, which inevitably makes the cloud models more vulnerable to extraction attacks.
We propose a simple yet efficient querying strategy that greatly improves the efficiency of stealing a classification model.
arXiv Detail & Related papers (2022-05-13T08:24:43Z)
- UnSplit: Data-Oblivious Model Inversion, Model Stealing, and Label Inference Attacks Against Split Learning [0.0]
The split learning framework partitions the model between the client and the server.
We show that the split learning paradigm can pose serious security risks and provides no more than a false sense of security.
arXiv Detail & Related papers (2021-08-20T07:39:16Z)
- Serverless Model Serving for Data Science [23.05534539170047]
We study the viability of serverless as a mainstream model serving platform for data science applications.
We find that serverless outperforms many cloud-based alternatives with respect to cost and performance.
We present several practical recommendations for data scientists on how to use serverless for scalable and cost-effective model serving.
arXiv Detail & Related papers (2021-03-04T11:23:01Z)
- Serverless inferencing on Kubernetes [0.0]
We discuss the KFServing project, which builds on the KNative serverless paradigm to provide a serverless machine learning inference solution.
We show how it solves the challenges of autoscaling GPU-based inference and discuss some of the lessons learnt from using it in production.
arXiv Detail & Related papers (2020-07-14T21:23:59Z)
- Information-Theoretic Bounds on the Generalization Error and Privacy Leakage in Federated Learning [96.38757904624208]
Machine learning algorithms operating over mobile networks can be grouped into three different categories.
The main objective of this work is to provide an information-theoretic framework for all of the aforementioned learning paradigms.
arXiv Detail & Related papers (2020-05-05T21:23:45Z)
- Corella: A Private Multi Server Learning Approach based on Correlated Queries [30.3330177204504]
We propose $\textit{Corella}$ as an alternative approach to protecting the privacy of data.
The proposed scheme relies on a cluster of servers, where at most $T \in \mathbb{N}$ of them may collude, each running a learning model.
Each query is masked with strong, correlated noise before being sent; the variance of the noise is set to be large enough to make the information leakage to any subset of up to $T$ servers information-theoretically negligible.
arXiv Detail & Related papers (2020-03-26T17:44:00Z)
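A toy sketch of the correlated-queries idea summarized above, assuming two servers ($T = 1$) and a linear model; this illustrates only the noise-cancellation principle, not Corella's actual construction:

```python
# Toy illustration of correlated queries with zero-sum noise (not Corella's
# actual scheme): each server's view is noise-dominated, but because the noise
# cancels and the model here is linear, the recombined output is exact.
import numpy as np

rng = np.random.default_rng(0)
x = np.array([0.2, -0.5, 1.0])        # private input
W = rng.standard_normal((2, 3))       # a linear "model" replicated on both servers

sigma = 1e3                           # noise scale far larger than the signal
n = rng.standard_normal(3) * sigma
query1, query2 = x + n, x - n         # correlated queries: the noise sums to zero

out = 0.5 * (W @ query1 + W @ query2) # user recombines the two servers' answers
print(np.allclose(out, W @ x))        # True: the noise cancelled exactly
```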