Serverless inferencing on Kubernetes
- URL: http://arxiv.org/abs/2007.07366v2
- Date: Fri, 24 Jul 2020 07:18:25 GMT
- Title: Serverless inferencing on Kubernetes
- Authors: Clive Cox, Dan Sun, Ellis Tarn, Animesh Singh, Rakesh Kelkar, David
Goodwin
- Abstract summary: We will discuss the KFServing project, which builds on the Knative serverless paradigm to provide a serverless machine learning inference solution.
We will show how it solves the challenges of autoscaling GPU-based inference and discuss some of the lessons learnt from using it in production.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Organisations are increasingly putting machine learning models into
production at scale. The increasing popularity of serverless scale-to-zero
paradigms presents an opportunity for deploying machine learning models to help
mitigate infrastructure costs when many models may not be in continuous use. We
will discuss the KFServing project, which builds on the Knative serverless
paradigm to provide a serverless machine learning inference solution that
allows a consistent and simple interface for data scientists to deploy their
models. We will show how it solves the challenges of autoscaling GPU-based
inference and discuss some of the lessons learnt from using it in production.
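As a concrete illustration of the deployment interface the abstract describes, the sketch below creates a scale-to-zero InferenceService with the Kubernetes Python client. This is a minimal sketch, assuming the v1beta1 InferenceService CRD and a cluster with KFServing installed; the namespace, service name, and storageUri are placeholder values, not details from the paper.

```python
# Minimal sketch: deploying a scikit-learn model as a KFServing
# InferenceService. Assumes the v1beta1 CRD schema; the namespace,
# name, and storageUri below are placeholders.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside a pod

inference_service = {
    "apiVersion": "serving.kubeflow.org/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "sklearn-iris", "namespace": "models"},
    "spec": {
        "predictor": {
            # minReplicas: 0 enables Knative scale-to-zero, so an idle
            # model consumes no compute until a request arrives.
            "minReplicas": 0,
            "sklearn": {"storageUri": "gs://my-bucket/sklearn/iris"},
        }
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="serving.kubeflow.org",
    version="v1beta1",
    namespace="models",
    plural="inferenceservices",
    body=inference_service,
)
```

The same manifest could equally be applied as YAML with kubectl; the Python client is used here only to keep the sketch self-contained.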
Related papers
- Update Selective Parameters: Federated Machine Unlearning Based on Model Explanation [46.86767774669831]
We propose a more effective and efficient federated unlearning scheme based on the concept of model explanation.
We select the most influential channels within an already-trained model for the data that need to be unlearned.
arXiv Detail & Related papers (2024-06-18T11:43:20Z)
- Reusable MLOps: Reusable Deployment, Reusable Infrastructure and Hot-Swappable Machine Learning models and services [0.0]
We introduce a new sustainable concept in the field of AI/ML operations called Reusable MLOps.
We reuse the existing deployment and infrastructure to serve new models by hot-swapping them, without tearing down the infrastructure or the microservice; a minimal sketch of this pattern appears after this list.
arXiv Detail & Related papers (2024-02-19T23:40:46Z)
- Heterogeneous Decentralized Machine Unlearning with Seed Model Distillation [47.42071293545731]
Information security legislation has endowed users with the unconditional right to be forgotten by trained machine learning models.
We design a decentralized unlearning framework called HDUS, which uses distilled seed models to construct erasable ensembles for all clients.
arXiv Detail & Related papers (2023-08-25T09:42:54Z)
- Learnware: Small Models Do Big [69.88234743773113]
The prevailing big-model paradigm, which has achieved impressive results in natural language processing and computer vision applications, has not yet addressed those issues, while becoming a serious source of carbon emissions.
This article offers an overview of the learnware paradigm, which aims to spare users from building machine learning models from scratch, in the hope that small models can be reused for purposes even beyond their original ones.
arXiv Detail & Related papers (2022-10-07T15:55:52Z)
- Continual-Learning-as-a-Service (CLaaS): On-Demand Efficient Adaptation of Predictive Models [17.83007940710455]
Two main future trends for companies that want to build machine learning-based applications are real-time inference and continual updating.
This paper defines a novel software service and model delivery infrastructure, termed Continual-Learning-as-a-Service (CLaaS), to address these issues.
It provides model updating and validation tools for data scientists without requiring an on-premise solution, in an efficient, stateful and easy-to-use manner.
arXiv Detail & Related papers (2022-06-14T16:22:54Z)
- Applied Federated Learning: Architectural Design for Robust and Efficient Learning in Privacy Aware Settings [0.8454446648908585]
The classical machine learning paradigm requires the aggregation of user data in a central location.
Centralization of data poses risks, including a heightened risk of internal and external security incidents.
Federated learning with differential privacy is designed to avoid the server-side centralization pitfall.
arXiv Detail & Related papers (2022-06-02T00:30:04Z)
- Serving and Optimizing Machine Learning Workflows on Heterogeneous Infrastructures [9.178035808110124]
JellyBean is a framework for serving and optimizing machine learning inference on heterogeneous infrastructures.
We show that JellyBean reduces the total serving cost of visual question answering by up to 58%, and vehicle tracking from the NVIDIA AI City Challenge by up to 36%.
arXiv Detail & Related papers (2022-05-10T07:32:32Z)
- FLHub: a Federated Learning model sharing service [0.7614628596146599]
We propose Federated Learning Hub (FLHub) as a sharing service for machine learning models.
FLHub allows users to upload, download, and contribute to models developed by other developers, much as GitHub does for code.
We demonstrate that a forked model can finish training faster than the existing model and that learning progresses more quickly in each federated round.
arXiv Detail & Related papers (2022-02-14T06:02:55Z)
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
Existing approaches, however, do not supply the procedures and pipelines needed for the actual deployment of machine learning capabilities in real production-grade systems.
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor frameworks and script language engines.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
- An Expectation-Maximization Perspective on Federated Learning [75.67515842938299]
Federated learning describes the distributed training of models across multiple clients while keeping the data private on-device.
In this work, we view the server-orchestrated federated learning process as a hierarchical latent variable model where the server provides the parameters of a prior distribution over the client-specific model parameters.
We show that with simple Gaussian priors and a hard version of the well-known Expectation-Maximization (EM) algorithm, learning in such a model corresponds to FedAvg, the most popular algorithm for the federated learning setting; a toy sketch of this correspondence appears after this list.
arXiv Detail & Related papers (2021-11-19T12:58:59Z)
- Information-Theoretic Bounds on the Generalization Error and Privacy Leakage in Federated Learning [96.38757904624208]
Machine learning algorithms on mobile networks can be divided into three different categories.
The main objective of this work is to provide an information-theoretic framework for all of the aforementioned learning paradigms.
arXiv Detail & Related papers (2020-05-05T21:23:45Z)
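The hot-swapping idea from the Reusable MLOps entry above can be made concrete in a few lines. The following is a minimal sketch of the general pattern, not that paper's implementation: the ModelHolder class and the stand-in lambda models are hypothetical, and a production service would add versioning, health checks, and request draining.

```python
# Minimal sketch of hot-swapping a served model without tearing down
# the service. All names here are hypothetical illustrations.
import threading

class ModelHolder:
    """Serves predictions while allowing the model to be swapped atomically."""

    def __init__(self, model):
        self._model = model
        self._lock = threading.RLock()

    def predict(self, x):
        with self._lock:  # readers always see a consistent model
            return self._model(x)

    def swap(self, new_model):
        with self._lock:  # replace the model in place; the service,
            self._model = new_model  # its endpoints, and its pods stay up

# Usage: the microservice keeps one holder for its whole lifetime.
holder = ModelHolder(lambda x: x * 2)  # stand-in for a real model
print(holder.predict(3))               # -> 6
holder.swap(lambda x: x + 1)           # hot-swap: no teardown, no restart
print(holder.predict(3))               # -> 4
```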
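The FedAvg-as-hard-EM correspondence from the Expectation-Maximization entry can likewise be illustrated with a toy example. This is a sketch under assumed quadratic local losses, not the paper's derivation; client_optima and the two precision values are made-up stand-ins for local data and prior strength.

```python
# Toy sketch of the hard-EM view of FedAvg. E-step: each client computes
# a MAP estimate under a Gaussian prior centred at the server parameters
# (closed form here because the assumed local losses are quadratic).
# M-step: the server re-centres the prior at the mean of the client
# estimates, which is exactly the FedAvg aggregation step.
import numpy as np

rng = np.random.default_rng(0)
num_clients, dim, rounds = 5, 3, 20
# Hypothetical per-client optima standing in for local training data.
client_optima = rng.normal(size=(num_clients, dim))

server_w = np.zeros(dim)  # prior mean held by the server
prior_precision = 1.0     # assumed strength of the Gaussian prior
data_precision = 10.0     # assumed strength of the local evidence

for _ in range(rounds):
    # E-step (hard): per-client MAP of local loss + Gaussian prior.
    client_w = (data_precision * client_optima + prior_precision * server_w) \
               / (data_precision + prior_precision)
    # M-step: average the client parameters (the FedAvg update).
    server_w = client_w.mean(axis=0)

print("server parameters after training:", server_w)
```

With non-quadratic losses the E-step would instead run local training on the prior-regularized objective, while the M-step would still average client parameters, matching the FedAvg update the summary describes.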
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.