Serverless inferencing on Kubernetes
- URL: http://arxiv.org/abs/2007.07366v2
- Date: Fri, 24 Jul 2020 07:18:25 GMT
- Title: Serverless inferencing on Kubernetes
- Authors: Clive Cox, Dan Sun, Ellis Tarn, Animesh Singh, Rakesh Kelkar, David
Goodwin
- Abstract summary: We will discuss the KFServing project, which builds on the Knative serverless paradigm to provide a serverless machine learning inference solution.
We will show how it solves the challenges of autoscaling GPU-based inference and discuss some of the lessons learnt from using it in production.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Organisations are increasingly putting machine learning models into
production at scale. The increasing popularity of serverless scale-to-zero
paradigms presents an opportunity for deploying machine learning models to help
mitigate infrastructure costs when many models may not be in continuous use. We
will discuss the KFServing project, which builds on the Knative serverless
paradigm to provide a serverless machine learning inference solution that
allows a consistent and simple interface for data scientists to deploy their
models. We will show how it solves the challenges of autoscaling GPU-based
inference and discuss some of the lessons learnt from using it in production.
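As a concrete illustration of the deployment interface the abstract describes, the sketch below creates a scale-to-zero InferenceService with the Kubernetes Python client. This is a minimal sketch, assuming the v1beta1 InferenceService CRD and a cluster with KFServing installed; the namespace, service name, and storageUri are placeholder values, not details from the paper.

```python
# Minimal sketch: deploying a scikit-learn model as a KFServing
# InferenceService. Assumes the v1beta1 CRD schema; the namespace,
# name, and storageUri below are placeholders.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside a pod

inference_service = {
    "apiVersion": "serving.kubeflow.org/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "sklearn-iris", "namespace": "models"},
    "spec": {
        "predictor": {
            # minReplicas: 0 enables Knative scale-to-zero, so an idle
            # model consumes no compute until a request arrives.
            "minReplicas": 0,
            "sklearn": {"storageUri": "gs://my-bucket/sklearn/iris"},
        }
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="serving.kubeflow.org",
    version="v1beta1",
    namespace="models",
    plural="inferenceservices",
    body=inference_service,
)
```

The same manifest could equally be applied as YAML with kubectl; the Python client is used here only to keep the sketch self-contained.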
Related papers
- Update Selective Parameters: Federated Machine Unlearning Based on Model Explanation [46.86767774669831]
We propose a more effective and efficient federated unlearning scheme based on the concept of model explanation.
We select the most influential channels within an already-trained model for the data that need to be unlearned.
arXiv Detail & Related papers (2024-06-18T11:43:20Z)
- Reusable MLOps: Reusable Deployment, Reusable Infrastructure and Hot-Swappable Machine Learning models and services [0.0]
We introduce a new sustainable concept in the field of AI/ML operations called Reusable MLOps.
We reuse the existing deployment and infrastructure to serve new models by hot-swapping them, without tearing down the infrastructure or the microservice; a minimal sketch of this pattern appears after this list.
arXiv Detail & Related papers (2024-02-19T23:40:46Z)
- Heterogeneous Decentralized Machine Unlearning with Seed Model Distillation [47.42071293545731]
Information security legislation has endowed users with the unconditional right to be forgotten by trained machine learning models.
We design a decentralized unlearning framework called HDUS, which uses distilled seed models to construct erasable ensembles for all clients.
arXiv Detail & Related papers (2023-08-25T09:42:54Z)
- Learnware: Small Models Do Big [69.88234743773113]
The prevailing big-model paradigm, which has achieved impressive results in natural language processing and computer vision applications, has not yet addressed those issues, while becoming a serious source of carbon emissions.
This article offers an overview of the learnware paradigm, which aims to spare users from building machine learning models from scratch, in the hope that small models can be reused for purposes even beyond their original ones.
arXiv Detail & Related papers (2022-10-07T15:55:52Z)
- Continual-Learning-as-a-Service (CLaaS): On-Demand Efficient Adaptation of Predictive Models [17.83007940710455]
Two main future trends for companies that want to build machine learning-based applications are real-time inference and continual updating.
This paper defines a novel software service and model delivery infrastructure, termed Continual-Learning-as-a-Service (CLaaS), to address these issues.
It provides model updating and validation tools for data scientists without requiring an on-premise solution, in an efficient, stateful and easy-to-use manner.
arXiv Detail & Related papers (2022-06-14T16:22:54Z)
- Applied Federated Learning: Architectural Design for Robust and Efficient Learning in Privacy Aware Settings [0.8454446648908585]
The classical machine learning paradigm requires the aggregation of user data in a central location.
Centralization of data poses risks, including a heightened risk of internal and external security incidents.
Federated learning with differential privacy is designed to avoid the server-side centralization pitfall.
arXiv Detail & Related papers (2022-06-02T00:30:04Z)
- Serving and Optimizing Machine Learning Workflows on Heterogeneous Infrastructures [9.178035808110124]
JellyBean is a framework for serving and optimizing machine learning inference on heterogeneous infrastructures.
We show that JellyBean reduces the total serving cost of visual question answering by up to 58%, and vehicle tracking from the NVIDIA AI City Challenge by up to 36%.
arXiv Detail & Related papers (2022-05-10T07:32:32Z)
- FLHub: a Federated Learning model sharing service [0.7614628596146599]
We propose Federated Learning Hub (FLHub) as a sharing service for machine learning models.
FLHub allows users to upload, download, and contribute to models developed by other developers, much as GitHub does for code.
We demonstrate that a forked model can finish training faster than the existing model and that learning progresses more quickly in each federated round.
arXiv Detail & Related papers (2022-02-14T06:02:55Z)
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
Existing approaches, however, do not supply the procedures and pipelines needed for the actual deployment of machine learning capabilities in real production-grade systems.
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor frameworks and script language engines.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
- An Expectation-Maximization Perspective on Federated Learning [75.67515842938299]
Federated learning describes the distributed training of models across multiple clients while keeping the data private on-device.
In this work, we view the server-orchestrated federated learning process as a hierarchical latent variable model where the server provides the parameters of a prior distribution over the client-specific model parameters.
We show that with simple Gaussian priors and a hard version of the well-known Expectation-Maximization (EM) algorithm, learning in such a model corresponds to FedAvg, the most popular algorithm for the federated learning setting; a toy sketch of this correspondence appears after this list.
arXiv Detail & Related papers (2021-11-19T12:58:59Z)
- Information-Theoretic Bounds on the Generalization Error and Privacy Leakage in Federated Learning [96.38757904624208]
Machine learning algorithms on mobile networks can be divided into three different categories.
The main objective of this work is to provide an information-theoretic framework for all of the aforementioned learning paradigms.
arXiv Detail & Related papers (2020-05-05T21:23:45Z)
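The hot-swapping idea from the Reusable MLOps entry above can be made concrete in a few lines. The following is a minimal sketch of the general pattern, not that paper's implementation: the ModelHolder class and the stand-in lambda models are hypothetical, and a production service would add versioning, health checks, and request draining.

```python
# Minimal sketch of hot-swapping a served model without tearing down
# the service. All names here are hypothetical illustrations.
import threading

class ModelHolder:
    """Serves predictions while allowing the model to be swapped atomically."""

    def __init__(self, model):
        self._model = model
        self._lock = threading.RLock()

    def predict(self, x):
        with self._lock:  # readers always see a consistent model
            return self._model(x)

    def swap(self, new_model):
        with self._lock:  # replace the model in place; the service,
            self._model = new_model  # its endpoints, and its pods stay up

# Usage: the microservice keeps one holder for its whole lifetime.
holder = ModelHolder(lambda x: x * 2)  # stand-in for a real model
print(holder.predict(3))               # -> 6
holder.swap(lambda x: x + 1)           # hot-swap: no teardown, no restart
print(holder.predict(3))               # -> 4
```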
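The FedAvg-as-hard-EM correspondence from the Expectation-Maximization entry can likewise be illustrated with a toy example. This is a sketch under assumed quadratic local losses, not the paper's derivation; client_optima and the two precision values are made-up stand-ins for local data and prior strength.

```python
# Toy sketch of the hard-EM view of FedAvg. E-step: each client computes
# a MAP estimate under a Gaussian prior centred at the server parameters
# (closed form here because the assumed local losses are quadratic).
# M-step: the server re-centres the prior at the mean of the client
# estimates, which is exactly the FedAvg aggregation step.
import numpy as np

rng = np.random.default_rng(0)
num_clients, dim, rounds = 5, 3, 20
# Hypothetical per-client optima standing in for local training data.
client_optima = rng.normal(size=(num_clients, dim))

server_w = np.zeros(dim)  # prior mean held by the server
prior_precision = 1.0     # assumed strength of the Gaussian prior
data_precision = 10.0     # assumed strength of the local evidence

for _ in range(rounds):
    # E-step (hard): per-client MAP of local loss + Gaussian prior.
    client_w = (data_precision * client_optima + prior_precision * server_w) \
               / (data_precision + prior_precision)
    # M-step: average the client parameters (the FedAvg update).
    server_w = client_w.mean(axis=0)

print("server parameters after training:", server_w)
```

With non-quadratic losses the E-step would instead run local training on the prior-regularized objective, while the M-step would still average client parameters, matching the FedAvg update the summary describes.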
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.