FlexServe: Deployment of PyTorch Models as Flexible REST Endpoints
- URL: http://arxiv.org/abs/2003.01538v1
- Date: Sat, 29 Feb 2020 18:51:09 GMT
- Title: FlexServe: Deployment of PyTorch Models as Flexible REST Endpoints
- Authors: Edward Verenich, Alvaro Velasquez, M.G. Sarwar Murshed, Faraz Hussain
- Abstract summary: The integration of artificial intelligence capabilities into modern software systems is increasingly being simplified through the use of cloud-based services and representational state transfer architecture.
Insufficient information regarding underlying model provenance and the lack of control over model evolution serve as an impediment to the more widespread adoption of these services in many operational environments which have strict security requirements.
- Score: 6.730473762151365
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The integration of artificial intelligence capabilities into modern software
systems is increasingly being simplified through the use of cloud-based machine
learning services and representational state transfer architecture design.
However, insufficient information regarding underlying model provenance and the
lack of control over model evolution serve as an impediment to the more
widespread adoption of these services in many operational environments which
have strict security requirements. Furthermore, tools such as TensorFlow
Serving allow models to be deployed as RESTful endpoints, but they require
error-prone transformations for PyTorch models, whose dynamic computational
graphs contrast with the static computational graphs of TensorFlow. To enable
rapid deployment of PyTorch models without intermediate transformations, we
have developed FlexServe, a simple library for deploying multi-model ensembles
with flexible batching.
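The abstract gives no usage details, but a minimal, hypothetical sketch of the pattern FlexServe targets, serving a small PyTorch ensemble behind one REST route with a per-request batch size, might look as follows. The route name, payload format, and stand-in models below are assumptions for illustration, not FlexServe's actual API.

```python
# Hypothetical sketch: a PyTorch ensemble behind a single REST endpoint.
# Not FlexServe's API; it only illustrates multi-model inference with a
# flexible (per-request) batch size.
import torch
import torch.nn as nn
from flask import Flask, request, jsonify

app = Flask(__name__)

def make_classifier():
    # Stand-in models; in practice these would be trained PyTorch models.
    return nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10)).eval()

ensemble = [make_classifier() for _ in range(3)]

@app.route("/predict", methods=["POST"])
def predict():
    # Expect JSON of the form {"inputs": [[...32 floats...], ...]}.
    # The batch dimension is whatever the client sends (flexible batching).
    batch = torch.tensor(request.json["inputs"], dtype=torch.float32)
    with torch.no_grad():
        # Average the softmax outputs of the ensemble members.
        probs = torch.stack([m(batch).softmax(dim=1) for m in ensemble]).mean(dim=0)
    return jsonify({"predictions": probs.argmax(dim=1).tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

A client would POST a JSON batch of feature vectors to /predict and receive one class index per row; the batch size can vary freely between requests.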
Related papers
- SeBS-Flow: Benchmarking Serverless Cloud Function Workflows [51.4200085836966]
We propose the first serverless workflow benchmarking suite SeBS-Flow.
SeBS-Flow includes six real-world application benchmarks and four microbenchmarks representing different computational patterns.
We conduct comprehensive evaluations on three major cloud platforms, assessing performance, cost, scalability, and runtime deviations.
arXiv Detail & Related papers (2024-10-04T14:52:18Z)
- FlexModel: A Framework for Interpretability of Distributed Large Language Models [0.0]
We present FlexModel, a software package providing a streamlined interface for engaging with models distributed across multi-GPU and multi-node configurations.
The library is compatible with existing model distribution libraries and encapsulates PyTorch models.
It exposes user-registerable HookFunctions to facilitate straightforward interaction with distributed model internals.
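The abstract does not show the HookFunction interface. As a rough single-process analogue of inspecting model internals, PyTorch's built-in forward hooks expose intermediate activations like this; FlexModel's actual API and its handling of distributed models may differ.

```python
# Rough analogue of inspecting model internals with hooks; not FlexModel's
# actual HookFunction API, which is not described in this summary.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
captured = {}

def capture_activation(module, inputs, output):
    # Store a detached copy of the layer's output for later inspection.
    captured["hidden"] = output.detach()

# Register the hook on the first linear layer.
handle = model[0].register_forward_hook(capture_activation)

with torch.no_grad():
    model(torch.randn(8, 16))

print(captured["hidden"].shape)  # torch.Size([8, 32])
handle.remove()
```

FlexModel presumably wraps a similar mechanism so that such hooks work transparently when the model is sharded across GPUs and nodes.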
arXiv Detail & Related papers (2023-12-05T21:19:33Z)
- Predicting Resource Consumption of Kubernetes Container Systems using Resource Models [3.138731415322007]
This paper considers how to derive resource models for cloud systems empirically.
We do so based on models of deployed services in a formal language with explicit adherence to CPU and memory resources.
We report on leveraging data collected empirically from small deployments to simulate the execution of higher intensity scenarios on larger deployments.
arXiv Detail & Related papers (2023-05-12T17:59:01Z)
- PDSketch: Integrated Planning Domain Programming and Learning [86.07442931141637]
We present a new domain definition language, named PDSketch.
It allows users to flexibly define high-level structures in the transition models.
Details of the transition model will be filled in by trainable neural networks.
arXiv Detail & Related papers (2023-03-09T18:54:12Z)
- MetaNetwork: A Task-agnostic Network Parameters Generation Framework for Improving Device Model Generalization [65.02542875281233]
We propose a novel task-agnostic framework, named MetaNetwork, for generating adaptive device model parameters from the cloud without on-device training.
The MetaGenerator is designed to learn a mapping function from samples to model parameters, and it can generate and deliver the adaptive parameters to the device based on samples uploaded from the device to the cloud.
The MetaStabilizer aims to reduce the oscillation of the MetaGenerator, accelerate the convergence and improve the model performance during both training and inference.
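The summary does not specify the MetaGenerator's architecture. A toy hypernetwork-style sketch of mapping uploaded samples to the parameters of a small target layer, which is only an assumption about how such a mapping could be realized, might look like:

```python
# Toy sketch of generating a target layer's parameters from samples, in the
# spirit of a hypernetwork; the actual MetaGenerator design is not described
# in this summary and will differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

in_dim, out_dim = 16, 4  # target linear layer: weight (4x16) plus bias (4)

class MetaGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU())
        self.head = nn.Linear(64, out_dim * in_dim + out_dim)

    def forward(self, samples):
        # Pool sample embeddings, then emit a flat parameter vector.
        pooled = self.encoder(samples).mean(dim=0)
        params = self.head(pooled)
        weight = params[: out_dim * in_dim].view(out_dim, in_dim)
        bias = params[out_dim * in_dim :]
        return weight, bias

generator = MetaGenerator()
device_samples = torch.randn(32, in_dim)   # samples uploaded from the device
weight, bias = generator(device_samples)   # adaptive parameters for the device
logits = F.linear(torch.randn(8, in_dim), weight, bias)
print(logits.shape)  # torch.Size([8, 4])
```

Training such a generator across many tasks in the cloud is what would let it emit useful parameters without any on-device gradient steps.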
arXiv Detail & Related papers (2022-09-12T13:26:26Z)
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
Basic cross-platform tensor frameworks and script language engines alone do not supply the procedures and pipelines needed to deploy machine learning capabilities in real production-grade systems.
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports these requirements while still using such cross-platform tensor frameworks and script language engines.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
- Amazon SageMaker Model Parallelism: A General and Flexible Framework for Large Model Training [10.223511922625065]
We present Amazon SageMaker model parallelism, a software library that integrates with PyTorch.
It enables easy training of large models using model parallelism and other memory-saving features.
We evaluate performance over GPT-3, RoBERTa, BERT, and neural collaborative filtering.
arXiv Detail & Related papers (2021-11-10T22:30:21Z)
- DIETERpy: a Python framework for The Dispatch and Investment Evaluation Tool with Endogenous Renewables [62.997667081978825]
DIETER is an open-source power sector model designed to analyze future settings with very high shares of variable renewable energy sources.
It minimizes overall system costs, including fixed and variable costs of various generation, flexibility and sector coupling options.
We introduce DIETERpy, which builds on the existing model version, written in the General Algebraic Modeling System (GAMS), and enhances it with a Python framework.
arXiv Detail & Related papers (2020-10-02T09:27:33Z)
- Ensemble Distillation for Robust Model Fusion in Federated Learning [72.61259487233214]
Federated Learning (FL) is a machine learning setting where many devices collaboratively train a machine learning model.
In most of the current training schemes the central model is refined by averaging the parameters of the server model and the updated parameters from the client side.
We propose ensemble distillation for model fusion, i.e., training the central classifier on unlabeled data using the outputs of the clients' models.
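The summary omits training details. A minimal sketch of this fusion step, assuming a KL-divergence loss against the averaged client soft predictions on an unlabeled transfer set (the loss, optimizer, and data source are assumptions, not the paper's exact recipe), might look like:

```python
# Minimal sketch of ensemble distillation for model fusion: the server trains
# a central model to match the averaged soft predictions of the client models
# on unlabeled data. Hyperparameters and data are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_model():
    return nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))

client_models = [make_model().eval() for _ in range(5)]  # received from clients
server_model = make_model()
optimizer = torch.optim.Adam(server_model.parameters(), lr=1e-3)

unlabeled = torch.randn(256, 20)  # stand-in for an unlabeled transfer set

for step in range(100):
    idx = torch.randint(0, unlabeled.size(0), (32,))
    x = unlabeled[idx]
    with torch.no_grad():
        # Average the clients' predictive distributions (the "teacher").
        teacher = torch.stack([F.softmax(m(x), dim=1) for m in client_models]).mean(dim=0)
    student_log_probs = F.log_softmax(server_model(x), dim=1)
    loss = F.kl_div(student_log_probs, teacher, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Only client outputs, not client parameters, enter this step, which is what distinguishes the approach from plain parameter averaging.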
arXiv Detail & Related papers (2020-06-12T14:49:47Z)
- Scalable Deployment of AI Time-series Models for IoT [0.7169734491710924]
IBM Research Castor is a cloud-native system for managing and deploying time-series models in IoT applications.
Model templates can be deployed against specific instances of semantic concepts.
Results from deployments in real-world smartgrid live forecasting applications are reported.
arXiv Detail & Related papers (2020-03-24T14:27:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information shown and is not responsible for any consequences of its use.