FlexServe: Deployment of PyTorch Models as Flexible REST Endpoints
- URL: http://arxiv.org/abs/2003.01538v1
- Date: Sat, 29 Feb 2020 18:51:09 GMT
- Title: FlexServe: Deployment of PyTorch Models as Flexible REST Endpoints
- Authors: Edward Verenich, Alvaro Velasquez, M.G. Sarwar Murshed, Faraz Hussain
- Abstract summary: The integration of artificial intelligence capabilities into modern software systems is increasingly being simplified through the use of cloud-based services and representational state transfer architecture.
Insufficient information regarding underlying model provenance and the lack of control over model evolution serve as an impediment to the more widespread adoption of these services in many operational environments which have strict security requirements.
- Score: 6.730473762151365
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The integration of artificial intelligence capabilities into modern software
systems is increasingly being simplified through the use of cloud-based machine
learning services and representational state transfer architecture design.
However, insufficient information regarding underlying model provenance and the
lack of control over model evolution serve as an impediment to the more
widespread adoption of these services in many operational environments which
have strict security requirements. Furthermore, tools such as TensorFlow
Serving allow models to be deployed as RESTful endpoints, but they require
error-prone transformations for PyTorch models, whose dynamic computational
graphs contrast with the static computational graphs of TensorFlow. To enable
rapid deployment of PyTorch models without intermediate transformations, we
have developed FlexServe, a simple library for deploying multi-model ensembles
with flexible batching.
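The abstract gives no usage details, but a minimal, hypothetical sketch of the pattern FlexServe targets, serving a small PyTorch ensemble behind one REST route with a per-request batch size, might look as follows. The route name, payload format, and stand-in models below are assumptions for illustration, not FlexServe's actual API.

```python
# Hypothetical sketch: a PyTorch ensemble behind a single REST endpoint.
# Not FlexServe's API; it only illustrates multi-model inference with a
# flexible (per-request) batch size.
import torch
import torch.nn as nn
from flask import Flask, request, jsonify

app = Flask(__name__)

def make_classifier():
    # Stand-in models; in practice these would be trained PyTorch models.
    return nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10)).eval()

ensemble = [make_classifier() for _ in range(3)]

@app.route("/predict", methods=["POST"])
def predict():
    # Expect JSON of the form {"inputs": [[...32 floats...], ...]}.
    # The batch dimension is whatever the client sends (flexible batching).
    batch = torch.tensor(request.json["inputs"], dtype=torch.float32)
    with torch.no_grad():
        # Average the softmax outputs of the ensemble members.
        probs = torch.stack([m(batch).softmax(dim=1) for m in ensemble]).mean(dim=0)
    return jsonify({"predictions": probs.argmax(dim=1).tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

A client would POST a JSON batch of feature vectors to /predict and receive one class index per row; the batch size can vary freely between requests.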
Related papers
- SeBS-Flow: Benchmarking Serverless Cloud Function Workflows [51.4200085836966]
We propose the first serverless workflow benchmarking suite SeBS-Flow.
SeBS-Flow includes six real-world application benchmarks and four microbenchmarks representing different computational patterns.
We conduct comprehensive evaluations on three major cloud platforms, assessing performance, cost, scalability, and runtime deviations.
arXiv Detail & Related papers (2024-10-04T14:52:18Z)
- FlexModel: A Framework for Interpretability of Distributed Large Language Models [0.0]
We present FlexModel, a software package providing a streamlined interface for engaging with models distributed across multi-GPU and multi-node configurations.
The library is compatible with existing model distribution libraries and encapsulates PyTorch models.
It exposes user-registerable HookFunctions to facilitate straightforward interaction with distributed model internals.
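The abstract does not show the HookFunction interface. As a rough single-process analogue of inspecting model internals, PyTorch's built-in forward hooks expose intermediate activations like this; FlexModel's actual API and its handling of distributed models may differ.

```python
# Rough analogue of inspecting model internals with hooks; not FlexModel's
# actual HookFunction API, which is not described in this summary.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
captured = {}

def capture_activation(module, inputs, output):
    # Store a detached copy of the layer's output for later inspection.
    captured["hidden"] = output.detach()

# Register the hook on the first linear layer.
handle = model[0].register_forward_hook(capture_activation)

with torch.no_grad():
    model(torch.randn(8, 16))

print(captured["hidden"].shape)  # torch.Size([8, 32])
handle.remove()
```

FlexModel presumably wraps a similar mechanism so that such hooks work transparently when the model is sharded across GPUs and nodes.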
arXiv Detail & Related papers (2023-12-05T21:19:33Z)
- Predicting Resource Consumption of Kubernetes Container Systems using Resource Models [3.138731415322007]
This paper considers how to derive resource models for cloud systems empirically.
We do so based on models of deployed services in a formal language with explicit adherence to CPU and memory resources.
We report on leveraging data collected empirically from small deployments to simulate the execution of higher intensity scenarios on larger deployments.
arXiv Detail & Related papers (2023-05-12T17:59:01Z)
- PDSketch: Integrated Planning Domain Programming and Learning [86.07442931141637]
We present a new domain definition language, named PDSketch.
It allows users to flexibly define high-level structures in the transition models.
Details of the transition model will be filled in by trainable neural networks.
arXiv Detail & Related papers (2023-03-09T18:54:12Z)
- MetaNetwork: A Task-agnostic Network Parameters Generation Framework for Improving Device Model Generalization [65.02542875281233]
We propose a novel task-agnostic framework, named MetaNetwork, for generating adaptive device model parameters from the cloud without on-device training.
The MetaGenerator is designed to learn a mapping function from samples to model parameters, and it can generate and deliver the adaptive parameters to the device based on samples uploaded from the device to the cloud.
The MetaStabilizer aims to reduce the oscillation of the MetaGenerator, accelerate the convergence and improve the model performance during both training and inference.
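The summary does not specify the MetaGenerator's architecture. A toy hypernetwork-style sketch of mapping uploaded samples to the parameters of a small target layer, which is only an assumption about how such a mapping could be realized, might look like:

```python
# Toy sketch of generating a target layer's parameters from samples, in the
# spirit of a hypernetwork; the actual MetaGenerator design is not described
# in this summary and will differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

in_dim, out_dim = 16, 4  # target linear layer: weight (4x16) plus bias (4)

class MetaGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU())
        self.head = nn.Linear(64, out_dim * in_dim + out_dim)

    def forward(self, samples):
        # Pool sample embeddings, then emit a flat parameter vector.
        pooled = self.encoder(samples).mean(dim=0)
        params = self.head(pooled)
        weight = params[: out_dim * in_dim].view(out_dim, in_dim)
        bias = params[out_dim * in_dim :]
        return weight, bias

generator = MetaGenerator()
device_samples = torch.randn(32, in_dim)   # samples uploaded from the device
weight, bias = generator(device_samples)   # adaptive parameters for the device
logits = F.linear(torch.randn(8, in_dim), weight, bias)
print(logits.shape)  # torch.Size([8, 4])
```

Training such a generator across many tasks in the cloud is what would let it emit useful parameters without any on-device gradient steps.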
arXiv Detail & Related papers (2022-09-12T13:26:26Z)
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
Basic cross-platform tensor frameworks and script language engines alone do not supply the procedures and pipelines needed to deploy machine learning capabilities in real production-grade systems.
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports these requirements while still using such cross-platform tensor frameworks and script language engines.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
- Amazon SageMaker Model Parallelism: A General and Flexible Framework for Large Model Training [10.223511922625065]
We present Amazon SageMaker model parallelism, a software library that integrates with PyTorch.
It enables easy training of large models using model parallelism and other memory-saving features.
We evaluate performance over GPT-3, RoBERTa, BERT, and neural collaborative filtering.
arXiv Detail & Related papers (2021-11-10T22:30:21Z)
- DIETERpy: a Python framework for The Dispatch and Investment Evaluation Tool with Endogenous Renewables [62.997667081978825]
DIETER is an open-source power sector model designed to analyze future settings with very high shares of variable renewable energy sources.
It minimizes overall system costs, including fixed and variable costs of various generation, flexibility and sector coupling options.
We introduce DIETERpy, which builds on the existing model version, written in the General Algebraic Modeling System (GAMS), and enhances it with a Python framework.
arXiv Detail & Related papers (2020-10-02T09:27:33Z)
- Ensemble Distillation for Robust Model Fusion in Federated Learning [72.61259487233214]
Federated Learning (FL) is a machine learning setting where many devices collaboratively train a machine learning model.
In most of the current training schemes the central model is refined by averaging the parameters of the server model and the updated parameters from the client side.
We propose ensemble distillation for model fusion, i.e., training the central classifier on unlabeled data using the outputs of the clients' models.
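The summary omits training details. A minimal sketch of this fusion step, assuming a KL-divergence loss against the averaged client soft predictions on an unlabeled transfer set (the loss, optimizer, and data source are assumptions, not the paper's exact recipe), might look like:

```python
# Minimal sketch of ensemble distillation for model fusion: the server trains
# a central model to match the averaged soft predictions of the client models
# on unlabeled data. Hyperparameters and data are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_model():
    return nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))

client_models = [make_model().eval() for _ in range(5)]  # received from clients
server_model = make_model()
optimizer = torch.optim.Adam(server_model.parameters(), lr=1e-3)

unlabeled = torch.randn(256, 20)  # stand-in for an unlabeled transfer set

for step in range(100):
    idx = torch.randint(0, unlabeled.size(0), (32,))
    x = unlabeled[idx]
    with torch.no_grad():
        # Average the clients' predictive distributions (the "teacher").
        teacher = torch.stack([F.softmax(m(x), dim=1) for m in client_models]).mean(dim=0)
    student_log_probs = F.log_softmax(server_model(x), dim=1)
    loss = F.kl_div(student_log_probs, teacher, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Only client outputs, not client parameters, enter this step, which is what distinguishes the approach from plain parameter averaging.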
arXiv Detail & Related papers (2020-06-12T14:49:47Z)
- Scalable Deployment of AI Time-series Models for IoT [0.7169734491710924]
IBM Research Castor is a cloud-native system for managing and deploying time-series models in IoT applications.
Model templates can be deployed against specific instances of semantic concepts.
Results from deployments in real-world smartgrid live forecasting applications are reported.
arXiv Detail & Related papers (2020-03-24T14:27:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information shown and is not responsible for any consequences of its use.