Related papers: MOSEL: Inference Serving Using Dynamic Modality Selection

MOSEL: Inference Serving Using Dynamic Modality Selection

URL: http://arxiv.org/abs/2310.18481v1
Date: Fri, 27 Oct 2023 20:50:56 GMT
Title: MOSEL: Inference Serving Using Dynamic Modality Selection
Authors: Bodun Hu, Le Xu, Jeongyoon Moon, Neeraja J. Yadwadkar, Aditya Akella
Abstract summary: We introduce a form of dynamism, modality selection, where we adaptively choose modalities from inference inputs while maintaining the model quality. We introduce MOSEL, an automated inference serving system for multi-modal ML models that carefully picks input modalities per request based on user-defined performance and accuracy requirements.
Score: 4.849058875921672
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Rapid advancements over the years have helped machine learning models reach previously hard-to-achieve goals, sometimes even exceeding human capabilities. However, to attain the desired accuracy, the model sizes and in turn their computational requirements have increased drastically. Thus, serving predictions from these models to meet any target latency and cost requirements of applications remains a key challenge, despite recent work in building inference-serving systems as well as algorithmic approaches that dynamically adapt models based on inputs. In this paper, we introduce a form of dynamism, modality selection, where we adaptively choose modalities from inference inputs while maintaining the model quality. We introduce MOSEL, an automated inference serving system for multi-modal ML models that carefully picks input modalities per request based on user-defined performance and accuracy requirements. MOSEL exploits modality configurations extensively, improving system throughput by 3.6$\times$ with an accuracy guarantee and shortening job completion times by 11$\times$.

Related papers

Intention-Conditioned Flow Occupancy Models [69.79049994662591]
Large-scale pre-training has fundamentally changed how machine learning research is done today.<n>Applying this same framework to reinforcement learning is appealing because it offers compelling avenues for addressing core challenges in RL.<n>Recent advances in generative AI have provided new tools for modeling highly complex distributions.
arXiv Detail & Related papers (2025-06-10T15:27:46Z)
Inference-Time Intervention in Large Language Models for Reliable Requirement Verification [2.3759432635713895]
Inference-time intervention techniques provide a promising alternative to fine-tuning. We demonstrate how interventions enable fine-grained control for automating the usually time-intensive requirement verification process. Our method achieves robust and reliable outputs, significantly improving over both a baseline model and a fine-tuning approach.
arXiv Detail & Related papers (2025-03-18T10:49:36Z)
Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging [75.93960998357812]
Deep model merging represents an emerging research direction that combines multiple fine-tuned models to harness their capabilities across different tasks and domains. Current model merging techniques focus on merging all available models simultaneously, with weight matrices-based methods being the predominant approaches. We propose a training-free projection-based continual merging method that processes models sequentially.
arXiv Detail & Related papers (2025-01-16T13:17:24Z)
AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving [68.73885845181242]
We propose an Automatic Data Engine (AIDE) that automatically identifies issues, efficiently curates data, improves the model through auto-labeling, and verifies the model through generation of diverse scenarios. We further establish a benchmark for open-world detection on AV datasets to comprehensively evaluate various learning paradigms, demonstrating our method's superior performance at a reduced cost.
arXiv Detail & Related papers (2024-03-26T04:27:56Z)
REFRESH: Responsible and Efficient Feature Reselection Guided by SHAP Values [17.489279048199304]
REFRESH is a method to reselect features so that additional constraints that are desirable towards model performance can be achieved without having to train several new models. REFRESH's underlying algorithm is a novel technique using SHAP values and correlation analysis that can approximate for the predictions of a model without having to train these models.
arXiv Detail & Related papers (2024-03-13T18:06:43Z)
AutoXPCR: Automated Multi-Objective Model Selection for Time Series Forecasting [1.0515439489916734]
We propose AutoXPCR - a novel method for automated and explainable multi-objective model selection. Our approach leverages meta-learning to estimate any model's performance along PCR criteria, which encompass (P)redictive error, (C)omplexity, and (R)esource demand. Our method clearly outperforms other model selection approaches - on average, it only requires 20% of computation costs for recommending models with 90% of the best-possible quality.
arXiv Detail & Related papers (2023-12-20T14:04:57Z)
TrainerAgent: Customizable and Efficient Model Training through LLM-Powered Multi-Agent System [14.019244136838017]
TrainerAgent is a multi-agent framework including Task, Data, Model and Server agents. These agents analyze user-defined tasks, input data, and requirements (e.g., accuracy, speed), optimizing them from both data and model perspectives to obtain satisfactory models, and finally deploy these models as online service. This research presents a significant advancement in achieving desired models with increased efficiency and quality as compared to traditional model development.
arXiv Detail & Related papers (2023-11-11T17:39:24Z)
QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658]
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement. QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights. We demonstrate that leveraging its insights, for example, improves the absolute performance of the Llama 2 model by up to 15% points relative.
arXiv Detail & Related papers (2023-11-06T00:21:44Z)
Tryage: Real-time, intelligent Routing of User Prompts to Large Language Models [1.0878040851637998]
With over 200, 000 models in the Hugging Face ecosystem, users grapple with selecting and optimizing models to suit multifaceted and data domains. Here, we propose a context-aware routing system, Tryage, that leverages a language model router for optimal selection of expert models from a model library.
arXiv Detail & Related papers (2023-08-22T17:48:24Z)
Precision-Recall Divergence Optimization for Generative Modeling with GANs and Normalizing Flows [54.050498411883495]
We develop a novel training method for generative models, such as Generative Adversarial Networks and Normalizing Flows. We show that achieving a specified precision-recall trade-off corresponds to minimizing a unique $f$-divergence from a family we call the textitPR-divergences. Our approach improves the performance of existing state-of-the-art models like BigGAN in terms of either precision or recall when tested on datasets such as ImageNet.
arXiv Detail & Related papers (2023-05-30T10:07:17Z)
MILO: Model-Agnostic Subset Selection Framework for Efficient Model Training and Tuning [68.12870241637636]
We propose MILO, a model-agnostic subset selection framework that decouples the subset selection from model training. Our empirical results indicate that MILO can train models $3times - 10 times$ faster and tune hyperparameters $20times - 75 times$ faster than full-dataset training or tuning without performance.
arXiv Detail & Related papers (2023-01-30T20:59:30Z)
RoMA: Robust Model Adaptation for Offline Model-based Optimization [115.02677045518692]
We consider the problem of searching an input maximizing a black-box objective function given a static dataset of input-output queries. A popular approach to solving this problem is maintaining a proxy model that approximates the true objective function. Here, the main challenge is how to avoid adversarially optimized inputs during the search.
arXiv Detail & Related papers (2021-10-27T05:37:12Z)
Modeling the Second Player in Distributionally Robust Optimization [90.25995710696425]
We argue for the use of neural generative models to characterize the worst-case distribution. This approach poses a number of implementation and optimization challenges. We find that the proposed approach yields models that are more robust than comparable baselines.
arXiv Detail & Related papers (2021-03-18T14:26:26Z)
Reinforced Deep Markov Models With Applications in Automatic Trading [0.0]
We propose a model-based RL approach, coined Reinforced Deep Markov Model (RDMM) RDMM integrates desirable properties of a reinforcement learning algorithm acting as an automatic trading system. Tests show that the RDMM is data-efficient and provides financial gains compared to the benchmarks in the optimal execution problem.
arXiv Detail & Related papers (2020-11-09T12:46:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.