Desiderata for next generation of ML model serving
- URL: http://arxiv.org/abs/2210.14665v1
- Date: Wed, 26 Oct 2022 12:29:25 GMT
- Title: Desiderata for next generation of ML model serving
- Authors: Sherif Akoush, Andrei Paleyes, Arnaud Van Looveren and Clive Cox
- Abstract summary: This paper puts forth a range of important qualities that the next generation of inference platforms should aim for.
An overarching design pattern is data-centricity, which enables smarter monitoring in ML system operation.
- Score: 0.34410212782758054
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Inference is a significant part of ML software infrastructure. Despite the
variety of inference frameworks available, the field as a whole can be
considered in its early days. This paper puts forth a range of important
qualities that the next generation of inference platforms should be aiming for. We
present our rationale for the importance of each quality, and discuss ways to
achieve it in practice. An overarching design pattern is data-centricity, which
enables smarter monitoring in ML system operation.
Related papers
- Coding for Intelligence from the Perspective of Category [66.14012258680992]
Coding targets compressing and reconstructing data, and intelligence.
Recent trends demonstrate the potential homogeneity of these two fields.
We propose a novel problem of Coding for Intelligence from the category theory view.
arXiv Detail & Related papers (2024-07-01T07:05:44Z)
- Pistis-RAG: A Scalable Cascading Framework Towards Trustworthy Retrieval-Augmented Generation [36.50624138061438]
Pistis-RAG is a scalable multi-stage framework designed to address the challenges of large-scale retrieval-augmented generation (RAG) systems.
Our framework consists of distinct stages: matching, pre-ranking, ranking, reasoning, and aggregating.
Our novel ranking stage is designed specifically for RAG systems, incorporating principles of information retrieval.
arXiv Detail & Related papers (2024-06-21T08:52:11Z)
- A Survey on Efficient Inference for Large Language Models [25.572035747669275]
Large Language Models (LLMs) have attracted extensive attention due to their remarkable performance across various tasks.
The substantial computational and memory requirements of LLM inference pose challenges for deployment in resource-constrained scenarios.
This paper presents a comprehensive survey of the existing literature on efficient LLM inference.
arXiv Detail & Related papers (2024-04-22T15:53:08Z)
- A Large-Scale Evaluation of Speech Foundation Models [110.95827399522204]
We establish the Speech processing Universal PERformance Benchmark (SUPERB) to study the effectiveness of the foundation model paradigm for speech.
We propose a unified multi-tasking framework to address speech processing tasks in SUPERB using a frozen foundation model followed by task-specialized, lightweight prediction heads.
arXiv Detail & Related papers (2024-04-15T00:03:16Z)
- Prospector Heads: Generalized Feature Attribution for Large Models & Data [82.02696069543454]
We introduce prospector heads, an efficient and interpretable alternative to explanation-based attribution methods.
We demonstrate how prospector heads enable improved interpretation and discovery of class-specific patterns in input data.
arXiv Detail & Related papers (2024-02-18T23:01:28Z)
- EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning [84.6451394629312]
We introduce EgoPlan-Bench, a benchmark to evaluate the planning abilities of MLLMs in real-world scenarios.
We show that EgoPlan-Bench poses significant challenges, highlighting a substantial scope for improvement in MLLMs to achieve human-level task planning.
We also present EgoPlan-IT, a specialized instruction-tuning dataset that effectively enhances model performance on EgoPlan-Bench.
arXiv Detail & Related papers (2023-12-11T03:35:58Z)
- Data-centric Operational Design Domain Characterization for Machine Learning-based Aeronautical Products [4.8461049669050915]
We give the first rigorous characterization of Operational Design Domains (ODDs) for Machine Learning (ML)-based aeronautical products.
We propose the dimensions along which the parameters that define an ODD can be explicitly captured, together with a categorization of the data that ML-based applications can encounter in operation.
arXiv Detail & Related papers (2023-07-15T02:08:33Z)
- Exploring the potential of flow-based programming for machine learning deployment in comparison with service-oriented architectures [8.677012233188968]
We argue that part of the reason is infrastructure that was not designed for activities around data collection and analysis.
We propose to consider flow-based programming with data streams as an alternative to commonly used service-oriented architectures for building software applications.
arXiv Detail & Related papers (2021-08-09T15:06:02Z)
- The Benchmark Lottery [114.43978017484893]
"A benchmark lottery" describes the overall fragility of the machine learning benchmarking process.
We show that the relative performance of algorithms may be altered significantly simply by choosing different benchmark tasks.
arXiv Detail & Related papers (2021-07-14T21:08:30Z)
- Counterfactual Explanations for Machine Learning on Multivariate Time Series Data [0.9274371635733836]
This paper proposes a novel explainability technique for providing counterfactual explanations for supervised machine learning frameworks.
The proposed method outperforms state-of-the-art explainability methods on several different ML frameworks and data sets in metrics such as faithfulness and robustness.
arXiv Detail & Related papers (2020-08-25T02:04:59Z)
- A Survey on Large-scale Machine Learning [67.6997613600942]
Machine learning can provide deep insights into data, allowing machines to make high-quality predictions.
Most sophisticated machine learning approaches suffer from huge time costs when operating on large-scale data.
Large-scale Machine Learning aims to learn patterns from big data with comparable performance efficiently.
arXiv Detail & Related papers (2020-08-10T06:07:52Z)