Related papers: Doing More with Less -- Implementing Routing Strategies in Large Language Model-Based Systems: An Extended Survey

Doing More with Less -- Implementing Routing Strategies in Large Language Model-Based Systems: An Extended Survey

URL: http://arxiv.org/abs/2502.00409v2
Date: Tue, 04 Feb 2025 09:12:03 GMT
Title: Doing More with Less -- Implementing Routing Strategies in Large Language Model-Based Systems: An Extended Survey
Authors: Clovis Varangot-Reille, Christophe Bouvard, Antoine Gourru, Mathieu Ciancone, Marion Schaeffer, François Jacquenet,
Abstract summary: Large Language Models (LLM)-based systems rely on a single LLM for all user queries.<n>They often require different preprocessing strategies, levels of reasoning, or knowledge.<n>This paper explores key considerations for integrating routing into LLM-based systems.
Score: 1.430963201405577
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLM)-based systems, i.e. interconnected elements that include an LLM as a central component (e.g., conversational agents), are typically monolithic static architectures that rely on a single LLM for all user queries. However, they often require different preprocessing strategies, levels of reasoning, or knowledge. Generalist LLMs (e.g. GPT-4) trained on very large multi-topic corpora can perform well in a variety of tasks. They require significant financial, energy, and hardware resources that may not be justified for basic tasks. This implies potentially investing in unnecessary costs for a given query. To overcome this problem, a routing mechanism routes user queries to the most suitable components, such as smaller LLMs or experts in specific topics. This approach may improve response quality while minimising costs. Routing can be expanded to other components of the conversational agent architecture, such as the selection of optimal embedding strategies. This paper explores key considerations for integrating routing into LLM-based systems, focusing on resource management, cost definition, and strategy selection. Our main contributions include a formalisation of the problem, a novel taxonomy of existing approaches emphasising relevance and resource efficiency, and a comparative analysis of these strategies in relation to industry practices. Finally, we identify critical challenges and directions for future research.

Related papers

R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning [87.30285670315334]
textbfR1-Searcher is a novel two-stage outcome-based RL approach designed to enhance the search capabilities of Large Language Models. Our framework relies exclusively on RL, without requiring process rewards or distillation for a cold start. Our experiments demonstrate that our method significantly outperforms previous strong RAG methods, even when compared to the closed-source GPT-4o-mini.
arXiv Detail & Related papers (2025-03-07T17:14:44Z)
Smart Routing: Cost-Effective Multi-LLM Serving for Multi-Core AIOS [31.60019342381251]
Existing scheduling frameworks mainly target at latency optimization. This paper proposes an efficient capability-cost coordinated scheduling framework, ECCOS, for multi-LLM serving.
arXiv Detail & Related papers (2025-02-27T22:35:31Z)
Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks. However, they still struggle with problems requiring multi-step decision-making and environmental feedback. We propose a framework that can automatically learn a reward model from the environment without human annotations.
arXiv Detail & Related papers (2025-02-17T18:49:25Z)
Universal Model Routing for Efficient LLM Inference [72.65083061619752]
We consider the problem of dynamic routing, where new, previously unobserved LLMs are available at test time. We propose a new approach to this problem that relies on representing each LLM as a feature vector, derived based on predictions on a set of representative prompts. We prove that these strategies are estimates of a theoretically optimal routing rule, and provide an excess risk bound to quantify their errors.
arXiv Detail & Related papers (2025-02-12T20:30:28Z)
A Survey of Query Optimization in Large Language Models [10.255235456427037]
RAG mitigates the limitations of Large Language Models by dynamically retrieving and leveraging up-to-date relevant information.<n>QO has emerged as a critical element, playing a pivotal role in determining the effectiveness of RAG's retrieval stage.
arXiv Detail & Related papers (2024-12-23T13:26:04Z)
Large Language Models for Knowledge-Free Network Management: Feasibility Study and Opportunities [36.70339455624253]
This article presents a novel knowledge-free network management paradigm with the power of foundation models called large language models (LLMs) LLMs can understand important contexts from input prompts containing minimal system information, thereby offering remarkable inference performance even for entirely new tasks. Numerical results validate that knowledge-free LLMs are able to achieve comparable performance to existing knowledge-based optimization algorithms.
arXiv Detail & Related papers (2024-10-06T07:42:23Z)
TensorOpera Router: A Multi-Model Router for Efficient LLM Inference [27.2803289964386]
TO-lemma is a non-monolithic LLM querying system. It seamlessly integrates various LLM experts into a single query interface. It dynamically routes incoming queries to the most high-performant expert based on query's requirements.
arXiv Detail & Related papers (2024-08-22T11:57:07Z)
Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? [54.667202878390526]
Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases. We introduce LOFT, a benchmark of real-world tasks requiring context up to millions of tokens designed to evaluate LCLMs' performance on in-context retrieval and reasoning. Our findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks.
arXiv Detail & Related papers (2024-06-19T00:28:58Z)
Efficient Prompting for LLM-based Generative Internet of Things [88.84327500311464]
Large language models (LLMs) have demonstrated remarkable capacities on various tasks, and integrating the capacities of LLMs into the Internet of Things (IoT) applications has drawn much research attention recently. Due to security concerns, many institutions avoid accessing state-of-the-art commercial LLM services, requiring the deployment and utilization of open-source LLMs in a local network setting. We propose a LLM-based Generative IoT (GIoT) system deployed in the local network setting in this study.
arXiv Detail & Related papers (2024-06-14T19:24:00Z)
Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity [59.57065228857247]
Retrieval-augmented Large Language Models (LLMs) have emerged as a promising approach to enhancing response accuracy in several tasks, such as Question-Answering (QA) We propose a novel adaptive QA framework, that can dynamically select the most suitable strategy for (retrieval-augmented) LLMs based on the query complexity. We validate our model on a set of open-domain QA datasets, covering multiple query complexities, and show that ours enhances the overall efficiency and accuracy of QA systems.
arXiv Detail & Related papers (2024-03-21T13:52:30Z)
ReSLLM: Large Language Models are Strong Resource Selectors for Federated Search [35.44746116088232]
Federated search will become increasingly pivotal in the context of Retrieval-Augmented Generation pipelines. Current SOTA resource selection methodologies rely on feature-based learning approaches. We propose ReSLLM to drive the selection of resources in federated search in a zero-shot setting.
arXiv Detail & Related papers (2024-01-31T07:58:54Z)
How Can Recommender Systems Benefit from Large Language Models: A Survey [82.06729592294322]
Large language models (LLM) have shown impressive general intelligence and human-like capabilities. We conduct a comprehensive survey on this research direction from the perspective of the whole pipeline in real-world recommender systems.
arXiv Detail & Related papers (2023-06-09T11:31:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.