Doing More with Less -- Implementing Routing Strategies in Large Language Model-Based Systems: An Extended Survey
- URL: http://arxiv.org/abs/2502.00409v2
- Date: Tue, 04 Feb 2025 09:12:03 GMT
- Title: Doing More with Less -- Implementing Routing Strategies in Large Language Model-Based Systems: An Extended Survey
- Authors: Clovis Varangot-Reille, Christophe Bouvard, Antoine Gourru, Mathieu Ciancone, Marion Schaeffer, François Jacquenet,
- Abstract summary: Large Language Models (LLM)-based systems rely on a single LLM for all user queries.
They often require different preprocessing strategies, levels of reasoning, or knowledge.
This paper explores key considerations for integrating routing into LLM-based systems.
- Score: 1.430963201405577
- License:
- Abstract: Large Language Models (LLM)-based systems, i.e. interconnected elements that include an LLM as a central component (e.g., conversational agents), are typically monolithic static architectures that rely on a single LLM for all user queries. However, they often require different preprocessing strategies, levels of reasoning, or knowledge. Generalist LLMs (e.g. GPT-4) trained on very large multi-topic corpora can perform well in a variety of tasks. They require significant financial, energy, and hardware resources that may not be justified for basic tasks. This implies potentially investing in unnecessary costs for a given query. To overcome this problem, a routing mechanism routes user queries to the most suitable components, such as smaller LLMs or experts in specific topics. This approach may improve response quality while minimising costs. Routing can be expanded to other components of the conversational agent architecture, such as the selection of optimal embedding strategies. This paper explores key considerations for integrating routing into LLM-based systems, focusing on resource management, cost definition, and strategy selection. Our main contributions include a formalisation of the problem, a novel taxonomy of existing approaches emphasising relevance and resource efficiency, and a comparative analysis of these strategies in relation to industry practices. Finally, we identify critical challenges and directions for future research.
Related papers
- Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks.
However, they still struggle with problems requiring multi-step decision-making and environmental feedback.
We propose a framework that can automatically learn a reward model from the environment without human annotations.
arXiv Detail & Related papers (2025-02-17T18:49:25Z) - Universal Model Routing for Efficient LLM Inference [72.65083061619752]
We consider the problem of dynamic routing, where new, previously unobserved LLMs are available at test time.
We propose a new approach to this problem that relies on representing each LLM as a feature vector, derived based on predictions on a set of representative prompts.
We prove that these strategies are estimates of a theoretically optimal routing rule, and provide an excess risk bound to quantify their errors.
arXiv Detail & Related papers (2025-02-12T20:30:28Z) - A Survey of Query Optimization in Large Language Models [10.255235456427037]
RAG mitigates the limitations of Large Language Models by dynamically retrieving and leveraging up-to-date relevant information.
QO has emerged as a critical element, playing a pivotal role in determining the effectiveness of RAG's retrieval stage.
arXiv Detail & Related papers (2024-12-23T13:26:04Z) - Large Language Models for Knowledge-Free Network Management: Feasibility Study and Opportunities [36.70339455624253]
This article presents a novel knowledge-free network management paradigm with the power of foundation models called large language models (LLMs)
LLMs can understand important contexts from input prompts containing minimal system information, thereby offering remarkable inference performance even for entirely new tasks.
Numerical results validate that knowledge-free LLMs are able to achieve comparable performance to existing knowledge-based optimization algorithms.
arXiv Detail & Related papers (2024-10-06T07:42:23Z) - Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? [54.667202878390526]
Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases.
We introduce LOFT, a benchmark of real-world tasks requiring context up to millions of tokens designed to evaluate LCLMs' performance on in-context retrieval and reasoning.
Our findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks.
arXiv Detail & Related papers (2024-06-19T00:28:58Z) - Meta Reasoning for Large Language Models [58.87183757029041]
We introduce Meta-Reasoning Prompting (MRP), a novel and efficient system prompting method for large language models (LLMs)
MRP guides LLMs to dynamically select and apply different reasoning methods based on the specific requirements of each task.
We evaluate the effectiveness of MRP through comprehensive benchmarks.
arXiv Detail & Related papers (2024-06-17T16:14:11Z) - Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity [59.57065228857247]
Retrieval-augmented Large Language Models (LLMs) have emerged as a promising approach to enhancing response accuracy in several tasks, such as Question-Answering (QA)
We propose a novel adaptive QA framework, that can dynamically select the most suitable strategy for (retrieval-augmented) LLMs based on the query complexity.
We validate our model on a set of open-domain QA datasets, covering multiple query complexities, and show that ours enhances the overall efficiency and accuracy of QA systems.
arXiv Detail & Related papers (2024-03-21T13:52:30Z) - ReSLLM: Large Language Models are Strong Resource Selectors for
Federated Search [35.44746116088232]
Federated search will become increasingly pivotal in the context of Retrieval-Augmented Generation pipelines.
Current SOTA resource selection methodologies rely on feature-based learning approaches.
We propose ReSLLM to drive the selection of resources in federated search in a zero-shot setting.
arXiv Detail & Related papers (2024-01-31T07:58:54Z) - How Can Recommender Systems Benefit from Large Language Models: A Survey [82.06729592294322]
Large language models (LLM) have shown impressive general intelligence and human-like capabilities.
We conduct a comprehensive survey on this research direction from the perspective of the whole pipeline in real-world recommender systems.
arXiv Detail & Related papers (2023-06-09T11:31:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.