ECVL-ROUTER: Scenario-Aware Routing for Vision-Language Models
- URL: http://arxiv.org/abs/2510.27256v1
- Date: Fri, 31 Oct 2025 07:46:44 GMT
- Title: ECVL-ROUTER: Scenario-Aware Routing for Vision-Language Models
- Authors: Xin Tang, Youfang Han, Fangfei Gou, Wei Zhao, Xin Meng, Yang Yu, Jinguo Zhang, Yuanchun Shi, Yuntao Wang, Tengxiang Zhang
- Abstract summary: We propose ECVL-ROUTER, the first scenario-aware routing framework for Vision-Language Models (VLMs). Our approach introduces a new routing strategy and evaluation metrics that dynamically select the appropriate model for each query based on user requirements. Results show that our approach successfully routes over 80% of queries to the small model while incurring less than a 10% drop in problem-solving probability.
- Score: 26.059355108708374
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision-Language Models (VLMs) excel in diverse multimodal tasks. However, user requirements vary across scenarios, which can be categorized into fast response, high-quality output, and low energy consumption. Relying solely on large models deployed in the cloud for all queries often leads to high latency and energy cost, while small models deployed on edge devices are capable of handling simpler tasks with low latency and energy cost. To fully leverage the strengths of both large and small models, we propose ECVL-ROUTER, the first scenario-aware routing framework for VLMs. Our approach introduces a new routing strategy and evaluation metrics that dynamically select the appropriate model for each query based on user requirements, maximizing overall utility. We also construct a multimodal response-quality dataset tailored for router training and validate the approach through extensive experiments. Results show that our approach successfully routes over 80% of queries to the small model while incurring less than a 10% drop in problem-solving probability.
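The routing decision the abstract describes can be pictured as a small utility-maximization step over the candidate models. The sketch below is illustrative only: the profile fields, weights, and numbers are assumptions, not the paper's trained router, which learns response quality from its multimodal dataset.

```python
from dataclasses import dataclass

@dataclass
class ModelProfile:
    # Hypothetical per-model profile; p_solve would come from a trained router.
    name: str
    p_solve: float   # estimated probability the model solves the query
    latency: float   # seconds per response
    energy: float    # energy cost per response

def route(models, w_quality, w_latency, w_energy):
    """Pick the model with the highest weighted utility for the scenario."""
    def utility(m):
        return w_quality * m.p_solve - w_latency * m.latency - w_energy * m.energy
    return max(models, key=utility)

small = ModelProfile("edge-small", p_solve=0.75, latency=0.2, energy=1.0)
large = ModelProfile("cloud-large", p_solve=0.92, latency=1.5, energy=20.0)

# Fast-response scenario: latency dominates, so the edge model wins.
fast = route([small, large], w_quality=1.0, w_latency=0.5, w_energy=0.01)
# High-quality scenario: latency and energy barely matter, so the cloud model wins.
best = route([small, large], w_quality=1.0, w_latency=0.01, w_energy=0.001)
print(fast.name, best.name)
```

Varying only the scenario weights flips the decision, which is the sense in which the same router can serve fast-response, high-quality, and low-energy requirements.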
Related papers
- Towards Fair and Comprehensive Evaluation of Routers in Collaborative LLM Systems [46.00150374727385]
Large language models (LLMs) have achieved success, but cost and privacy constraints necessitate deploying smaller models locally. We propose RouterXBench, a principled evaluation framework with three dimensions: router ability, scenario alignment, and cross-domain robustness. We introduce ProbeDirichlet, a lightweight router that aggregates cross-layer hidden states via learnable Dirichlet with probabilistic training.
arXiv Detail & Related papers (2026-02-12T12:28:27Z)
- Co-Training Vision Language Models for Remote Sensing Multi-task Learning [68.15604397741753]
Vision language models (VLMs) have achieved promising results in RS image understanding, grounding, and ultra-high-resolution (UHR) image reasoning. We present RSCoVLM, a simple yet flexible VLM baseline for RS MTL. We propose a unified dynamic-resolution strategy to address the diverse image scales inherent in RS imagery.
arXiv Detail & Related papers (2025-11-26T10:55:07Z)
- Towards Resource-Efficient Multimodal Intelligence: Learned Routing among Specialized Expert Models [0.0]
Large language models (LLMs) increasingly power vision, audio, and document understanding. Small open-source models offer cost advantages but struggle with complex or multimodal queries. We introduce a unified, modular framework that intelligently routes each query to the most fitting expert model.
arXiv Detail & Related papers (2025-11-09T16:14:56Z)
- Leveraging Generative Models for Real-Time Query-Driven Text Summarization in Large-Scale Web Search [54.987957691350665]
Query-Driven Text Summarization (QDTS) aims to generate concise and informative summaries from textual documents based on a given query. Traditional extractive summarization models, based primarily on ranking candidate summary segments, have been the dominant approach in industrial applications. We propose a novel framework to pioneer the application of generative models to address real-time QDTS in industrial web search.
arXiv Detail & Related papers (2025-08-28T08:51:51Z)
- RCR-Router: Efficient Role-Aware Context Routing for Multi-Agent LLM Systems with Structured Memory [57.449129198822476]
RCR-Router is a role-aware context routing framework for multi-agent large language model (LLM) systems. It dynamically selects semantically relevant memory subsets for each agent based on its role and task stage. A lightweight scoring policy guides memory selection, and agent outputs are integrated into a shared memory store.
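The role-aware selection described above can be sketched as a scored, budget-constrained filter over memory items. Everything below — the `roles`/`stage` fields, the hand-written score, the token budget — is a hypothetical stand-in for RCR-Router's learned scoring policy, shown only to make the idea concrete.

```python
def select_context(memories, role, stage, token_budget):
    """Greedily keep the highest-scoring memory items that fit the budget.
    The hand-written score is a placeholder for a learned scoring policy."""
    def score(m):
        s = 1.0 if role in m["roles"] else 0.2       # role relevance
        s += 0.5 if m["stage"] == stage else 0.0     # task-stage relevance
        return s
    chosen, used = [], 0
    for m in sorted(memories, key=score, reverse=True):
        if used + m["tokens"] <= token_budget:
            chosen.append(m)
            used += m["tokens"]
    return chosen

memories = [
    {"id": "plan",  "roles": {"planner"},          "stage": "plan",    "tokens": 80},
    {"id": "code",  "roles": {"coder"},            "stage": "execute", "tokens": 120},
    {"id": "notes", "roles": {"planner", "coder"}, "stage": "execute", "tokens": 60},
]
picked = select_context(memories, role="coder", stage="execute", token_budget=200)
print([m["id"] for m in picked])
```

Each agent sees only the subset relevant to its role and stage, which is what keeps per-agent context small in a shared-memory multi-agent system.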
arXiv Detail & Related papers (2025-08-06T21:59:34Z)
- BEST-Route: Adaptive LLM Routing with Test-Time Optimal Compute [25.740809143951815]
BEST-Route is a novel routing framework that chooses a model and the number of responses to sample from it based on query difficulty and quality thresholds. Experiments on real-world datasets demonstrate that our method reduces costs by up to 60% with less than 1% performance drop.
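The decision BEST-Route describes — pick a model and a best-of-n sample count against a quality threshold — can be sketched as follows. The per-sample success rate and the independence assumption across samples are illustrative simplifications, not the paper's learned estimates.

```python
def route_with_sampling(difficulty, quality_threshold, max_samples=4):
    """Return (model, n): try the cheap model with best-of-n sampling first,
    escalating to the large model only if no n clears the threshold.
    SMALL_P is an assumed per-sample success rate; treating samples as
    independent is a simplification."""
    SMALL_P = 0.7
    p_sample = SMALL_P * (1.0 - difficulty)  # harder queries lower the rate
    for n in range(1, max_samples + 1):
        p_best_of_n = 1.0 - (1.0 - p_sample) ** n
        if p_best_of_n >= quality_threshold:
            return ("small", n)
    return ("large", 1)

easy = route_with_sampling(0.1, 0.9)  # easy query: small model, a few samples
hard = route_with_sampling(0.8, 0.9)  # hard query: escalate to the large model
print(easy, hard)
```

The cost saving comes from the fact that several cheap samples can beat one expensive call whenever the query is easy enough for best-of-n to clear the quality bar.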
arXiv Detail & Related papers (2025-06-28T01:52:50Z)
- Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning [27.70756702796812]
We present Router-R1, a reinforcement learning framework that formulates multi-LLM routing and aggregation as a sequential decision process. To facilitate learning, we employ a lightweight rule-based reward comprising format rewards, final outcome rewards, and a novel cost reward for optimizing the balance between performance and cost.
arXiv Detail & Related papers (2025-06-10T17:56:45Z)
- Towards Efficient Multi-LLM Inference: Characterization and Analysis of LLM Routing and Hierarchical Techniques [14.892995952768352]
Language Models (LMs) have excelled at tasks like text generation, summarization, and question answering. Their inference remains computationally expensive and energy intensive in settings with limited hardware, power, or bandwidth. Recent approaches have introduced intelligent multi-LLM model selection strategies that dynamically allocate computational resources based on query complexity.
arXiv Detail & Related papers (2025-06-06T23:13:08Z)
- EfficientLLaVA: Generalizable Auto-Pruning for Large Vision-language Models [64.18350535770357]
We propose an automatic pruning method for large vision-language models to enhance the efficiency of multimodal reasoning. Our approach leverages only a small number of samples to search for the desired pruning policy. We conduct extensive experiments on the ScienceQA, Vizwiz, MM-vet, and LLaVA-Bench datasets for the task of visual question answering.
arXiv Detail & Related papers (2025-03-19T16:07:04Z)
- OmniRouter: Budget and Performance Controllable Multi-LLM Routing [31.60019342381251]
Large language models (LLMs) deliver superior performance but require substantial computational resources and operate with relatively low efficiency. We introduce OmniRouter, a controllable routing framework for multi-LLM serving. Experiments show that OmniRouter achieves up to a 6.30% improvement in response accuracy while simultaneously reducing computational costs by at least 10.15%.
arXiv Detail & Related papers (2025-02-27T22:35:31Z)
- Hybrid LLM: Cost-Efficient and Quality-Aware Query Routing [53.748685766139715]
Large language models (LLMs) excel in most NLP tasks but also require expensive cloud servers for deployment due to their size.
We propose a hybrid inference approach that combines the strengths of large and small models to save cost and maintain quality.
In experiments our approach allows us to make up to 40% fewer calls to the large model, with no drop in response quality.
arXiv Detail & Related papers (2024-04-22T23:06:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.