Related papers: Efficient Test-Time Scaling for Small Vision-Language Models

URL: http://arxiv.org/abs/2510.03574v1
Date: Fri, 03 Oct 2025 23:49:06 GMT
Title: Efficient Test-Time Scaling for Small Vision-Language Models
Authors: Mehmet Onurcan Kaya, Desmond Elliott, Dim P. Papadopoulos
Abstract summary: Small Vision-Language Models (VLMs) provide a computationally efficient alternative to larger models. Existing methods are typically computationally demanding, contradicting the resource-efficient design goals of small models. We propose two novel and efficient test-time scaling strategies that leverage the model-internal features rather than external supervision.
Score: 14.654047034885288
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Small Vision-Language Models (VLMs) provide a computationally efficient alternative to larger models, at the cost of weaker generalization abilities and downstream task performance. These shortcomings could be addressed by test-time scaling techniques, but existing methods are typically computationally demanding, contradicting the resource-efficient design goals of small models. To address these limitations, we propose two novel and efficient test-time scaling strategies that leverage the model-internal features rather than external supervision: (i) Test-Time Augmentation (TTAug), which generates multiple augmented inputs and aggregates outputs at the token level without parameter updates, and (ii) Test-Time Adaptation (TTAdapt), which adapts model parameters during inference using consensus-based pseudolabels from TTAug. Through extensive experiments across nine benchmarks, we demonstrate consistent performance improvements while maintaining computational efficiency suitable for resource-constrained environments. The generality of our approach is demonstrated both within models at different scales and across different VLMs without additional tuning.
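
The token-level aggregation described for TTAug can be illustrated with a minimal sketch. The abstract does not state the exact aggregation rule, so averaging next-token probabilities across augmented views is an assumption here, and every name below (`aggregate_step`, `ttaug_greedy_decode`, `toy_logits`) is hypothetical rather than taken from the paper's code. The toy "model" stands in for a VLM forward pass over one augmented input.

```python
import math
from typing import Callable, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one logit vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_step(per_view_logits: Sequence[Sequence[float]]) -> List[float]:
    """Average next-token probabilities over all augmented views.

    Assumption: plain probability averaging; the paper may use another rule.
    """
    probs = [softmax(logits) for logits in per_view_logits]
    n_views = len(probs)
    vocab_size = len(probs[0])
    return [sum(p[i] for p in probs) / n_views for i in range(vocab_size)]


def ttaug_greedy_decode(
    views: Sequence[object],
    next_token_logits: Callable[[object, List[int]], Sequence[float]],
    eos_id: int,
    max_new_tokens: int = 16,
) -> List[int]:
    """Greedy decoding where every step aggregates over augmented views.

    No parameters are updated; all views share one growing token prefix.
    """
    prefix: List[int] = []
    for _ in range(max_new_tokens):
        avg = aggregate_step([next_token_logits(v, prefix) for v in views])
        token = max(range(len(avg)), key=avg.__getitem__)
        prefix.append(token)
        if token == eos_id:
            break
    return prefix


def toy_logits(view: int, prefix: List[int]) -> List[float]:
    """Hypothetical stand-in for a VLM: vocabulary {0, 1, 2}, token 2 is EOS."""
    if len(prefix) >= 1:
        return [0.0, 0.0, 5.0]  # every view agrees on EOS at the second step
    # At the first step, views 0 and 1 prefer token 1; view 2 prefers token 0.
    return [3.0, 0.0, 0.0] if view == 2 else [0.0, 2.0, 0.0]


print(ttaug_greedy_decode([0, 1, 2], toy_logits, eos_id=2))  # prints [1, 2]
```

In this toy run the single dissenting view is outvoted at the first step because the averaged distribution favors token 1. The same consensus tokens are the kind of signal that TTAdapt, as described in the abstract, would reuse as pseudolabels for parameter updates.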

Related papers

MatryoshkaThinking: Recursive Test-Time Scaling Enables Efficient Reasoning [33.47806621047652]
MatryoshkaThinking is a novel method that significantly reduces computational cost while maintaining state-of-the-art performance. MatryoshkaThinking attains a score of 99.79 on AIME2025 using only 4% of the computation required by DeepConf.
arXiv Detail & Related papers (2025-10-11T17:18:12Z)
ARISE: An Adaptive Resolution-Aware Metric for Test-Time Scaling Evaluation in Large Reasoning Models [102.4511331368587]
ARISE (Adaptive Resolution-aware Scaling Evaluation) is a novel metric designed to assess the test-time scaling effectiveness of large reasoning models. We conduct comprehensive experiments evaluating state-of-the-art reasoning models across diverse domains.
arXiv Detail & Related papers (2025-10-07T15:10:51Z)
Test-Time Model Adaptation for Quantized Neural Networks [37.84294929199108]
Quantized models often suffer from severe performance degradation in dynamic environments with potential domain shifts. Test-time adaptation (TTA) has emerged as an effective solution by enabling models to learn adaptively from test data. We propose a continual zeroth-order adaptation (ZOA) framework that enables efficient model adaptation using only two forward passes.
arXiv Detail & Related papers (2025-08-04T08:24:19Z)
TAPS: Frustratingly Simple Test Time Active Learning for VLMs [0.0]
Test-Time Optimization enables models to adapt to new data during inference by updating parameters on-the-fly. We propose a novel Test-Time Active Learning framework that adaptively queries uncertain samples and updates prompts dynamically. Our framework provides a practical and effective solution for real-world deployment in safety-critical applications such as autonomous systems and medical diagnostics.
arXiv Detail & Related papers (2025-07-26T18:04:49Z)
Accelerated Test-Time Scaling with Model-Free Speculative Sampling [58.69141724095398]
We introduce STAND (STochastic Adaptive N-gram Drafting), a novel model-free speculative decoding approach. We show that STAND reduces inference latency by 60-65% compared to standard autoregressive decoding. As a model-free approach, STAND can be applied to any existing language model without additional training.
arXiv Detail & Related papers (2025-06-05T07:31:18Z)
Latent Thought Models with Variational Bayes Inference-Time Computation [52.63299874322121]
Latent Thought Models (LTMs) incorporate explicit latent thought vectors that follow an explicit prior model in latent space. LTMs demonstrate superior sample and parameter efficiency compared to autoregressive models and discrete diffusion models.
arXiv Detail & Related papers (2025-02-03T17:50:34Z)
SETS: Leveraging Self-Verification and Self-Correction for Improved Test-Time Scaling [39.57154199908565]
Self-Enhanced Test-Time Scaling (SETS) is a simple yet effective approach that overcomes limitations by strategically combining parallel and sequential techniques. SETS exploits the inherent self-verification and self-correction capabilities of Large Language Models, unifying sampling, verification, and correction within a single framework. Our results show SETS achieves significant performance improvements and more advantageous test-time scaling behavior than the alternatives.
arXiv Detail & Related papers (2025-01-31T17:03:16Z)
Words Matter: Leveraging Individual Text Embeddings for Code Generation in CLIP Test-Time Adaptation [21.20806568508201]
We show how to leverage class text information to mitigate distribution drifts encountered by vision-language models (VLMs) during test-time inference. We propose to generate pseudo-labels for the test-time samples by exploiting generic class text embeddings as fixed centroids of a label assignment problem. Experiments on multiple popular test-time adaptation benchmarks presenting diverse complexity empirically show the superiority of CLIP-OT.
arXiv Detail & Related papers (2024-11-26T00:15:37Z)
E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning [55.50908600818483]
Fine-tuning large-scale pretrained vision models for new tasks has become increasingly parameter-intensive. We propose an Effective and Efficient Visual Prompt Tuning (E2VPT) approach for large-scale transformer-based model adaptation. Our approach outperforms several state-of-the-art baselines on two benchmarks.
arXiv Detail & Related papers (2023-07-25T19:03:21Z)
Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks. We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models. We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)

This list is automatically generated from the titles and abstracts of the papers on this site.