Forest Before Trees: Latent Superposition for Efficient Visual Reasoning
- URL: http://arxiv.org/abs/2601.06803v1
- Date: Sun, 11 Jan 2026 08:30:49 GMT
- Title: Forest Before Trees: Latent Superposition for Efficient Visual Reasoning
- Authors: Yubo Wang, Juntian Zhang, Yichen Wu, Yankai Lin, Nils Lukas, Yuhan Liu,
- Abstract summary: Laser is a novel paradigm that reformulates visual deduction via Dynamic Windowed Alignment Learning (DWAL). Laser achieves state-of-the-art performance among latent reasoning methods, surpassing the strong baseline Monet by 5.03% on average.
- Score: 61.29300723302152
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: While Chain-of-Thought empowers Large Vision-Language Models with multi-step reasoning, explicit textual rationales suffer from an information bandwidth bottleneck, where continuous visual details are discarded during discrete tokenization. Recent latent reasoning methods attempt to address this challenge, but often fall prey to premature semantic collapse due to rigid autoregressive objectives. In this paper, we propose Laser, a novel paradigm that reformulates visual deduction via Dynamic Windowed Alignment Learning (DWAL). Instead of forcing a point-wise prediction, Laser aligns the latent state with a dynamic validity window of future semantics. This mechanism enforces a "Forest-before-Trees" cognitive hierarchy, enabling the model to maintain a probabilistic superposition of global features before narrowing down to local details. Crucially, Laser maintains interpretability via decodable trajectories while stabilizing unconstrained learning via Self-Refined Superposition. Extensive experiments on 6 benchmarks demonstrate that Laser achieves state-of-the-art performance among latent reasoning methods, surpassing the strong baseline Monet by 5.03% on average. Notably, it achieves these gains with extreme efficiency, reducing inference tokens by more than 97%, while demonstrating robust generalization to out-of-distribution domains.
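The abstract contrasts point-wise prediction with alignment to a window of future semantics. The following is a minimal sketch of that contrast only, not the paper's DWAL objective: the function names, the uniform soft target over the window, and the toy 5-token vocabulary are all assumptions made for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pointwise_loss(logits, next_token):
    """Standard next-token cross-entropy: exactly one target per step."""
    return -math.log(softmax(logits)[next_token])

def windowed_loss(logits, future_tokens, window):
    """Hypothetical windowed objective (an assumption, not the paper's loss):
    cross-entropy against a uniform target over the distinct tokens that
    appear in the next `window` positions."""
    targets = set(future_tokens[:window])
    probs = softmax(logits)
    weight = 1.0 / len(targets)
    return -sum(weight * math.log(probs[t]) for t in targets)

# Toy vocabulary of 5 tokens; the upcoming tokens are 1, 2, 3, 4.
future = [1, 2, 3, 4]
peaked = [0.0, 10.0, 0.0, 0.0, 0.0]  # commits to token 1 immediately
spread = [0.0, 5.0, 5.0, 5.0, 0.0]   # keeps tokens 1, 2, 3 in play

print("point-wise:", pointwise_loss(peaked, 1), pointwise_loss(spread, 1))
print("windowed  :", windowed_loss(peaked, future, 3),
      windowed_loss(spread, future, 3))
```

Under these assumptions, the point-wise loss prefers the prediction that commits early, while the windowed loss prefers the one that spreads probability over the upcoming tokens. Setting `window=1` recovers the point-wise loss, so shrinking the window over time is one way to picture a move from global to local detail.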
Related papers
- Seeing Farther and Smarter: Value-Guided Multi-Path Reflection for VLM Policy Optimization [41.15414881730464]
Vision-Language Models (VLMs) offer a general perceive-reason-act framework for this goal. Previous approaches rely on inefficient and often inaccurate implicit learning of state-values from noisy foresight predictions. We propose a novel test-time computation framework that decouples state evaluation from action generation.
arXiv Detail & Related papers (2026-02-22T22:53:16Z)
- Accordion-Thinking: Self-Regulated Step Summaries for Efficient and Readable LLM Reasoning [62.680551162054975]
We introduce an end-to-end framework where LLMs learn to self-regulate the granularity of the reasoning steps through dynamic summarization. We apply reinforcement learning to incentivize this capability further, uncovering a critical insight: the accuracy gap between the highly efficient Fold mode and the exhaustive Unfold mode progressively narrows. Our Accordion-Thinker demonstrates that with learned self-compression, LLMs can tackle complex reasoning tasks with minimal dependency token overhead.
arXiv Detail & Related papers (2026-02-03T08:34:20Z)
- Addressing Overthinking in Large Vision-Language Models via Gated Perception-Reasoning Optimization [56.59356959631999]
Gated Perception-Reasoning Optimization (GPRO) is a meta-reasoning controller that dynamically routes computation among three decision paths. GPRO substantially improves both accuracy and efficiency, outperforming recent slow-thinking methods.
arXiv Detail & Related papers (2026-01-07T23:05:17Z)
- Analyzing Reasoning Consistency in Large Multimodal Models under Cross-Modal Conflicts [74.47786985522762]
We identify a critical failure mode termed textual inertia, where models tend to blindly adhere to the erroneous text while neglecting conflicting visual evidence. We propose the LogicGraph Perturbation Protocol that structurally injects perturbations into the reasoning chains of diverse LMMs. Results reveal that models successfully self-correct in less than 10% of cases and predominantly succumb to blind textual error propagation.
arXiv Detail & Related papers (2026-01-07T16:39:34Z)
- A Survey on Latent Reasoning [100.54120559169735]
Large Language Models (LLMs) have demonstrated impressive reasoning capabilities. CoT reasoning that verbalizes intermediate steps limits the model's expressive bandwidth. Latent reasoning tackles this bottleneck by performing multi-step inference entirely in the model's continuous hidden state.
arXiv Detail & Related papers (2025-07-08T17:29:07Z)
- Lost at the Beginning of Reasoning [85.17612793300238]
We show that the first reasoning step exerts a disproportionately large influence on the final prediction. We propose an efficient sampling strategy that leverages a reward model to identify and retain high-quality first reasoning steps.
arXiv Detail & Related papers (2025-06-27T09:53:57Z)
- Efficient Post-Training Refinement of Latent Reasoning in Large Language Models [22.878147805601706]
Chain-of-Thought prompting suffers from significant token overhead and a fixed reasoning trajectory, preventing step-wise refinement. Recent advances in latent reasoning address these limitations by refining internal reasoning processes directly in the model's latent space. We propose a lightweight post-training framework that refines latent reasoning trajectories using two novel strategies.
arXiv Detail & Related papers (2025-06-10T08:17:16Z)
- Beyond Exponential Decay: Rethinking Error Accumulation in Large Language Models [0.0]
We show that errors are not uniformly distributed but are concentrated at sparse "key tokens" representing critical decision junctions. We propose a framework for next-generation systems centered on selective preservation of semantically vital tokens.
arXiv Detail & Related papers (2025-05-30T03:57:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.