Dual-Density Inference for Efficient Language Model Reasoning
- URL: http://arxiv.org/abs/2512.15358v1
- Date: Wed, 17 Dec 2025 12:04:05 GMT
- Title: Dual-Density Inference for Efficient Language Model Reasoning
- Authors: Zhengyi Zhao, Shubo Zhang, Yuxi Zhang, Huimin Wang, Binyang Li, Kam-Fai Wong
- Abstract summary: We present Denser (Dual-density inference), a novel framework that optimizes information density separately for the reasoning and answering phases. The framework comprises three components: a query processing module that analyzes input problems, a high-density compressed reasoning mechanism for efficient intermediate computation, and an answer generation component that translates the compressed reasoning into human-readable solutions.
- Score: 26.002819535382855
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have shown impressive capabilities in complex reasoning tasks. However, current approaches employ uniform language density for both intermediate reasoning and final answers, leading to computational inefficiency. We observe that the reasoning process serves a computational function for the model itself, while answering serves a communicative function for human understanding. This distinction enables the use of compressed, symbol-rich language for intermediate computations while maintaining human-readable final explanations. To address this inefficiency, we present Denser (Dual-density inference), a novel framework that optimizes information density separately for the reasoning and answering phases. Our framework implements this through three components: a query processing module that analyzes input problems, a high-density compressed reasoning mechanism for efficient intermediate computations, and an answer generation component that translates compressed reasoning into human-readable solutions. Experimental evaluation across multiple reasoning question answering benchmarks demonstrates that Denser reduces token consumption by up to 62% compared to standard Chain-of-Thought methods while preserving or improving accuracy. These efficiency gains are particularly pronounced for complex multi-step reasoning problems, where traditional methods generate extensive explanations.
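The abstract describes the three-component architecture but does not include an implementation. Below is a minimal sketch of how such a dual-density pipeline could be wired together around a generic text-generation call. The `generate` stub, the prompt wording, and the component function names are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch of a dual-density inference pipeline (not the paper's code).

def generate(prompt: str) -> str:
    """Placeholder for an LLM call; replace with a real API or local model."""
    return f"<model output for: {prompt[:40]}...>"

def process_query(question: str) -> str:
    """Component 1: analyze the input problem and restate it compactly."""
    return generate(
        "Restate the following problem as a compact list of knowns, "
        f"unknowns, and constraints:\n{question}"
    )

def compressed_reasoning(spec: str) -> str:
    """Component 2: reason in dense, symbol-rich shorthand to save tokens."""
    return generate(
        "Solve step by step using terse mathematical shorthand "
        "(symbols and abbreviations, no full sentences):\n" + spec
    )

def answer_generation(question: str, trace: str) -> str:
    """Component 3: translate the compressed trace into a readable answer."""
    return generate(
        f"Question: {question}\nCompressed working: {trace}\n"
        "Explain the solution in clear, human-readable prose."
    )

def denser_pipeline(question: str) -> str:
    spec = process_query(question)
    trace = compressed_reasoning(spec)
    return answer_generation(question, trace)

if __name__ == "__main__":
    print(denser_pipeline("A train travels 120 km in 1.5 hours; what is its speed?"))
```

With the placeholder `generate`, the pipeline simply threads strings through the three stages; swapping in a real model call is the only change needed to experiment with the reasoning/answer density split.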
Related papers
- ProofSketch: Efficient Verified Reasoning for Large Language Models [0.0]
We propose ProofSketch, a verification-guided reasoning framework that integrates symbolic closure, lexicographic verification, and adaptive sketch generation. Our experiments show that ProofSketch consistently reduces token usage while improving accuracy, demonstrating that this approach offers a promising path for efficient and trustworthy reasoning.
arXiv Detail & Related papers (2025-10-28T06:34:15Z) - Selection, Reflection and Self-Refinement: Revisit Reasoning Tasks via a Causal Lens [19.316594303998667]
Reasoning tasks have long been regarded as rigorous benchmarks for assessing the capabilities of machine learning models. We revisit reasoning tasks from a causal perspective, seeking to understand their behavior in latent space. We introduce a framework, called SR^2, that incorporates the estimated latent variables as feedback into the selection mechanism.
arXiv Detail & Related papers (2025-10-09T13:45:31Z) - Adaptive Test-Time Reasoning via Reward-Guided Dual-Phase Search [62.1546099504045]
We propose a dual-phase test-time scaling framework that separates reasoning into planning and execution. Specifically, we decompose reasoning trajectories and develop reward models for each phase, enabling the search to explore and prune plans and executions separately. Experiments on both mathematical reasoning and code generation benchmarks demonstrate that our approach consistently improves accuracy while reducing redundant computation.
arXiv Detail & Related papers (2025-09-29T19:27:23Z) - A Formal Comparison Between Chain-of-Thought and Latent Thought [32.84174396586435]
Chain-of-Thought (CoT) elicits reasoning in large language models by explicitly generating intermediate steps in natural language. Latent Thought in looped models operates directly in the continuous latent space, enabling computation beyond discrete linguistic representations.
arXiv Detail & Related papers (2025-09-25T11:27:52Z) - Implicit Reasoning in Large Language Models: A Comprehensive Survey [67.53966514728383]
Large Language Models (LLMs) have demonstrated strong generalization across a wide range of tasks. Recent studies have shifted attention from explicit chain-of-thought prompting toward implicit reasoning. This survey introduces a taxonomy centered on execution paradigms, shifting the focus from representational forms to computational strategies.
arXiv Detail & Related papers (2025-09-02T14:16:02Z) - Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute [60.151643048803145]
We propose Fractional Reasoning, a framework that enables continuous control over reasoning intensity at inference time. The method extracts the latent steering vector associated with deeper reasoning and reapplies it with a tunable scaling factor (a minimal numerical sketch appears after this list). Experiments on GSM8K, MATH500, and GPQA demonstrate that Fractional Reasoning consistently improves performance across diverse reasoning tasks and models.
arXiv Detail & Related papers (2025-06-18T21:15:59Z) - PixelThink: Towards Efficient Chain-of-Pixel Reasoning [70.32510083790069]
PixelThink is a simple yet effective scheme that integrates externally estimated task difficulty and internally measured model uncertainty. It learns to compress reasoning length in accordance with scene complexity and predictive confidence. Experimental results demonstrate that the proposed approach improves both reasoning efficiency and overall segmentation performance.
arXiv Detail & Related papers (2025-05-29T17:55:49Z) - Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models [49.61246073215651]
Large Language Models (LLMs) have demonstrated remarkable capabilities in complex tasks. Recent advancements in OpenAI o1 and DeepSeek-R1 have further improved performance in System-2 reasoning domains. However, they also introduce significant computational overhead due to verbose and redundant outputs.
arXiv Detail & Related papers (2025-03-20T17:59:38Z) - Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching [64.74765550805024]
Chain-of-Thought prompting elicits step-by-step problem solving, but often at the cost of excessive verbosity in intermediate outputs. We propose Sketch-of-Thought (SoT), a prompting framework that integrates cognitively inspired reasoning paradigms with linguistic constraints. SoT achieves token reductions of up to 84% with minimal accuracy loss across 18 reasoning datasets.
arXiv Detail & Related papers (2025-03-07T06:57:17Z) - Preventing Language Models From Hiding Their Reasoning [0.0]
Large language models (LLMs) often benefit from intermediate steps of reasoning to generate answers to complex problems.
In this work, we focus on one potential way intermediate steps of reasoning could be unfaithful: encoded reasoning.
We show that language models can be trained to make use of encoded reasoning to get higher performance without the user understanding the intermediate steps of reasoning.
arXiv Detail & Related papers (2023-10-27T22:02:29Z)
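The Fractional Reasoning entry above describes extracting a latent steering vector for deeper reasoning and reapplying it with a tunable scale. The following is a minimal numerical sketch of that idea under the common assumption that the steering direction is the difference of mean hidden states between prompts eliciting deep versus shallow reasoning; the synthetic arrays, the scaling factor `alpha`, and the function names are illustrative, not the paper's implementation.

```python
import numpy as np

# Illustrative sketch of steering-vector scaling (not the paper's code).
# Assumption: the steering direction is the difference of mean hidden
# states between prompts that elicit deep vs. shallow reasoning.

rng = np.random.default_rng(0)
hidden_dim = 16

# Stand-ins for hidden states collected from two groups of prompts.
deep_states = rng.normal(loc=0.5, scale=1.0, size=(32, hidden_dim))
shallow_states = rng.normal(loc=0.0, scale=1.0, size=(32, hidden_dim))

# Steering vector: direction from "shallow" toward "deep" reasoning.
steering_vector = deep_states.mean(axis=0) - shallow_states.mean(axis=0)

def steer(hidden_state: np.ndarray, alpha: float) -> np.ndarray:
    """Reapply the steering direction with a tunable scaling factor alpha.

    alpha = 0 leaves the state untouched; larger alpha pushes the state
    further along the 'deeper reasoning' direction.
    """
    return hidden_state + alpha * steering_vector

# Example: steer a single hidden state at three reasoning intensities.
h = rng.normal(size=hidden_dim)
for alpha in (0.0, 0.5, 1.5):
    steered = steer(h, alpha)
    overlap = float(steered @ steering_vector)
    print(f"alpha={alpha:.1f}  projection onto steering direction={overlap:+.2f}")
```

The projection onto the steering direction grows with `alpha`, which is the continuous "reasoning intensity" knob the abstract refers to.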