Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
- URL: http://arxiv.org/abs/2503.05179v2
- Date: Wed, 21 May 2025 07:47:26 GMT
- Title: Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
- Authors: Simon A. Aytes, Jinheon Baek, Sung Ju Hwang
- Abstract summary: Chain-of-Thought prompting elicits step-by-step problem solving, but often at the cost of excessive verbosity in intermediate outputs. We propose Sketch-of-Thought (SoT), a prompting framework that integrates cognitively inspired reasoning paradigms with linguistic constraints. SoT achieves token reductions of up to 78% with minimal accuracy loss across 15 reasoning datasets.
- Score: 60.04718679054704
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in large language models (LLMs) have enabled strong reasoning capabilities through Chain-of-Thought (CoT) prompting, which elicits step-by-step problem solving, but often at the cost of excessive verbosity in intermediate outputs, leading to increased computational overhead. We propose Sketch-of-Thought (SoT), a prompting framework that integrates cognitively inspired reasoning paradigms with linguistic constraints to reduce token usage while preserving reasoning accuracy. SoT is designed as a flexible, modular approach and is instantiated with three paradigms--Conceptual Chaining, Chunked Symbolism, and Expert Lexicons--each tailored to distinct reasoning tasks and selected dynamically at test-time by a lightweight routing model. Across 15 reasoning datasets spanning multiple domains, languages, and modalities, SoT achieves token reductions of up to 78% with minimal accuracy loss. In tasks such as mathematical and multi-hop reasoning, it even improves accuracy while shortening outputs.
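As a rough illustration of the test-time routing described in the abstract, the sketch below pairs a stand-in router with three paradigm-constrained system prompts. The prompt wording, the keyword heuristic, and the `generate` callable are illustrative placeholders, not the paper's released prompts or routing model.

```python
# Sketch-of-Thought-style routing (illustrative only; names and prompt
# wording are assumptions, not the paper's released artifacts).

PARADIGM_PROMPTS = {
    "conceptual_chaining": (
        "Reason by linking key concepts in a short chain of ideas "
        "(e.g., 'A -> B -> C'). Avoid full sentences."
    ),
    "chunked_symbolism": (
        "Reason with compact symbolic notation, grouping numbers and "
        "variables into short equations rather than prose."
    ),
    "expert_lexicons": (
        "Reason with terse domain shorthand and abbreviations an expert "
        "would use, then state the final answer."
    ),
}


def route_paradigm(question: str) -> str:
    """Pick a reasoning paradigm. The paper trains a lightweight routing
    model; this stand-in uses a crude keyword heuristic for illustration."""
    q = question.lower()
    if any(tok in q for tok in ("how many", "sum", "percent", "+", "=")):
        return "chunked_symbolism"
    if any(tok in q for tok in ("diagnos", "dosage", "circuit", "voltage")):
        return "expert_lexicons"
    return "conceptual_chaining"


def sketch_of_thought(question: str, generate) -> str:
    """Answer with the paradigm-constrained prompt. `generate(system, user)
    -> str` is any LLM completion callable supplied by the caller."""
    system = PARADIGM_PROMPTS[route_paradigm(question)]
    return generate(system, question)
```

Because the linguistic constraint lives entirely in the system prompt and the router runs once per query, the only overhead added on top of a standard prompted call is a single lightweight classification.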
Related papers
- R-Stitch: Dynamic Trajectory Stitching for Efficient Reasoning [60.37610817226533]
Chain-of-thought (CoT) reasoning encourages step-by-step intermediate reasoning during inference.
CoT introduces substantial computational overhead due to its reliance on autoregressive decoding over long token sequences.
We present R-Stitch, a token-level, confidence-based hybrid decoding framework that accelerates CoT inference.
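The summary only names a token-level, confidence-based hybrid decoding scheme; the sketch below illustrates that general idea under the assumption that a small model proposes each token and a larger model is consulted only when the small model's top-token probability falls below a threshold. The interfaces and the small/large pairing are assumptions for illustration, not the paper's implementation.

```python
def hybrid_decode(prompt_ids, small_next, large_next, tau=0.7,
                  eos_id=2, max_new_tokens=256):
    """Token-level, confidence-based hybrid decoding (illustrative sketch).

    small_next / large_next: placeholder callables mapping a token-id list
    to a dict {token_id: probability} over the next token.
    tau: confidence threshold below which the large model is consulted.
    """
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        probs = small_next(ids)
        token, conf = max(probs.items(), key=lambda kv: kv[1])
        if conf < tau:  # small model is unsure: defer to the large model
            token, conf = max(large_next(ids).items(), key=lambda kv: kv[1])
        ids.append(token)
        if token == eos_id:
            break
    return ids
```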
arXiv Detail & Related papers (2025-07-23T08:14:36Z) - PixelThink: Towards Efficient Chain-of-Pixel Reasoning [70.32510083790069]
PixelThink is a simple yet effective scheme that integrates externally estimated task difficulty and internally measured model uncertainty.
It learns to compress reasoning length in accordance with scene complexity and predictive confidence.
Experimental results demonstrate that the proposed approach improves both reasoning efficiency and overall segmentation performance.
arXiv Detail & Related papers (2025-05-29T17:55:49Z) - Adaptive Deep Reasoning: Triggering Deep Thinking When Needed [28.575411507835973]
Large language models (LLMs) have shown impressive capabilities in handling complex tasks through long-chain reasoning.
We propose a novel approach that autonomously switches between short and long reasoning chains based on problem complexity.
This advancement enhances the practicality of reasoning in large language models for real-world applications.
arXiv Detail & Related papers (2025-05-26T15:08:51Z) - Fractured Chain-of-Thought Reasoning [61.647243580650446]
We introduce Fractured Sampling, a unified inference-time strategy that interpolates between full CoT and solution-only sampling.
We show that Fractured Sampling consistently achieves superior accuracy-cost trade-offs, yielding steep log-linear scaling gains in Pass@k versus token budget.
arXiv Detail & Related papers (2025-05-19T11:30:41Z) - Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models [42.407188124841234]
Landscape of thoughts is a tool to inspect the reasoning paths of chain-of-thought on any multi-choice dataset.
It distinguishes between strong and weak models, correct and incorrect answers, as well as different reasoning tasks.
It also uncovers undesirable reasoning patterns, such as low consistency and high uncertainty.
arXiv Detail & Related papers (2025-03-28T06:09:51Z) - Chain of Draft: Thinking Faster by Writing Less [37.492654173517046]
Chain of Draft (CoD) is a novel paradigm inspired by human cognitive processes.
CoD generates minimalistic yet informative intermediate reasoning outputs while solving tasks.
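Since CoD operates at the prompting level, a minimal illustration is just a system prompt that asks for terse drafts instead of full explanations; the wording below is hypothetical, not the paper's exact instruction.

```python
# Hypothetical prompts contrasting standard CoT with a draft-style variant.
COT_SYSTEM = ("Think step by step to solve the problem, explaining each "
              "step in full, then give the final answer.")
COD_SYSTEM = ("Think step by step, but write only a minimal draft for each "
              "step (a few words at most), then give the final answer.")


def answer(question, generate, drafting=True):
    """`generate(system, user) -> str` is any LLM completion callable."""
    return generate(COD_SYSTEM if drafting else COT_SYSTEM, question)
```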
arXiv Detail & Related papers (2025-02-25T19:36:06Z) - LightThinker: Thinking Step-by-Step Compression [53.8069487638972]
We propose LightThinker, a method that enables large language models to dynamically compress intermediate thoughts during reasoning.
Inspired by human cognitive processes, LightThinker compresses thought steps into compact representations and discards the original reasoning chains.
Experiments show that LightThinker reduces peak memory usage and inference time, while maintaining competitive accuracy.
arXiv Detail & Related papers (2025-02-21T16:57:22Z) - SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs [48.28847964704554]
Chain-of-Thought (CoT) reasoning enables Large Language Models (LLMs) to solve complex reasoning tasks.
We propose a novel approach for continuous-space reasoning that does not require modifying the LLM.
arXiv Detail & Related papers (2025-02-17T18:52:29Z) - Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning [44.84219266082269]
Large Language Models (LLMs) excel at reasoning and planning when trained on chain-of-thought (CoT) data.
We propose a hybrid representation of the reasoning process, where we partially abstract away the initial reasoning steps using latent discrete tokens.
arXiv Detail & Related papers (2025-02-05T15:33:00Z) - Efficient Reasoning with Hidden Thinking [48.96945580741641]
Chain-of-Thought (CoT) reasoning has become a powerful framework for improving complex problem-solving capabilities.
We propose $\textbf{Heima}$ (as hidden llama), an efficient reasoning framework that leverages reasoning CoTs at hidden latent space.
The Heima model achieves higher generation efficiency while maintaining or even improving zero-shot task accuracy.
arXiv Detail & Related papers (2025-01-31T15:10:29Z) - Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning [40.069109287947875]
We propose a novel reasoning framework called Forest-of-Thought (FoT).
FoT integrates multiple reasoning trees to leverage collective decision-making for solving complex logical problems.
FoT employs sparse activation strategies to select the most relevant reasoning paths, improving both efficiency and accuracy.
arXiv Detail & Related papers (2024-12-12T09:01:18Z) - Training Large Language Models to Reason in a Continuous Latent Space [84.5618790930725]
We introduce a new paradigm, Coconut (Chain of Continuous Thought), to explore the potential of large language models (LLMs) reasoning in an unrestricted latent space.
Experiments show that Coconut can effectively augment the LLM on several reasoning tasks.
These findings demonstrate the promise of latent reasoning and offer valuable insights for future research.
arXiv Detail & Related papers (2024-12-09T18:55:56Z) - Markov Chain of Thought for Efficient Mathematical Reasoning [10.678633785012691]
Multi-step Chain of Thought (CoT) benefits from the logical structure of the reasoning steps and task-specific actions.
We conceptualize the standard multi-step CoT as a novel Markov Chain of Thought (MCoT).
Our MCoT aims to compress previous reasoning steps into a simplified question, enabling efficient next-step inference.
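A schematic of the state-compression loop implied by this summary is sketched below; `reason_step` and `simplify` are placeholder LLM calls, not the paper's interfaces.

```python
def mcot_solve(question, reason_step, simplify, max_steps=8):
    """Markov-style multi-step reasoning (illustrative sketch).

    reason_step(q) -> (step_text, answer_or_none): performs one reasoning
    step on the current question and returns an answer when it can finish.
    simplify(q, step_text) -> new_question: compresses the question plus the
    latest step into a fresh, simpler question, so the next step conditions
    only on that compressed state (the Markov property in the summary).
    """
    q = question
    for _ in range(max_steps):
        step_text, answer = reason_step(q)
        if answer is not None:
            return answer
        q = simplify(q, step_text)
    return None  # gave up after max_steps
```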
arXiv Detail & Related papers (2024-10-23T07:53:29Z) - Think Beyond Size: Adaptive Prompting for More Effective Reasoning [0.0]
We introduce Adaptive Prompting, a dynamic and iterative framework designed to enhance reasoning by incorporating real-time adjustments to prompt structures and validation mechanisms.
Results demonstrate that Adaptive Prompting significantly improves performance on diverse reasoning benchmarks, including arithmetic reasoning (GSM8K, MultiArith), logical reasoning, and commonsense tasks.
Our approach enables smaller models to achieve competitive performance with larger counterparts, such as GPT-4, while maintaining computational efficiency.
arXiv Detail & Related papers (2024-10-10T17:14:36Z) - LINC: A Neurosymbolic Approach for Logical Reasoning by Combining Language Models with First-Order Logic Provers [60.009969929857704]
Logical reasoning is an important task for artificial intelligence with potential impacts on science, mathematics, and society.
In this work, we reformulate such tasks as modular neurosymbolic programming, which we call LINC.
We observe significant performance gains on FOLIO and a balanced subset of ProofWriter for three different models in nearly all experimental conditions we evaluate.
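As a rough sketch of the modular neurosymbolic pipeline the title describes: one module translates premises and the conclusion into first-order logic (here a placeholder `translate` callable standing in for the LLM), and an off-the-shelf prover decides entailment. This example uses nltk's Prover9 wrapper, which requires the external Prover9 binary; LINC's actual prompt format and prover may differ.

```python
# Illustrative LLM -> first-order logic -> prover pipeline (not the authors'
# code). FOL syntax here is nltk's; Prover9 must be installed separately.
from nltk.sem import Expression
from nltk.inference import Prover9

read_expr = Expression.fromstring


def entails(premises_fol, conclusion_fol):
    """Return True if the FOL premises prove the FOL conclusion."""
    assumptions = [read_expr(p) for p in premises_fol]
    goal = read_expr(conclusion_fol)
    return Prover9().prove(goal, assumptions)


def linc_style_solve(premises, conclusion, translate):
    """`translate(sentence) -> str` is a placeholder for the LLM call that
    maps natural language to an FOL string; the prover handles the logic."""
    return entails([translate(p) for p in premises], translate(conclusion))


# With hand-written FOL (skipping the translation step):
# entails(["all x.(man(x) -> mortal(x))", "man(socrates)"], "mortal(socrates)")
# evaluates to True.
```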
arXiv Detail & Related papers (2023-10-23T17:58:40Z) - Visual Chain of Thought: Bridging Logical Gaps with Multimodal Infillings [61.04460792203266]
We introduce VCoT, a novel method that leverages chain-of-thought prompting with vision-language grounding to bridge the logical gaps within sequential data.
Our method uses visual guidance to generate synthetic multimodal infillings that add consistent and novel information to reduce the logical gaps for downstream tasks.
arXiv Detail & Related papers (2023-05-03T17:58:29Z) - Enhancing Chain-of-Thoughts Prompting with Iterative Bootstrapping in Large Language Models [81.01397924280612]
Large language models (LLMs) can achieve highly effective performance on various reasoning tasks by incorporating step-by-step chain-of-thought (CoT) prompting as demonstrations.
We introduce Iter-CoT (Iterative bootstrapping in Chain-of-Thoughts Prompting), an iterative bootstrapping approach for selecting exemplars and generating reasoning chains.
arXiv Detail & Related papers (2023-04-23T13:54:39Z) - Confident Adaptive Language Modeling [95.45272377648773]
CALM is a framework for dynamically allocating different amounts of compute per input and generation timestep.
We demonstrate the efficacy of our framework in reducing compute -- potential speedup of up to $\times 3$ -- while provably maintaining high performance.
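The summary does not spell out the mechanism; one common way to allocate compute per generation timestep is to exit early from intermediate decoder layers once a confidence score clears a calibrated threshold. The sketch below illustrates that idea with placeholder model hooks, not CALM's actual implementation.

```python
def early_exit_step(layer_states, lm_head, threshold=0.9):
    """One decoding timestep with confidence-based early exit (illustrative).

    layer_states: per-layer hidden states for the current position
    (placeholder for whatever the model exposes).
    lm_head(h) -> dict {token_id: probability} over the next token.
    threshold: calibrated confidence; lower values exit earlier, trading
    some accuracy for less compute.
    """
    token, used = None, 0
    for used, h in enumerate(layer_states, start=1):
        token, conf = max(lm_head(h).items(), key=lambda kv: kv[1])
        if conf >= threshold:  # confident enough: skip the remaining layers
            break
    return token, used  # chosen token and number of layers actually used
```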
arXiv Detail & Related papers (2022-07-14T17:00:19Z)