Sticker-TTS: Learn to Utilize Historical Experience with a Sticker-driven Test-Time Scaling Framework
- URL: http://arxiv.org/abs/2509.05007v2
- Date: Mon, 08 Sep 2025 03:26:03 GMT
- Title: Sticker-TTS: Learn to Utilize Historical Experience with a Sticker-driven Test-Time Scaling Framework
- Authors: Jie Chen, Jinhao Jiang, Yingqian Min, Zican Dong, Shijie Wang, Wayne Xin Zhao, Ji-Rong Wen,
- Abstract summary: We propose Sticker-TTS, a novel test-time scaling framework for large reasoning models.<n>At the core of our framework are distilled key conditions-termed stickers that drive the extraction, refinement, and reuse of critical information.<n>We show that Sticker-TTS consistently surpasses strong baselines, including self-consistency and advanced reinforcement learning approaches.
- Score: 97.11629413081651
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large reasoning models (LRMs) have exhibited strong performance on complex reasoning tasks, with further gains achievable through increased computational budgets at inference. However, current test-time scaling methods predominantly rely on redundant sampling, ignoring the historical experience utilization, thereby limiting computational efficiency. To overcome this limitation, we propose Sticker-TTS, a novel test-time scaling framework that coordinates three collaborative LRMs to iteratively explore and refine solutions guided by historical attempts. At the core of our framework are distilled key conditions-termed stickers-which drive the extraction, refinement, and reuse of critical information across multiple rounds of reasoning. To further enhance the efficiency and performance of our framework, we introduce a two-stage optimization strategy that combines imitation learning with self-improvement, enabling progressive refinement. Extensive evaluations on three challenging mathematical reasoning benchmarks, including AIME-24, AIME-25, and OlymMATH, demonstrate that Sticker-TTS consistently surpasses strong baselines, including self-consistency and advanced reinforcement learning approaches, under comparable inference budgets. These results highlight the effectiveness of sticker-guided historical experience utilization. Our code and data are available at https://github.com/RUCAIBox/Sticker-TTS.
Related papers
- S3-CoT: Self-Sampled Succinct Reasoning Enables Efficient Chain-of-Thought LLMs [48.80914119283909]
Large language models equipped with chain-of-thought (CoT) achieve strong performance and offer a window into behavior.<n>Recent evidence suggests that improvements in CoT capabilities often come with redundant reasoning processes.<n>Our study presents a self-sampling framework based on activation steering for efficient CoT learning.
arXiv Detail & Related papers (2026-02-02T11:37:36Z) - TACO: Think-Answer Consistency for Optimized Long-Chain Reasoning and Efficient Data Learning via Reinforcement Learning in LVLMs [50.820065021136024]
DeepSeek R1 has significantly advanced complex reasoning for large language models (LLMs)<n>Recent methods have attempted to replicate R1's reasoning capabilities in multimodal settings.<n>We propose TACO, a novel reinforcement learning algorithm for visual reasoning.
arXiv Detail & Related papers (2025-05-27T06:30:48Z) - Mind the Gap: Bridging Thought Leap for Improved Chain-of-Thought Tuning [54.65050470296886]
We propose the CoT Thought Leap Bridge Task, which aims to automatically detect leaps and generate missing intermediate reasoning steps.<n>We demonstrate that models fine-tuned on bridged datasets consistently outperform those trained on original datasets.<n>Our approach effectively enhances distilled data and provides better starting points for reinforcement learning.
arXiv Detail & Related papers (2025-05-20T17:59:31Z) - On the Role of Feedback in Test-Time Scaling of Agentic AI Workflows [71.92083784393418]
Agentic AI (systems that autonomously plan and act) are becoming widespread, yet their task success rate on complex tasks remains low.<n>Inference-time alignment relies on three components: sampling, evaluation, and feedback.<n>We introduce Iterative Agent Decoding (IAD), a procedure that repeatedly inserts feedback extracted from different forms of critiques.
arXiv Detail & Related papers (2025-04-02T17:40:47Z) - TwT: Thinking without Tokens by Habitual Reasoning Distillation with Multi-Teachers' Guidance [42.8895384120507]
We propose TwT, a method that reduces inference-time costs through habitual reasoning distillation with multi-teachers' guidance.<n>Our approach internalizes explicit reasoning into the model's habitual behavior through a Teacher-Guided compression strategy.<n> Experimental results demonstrate that TwT effectively reduces inference costs while preserving superior performance.
arXiv Detail & Related papers (2025-03-31T15:16:31Z) - Automating Dataset Updates Towards Reliable and Timely Evaluation of Large Language Models [81.27391252152199]
Large language models (LLMs) have achieved impressive performance across various natural language benchmarks.
We propose to automate dataset updating and provide systematic analysis regarding its effectiveness.
There are two updating strategies: 1) mimicking strategy to generate similar samples based on original data, and 2) extending strategy that further expands existing samples.
arXiv Detail & Related papers (2024-02-19T07:15:59Z) - CoT-Driven Framework for Short Text Classification: Enhancing and Transferring Capabilities from Large to Smaller Model [5.331916925505735]
Short Text Classification (STC) is crucial for processing and understanding the brief but substantial content prevalent on contemporary digital platforms.<n>We propose the Syntactic and Semantic Enrichment CoT (SSE-CoT) method, effectively decomposing the STC tasks into four distinct steps.<n>We then introduce the CoT-Driven Multi-Task Learning (CDMT) framework to extend these capabilities to smaller models.
arXiv Detail & Related papers (2024-01-06T08:28:20Z) - Dynamic Contrastive Distillation for Image-Text Retrieval [90.05345397400144]
We present a novel plug-in dynamic contrastive distillation (DCD) framework to compress image-text retrieval models.
We successfully apply our proposed DCD strategy to two state-of-the-art vision-language pretrained models, i.e. ViLT and METER.
Experiments on MS-COCO and Flickr30K benchmarks show the effectiveness and efficiency of our DCD framework.
arXiv Detail & Related papers (2022-07-04T14:08:59Z) - Robust Dialogue State Tracking with Weak Supervision and Sparse Data [2.580163308334609]
Generalising dialogue state tracking (DST) to new data is challenging due to the strong reliance on abundant and fine-grained supervision during training.
Sample sparsity, distributional shift and the occurrence of new concepts and topics frequently lead to severe performance degradation during inference.
We propose a training strategy to build extractive DST models without the need for fine-grained manual span labels.
arXiv Detail & Related papers (2022-02-07T16:58:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.