CyclicReflex: Improving Large Reasoning Models via Cyclical Reflection Token Scheduling
- URL: http://arxiv.org/abs/2506.11077v1
- Date: Wed, 04 Jun 2025 03:43:38 GMT
- Title: CyclicReflex: Improving Large Reasoning Models via Cyclical Reflection Token Scheduling
- Authors: Chongyu Fan, Yihua Zhang, Jinghan Jia, Alfred Hero, Sijia Liu
- Abstract summary: Large reasoning models (LRMs) harness test-time scaling to perform multi-step reasoning for complex problem-solving. We treat reflection tokens as a "resource" and introduce the problem of resource allocation. We propose cyclical reflection token scheduling (termed CyclicReflex), a decoding strategy that dynamically modulates reflection token logits.
- Score: 16.151066326284376
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large reasoning models (LRMs), such as OpenAI's o1 and DeepSeek-R1, harness test-time scaling to perform multi-step reasoning for complex problem-solving. This reasoning process, executed before producing final answers, is often guided by special juncture tokens or textual segments that prompt self-evaluative reflection. We refer to these transition markers and reflective cues as "reflection tokens" (e.g., "wait", "but", "alternatively"). In this work, we treat reflection tokens as a "resource" and introduce the problem of resource allocation, aimed at improving the test-time compute performance of LRMs by adaptively regulating the frequency and placement of reflection tokens. Through empirical analysis, we show that both excessive and insufficient use of reflection tokens, referred to as over-reflection and under-reflection, can degrade model performance. To better understand and manage this trade-off, we draw an analogy between reflection token usage and learning rate scheduling in optimization. Building on this insight, we propose cyclical reflection token scheduling (termed CyclicReflex), a decoding strategy that dynamically modulates reflection token logits using a position-dependent triangular waveform. Experiments on MATH500, AIME2024/2025, and AMC2023 demonstrate that CyclicReflex consistently improves performance across model sizes (1.5B-8B), outperforming standard decoding and more recent approaches such as TIP (thought switching penalty) and S1. Codes are available at https://github.com/OPTML-Group/CyclicReflex.
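For intuition, the following is a minimal Python sketch of the decoding-time idea described in the abstract: add a position-dependent triangular offset to the logits of reflection tokens, boosting them in one half of each cycle and suppressing them in the other. The period, amplitude, and placeholder token ids are illustrative assumptions, not the paper's exact configuration.

```python
import torch

# Hypothetical vocabulary ids standing in for reflection tokens such as
# "wait" and "alternatively"; real ids depend on the tokenizer.
REFLECTION_IDS = [3, 7]

def triangular_offset(step: int, period: int = 256, amplitude: float = 2.0) -> float:
    """Triangular wave over decoding positions: rises from -amplitude to
    +amplitude in the first half of each cycle and falls back in the second."""
    phase = (step % period) / period
    return amplitude * (1.0 - 2.0 * abs(2.0 * phase - 1.0))

def schedule_reflection_logits(logits: torch.Tensor, step: int,
                               reflection_ids=REFLECTION_IDS) -> torch.Tensor:
    """Add the cyclical offset to the reflection-token entries of a logits vector."""
    adjusted = logits.clone()
    adjusted[reflection_ids] += triangular_offset(step)
    return adjusted

# Toy usage with a dummy 10-token vocabulary: the offset swings from
# suppression (-2.0) at the start of a cycle to a boost (+2.0) at its midpoint.
logits = torch.randn(10)
for step in (0, 64, 128, 192, 256):
    _ = schedule_reflection_logits(logits, step)
    print(step, round(triangular_offset(step), 3))
```

The cyclical shape mirrors the learning-rate-scheduling analogy drawn in the abstract: rather than uniformly penalizing or encouraging reflection, the schedule alternates between the two over the course of generation.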
Related papers
- Efficient Reasoning for Large Reasoning Language Models via Certainty-Guided Reflection Suppression [30.653381666162275]
Certainty-Guided Reflection Suppression (CGRS) is a novel method that mitigates overthinking in Large Reasoning Language Models (LRLMs). CGRS operates by dynamically suppressing the model's generation of reflection triggers when it exhibits high confidence in its current response. Our approach is model-agnostic, requires no retraining or architectural modifications, and can be integrated seamlessly with existing autoregressive generation pipelines.
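For intuition, here is a rough Python sketch of the confidence-gated suppression this summary describes: when the model's top-token probability is high, the logits of reflection-trigger tokens are penalized. The confidence measure, threshold, and penalty value are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def suppress_reflection_triggers(logits: torch.Tensor,
                                 trigger_ids: list,
                                 confidence_threshold: float = 0.9,
                                 penalty: float = 5.0) -> torch.Tensor:
    """If the model's top-token probability exceeds the threshold, subtract a
    fixed penalty from the logits of reflection-trigger tokens."""
    probs = F.softmax(logits, dim=-1)
    if probs.max().item() >= confidence_threshold:
        logits = logits.clone()
        logits[trigger_ids] -= penalty
    return logits

# Toy usage: a confident 5-token distribution, with id 2 as a hypothetical trigger.
confident_logits = torch.tensor([0.1, 8.0, 0.3, 0.2, 0.1])
print(suppress_reflection_triggers(confident_logits, [2]))
```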
arXiv Detail & Related papers (2025-08-07T12:38:22Z) - Signal-First Architectures: Rethinking Front-End Reactivity [0.0]
This paper introduces Signal-First Architecture, a novel paradigm where dependency-tracked signals are the atomic unit of reactivity. Unlike traditional RxJS or NgRx patterns, Signal-First enforces reactive flows from explicit signal declarations. We present a comparative analysis of three Angular reactivity models: RxJS service-based, NgRx global stores, and pure Signal-First implementations.
arXiv Detail & Related papers (2025-06-14T20:34:48Z) - REA-RL: Reflection-Aware Online Reinforcement Learning for Efficient Large Reasoning Models [33.05490585699939]
Large Reasoning Models (LRMs) demonstrate strong performance in complex tasks but often face the challenge of overthinking. Existing approaches synthesize shorter reasoning responses for LRMs to learn from, but are inefficient for online usage due to time-consuming data generation and filtering processes. We propose REA-RL, which introduces a small reflection model for efficient scaling in online training, offering both parallel sampling and sequential revision.
arXiv Detail & Related papers (2025-05-26T11:47:16Z) - From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning [64.7863715647187]
ReflectionFlow is an inference-time framework enabling text-to-image diffusion models to iteratively reflect upon and refine their outputs. To facilitate reflection-level scaling, we construct GenRef, a large-scale dataset comprising 1 million triplets, each containing a reflection, a flawed image, and an enhanced image.
arXiv Detail & Related papers (2025-04-22T17:58:07Z) - COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement [80.18490952057125]
Iterative refinement has emerged as an effective paradigm for enhancing the capabilities of large language models (LLMs) on complex tasks.
We propose Context-Wise Order-Agnostic Language Modeling (COrAL) to overcome these challenges.
Our approach models multiple token dependencies within manageable context windows, enabling the model to perform iterative refinement internally.
arXiv Detail & Related papers (2024-10-12T23:56:19Z) - Enhancing Sequential Recommendations through Multi-Perspective Reflections and Iteration [16.10791252542592]
Sequence recommendation (SeqRec) aims to predict the next item a user will interact with by understanding user intentions and leveraging collaborative filtering information.
Large language models (LLMs) have shown great promise in recommendation tasks through prompt-based, fixed reflection libraries, and fine-tuning techniques.
MoRE introduces three reflectors for generating LLM-based reflections on explicit preferences, implicit preferences, and collaborative signals.
arXiv Detail & Related papers (2024-09-10T09:58:55Z) - FIRM: Flexible Interactive Reflection reMoval [75.38207315080624]
This paper presents FIRM, a novel framework for Flexible Interactive image Reflection reMoval. The proposed framework requires only 10% of the guidance time needed by previous interactive methods. Results on public real-world reflection removal datasets validate that our method demonstrates state-of-the-art reflection removal performance.
arXiv Detail & Related papers (2024-06-03T17:34:37Z) - Continual Referring Expression Comprehension via Dual Modular Memorization [133.46886428655426]
Referring Expression Comprehension (REC) aims to localize an image region of a given object described by a natural-language expression.
Existing REC algorithms make a strong assumption that the training data fed into a model are given upfront, which degrades their practicality for real-world scenarios.
In this paper, we propose Continual Referring Expression Comprehension (CREC), a new setting for REC, where a model learns on a stream of incoming tasks.
In order to continuously improve the model on sequential tasks without forgetting previously learned knowledge and without repeatedly re-training from scratch, we propose an effective baseline method named Dual Modular Memorization.
arXiv Detail & Related papers (2023-11-25T02:58:51Z) - Re-Reading Improves Reasoning in Large Language Models [87.46256176508376]
We introduce a simple, yet general and effective prompting method, Re2, to enhance the reasoning capabilities of off-the-shelf Large Language Models (LLMs).
Unlike most thought-eliciting prompting methods, such as Chain-of-Thought (CoT), Re2 shifts the focus to the input by processing questions twice, thereby enhancing the understanding process.
We evaluate Re2 on extensive reasoning benchmarks across 14 datasets, spanning 112 experiments, to validate its effectiveness and generality.
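For intuition, a minimal Python sketch of the re-reading idea summarized above: the prompt simply presents the question twice before eliciting the answer. The exact wrapper wording is an assumption and may differ from the paper's prompt template.

```python
def re2_prompt(question: str) -> str:
    """Build a Re2-style prompt that states the question, asks the model to read
    it again, and then elicits the answer."""
    return (
        f"Q: {question}\n"
        f"Read the question again: {question}\n"
        "A:"
    )

print(re2_prompt("If a train travels 60 miles in 1.5 hours, what is its average speed?"))
```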
arXiv Detail & Related papers (2023-09-12T14:36:23Z) - SRFormer: Text Detection Transformer with Incorporated Segmentation and Regression [6.74412860849373]
We propose SRFormer, a unified DETR-based model with amalgamated segmentation and regression.
Our empirical analysis indicates that favorable segmentation predictions can be obtained at the initial decoder layers.
Our method demonstrates exceptional robustness, superior training and data efficiency, and state-of-the-art performance.
arXiv Detail & Related papers (2023-08-21T07:34:31Z) - Reflection Invariance Learning for Few-shot Semantic Segmentation [53.20466630330429]
Few-shot semantic segmentation (FSS) aims to segment objects of unseen classes in query images with only a few annotated support images.
This paper proposes a fresh few-shot segmentation framework to mine the reflection invariance in a multi-view matching manner.
Experiments on both PASCAL-$5^i$ and COCO-$20^i$ datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-06-01T15:14:58Z) - Two-Stage Single Image Reflection Removal with Reflection-Aware Guidance [78.34235841168031]
We present a novel two-stage network with reflection-aware guidance (RAGNet) for single image reflection removal (SIRR).
RAG can be used (i) to mitigate the effect of reflection in the observation, and (ii) to generate a mask in partial convolution for mitigating the effect of deviating from the linear combination hypothesis.
Experiments on five commonly used datasets demonstrate the quantitative and qualitative superiority of our RAGNet in comparison to the state-of-the-art SIRR methods.
arXiv Detail & Related papers (2020-12-02T03:14:57Z)