Related papers: Solving LLM Repetition Problem in Production: A Comprehensive Study of Multiple Solutions

Solving LLM Repetition Problem in Production: A Comprehensive Study of Multiple Solutions

URL: http://arxiv.org/abs/2512.04419v1
Date: Thu, 04 Dec 2025 03:30:18 GMT
Title: Solving LLM Repetition Problem in Production: A Comprehensive Study of Multiple Solutions
Authors: Weiwei Wang, Weijie Zou, Jiyong Min,
Abstract summary: Large Language Models (LLMs) continuously generate repetitive content without proper termination, causing severe performance degradation and system stalling.<n>This paper presents a comprehensive investigation and multiple practical solutions for the repetition problem encountered in real-world batch code interpretation tasks.
Score: 2.085792950847639
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The repetition problem, where Large Language Models (LLMs) continuously generate repetitive content without proper termination, poses a critical challenge in production deployments, causing severe performance degradation and system stalling. This paper presents a comprehensive investigation and multiple practical solutions for the repetition problem encountered in real-world batch code interpretation tasks. We identify three distinct repetition patterns: (1) business rule generation repetition, (2) method call relationship analysis repetition, and (3) PlantUML diagram syntax generation repetition. Through rigorous theoretical analysis based on Markov models, we establish that the root cause lies in greedy decoding's inability to escape repetitive loops, exacerbated by self-reinforcement effects. Our comprehensive experimental evaluation demonstrates three viable solutions: (1) Beam Search decoding with early_stopping=True serves as a universal post-hoc mechanism that effectively resolves all three repetition patterns; (2) presence_penalty hyperparameter provides an effective solution specifically for BadCase 1; and (3) Direct Preference Optimization (DPO) fine-tuning offers a universal model-level solution for all three BadCases. The primary value of this work lies in combining first-hand production experience with extensive experimental validation. Our main contributions include systematic theoretical analysis of repetition mechanisms, comprehensive evaluation of multiple solutions with task-specific applicability mapping, identification of early_stopping as the critical parameter for Beam Search effectiveness, and practical production-ready solutions validated in real deployment environments.

Related papers

ReLoop: Structured Modeling and Behavioral Verification for Reliable LLM-Based Optimization [6.572539312871392]
Large language models (LLMs) can translate natural language into optimization code, but silent failures pose a critical risk.<n>We introduce ReLoop, addressing silent failures from two complementary directions.
arXiv Detail & Related papers (2026-02-17T20:20:33Z)
ProRAG: Process-Supervised Reinforcement Learning for Retrieval-Augmented Generation [54.071574153853994]
ProRAG is a process-supervised reinforcement learning framework designed to integrate learned step-level supervision into the online optimization loop.<n>Our framework consists of four stages: (1) Supervised Policy Warmup to initialize the model with a structured reasoning format; (2) construction of an MCTS-based Process Reward Model (PRM) to quantify intermediate reasoning quality; (3) PRM-Guided Reasoning Refinement to align the policy with fine-grained process preferences; and (4) Process-Supervised Reinforcement Learning with a dual-granularity advantage mechanism.
arXiv Detail & Related papers (2026-01-29T16:04:59Z)
The Reasoning-Creativity Trade-off: Toward Creativity-Driven Problem Solving [57.652356955571065]
State-of-the-art large language model (LLM) pipelines rely on bootstrapped reasoning loops.<n>We analyze how this design choice is sensitive to the collapse of the model's distribution over reasoning paths.<n>We introduce Distributional Creative Reasoning (DCR), a unified variational objective that casts training as gradient flow through probability measures on solution traces.
arXiv Detail & Related papers (2026-01-02T17:10:31Z)
SolverLLM: Leveraging Test-Time Scaling for Optimization Problem via LLM-Guided Search [58.116954449750544]
We introduce a training-free framework that leverages test-time scaling to solve diverse optimization problems.<n>Rather than solving directly, it generates mathematical formulations and translates them into solver-ready code, guided by a novel Monte Carlo Tree Search strategy.
arXiv Detail & Related papers (2025-10-19T16:21:19Z)
Learning to Refine: Self-Refinement of Parallel Reasoning in LLMs [102.48588475875749]
We introduce Generative Self-Refinement (GSR), a novel parallel test-time scaling framework.<n>GSR generates a set of candidate responses in parallel and then performs self-refinement to synthesize a new superior solution.<n>We show that our method achieves state-of-the-art performance across five mathematical benchmarks.
arXiv Detail & Related papers (2025-08-27T06:51:48Z)
RL for Reasoning by Adaptively Revealing Rationales [36.50924054394857]
Supervised fine-tuning (SFT) relies on dense ground-truth labels, which become increasingly costly as sequence length grows.<n>We address this by adaptive backtracking (AdaBack), a per-sample curriculum learning algorithm that reveals only a partial prefix of the target output during training.<n>We show that our adaptive curriculum over partial answers reliably solves problems that are otherwise intractable.
arXiv Detail & Related papers (2025-06-22T17:46:14Z)
SPARE: Single-Pass Annotation with Reference-Guided Evaluation for Automatic Process Supervision and Reward Modelling [58.05959902776133]
We introduce Single-Pass.<n>with Reference-Guided Evaluation (SPARE), a novel structured framework that enables efficient per-step annotation.<n>We demonstrate SPARE's effectiveness across four diverse datasets spanning mathematical reasoning (GSM8K, MATH), multi-hop question answering (MuSiQue-Ans), and spatial reasoning (SpaRP)<n>On ProcessBench, SPARE demonstrates data-efficient out-of-distribution generalization, using only $sim$16% of training samples compared to human-labeled and other synthetically trained baselines.
arXiv Detail & Related papers (2025-06-18T14:37:59Z)
On the Effect of Sampling Diversity in Scaling LLM Inference [57.31028064284527]
Large language model (LLM) scaling inference is key to unlocking greater performance.<n>Motivated by the observed relationship between solution accuracy and meaningful response diversity, we systematically study the effect of prompt diversity in scaling inference.<n>We show that responses generated from meaningful diverse prompts after Best-of-$N$ selection exhibit significantly lower error rates than those produced from stationary prompts.
arXiv Detail & Related papers (2025-02-16T07:37:58Z)
Vision-Language Models Can Self-Improve Reasoning via Reflection [20.196406628954303]
Chain-of-thought (CoT) has proven to improve the reasoning capability of large language models (LLMs) We propose a self-training framework, R3V, which iteratively enhances the model's Vision-language Reasoning by Reflecting on CoT Rationales. Our approach supports self-reflection on generated solutions, further boosting performance through test-time computation.
arXiv Detail & Related papers (2024-10-30T14:45:00Z)
MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning [80.15393178083607]
Large Language Models' (LLM) reasoning can be improved using test-time aggregation strategies, i.e., generating multiple samples and voting among generated samples.<n> Refinement offers an alternative by using LLM-generated feedback to improve solution quality.<n>We propose MAgICoRe, which avoids excessive refinement by categorizing problem difficulty as easy or hard.
arXiv Detail & Related papers (2024-09-18T17:12:41Z)
Continuous Parallel Relaxation for Finding Diverse Solutions in Combinatorial Optimization Problems [0.6906005491572401]
Finding the optimal solution is often the primary goal in optimization (CO)<n>This study introduces Continual Parallel Relaxation Annealing (CPRA), a computationally efficient framework for unsupervised-learning (UL)-based CO solvers.
arXiv Detail & Related papers (2024-02-03T15:31:05Z)
Thought Propagation: An Analogical Approach to Complex Reasoning with Large Language Models [62.96551299003463]
We propose textbftextitThought Propagation (TP) to enhance the complex reasoning ability of Large Language Models. TP first prompts LLMs to propose and solve a set of analogous problems that are related to the input one. TP reuses the results of analogous problems to directly yield a new solution or derive a knowledge-intensive plan for execution to amend the initial solution obtained from scratch.
arXiv Detail & Related papers (2023-10-06T01:40:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.