DenoiseFlow: Uncertainty-Aware Denoising for Reliable LLM Agentic Workflows
- URL: http://arxiv.org/abs/2603.00532v1
- Date: Sat, 28 Feb 2026 08:11:38 GMT
- Title: DenoiseFlow: Uncertainty-Aware Denoising for Reliable LLM Agentic Workflows
- Authors: Yandong Yan, Junwei Peng, Shijie Li, Chenxi Li, Yifei Shang, Can Deng, Ruiting Dai, Yongqiang Zhao, Jiaqi Zhu, Yu Huang,
- Abstract summary: We formalize the multi-step reasoning process as a Noisy MDP. We propose DenoiseFlow, a closed-loop framework that performs progressive denoising through three coordinated stages.
- Score: 20.319113495948294
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autonomous agents are increasingly entrusted with complex, long-horizon tasks, ranging from mathematical reasoning to software generation. While agentic workflows facilitate these tasks by decomposing them into multi-step reasoning chains, reliability degrades significantly as the sequence lengthens. In particular, minor interpretation errors in natural-language instructions tend to compound silently across steps; we term this failure mode accumulated semantic ambiguity. Existing mitigations often lack runtime adaptivity, relying instead on static exploration budgets, reactive error recovery, or single-path execution that ignores uncertainty entirely. We formalize the multi-step reasoning process as a Noisy MDP and propose DenoiseFlow, a closed-loop framework that performs progressive denoising through three coordinated stages: (1) Sensing estimates per-step semantic uncertainty; (2) Regulating adaptively allocates computation by routing between fast single-path execution and parallel exploration based on estimated risk; and (3) Correcting performs targeted recovery via influence-based root-cause localization. Online self-calibration continuously aligns decision boundaries with verifier feedback, requiring no ground-truth labels. Experiments on six benchmarks spanning mathematical reasoning, code generation, and multi-hop QA show that DenoiseFlow achieves the highest accuracy on every benchmark (83.3% average, +1.3% over the strongest baseline) while reducing cost by 40--56% through adaptive branching. Detailed ablation studies further confirm the framework's robustness and generality. Code is available at https://anonymous.4open.science/r/DenoiseFlow-21D3/.
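The Sensing and Regulating stages described in the abstract can be pictured as an uncertainty-gated router. The sketch below is a hypothetical illustration, not the paper's implementation: per-step semantic uncertainty is approximated here by the normalized entropy over sampled candidate answers, and the routing threshold is an assumed parameter.

```python
import math
from collections import Counter

def semantic_uncertainty(candidates):
    """Sensing (illustrative): normalized entropy over sampled candidate
    answers for one reasoning step. 0.0 = full agreement, 1.0 = maximal
    disagreement. A stand-in for the paper's per-step estimator."""
    counts = Counter(candidates)
    total = len(candidates)
    probs = [c / total for c in counts.values()]
    entropy = -sum(p * math.log(p) for p in probs)
    max_entropy = math.log(total) if total > 1 else 1.0
    return entropy / max_entropy

def route_step(candidates, threshold=0.5):
    """Regulating (illustrative): take the fast single-path answer when
    uncertainty is low; fall back to parallel exploration (majority vote
    over candidates) when it is high. `threshold` is an assumed value,
    not a number from the paper."""
    u = semantic_uncertainty(candidates)
    if u < threshold:
        return candidates[0], "single_path"
    winner, _ = Counter(candidates).most_common(1)[0]
    return winner, "parallel_explore"
```

In DenoiseFlow this decision boundary is additionally self-calibrated online against verifier feedback; the fixed threshold here is only a simplification.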
Related papers
- Beyond Scattered Acceptance: Fast and Coherent Inference for DLMs via Longest Stable Prefixes [10.877713536966601]
The Longest Stable Prefix (LSP) scheduler is a training-free and model-agnostic inference paradigm based on monolithic prefix absorption. LSP evaluates token stability via a single forward pass and dynamically identifies a contiguous, left-aligned block of stable predictions. It snaps its boundary to natural linguistic or structural acceptance points before an atomic commitment.
arXiv Detail & Related papers (2026-03-05T18:25:26Z) - Test-Time Scaling with Diffusion Language Models via Reward-Guided Stitching [66.39914384073145]
We propose a self-consistency framework that turns cheap diffusion-sampled reasoning into a reusable pool of step-level candidates. We find that step-level recombination is most beneficial on harder problems. Our training-free framework improves average accuracy by up to 2 across six math and coding tasks.
arXiv Detail & Related papers (2026-02-26T11:08:39Z) - Internalizing LLM Reasoning via Discovery and Replay of Latent Actions [4.830503861275364]
Internalization of chain-of-thought processes into hidden states has emerged as a highly efficient paradigm for scaling test-time compute. We propose STIR (Self-Distilled Tools for Internal Reasoning), a framework that reformulates reasoning enhancement as a dynamic latent trajectory control problem.
arXiv Detail & Related papers (2026-02-04T08:44:57Z) - CORE: Context-Robust Remasking for Diffusion Language Models [51.59514489363897]
We propose Context-Robust Remasking (CORE), a training-free framework for inference-time revision. Rather than trusting static token probabilities, CORE identifies context-brittle tokens by probing their sensitivity to targeted masked-context perturbations. On LLaDA-8B-Base, CORE delivers consistent improvements across reasoning and code benchmarks, outperforming compute-matched baselines and improving MBPP by up to 9.2 percentage points.
arXiv Detail & Related papers (2026-02-04T00:12:30Z) - Prism: Efficient Test-Time Scaling via Hierarchical Search and Self-Verification for Discrete Diffusion Language Models [96.0074341403456]
Inference-time compute has re-emerged as a practical way to improve LLM reasoning. Most test-time scaling (TTS) algorithms rely on autoregressive decoding. We propose Prism, an efficient TTS framework for dLLMs.
arXiv Detail & Related papers (2026-02-02T09:14:51Z) - EDIT: Early Diffusion Inference Termination for dLLMs Based on Dynamics of Training Gradients [6.736735746633275]
Diffusion-based large language models (dLLMs) refine token generations through iterative denoising, but answers often stabilize before all steps complete. We propose EDIT, an inference-time criterion that adaptively stops denoising once sufficient reasoning stability relative to training-time reasoning is detected.
arXiv Detail & Related papers (2025-11-29T23:47:47Z) - Intra-request branch orchestration for efficient LLM reasoning [52.68946975865865]
Large Language Models (LLMs) increasingly rely on inference-time reasoning algorithms to improve accuracy on complex tasks. Prior work has largely focused on reducing token usage, often at the expense of accuracy, while overlooking other latency factors. We present DUCHESS, an LLM serving system that reduces cost and latency without sacrificing accuracy through intra-request branch orchestration guided by predictions.
arXiv Detail & Related papers (2025-09-29T15:52:08Z) - SparseD: Sparse Attention for Diffusion Language Models [98.05780626106555]
Diffusion language models (DLMs) offer a promising alternative to autoregressive models (ARs). However, existing open-source DLMs suffer from high inference latency. We propose SparseD, a novel sparse attention method for DLMs.
arXiv Detail & Related papers (2025-09-28T18:10:10Z) - Abduct, Act, Predict: Scaffolding Causal Inference for Automated Failure Attribution in Multi-Agent Systems [20.846301581161978]
Failure attribution in multi-agent systems is a critical yet unsolved challenge. Current methods treat this as a pattern recognition task over long conversation logs. A2P Scaffolding transforms failure attribution from pattern recognition into a structured causal inference task.
arXiv Detail & Related papers (2025-09-12T16:51:15Z) - From Entity Reliability to Clean Feedback: An Entity-Aware Denoising Framework Beyond Interaction-Level Signals [20.323837731778358]
Implicit feedback is central to recommender systems but is inherently noisy, often impairing model training and degrading user experience. We propose EARD (Entity-Aware Reliability-Driven Denoising), a lightweight framework that shifts the focus from interaction-level signals to entity-level reliability.
arXiv Detail & Related papers (2025-08-14T17:20:12Z) - Confident Adaptive Language Modeling [95.45272377648773]
CALM is a framework for dynamically allocating different amounts of compute per input and generation timestep.
We demonstrate the efficacy of our framework in reducing compute -- potential speedup of up to 3x -- while provably maintaining high performance.
arXiv Detail & Related papers (2022-07-14T17:00:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.