CORE: Context-Robust Remasking for Diffusion Language Models
- URL: http://arxiv.org/abs/2602.04096v2
- Date: Thu, 05 Feb 2026 20:29:29 GMT
- Title: CORE: Context-Robust Remasking for Diffusion Language Models
- Authors: Kevin Zhai, Sabbir Mollah, Zhenyi Wang, Mubarak Shah
- Abstract summary: We propose Context-Robust Remasking (CORE), a training-free framework for inference-time revision. Rather than trusting static token probabilities, CORE identifies context-brittle tokens by probing their sensitivity to targeted masked-context perturbations. On LLaDA-8B-Base, CORE delivers consistent improvements across reasoning and code benchmarks, outperforming compute-matched baselines and improving MBPP by up to 9.2 percentage points.
- Score: 51.59514489363897
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Standard decoding in Masked Diffusion Models (MDMs) is hindered by context rigidity: tokens are retained based on transient high confidence, often ignoring that early predictions lack full context. This creates cascade effects where initial inconsistencies misguide the remaining generation. Existing revision strategies attempt to mitigate this by relying on static confidence scores, but these signals are inherently myopic; inconsistent tokens can appear confident to the model itself. We propose Context-Robust Remasking (CORE), a training-free framework for inference-time revision. Rather than trusting static token probabilities, CORE identifies context-brittle tokens by probing their sensitivity to targeted masked-context perturbations. We formalize revision as a robust optimization objective over context shifts and efficiently approximate this objective to prioritize unstable tokens for revision. On LLaDA-8B-Base, CORE delivers consistent improvements across reasoning and code benchmarks, outperforming compute-matched baselines and improving MBPP by up to 9.2 percentage points.
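The abstract describes scoring decoded tokens by how much their predictions shift under masked-context perturbations. A minimal sketch of that idea, assuming a hypothetical `logits_fn` interface standing in for the MDM (probe count, mask fraction, and the KL-based instability measure are illustrative, not the paper's exact objective):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    x = x - x.max(-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(-1, keepdims=True)

def brittleness_scores(logits_fn, tokens, mask_id, n_probes=4, frac=0.15):
    """Score each position by how much its predicted distribution shifts
    when a random subset of the context is remasked. High scores mark
    context-brittle tokens that are candidates for revision."""
    base = softmax(logits_fn(tokens))            # (T, V) per-position dists
    T = len(tokens)
    scores = np.zeros(T)
    for _ in range(n_probes):
        perturbed = tokens.copy()
        hit = rng.random(T) < frac               # remask a random context subset
        perturbed[hit] = mask_id
        probs = softmax(logits_fn(perturbed))
        # KL(base || perturbed) per position: a large shift => brittle token
        scores += (base * (np.log(base + 1e-9) - np.log(probs + 1e-9))).sum(-1)
    return scores / n_probes
```

Tokens with the highest scores would then be remasked and re-decoded; a token whose distribution is invariant to context perturbations scores near zero and is left alone.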
Related papers
- Thinking by Subtraction: Confidence-Driven Contrastive Decoding for LLM Reasoning [58.331709210563616]
Thinking by Subtraction is a confidence-driven contrastive decoding approach. A small subset of low-confidence tokens disproportionately contributes to reasoning errors and unnecessary output expansion. Our method, Confidence-Driven Contrastive Decoding, detects low-confidence tokens during decoding and intervenes at these positions.
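The summary above describes intervening only where the model is unsure. A generic sketch of that pattern, assuming precomputed logits from an "expert" and a weaker "amateur" distribution (the threshold `tau`, scale `alpha`, and the subtraction rule are the standard contrastive-decoding form, not necessarily this paper's exact method):

```python
import numpy as np

def softmax(x):
    x = x - x.max(-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(-1, keepdims=True)

def contrastive_step(expert_logits, amateur_logits, tau=0.5, alpha=1.0):
    """At low-confidence positions (max prob < tau), subtract a scaled
    amateur distribution from the expert logits; high-confidence
    positions are decoded greedily as usual."""
    probs = softmax(expert_logits)
    low_conf = probs.max(-1) < tau               # positions needing intervention
    out = expert_logits.copy()
    out[low_conf] = (1 + alpha) * expert_logits[low_conf] \
                    - alpha * amateur_logits[low_conf]
    return out.argmax(-1)
```

The design point is that subtraction is applied selectively: confident positions keep the expert's choice, so the intervention cannot degrade tokens the model already gets right.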
arXiv Detail & Related papers (2026-02-20T14:13:22Z) - RoParQ: Paraphrase-Aware Alignment of Large Language Models Towards Robustness to Paraphrased Questions [0.0]
Large Language Models (LLMs) often exhibit inconsistent behavior when answering paraphrased questions. We introduce RoParQ, a benchmark to evaluate cross-paraphrase consistency in closed-book multiple-choice QA. We also propose XParaCon, a novel evaluation metric that quantifies a model's robustness.
arXiv Detail & Related papers (2025-11-26T16:40:53Z) - CodeFuse-CommitEval: Towards Benchmarking LLM's Power on Commit Message and Code Change Inconsistency Detection [8.631593963090985]
Version control relies on commit messages to convey the rationale for code changes, but these messages are often low quality and inconsistent with their diffs, a problem known as message-code inconsistency (MCI). We introduce CODEFUSE-COMMITEVAL, the first benchmark designed for MCI detection using large language models (LLMs). We generate seven types of inconsistent messages through rule-guided mutations of originally consistent commits and apply two-fold validation to verify both positive and negative samples.
arXiv Detail & Related papers (2025-11-25T03:33:57Z) - Latent Refinement Decoding: Enhancing Diffusion-Based Language Models by Refining Belief States [28.663951525871756]
We introduce Latent Refinement Decoding (LRD), a two-stage framework with Latent Refinement and a Predictive Feedback Loop. LRD improves accuracy while delivering speedups of up to 10.6x, making it a strong and versatile alternative for parallel sequence generation.
arXiv Detail & Related papers (2025-10-13T06:38:13Z) - MARS-Sep: Multimodal-Aligned Reinforced Sound Separation [72.85468563236005]
MARS-Sep is a reinforcement learning framework for sound separation. It learns a factorized Beta mask policy that is optimized by a clipped trust-region surrogate. Experiments on multiple benchmarks demonstrate consistent gains in Text-, Audio-, and Image-Queried separation.
arXiv Detail & Related papers (2025-10-12T09:05:28Z) - Do LLMs Know They Are Being Tested? Evaluation Awareness and Incentive-Sensitive Failures in GPT-OSS-20B [1.948261185683419]
We investigate whether "evaluation scent" inflates measured performance without commensurate capability gains. We run six paired A/B scenarios that hold task content and decoding fixed while varying framing. We provide a reproducible A/B framework (prompt banks, validators, per-run scores, scripts) and practical guidance.
arXiv Detail & Related papers (2025-10-08T09:49:05Z) - Improving Discrete Diffusion Unmasking Policies Beyond Explicit Reference Policies [47.6755955972232]
We cast denoising as a KL-regularized Markov decision process (MDP) with an explicit reference policy and optimize a regularized objective. We prove that the optimized policy under this framework generates samples that more closely match the data distribution than fixed unmasking schedules do.
arXiv Detail & Related papers (2025-10-07T09:44:24Z) - Inducing Faithfulness in Structured Reasoning via Counterfactual Sensitivity [6.908972852063454]
Large language models often generate a correct answer while relying on a flawed or irrelevant reasoning trace. This paper introduces Counterfactual Sensitivity Regularization (CSR), a novel training objective. CSR improves faithfulness over standard fine-tuning and process supervision by up to 70 percentage points.
arXiv Detail & Related papers (2025-09-01T15:18:46Z) - Broken Tokens? Your Language Model can Secretly Handle Non-Canonical Tokenizations [83.93566096400723]
We find that instruction-tuned models retain up to 93.4% of their original performance when given a randomly sampled tokenization. Character-level segmentation improves string manipulation and code understanding tasks by up to +14%. Right-aligned digit grouping enhances large-number arithmetic by +33%.
arXiv Detail & Related papers (2025-06-23T18:02:26Z) - Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious to collect in practice.
We introduce a novel noisy correspondence learning framework, Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z) - Improving Adversarial Robustness of Masked Autoencoders via Test-time Frequency-domain Prompting [133.55037976429088]
We investigate the adversarial robustness of vision transformers equipped with BERT pretraining (e.g., BEiT, MAE).
A surprising observation is that MAE has significantly worse adversarial robustness than other BERT pretraining methods.
We propose a simple yet effective way to boost the adversarial robustness of MAE.
arXiv Detail & Related papers (2023-08-20T16:27:17Z) - Latent Feature Relation Consistency for Adversarial Robustness [80.24334635105829]
Misclassification occurs when deep neural networks are given adversarial examples that add human-imperceptible noise to natural examples.
We propose Latent Feature Relation Consistency (LFRC).
LFRC constrains the relation of adversarial examples in latent space to be consistent with the natural examples.
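A toy version of such a relation-consistency term, assuming batches of natural and adversarial feature vectors (cosine similarity as the relation measure and mean squared difference as the penalty are illustrative choices, not necessarily the paper's):

```python
import numpy as np

def relation_consistency_loss(nat_feats, adv_feats):
    """Penalize the gap between pairwise cosine-similarity matrices of
    natural and adversarial features, so adversarial examples preserve
    the same latent relations as their natural counterparts."""
    def sim(f):
        f = f / np.linalg.norm(f, axis=1, keepdims=True)  # unit-normalize rows
        return f @ f.T                                    # (B, B) cosine matrix
    d = sim(nat_feats) - sim(adv_feats)
    return float((d ** 2).mean())
```

The loss is zero only when the two feature sets induce identical pairwise relations, which is the consistency the summary describes.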
arXiv Detail & Related papers (2023-03-29T13:50:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.