MGSC: A Multi-granularity Consistency Framework for Robust End-to-end ASR
- URL: http://arxiv.org/abs/2508.15853v1
- Date: Wed, 20 Aug 2025 09:51:49 GMT
- Title: MGSC: A Multi-granularity Consistency Framework for Robust End-to-end ASR
- Authors: Xuwen Yang
- Abstract summary: We introduce the Multi-Granularity Soft Consistency framework, a model-agnostic, plug-and-play module that enforces internal self-consistency. Crucially, our work is the first to uncover a powerful synergy between the two consistency granularities, sentence-level semantics and token-level alignment. Our work demonstrates that enforcing internal consistency is a crucial step towards building more robust and trustworthy AI.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: End-to-end ASR models, despite their success on benchmarks, often produce catastrophic semantic errors in noisy environments. We attribute this fragility to the prevailing 'direct mapping' objective, which solely penalizes final output errors while leaving the model's internal computational process unconstrained. To address this, we introduce the Multi-Granularity Soft Consistency (MGSC) framework, a model-agnostic, plug-and-play module that enforces internal self-consistency by simultaneously regularizing macro-level sentence semantics and micro-level token alignment. Crucially, our work is the first to uncover a powerful synergy between these two consistency granularities: their joint optimization yields robustness gains that significantly surpass the sum of their individual contributions. On a public dataset, MGSC reduces the average Character Error Rate by a relative 8.7% across diverse noise conditions, primarily by preventing severe meaning-altering mistakes. Our work demonstrates that enforcing internal consistency is a crucial step towards building more robust and trustworthy AI.
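The abstract's two-level regularizer can be illustrated with a small numerical sketch. The code below is a hypothetical reading of the idea, not the paper's implementation: the function name `mgsc_loss`, the cosine-based macro term, the symmetric-KL micro term, and the `lambda_*` weights are all assumptions made for illustration.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) summed over the vocabulary axis."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q), axis=-1)

def mgsc_loss(clean_emb, noisy_emb, clean_tok, noisy_tok,
              lambda_macro=1.0, lambda_micro=1.0):
    """Hypothetical multi-granularity consistency penalty.

    clean_emb, noisy_emb: (d,) sentence embeddings of the clean/noisy views.
    clean_tok, noisy_tok: (T, V) per-token posterior distributions.
    """
    # Macro level: 1 - cosine similarity between sentence-level embeddings.
    cos = np.dot(clean_emb, noisy_emb) / (
        np.linalg.norm(clean_emb) * np.linalg.norm(noisy_emb))
    macro = 1.0 - cos
    # Micro level: symmetric KL between aligned token posteriors, mean over T.
    micro = 0.5 * np.mean(kl_divergence(clean_tok, noisy_tok)
                          + kl_divergence(noisy_tok, clean_tok))
    return lambda_macro * macro + lambda_micro * micro
```

Under this reading, the penalty vanishes when the clean and noisy views agree at both granularities, and the paper's reported synergy would come from optimizing both terms jointly rather than either one alone.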
Related papers
- Same Answer, Different Representations: Hidden instability in VLMs [65.36933543377346]
We introduce a representation-aware and frequency-aware evaluation framework that measures internal embedding drift, spectral sensitivity, and structural smoothness. We apply this framework to modern Vision Language Models (VLMs) across the SEEDBench, MMMU, and POPE datasets.
arXiv Detail & Related papers (2026-02-06T12:24:26Z) - CORE: Context-Robust Remasking for Diffusion Language Models [51.59514489363897]
We propose Context-Robust Remasking (CORE), a training-free framework for inference-time revision. Rather than trusting static token probabilities, CORE identifies context-brittle tokens by probing their sensitivity to targeted masked-context perturbations. On LLaDA-8B-Base, CORE delivers consistent improvements across reasoning and code benchmarks, outperforming compute-matched baselines and improving MBPP by up to 9.2 percentage points.
arXiv Detail & Related papers (2026-02-04T00:12:30Z) - Enhancing Language Models for Robust Greenwashing Detection [36.1214446480006]
Greenwashing and vague claims undermine sustainability reports. We propose a parameter-efficient framework that structures latent spaces by combining contrastive learning with an ordinal ranking objective.
arXiv Detail & Related papers (2026-01-29T13:46:15Z) - RMBRec: Robust Multi-Behavior Recommendation towards Target Behaviors [26.88506691092044]
We propose Robust Multi-Behavior Recommendation towards Target Behaviors (RMBRec), a robust multi-behavior recommendation framework grounded in an information-theoretic robustness principle. We show that RMBRec outperforms state-of-the-art methods in accuracy and maintains remarkable stability under various noise perturbations.
arXiv Detail & Related papers (2026-01-13T16:34:17Z) - Self-Rewarded Multimodal Coherent Reasoning Across Diverse Visual Domains [16.357026482329232]
Multimodal LLMs produce fluent yet unreliable reasoning. We introduce SR-MCR, a lightweight and label-free framework that aligns reasoning. SR-MCR improves both answer accuracy and reasoning coherence across a broad set of visual benchmarks.
arXiv Detail & Related papers (2025-12-27T10:14:14Z) - Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding [54.05243949024302]
Existing robust MLLMs rely on implicit training/adaptation that focuses solely on visual encoder generalization. We propose Robust-R1, a novel framework that explicitly models visual degradations through structured reasoning chains. Our approach integrates: (i) supervised fine-tuning for degradation-aware reasoning foundations, (ii) reward-driven alignment for accurately perceiving degradation parameters, and (iii) dynamic reasoning depth scaling adapted to degradation intensity.
arXiv Detail & Related papers (2025-12-19T12:56:17Z) - Harnessing Consistency for Robust Test-Time LLM Ensemble [88.55393815158608]
CoRE is a plug-and-play technique that harnesses model consistency for robust LLM ensemble. Token-level consistency captures fine-grained disagreements by applying a low-pass filter to downweight uncertain tokens. Model-level consistency models global agreement by promoting model outputs with high self-confidence.
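The "downweight uncertain tokens" idea in the blurb above can be sketched generically with an entropy-based weighting. This is a minimal illustration under assumed conventions, not CoRE's actual filter or API: the function names and the `exp(-entropy)` weighting are inventions for the sketch.

```python
import numpy as np

def entropy(p, eps=1e-9):
    """Shannon entropy of each distribution along the last axis."""
    p = np.clip(p, eps, 1.0)
    return -np.sum(p * np.log(p), axis=-1)

def consistency_weighted_ensemble(token_dists):
    """Combine per-model token distributions, downweighting uncertain ones.

    token_dists: (M, V) array, one next-token distribution per model.
    Returns a (V,) combined distribution in which low-entropy (confident)
    models contribute more.
    """
    h = entropy(token_dists)        # (M,) per-model uncertainty
    w = np.exp(-h)                  # lower entropy -> larger weight
    w = w / w.sum()                 # normalize weights
    return w @ token_dists          # weighted average over models
```

For example, averaging a confident model with a near-uniform one this way pulls the ensemble toward the confident prediction instead of washing it out.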
arXiv Detail & Related papers (2025-10-12T04:18:45Z) - Every Subtlety Counts: Fine-grained Person Independence Micro-Action Recognition via Distributionally Robust Optimization [36.230001277076376]
Micro-action Recognition is vital for psychological assessment and human-computer interaction. Existing methods often fail in real-world scenarios because inter-person variability causes the same action to manifest differently. We propose the Person Independence Universal Micro-action Recognition Framework, which integrates Distributionally Robust Optimization principles to learn person-agnostic representations.
arXiv Detail & Related papers (2025-09-25T14:54:24Z) - Test-Time Consistency in Vision Language Models [26.475993408532304]
Vision-Language Models (VLMs) have achieved impressive performance across a wide range of multimodal tasks. Recent benchmarks, such as MM-R3, highlight that even state-of-the-art VLMs can produce divergent predictions across semantically equivalent inputs. We propose a simple and effective test-time consistency framework that enhances semantic consistency without supervised re-training.
arXiv Detail & Related papers (2025-06-27T17:09:44Z) - NDCG-Consistent Softmax Approximation with Accelerated Convergence [67.10365329542365]
We propose novel loss formulations that align directly with ranking metrics. We integrate the proposed RG losses with the highly efficient Alternating Least Squares (ALS) optimization method. Empirical evaluations on real-world datasets demonstrate that our approach achieves comparable or superior ranking performance.
arXiv Detail & Related papers (2025-06-11T06:59:17Z) - Token Constraint Decoding Improves Robustness on Question Answering for Large Language Models [4.078176555898098]
We introduce and evaluate Token Constraint Decoding (TCD), a simple yet effective inference-time algorithm that enforces alignment between token-level predictions to enhance robustness in noisy settings. Our findings establish TCD as a practical, model-agnostic approach for improving reasoning stability under real-world imperfections.
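Constraint decoding of this general kind is easy to picture: restrict the model's choice at each step to a valid token set before taking the argmax. The sketch below illustrates the generic mechanism under assumed conventions; the function name and signature are hypothetical and not TCD's published interface.

```python
import numpy as np

def token_constrained_decode(logits, allowed_ids):
    """Mask logits outside an allowed token set, then pick the best token.

    logits: (V,) raw scores over the vocabulary.
    allowed_ids: iterable of permitted token ids, e.g. the ids of the
    option letters 'A'..'D' in a multiple-choice QA setting.
    """
    mask = np.full_like(logits, -np.inf)   # forbid everything by default
    allowed = np.fromiter(allowed_ids, dtype=int)
    mask[allowed] = 0.0                    # re-enable the valid tokens
    return int(np.argmax(logits + mask))   # argmax over allowed ids only
```

Even when noise makes an invalid token the global argmax, the mask guarantees the decoded answer stays within the valid option set.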
arXiv Detail & Related papers (2025-06-11T05:33:56Z) - Robust and Computation-Aware Gaussian Processes [18.264598332579748]
We introduce Robust Computation-aware Gaussian Process (RCaGP), a novel GP model that combines a principled treatment of approximation-induced uncertainty with robust generalized Bayesian updating. Our model ensures more conservative and reliable uncertainty estimates, a property we rigorously demonstrate. Empirical results confirm that solving these challenges jointly leads to superior performance across both clean and outlier-contaminated settings.
arXiv Detail & Related papers (2025-05-27T12:49:14Z) - EVALOOOP: A Self-Consistency-Centered Framework for Assessing Large Language Model Robustness in Programming [8.52533297070733]
EVALOOOP is an assessment framework that evaluates robustness from a self-consistency perspective. We evaluate 96 popular large language models (LLMs) on the MBPP Plus benchmark. EVALOOOP induces a 2.65%-47.62% absolute drop in pass@1 accuracy within ten loops.
arXiv Detail & Related papers (2025-05-18T01:02:33Z) - Model Inversion Attacks Through Target-Specific Conditional Diffusion Models [54.69008212790426]
Model inversion attacks (MIAs) aim to reconstruct private images from a target classifier's training set, thereby raising privacy concerns in AI applications.
Previous GAN-based MIAs tend to suffer from inferior generative fidelity due to GAN's inherent flaws and biased optimization within latent space.
We propose Diffusion-based Model Inversion (Diff-MI) attacks to alleviate these issues.
arXiv Detail & Related papers (2024-07-16T06:38:49Z) - Towards Calibrated Robust Fine-Tuning of Vision-Language Models [97.19901765814431]
This work proposes a robust fine-tuning method that improves both OOD accuracy and confidence calibration simultaneously in vision-language models.
We show that both OOD classification and OOD calibration errors have a shared upper bound consisting of two terms of ID data.
Based on this insight, we design a novel framework that conducts fine-tuning with a constrained multimodal contrastive loss enforcing a larger smallest singular value.
arXiv Detail & Related papers (2023-11-03T05:41:25Z) - Exposing and Addressing Cross-Task Inconsistency in Unified Vision-Language Models [80.23791222509644]
Inconsistent AI models are considered brittle and untrustworthy by human users.
We find that state-of-the-art vision-language models suffer from a surprisingly high degree of inconsistent behavior across tasks.
We propose a rank correlation-based auxiliary training objective, computed over large automatically created cross-task contrast sets.
arXiv Detail & Related papers (2023-03-28T16:57:12Z) - Robustness and Accuracy Could Be Reconcilable by (Proper) Definition [109.62614226793833]
The trade-off between robustness and accuracy has been widely studied in the adversarial literature.
We find that it may stem from the improperly defined robust error, which imposes an inductive bias of local invariance.
By definition, the proposed SCORE (self-consistent robust error) facilitates the reconciliation between robustness and accuracy, while still handling the worst-case uncertainty.
arXiv Detail & Related papers (2022-02-21T10:36:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.