Label-Context-Dependent Internal Language Model Estimation for CTC
- URL: http://arxiv.org/abs/2506.06096v1
- Date: Fri, 06 Jun 2025 13:54:43 GMT
- Title: Label-Context-Dependent Internal Language Model Estimation for CTC
- Authors: Zijian Yang, Minh-Nghia Phan, Ralf Schlüter, Hermann Ney
- Abstract summary: We propose novel context-dependent ILM estimation methods for connectionist temporal classification (CTC). Experimental results show that context-dependent ILMs outperform context-independent priors in cross-domain evaluation. The proposed label-level KD with smoothing method surpasses other ILM estimation approaches.
- Score: 50.25063912757367
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although connectionist temporal classification (CTC) makes a label context independence assumption, it can still implicitly learn a context-dependent internal language model (ILM) due to modern powerful encoders. In this work, we investigate the implicit context dependency modeled in the ILM of CTC. To this end, we propose novel context-dependent ILM estimation methods for CTC based on knowledge distillation (KD), with theoretical justifications. Furthermore, we introduce two regularization methods for KD. We conduct experiments on the Librispeech and TED-LIUM Release 2 datasets for in-domain and cross-domain evaluation, respectively. Experimental results show that context-dependent ILMs outperform context-independent priors in cross-domain evaluation, indicating that CTC learns a context-dependent ILM. The proposed label-level KD with smoothing method surpasses other ILM estimation approaches, with more than 13% relative improvement in word error rate compared to shallow fusion.
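For context on how such an estimate is used at decoding time, here is a minimal sketch of shallow fusion with ILM correction: the external LM score is added and the estimated ILM score subtracted from the CTC score for each partial hypothesis. The weights and scorer interface below are illustrative assumptions, not the paper's exact recipe.

```python
def fused_score(ctc_log_prob: float,
                ext_lm_log_prob: float,
                ilm_log_prob: float,
                lm_weight: float = 0.6,     # assumed external LM weight
                ilm_weight: float = 0.4):   # assumed ILM correction weight
    """Combine scores for one partial hypothesis during beam search.

    Plain shallow fusion adds only the weighted external LM term; ILM
    correction additionally subtracts the internal LM score so that the
    prior implicitly learned by the CTC model is not counted twice.
    """
    return ctc_log_prob + lm_weight * ext_lm_log_prob - ilm_weight * ilm_log_prob
```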
Related papers
- A theoretical framework for self-supervised contrastive learning for continuous dependent data [86.50780641055258]
Self-supervised learning (SSL) has emerged as a powerful approach to learning representations, particularly in the field of computer vision. We propose a novel theoretical framework for contrastive SSL tailored to data that violates the usual assumption of semantic independence between samples. Specifically, we outperform TS2Vec on the standard UEA and UCR benchmarks, with accuracy improvements of 4.17% and 2.08%, respectively.
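For orientation, a minimal InfoNCE-style objective of the kind such frameworks analyze; the temperature and the positives-by-index batch convention are generic assumptions, not this paper's construction.

```python
import torch
import torch.nn.functional as F

def info_nce(anchors: torch.Tensor, positives: torch.Tensor, tau: float = 0.1):
    """InfoNCE loss: row i of `positives` is the positive for row i of
    `anchors`; every other row in the batch serves as a negative."""
    a = F.normalize(anchors, dim=1)    # (B, D)
    p = F.normalize(positives, dim=1)  # (B, D)
    logits = a @ p.t() / tau           # (B, B) scaled cosine similarities
    labels = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, labels)
```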
arXiv Detail & Related papers (2025-06-11T14:23:47Z)
- LLMs Are Not Scorers: Rethinking MT Evaluation with Generation-Based Methods [0.0]
We propose a generation-based evaluation paradigm that leverages decoder-only language models to produce high-quality references. Empirical results show that our method outperforms both intra-LLM direct scoring baselines and external non-LLM reference-free metrics from MTME.
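A rough sketch of the paradigm as summarized: let a decoder-only LLM produce a reference, then score the system output against it with any surface metric. `generate_reference` is a hypothetical stand-in for the model call, and sacrebleu is just one possible choice of metric.

```python
import sacrebleu

def generate_reference(source: str) -> str:
    """Hypothetical wrapper around a decoder-only LLM that translates
    `source`; replace with a real model call."""
    raise NotImplementedError

def generation_based_score(source: str, hypothesis: str) -> float:
    """Score a system hypothesis against an LLM-generated reference."""
    reference = generate_reference(source)
    return sacrebleu.sentence_bleu(hypothesis, [reference]).score
```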
arXiv Detail & Related papers (2025-05-22T02:14:38Z)
- Model-free Methods for Event History Analysis and Efficient Adjustment (PhD Thesis) [55.2480439325792]
This thesis is a series of independent contributions to statistics unified by a model-free perspective. The first chapter elaborates on how a model-free perspective can be used to formulate flexible methods that leverage prediction techniques from machine learning. The second chapter studies the concept of local independence, which describes whether the evolution of one process is directly influenced by another.
arXiv Detail & Related papers (2025-02-11T19:24:09Z)
- Evaluating Consistencies in LLM responses through a Semantic Clustering of Question Answering [1.9214041945441436]
We present a new approach for evaluating the semantic consistency of Large Language Model (LLM) responses.
Our approach evaluates whether LLM responses are semantically congruent for a given question, recognizing that syntactically different sentences may convey the same meaning.
Using the TruthfulQA dataset to assess LLM responses, the study induces N responses per question and clusters semantically equivalent sentences to measure semantic consistency across 37 categories.
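A toy version of the measurement, assuming cosine similarity of sentence embeddings as a proxy for semantic equivalence; the greedy clustering, the 0.85 threshold, and the consistency statistic are all illustrative choices, not the paper's method.

```python
import numpy as np

def cluster_responses(embeddings: np.ndarray, threshold: float = 0.85) -> list[int]:
    """Greedy clustering: a response joins the first cluster whose
    representative it matches above `threshold`, else starts a new one."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    reps: list[np.ndarray] = []
    labels: list[int] = []
    for vec in unit:
        sims = [float(vec @ r) for r in reps]
        if sims and max(sims) >= threshold:
            labels.append(int(np.argmax(sims)))
        else:
            reps.append(vec)
            labels.append(len(reps) - 1)
    return labels

def consistency(labels: list[int]) -> float:
    """Fraction of the N responses falling into the largest cluster."""
    counts = np.bincount(np.asarray(labels))
    return float(counts.max() / counts.sum())
```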
arXiv Detail & Related papers (2024-10-20T16:21:25Z)
- Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter [57.64003871384959]
This work presents a new approach to fast context-biasing with a CTC-based Word Spotter.
The proposed method matches CTC log-probabilities against a compact context graph to detect potential context-biasing candidates.
The results demonstrate a significant acceleration of the context-biasing recognition with a simultaneous improvement in F-score and WER.
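To make the matching idea concrete, a simplified spotter that aligns one biasing word's label sequence against CTC log-posteriors by dynamic programming; the blank handling, compact context graph, and pruning of the real method are omitted, so treat this as a sketch under those assumptions.

```python
import numpy as np

def spot_word(log_probs: np.ndarray, token_ids: list[int]) -> float:
    """Best monotonic-alignment score of one biasing word in an utterance.

    log_probs: (T, V) CTC log-posteriors. Each token is emitted on one
    frame and frames strictly increase; blanks and repeats are ignored
    for brevity.
    """
    T, U = log_probs.shape[0], len(token_ids)
    dp = np.full(U, -np.inf)  # dp[u]: best score matching tokens[:u+1] so far
    for t in range(T):
        frame = log_probs[t]
        # update right-to-left so each frame emits at most one token
        for u in range(U - 1, -1, -1):
            prev = dp[u - 1] if u > 0 else 0.0
            dp[u] = max(dp[u], prev + frame[token_ids[u]])
    return float(dp[-1])
```

A practical detector would length-normalize this score and compare it against a per-word threshold before proposing the word as a context-biasing candidate.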
arXiv Detail & Related papers (2024-06-11T09:37:52Z)
- KIEval: A Knowledge-grounded Interactive Evaluation Framework for Large Language Models [53.84677081899392]
KIEval is a Knowledge-grounded Interactive Evaluation framework for large language models.
It incorporates an LLM-powered "interactor" role for the first time to achieve dynamic, contamination-resilient evaluation.
Extensive experiments on seven leading LLMs across five datasets validate KIEval's effectiveness and generalization.
arXiv Detail & Related papers (2024-02-23T01:30:39Z)
- Knowledge-Prompted Estimator: A Novel Approach to Explainable Machine Translation Assessment [20.63045120292095]
Cross-lingual Machine Translation (MT) quality estimation plays a crucial role in evaluating translation performance.
GEMBA, the first MT quality assessment metric based on Large Language Models (LLMs), employs one-step prompting to achieve state-of-the-art (SOTA) in system-level MT quality estimation.
In this paper, we introduce the Knowledge-Prompted Estimator (KPE), a chain-of-thought (CoT) prompting method that combines three one-step prompting techniques: perplexity, token-level similarity, and sentence-level similarity.
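A toy rendering of how three such signals might be fused into a single quality score; the weights and squashing are pure assumptions, and the real KPE feeds these signals to an LLM through a CoT prompt rather than combining them numerically.

```python
def kpe_style_score(perplexity: float,
                    token_similarity: float,
                    sentence_similarity: float,
                    weights=(0.2, 0.4, 0.4)):  # assumed fusion weights
    """Toy fusion of the three signals named above; similarities are
    assumed to lie in [0, 1], and lower perplexity is better."""
    w_ppl, w_tok, w_sent = weights
    ppl_term = 1.0 / (1.0 + perplexity)  # squash so higher = better
    return w_ppl * ppl_term + w_tok * token_similarity + w_sent * sentence_similarity
```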
arXiv Detail & Related papers (2023-06-13T01:18:32Z)
- CTC-based Non-autoregressive Speech Translation [51.37920141751813]
We investigate the potential of connectionist temporal classification for non-autoregressive speech translation.
We develop a model consisting of two encoders that are guided by CTC to predict the source and target texts.
Experiments on the MuST-C benchmarks show that our NAST model achieves an average BLEU score of 29.5 with a speed-up of 5.67×.
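Read literally, the layout might look like the following torch sketch: a speech encoder with a source-text CTC head, and a second encoder stacked on its output with a target-text CTC head, so both texts are predicted non-autoregressively. All dimensions, layer counts, and the exact wiring are assumptions.

```python
import torch.nn as nn

class TwoEncoderNAST(nn.Module):
    """Sketch of a CTC-guided two-encoder model (sizes are assumptions)."""
    def __init__(self, d_model=256, src_vocab=1000, tgt_vocab=1000):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.speech_encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.text_encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.src_head = nn.Linear(d_model, src_vocab)  # CTC head: source transcript
        self.tgt_head = nn.Linear(d_model, tgt_vocab)  # CTC head: target translation

    def forward(self, speech_feats):                   # (B, T, d_model)
        h_src = self.speech_encoder(speech_feats)      # guided by source CTC loss
        h_tgt = self.text_encoder(h_src)               # second encoder refines for target
        return (self.src_head(h_src).log_softmax(-1),
                self.tgt_head(h_tgt).log_softmax(-1))
```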
arXiv Detail & Related papers (2023-05-27T03:54:09Z)
- Estimating class separability of text embeddings with persistent homology [1.9956517534421363]
This paper introduces an unsupervised method to estimate the class separability of text datasets from a topological point of view.
We show how this technique can be applied to detect when the training process stops improving the separability of the embeddings.
Our results, validated across binary and multi-class text classification tasks, show that the proposed method's estimates of class separability align with those obtained from supervised methods.
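As a flavor of the topological angle, a crude proxy under stated assumptions: using 0-dimensional persistence (via the ripser package), compare how long components survive when the classes are pooled against the merge scale within each class. This illustrates the idea only and is not the paper's estimator.

```python
import numpy as np
from ripser import ripser  # pip install ripser

def h0_lifetimes(points: np.ndarray) -> np.ndarray:
    """Finite death times of 0-dim features: scales at which components merge."""
    dgm0 = ripser(points, maxdim=0)['dgms'][0]
    return dgm0[np.isfinite(dgm0[:, 1]), 1]  # drop the single infinite bar

def separability_proxy(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """>> 1 suggests the two classes merge only at a much larger scale
    than points merge within each class, i.e. they are well separated."""
    pooled = h0_lifetimes(np.vstack([emb_a, emb_b]))
    within = np.concatenate([h0_lifetimes(emb_a), h0_lifetimes(emb_b)])
    return float(pooled.max() / (within.mean() + 1e-8))
```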
arXiv Detail & Related papers (2023-05-24T10:58:09Z)
- Improving CTC-based ASR Models with Gated Interlayer Collaboration [9.930655347717932]
We present a Gated Interlayer Collaboration mechanism which introduces contextual information into the models.
We train the model with intermediate CTC losses calculated by the interlayer outputs of the model, in which the probability distributions of the intermediate layers naturally serve as soft label sequences.
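A minimal sketch of the intermediate-CTC part of the training objective, assuming a shared output projection over selected layer outputs; the gating mechanism itself is omitted and the 0.3 weight is an assumption.

```python
import torch
import torch.nn.functional as F

def intermediate_ctc_loss(layer_outputs, proj, targets, in_lens, tgt_lens,
                          weight: float = 0.3):
    """CTC loss on the final layer plus averaged CTC losses on intermediate
    layers, whose distributions double as soft label sequences.

    layer_outputs: list of (T, N, D) tensors, final layer last.
    proj: shared nn.Linear(D, V) producing vocabulary logits.
    """
    losses = []
    for h in layer_outputs:
        log_probs = F.log_softmax(proj(h), dim=-1)  # (T, N, V)
        losses.append(F.ctc_loss(log_probs, targets, in_lens, tgt_lens,
                                 blank=0, zero_infinity=True))
    final, inters = losses[-1], losses[:-1]
    if not inters:
        return final
    return (1 - weight) * final + weight * torch.stack(inters).mean()
```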
arXiv Detail & Related papers (2022-05-25T03:21:27Z)