SoftCTC -- Semi-Supervised Learning for Text Recognition using Soft
Pseudo-Labels
- URL: http://arxiv.org/abs/2212.02135v3
- Date: Tue, 19 Sep 2023 13:21:49 GMT
- Title: SoftCTC -- Semi-Supervised Learning for Text Recognition using Soft
Pseudo-Labels
- Authors: Martin Kišš, Michal Hradiš, Karel Beneš, Petr Buchal,
Michal Kula
- Abstract summary: This paper explores semi-supervised training for sequence tasks, such as Optical Character Recognition or Automatic Speech Recognition.
We propose a novel loss function – SoftCTC.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper explores semi-supervised training for sequence tasks, such as
Optical Character Recognition or Automatic Speech Recognition. We propose a
novel loss function – SoftCTC – which is an extension of CTC that allows
multiple transcription variants to be considered at the same time. This makes
it possible to omit the confidence-based filtering step which is otherwise a
crucial component of pseudo-labeling approaches to semi-supervised learning.
We demonstrate the effectiveness of our method on a challenging handwriting
recognition task and conclude that SoftCTC matches the performance of a
finely tuned filtering-based pipeline. We also evaluated SoftCTC in terms of
computational efficiency, concluding that it is significantly more efficient
than a naïve CTC-based approach for training on multiple transcription
variants, and we make our GPU implementation public.
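For reference, the naïve CTC-based baseline mentioned above amounts to running standard CTC once per transcription variant and combining the weighted per-variant likelihoods, so its cost grows linearly with the number of variants. A minimal PyTorch sketch of that baseline follows; the function name, tensor shapes, and weighting scheme are our assumptions, and this is explicitly not the authors' SoftCTC, which handles all variants jointly:

```python
import torch
import torch.nn.functional as F

def naive_multi_variant_ctc(log_probs, input_lengths, variants, variant_weights):
    """Naive baseline: one standard CTC pass per transcription variant.

    log_probs:       (T, N, C) log-softmax outputs of the recognizer
    input_lengths:   (N,) valid frame counts per batch element
    variants:        list of (targets, target_lengths) pairs, one per variant,
                     where targets is (N, S) and target_lengths is (N,)
    variant_weights: (V,) tensor of per-variant confidences summing to 1
    """
    scores = []
    for (targets, target_lengths), w in zip(variants, variant_weights):
        # F.ctc_loss returns the negative log-likelihood of this variant
        nll = F.ctc_loss(log_probs, targets, input_lengths, target_lengths,
                         blank=0, reduction="none")        # (N,)
        scores.append(torch.log(w) - nll)                  # log w_v + log P(y_v|x)
    # -log sum_v w_v * P(y_v | x), averaged over the batch
    return -torch.logsumexp(torch.stack(scores, dim=0), dim=0).mean()
```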
Related papers
- Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter [57.64003871384959]
This work presents a new approach to fast context-biasing with a CTC-based Word Spotter.
The proposed method matches CTC log-probabilities against a compact context graph to detect potential context-biasing candidates.
The results demonstrate a significant acceleration of context-biasing recognition with simultaneous improvements in F-score and WER.
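The summary above only names the mechanism, so the following is a loose, hypothetical illustration rather than the paper's actual word spotter (which scores CTC log-probabilities against a compact context graph directly): greedy-decode the CTC output, then scan the decoded token sequence against a trie of biasing phrases. All function names are ours:

```python
import numpy as np

def greedy_ctc_decode(log_probs, blank=0):
    """Collapse repeats and drop blanks from per-frame argmax tokens."""
    best = log_probs.argmax(axis=-1)            # (T,)
    out, prev = [], blank
    for t in best:
        if t != blank and t != prev:
            out.append(int(t))
        prev = t
    return out

def build_trie(phrases):
    """Store each biasing phrase (a list of token ids) in a nested-dict trie."""
    trie = {}
    for phrase in phrases:
        node = trie
        for tok in phrase:
            node = node.setdefault(tok, {})
        node[None] = phrase                     # terminal marker
    return trie

def spot(decoded, trie):
    """Return (start_index, phrase) for every biasing phrase in the decode."""
    hits = []
    for start in range(len(decoded)):
        node = trie
        for tok in decoded[start:]:
            if tok not in node:
                break
            node = node[tok]
            if None in node:
                hits.append((start, node[None]))
    return hits
```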
arXiv Detail & Related papers (2024-06-11T09:37:52Z)
- Unimodal Aggregation for CTC-based Speech Recognition [7.6112706449833505]
A unimodal aggregation (UMA) is proposed to segment and integrate the feature frames that belong to the same text token.
UMA learns better feature representations and shortens the sequence length, resulting in lower recognition error and computational complexity.
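A minimal sketch of the aggregation idea as we read it from this summary (segment at valleys of a learned per-frame weight, then take a weighted mean within each segment); the paper's exact formulation may differ, and all names here are assumptions:

```python
import torch

def unimodal_aggregate(frames, weights):
    """frames: (T, D) encoder outputs; weights: (T,) scalars in (0, 1).

    A new segment starts wherever the weight sequence turns from falling
    to rising (a valley); each segment is reduced to its weighted mean,
    shortening the sequence to roughly one vector per text token.
    """
    rising = weights[1:] > weights[:-1]         # rising[t]: w[t+1] > w[t]
    valleys = [t for t in range(1, len(weights) - 1)
               if rising[t] and not rising[t - 1]]
    boundaries = [0] + valleys + [len(weights)]
    segments = []
    for s, e in zip(boundaries[:-1], boundaries[1:]):
        w = weights[s:e].unsqueeze(-1)          # (e-s, 1)
        segments.append((w * frames[s:e]).sum(0) / w.sum())
    return torch.stack(segments)                # (num_segments, D)
```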
arXiv Detail & Related papers (2023-09-15T04:34:40Z)
- Self-distillation Regularized Connectionist Temporal Classification Loss for Text Recognition: A Simple Yet Effective Approach [14.69981874614434]
We show how to better optimize a text recognition model from the perspective of loss functions.
CTC-based methods, widely used in practice for their good balance between performance and inference speed, still grapple with accuracy degradation.
We propose a self-distillation scheme for CTC-based model to address this issue.
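The summary does not spell out the scheme, so the sketch below shows only a generic self-distillation regularizer for a CTC model (a KL term toward a frozen teacher copy of the model, e.g. an exponential moving average of its own weights); the paper's actual method may differ:

```python
import torch.nn.functional as F

def self_distilled_ctc_loss(log_probs, teacher_log_probs, targets,
                            input_lengths, target_lengths, beta=0.1):
    """CTC loss plus frame-wise KL toward a frozen teacher's distributions.

    log_probs, teacher_log_probs: (T, N, C) log-softmax outputs
    """
    ctc = F.ctc_loss(log_probs, targets, input_lengths, target_lengths,
                     blank=0, reduction="mean")
    # teacher is detached so only the student receives gradients
    kl = F.kl_div(log_probs, teacher_log_probs.detach(),
                  log_target=True, reduction="batchmean")
    return ctc + beta * kl
```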
arXiv Detail & Related papers (2023-08-17T06:32:57Z)
- Scalable Learning of Latent Language Structure With Logical Offline Cycle Consistency [71.42261918225773]
Conceptually, LOCCO can be viewed as a form of self-learning where the semantic parser being trained is used to generate annotations for unlabeled text.
As an added bonus, the annotations produced by LOCCO can be trivially repurposed to train a neural text generation model.
arXiv Detail & Related papers (2023-05-31T16:47:20Z)
- BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model [40.16332045057132]
BERT-CTC is a novel formulation of end-to-end speech recognition.
It incorporates linguistic knowledge through the explicit output dependency obtained by BERT contextual embedding.
BERT-CTC improves over conventional approaches across variations in speaking styles and languages.
arXiv Detail & Related papers (2022-10-29T18:19:44Z)
- CTC Alignments Improve Autoregressive Translation [145.90587287444976]
We argue that CTC does in fact make sense for translation if applied in a joint CTC/attention framework.
Our proposed joint CTC/attention models outperform pure-attention baselines across six benchmark translation tasks.
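The joint CTC/attention objective used in this line of work is the standard multi-task combination of the two losses; a minimal sketch follows, with tensor shapes, padding conventions, and the interpolation weight as assumptions:

```python
import torch.nn.functional as F

def joint_ctc_attention_loss(ctc_log_probs, dec_logits, ctc_targets,
                             ce_targets, input_lengths, target_lengths,
                             lam=0.3):
    """L = lam * CTC + (1 - lam) * attention cross-entropy.

    ctc_log_probs: (T, N, C) encoder-side log-softmax for CTC
    dec_logits:    (N, S, C) autoregressive decoder outputs
    ctc_targets:   (N, S) targets padded with a valid label id
    ce_targets:    (N, S) targets padded with -100 (ignored by CE)
    """
    ctc = F.ctc_loss(ctc_log_probs, ctc_targets, input_lengths,
                     target_lengths, blank=0, reduction="mean",
                     zero_infinity=True)
    att = F.cross_entropy(dec_logits.transpose(1, 2), ce_targets,
                          ignore_index=-100)
    return lam * ctc + (1 - lam) * att
```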
arXiv Detail & Related papers (2022-10-11T07:13:50Z)
- Lexically Aware Semi-Supervised Learning for OCR Post-Correction [90.54336622024299]
Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents.
Previous work has demonstrated the utility of neural post-correction methods on recognition of less-well-resourced languages.
We present a semi-supervised learning method that makes it possible to utilize raw images to improve performance.
arXiv Detail & Related papers (2021-11-04T04:39:02Z)
- Implicit Feature Alignment: Learn to Convert Text Recognizer to Text Spotter [38.4211220941874]
We propose a simple, elegant and effective paradigm called Implicit Feature Alignment (IFA).
IFA can be easily integrated into current text recognizers, resulting in a novel inference mechanism called IFA inference.
We experimentally demonstrate that IFA achieves state-of-the-art performance on end-to-end document recognition tasks.
arXiv Detail & Related papers (2021-06-10T17:06:28Z)
- Investigating the Reordering Capability in CTC-based Non-Autoregressive End-to-End Speech Translation [62.943925893616196]
We study the possibilities of building a non-autoregressive speech-to-text translation model using connectionist temporal classification (CTC).
CTC's success on translation is counter-intuitive due to its monotonicity assumption, so we analyze its reordering capability.
Our analysis shows that transformer encoders have the ability to change the word order.
arXiv Detail & Related papers (2021-05-11T07:48:45Z)
- Intermediate Loss Regularization for CTC-based Speech Recognition [58.33721897180646]
We present a simple and efficient auxiliary loss function for automatic speech recognition (ASR) based on the connectionist temporal classification (CTC) objective.
We evaluate the proposed method on various corpora, reaching a word error rate (WER) of 9.9% on the WSJ corpus and a character error rate (CER) of 5.2% on the AISHELL-1 corpus.
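A minimal sketch of such an intermediate-loss objective, assuming the same CTC output head is applied to both the final and one intermediate encoder layer (the mixing weight alpha is an assumption):

```python
import torch.nn.functional as F

def intermediate_ctc_loss(final_log_probs, inter_log_probs, targets,
                          input_lengths, target_lengths, alpha=0.3):
    """Mix the usual final-layer CTC loss with the same loss computed on
    an intermediate encoder layer, which acts as a regularizer.

    final_log_probs, inter_log_probs: (T, N, C) log-softmax outputs
    """
    def ctc(lp):
        return F.ctc_loss(lp, targets, input_lengths, target_lengths,
                          blank=0, reduction="mean", zero_infinity=True)
    return (1 - alpha) * ctc(final_log_probs) + alpha * ctc(inter_log_probs)
```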
arXiv Detail & Related papers (2021-02-05T15:01:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.