SoftCTC -- Semi-Supervised Learning for Text Recognition using Soft
Pseudo-Labels
- URL: http://arxiv.org/abs/2212.02135v3
- Date: Tue, 19 Sep 2023 13:21:49 GMT
- Title: SoftCTC -- Semi-Supervised Learning for Text Recognition using Soft
Pseudo-Labels
- Authors: Martin Kišš, Michal Hradiš, Karel Beneš, Petr Buchal,
Michal Kula
- Abstract summary: This paper explores semi-supervised training for sequence tasks, such as Optical Character Recognition or Automatic Speech Recognition.
We propose a novel loss function – SoftCTC.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper explores semi-supervised training for sequence tasks, such as
Optical Character Recognition or Automatic Speech Recognition. We propose a
novel loss function – SoftCTC – which is an extension of CTC that allows
multiple transcription variants to be considered at the same time. This makes
it possible to omit the confidence-based filtering step which is otherwise a
crucial component of pseudo-labeling approaches to semi-supervised learning.
We demonstrate the effectiveness of our method on a challenging handwriting
recognition task and conclude that SoftCTC matches the performance of a
finely tuned filtering-based pipeline. We also evaluated SoftCTC in terms of
computational efficiency, concluding that it is significantly more efficient
than a naïve CTC-based approach for training on multiple transcription
variants, and we make our GPU implementation public.
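For reference, the naïve CTC-based baseline mentioned above amounts to running standard CTC once per transcription variant and combining the weighted per-variant likelihoods, so its cost grows linearly with the number of variants. A minimal PyTorch sketch of that baseline follows; the function name, tensor shapes, and weighting scheme are our assumptions, and this is explicitly not the authors' SoftCTC, which handles all variants jointly:

```python
import torch
import torch.nn.functional as F

def naive_multi_variant_ctc(log_probs, input_lengths, variants, variant_weights):
    """Naive baseline: one standard CTC pass per transcription variant.

    log_probs:       (T, N, C) log-softmax outputs of the recognizer
    input_lengths:   (N,) valid frame counts per batch element
    variants:        list of (targets, target_lengths) pairs, one per variant,
                     where targets is (N, S) and target_lengths is (N,)
    variant_weights: (V,) tensor of per-variant confidences summing to 1
    """
    scores = []
    for (targets, target_lengths), w in zip(variants, variant_weights):
        # F.ctc_loss returns the negative log-likelihood of this variant
        nll = F.ctc_loss(log_probs, targets, input_lengths, target_lengths,
                         blank=0, reduction="none")        # (N,)
        scores.append(torch.log(w) - nll)                  # log w_v + log P(y_v|x)
    # -log sum_v w_v * P(y_v | x), averaged over the batch
    return -torch.logsumexp(torch.stack(scores, dim=0), dim=0).mean()
```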
Related papers
- Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter [57.64003871384959]
This work presents a new approach to fast context-biasing with a CTC-based Word Spotter.
The proposed method matches CTC log-probabilities against a compact context graph to detect potential context-biasing candidates.
The results demonstrate a significant acceleration of context-biasing recognition with simultaneous improvements in F-score and WER.
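The summary above only names the mechanism, so the following is a loose, hypothetical illustration rather than the paper's actual word spotter (which scores CTC log-probabilities against a compact context graph directly): greedy-decode the CTC output, then scan the decoded token sequence against a trie of biasing phrases. All function names are ours:

```python
import numpy as np

def greedy_ctc_decode(log_probs, blank=0):
    """Collapse repeats and drop blanks from per-frame argmax tokens."""
    best = log_probs.argmax(axis=-1)            # (T,)
    out, prev = [], blank
    for t in best:
        if t != blank and t != prev:
            out.append(int(t))
        prev = t
    return out

def build_trie(phrases):
    """Store each biasing phrase (a list of token ids) in a nested-dict trie."""
    trie = {}
    for phrase in phrases:
        node = trie
        for tok in phrase:
            node = node.setdefault(tok, {})
        node[None] = phrase                     # terminal marker
    return trie

def spot(decoded, trie):
    """Return (start_index, phrase) for every biasing phrase in the decode."""
    hits = []
    for start in range(len(decoded)):
        node = trie
        for tok in decoded[start:]:
            if tok not in node:
                break
            node = node[tok]
            if None in node:
                hits.append((start, node[None]))
    return hits
```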
arXiv Detail & Related papers (2024-06-11T09:37:52Z)
- Unimodal Aggregation for CTC-based Speech Recognition [7.6112706449833505]
A unimodal aggregation (UMA) is proposed to segment and integrate the feature frames that belong to the same text token.
UMA learns better feature representations and shortens the sequence length, resulting in lower recognition error and computational complexity.
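A minimal sketch of the aggregation idea as we read it from this summary (segment at valleys of a learned per-frame weight, then take a weighted mean within each segment); the paper's exact formulation may differ, and all names here are assumptions:

```python
import torch

def unimodal_aggregate(frames, weights):
    """frames: (T, D) encoder outputs; weights: (T,) scalars in (0, 1).

    A new segment starts wherever the weight sequence turns from falling
    to rising (a valley); each segment is reduced to its weighted mean,
    shortening the sequence to roughly one vector per text token.
    """
    rising = weights[1:] > weights[:-1]         # rising[t]: w[t+1] > w[t]
    valleys = [t for t in range(1, len(weights) - 1)
               if rising[t] and not rising[t - 1]]
    boundaries = [0] + valleys + [len(weights)]
    segments = []
    for s, e in zip(boundaries[:-1], boundaries[1:]):
        w = weights[s:e].unsqueeze(-1)          # (e-s, 1)
        segments.append((w * frames[s:e]).sum(0) / w.sum())
    return torch.stack(segments)                # (num_segments, D)
```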
arXiv Detail & Related papers (2023-09-15T04:34:40Z)
- Self-distillation Regularized Connectionist Temporal Classification Loss for Text Recognition: A Simple Yet Effective Approach [14.69981874614434]
We show how to better optimize a text recognition model from the perspective of loss functions.
CTC-based methods, widely used in practice for their good balance between performance and inference speed, still grapple with accuracy degradation.
We propose a self-distillation scheme for CTC-based model to address this issue.
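The summary does not spell out the scheme, so the sketch below shows only a generic self-distillation regularizer for a CTC model (a KL term toward a frozen teacher copy of the model, e.g. an exponential moving average of its own weights); the paper's actual method may differ:

```python
import torch.nn.functional as F

def self_distilled_ctc_loss(log_probs, teacher_log_probs, targets,
                            input_lengths, target_lengths, beta=0.1):
    """CTC loss plus frame-wise KL toward a frozen teacher's distributions.

    log_probs, teacher_log_probs: (T, N, C) log-softmax outputs
    """
    ctc = F.ctc_loss(log_probs, targets, input_lengths, target_lengths,
                     blank=0, reduction="mean")
    # teacher is detached so only the student receives gradients
    kl = F.kl_div(log_probs, teacher_log_probs.detach(),
                  log_target=True, reduction="batchmean")
    return ctc + beta * kl
```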
arXiv Detail & Related papers (2023-08-17T06:32:57Z)
- Scalable Learning of Latent Language Structure With Logical Offline Cycle Consistency [71.42261918225773]
Conceptually, LOCCO can be viewed as a form of self-learning where the semantic parser being trained is used to generate annotations for unlabeled text.
As an added bonus, the annotations produced by LOCCO can be trivially repurposed to train a neural text generation model.
arXiv Detail & Related papers (2023-05-31T16:47:20Z)
- BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model [40.16332045057132]
BERT-CTC is a novel formulation of end-to-end speech recognition.
It incorporates linguistic knowledge through the explicit output dependency obtained by BERT contextual embedding.
BERT-CTC improves over conventional approaches across variations in speaking styles and languages.
arXiv Detail & Related papers (2022-10-29T18:19:44Z)
- CTC Alignments Improve Autoregressive Translation [145.90587287444976]
We argue that CTC does in fact make sense for translation if applied in a joint CTC/attention framework.
Our proposed joint CTC/attention models outperform pure-attention baselines across six benchmark translation tasks.
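The joint CTC/attention objective used in this line of work is the standard multi-task combination of the two losses; a minimal sketch follows, with tensor shapes, padding conventions, and the interpolation weight as assumptions:

```python
import torch.nn.functional as F

def joint_ctc_attention_loss(ctc_log_probs, dec_logits, ctc_targets,
                             ce_targets, input_lengths, target_lengths,
                             lam=0.3):
    """L = lam * CTC + (1 - lam) * attention cross-entropy.

    ctc_log_probs: (T, N, C) encoder-side log-softmax for CTC
    dec_logits:    (N, S, C) autoregressive decoder outputs
    ctc_targets:   (N, S) targets padded with a valid label id
    ce_targets:    (N, S) targets padded with -100 (ignored by CE)
    """
    ctc = F.ctc_loss(ctc_log_probs, ctc_targets, input_lengths,
                     target_lengths, blank=0, reduction="mean",
                     zero_infinity=True)
    att = F.cross_entropy(dec_logits.transpose(1, 2), ce_targets,
                          ignore_index=-100)
    return lam * ctc + (1 - lam) * att
```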
arXiv Detail & Related papers (2022-10-11T07:13:50Z)
- Lexically Aware Semi-Supervised Learning for OCR Post-Correction [90.54336622024299]
Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents.
Previous work has demonstrated the utility of neural post-correction methods on recognition of less-well-resourced languages.
We present a semi-supervised learning method that makes it possible to utilize raw images to improve performance.
arXiv Detail & Related papers (2021-11-04T04:39:02Z)
- Implicit Feature Alignment: Learn to Convert Text Recognizer to Text Spotter [38.4211220941874]
We propose a simple, elegant and effective paradigm called Implicit Feature Alignment (IFA).
IFA can be easily integrated into current text recognizers, resulting in a novel inference mechanism called IFA inference.
We experimentally demonstrate that IFA achieves state-of-the-art performance on end-to-end document recognition tasks.
arXiv Detail & Related papers (2021-06-10T17:06:28Z)
- Investigating the Reordering Capability in CTC-based Non-Autoregressive End-to-End Speech Translation [62.943925893616196]
We study the possibilities of building a non-autoregressive speech-to-text translation model using connectionist temporal classification (CTC).
CTC's success on translation is counter-intuitive due to its monotonicity assumption, so we analyze its reordering capability.
Our analysis shows that transformer encoders have the ability to change the word order.
arXiv Detail & Related papers (2021-05-11T07:48:45Z)
- Intermediate Loss Regularization for CTC-based Speech Recognition [58.33721897180646]
We present a simple and efficient auxiliary loss function for automatic speech recognition (ASR) based on the connectionist temporal classification (CTC) objective.
We evaluate the proposed method on various corpora, reaching a word error rate (WER) of 9.9% on the WSJ corpus and a character error rate (CER) of 5.2% on the AISHELL-1 corpus.
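A minimal sketch of such an intermediate-loss objective, assuming the same CTC output head is applied to both the final and one intermediate encoder layer (the mixing weight alpha is an assumption):

```python
import torch.nn.functional as F

def intermediate_ctc_loss(final_log_probs, inter_log_probs, targets,
                          input_lengths, target_lengths, alpha=0.3):
    """Mix the usual final-layer CTC loss with the same loss computed on
    an intermediate encoder layer, which acts as a regularizer.

    final_log_probs, inter_log_probs: (T, N, C) log-softmax outputs
    """
    def ctc(lp):
        return F.ctc_loss(lp, targets, input_lengths, target_lengths,
                          blank=0, reduction="mean", zero_infinity=True)
    return (1 - alpha) * ctc(final_log_probs) + alpha * ctc(inter_log_probs)
```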
arXiv Detail & Related papers (2021-02-05T15:01:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.