A Study on Effects of Implicit and Explicit Language Model Information
for DBLSTM-CTC Based Handwriting Recognition
- URL: http://arxiv.org/abs/2008.01532v1
- Date: Fri, 31 Jul 2020 08:23:37 GMT
- Title: A Study on Effects of Implicit and Explicit Language Model Information
for DBLSTM-CTC Based Handwriting Recognition
- Authors: Qi Liu, Lijuan Wang, Qiang Huo
- Abstract summary: We study the effects of implicit and explicit language model information for DBLSTM-CTC based handwriting recognition.
Even when one million lines of training sentences are used to train the DBLSTM, an explicit language model is still helpful.
- Score: 51.36957172200015
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Bidirectional Long Short-Term Memory (DBLSTM) with a Connectionist
Temporal Classification (CTC) output layer has been established as one of the
state-of-the-art solutions for handwriting recognition. It is well known that
the DBLSTM trained by using a CTC objective function will learn both local
character image dependency for character modeling and long-range contextual
dependency for implicit language modeling. In this paper, we study the effects
of implicit and explicit language model information for DBLSTM-CTC based
handwriting recognition by comparing the performance of decoding with and
without an explicit language model. It is observed that even when one million
lines of training sentences are used to train the DBLSTM, an explicit
language model is still helpful. To deal with such a large-scale training
problem, a GPU-based training tool has been developed for CTC training of
DBLSTM by using a mini-batch based epochwise Back Propagation Through Time
(BPTT) algorithm.
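As a concrete illustration of the setup described in the abstract, below is a minimal sketch in PyTorch. The feature dimension, layer sizes, alphabet size and learning rate are placeholder assumptions, not values from the paper: it shows a deep bidirectional LSTM with a CTC output layer, one mini-batch training step, and the two decoding regimes the paper compares.

```python
# Minimal sketch (not the authors' GPU tool): a deep bidirectional LSTM with a
# CTC output layer. All sizes below are illustrative placeholders.
import torch
import torch.nn as nn

class DBLSTMCTC(nn.Module):
    def __init__(self, feat_dim=64, hidden=128, layers=3, num_chars=80):
        super().__init__()
        # Stacked bidirectional LSTM over the frame-wise feature sequence.
        self.blstm = nn.LSTM(feat_dim, hidden, num_layers=layers,
                             bidirectional=True, batch_first=True)
        # Project to character classes plus one extra index for the CTC blank.
        self.proj = nn.Linear(2 * hidden, num_chars + 1)

    def forward(self, x):
        h, _ = self.blstm(x)          # (batch, frames, 2*hidden)
        return self.proj(h)           # unnormalised per-frame class scores

model = DBLSTMCTC()
ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)
optim = torch.optim.SGD(model.parameters(), lr=1e-3)

# One mini-batch training step; "epochwise BPTT" corresponds roughly to
# backpropagating through the full, untruncated frame sequence of each line.
feats = torch.randn(8, 200, 64)                       # 8 lines, 200 frames each
targets = torch.randint(1, 81, (8, 30))               # dummy label sequences
input_lens = torch.full((8,), 200, dtype=torch.long)
target_lens = torch.full((8,), 30, dtype=torch.long)

logits = model(feats)
log_probs = logits.log_softmax(-1).transpose(0, 1)    # CTCLoss expects (T, N, C)
loss = ctc_loss(log_probs, targets, input_lens, target_lens)
optim.zero_grad()
loss.backward()
optim.step()

# Decoding without an explicit LM: take the per-frame argmax (best path),
# then collapse repeats and remove blanks (collapsing not shown here).
best_path = logits.argmax(-1)                         # (batch, frames)
# Decoding with an explicit LM would instead run a beam search whose path
# score adds a weighted word/character n-gram LM term to the CTC posterior.
```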
Related papers
- CLEFT: Language-Image Contrastive Learning with Efficient Large Language Model and Prompt Fine-Tuning [4.004641316826348]
We introduce a novel language-image Contrastive Learning method with an Efficient large language model and prompt Fine-Tuning (CLEFT).
Our method demonstrates state-of-the-art performance on multiple chest X-ray and mammography datasets.
The proposed parameter-efficient framework can reduce the total trainable model size by 39% and reduce the trainable language model to only 4% compared with the current BERT encoder.
arXiv Detail & Related papers (2024-07-30T17:57:32Z) - FLIP: Fine-grained Alignment between ID-based Models and Pretrained Language Models for CTR Prediction [49.510163437116645]
Click-through rate (CTR) prediction serves as a core function module in personalized online services.
Traditional ID-based models for CTR prediction take as inputs the one-hot encoded ID features of tabular modality.
Pretrained Language Models (PLMs) have given rise to another paradigm, which takes as input the sentences of textual modality.
We propose to conduct Fine-grained feature-level ALignment between ID-based Models and Pretrained Language Models (FLIP) for CTR prediction.
arXiv Detail & Related papers (2023-10-30T11:25:03Z) - Scalable Learning of Latent Language Structure With Logical Offline
Cycle Consistency [71.42261918225773]
Conceptually, LOCCO can be viewed as a form of self-learning where the semantic parser being trained is used to generate annotations for unlabeled text.
As an added bonus, the annotations produced by LOCCO can be trivially repurposed to train a neural text generation model.
arXiv Detail & Related papers (2023-05-31T16:47:20Z) - Improving Massively Multilingual ASR With Auxiliary CTC Objectives [40.10307386370194]
We introduce our work on improving performance on FLEURS, a 102-language open ASR benchmark.
We investigate techniques inspired by recent Connectionist Temporal Classification (CTC) studies to help the model handle the large number of languages.
Our state-of-the-art systems using self-supervised models with the Conformer architecture improve over the results of prior work on FLEURS by a relative 28.4% CER.
arXiv Detail & Related papers (2023-02-24T18:59:51Z) - Prompt-based Learning for Unpaired Image Captioning [86.44188293709307]
Unpaired Image Captioning (UIC) has been developed to learn image descriptions from unaligned vision-language sample pairs.
Recent successes of Vision-Language Pre-Trained Models (VL-PTMs) have triggered the development of prompt-based learning.
We present in this paper a novel prompt-based scheme to train the UIC model, making the best use of their powerful generalization ability.
arXiv Detail & Related papers (2022-05-26T03:13:43Z) - Improving CTC-based speech recognition via knowledge transferring from
pre-trained language models [30.599901925058873]
We propose two knowledge transfer methods to improve CTC-based models; an illustrative sketch of the first appears after this list.
The first method is based on representation learning, in which the CTC-based models use the representation produced by BERT as an auxiliary learning target.
The second method is based on joint classification learning, which combines GPT2 for text modeling with a hybrid CTC/attention architecture.
arXiv Detail & Related papers (2022-02-22T11:30:55Z) - MATE-KD: Masked Adversarial TExt, a Companion to Knowledge Distillation [9.91548921801095]
We present MATE-KD, a novel text-based adversarial training algorithm which improves the performance of knowledge distillation.
We evaluate our algorithm, using BERT-based models, on the GLUE benchmark and demonstrate that MATE-KD outperforms competitive adversarial learning and data augmentation baselines.
arXiv Detail & Related papers (2021-05-12T19:11:34Z) - Learning Contextual Representations for Semantic Parsing with
Generation-Augmented Pre-Training [86.91380874390778]
We present Generation-Augmented Pre-training (GAP), that jointly learns representations of natural language utterances and table schemas by leveraging generation models to generate pre-train data.
Based on experimental results, neural semantic parsers that leverage GAP MODEL obtain new state-of-the-art results on both SPIDER and CRITERIA-TO-SQL benchmarks.
arXiv Detail & Related papers (2020-12-18T15:53:50Z) - Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
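The knowledge-transfer entry above ("Improving CTC-based speech recognition via knowledge transferring from pre-trained language models") describes its first method only at a high level. The following is a hypothetical sketch of one way a BERT representation could serve as an auxiliary learning target alongside a CTC loss; the pooling strategy, the 0.1 loss weight, and all layer sizes are assumptions for illustration, not details from that paper.

```python
# Illustrative sketch only: train a CTC acoustic model with its usual CTC loss
# plus an auxiliary regression loss that pulls a pooled utterance embedding
# toward the frozen BERT embedding of the transcript.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

bert = BertModel.from_pretrained("bert-base-uncased").eval()
tok = BertTokenizer.from_pretrained("bert-base-uncased")

class CTCEncoder(nn.Module):
    def __init__(self, feat_dim=80, hidden=256, vocab=30):
        super().__init__()
        self.blstm = nn.LSTM(feat_dim, hidden, num_layers=2,
                             bidirectional=True, batch_first=True)
        self.to_bert = nn.Linear(2 * hidden, 768)      # map to BERT hidden size
        self.to_ctc = nn.Linear(2 * hidden, vocab + 1) # classes + CTC blank

    def forward(self, x):
        h, _ = self.blstm(x)
        # Return frame-level class scores and a mean-pooled utterance vector.
        return self.to_ctc(h), self.to_bert(h.mean(dim=1))

model = CTCEncoder()
ctc_loss, mse = nn.CTCLoss(blank=0), nn.MSELoss()

feats = torch.randn(4, 150, 80)                       # dummy acoustic features
texts = ["hello world"] * 4                           # matching transcripts
labels = torch.randint(1, 31, (4, 12))                # dummy label sequences
in_lens = torch.full((4,), 150, dtype=torch.long)
tgt_lens = torch.full((4,), 12, dtype=torch.long)

with torch.no_grad():                                 # frozen BERT target
    enc = tok(texts, return_tensors="pt", padding=True)
    bert_repr = bert(**enc).last_hidden_state[:, 0]   # [CLS] vector as target

logits, pooled = model(feats)
loss = ctc_loss(logits.log_softmax(-1).transpose(0, 1), labels, in_lens, tgt_lens) \
       + 0.1 * mse(pooled, bert_repr)                 # auxiliary representation loss
loss.backward()
```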
This list is automatically generated from the titles and abstracts of the papers on this site.