Practical Fine-Tuning of Autoregressive Models on Limited Handwritten Texts
- URL: http://arxiv.org/abs/2503.19546v1
- Date: Tue, 25 Mar 2025 11:01:05 GMT
- Title: Practical Fine-Tuning of Autoregressive Models on Limited Handwritten Texts
- Authors: Jan Kohút, Michal Hradiš
- Abstract summary: We show that fine-tuning can reliably start with just 16 lines, yielding a 10% relative improvement in CER, and scale up to 40% with 256 lines. We also show that OCR models can be leveraged to cut annotation costs by half through confidence-based selection of informative lines.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A common use case for OCR applications involves users uploading documents and progressively correcting automatic recognition to obtain the final transcript. This correction phase presents an opportunity for progressive adaptation of the OCR model, making it crucial to adapt early, while ensuring stability and reliability. We demonstrate that state-of-the-art transformer-based models can effectively support this adaptation, gradually reducing the annotator's workload. Our results show that fine-tuning can reliably start with just 16 lines, yielding a 10% relative improvement in CER, and scale up to 40% with 256 lines. We further investigate the impact of model components, clarifying the roles of the encoder and decoder in the fine-tuning process. To guide adaptation, we propose reliable stopping criteria, considering both direct approaches and global trend analysis. Additionally, we show that OCR models can be leveraged to cut annotation costs by half through confidence-based selection of informative lines, achieving the same performance with fewer annotations.
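Two of the ideas above, confidence-based line selection and trend-based stopping, lend themselves to a short illustration. The Python sketch below is not the authors' implementation: `line_confidence` is a hypothetical callable exposing the OCR model's mean per-character confidence for a line, and the stopping rule is just one plausible instance of the global trend analysis the abstract mentions.

```python
import heapq

def select_lines_for_annotation(lines, line_confidence, budget):
    """Pick the `budget` lines the OCR model is least confident about.

    Low-confidence lines are the likeliest to contain errors, so
    annotating them first yields the most informative training data.
    """
    scored = [(line_confidence(line), line) for line in lines]
    return [line for _, line in heapq.nsmallest(budget, scored, key=lambda t: t[0])]

def should_stop(cer_history, window=5):
    """Global-trend stopping: halt fine-tuning once the least-squares
    slope of the last `window` validation CER values stops decreasing."""
    if len(cer_history) < window:
        return False
    ys = cer_history[-window:]
    xs = list(range(window))
    n = float(window)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return slope >= 0.0
```

In this framing, halving annotation cost amounts to calling `select_lines_for_annotation` with half the budget one would otherwise spend on randomly chosen lines.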
Related papers
- Automated Proof Generation for Rust Code via Self-Evolution [69.25795662658356]
We introduce SAFE, a framework that overcomes the lack of human-written snippets to enable automated proof generation of Rust code.
SAFE re-purposes the large number of synthesized incorrect proofs to train the self-debugging capability of the fine-tuned models.
We achieve a 52.52% accuracy rate in a benchmark crafted by human experts, a significant leap over GPT-4o's performance of 14.39%.
arXiv Detail & Related papers (2024-10-21T08:15:45Z) - Analysing Zero-Shot Readability-Controlled Sentence Simplification [54.09069745799918]
We investigate how different types of contextual information affect a model's ability to generate sentences with the desired readability. Results show that all tested models struggle to simplify sentences due to models' limitations and characteristics of the source sentences. Our experiments also highlight the need for better automatic evaluation metrics tailored to RCTS.
arXiv Detail & Related papers (2024-09-30T12:36:25Z) - Spelling Correction through Rewriting of Non-Autoregressive ASR Lattices [8.77712061194924]
We present a finite-state transducer (FST) technique for rewriting wordpiece lattices generated by Transformer-based CTC models.
Our algorithm performs grapheme-to-phoneme (G2P) conversion directly from wordpieces into phonemes, avoiding explicit word representations.
We achieved up to a 15.2% relative reduction in sentence error rate (SER) on a test set with contextually relevant entities.
arXiv Detail & Related papers (2024-09-24T21:42:25Z) - ASR Error Correction using Large Language Models [4.75940708384553]
Error correction (EC) models play a crucial role in refining Automatic Speech Recognition (ASR) transcriptions. This work investigates the use of large language models (LLMs) for error correction across diverse scenarios.
arXiv Detail & Related papers (2024-09-14T23:33:38Z) - Refining Corpora from a Model Calibration Perspective for Chinese Spelling Correction [40.11364098789309]
Chinese Spelling Correction (CSC) commonly lacks large-scale high-quality corpora.
Two data augmentation methods are widely adopted: (1) Random Replacement guided by confusion sets and (2) OCR/ASR-based Generation that simulates character misuse (a minimal sketch of the first appears below).
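As a toy illustration of confusion-set replacement, the following sketch synthesizes noisy inputs for (noisy, clean) training pairs; the three-entry confusion table and the 10% corruption rate are placeholders, not values from the paper.

```python
import random

# Toy confusion set: visually or phonetically confusable characters.
# Real CSC pipelines use large curated confusion sets.
CONFUSION = {"的": ["地", "得"], "在": ["再"], "他": ["她", "它"]}

def corrupt(sentence, rate=0.1, rng=random):
    """Synthesize the noisy half of a (noisy, clean) CSC training pair
    by randomly swapping characters with confusable alternatives."""
    chars = list(sentence)
    for i, ch in enumerate(chars):
        if ch in CONFUSION and rng.random() < rate:
            chars[i] = rng.choice(CONFUSION[ch])
    return "".join(chars)
```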
arXiv Detail & Related papers (2024-07-22T09:26:35Z) - Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter [57.64003871384959]
This work presents a new approach to fast context-biasing with CTC-based Word Spotter.
The proposed method matches CTC log-probabilities against a compact context graph to detect potential context-biasing candidates.
The results demonstrate that the approach significantly accelerates context-biasing recognition while simultaneously improving F-score and WER.
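The matching principle can be illustrated with the standard CTC forward algorithm: score how strongly a window of frames supports a given keyword. This NumPy sketch handles a single keyword rather than the compact context graph the paper describes, and the vocabulary size, blank index, and demo values are arbitrary.

```python
import numpy as np

NEG_INF = -np.inf

def logadd(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + np.log(np.exp(a - m) + np.exp(b - m))

def ctc_keyword_score(log_probs, keyword, blank=0):
    """CTC forward log-probability of `keyword` (a list of token IDs)
    under a (T, V) matrix of per-frame log-probabilities. High scores
    flag regions that are promising context-biasing candidates."""
    ext = [blank]                     # interleave blanks: ^ k1 ^ k2 ^ ...
    for k in keyword:
        ext += [k, blank]
    S, T = len(ext), log_probs.shape[0]
    alpha = np.full(S, NEG_INF)
    alpha[0] = log_probs[0, ext[0]]
    alpha[1] = log_probs[0, ext[1]]
    for t in range(1, T):
        new = np.full(S, NEG_INF)
        for s in range(S):
            acc = alpha[s]
            if s >= 1:
                acc = logadd(acc, alpha[s - 1])
            # Skipping a blank is allowed only between distinct non-blanks.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                acc = logadd(acc, alpha[s - 2])
            new[s] = acc + log_probs[t, ext[s]]
        alpha = new
    return logadd(alpha[-1], alpha[-2])

# Demo with random frame posteriors over a 30-token vocabulary.
rng = np.random.default_rng(0)
lp = np.log(rng.dirichlet(np.ones(30), size=50))   # (T=50, V=30)
print(ctc_keyword_score(lp, [5, 9, 2]))
```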
arXiv Detail & Related papers (2024-06-11T09:37:52Z) - Test-Time Model Adaptation with Only Forward Passes [68.11784295706995]
Test-time adaptation has proven effective in adapting a given trained model to unseen test samples with potential distribution shifts.
We propose a test-time Forward-Optimization Adaptation (FOA) method.
FOA runs on a quantized 8-bit ViT, outperforms gradient-based TENT on a full-precision 32-bit ViT, and achieves up to a 24-fold memory reduction on ImageNet-C.
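FOA's actual objective and optimizer are not reproduced here; the sketch below only demonstrates the forward-only principle with a toy random search that tunes an input-space shift to reduce prediction entropy, so no backpropagation (and hence no full-precision gradient support) is needed. `model` is a hypothetical callable returning class probabilities.

```python
import numpy as np

def mean_entropy(probs):
    """Average prediction entropy; lower typically indicates
    better-adapted outputs under a distribution shift."""
    return float(-(probs * np.log(probs + 1e-12)).sum(axis=1).mean())

def forward_only_adapt(model, x, steps=20, sigma=0.05, seed=0):
    """Gradient-free test-time adaptation: perturb a shift vector with
    Gaussian noise and keep any candidate that lowers entropy. Works
    even when the model is quantized and cannot be backpropagated."""
    rng = np.random.default_rng(seed)
    shift = np.zeros(x.shape[-1])
    best = mean_entropy(model(x + shift))
    for _ in range(steps):
        cand = shift + sigma * rng.standard_normal(x.shape[-1])
        e = mean_entropy(model(x + cand))
        if e < best:
            shift, best = cand, e
    return shift
```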
arXiv Detail & Related papers (2024-04-02T05:34:33Z) - Towards Calibrated Robust Fine-Tuning of Vision-Language Models [97.19901765814431]
This work proposes a robust fine-tuning method that simultaneously improves both OOD accuracy and confidence calibration in vision-language models.
We show that OOD classification and OOD calibration errors share an upper bound consisting of two terms computed on ID data.
Based on this insight, we design a novel framework that conducts fine-tuning with a constrained multimodal contrastive loss enforcing a larger smallest singular value.
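The singular-value part of that constraint is easy to illustrate. This PyTorch fragment is a hedged sketch, not the paper's loss: it only shows a penalty term that pushes the smallest singular value of a projection matrix above a (hypothetical) target.

```python
import torch

def sigma_min_penalty(weight: torch.Tensor, target: float = 1.0) -> torch.Tensor:
    """Encourage a larger smallest singular value of `weight`.

    torch.linalg.svdvals returns singular values in descending order,
    so the last entry is the smallest; the hinge is zero once it
    exceeds `target`.
    """
    s_min = torch.linalg.svdvals(weight)[-1]
    return torch.relu(target - s_min)

# Hypothetical use inside a fine-tuning step:
# loss = contrastive_loss + lam * sigma_min_penalty(model.proj.weight)
```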
arXiv Detail & Related papers (2023-11-03T05:41:25Z) - Enhancing OCR Performance through Post-OCR Models: Adopting Glyph Embedding for Improved Correction [0.0]
The novelty of our approach lies in embedding the OCR output using CharBERT and our unique embedding technique, capturing the visual characteristics of characters.
Our findings show that post-OCR correction effectively addresses deficiencies in inferior OCR models, and glyph embedding enables the model to achieve superior results.
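The paper's embedding technique is specific to CharBERT, but the underlying glyph idea can be sketched generically: render each character to a small bitmap so visually similar characters (e.g. 'O' and '0') land near each other. Everything below (font choice, bitmap size, normalization) is an assumption for illustration.

```python
import numpy as np
from PIL import Image, ImageDraw, ImageFont

def glyph_embedding(ch: str, size: int = 16) -> np.ndarray:
    """Render `ch` to a size x size grayscale bitmap and return the
    L2-normalized pixel vector as a crude glyph embedding. Note the
    default PIL bitmap font covers basic Latin only; pass a TrueType
    font via ImageFont.truetype for other scripts."""
    img = Image.new("L", (size, size), 0)
    ImageDraw.Draw(img).text((2, 2), ch, fill=255, font=ImageFont.load_default())
    v = np.asarray(img, dtype=np.float32).ravel()
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

# Visually similar glyphs should have a high cosine similarity.
print(float(glyph_embedding("O") @ glyph_embedding("0")))
```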
arXiv Detail & Related papers (2023-08-29T12:41:50Z) - Lights, Camera, Action! A Framework to Improve NLP Accuracy over OCR documents [2.6201102730518606]
We demonstrate an effective framework for mitigating OCR errors for any downstream NLP task.
We first address the data scarcity problem for model training by constructing a document synthesis pipeline.
For the benefit of the community, we have made the document synthesis pipeline available as an open-source project.
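While the released pipeline renders full documents, the core idea of synthesizing training data can be shown with a much smaller sketch: inject OCR-style noise into clean text to get (noisy, clean) pairs. The confusion table and rates below are illustrative guesses, not the project's actual parameters.

```python
import random

# A few common OCR confusions; a real pipeline renders documents and
# runs an actual OCR engine to harvest realistic error distributions.
OCR_CONFUSIONS = {"o": "0", "O": "0", "l": "1", "I": "1", "S": "5", "e": "c"}

def synthesize_ocr_errors(text, sub_rate=0.05, del_rate=0.01, rng=random):
    """Build the noisy half of a (noisy, clean) training pair by
    injecting substitutions of confusable characters and random drops."""
    out = []
    for ch in text:
        if ch in OCR_CONFUSIONS and rng.random() < sub_rate:
            out.append(OCR_CONFUSIONS[ch])   # visual substitution
        elif rng.random() < del_rate:
            continue                          # dropped character
        else:
            out.append(ch)
    return "".join(out)

clean = "Intelligent Optical Solutions"
pair = (synthesize_ocr_errors(clean), clean)
```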
arXiv Detail & Related papers (2021-08-06T00:32:54Z) - Layer Pruning on Demand with Intermediate CTC [50.509073206630994]
We present a training and pruning method for ASR based on connectionist temporal classification (CTC).
We show that a Transformer-CTC model can be pruned to various depths on demand, improving the real-time factor from 0.005 to 0.002 on GPU.
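Depth-on-demand pruning is straightforward to sketch once every layer prefix has been trained against an intermediate CTC loss: at inference you simply stop after the first k layers. The PyTorch module below is a generic stand-in, not the paper's architecture, and its sizes are arbitrary.

```python
import torch
import torch.nn as nn

class DepthOnDemandEncoder(nn.Module):
    """Transformer encoder whose depth is chosen per call. With
    intermediate-CTC training, each prefix of layers is a usable
    (smaller, faster) model, so truncation trades accuracy for speed."""

    def __init__(self, d_model=256, nhead=4, num_layers=12):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            for _ in range(num_layers)
        )

    def forward(self, x, depth=None):
        k = len(self.layers) if depth is None else depth
        for layer in self.layers[:k]:
            x = layer(x)
        return x

enc = DepthOnDemandEncoder()
x = torch.randn(2, 100, 256)          # (batch, frames, features)
fast, full = enc(x, depth=4), enc(x)  # pruned vs. full depth
```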
arXiv Detail & Related papers (2021-06-17T02:40:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.