Self-paced learning to improve text row detection in historical
documents with missing labels
- URL: http://arxiv.org/abs/2201.12216v1
- Date: Fri, 28 Jan 2022 16:17:48 GMT
- Title: Self-paced learning to improve text row detection in historical
documents with missing labels
- Authors: Mihaela Gaman, Lida Ghadamiyan, Radu Tudor Ionescu, Marius Popescu
- Abstract summary: We propose a self-paced learning algorithm capable of improving the row detection performance.
We sort training examples in descending order with respect to the number of ground-truth bounding boxes.
Using our self-paced learning method, we train a row detector over k iterations, progressively adding batches with less ground-truth annotations.
- Score: 25.22937684446941
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An important preliminary step of optical character recognition systems is the
detection of text rows. To address this task in the context of historical data
with missing labels, we propose a self-paced learning algorithm capable of
improving the row detection performance. We conjecture that pages with more
ground-truth bounding boxes are less likely to have missing annotations. Based
on this hypothesis, we sort the training examples in descending order with
respect to the number of ground-truth bounding boxes, and organize them into k
batches. Using our self-paced learning method, we train a row detector over k
iterations, progressively adding batches with less ground-truth annotations. At
each iteration, we combine the ground-truth bounding boxes with pseudo-bounding
boxes (bounding boxes predicted by the model itself) using non-maximum
suppression, and we include the resulting annotations at the next training
iteration. We demonstrate that our self-paced learning strategy brings
significant performance gains on two data sets of historical documents,
improving the average precision of YOLOv4 by more than 12% on one data set
and 39% on the other.
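The curriculum in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the detector interface (`train_fn`, `predict_fn`), the IoU threshold, and all function names are assumptions.

```python
# Sketch of the self-paced curriculum described in the abstract.
# train_fn / predict_fn stand in for an assumed detector interface
# (e.g. a YOLOv4 wrapper); the 0.5 IoU threshold is also an assumption.

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def nms_merge(gt_boxes, pseudo_boxes, iou_thr=0.5):
    """Combine ground truth with pseudo-boxes via non-maximum suppression:
    every GT box is kept; a pseudo-box (box, score) is added only if it
    does not overlap an already-kept box above iou_thr."""
    kept = list(gt_boxes)
    for box, _score in sorted(pseudo_boxes, key=lambda p: -p[1]):
        if all(iou(box, k) < iou_thr for k in kept):
            kept.append(box)
    return kept

def self_paced_training(pages, k, train_fn, predict_fn):
    """pages: list of (image, gt_boxes). Sort pages by ground-truth box
    count (descending), split into k batches, and train over k iterations,
    progressively adding batches with fewer annotations. After each
    iteration, labels of active pages are augmented with pseudo-boxes."""
    ordered = sorted(pages, key=lambda p: -len(p[1]))
    size = -(-len(ordered) // k)  # ceiling division
    batches = [ordered[i * size:(i + 1) * size] for i in range(k)]
    active = []
    for batch in batches:
        active.extend(batch)
        train_fn(active)  # fine-tune the detector on current annotations
        # merge model predictions into the labels for the next iteration
        for j, (img, boxes) in enumerate(active):
            active[j] = (img, nms_merge(boxes, predict_fn(img)))
    return active
```

Keeping ground truth ahead of pseudo-boxes in the merge reflects the conjecture above: annotated boxes are trusted, and pseudo-boxes only fill in regions the annotators likely missed.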
Related papers
- Adaptive Pre-training Data Detection for Large Language Models via Surprising Tokens [1.2549198550400134]
Large language models (LLMs) are extensively used, but there are concerns regarding privacy, security, and copyright due to their opaque training data.
Current solutions to this problem leverage techniques explored in machine learning privacy, such as Membership Inference Attacks (MIAs).
We propose an adaptive pre-training data detection method that alleviates this reliance and effectively amplifies identification.
arXiv Detail & Related papers (2024-07-30T23:43:59Z)
- A Fixed-Point Approach to Unified Prompt-Based Counting [51.20608895374113]
This paper aims to establish a comprehensive prompt-based counting framework capable of generating density maps for objects indicated by various prompt types, such as box, point, and text.
Our model excels in prominent class-agnostic datasets and exhibits superior performance in cross-dataset adaptation tasks.
arXiv Detail & Related papers (2024-03-15T12:05:44Z)
- Enhancing Consistency and Mitigating Bias: A Data Replay Approach for Incremental Learning [100.7407460674153]
Deep learning systems are prone to catastrophic forgetting when learning from a sequence of tasks.
To mitigate the problem, a line of methods propose to replay the data of experienced tasks when learning new tasks.
However, this is often impractical due to memory constraints or data privacy issues.
As a replacement, data-free data replay methods are proposed by inverting samples from the classification model.
arXiv Detail & Related papers (2024-01-12T12:51:12Z)
- One-bit Supervision for Image Classification: Problem, Solution, and Beyond [114.95815360508395]
This paper presents one-bit supervision, a novel setting of learning with fewer labels, for image classification.
We propose a multi-stage training paradigm and incorporate negative label suppression into an off-the-shelf semi-supervised learning algorithm.
In multiple benchmarks, the learning efficiency of the proposed approach surpasses that using full-bit, semi-supervised supervision.
arXiv Detail & Related papers (2023-11-26T07:39:00Z)
- DST-Det: Simple Dynamic Self-Training for Open-Vocabulary Object Detection [72.25697820290502]
This work introduces a straightforward and efficient strategy to identify potential novel classes through zero-shot classification.
We refer to this approach as the self-training strategy, which enhances recall and accuracy for novel classes without requiring extra annotations, datasets, and re-training.
Empirical evaluations on three datasets, including LVIS, V3Det, and COCO, demonstrate significant improvements over the baseline performance.
arXiv Detail & Related papers (2023-10-02T17:52:24Z)
- Adaptive Cross Batch Normalization for Metric Learning [75.91093210956116]
Metric learning is a fundamental problem in computer vision.
We show that it is equally important to ensure that the accumulated embeddings are up to date.
In particular, it is necessary to circumvent the representational drift between the accumulated embeddings and the feature embeddings at the current training iteration.
arXiv Detail & Related papers (2023-03-30T03:22:52Z)
- Robustifying Sentiment Classification by Maximally Exploiting Few Counterfactuals [16.731183915325584]
We propose a novel solution that only requires annotation of a small fraction of the original training data.
We achieve noticeable accuracy improvements by adding only 1% manual counterfactuals.
arXiv Detail & Related papers (2022-10-21T08:30:09Z)
- Annotation Error Detection: Analyzing the Past and Present for a More Coherent Future [63.99570204416711]
We reimplement 18 methods for detecting potential annotation errors and evaluate them on 9 English datasets.
We define a uniform evaluation setup including a new formalization of the annotation error detection task.
We release our datasets and implementations in an easy-to-use and open source software package.
arXiv Detail & Related papers (2022-06-05T22:31:45Z)
- Using Self-Supervised Pretext Tasks for Active Learning [7.214674613451605]
We propose a novel active learning approach that utilizes self-supervised pretext tasks and a unique data sampler to select data that are both difficult and representative.
The pretext task learner is trained on the unlabeled set, and the unlabeled data are sorted and grouped into batches by their pretext task losses.
In each iteration, the main task model is used to sample the most uncertain data in a batch to be annotated.
arXiv Detail & Related papers (2022-01-19T07:58:06Z)
- Boosting offline handwritten text recognition in historical documents with few labeled lines [5.9207487081080705]
First, we analyze how to perform transfer learning (TL) from a massive database to a smaller historical database.
Second, we analyze methods to efficiently combine TL and data augmentation.
An algorithm to mitigate the effects of incorrect labelings in the training set is proposed.
arXiv Detail & Related papers (2020-12-04T11:59:35Z)
- Bootstrapping Weakly Supervised Segmentation-free Word Spotting through HMM-based Alignment [0.5076419064097732]
We propose an approach that utilises transcripts without bounding box annotations to train word spotting models.
This is done through a training-free alignment procedure based on hidden Markov models.
We believe that this will be a significant advance towards a more general use of word spotting, since digital transcription data will already exist for parts of many collections of interest.
arXiv Detail & Related papers (2020-03-24T19:41:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.