A Comprehensive Comparison of End-to-End Approaches for Handwritten
Digit String Recognition
- URL: http://arxiv.org/abs/2010.15904v1
- Date: Thu, 29 Oct 2020 19:38:08 GMT
- Title: A Comprehensive Comparison of End-to-End Approaches for Handwritten
Digit String Recognition
- Authors: Andre G. Hochuli, Alceu S. Britto Jr, David A. Saji, Jose M. Saavedra,
Robert Sabourin, Luiz S. Oliveira
- Abstract summary: We evaluate different end-to-end approaches to solve the HDSR problem, particularly in two verticals: those based on object-detection and sequence-to-sequence representation.
Our results show that the Yolo model compares favorably against segmentation-free models with the advantage of having a shorter pipeline.
- Score: 21.522563264752577
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Over the last decades, most approaches proposed for handwritten digit string
recognition (HDSR) have resorted to digit segmentation, which is dominated by
heuristics, thereby imposing substantial constraints on the final performance.
Few of them have been based on segmentation-free strategies where each pixel
column has a potential cut location. Recently, segmentation-free strategies have
added another perspective to the problem, leading to promising results.
However, these strategies still show some limitations when dealing with a large
number of touching digits. To bridge the resulting gap, in this paper, we
hypothesize that a string of digits can be approached as a sequence of objects.
We thus evaluate different end-to-end approaches to solve the HDSR problem,
particularly in two verticals: those based on object-detection (e.g., Yolo and
RetinaNet) and those based on sequence-to-sequence representation (CRNN). The
main contribution of this work lies in its provision of a comprehensive
comparison with a critical analysis of the above-mentioned strategies on five
benchmarks commonly used to assess HDSR, including the challenging Touching
Pair dataset, NIST SD19, and two real-world datasets (CAR and CVL) proposed for
the ICFHR 2014 competition on HDSR. Our results show that the Yolo model
compares favorably against segmentation-free models with the advantage of
having a shorter pipeline that minimizes the presence of heuristics-based
models. It achieved recognition rates of 97%, 96%, and 84% on the NIST-SD19,
CAR, and CVL datasets, respectively.
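The core idea of treating a digit string as a sequence of objects can be sketched in a few lines: run a detector, then read the surviving boxes left to right. A minimal illustration, assuming a hypothetical detection format of `(x_min, y_min, x_max, y_max, digit, score)` tuples rather than the output of the paper's actual Yolo model:

```python
# Minimal sketch: recover a digit string from object-detection output.
# The detection tuples here are hypothetical placeholders, not the
# paper's actual model output.

def string_from_detections(detections, min_score=0.5):
    """detections: list of (x_min, y_min, x_max, y_max, digit, score)."""
    kept = [d for d in detections if d[5] >= min_score]
    # Order digits left to right by horizontal box center.
    kept.sort(key=lambda d: (d[0] + d[2]) / 2.0)
    return "".join(str(d[4]) for d in kept)

# Example: three boxes detected out of order, one low-confidence box dropped.
dets = [
    (40, 5, 60, 30, 7, 0.92),
    (5, 4, 22, 31, 3, 0.95),
    (24, 6, 38, 29, 1, 0.88),
    (70, 8, 80, 28, 9, 0.30),  # below threshold, rejected
]
print(string_from_detections(dets))  # "317"
```

This is also where the "shorter pipeline" advantage shows up: there is no segmentation or cut-hypothesis step between detection and the final string.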
Related papers
- Chain-of-Retrieval Augmented Generation [72.06205327186069]
This paper introduces an approach for training o1-like RAG models that retrieve and reason over relevant information step by step before generating the final answer.
Our proposed method, CoRAG, allows the model to dynamically reformulate the query based on the evolving state.
arXiv Detail & Related papers (2025-01-24T09:12:52Z)
- A Simple and Generalist Approach for Panoptic Segmentation [57.94892855772925]
Generalist vision models aim for one and the same architecture for a variety of vision tasks.
While such a shared architecture may seem attractive, generalist models tend to be outperformed by their bespoke counterparts.
We address this problem by introducing two key contributions, without compromising the desirable properties of generalist models.
arXiv Detail & Related papers (2024-08-29T13:02:12Z)
- Frequency-based Matcher for Long-tailed Semantic Segmentation [22.199174076366003]
We focus on a relatively under-explored task setting, long-tailed semantic segmentation (LTSS).
We propose a dual-metric evaluation system and construct the LTSS benchmark to demonstrate the performance of semantic segmentation methods and long-tailed solutions.
We also propose a transformer-based algorithm to improve LTSS, frequency-based matcher, which solves the oversuppression problem by one-to-many matching.
arXiv Detail & Related papers (2024-06-06T09:57:56Z)
- Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning [52.06176253457522]
We propose a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning.
CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A.
arXiv Detail & Related papers (2023-08-18T13:13:09Z)
- Deep Active Ensemble Sampling For Image Classification [8.31483061185317]
Active learning frameworks aim to reduce the cost of data annotation by actively requesting the labeling for the most informative data points.
Proposed approaches include uncertainty-based techniques, geometric methods, and implicit combinations of the two.
We present an innovative integration of recent progress in both uncertainty-based and geometric frameworks to enable an efficient exploration/exploitation trade-off in sample selection strategy.
Our framework provides two advantages: (1) accurate posterior estimation, and (2) tune-able trade-off between computational overhead and higher accuracy.
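The uncertainty-based half of such a framework can be illustrated with plain entropy-based sample selection. A minimal sketch, not the paper's method: rank unlabeled samples by the entropy of their predicted class distributions and request labels for the most uncertain ones.

```python
import math

def entropy(probs):
    """Shannon entropy of a discrete class-probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_most_uncertain(pool_probs, k):
    """Return indices of the k samples whose predictions are most uncertain.
    pool_probs: list of per-sample class-probability lists."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: entropy(pool_probs[i]),
                    reverse=True)
    return ranked[:k]

pool = [
    [0.98, 0.01, 0.01],  # confident prediction -> low entropy
    [0.34, 0.33, 0.33],  # near-uniform -> high entropy
    [0.70, 0.20, 0.10],
]
print(select_most_uncertain(pool, 1))  # [1]
```

Geometric criteria (e.g., distance to already-labeled points) would then be combined with this score to trade exploration off against exploitation.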
arXiv Detail & Related papers (2022-10-11T20:20:20Z)
- On the Eigenvalues of Global Covariance Pooling for Fine-grained Visual Recognition [65.67315418971688]
We show that truncating small eigenvalues of Global Covariance Pooling (GCP) can attain a smoother gradient.
On fine-grained datasets, however, truncating the small eigenvalues would make the model fail to converge.
Inspired by this observation, we propose a network branch dedicated to magnifying the importance of small eigenvalues.
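The eigenvalue manipulation at the heart of this line of work is easy to sketch. The snippet below is an illustration, not the paper's proposed branch: it builds a covariance representation and raises its eigenvalues to a power below one, which relatively amplifies the small eigenvalues instead of truncating them.

```python
import numpy as np

def gcp_power_norm(features, alpha=0.5):
    """Global covariance pooling with eigenvalue power normalization.
    features: (n, d) array of local descriptors.
    Raising eigenvalues to alpha < 1 boosts the relative weight of
    small eigenvalues rather than discarding them."""
    x = features - features.mean(axis=0, keepdims=True)
    cov = x.T @ x / max(len(features) - 1, 1)
    # Eigendecompose the symmetric covariance matrix.
    vals, vecs = np.linalg.eigh(cov)
    vals = np.clip(vals, 0.0, None) ** alpha
    # Reassemble: V diag(vals^alpha) V^T.
    return (vecs * vals) @ vecs.T

rng = np.random.default_rng(0)
feats = rng.standard_normal((64, 8))
rep = gcp_power_norm(feats)
print(rep.shape)  # (8, 8)
```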
arXiv Detail & Related papers (2022-05-26T11:41:36Z)
- Large-scale Unsupervised Semantic Segmentation [163.3568726730319]
We propose a new problem of large-scale unsupervised semantic segmentation (LUSS) with a newly created benchmark dataset to track the research progress.
Based on the ImageNet dataset, we propose the ImageNet-S dataset with 1.2 million training images and 40k high-quality semantic segmentation annotations for evaluation.
arXiv Detail & Related papers (2021-06-06T15:02:11Z)
- The Little W-Net That Could: State-of-the-Art Retinal Vessel Segmentation with Minimalistic Models [19.089445797922316]
We show that a minimalistic version of a standard U-Net with several orders of magnitude less parameters closely approximates the performance of current best techniques.
We also propose a simple extension, dubbed W-Net, which reaches outstanding performance on several popular datasets.
We also test our approach on the Artery/Vein segmentation problem, where we again achieve results well-aligned with the state-of-the-art.
arXiv Detail & Related papers (2020-09-03T19:59:51Z)
- The Devil is in Classification: A Simple Framework for Long-tail Object Detection and Instance Segmentation [93.17367076148348]
We investigate performance drop of the state-of-the-art two-stage instance segmentation model Mask R-CNN on the recent long-tail LVIS dataset.
We unveil that a major cause is the inaccurate classification of object proposals.
We propose a simple calibration framework to more effectively alleviate classification head bias with a bi-level class balanced sampling approach.
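The sampling side of such a calibration scheme can be illustrated directly. A minimal sketch, not the paper's bi-level procedure: weight each training example inversely to its class frequency, so tail classes are drawn about as often as head classes.

```python
import random
from collections import Counter

def class_balanced_sample(labels, k, rng=random.Random(0)):
    """Draw k example indices with per-example weight 1 / class frequency,
    so every class contributes the same total sampling mass."""
    freq = Counter(labels)
    weights = [1.0 / freq[y] for y in labels]
    return rng.choices(range(len(labels)), weights=weights, k=k)

# A 90/10 head/tail split: each class gets total weight 1, so the tail
# class should account for roughly half of the draws.
labels = ["head"] * 90 + ["tail"] * 10
draw = class_balanced_sample(labels, 1000)
tail_share = sum(labels[i] == "tail" for i in draw) / len(draw)
print(round(tail_share, 2))  # roughly 0.5
```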
arXiv Detail & Related papers (2020-07-23T12:49:07Z)
- Learning Robust Feature Representations for Scene Text Detection [0.0]
We present a network architecture derived from the loss to maximize conditional log-likelihood.
By extending the layer of latent variables to multiple layers, the network is able to learn robust features on scale.
In experiments, the proposed algorithm significantly outperforms state-of-the-art methods in terms of both recall and precision.
arXiv Detail & Related papers (2020-05-26T01:06:47Z)
- Equalization Loss for Long-Tailed Object Recognition [109.91045951333835]
State-of-the-art object detection methods still perform poorly on large vocabulary and long-tailed datasets.
We propose a simple but effective loss, named equalization loss, to tackle the problem of long-tailed rare categories.
Our method achieves AP gains of 4.1% and 4.8% for the rare and common categories on the challenging LVIS benchmark.
arXiv Detail & Related papers (2020-03-11T09:14:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.