Boosting CNN-based Handwriting Recognition Systems with Learnable Relaxation Labeling
- URL: http://arxiv.org/abs/2409.05699v1
- Date: Mon, 9 Sep 2024 15:12:28 GMT
- Title: Boosting CNN-based Handwriting Recognition Systems with Learnable Relaxation Labeling
- Authors: Sara Ferro, Alessandro Torcinovich, Arianna Traviglia, Marcello Pelillo
- Abstract summary: We propose a novel approach to handwriting recognition that integrates the strengths of two distinct methodologies.
We introduce a sparsification technique that accelerates the convergence of the algorithm and enhances the overall system's performance.
- Score: 48.78361527873024
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The primary challenge for handwriting recognition systems lies in managing long-range contextual dependencies, an issue that traditional models often struggle with. To mitigate it, attention mechanisms have recently been employed to enhance context-aware labelling, thereby achieving state-of-the-art performance. In the field of pattern recognition and image analysis, however, the use of contextual information in labelling problems has a long history, going back at least to the early 1970s. Among the various approaches developed in those years, Relaxation Labelling (RL) processes have played a prominent role and were the method of choice in the field for more than a decade. Contrary to recent transformer-based architectures, RL processes offer a principled approach to the use of contextual constraints, having a solid theoretical foundation grounded in variational inequalities and game theory, as well as effective algorithms with convergence guarantees. In this paper, we propose a novel approach to handwriting recognition that integrates the strengths of two distinct methodologies. In particular, we propose integrating (trainable) RL processes with various well-established neural architectures, and we introduce a sparsification technique that accelerates the convergence of the algorithm and enhances the overall system's performance. Experiments over several benchmark datasets show that RL processes can improve generalisation ability, in some cases even surpassing transformer-based architectures.
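The abstract describes combining trainable RL processes with neural backbones and a sparsification step, but does not spell out the update rule. The following is a minimal NumPy sketch of the classical nonlinear relaxation-labelling iteration that such a layer builds on, with a simple top-k pruning step standing in for the paper's (unspecified) sparsification technique; the function name, the top_k parameter, and the toy dimensions are illustrative assumptions, not the authors' implementation.

# Minimal sketch (not the authors' code) of a nonlinear relaxation-labelling
# update in the Rosenfeld-Hummel-Zucker style, plus a hypothetical top-k
# sparsification step of the kind the abstract alludes to.
import numpy as np

def relaxation_labelling(p, r, n_iters=10, top_k=None, eps=1e-12):
    """Iteratively refine label probabilities using contextual constraints.

    p : (n_objects, n_labels) initial label probabilities, e.g. CNN softmax
        outputs over characters for each frame of a handwriting image.
    r : (n_objects, n_labels, n_objects, n_labels) compatibility coefficients;
        r[i, lam, j, mu] is the support that object j having label mu gives to
        object i having label lam. In the trainable setting described in the
        abstract, these coefficients would be learned jointly with the network
        rather than fixed a priori.
    top_k : if set, keep only the k most probable labels per object after each
        update (a simple sparsification heuristic; the paper's exact scheme
        may differ).
    """
    for _ in range(n_iters):
        # Contextual support q[i, lam] = sum_j sum_mu r[i, lam, j, mu] * p[j, mu]
        q = np.einsum('iljm,jm->il', r, p)
        # Multiplicative update followed by per-object normalisation
        p = p * q
        if top_k is not None:
            # Zero out all but the top-k labels per object to speed convergence
            idx = np.argsort(p, axis=1)[:, :-top_k]
            np.put_along_axis(p, idx, 0.0, axis=1)
        p = p / (p.sum(axis=1, keepdims=True) + eps)
    return p

# Toy usage: 4 objects, 3 labels, random non-negative compatibilities
rng = np.random.default_rng(0)
p0 = rng.dirichlet(np.ones(3), size=4)            # initial beliefs
r = rng.uniform(0.0, 1.0, size=(4, 3, 4, 3))      # compatibility tensor
p_final = relaxation_labelling(p0, r, n_iters=20, top_k=2)
print(p_final.round(3))

With non-negative compatibilities this multiplicative scheme keeps each row of p a valid probability distribution, which is what gives classical RL processes their convergence guarantees; how the learned compatibilities and the sparsification interact in the proposed system is detailed in the paper itself.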
Related papers
- Mind the Gap: A Generalized Approach for Cross-Modal Embedding Alignment [0.0]
Retrieval-Augmented Generation (RAG) systems often struggle to retrieve context across different text modalities due to semantic gaps.
We introduce a generalized projection-based method, inspired by adapter modules in transfer learning, that efficiently bridges these gaps.
Our approach emphasizes speed, accuracy, and data efficiency, requiring minimal resources for training and inference.
arXiv Detail & Related papers (2024-10-30T20:28:10Z) - LInK: Learning Joint Representations of Design and Performance Spaces through Contrastive Learning for Mechanism Synthesis [15.793704096341523]
In this paper, we introduce LInK, a novel framework that integrates contrastive learning of performance and design space with optimization techniques.
By leveraging a multimodal and transformation-invariant contrastive learning framework, LInK learns a joint representation that captures complex physics and design representations of mechanisms.
Our results demonstrate that LInK not only advances the field of mechanism design but also broadens the applicability of contrastive learning and optimization to other areas of engineering.
arXiv Detail & Related papers (2024-05-31T03:04:57Z) - REX: Rapid Exploration and eXploitation for AI Agents [103.68453326880456]
We propose an enhanced approach for Rapid Exploration and eXploitation for AI Agents called REX.
REX introduces an additional layer of rewards and integrates concepts similar to Upper Confidence Bound (UCB) scores, leading to more robust and efficient AI agent performance.
arXiv Detail & Related papers (2023-07-18T04:26:33Z) - Oracles & Followers: Stackelberg Equilibria in Deep Multi-Agent
Reinforcement Learning [24.284863599920115]
We present a general framework for implementing Stackelberg equilibria search as a multi-agent RL problem.
We discuss how previous approaches can be seen as specific instantiations of this framework.
arXiv Detail & Related papers (2022-10-19T23:04:16Z) - $\textit{latent}$-GLAT: Glancing at Latent Variables for Parallel Text
Generation [65.29170569821093]
Parallel text generation has received widespread attention due to its success in generation efficiency.
In this paper, we propose $\textit{latent}$-GLAT, which employs discrete latent variables to capture word categorical information.
Experiment results show that our method outperforms strong baselines without the help of an autoregressive model.
arXiv Detail & Related papers (2022-04-05T07:34:12Z) - Hybridization of Capsule and LSTM Networks for unsupervised anomaly
detection on multivariate data [0.0]
This paper introduces a novel NN architecture which hybridises Long Short-Term Memory (LSTM) and Capsule Networks into a single network.
The proposed method uses an unsupervised learning technique to overcome the issues with finding large volumes of labelled training data.
arXiv Detail & Related papers (2022-02-11T10:33:53Z) - Text Generation with Efficient (Soft) Q-Learning [91.47743595382758]
Reinforcement learning (RL) offers a more flexible solution by allowing users to plug in arbitrary task metrics as reward.
We introduce a new RL formulation for text generation from the soft Q-learning perspective.
We apply the approach to a wide range of tasks, including learning from noisy/negative examples, adversarial attacks, and prompt generation.
arXiv Detail & Related papers (2021-06-14T18:48:40Z) - Single-Layer Vision Transformers for More Accurate Early Exits with Less
Overhead [88.17413955380262]
We introduce a novel architecture for early exiting based on the vision transformer architecture.
We show that our method works for both classification and regression problems.
We also introduce a novel method for integrating audio and visual modalities within early exits in audiovisual data analysis.
arXiv Detail & Related papers (2021-05-19T13:30:34Z) - Weakly supervised cross-domain alignment with optimal transport [102.8572398001639]
Cross-domain alignment between image objects and text sequences is key to many visual-language tasks.
This paper investigates a novel approach for the identification and optimization of fine-grained semantic similarities between image and text entities.
arXiv Detail & Related papers (2020-08-14T22:48:36Z) - Model-based Multi-Agent Reinforcement Learning with Cooperative
Prioritized Sweeping [4.5497948012757865]
We present a new model-based reinforcement learning algorithm, Cooperative Prioritized Sweeping.
The algorithm allows for sample-efficient learning on large problems by exploiting a factorization to approximate the value function.
Our method outperforms the state-of-the-art sparse cooperative Q-learning algorithm on both the well-known SysAdmin benchmark and randomized environments.
arXiv Detail & Related papers (2020-01-15T19:13:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.