Related papers: Boosting CNN-based Handwriting Recognition Systems with Learnable Relaxation Labeling

Boosting CNN-based Handwriting Recognition Systems with Learnable Relaxation Labeling

URL: http://arxiv.org/abs/2409.05699v1
Date: Mon, 9 Sep 2024 15:12:28 GMT
Title: Boosting CNN-based Handwriting Recognition Systems with Learnable Relaxation Labeling
Authors: Sara Ferro, Alessandro Torcinovich, Arianna Traviglia, Marcello Pelillo,
Abstract summary: We propose a novel approach to handwriting recognition that integrates the strengths of two distinct methodologies. We introduce a sparsification technique that accelerates the convergence of the algorithm and enhances the overall system's performance.
Score: 48.78361527873024
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: The primary challenge for handwriting recognition systems lies in managing long-range contextual dependencies, an issue that traditional models often struggle with. To mitigate it, attention mechanisms have recently been employed to enhance context-aware labelling, thereby achieving state-of-the-art performance. In the field of pattern recognition and image analysis, however, the use of contextual information in labelling problems has a long history and goes back at least to the early 1970's. Among the various approaches developed in those years, Relaxation Labelling (RL) processes have played a prominent role and have been the method of choice in the field for more than a decade. Contrary to recent transformer-based architectures, RL processes offer a principled approach to the use of contextual constraints, having a solid theoretic foundation grounded on variational inequality and game theory, as well as effective algorithms with convergence guarantees. In this paper, we propose a novel approach to handwriting recognition that integrates the strengths of two distinct methodologies. In particular, we propose integrating (trainable) RL processes with various well-established neural architectures and we introduce a sparsification technique that accelerates the convergence of the algorithm and enhances the overall system's performance. Experiments over several benchmark datasets show that RL processes can improve the generalisation ability, even surpassing in some cases transformer-based architectures.

Related papers

Towards Transformer-Based Aligned Generation with Self-Coherence Guidance [51.42269790543461]
We introduce a training-free approach for enhancing alignment in Transformer-based Text-Guided Diffusion Models (TGDMs) Existing TGDMs often struggle to generate semantically aligned images, particularly when dealing with complex text prompts or multi-concept attribute binding challenges. Our method addresses these challenges by directly optimizing cross-attention maps during the generation process.
arXiv Detail & Related papers (2025-03-22T07:03:57Z)
Speculative Decoding and Beyond: An In-Depth Survey of Techniques [4.165029665035158]
Sequential dependencies present a fundamental bottleneck in deploying large-scale autoregressive models. Recent advances in generation-refinement frameworks demonstrate that this trade-off can be significantly mitigated.
arXiv Detail & Related papers (2025-02-27T03:53:45Z)
A Comprehensive Framework for Semantic Similarity Analysis of Human and AI-Generated Text Using Transformer Architectures and Ensemble Techniques [40.704014941800594]
Traditional methods fail to capture nuanced semantic differences between human and machine-generated content. We propose a novel approach that combines a pre-trained DeBERTa-v3-large model, Bi-directional LSTMs, and linear attention pooling to capture both local and global semantic patterns. Experimental results show that this approach works better than traditional methods, proving its usefulness for AI-generated text detection and other text comparison tasks.
arXiv Detail & Related papers (2025-01-24T07:07:37Z)
From Noise to Nuance: Advances in Deep Generative Image Models [8.802499769896192]
Deep learning-based image generation has undergone a paradigm shift since 2021. Recent developments in Stable Diffusion, DALL-E, and consistency models have redefined the capabilities and performance boundaries of image synthesis. We investigate how enhanced multi-modal understanding and zero-shot generation capabilities are reshaping practical applications across industries.
arXiv Detail & Related papers (2024-12-12T02:09:04Z)
Symbolic-AI-Fusion Deep Learning (SAIF-DL): Encoding Knowledge into Training with Answer Set Programming Loss Penalties by a Novel Loss Function Approach [0.7420433640907689]
We encode domain-specific constraints, rules, and logical reasoning directly into the model's learning process. The proposed approach is flexible and applicable to both regression and classification tasks. The design allows for the automation of the loss function by simply updating the ASP rules.
arXiv Detail & Related papers (2024-11-13T09:33:33Z)
Mind the Gap: A Generalized Approach for Cross-Modal Embedding Alignment [0.0]
Retrieval-Augmented Generation (RAG) systems retrieve context across different text modalities due to semantic gaps. We introduce a generalized projection-based method, inspired by adapter modules in transfer learning, that efficiently bridges these gaps. Our approach emphasizes speed, accuracy, and data efficiency, requiring minimal resources for training and inference.
arXiv Detail & Related papers (2024-10-30T20:28:10Z)
A Distribution-Aware Flow-Matching for Generating Unstructured Data for Few-Shot Reinforcement Learning [1.0709300917082865]
We introduce a distribution-aware flow matching approach to generate synthetic unstructured data for few-shot reinforcement learning. Our approach addresses key challenges in traditional model-based RL, such as overfitting and data correlation. Results demonstrate that our method achieves stable convergence in terms of maximum Q-value while enhancing frame rates by 30% in the initial timestamps.
arXiv Detail & Related papers (2024-09-21T15:50:59Z)
LInK: Learning Joint Representations of Design and Performance Spaces through Contrastive Learning for Mechanism Synthesis [15.793704096341523]
In this paper, we introduce LInK, a novel framework that integrates contrastive learning of performance and design space with optimization techniques. By leveraging a multimodal and transformation-invariant contrastive learning framework, LInK learns a joint representation that captures complex physics and design representations of mechanisms. Our results demonstrate that LInK not only advances the field of mechanism design but also broadens the applicability of contrastive learning and optimization to other areas of engineering.
arXiv Detail & Related papers (2024-05-31T03:04:57Z)
REX: Rapid Exploration and eXploitation for AI Agents [103.68453326880456]
We propose an enhanced approach for Rapid Exploration and eXploitation for AI Agents called REX. REX introduces an additional layer of rewards and integrates concepts similar to Upper Confidence Bound (UCB) scores, leading to more robust and efficient AI agent performance.
arXiv Detail & Related papers (2023-07-18T04:26:33Z)
Oracles & Followers: Stackelberg Equilibria in Deep Multi-Agent Reinforcement Learning [24.284863599920115]
We present a general framework for implementing Stackelberg equilibria search as a multi-agent RL problem. We discuss how previous approaches can be seen as specific instantiations of this framework.
arXiv Detail & Related papers (2022-10-19T23:04:16Z)
$\textit{latent}$-GLAT: Glancing at Latent Variables for Parallel Text Generation [65.29170569821093]
parallel text generation has received widespread attention due to its success in generation efficiency. In this paper, we propose $textitlatent$-GLAT, which employs the discrete latent variables to capture word categorical information. Experiment results show that our method outperforms strong baselines without the help of an autoregressive model.
arXiv Detail & Related papers (2022-04-05T07:34:12Z)
Text Generation with Efficient (Soft) Q-Learning [91.47743595382758]
Reinforcement learning (RL) offers a more flexible solution by allowing users to plug in arbitrary task metrics as reward. We introduce a new RL formulation for text generation from the soft Q-learning perspective. We apply the approach to a wide range of tasks, including learning from noisy/negative examples, adversarial attacks, and prompt generation.
arXiv Detail & Related papers (2021-06-14T18:48:40Z)
Single-Layer Vision Transformers for More Accurate Early Exits with Less Overhead [88.17413955380262]
We introduce a novel architecture for early exiting based on the vision transformer architecture. We show that our method works for both classification and regression problems. We also introduce a novel method for integrating audio and visual modalities within early exits in audiovisual data analysis.
arXiv Detail & Related papers (2021-05-19T13:30:34Z)
Weakly supervised cross-domain alignment with optimal transport [102.8572398001639]
Cross-domain alignment between image objects and text sequences is key to many visual-language tasks. This paper investigates a novel approach for the identification and optimization of fine-grained semantic similarities between image and text entities.
arXiv Detail & Related papers (2020-08-14T22:48:36Z)
Model-based Multi-Agent Reinforcement Learning with Cooperative Prioritized Sweeping [4.5497948012757865]
We present a new model-based reinforcement learning algorithm, Cooperative Prioritized Sweeping. The algorithm allows for sample-efficient learning on large problems by exploiting a factorization to approximate the value function. Our method outperforms the state-of-the-art algorithm sparse cooperative Q-learning algorithm, both on the well-known SysAdmin benchmark and randomized environments.
arXiv Detail & Related papers (2020-01-15T19:13:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.