Learning to Align: Addressing Character Frequency Distribution Shifts in Handwritten Text Recognition
- URL: http://arxiv.org/abs/2506.09846v1
- Date: Wed, 11 Jun 2025 15:20:30 GMT
- Title: Learning to Align: Addressing Character Frequency Distribution Shifts in Handwritten Text Recognition
- Authors: Panagiotis Kaliosis, John Pavlopoulos
- Abstract summary: Handwritten text recognition aims to convert visual input into machine-readable text. Character sets change over time, and character frequency distributions shift across historical periods or regions. We propose a novel loss function that incorporates the Wasserstein distance between the character frequency distribution of the predicted text and a target distribution.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Handwritten text recognition aims to convert visual input into machine-readable text, and it remains challenging due to the evolving and context-dependent nature of handwriting. Character sets change over time, and character frequency distributions shift across historical periods or regions, often causing models trained on broad, heterogeneous corpora to underperform on specific subsets. To tackle this, we propose a novel loss function that incorporates the Wasserstein distance between the character frequency distribution of the predicted text and a target distribution empirically derived from training data. By penalizing divergence from expected distributions, our approach enhances both accuracy and robustness under temporal and contextual intra-dataset shifts. Furthermore, we demonstrate that character distribution alignment can also improve existing models at inference time without requiring retraining by integrating it as a scoring function in a guided decoding scheme. Experimental results across multiple datasets and architectures confirm the effectiveness of our method in boosting generalization and performance. We open source our code at https://github.com/pkaliosis/fada.
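The linked repository is the authoritative implementation. As a rough, non-authoritative sketch of the decoding-time idea, the snippet below rescores candidate transcriptions by how far each one's character frequency distribution lies from a target distribution estimated on training text; the function names, the candidate interface, and the choice of SciPy's 1-D Wasserstein distance over character indices are all assumptions of this sketch, not the paper's formulation.

```python
# Illustrative sketch only; see https://github.com/pkaliosis/fada for the
# authors' implementation. Rescores beam candidates by the distance between
# their character frequency distribution and a target distribution.
from collections import Counter

from scipy.stats import wasserstein_distance


def char_distribution(text: str, alphabet: list[str]) -> list[float]:
    """Normalized character frequencies of `text` over a fixed alphabet."""
    counts = Counter(c for c in text if c in alphabet)
    total = sum(counts.values()) or 1
    return [counts[c] / total for c in alphabet]


def rescore(candidates: list[tuple[str, float]],
            target: list[float],
            alphabet: list[str],
            weight: float = 1.0) -> list[tuple[str, float]]:
    """Combine each candidate's log-score with a distribution penalty.

    NOTE: placing characters at integer positions 0..len(alphabet)-1 imposes
    an arbitrary ordering; it is a stand-in, not the paper's ground metric.
    """
    support = list(range(len(alphabet)))
    rescored = []
    for text, log_score in candidates:
        pred = char_distribution(text, alphabet)
        if not any(pred):  # empty or fully out-of-alphabet candidate
            rescored.append((text, float("-inf")))
            continue
        penalty = wasserstein_distance(support, support, pred, target)
        rescored.append((text, log_score - weight * penalty))
    return sorted(rescored, key=lambda item: item[1], reverse=True)
```

A training-time analogue would add the same distance as a penalty term in the loss, as the abstract describes; the repository is the reference for the exact formulation.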
Related papers
- Semantic-guided Fine-tuning of Foundation Model for Long-tailed Visual Recognition [38.74388860692423]
We propose Sage, a novel approach for semantic-guided fine-tuning of foundation models for long-tailed visual recognition. We introduce an SG-Adapter that integrates class descriptions as semantic guidance for fine-tuning the visual encoder. Experiments on benchmark datasets demonstrate the effectiveness of Sage in long-tailed learning.
arXiv Detail & Related papers (2025-07-17T05:47:19Z)
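The summary above leaves the SG-Adapter's architecture open. Purely as a hedged illustration of conditioning a visual encoder on class-description semantics, one could gate pooled visual features with projected text embeddings; every name and shape below is invented for the sketch and is not the paper's design.

```python
# Hypothetical semantic-guided adapter; Sage's actual SG-Adapter may differ.
import torch
import torch.nn as nn


class SemanticGuidedAdapter(nn.Module):
    """Modulates visual features with embeddings of class descriptions."""

    def __init__(self, visual_dim: int, text_dim: int, hidden: int = 256):
        super().__init__()
        self.proj = nn.Linear(text_dim, visual_dim)
        self.gate = nn.Sequential(
            nn.Linear(2 * visual_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, visual_dim),
            nn.Sigmoid(),
        )

    def forward(self, visual: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        # visual: (B, visual_dim) pooled image features
        # text:   (B, text_dim) class-description embeddings
        sem = self.proj(text)
        g = self.gate(torch.cat([visual, sem], dim=-1))
        return visual + g * sem  # residual update, gated by semantics
```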
- Navigating Semantic Drift in Task-Agnostic Class-Incremental Learning [51.177789437682954]
Class-incremental learning (CIL) seeks to enable a model to sequentially learn new classes while retaining knowledge of previously learned ones. Balancing flexibility and stability remains a significant challenge, particularly when the task ID is unknown. We propose a novel semantic drift calibration method that incorporates mean shift compensation and covariance calibration.
arXiv Detail & Related papers (2025-02-11T13:57:30Z)
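Mean-shift compensation for class prototypes has a well-known generic form: shift each stored prototype by a locally weighted average of the feature drift observed on current data. The sketch below shows that generic form under a Gaussian weighting assumption; it illustrates the ingredient named above, not the paper's exact calibration.

```python
# Generic mean-shift compensation for class prototypes (illustrative).
import torch


def compensate_prototypes(prototypes: torch.Tensor,
                          feats_old: torch.Tensor,
                          feats_new: torch.Tensor,
                          sigma: float = 1.0) -> torch.Tensor:
    """Shift stored prototypes by a locally weighted estimate of drift.

    prototypes: (C, D) class means in the old feature space.
    feats_old / feats_new: (N, D) the same samples embedded by the old and
    the updated encoder; their difference approximates the semantic drift.
    """
    drift = feats_new - feats_old                     # (N, D)
    d2 = torch.cdist(prototypes, feats_old) ** 2      # (C, N)
    w = torch.softmax(-d2 / (2 * sigma ** 2), dim=1)  # closer samples weigh more
    return prototypes + w @ drift                     # (C, D)
```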
- Mind the Gap: A Generalized Approach for Cross-Modal Embedding Alignment [0.0]
Retrieval-Augmented Generation (RAG) systems often struggle to retrieve context across different text modalities because of semantic gaps.
We introduce a generalized projection-based method, inspired by adapter modules in transfer learning, that efficiently bridges these gaps.
Our approach emphasizes speed, accuracy, and data efficiency, requiring minimal resources for training and inference.
arXiv Detail & Related papers (2024-10-30T20:28:10Z)
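A minimal instance of projection-based alignment, assuming paired embeddings of the same items from the two spaces, is a ridge-regularized linear map fit by least squares; the paper's adapter-inspired method is more general, so treat this as a baseline sketch only.

```python
# Least-squares projection between two embedding spaces (illustrative).
import numpy as np


def fit_projection(src: np.ndarray, tgt: np.ndarray, reg: float = 1e-3) -> np.ndarray:
    """Fit W such that src @ W approximates tgt (ridge-regularized).

    src: (N, d_src) embeddings from one modality or model.
    tgt: (N, d_tgt) embeddings of the same N items from the other.
    """
    d = src.shape[1]
    gram = src.T @ src + reg * np.eye(d)
    return np.linalg.solve(gram, src.T @ tgt)  # (d_src, d_tgt)


# Hypothetical usage: W = fit_projection(query_embs, passage_embs)
#                     aligned = new_query_embs @ W  # search in target space
```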
- AITTI: Learning Adaptive Inclusive Token for Text-to-Image Generation [53.65701943405546]
We learn adaptive inclusive tokens to shift the attribute distribution of the final generative outputs.
Our method requires neither explicit attribute specification nor prior knowledge of the bias distribution.
Our method achieves comparable performance to models that require specific attributes or editing directions for generation.
arXiv Detail & Related papers (2024-06-18T17:22:23Z)
- Improving Sampling Methods for Fine-tuning SentenceBERT in Text Streams [49.3179290313959]
This study explores the efficacy of seven text sampling methods designed to selectively fine-tune language models.
We precisely assess the impact of these methods on fine-tuning the SBERT model using four different loss functions.
Our findings indicate that Softmax loss and Batch All Triplets loss are particularly effective for text stream classification.
arXiv Detail & Related papers (2024-03-18T23:41:52Z)
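The Batch All triplet loss named above has a standard generic form: average the hinge loss over every valid (anchor, positive, negative) triple within a batch. The sketch below implements that generic form; the study's exact mining, margin, and distance settings may differ.

```python
# Generic "Batch All" triplet loss (illustrative, not the paper's code).
import torch


def batch_all_triplet_loss(emb: torch.Tensor, labels: torch.Tensor,
                           margin: float = 0.5) -> torch.Tensor:
    """Average hinge loss over all valid triplets in the batch.

    emb: (B, D) sentence embeddings; labels: (B,) class ids.
    """
    dist = torch.cdist(emb, emb)                       # (B, B) pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)  # (B, B) same-class mask
    eye = torch.eye(len(labels), dtype=torch.bool, device=emb.device)
    pos, neg = same & ~eye, ~same
    # tri[a, p, n] = d(a, p) - d(a, n) + margin
    tri = dist.unsqueeze(2) - dist.unsqueeze(1) + margin
    valid = pos.unsqueeze(2) & neg.unsqueeze(1)        # (B, B, B) triplet mask
    losses = torch.relu(tri[valid])
    active = losses > 0
    return losses[active].mean() if active.any() else losses.sum()
```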
- Text2Data: Low-Resource Data Generation with Textual Control [100.5970757736845]
Text2Data is a novel approach that utilizes unlabeled data to understand the underlying data distribution. It then undergoes fine-tuning via a novel constraint-optimization-based learning objective that ensures controllability and effectively counteracts catastrophic forgetting.
arXiv Detail & Related papers (2024-02-08T03:41:39Z)
- Robust Novelty Detection through Style-Conscious Feature Ranking [7.691679448855549]
We advocate for a formal distinction between task-relevant semantic or content changes and irrelevant style changes. This distinction forms the basis for robust novelty detection, emphasizing the identification of semantic changes that are resilient to style distributional shifts. We introduce Stylist, a method that utilizes pretrained large-scale model representations to selectively discard environment-biased features.
arXiv Detail & Related papers (2023-10-05T17:58:32Z)
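One concrete way to rank environment-biased features, consistent with the summary above though not necessarily identical to Stylist's criterion, is to score each feature dimension by how much its marginal distribution shifts between training environments, then drop the highest-scoring dimensions.

```python
# Per-dimension environment-shift ranking (illustrative).
import numpy as np
from scipy.stats import wasserstein_distance


def environment_bias_scores(feats_by_env: list[np.ndarray]) -> np.ndarray:
    """Score each dimension of pretrained representations by its average
    1-D distribution shift across environment pairs (higher = more biased).

    feats_by_env: list of (N_i, D) feature arrays, one per environment.
    """
    n_env, dim = len(feats_by_env), feats_by_env[0].shape[1]
    scores = np.zeros(dim)
    for i in range(n_env):
        for j in range(i + 1, n_env):
            for d in range(dim):
                scores[d] += wasserstein_distance(
                    feats_by_env[i][:, d], feats_by_env[j][:, d])
    return scores / max(1, n_env * (n_env - 1) // 2)


# Hypothetical usage: keep the 80% least environment-biased dimensions.
# bias = environment_bias_scores(envs)
# idx = np.argsort(bias)[: int(0.8 * len(bias))]
```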
- Optimizing Factual Accuracy in Text Generation through Dynamic Knowledge Selection [71.20871905457174]
Language models (LMs) have revolutionized the way we interact with information, but they often generate nonfactual text.
Previous methods use external knowledge as references to enhance factuality, but they often struggle when irrelevant references get mixed in.
We present DKGen, which divides text generation into an iterative process.
arXiv Detail & Related papers (2023-08-30T02:22:40Z)
- Evaluating Factual Consistency of Texts with Semantic Role Labeling [3.1776833268555134]
We introduce SRLScore, a reference-free evaluation metric designed with text summarization in mind.
A final factuality score is computed by an adjustable scoring mechanism.
Correlation with human judgments on English summarization datasets shows that SRLScore is competitive with state-of-the-art methods.
arXiv Detail & Related papers (2023-05-22T17:59:42Z)
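As a toy illustration of an adjustable tuple-scoring mechanism in the spirit of SRLScore: assume SRL fact tuples have already been extracted as role-to-string dictionaries, and score each summary fact by its best weighted match among the source facts. The roles and weights below are invented for the sketch; the paper defines its own roles, weights, and string-matching options.

```python
# Toy weighted tuple scoring (illustrative; SRL extraction elided).

ROLE_WEIGHTS = {"agent": 0.4, "verb": 0.3, "patient": 0.3}  # adjustable


def tuple_similarity(fact: dict, ref: dict) -> float:
    """Weighted per-role agreement between two SRL fact tuples."""
    return sum(w for role, w in ROLE_WEIGHTS.items()
               if fact.get(role) is not None and fact.get(role) == ref.get(role))


def factuality_score(summary_facts: list[dict], source_facts: list[dict]) -> float:
    """Average, over summary facts, of each fact's best match in the source."""
    if not summary_facts or not source_facts:
        return 0.0
    return sum(max(tuple_similarity(f, r) for r in source_facts)
               for f in summary_facts) / len(summary_facts)
```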
- Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlap frequently occurs in paired texts in natural language processing tasks such as text editing and semantic similarity evaluation.
This paper aims to address the issue with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions at their positions.
Experiments on Semantic Textual Similarity show that the resulting neighboring distribution divergence (NDD) is more sensitive to various semantic differences, especially on highly overlapped paired texts.
arXiv Detail & Related papers (2021-10-04T03:59:15Z)
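A hedged sketch of the mask-and-predict step: mask a word shared by both texts, read the masked language model's predicted distribution at that slot in each context, and compare the two distributions. The model choice and the symmetric KL comparison are assumptions of this sketch; the paper defines its own divergence (NDD) over the neighboring positions.

```python
# Mask-and-predict comparison of a shared word's contexts (illustrative).
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()


def masked_distribution(text: str, word: str) -> torch.Tensor:
    """Distribution the MLM predicts at `word`'s position once it is masked."""
    masked = text.replace(word, tok.mask_token, 1)
    ids = tok(masked, return_tensors="pt")
    pos = (ids["input_ids"][0] == tok.mask_token_id).nonzero()[0, 0]
    with torch.no_grad():
        logits = mlm(**ids).logits[0, pos]
    return torch.softmax(logits, dim=-1)


def shared_word_divergence(text_a: str, text_b: str, word: str) -> float:
    """Symmetric KL between the two contexts' predictions for a shared word."""
    p = masked_distribution(text_a, word)
    q = masked_distribution(text_b, word)

    def kl(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        return torch.sum(a * (a.clamp_min(1e-12).log() - b.clamp_min(1e-12).log()))

    return float(kl(p, q) + kl(q, p)) / 2
```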
- Collaborative Training of GANs in Continuous and Discrete Spaces for Text Generation [21.435286755934534]
We propose a novel text GAN architecture that promotes the collaborative training of the continuous-space and discrete-space methods.
Our model substantially outperforms state-of-the-art text GANs with respect to quality, diversity, and global consistency.
arXiv Detail & Related papers (2020-10-16T07:51:16Z)
- Heavy-tailed Representations, Text Polarity Classification & Data Augmentation [11.624944730002298]
We develop a novel method to learn a heavy-tailed embedding with desirable regularity properties.
A classifier dedicated to the tails of the proposed embedding is obtained, and it outperforms the baseline.
Numerical experiments on synthetic and real text data demonstrate the relevance of the proposed framework.
arXiv Detail & Related papers (2020-03-25T19:24:05Z)