Robust and Efficient Writer-Independent IMU-Based Handwriting Recognition
- URL: http://arxiv.org/abs/2502.20954v1
- Date: Fri, 28 Feb 2025 11:09:28 GMT
- Title: Robust and Efficient Writer-Independent IMU-Based Handwriting Recognition
- Authors: Jindong Li, Tim Hamann, Jens Barth, Peter Kaempf, Dario Zanca, Bjoern Eskofier
- Abstract summary: Online handwriting recognition (HWR) using data from inertial measurement units (IMUs) remains challenging. Traditional models often struggle to recognize handwriting from unseen writers. This paper presents an HWR model with an encoder-decoder structure for IMU data, featuring a CNN-based encoder for feature extraction and a BiLSTM decoder for sequence modeling.
- Score: 2.022040536958726
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Online handwriting recognition (HWR) using data from inertial measurement units (IMUs) remains challenging due to variations in writing styles and the limited availability of high-quality annotated datasets. Traditional models often struggle to recognize handwriting from unseen writers, making writer-independent (WI) recognition a crucial but difficult problem. This paper presents an HWR model with an encoder-decoder structure for IMU data, featuring a CNN-based encoder for feature extraction and a BiLSTM decoder for sequence modeling, which supports inputs of varying lengths. Our approach demonstrates strong robustness and data efficiency, outperforming existing methods on WI datasets, including the WI split of the OnHW dataset and our own dataset. Extensive evaluations show that our model maintains high accuracy across different age groups and writing conditions while effectively learning from limited data. Through comprehensive ablation studies, we analyze key design choices, achieving a balance between accuracy and efficiency. These findings contribute to the development of more adaptable and scalable HWR systems for real-world applications.
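The abstract describes the architecture only at a high level. As a concrete illustration, here is a minimal PyTorch sketch of a CNN encoder feeding a BiLSTM decoder over variable-length IMU streams; the channel count (13), layer sizes, and the CTC-style character head are assumptions made for the sketch, not details taken from the paper.

```python
import torch
import torch.nn as nn

class IMUHandwritingModel(nn.Module):
    """Illustrative CNN encoder + BiLSTM decoder for IMU handwriting data.

    All hyperparameters here are assumed for the sketch; the abstract does
    not specify them.
    """

    def __init__(self, in_channels: int = 13, num_classes: int = 60, hidden: int = 128):
        super().__init__()
        # 1-D convolutions over the time axis extract local motion features.
        self.encoder = nn.Sequential(
            nn.Conv1d(in_channels, 64, kernel_size=5, padding=2),
            nn.BatchNorm1d(64),
            nn.ReLU(),
            nn.MaxPool1d(2),  # halves the temporal resolution
            nn.Conv1d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm1d(128),
            nn.ReLU(),
        )
        # The BiLSTM models temporal context over the convolutional features.
        self.decoder = nn.LSTM(128, hidden, num_layers=2,
                               batch_first=True, bidirectional=True)
        # Per-timestep logits over characters plus a CTC blank (class 0).
        self.head = nn.Linear(2 * hidden, num_classes + 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels); Conv1d expects (batch, channels, time).
        feats = self.encoder(x.transpose(1, 2)).transpose(1, 2)
        out, _ = self.decoder(feats)
        return self.head(out)

logits = IMUHandwritingModel()(torch.randn(4, 500, 13))  # 4 clips of 500 IMU samples
print(logits.shape)  # torch.Size([4, 250, 61]): per-timestep character logits
```

Because every layer is convolutional, recurrent, or applied per timestep, the same weights accept any input length, which matches the variable-length support the abstract claims.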
Related papers
- QID: Efficient Query-Informed ViTs in Data-Scarce Regimes for OCR-free Visual Document Understanding [53.69841526266547]
Fine-tuning a pre-trained Vision-Language Model with new datasets often falls short in optimizing the vision encoder.
We introduce QID, a novel, streamlined, architecture-preserving approach that integrates query embeddings into the vision encoder.
arXiv Detail & Related papers (2025-04-03T18:47:16Z)
- Accurate Scene Text Recognition with Efficient Model Scaling and Cloze Self-Distillation [11.217033010884006]
We show that scaling the decoder yields significant performance gains, always exceeding those achieved by encoder scaling alone.
We also identify label noise as a key challenge in Scene Text Recognition, particularly in real-world data.
Our methodology achieves state-of-the-art performance on 10 out of 11 benchmarks using only real data, while significantly reducing the parameter size and computational costs.
arXiv Detail & Related papers (2025-03-20T14:35:46Z)
- Detecting Document-level Paraphrased Machine Generated Content: Mimicking Human Writing Style and Involving Discourse Features [57.34477506004105]
Machine-generated content poses challenges such as academic plagiarism and the spread of misinformation. We introduce novel methodologies and datasets to overcome these challenges. We propose MhBART, an encoder-decoder model designed to emulate human writing style. We also propose DTransformer, a model that integrates discourse analysis through PDTB preprocessing to encode structural features.
arXiv Detail & Related papers (2024-12-17T08:47:41Z)
- Scalable Weibull Graph Attention Autoencoder for Modeling Document Networks [50.42343781348247]
We develop a graph Poisson factor analysis (GPFA) which provides analytic conditional posteriors to improve the inference accuracy.
We also extend GPFA to a multi-stochastic-layer version named graph Poisson gamma belief network (GPGBN) to capture the hierarchical document relationships at multiple semantic levels.
Our models can extract high-quality hierarchical latent document representations and achieve promising performance on various graph analytic tasks.
arXiv Detail & Related papers (2024-10-13T02:22:14Z)
- Large Language Models for Page Stream Segmentation [0.03495246564946555]
Page Stream Segmentation (PSS) is an essential prerequisite for automated document processing at scale.
This paper introduces TABME++, an enhanced benchmark featuring commercial Optical Character Recognition (OCR) annotations.
We evaluate the performance of large language models (LLMs) on PSS, focusing on decoder-based models fine-tuned with parameter-efficient methods.
arXiv Detail & Related papers (2024-08-21T20:28:42Z)
- Bidirectional Trained Tree-Structured Decoder for Handwritten Mathematical Expression Recognition [51.66383337087724]
The Handwritten Mathematical Expression Recognition (HMER) task is a critical branch in the field of OCR.
Recent studies have demonstrated that incorporating bidirectional context information significantly improves the performance of HMER models.
We propose the Mirror-Flipped Symbol Layout Tree (MF-SLT) and Bidirectional Asynchronous Training (BAT) structure.
arXiv Detail & Related papers (2023-12-31T09:24:21Z)
- Disentanglement via Latent Quantization [60.37109712033694]
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space.
We demonstrate the broad applicability of this approach by adding it to both basic data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
arXiv Detail & Related papers (2023-05-28T06:30:29Z)
- How to Choose Pretrained Handwriting Recognition Models for Single Writer Fine-Tuning [23.274139396706264]
Recent advancements in Deep Learning-based Handwritten Text Recognition (HTR) have led to models with remarkable performance on modern and historical manuscripts.
Those models struggle to obtain the same performance when applied to manuscripts with peculiar characteristics, such as language, paper support, ink, and author handwriting.
In this paper, we take into account large, real benchmark datasets and synthetic ones obtained with a styled Handwritten Text Generation model.
We give a quantitative indication of the most relevant characteristics of such data for obtaining an HTR model able to effectively transcribe manuscripts in small collections with as few as five real fine-tuning lines.
arXiv Detail & Related papers (2023-05-04T07:00:28Z)
- Uncovering the Handwritten Text in the Margins: End-to-end Handwritten Text Detection and Recognition [0.840835093659811]
This work presents an end-to-end framework for automatic detection and recognition of handwritten marginalia.
It uses data augmentation and transfer learning to overcome training data scarcity.
The effectiveness of the proposed framework has been empirically evaluated on the data from early book collections found in the Uppsala University Library in Sweden.
arXiv Detail & Related papers (2023-03-10T14:00:53Z)
- Benchmarking Online Sequence-to-Sequence and Character-based Handwriting Recognition from IMU-Enhanced Pens [2.840092825973023]
This paper presents data and benchmark models for real-time sequence-to-sequence learning and single character-based recognition.
Data is recorded by a sensor-enhanced ballpoint pen whose accelerometers, magnetometer, and force sensor sample at 100 Hz.
We propose a variety of datasets, including equations and words, for both writer-dependent and writer-independent tasks; a hedged sketch of batching such variable-length sensor streams follows this list.
arXiv Detail & Related papers (2022-02-14T20:55:33Z)
- AttentionHTR: Handwritten Text Recognition Based on Attention Encoder-Decoder Networks [0.0]
This work proposes an attention-based sequence-to-sequence model for handwritten word recognition.
It exploits models pre-trained on scene text images as a starting point towards tailoring the handwriting recognition models.
The effectiveness of the proposed end-to-end HTR system has been empirically evaluated on a novel multi-writer dataset.
arXiv Detail & Related papers (2022-01-23T22:48:36Z)
- Rethinking Text Line Recognition Models [57.47147190119394]
We consider two decoder families (Connectionist Temporal Classification and Transformer) and three encoder modules (Bidirectional LSTMs, Self-Attention, and GRCLs).
We compare their accuracy and performance on widely used public datasets of scene and handwritten text.
Unlike the more common Transformer-based models, this architecture can handle inputs of arbitrary length.
arXiv Detail & Related papers (2021-04-15T21:43:13Z)
- Improve Variational Autoencoder for Text Generation with Discrete Latent Bottleneck [52.08901549360262]
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning.
With a strong auto-regressive decoder, VAEs tend to ignore the latent variables.
We propose a principled approach to enforce an implicit latent feature matching in a more compact latent space.
arXiv Detail & Related papers (2020-04-22T14:41:37Z)
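The IMU-pen benchmark above raises a practical question its snippet glosses over: how variable-length sensor recordings are batched for sequence-to-sequence training. Below is a hedged illustration using padding and PyTorch's built-in CTC loss; the 13-channel layout, the label encoding, and the stand-in linear classifier are assumptions for the example, not details from that paper.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence

# Two fake IMU recordings of different lengths (time, channels). The
# 13-channel layout and the class-id encoding below are assumptions.
recordings = [torch.randn(320, 13), torch.randn(480, 13)]
targets = [torch.tensor([8, 5, 12, 12, 15]),  # hypothetical ids for "hello"
           torch.tensor([23, 15, 18, 4])]     # hypothetical ids for "word"
num_classes = 60                              # characters; class 0 = CTC blank

# Pad recordings to a common length so the batch forms one tensor, and keep
# the true lengths so the CTC loss can ignore the padded timesteps.
batch = pad_sequence(recordings, batch_first=True)          # (2, 480, 13)
input_lengths = torch.tensor([len(r) for r in recordings])  # [320, 480]

# Stand-in per-timestep classifier; a CNN-BiLSTM model would go here.
classifier = nn.Linear(13, num_classes + 1)
log_probs = classifier(batch).log_softmax(-1).transpose(0, 1)  # CTC wants (T, N, C)

ctc = nn.CTCLoss(blank=0, zero_infinity=True)
loss = ctc(log_probs,
           pad_sequence(targets, batch_first=True),  # padded label batch
           input_lengths,
           torch.tensor([len(t) for t in targets]))
loss.backward()
print(float(loss))
```

CTC is only one plausible choice here; attention-based decoders, as in the AttentionHTR entry above, handle variable-length targets without the blank-symbol alignment trick.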
This list is automatically generated from the titles and abstracts of the papers on this site.