Related papers: Capturing More: Learning Multi-Domain Representations for Robust Online Handwriting Verification

URL: http://arxiv.org/abs/2508.01427v2
Date: Tue, 14 Oct 2025 13:36:23 GMT
Title: Capturing More: Learning Multi-Domain Representations for Robust Online Handwriting Verification
Authors: Peirong Zhang, Kai Ding, Lianwen Jin
Abstract summary: SPECTRUM is a temporal-frequency synergistic model that unlocks the untapped potential of multi-domain representation learning for online handwriting verification (OHV). Extensive experiments demonstrate SPECTRUM's superior performance over existing methods. These findings pave the way for future research in multi-domain approaches across both feature and biometric domains.
Score: 49.085301457166544
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: In this paper, we propose SPECTRUM, a temporal-frequency synergistic model that unlocks the untapped potential of multi-domain representation learning for online handwriting verification (OHV). SPECTRUM comprises three core components: (1) a multi-scale interactor that finely combines temporal and frequency features through dual-modal sequence interaction and multi-scale aggregation, (2) a self-gated fusion module that dynamically integrates global temporal and frequency features via self-driven balancing. These two components work synergistically to achieve micro-to-macro spectral-temporal integration. (3) A multi-domain distance-based verifier then utilizes both temporal and frequency representations to improve discrimination between genuine and forged handwriting, surpassing conventional temporal-only approaches. Extensive experiments demonstrate SPECTRUM's superior performance over existing OHV methods, underscoring the effectiveness of temporal-frequency multi-domain learning. Furthermore, we reveal that incorporating multiple handwritten biometrics fundamentally enhances the discriminative power of handwriting representations and facilitates verification. These findings not only validate the efficacy of multi-domain learning in OHV but also pave the way for future research in multi-domain approaches across both feature and biometric domains. Code is publicly available at https://github.com/NiceRingNode/SPECTRUM.
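
The idea of verifying handwriting with both temporal and frequency representations can be illustrated with a minimal sketch. This is not the SPECTRUM model: the learned multi-scale interactor and self-gated fusion are replaced here by hand-crafted features, and every name below (`resample`, `temporal_repr`, `frequency_repr`, `multi_domain_distance`, `verify`), as well as the weight `alpha`, the resampling length, and the number of frequency bins, is a hypothetical choice made for illustration. Each pen trajectory is assumed to be an array of shape (T, C), for example x and y coordinates over time.

```python
import numpy as np

def resample(seq, length=64):
    """Linearly resample a (T, C) trajectory to a fixed number of steps."""
    seq = np.asarray(seq, dtype=float)
    old = np.linspace(0.0, 1.0, len(seq))
    new = np.linspace(0.0, 1.0, length)
    cols = [np.interp(new, old, seq[:, c]) for c in range(seq.shape[1])]
    return np.stack(cols, axis=1)

def temporal_repr(seq, length=64):
    """Temporal-domain representation: resampled, z-normalized channels."""
    x = resample(seq, length)
    return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)

def frequency_repr(seq, length=64, n_bins=16):
    """Frequency-domain representation: low-frequency magnitude spectrum."""
    spec = np.abs(np.fft.rfft(temporal_repr(seq, length), axis=0))
    return spec[:n_bins] / length

def multi_domain_distance(a, b, alpha=0.5):
    """Weighted sum of a temporal distance and a frequency distance."""
    d_time = np.linalg.norm(temporal_repr(a) - temporal_repr(b))
    d_freq = np.linalg.norm(frequency_repr(a) - frequency_repr(b))
    return alpha * d_time + (1.0 - alpha) * d_freq

def verify(reference, query, threshold):
    """Accept the query as genuine if it lies close enough to the reference."""
    return multi_domain_distance(reference, query) <= threshold

# Toy trajectories standing in for pen coordinates (assumed data, not real signatures).
t = np.linspace(0.0, 1.0, 100)
genuine = np.stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)], axis=1)
other = np.stack([np.sin(6 * np.pi * t), t], axis=1)
print(multi_domain_distance(genuine, genuine), multi_domain_distance(genuine, other))
```

In this sketch a fixed `alpha` plays the role that a learned gating mechanism would play in a trained model, and the threshold would in practice be tuned on held-out genuine and forged samples.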

Related papers

UniDiff: A Unified Diffusion Framework for Multimodal Time Series Forecasting [90.47915032778366]
We propose UniDiff, a unified diffusion framework for multimodal time series forecasting. At its core lies a unified and parallel fusion module, where a single cross-attention mechanism integrates structural information from timestamps and semantic context from texts. Experiments on real-world benchmark datasets across eight domains demonstrate that the proposed UniDiff model achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-12-08T05:36:14Z)
TEM^3-Learning: Time-Efficient Multimodal Multi-Task Learning for Advanced Assistive Driving [22.22943635900334]
TEM3-Learning is a novel framework that jointly optimizes driver emotion recognition, driver behavior recognition, traffic context recognition, and vehicle behavior recognition. It achieves state-of-the-art accuracy across all four tasks, maintaining a lightweight architecture with fewer than 6 million parameters and delivering an impressive 142.32 FPS inference speed.
arXiv Detail & Related papers (2025-06-22T16:12:27Z)
Multivariate Long-term Time Series Forecasting with Fourier Neural Filter [55.09326865401653]
We introduce FNF as the backbone and DBD as architecture to provide excellent learning capabilities and optimal learning pathways for spatial-temporal modeling. We show that FNF unifies local time-domain and global frequency-domain information processing within a single backbone that extends naturally to spatial modeling.
arXiv Detail & Related papers (2025-06-10T18:40:20Z)
FreRA: A Frequency-Refined Augmentation for Contrastive Learning on Time Series Classification [56.925103708982164]
We present a novel perspective from the frequency domain and identify three advantages for downstream classification: global, independent, and compact. We propose the lightweight yet effective Frequency Refined Augmentation (FreRA) tailored for time series contrastive learning on classification tasks. FreRA consistently outperforms ten leading baselines on time series classification, anomaly detection, and transfer learning tasks.
arXiv Detail & Related papers (2025-05-29T07:18:28Z)
Unsupervised Multi-modal Feature Alignment for Time Series Representation Learning [19.66959764702544]
We introduce an innovative approach that focuses on aligning and binding time series representations encoded from different modalities. In contrast to conventional methods that fuse features from multiple modalities, our proposed approach simplifies the neural architecture by retaining a single time series encoder. Our approach outperforms existing state-of-the-art URL methods across diverse downstream tasks.
arXiv Detail & Related papers (2023-12-09T22:31:20Z)
FOCAL: Contrastive Learning for Multimodal Time-Series Sensing Signals in Factorized Orthogonal Latent Space [7.324708513042455]
This paper proposes a novel contrastive learning framework, called FOCAL, for extracting comprehensive features from multimodal time-series sensing signals. It consistently outperforms the state-of-the-art baselines in downstream tasks with a clear margin.
arXiv Detail & Related papers (2023-10-30T22:55:29Z)
Unified Frequency-Assisted Transformer Framework for Detecting and Grounding Multi-Modal Manipulation [109.1912721224697]
We present the Unified Frequency-Assisted transFormer framework, named UFAFormer, to address the DGM4 problem. By leveraging the discrete wavelet transform, we decompose images into several frequency sub-bands, capturing rich face forgery artifacts. Our proposed frequency encoder, incorporating intra-band and inter-band self-attentions, explicitly aggregates forgery features within and across diverse sub-bands.
arXiv Detail & Related papers (2023-09-18T11:06:42Z)
Modeling Continuous Motion for 3D Point Cloud Object Tracking [54.48716096286417]
This paper presents a novel approach that views each tracklet as a continuous stream. At each timestamp, only the current frame is fed into the network to interact with multi-frame historical features stored in a memory bank. To enhance the utilization of multi-frame features for robust tracking, a contrastive sequence enhancement strategy is proposed.
arXiv Detail & Related papers (2023-03-14T02:58:27Z)
Multi-Temporal Convolutions for Human Action Recognition in Videos [83.43682368129072]
We present a novel temporal-temporal convolution block that is capable of extracting at multiple resolutions. The proposed blocks are lightweight and can be integrated into any 3D-CNN architecture.
arXiv Detail & Related papers (2020-11-08T10:40:26Z)
Multidomain Multimodal Fusion For Human Action Recognition Using Inertial Sensors [1.52292571922932]
We propose a novel multidomain multimodal fusion framework that extracts complementary and distinct features from different domains of the input modality. Features in different domains are extracted by Convolutional Neural networks (CNNs) and then fused by Canonical Correlation based Fusion (CCF) for improving the accuracy of human action recognition.
arXiv Detail & Related papers (2020-08-22T03:46:12Z)
Searching Multi-Rate and Multi-Modal Temporal Enhanced Networks for Gesture Recognition [89.0152015268929]
We propose the first neural architecture search (NAS)-based method for RGB-D gesture recognition. The proposed method includes two key components: 1) enhanced temporal representation via the 3D Central Difference Convolution (3D-CDC) family, and 2) optimized backbones for multi-modal-rate branches and lateral connections. The resultant multi-rate network provides a new perspective to understand the relationship between RGB and depth modalities and their temporal dynamics.
arXiv Detail & Related papers (2020-08-21T10:45:09Z)

This list is automatically generated from the titles and abstracts of the papers on this site.