StrokeNet: Unveiling How to Learn Fine-Grained Interactions in Online Handwritten Stroke Classification
- URL: http://arxiv.org/abs/2512.06290v1
- Date: Sat, 06 Dec 2025 04:41:08 GMT
- Title: StrokeNet: Unveiling How to Learn Fine-Grained Interactions in Online Handwritten Stroke Classification
- Authors: Yiheng Huang, Shuang She, Zewei Wei, Jianmin Lin, Ming Yang, Wenyin Liu,
- Abstract summary: Stroke classification remains challenging due to variations in writing style, ambiguous content, and dynamic writing positions.<n>Our observations indicate that stroke interactions are typically localized, making it difficult for existing deep learning methods to capture such fine-grained relationships.<n>By selecting reference points and using their sequential order to represent strokes in a fine-grained manner, this problem can be effectively solved.
- Score: 9.447865895922982
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stroke classification remains challenging due to variations in writing style, ambiguous content, and dynamic writing positions. The core challenge in stroke classification is modeling the semantic relationships between strokes. Our observations indicate that stroke interactions are typically localized, making it difficult for existing deep learning methods to capture such fine-grained relationships. Although viewing strokes from a point-level perspective can address this issue, it introduces redundancy. However, by selecting reference points and using their sequential order to represent strokes in a fine-grained manner, this problem can be effectively solved. This insight inspired StrokeNet, a novel network architecture encoding strokes as reference pair representations (points + feature vectors), where reference points enable spatial queries and features mediate interaction modeling. Specifically, we dynamically select reference points for each stroke and sequence them, employing an Inline Sequence Attention (ISA) module to construct contextual features. To capture spatial feature interactions, we devised a Cross-Ellipse Query (CEQ) mechanism that clusters reference points and extracts features across varying spatial scales. Finally, a joint optimization framework simultaneously predicts stroke categories via reference points regression and adjacent stroke semantic transition modeling through an Auxiliary Branch (Aux-Branch). Experimental results show that our method achieves state-of-the-art performance on multiple public online handwritten datasets. Notably, on the CASIA-onDo dataset, the accuracy improves from 93.81$\%$ to 95.54$\%$, demonstrating the effectiveness and robustness of our approach.
Related papers
- Mind-the-Glitch: Visual Correspondence for Detecting Inconsistencies in Subject-Driven Generation [120.23172120151821]
We propose a novel approach for disentangling visual and semantic features from the backbones of pre-trained diffusion models.<n>We introduce an automated pipeline that constructs image pairs with annotated semantic and visual correspondences.<n>We propose a new metric, Visual Semantic Matching, that quantifies visual inconsistencies in subject-driven image generation.
arXiv Detail & Related papers (2025-09-26T07:11:55Z) - Relation3D: Enhancing Relation Modeling for Point Cloud Instance Segmentation [4.476845464695504]
3D instance segmentation aims to predict a set of object instances in a scene, representing them as binary foreground masks with corresponding semantic labels.<n>We introduce textbfRelation3D: Enhancing Relation Modeling for Point Instance. Specifically, we introduce an adaptive superpoint aggregation module and a contrastive learning-guided superpoint refinement module to better represent superpoint features (scene features)<n>Our relation-aware self-attention mechanism enhances the capabilities of modeling relationships between queries by incorporating positional and geometric relationships into the self-attention mechanism.
arXiv Detail & Related papers (2025-06-22T03:48:19Z) - SEP-GCN: Leveraging Similar Edge Pairs with Temporal and Spatial Contexts for Location-Based Recommender Systems [0.0]
We propose SEP-GCN, a novel graph-based recommendation framework that learns from pairs of contextually similar interaction edges.<n>By identifying edge pairs that occur within similar temporal windows or geographic proximity, SEP-GCN augments the user-item graph with contextual similarity links.<n> Experiments on benchmark data sets show that SEP-GCN consistently outperforms strong baselines in both predictive accuracy and robustness.
arXiv Detail & Related papers (2025-06-19T03:48:30Z) - Self-supervised Latent Space Optimization with Nebula Variational Coding [87.20343320266215]
This paper proposes a variational inference model which leads to a clustered embedding.<n>We introduce additional variables in the latent space, called textbfnebula anchors, that guide the latent variables to form clusters during training.<n>Since each latent feature can be labeled with the closest anchor, we also propose to apply metric learning in a self-supervised way to make the separation between clusters more explicit.
arXiv Detail & Related papers (2025-06-02T08:13:32Z) - Representing Videos as Discriminative Sub-graphs for Action Recognition [165.54738402505194]
We introduce a new design of sub-graphs to represent and encode theriminative patterns of each action in the videos.
We present MUlti-scale Sub-Earn Ling (MUSLE) framework that novelly builds space-time graphs and clusters into compact sub-graphs on each scale.
arXiv Detail & Related papers (2022-01-11T16:15:25Z) - StrokeNet: Stroke Assisted and Hierarchical Graph Reasoning Networks [31.76016966100244]
StrokeNet is proposed to effectively detect the texts by capturing the fine-grained strokes.
Different from existing approaches that represent the text area by a series of points or rectangular boxes, we directly localize strokes of each text instance.
arXiv Detail & Related papers (2021-11-23T08:26:42Z) - Temporally-Consistent Surface Reconstruction using Metrically-Consistent
Atlases [131.50372468579067]
We propose a method for unsupervised reconstruction of a temporally-consistent sequence of surfaces from a sequence of time-evolving point clouds.
We represent the reconstructed surfaces as atlases computed by a neural network, which enables us to establish correspondences between frames.
Our approach outperforms state-of-the-art ones on several challenging datasets.
arXiv Detail & Related papers (2021-11-12T17:48:25Z) - Modelling Neighbor Relation in Joint Space-Time Graph for Video
Correspondence Learning [53.74240452117145]
This paper presents a self-supervised method for learning reliable visual correspondence from unlabeled videos.
We formulate the correspondence as finding paths in a joint space-time graph, where nodes are grid patches sampled from frames, and are linked by two types of edges.
Our learned representation outperforms the state-of-the-art self-supervised methods on a variety of visual tasks.
arXiv Detail & Related papers (2021-09-28T05:40:01Z) - PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency [38.93610732090426]
We present PointDSC, a novel deep neural network that explicitly incorporates spatial consistency for pruning outlier correspondences.
Our method outperforms the state-of-the-art hand-crafted and learning-based outlier rejection approaches on several real-world datasets.
arXiv Detail & Related papers (2021-03-09T14:56:08Z) - A Graph-based Interactive Reasoning for Human-Object Interaction
Detection [71.50535113279551]
We present a novel graph-based interactive reasoning model called Interactive Graph (abbr. in-Graph) to infer HOIs.
We construct a new framework to assemble in-Graph models for detecting HOIs, namely in-GraphNet.
Our framework is end-to-end trainable and free from costly annotations like human pose.
arXiv Detail & Related papers (2020-07-14T09:29:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.