Sketch Input Method Editor: A Comprehensive Dataset and Methodology for Systematic Input Recognition
- URL: http://arxiv.org/abs/2311.18254v2
- Date: Sun, 31 Mar 2024 13:13:37 GMT
- Title: Sketch Input Method Editor: A Comprehensive Dataset and Methodology for Systematic Input Recognition
- Authors: Guangming Zhu, Siyuan Wang, Qing Cheng, Kelong Wu, Hao Li, Liang Zhang,
- Abstract summary: This study aims to create a Sketch Input Method Editor (SketchIME) specifically designed for a professional C4I system.
Within this system, sketches are utilized as low-fidelity prototypes for recommending standardized symbols.
By incorporating few-shot domain adaptation and class-incremental learning, the network's ability to adapt to new users is significantly enhanced.
- Score: 14.667745062352148
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the recent surge in the use of touchscreen devices, free-hand sketching has emerged as a promising modality for human-computer interaction. While previous research has focused on tasks such as recognition, retrieval, and generation of familiar everyday objects, this study aims to create a Sketch Input Method Editor (SketchIME) specifically designed for a professional C4I system. Within this system, sketches are utilized as low-fidelity prototypes for recommending standardized symbols in the creation of comprehensive situation maps. This paper also presents a systematic dataset comprising 374 specialized sketch types, and proposes a simultaneous recognition and segmentation architecture with multilevel supervision between recognition and segmentation to improve performance and enhance interpretability. By incorporating few-shot domain adaptation and class-incremental learning, the network's ability to adapt to new users and extend to new task-specific classes is significantly enhanced. Results from experiments conducted on both the proposed dataset and the SPG dataset illustrate the superior performance of the proposed architecture. Our dataset and code are publicly available at https://github.com/GuangmingZhu/SketchIME.
Related papers
- Shortcut Learning Susceptibility in Vision Classifiers [3.004632712148892]
Shortcut learning is where machine learning models exploit spurious correlations in data instead of capturing meaningful features.
This phenomenon is prevalent across various machine learning applications, including vision, natural language processing, and speech recognition.
We systematically evaluate these architectures by introducing deliberate shortcuts into the dataset that are positionally correlated with class labels.
arXiv Detail & Related papers (2025-02-13T10:25:52Z) - Faceptor: A Generalist Model for Face Perception [52.8066001012464]
Faceptor is proposed to adopt a well-designed single-encoder dual-decoder architecture.
Layer-Attention into Faceptor enables the model to adaptively select features from optimal layers to perform the desired tasks.
Our training framework can also be applied to auxiliary supervised learning, significantly improving performance in data-sparse tasks such as age estimation and expression recognition.
arXiv Detail & Related papers (2024-03-14T15:42:31Z) - Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based
Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z) - Class-Specific Semantic Reconstruction for Open Set Recognition [101.24781422480406]
Open set recognition enables deep neural networks (DNNs) to identify samples of unknown classes.
We propose a novel method, called Class-Specific Semantic Reconstruction (CSSR), that integrates the power of auto-encoder (AE) and prototype learning.
Results of experiments conducted on multiple datasets show that the proposed method achieves outstanding performance in both close and open set recognition.
arXiv Detail & Related papers (2022-07-05T16:25:34Z) - DcnnGrasp: Towards Accurate Grasp Pattern Recognition with Adaptive
Regularizer Learning [13.08779945306727]
Current state-of-the-art methods ignore category information of objects which is crucial for grasp pattern recognition.
This paper presents a novel dual-branch convolutional neural network (DcnnGrasp) to achieve joint learning of object category classification and grasp pattern recognition.
arXiv Detail & Related papers (2022-05-11T00:34:27Z) - RSBNet: One-Shot Neural Architecture Search for A Backbone Network in
Remote Sensing Image Recognition [43.95699860302204]
We propose a new design paradigm for the backbone architecture in RSI recognition tasks, including scene classification, land-cover classification and object detection.
A novel one-shot architecture search framework based on weight-sharing strategy and evolutionary algorithm is proposed, called RSBNet.
Extensive experiments have been conducted on five benchmark datasets for different recognition tasks, the results show the effectiveness of the proposed search paradigm.
arXiv Detail & Related papers (2021-12-07T02:44:16Z) - Multi-Perspective LSTM for Joint Visual Representation Learning [81.21490913108835]
We present a novel LSTM cell architecture capable of learning both intra- and inter-perspective relationships available in visual sequences captured from multiple perspectives.
Our architecture adopts a novel recurrent joint learning strategy that uses additional gates and memories at the cell level.
We show that by using the proposed cell to create a network, more effective and richer visual representations are learned for recognition tasks.
arXiv Detail & Related papers (2021-05-06T16:44:40Z) - Distribution Alignment: A Unified Framework for Long-tail Visual
Recognition [52.36728157779307]
We propose a unified distribution alignment strategy for long-tail visual recognition.
We then introduce a generalized re-weight method in the two-stage learning to balance the class prior.
Our approach achieves the state-of-the-art results across all four recognition tasks with a simple and unified framework.
arXiv Detail & Related papers (2021-03-30T14:09:53Z) - Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z) - A Multisensory Learning Architecture for Rotation-invariant Object
Recognition [0.0]
This study presents a multisensory machine learning architecture for object recognition by employing a novel dataset that was constructed with the iCub robot.
The proposed architecture combines convolutional neural networks to form representations (i.e., features) ford grayscale color images and a multi-layer perceptron algorithm to process depth data.
arXiv Detail & Related papers (2020-09-14T09:39:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.