Boosting Gesture Recognition with an Automatic Gesture Annotation Framework
- URL: http://arxiv.org/abs/2401.11150v2
- Date: Sat, 05 Oct 2024 06:08:05 GMT
- Title: Boosting Gesture Recognition with an Automatic Gesture Annotation Framework
- Authors: Junxiao Shen, Xuhai Xu, Ran Tan, Amy Karlson, Evan Strasnick,
- Abstract summary: We propose a framework that can automatically annotate gesture classes and identify their temporal ranges.
Our framework consists of two key components: (1) a novel annotation model that leverages the Connectionist Temporal Classification (CTC) loss, and (2) a semi-supervised learning pipeline in which the model improves by training on its own predictions, known as pseudo labels.
These high-quality pseudo labels can also be used to enhance the accuracy of other downstream gesture recognition models.
- Score: 10.158684480548242
- License:
- Abstract: Training a real-time gesture recognition model heavily relies on annotated data. However, manual data annotation is costly and demands substantial human effort. In order to address this challenge, we propose a framework that can automatically annotate gesture classes and identify their temporal ranges. Our framework consists of two key components: (1) a novel annotation model that leverages the Connectionist Temporal Classification (CTC) loss, and (2) a semi-supervised learning pipeline that enables the model to improve its performance by training on its own predictions, known as pseudo labels. These high-quality pseudo labels can also be used to enhance the accuracy of other downstream gesture recognition models. To evaluate our framework, we conducted experiments using two publicly available gesture datasets. Our ablation study demonstrates that our annotation model design surpasses the baseline in terms of both gesture classification accuracy (3-4% improvement) and localization accuracy (71-75% improvement). Additionally, we illustrate that the pseudo-labeled dataset produced from the proposed framework significantly boosts the accuracy of a pre-trained downstream gesture recognition model by 11-18%. We believe that this annotation framework has immense potential to improve the training of downstream gesture recognition models using unlabeled datasets.
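No reference implementation accompanies this listing, so the following is only a minimal PyTorch sketch of the two components the abstract describes: a sequence model trained with the CTC loss on unsegmented gesture streams, and greedy decoding of its outputs into confident pseudo labels with temporal ranges. The GRU backbone, feature sizes, class count, and confidence threshold are assumptions for illustration, not the authors' design.

```python
# Minimal sketch (not the authors' code): a CTC-trained annotator that turns
# unsegmented per-frame features into gesture pseudo labels with time ranges.
import torch
import torch.nn as nn

NUM_GESTURES = 10   # assumed number of gesture classes
BLANK = 0           # CTC blank index; gesture classes occupy 1..NUM_GESTURES

class CTCAnnotator(nn.Module):
    def __init__(self, feat_dim=64, hidden=128):
        super().__init__()
        # Illustrative backbone: a bidirectional GRU over per-frame hand features.
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, NUM_GESTURES + 1)  # +1 for the blank class

    def forward(self, x):                      # x: (batch, time, feat_dim)
        h, _ = self.rnn(x)
        return self.head(h).log_softmax(-1)    # (batch, time, NUM_GESTURES + 1)

def ctc_training_step(model, optimizer, x, targets, target_lengths):
    """One CTC training step: only the ordered gesture classes are needed,
    not frame-level alignment."""
    log_probs = model(x).transpose(0, 1)       # CTC expects (time, batch, classes)
    input_lengths = torch.full((x.size(0),), x.size(1), dtype=torch.long)
    loss = nn.functional.ctc_loss(log_probs, targets, input_lengths,
                                  target_lengths, blank=BLANK)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

@torch.no_grad()
def pseudo_label(model, x, conf_threshold=0.9):
    """Greedy CTC decoding of one sequence into (class, start, end) pseudo labels."""
    probs = model(x).exp()[0]                  # (time, classes); assumes batch size 1
    best = probs.argmax(-1).tolist()           # per-frame argmax path
    spans, t = [], 0
    while t < len(best):
        c = best[t]
        if c == BLANK:
            t += 1
            continue
        start = t
        while t < len(best) and best[t] == c:  # a run of one class = one gesture span
            t += 1
        if probs[start:t, c].mean() >= conf_threshold:  # keep only confident spans
            spans.append((c, start, t))
    return spans
```

In a semi-supervised loop of the kind the abstract outlines, spans like these would be fed back as additional training targets; here they are simply returned as tuples for illustration.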
Related papers
- Enhancing Hyperspectral Image Prediction with Contrastive Learning in Low-Label Regime [0.810304644344495]
Self-supervised contrastive learning is an effective approach for addressing the challenge of limited labelled data.
We evaluate the method's performance for both the single-label and multi-label classification tasks.
arXiv Detail & Related papers (2024-10-10T10:20:16Z)
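For reference, the self-supervised contrastive objective mentioned in the entry above is commonly instantiated as an NT-Xent (SimCLR-style) loss; the sketch below is a generic version with an assumed temperature and two augmented views per sample, not the paper's exact formulation.

```python
# Generic NT-Xent (normalized temperature-scaled cross entropy) sketch.
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """z1, z2: (N, d) embeddings of two augmented views of the same N samples."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, d), unit norm
    sim = z @ z.t() / temperature                         # scaled cosine similarities
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float('-inf'))            # drop self-similarity
    # The positive for sample i is its other view, at index (i + n) mod 2N.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)

# Example usage (hypothetical encoder and augmentations):
#   loss = nt_xent_loss(encoder(view_1), encoder(view_2))
```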
- TrajSSL: Trajectory-Enhanced Semi-Supervised 3D Object Detection [59.498894868956306]
Pseudo-labeling approaches to semi-supervised learning adopt a teacher-student framework.
We leverage pre-trained motion-forecasting models to generate object trajectories on pseudo-labeled data.
Our approach improves pseudo-label quality in two distinct manners.
arXiv Detail & Related papers (2024-09-17T05:35:00Z)
- LabelFormer: Object Trajectory Refinement for Offboard Perception from LiDAR Point Clouds [37.87496475959941]
"Auto-labelling" offboard perception models are trained to automatically generate annotations from raw LiDAR point clouds.
We propose LabelFormer, a simple, efficient, and effective trajectory-level refinement approach.
Our approach first encodes each frame's observations separately, then exploits self-attention to reason about the trajectory with full temporal context.
arXiv Detail & Related papers (2023-11-02T17:56:06Z)
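A hedged sketch of the two-stage design described in the LabelFormer entry above: encode each frame's observation independently, then apply self-attention across the whole trajectory. The feature dimensions, box parameterization, and residual refinement head are placeholders rather than the paper's architecture.

```python
# Illustrative trajectory-level refinement: per-frame encoding followed by
# self-attention over the full trajectory (not the LabelFormer implementation).
import torch
import torch.nn as nn

class TrajectoryRefiner(nn.Module):
    def __init__(self, frame_feat_dim=256, d_model=256, num_layers=4):
        super().__init__()
        # Stage 1: encode each frame's observation (pooled point features plus
        # the coarse box) independently.
        self.frame_encoder = nn.Sequential(
            nn.Linear(frame_feat_dim + 7, d_model), nn.ReLU(),
            nn.Linear(d_model, d_model),
        )
        # Stage 2: self-attention over all frames gives full temporal context.
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.box_head = nn.Linear(d_model, 7)   # refined (x, y, z, l, w, h, yaw)

    def forward(self, frame_feats, coarse_boxes):
        # frame_feats: (batch, T, frame_feat_dim); coarse_boxes: (batch, T, 7)
        tokens = self.frame_encoder(torch.cat([frame_feats, coarse_boxes], dim=-1))
        context = self.temporal(tokens)         # each frame attends to the trajectory
        return coarse_boxes + self.box_head(context)   # residual box refinement
```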
- A Benchmark Generative Probabilistic Model for Weak Supervised Learning [2.0257616108612373]
Weak Supervised Learning approaches have been developed to alleviate the annotation burden.
We show that probabilistic latent variable models (PLVMs) achieve state-of-the-art performance across four datasets.
arXiv Detail & Related papers (2023-03-31T07:06:24Z)
- Dynamic Supervisor for Cross-dataset Object Detection [52.95818230087297]
Cross-dataset training in object detection tasks is complicated because the inconsistency in the category range across datasets transforms fully supervised learning into semi-supervised learning.
We propose a dynamic supervisor framework that updates the annotations multiple times through multiple-updated submodels trained using hard and soft labels.
In the final generated annotations, both recall and precision improve significantly through the integration of hard-label training with soft-label training.
arXiv Detail & Related papers (2022-04-01T03:18:46Z)
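The combination of hard-label and soft-label training mentioned in the Dynamic Supervisor entry above can be illustrated with a simple joint loss: cross-entropy on hard class ids plus a KL term toward the soft class distributions. The weighting and shapes below are assumptions, not the paper's detection-specific formulation.

```python
# Generic sketch of joint hard-label / soft-label training for a classifier head.
import torch.nn.functional as F

def hard_soft_loss(logits, hard_labels, soft_labels, soft_weight=0.5):
    """logits: (N, C) model outputs; hard_labels: (N,) integer class ids;
    soft_labels: (N, C) class probabilities produced by the updated submodels."""
    hard_term = F.cross_entropy(logits, hard_labels)                 # hard-label CE
    soft_term = F.kl_div(F.log_softmax(logits, dim=1), soft_labels,  # soft-label KL
                         reduction='batchmean')
    return (1 - soft_weight) * hard_term + soft_weight * soft_term
```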
- Semi-supervised Long-tailed Recognition using Alternate Sampling [95.93760490301395]
Main challenges in long-tailed recognition come from the imbalanced data distribution and sample scarcity in its tail classes.
We propose a new recognition setting, namely semi-supervised long-tailed recognition.
We demonstrate significant accuracy improvements over other competitive methods on two datasets.
arXiv Detail & Related papers (2021-05-01T00:43:38Z)
- Weakly Supervised Video Salient Object Detection [79.51227350937721]
We present the first weakly supervised video salient object detection model based on relabeled "fixation guided scribble annotations".
An "Appearance-motion fusion module" and a bidirectional ConvLSTM-based framework are proposed to achieve effective multi-modal learning and long-term temporal context modeling.
arXiv Detail & Related papers (2021-04-06T09:48:38Z)
- SLADE: A Self-Training Framework For Distance Metric Learning [75.54078592084217]
We present a self-training framework, SLADE, to improve retrieval performance by leveraging additional unlabeled data.
We first train a teacher model on the labeled data and use it to generate pseudo labels for the unlabeled data.
We then train a student model on both labels and pseudo labels to generate final feature embeddings.
arXiv Detail & Related papers (2020-11-20T08:26:10Z)
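The teacher-student recipe described in the SLADE entry above (and in the self-training segmentation paper that follows) reduces to a generic loop: train a teacher on labeled data, pseudo-label the unlabeled pool with a confidence filter, then train a student on the union. The callables and threshold below are placeholder assumptions, not SLADE's retrieval-specific pipeline.

```python
# Generic self-training sketch: teacher -> pseudo labels -> student.
import torch

def self_training(make_model, train, predict_proba,
                  labeled_data, unlabeled_data, conf_threshold=0.9):
    """labeled_data: iterable of (x, y); unlabeled_data: iterable of x.
    `make_model`, `train`, `predict_proba` are assumed task-specific callables."""
    labeled = list(labeled_data)

    # 1) Train a teacher on the human-labeled data only.
    teacher = make_model()
    train(teacher, labeled)

    # 2) Pseudo-label the unlabeled pool, keeping only confident predictions.
    pseudo = []
    with torch.no_grad():
        for x in unlabeled_data:
            probs = predict_proba(teacher, x)       # 1-D tensor of class probabilities
            conf, label = probs.max(dim=0)
            if conf.item() >= conf_threshold:
                pseudo.append((x, label.item()))

    # 3) Train a student on the union of labels and pseudo labels.
    student = make_model()
    train(student, labeled + pseudo)
    return student
```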
- Improving Semantic Segmentation via Self-Training [75.07114899941095]
We show that we can obtain state-of-the-art results using a semi-supervised approach, specifically a self-training paradigm.
We first train a teacher model on labeled data, and then generate pseudo labels on a large set of unlabeled data.
Our robust training framework can digest human-annotated and pseudo labels jointly and achieve top performances on Cityscapes, CamVid and KITTI datasets.
arXiv Detail & Related papers (2020-04-30T17:09:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.