AugmentGest: Can Random Data Cropping Augmentation Boost Gesture Recognition Performance?
- URL: http://arxiv.org/abs/2506.07216v1
- Date: Sun, 08 Jun 2025 16:43:05 GMT
- Title: AugmentGest: Can Random Data Cropping Augmentation Boost Gesture Recognition Performance?
- Authors: Nada Aboudeshish, Dmitry Ignatov, Radu Timofte,
- Abstract summary: This paper proposes a comprehensive data augmentation framework that integrates geometric transformations (random cropping, rotation, and zooming) and intensity-based transformations. The proposed augmentation strategy is evaluated on three models: multi-stream e2eET, FPPR point cloud-based hand gesture recognition (HGR), and DD-Net.
- Score: 49.64902130083662
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data augmentation is a crucial technique in deep learning, particularly for tasks with limited dataset diversity, such as skeleton-based datasets. This paper proposes a comprehensive data augmentation framework that integrates geometric transformations (random cropping, rotation, and zooming) and intensity-based transformations (brightness and contrast adjustments) to simulate real-world variations. Random cropping preserves spatio-temporal integrity while addressing challenges such as viewpoint bias and occlusions. The augmentation pipeline generates three augmented versions of each sample in addition to the original, quadrupling the dataset size and enriching the diversity of gesture representations. The proposed augmentation strategy is evaluated on three models: multi-stream e2eET, FPPR point cloud-based hand gesture recognition (HGR), and DD-Net. Experiments are conducted on benchmark datasets including DHG14/28, SHREC'17, and JHMDB. The e2eET model is the state-of-the-art for hand gesture recognition on DHG14/28 and SHREC'17. The FPPR-PCD model, the second-best performing model on SHREC'17, excels in point cloud-based gesture recognition. DD-Net, a lightweight and efficient architecture for skeleton-based action recognition, is evaluated on SHREC'17 and the Joint-annotated Human Motion Data Base (JHMDB). The results underline the effectiveness and versatility of the proposed augmentation strategy, significantly improving model generalization and robustness across diverse datasets and architectures. The framework not only establishes state-of-the-art results with all three evaluated models but also offers a scalable solution for advancing HGR and action recognition in real-world scenarios. The framework is available at https://github.com/NadaAbodeshish/Random-Cropping-augmentation-HGR
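The pipeline described in the abstract (random cropping that preserves spatio-temporal integrity, rotation, zooming, intensity jitter, and three augmented copies per sample) can be sketched as follows. The parameter ranges, array shapes, and function names here are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

def augment_skeleton(seq, rng):
    """Apply one random geometric + intensity augmentation to a skeleton
    sequence of shape (frames, joints, 3). All ranges are assumptions."""
    t = seq.shape[0]
    # Random temporal crop: keep a contiguous window, then pad with the
    # last kept frame so the spatio-temporal shape is preserved.
    keep = max(1, int(t * rng.uniform(0.8, 1.0)))
    start = rng.integers(0, t - keep + 1)
    cropped = seq[start:start + keep]
    out = np.concatenate([cropped, np.repeat(cropped[-1:], t - keep, axis=0)])
    # Random rotation about the vertical (y) axis.
    theta = rng.uniform(-np.pi / 12, np.pi / 12)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    out = out @ rot.T
    # Random zoom (uniform scaling) and small intensity-style jitter.
    out = out * rng.uniform(0.9, 1.1)
    out = out + rng.normal(0.0, 0.01, size=out.shape)
    return out

def quadruple_dataset(samples, seed=0):
    """Return each original sample plus three augmented versions,
    quadrupling the dataset size as the paper describes."""
    rng = np.random.default_rng(seed)
    out = []
    for seq in samples:
        out.append(seq)
        out.extend(augment_skeleton(seq, rng) for _ in range(3))
    return out
```

Each augmented copy keeps the original frame count and joint layout, so downstream skeleton models need no input changes.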
Related papers
- Effective Dual-Region Augmentation for Reduced Reliance on Large Amounts of Labeled Data [1.0901840476380924]
This paper introduces a novel dual-region augmentation approach designed to reduce reliance on large-scale labeled datasets.
Our method performs targeted data transformations by applying random noise perturbations to foreground objects.
By augmenting training data through structured transformations, our method enables model generalization across domains.
arXiv Detail & Related papers (2025-04-17T16:42:33Z)
- Semantic Scene Completion with Multi-Feature Data Balancing Network [5.3431413737671525]
We propose a dual-head model for RGB and depth (F-TSDF) inputs.
Our hybrid encoder-decoder architecture, with identity transformation in a pre-activation residual module, effectively manages the diverse signals within F-TSDF.
We evaluate RGB feature fusion strategies and use a combined loss function: cross-entropy for 2D RGB features and weighted cross-entropy for 3D SSC predictions.
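A combined loss of this kind (plain cross-entropy on the 2D branch plus class-weighted cross-entropy on the 3D branch) might be sketched as below; the function names, shapes, and mixing coefficients `alpha`/`beta` are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def cross_entropy(logits, labels, class_weights=None):
    """Mean (optionally class-weighted) cross-entropy over the last axis.
    Shapes: logits (..., C), integer labels (...)."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    picked = np.take_along_axis(log_probs, labels[..., None], axis=-1)[..., 0]
    w = np.ones_like(picked) if class_weights is None else class_weights[labels]
    return -(w * picked).sum() / w.sum()

def combined_loss(logits_2d, labels_2d, logits_3d, labels_3d,
                  class_weights_3d, alpha=1.0, beta=1.0):
    """Weighted sum of plain CE on 2D RGB features and class-weighted CE
    on 3D SSC predictions; alpha/beta are assumed mixing coefficients."""
    return (alpha * cross_entropy(logits_2d, labels_2d)
            + beta * cross_entropy(logits_3d, labels_3d, class_weights_3d))
```

Weighting the 3D term per class is a common way to counter the heavy class imbalance of voxel-level scene completion labels.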
arXiv Detail & Related papers (2024-12-02T12:12:21Z)
- Handling Geometric Domain Shifts in Semantic Segmentation of Surgical RGB and Hyperspectral Images [67.66644395272075]
We present the first analysis of state-of-the-art semantic segmentation models faced with geometric out-of-distribution data.
We propose an augmentation technique called "Organ Transplantation" to enhance generalizability.
Our augmentation technique improves state-of-the-art model performance by up to 67% for RGB data and 90% for HSI data, reaching in-distribution-level performance on real OOD test data.
arXiv Detail & Related papers (2024-08-27T19:13:15Z)
- No-Reference Image Quality Assessment with Global-Local Progressive Integration and Semantic-Aligned Quality Transfer [6.095342999639137]
We develop a dual-measurement framework that combines a vision Transformer (ViT)-based global feature extractor with a convolutional neural network (CNN)-based local feature extractor.
We introduce a semantic-aligned quality transfer method that extends the training data by automatically labeling the quality scores of diverse image content with subjective opinion scores.
arXiv Detail & Related papers (2024-08-07T16:34:32Z)
- S^2Former-OR: Single-Stage Bi-Modal Transformer for Scene Graph Generation in OR [50.435592120607815]
Scene graph generation (SGG) of surgical procedures is crucial for enhancing holistically cognitive intelligence in the operating room (OR).
Previous works have primarily relied on multi-stage learning, where the generated semantic scene graphs depend on intermediate processes with pose estimation and object detection.
In this study, we introduce a novel single-stage bi-modal transformer framework for SGG in the OR, termed S^2Former-OR.
arXiv Detail & Related papers (2024-02-22T11:40:49Z)
- T-ADAF: Adaptive Data Augmentation Framework for Image Classification Network based on Tensor T-product Operator [0.0]
This paper proposes an Adaptive Data Augmentation Framework based on the tensor T-product Operator.
It triples each training image and combines the results from all three images, with less than a 0.1% increase in the number of parameters.
Numerical experiments show that the data augmentation framework can improve the performance of the original neural network model by 2%.
arXiv Detail & Related papers (2023-06-07T08:30:44Z)
- MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training [58.07391711548269]
We propose the Masked Voxel Jigsaw and Reconstruction (MV-JAR) method for LiDAR-based self-supervised pre-training.
arXiv Detail & Related papers (2023-03-23T17:59:02Z)
- GPGait: Generalized Pose-based Gait Recognition [11.316545213493223]
Recent works on pose-based gait recognition have demonstrated the potential of using such simple information to achieve results comparable to silhouette-based methods.
To improve the generalization ability of pose-based methods across datasets, we propose a Generalized Pose-based Gait recognition (GPGait) framework.
arXiv Detail & Related papers (2023-03-09T13:17:13Z)
- Domain Generalization via Ensemble Stacking for Face Presentation Attack Detection [4.61143637299349]
Face Presentation Attack Detection (PAD) plays a pivotal role in securing face recognition systems against spoofing attacks.
This work proposes a comprehensive solution that combines synthetic data generation and deep ensemble learning.
Experimental results on four datasets demonstrate low half total error rates (HTERs) on three of the benchmark datasets.
arXiv Detail & Related papers (2023-01-05T16:44:36Z)
- Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z)
- Improving 3D Object Detection through Progressive Population Based Augmentation [91.56261177665762]
We present the first attempt to automate the design of data augmentation policies for 3D object detection.
We introduce the Progressive Population Based Augmentation (PPBA) algorithm, which learns to optimize augmentation strategies by narrowing down the search space and adopting the best parameters discovered in previous iterations.
We find that PPBA may be up to 10x more data efficient than baseline 3D detection models without augmentation, highlighting that 3D detection models may achieve competitive accuracy with far fewer labeled examples.
arXiv Detail & Related papers (2020-04-02T05:57:02Z)
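The PPBA idea summarized above (a population-based search that narrows the search space and carries forward the best augmentation parameters from earlier iterations) can be illustrated with a toy sketch. Everything here, from the shrinking mutation subset to the parameter ranges, is an assumption for illustration, not the paper's exact procedure:

```python
import random

def ppba_search(evaluate, n_params=4, pop_size=4, iterations=5, seed=0):
    """Toy Progressive Population Based Augmentation-style search.
    `evaluate` maps a parameter vector in [0, 1]^n to a validation
    score (higher is better)."""
    rng = random.Random(seed)
    best = [rng.random() for _ in range(n_params)]
    best_score = evaluate(best)
    for it in range(iterations):
        # Progressively narrow the search: mutate fewer parameters
        # in later iterations, keeping the best values found so far.
        k = max(1, n_params - it)
        for _ in range(pop_size):
            cand = list(best)
            for idx in rng.sample(range(n_params), k):
                cand[idx] = min(1.0, max(0.0, cand[idx] + rng.uniform(-0.2, 0.2)))
            score = evaluate(cand)
            if score > best_score:
                best, best_score = cand, score
    return best, best_score
```

The key contrast with a grid or random search is that each iteration mutates only a subset of parameters around the best configuration found so far, which is what makes the search sample-efficient.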
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.