SHARDeg: A Benchmark for Skeletal Human Action Recognition in Degraded Scenarios
- URL: http://arxiv.org/abs/2505.18048v2
- Date: Tue, 27 May 2025 15:11:07 GMT
- Title: SHARDeg: A Benchmark for Skeletal Human Action Recognition in Degraded Scenarios
- Authors: Simon Malzard, Nitish Mital, Richard Walters, Victoria Nockles, Raghuveer Rao, Celso M. De Melo
- Abstract summary: Skeletal Human Action Recognition (SHAR) is critical in many CV pipelines operating in real time and at the edge. We demonstrate the need for this benchmark by showing that the form of degradation, which has not previously been considered, has a large impact on model accuracy. We identify that temporal regularity of frames in degraded SHAR data is likely a major driver of differences in model performance.
- Score: 3.0519884745675485
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Computer vision (CV) models for detection, prediction or classification tasks operate on video data streams that are often degraded in the real world, due to deployment in real time or on resource-constrained hardware. It is therefore critical that these models are robust to degraded data, but state-of-the-art (SoTA) models are often insufficiently assessed with these real-world constraints in mind. This is exemplified by Skeletal Human Action Recognition (SHAR), which is critical in many CV pipelines operating in real time and at the edge, but whose robustness to degraded data has previously only been shallowly and inconsistently assessed. Here we address this issue for SHAR by providing an important first data degradation benchmark on the most detailed and largest 3D open dataset, NTU-RGB+D-120, and assess the robustness of five leading SHAR models to three forms of degradation that represent real-world issues. We demonstrate the need for this benchmark by showing that the form of degradation, which has not previously been considered, has a large impact on model accuracy; at the same effective frame rate, model accuracy can vary by more than 40% depending on degradation type. We also identify that temporal regularity of frames in degraded SHAR data is likely a major driver of differences in model performance, and harness this to improve performance of existing models by more than 40% in the best case, by employing a simple mitigation approach based on interpolation. Finally, we highlight how our benchmark has helped identify an important degradation-resistant SHAR model based on Rough Path Theory; the LogSigRNN SHAR model outperforms the SoTA DeGCN model in five out of six cases at low frame rates by an average of 6% accuracy, despite trailing the SoTA model by 11-12% on un-degraded data at high frame rates (30 FPS).
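The abstract does not spell out the three degradation forms or how the interpolation-based mitigation is implemented. As a rough illustration only, the sketch below assumes one plausible degradation (randomly dropped frames, which makes the sequence temporally irregular) and a plain linear interpolation back onto a regular frame grid. The function names `drop_frames_irregularly` and `interpolate_to_regular_grid` are hypothetical and are not taken from the paper; each frame is modelled as a flat list of joint coordinates.

```python
import bisect
import random


def drop_frames_irregularly(frames, keep_ratio, seed=0):
    """Keep a random subset of frames (assumed degradation, not the paper's exact one).

    Returns the sorted indices of the kept frames and the kept frames themselves.
    """
    rng = random.Random(seed)
    num_keep = max(2, round(len(frames) * keep_ratio))
    kept_idx = sorted(rng.sample(range(len(frames)), num_keep))
    return kept_idx, [frames[i] for i in kept_idx]


def interpolate_to_regular_grid(kept_idx, kept_frames, num_frames):
    """Linearly interpolate every coordinate back onto frames 0..num_frames-1."""
    restored = []
    for t in range(num_frames):
        hi = bisect.bisect_left(kept_idx, t)
        if hi == 0:
            # At or before the first kept frame: hold the first kept pose.
            restored.append(list(kept_frames[0]))
        elif hi == len(kept_idx):
            # After the last kept frame: hold the last kept pose.
            restored.append(list(kept_frames[-1]))
        else:
            lo = hi - 1
            span = kept_idx[hi] - kept_idx[lo]
            w = (t - kept_idx[lo]) / span
            restored.append(
                [(1.0 - w) * a + w * b
                 for a, b in zip(kept_frames[lo], kept_frames[hi])]
            )
    return restored
```

A restored sequence has the same length as the original, so it can be fed to a model that expects a fixed, regular frame rate. Whether this recovers accuracy in practice depends on the model and the degradation, which is what the benchmark measures.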
Related papers
- Distilling foundation models for robust and efficient models in digital pathology [32.99044401004595]
We distilled a large foundation model into a smaller one, reducing the number of parameters by several orders of magnitude.
Our model, H0-mini, achieves nearly comparable performance to large FMs at a significantly reduced inference cost.
It is evaluated on several public benchmarks, achieving 3rd place on the HEST benchmark and 5th place on the EVA benchmark.
arXiv Detail & Related papers (2025-01-27T17:35:39Z)
- Optimizing Sequential Recommendation Models with Scaling Laws and Approximate Entropy [104.48511402784763]
Performance Law for SR models aims to theoretically investigate and model the relationship between model performance and data quality.
We propose Approximate Entropy (ApEn) to assess data quality, presenting a more nuanced approach compared to traditional data quantity metrics.
arXiv Detail & Related papers (2024-11-30T10:56:30Z)
- LiveXiv -- A Multi-Modal Live Benchmark Based on Arxiv Papers Content [62.816876067499415]
We propose LiveXiv: a scalable evolving live benchmark based on scientific ArXiv papers.
LiveXiv accesses domain-specific manuscripts at any given timestamp and proposes to automatically generate visual question-answer pairs.
We benchmark multiple open and proprietary Large Multi-modal Models (LMMs) on the first version of our benchmark, showing its challenging nature and exposing the models' true abilities.
arXiv Detail & Related papers (2024-10-14T17:51:23Z)
- Self-Data Distillation for Recovering Quality in Pruned Large Language Models [1.5665059604715017]
One-shot pruning results in significant quality degradation, particularly in tasks requiring multi-step reasoning.
To recover lost quality, supervised fine-tuning (SFT) is commonly applied, but it can lead to catastrophic forgetting.
In this work, we utilize self-data distilled fine-tuning to address these challenges.
arXiv Detail & Related papers (2024-10-13T19:53:40Z)
- PUMA: margin-based data pruning [51.12154122266251]
We focus on data pruning, where some training samples are removed based on their distance to the model classification boundary (i.e., margin).
We propose PUMA, a new data pruning strategy that computes the margin using DeepFool.
We show that PUMA can be used on top of the current state-of-the-art methodology in robustness, and it is able to significantly improve the model performance unlike the existing data pruning strategies.
arXiv Detail & Related papers (2024-05-10T08:02:20Z)
- No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance [68.18779562801762]
Multimodal models require exponentially more data to achieve linear improvements in downstream "zero-shot" performance.
Our study reveals an exponential need for training data which implies that the key to "zero-shot" generalization capabilities under large-scale training paradigms remains to be found.
arXiv Detail & Related papers (2024-04-04T17:58:02Z)
- Addressing Concept Shift in Online Time Series Forecasting: Detect-then-Adapt [37.98336090671441]
Concept Drift Detection anD Adaptation (D3A)
It first detects drifting concepts and then aggressively adapts the current model to the drifted concepts for rapid adaptation.
It helps mitigate the data distribution gap, a critical factor contributing to train-test performance inconsistency.
arXiv Detail & Related papers (2024-03-22T04:44:43Z)
- Three-Stage Adjusted Regression Forecasting (TSARF) for Software Defect Prediction [5.826476252191368]
Nonhomogeneous Poisson process (NHPP) SRGM are the most commonly employed models.
Increased model complexity presents a challenge in identifying robust and computationally efficient algorithms.
arXiv Detail & Related papers (2024-01-31T02:19:35Z)
- A Training Rate and Survival Heuristic for Inference and Robustness Evaluation (TRASHFIRE) [1.622320874892682]
This work addresses the problem of understanding and predicting how particular model hyper-parameters influence the performance of a model in the presence of an adversary.
The proposed approach uses survival models, worst-case examples, and a cost-aware analysis to precisely and accurately reject a particular model change.
Using the proposed methodology, we show that ResNet is hopelessly insecure against even the simplest of white box attacks.
arXiv Detail & Related papers (2024-01-24T19:12:37Z)
- Introducing 3DCNN ResNets for ASD full-body kinematic assessment: a comparison with hand-crafted features [1.3499500088995464]
We propose a newly adapted 3DCNN ResNet and compare it to widely used hand-crafted features for motor ASD assessment.
Specifically, we developed a virtual reality environment with multiple motor tasks and trained models using both approaches.
Results show the proposed model achieves a maximum accuracy of 85±3%, outperforming state-of-the-art end-to-end models with short 1-to-3 minute samples.
arXiv Detail & Related papers (2023-11-24T14:56:36Z)
- When Liebig's Barrel Meets Facial Landmark Detection: A Practical Model [87.25037167380522]
We propose a model that is accurate, robust, efficient, generalizable, and end-to-end trainable.
In order to achieve a better accuracy, we propose two lightweight modules.
DQInit dynamically initializes the queries of decoder from the inputs, enabling the model to achieve as good accuracy as the ones with multiple decoder layers.
QAMem is designed to enhance the discriminative ability of queries on low-resolution feature maps by assigning separate memory values to each query rather than a shared one.
arXiv Detail & Related papers (2021-05-27T13:51:42Z)
- Contrastive Model Inversion for Data-Free Knowledge Distillation [60.08025054715192]
We propose Contrastive Model Inversion, where the data diversity is explicitly modeled as an optimizable objective.
Our main observation is that, under the constraint of the same amount of data, higher data diversity usually indicates stronger instance discrimination.
Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet demonstrate that CMI achieves significantly superior performance when the generated data are used for knowledge distillation.
arXiv Detail & Related papers (2021-05-18T15:13:00Z)
- From Sound Representation to Model Robustness [82.21746840893658]
We investigate the impact of different standard environmental sound representations (spectrograms) on the recognition performance and adversarial attack robustness of a victim residual convolutional neural network.
Averaged over various experiments on three environmental sound datasets, we found the ResNet-18 model outperforms other deep learning architectures.
arXiv Detail & Related papers (2020-07-27T17:30:49Z)