Related papers: Detecting Informative Channels: ActionFormer

URL: http://arxiv.org/abs/2505.20739v1
Date: Tue, 27 May 2025 05:29:02 GMT
Title: Detecting Informative Channels: ActionFormer
Authors: Kunpeng Zhao, Asahi Miyazaki, Tsuyoshi Okita,
Abstract summary: ActionFormer gives additional outputs that detect the boundaries of activities as well as the activity labels. We analyze this extensively in terms of deep learning architectures. Our method achieves a substantial improvement of 16.01% in average mAP for inertial data.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Human Activity Recognition (HAR) has recently advanced with Transformer-based models. In particular, ActionFormer offers a new perspective on HAR: in addition to the activity labels, it produces outputs that detect the boundaries of each activity. ActionFormer was originally proposed for image/video input, but it has since been adapted to take sensor signals as input as well. We analyze this adaptation extensively in terms of deep learning architectures. Prior reports point to two weaknesses: high temporal dynamics, which limit the model's ability to capture subtle changes effectively, and interdependencies between spatial and temporal features. We propose a modified ActionFormer that mitigates these defects for sensor signals. Our approach follows the Sequence-and-Excitation strategy to minimize the number of additional parameters, and adopts the swish activation function to retain directional information in the negative range. Experiments on the WEAR dataset show that our method achieves a substantial improvement of 16.01% in average mAP for inertial data.
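The abstract describes the modification only in words. Below is a minimal sketch under our own assumptions: we read the "Sequence-and-Excitation strategy" as channel gating in the style of Squeeze-and-Excitation (SE) blocks, applied to a (channels × time) sensor feature map, with swish used in the bottleneck where SE normally uses ReLU. The names `se_gate`, `w1`, `w2`, the reduction ratio `r`, and the omission of biases are hypothetical illustration choices, not the authors' code.

```python
import math
import random

# Hypothetical sketch: SE-style channel gating with swish is assumed here;
# this is not the authors' implementation.


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def swish(x: float) -> float:
    # Swish (also called SiLU): x * sigmoid(x). Negative inputs give small
    # negative outputs instead of being clipped to 0 as with ReLU.
    return x * sigmoid(x)


def se_gate(features, w1, w2):
    """SE-style channel gating on a (channels x time) feature map.

    features: C lists of length T (one list per sensor/feature channel)
    w1: (C // r) x C weights, squeezing C channels into a bottleneck
    w2: C x (C // r) weights, expanding back to one gate per channel
    Biases are omitted to keep the sketch short.
    """
    # Squeeze: average each channel over time.
    z = [sum(ch) / len(ch) for ch in features]
    # Excitation: bottleneck projection with swish in place of ReLU ...
    h = [swish(sum(w * zc for w, zc in zip(row, z))) for row in w1]
    # ... then one sigmoid gate in (0, 1) per channel.
    gates = [sigmoid(sum(w * hc for w, hc in zip(row, h))) for row in w2]
    # Scale: reweight every time step of a channel by that channel's gate.
    out = [[g * x for x in ch] for g, ch in zip(gates, features)]
    return out, gates


# Toy example: 4 channels, 6 time steps, reduction ratio r = 2.
random.seed(0)
C, T, r = 4, 6, 2
x = [[random.gauss(0.0, 1.0) for _ in range(T)] for _ in range(C)]
w1 = [[random.gauss(0.0, 0.5) for _ in range(C)] for _ in range(C // r)]
w2 = [[random.gauss(0.0, 0.5) for _ in range(C // r)] for _ in range(C)]
y, gates = se_gate(x, w1, w2)
extra_params = len(w1) * len(w1[0]) + len(w2) * len(w2[0])  # 2 * C * C // r = 16
```

The only new weights are `w1` and `w2`, i.e. 2·C²/r values (16 in the toy example), which is why SE-style gating is a cheap way to add channel attention. Using swish in the bottleneck keeps a small negative, direction-preserving response where ReLU would output zero.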

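The extra boundary outputs mentioned in the abstract above can be illustrated with a small decoder. As we understand ActionFormer's design, each time step of the feature sequence receives per-class scores plus two regressed distances to the activity's start and end; the sketch below turns such outputs into candidate segments. The function name, the `stride` and `score_thresh` parameters, and the toy numbers are hypothetical, and post-processing such as non-maximum suppression is left out.

```python
from typing import List, Sequence, Tuple

# Hypothetical sketch of ActionFormer-style boundary decoding (not the original code).
# A segment is (start, end, class_index, score).
Segment = Tuple[float, float, int, float]


def decode_segments(
    cls_scores: Sequence[Sequence[float]],
    offsets: Sequence[Tuple[float, float]],
    stride: float = 1.0,
    score_thresh: float = 0.5,
) -> List[Segment]:
    """Turn per-time-step outputs into candidate activity segments.

    cls_scores[t][k]: score that time step t lies inside an activity of class k
    offsets[t] = (d_start, d_end): predicted distances (in time steps) from t back
    to the activity's start and forward to its end
    stride: seconds (or samples) per time step of the feature sequence
    """
    segments: List[Segment] = []
    for t, (scores, (d_start, d_end)) in enumerate(zip(cls_scores, offsets)):
        center = t * stride
        for k, score in enumerate(scores):
            if score >= score_thresh:
                # The boundary regression turns a single time step into a full segment.
                segments.append((center - d_start * stride, center + d_end * stride, k, score))
    # Highest-confidence candidates first; duplicate suppression would follow.
    return sorted(segments, key=lambda s: s[3], reverse=True)


cls_scores = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.7]]
offsets = [(0.0, 2.0), (1.0, 1.0), (0.5, 0.5)]
print(decode_segments(cls_scores, offsets))
# [(0.0, 2.0, 0, 0.9), (0.0, 2.0, 0, 0.8), (1.5, 2.5, 1, 0.7)]
```

Two time steps predict the same class-0 segment here; in a full detector such overlapping candidates are merged or suppressed before the result is scored against ground truth.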
Related papers

Sensor Generalization for Adaptive Sensing in Event-based Object Detection via Joint Distribution Training
Bio-inspired event cameras have recently attracted significant research attention due to their asynchronous and low-latency capabilities. There is a gap in the variability of available data and a lack of extensive analysis of the parameters characterizing their signals. This paper addresses these issues by providing readers with an in-depth understanding of how intrinsic parameters affect the performance of a model trained on event data, specifically for object detection.
arXiv (2026-02-26T18:57:52Z)
Localized Gaussians as Self-Attention Weights for Point Clouds Correspondence
We investigate semantically meaningful patterns in the attention heads of an encoder-only Transformer architecture. We find that fixing the attention weights not only accelerates the training process but also enhances the stability of the optimization.
arXiv (2024-09-20T07:41:47Z)
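The summary gives no formula, so the following only illustrates the general idea under an assumption of ours: the learned softmax(QKᵀ/√d) weights of an attention head are replaced by a fixed, row-normalized Gaussian kernel over pairwise point distances, so nothing in the weights is trained. The bandwidth `sigma` and both function names are hypothetical; how the paper parameterizes its localized Gaussians (per head, per layer, how σ is chosen) is not stated here.

```python
import math

# Hypothetical illustration of fixed Gaussian attention weights; not the paper's code.


def gaussian_attention_weights(points, sigma):
    """Fixed attention weights: row-normalized Gaussian kernel of pairwise distance.

    points: list of coordinate tuples, e.g. (x, y, z)
    sigma: bandwidth; smaller values make each point attend more locally
    No query/key projections are learned, so the weights depend only on geometry.
    """
    weights = []
    for p in points:
        row = [
            math.exp(-sum((a - b) ** 2 for a, b in zip(p, q)) / (2.0 * sigma ** 2))
            for q in points
        ]
        total = sum(row)  # >= 1 because each point's distance to itself is 0
        weights.append([w / total for w in row])
    return weights


def attend(weights, values):
    """out_i = sum_j W[i][j] * values[j], as in ordinary attention."""
    dim = len(values[0])
    return [[sum(w * v[d] for w, v in zip(row, values)) for d in range(dim)] for row in weights]


points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 0.0, 0.0)]
values = [[1.0, 0.0], [0.0, 1.0], [10.0, 10.0]]
W = gaussian_attention_weights(points, sigma=0.5)
out = attend(W, values)
```

The two nearby points split their attention almost evenly between each other, while the distant point attends almost entirely to itself, which is the "localized" behavior the title refers to.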
Large Language Model-Guided Semantic Alignment for Human Activity Recognition
Human Activity Recognition (HAR) using Inertial Measurement Unit (IMU) sensors is critical for applications in healthcare, safety, and industrial production. Variations in activity patterns, device types, and sensor placements create distribution gaps across datasets. We propose LanHAR, a novel system that generates semantic interpretations of sensor readings and activity labels for cross-dataset HAR.
arXiv (2024-09-12T22:57:29Z)
Sensor Data Augmentation from Skeleton Pose Sequences for Improving Human Activity Recognition
Human Activity Recognition (HAR) has not fully capitalized on the proliferation of deep learning. We propose a novel approach to improve wearable sensor-based HAR by introducing a pose-to-sensor network model. Our contributions include the integration of simultaneous training, direct pose-to-sensor generation, and a comprehensive evaluation on the MM-Fit dataset.
arXiv (2024-04-25T10:13:18Z)
Semi-supervised Open-World Object Detection
We introduce a more realistic formulation, named semi-supervised open-world detection (SS-OWOD). We demonstrate that the performance of the state-of-the-art OWOD detector dramatically deteriorates in the proposed SS-OWOD setting. Our experiments on 4 datasets, including MS COCO, PASCAL, Objects365 and DOTA, demonstrate the effectiveness of our approach.
arXiv (2024-02-25T07:12:51Z)
DOAD: Decoupled One Stage Action Detection Network
Localizing people and recognizing their actions from videos is a challenging task towards high-level video understanding. Existing methods are mostly two-stage, with one stage for person bounding-box generation and the other for action recognition. We present a decoupled one-stage network, dubbed DOAD, to improve the efficiency of spatio-temporal action detection.
arXiv (2023-04-01T08:06:43Z)
Towards High-Quality Temporal Action Detection with Sparse Proposals
Temporal Action Detection aims to localize the temporal segments containing human action instances and predict the action categories. We introduce Sparse Proposals to interact with the hierarchical features. Experiments demonstrate the effectiveness of our method, especially under high tIoU thresholds.
arXiv (2021-09-18T06:15:19Z)
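The "high tIoU thresholds" above, like the average mAP figure quoted for the WEAR experiments, rest on temporal IoU (tIoU) between predicted and ground-truth segments. A minimal sketch, assuming greedy score-ordered matching and a simple non-interpolated AP for a single class; evaluation toolkits typically use an interpolated precision curve and also average over classes, so this will not reproduce benchmark numbers exactly. All function names are hypothetical.

```python
# Hypothetical sketch of tIoU-based evaluation; simplified, not a benchmark toolkit.


def tiou(a, b):
    """Temporal IoU of two segments given as (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds, gts, thresh):
    """Non-interpolated AP for one class.

    preds: list of (start, end, score); gts: list of (start, end).
    A prediction is a true positive if it overlaps a still-unmatched
    ground-truth segment with tIoU >= thresh; each ground truth is matched once.
    """
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    true_positives, precision_sum = 0, 0.0
    ranked = sorted(preds, key=lambda p: p[2], reverse=True)
    for rank, (start, end, _score) in enumerate(ranked, start=1):
        best_j, best_iou = -1, thresh
        for j, gt in enumerate(gts):
            overlap = tiou((start, end), gt)
            if not matched[j] and overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched[best_j] = True
            true_positives += 1
            precision_sum += true_positives / rank  # precision at this recall step
    return precision_sum / len(gts)


def mean_ap_over_thresholds(preds, gts, thresholds):
    """Mean AP over several tIoU thresholds for one class (mAP also averages over classes)."""
    return sum(average_precision(preds, gts, t) for t in thresholds) / len(thresholds)


gts = [(0.0, 10.0), (20.0, 30.0)]
preds = [(1.0, 10.0, 0.9), (20.0, 26.0, 0.8), (50.0, 60.0, 0.3)]
print(tiou((1.0, 10.0), (0.0, 10.0)))                   # 0.9
print(average_precision(preds, gts, 0.5))                # 1.0
print(average_precision(preds, gts, 0.7))                # 0.5
print(mean_ap_over_thresholds(preds, gts, [0.5, 0.7]))   # 0.75
```

The second prediction overlaps its ground truth with tIoU 0.6, so it counts as a hit at 0.5 but a miss at 0.7; this is exactly why methods are compared "under high tIoU thresholds", where only tightly localized boundaries are rewarded.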
Learning to Discriminate Information for Online Action Detection: Analysis and Application
We propose a novel recurrent unit, named Information Discrimination Unit (IDU), which explicitly discriminates the information relevancy between an ongoing action and others. We also present a new recurrent unit, called Information Integration Unit (IIU), for action anticipation. Our IIU exploits the outputs from IDU as pseudo action labels as well as RGB frames to learn enriched features of observed actions effectively.
arXiv (2021-09-08T01:51:51Z)
Robust and Accurate Object Detection via Adversarial Learning
This work augments the fine-tuning stage for object detectors by exploring adversarial examples. Our approach boosts the performance of state-of-the-art EfficientDets by +1.1 mAP on the object detection benchmark.
arXiv (2021-03-23T19:45:26Z)
Spatial-Temporal Alignment Network for Action Recognition and Detection
This paper studies how to introduce viewpoint-invariant feature representations that can help action recognition and detection. We propose a novel Spatial-Temporal Alignment Network (STAN) that aims to learn geometric invariant representations for action recognition and action detection. We test our STAN model extensively on AVA, Kinetics-400, AVA-Kinetics, Charades, and Charades-Ego datasets.
arXiv (2020-12-04T06:23:40Z)
DecAug: Augmenting HOI Detection via Decomposition
Current algorithms suffer from insufficient training samples and category imbalance within datasets. We propose an efficient and effective data augmentation method called DecAug for HOI detection. Experiments show that our method brings improvements of up to 3.3 mAP and 1.6 mAP on the V-COCO and HICO-DET datasets, respectively.
arXiv (2020-10-02T13:59:05Z)