WEAR: An Outdoor Sports Dataset for Wearable and Egocentric Activity
Recognition
- URL: http://arxiv.org/abs/2304.05088v3
- Date: Tue, 21 Nov 2023 16:35:26 GMT
- Title: WEAR: An Outdoor Sports Dataset for Wearable and Egocentric Activity
Recognition
- Authors: Marius Bock, Hilde Kuehne, Kristof Van Laerhoven, Michael Moeller
- Abstract summary: WEAR is an outdoor sports dataset for both vision- and inertial-based human activity recognition (HAR).
The dataset comprises data from 18 participants performing a total of 18 different workout activities with untrimmed inertial (acceleration) and camera (egocentric video) data recorded at 10 different outside locations.
- Score: 25.113458430281632
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Though research has shown the complementarity of camera- and inertial-based
data, datasets which offer both egocentric video and inertial-based sensor data
remain scarce. In this paper, we introduce WEAR, an outdoor sports dataset for
both vision- and inertial-based human activity recognition (HAR). The dataset
comprises data from 18 participants performing a total of 18 different workout
activities with untrimmed inertial (acceleration) and camera (egocentric video)
data recorded at 10 different outside locations. Unlike previous egocentric
datasets, WEAR provides a challenging prediction scenario marked by purposely
introduced activity variations as well as an overall small information overlap
across modalities. Benchmark results obtained using each modality separately
show that the two modalities offer complementary strengths and weaknesses in
their prediction performance. Further, in light of the recent
success of temporal action localization models following the architecture
design of the ActionFormer, we demonstrate their versatility by applying them
in a plain fashion using vision, inertial and combined (vision + inertial)
features as input. Results demonstrate both the applicability of vision-based
temporal action localization models to inertial data and the viability of
fusing both modalities by simple feature concatenation, with the combined
approach (vision + inertial features) producing the highest mean average
precision and a close-to-best F1-score. The dataset and code to reproduce the
experiments are publicly available at: https://mariusbock.github.io/wear/
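
As a rough illustration of the fusion-by-concatenation idea from the abstract, the Python sketch below joins per-timestep vision and inertial feature vectors before a classification layer. The dimensions, the linear head, and the class count are hypothetical stand-ins; per the abstract, the paper's benchmark instead feeds such concatenated features into ActionFormer-style temporal action localization models.

```python
import torch
import torch.nn as nn

class ConcatFusionHead(nn.Module):
    """Toy two-stream fusion: per-timestep vision and inertial feature
    vectors are concatenated and classified. All dimensions are
    hypothetical stand-ins for this example."""

    def __init__(self, vision_dim: int = 2048, inertial_dim: int = 128,
                 num_classes: int = 19):  # hypothetical: 18 activities + background
        super().__init__()
        self.classifier = nn.Linear(vision_dim + inertial_dim, num_classes)

    def forward(self, vision_feats: torch.Tensor,
                inertial_feats: torch.Tensor) -> torch.Tensor:
        # vision_feats:   (batch, time, vision_dim), e.g. clip embeddings
        # inertial_feats: (batch, time, inertial_dim), e.g. windowed
        #                 accelerometer features aligned to the same clips
        fused = torch.cat([vision_feats, inertial_feats], dim=-1)
        return self.classifier(fused)  # per-timestep class logits

# Example with random stand-in features: 2 sequences, 50 time steps each.
head = ConcatFusionHead()
logits = head(torch.randn(2, 50, 2048), torch.randn(2, 50, 128))
print(logits.shape)  # torch.Size([2, 50, 19])
```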
Related papers
- ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP).
ALP incorporates action information into representation learning by jointly optimizing a reinforcement learning policy and an inverse dynamics prediction objective (a toy version of the latter is sketched below).
We show that ALP outperforms existing baselines in several downstream perception tasks.
arXiv Detail & Related papers (2023-06-16T21:51:04Z)
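
As a toy illustration of an inverse dynamics prediction objective of the kind ALP combines with reinforcement learning, the sketch below classifies which action was taken between two consecutive state embeddings. All names and dimensions are invented for the example, not taken from the paper.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions; ALP's actual objective operates on learned visual
# embeddings inside an RL training loop, not on raw vectors as here.
STATE_DIM, NUM_ACTIONS = 64, 6

class InverseDynamicsHead(nn.Module):
    """Predict which discrete action was taken between two consecutive
    observations; training this head pushes a shared encoder to retain
    action-relevant information."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * STATE_DIM, 128), nn.ReLU(),
            nn.Linear(128, NUM_ACTIONS),
        )

    def forward(self, s_t, s_next):
        return self.net(torch.cat([s_t, s_next], dim=-1))

head = InverseDynamicsHead()
s_t, s_next = torch.randn(8, STATE_DIM), torch.randn(8, STATE_DIM)
actions = torch.randint(0, NUM_ACTIONS, (8,))  # ground-truth actions taken
loss = nn.functional.cross_entropy(head(s_t, s_next), actions)
loss.backward()  # in a full pipeline, gradients would also reach the encoder
```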
- Learning Fine-grained View-Invariant Representations from Unpaired Ego-Exo Videos via Temporal Alignment [71.16699226211504]
We propose to learn fine-grained action features that are invariant to the viewpoints by aligning egocentric and exocentric videos in time.
To this end, we propose AE2, a self-supervised embedding approach with two key designs.
For evaluation, we establish a benchmark for fine-grained video understanding in the ego-exo context.
arXiv Detail & Related papers (2023-06-08T19:54:08Z)
- Unified Visual Relationship Detection with Vision and Language Models [89.77838890788638]
This work focuses on training a single visual relationship detector predicting over the union of label spaces from multiple datasets.
We propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models.
Empirical results on both human-object interaction detection and scene-graph generation demonstrate the competitive performance of our model.
arXiv Detail & Related papers (2023-03-16T00:06:28Z)
- Do I Have Your Attention: A Large Scale Engagement Prediction Dataset and Baselines [9.896915478880635]
The degree of concentration, enthusiasm, optimism, and passion displayed by individual(s) while interacting with a machine is referred to as 'user engagement'.
To create engagement prediction systems that can work in real-world conditions, it is essential to learn from rich, diverse datasets.
EngageNet, a large-scale, multi-faceted engagement-in-the-wild dataset, is proposed.
arXiv Detail & Related papers (2023-02-01T13:25:54Z)
- AU-Aware Vision Transformers for Biased Facial Expression Recognition [17.00557858587472]
We experimentally show that the naive joint training of multiple FER datasets is harmful to the FER performance of individual datasets.
We propose a simple yet conceptually new framework, the AU-aware Vision Transformer (AU-ViT).
Our AU-ViT achieves state-of-the-art performance on three popular datasets, namely 91.10% on RAF-DB, 65.59% on AffectNet, and 90.15% on FERPlus.
arXiv Detail & Related papers (2022-11-12T08:58:54Z)
- Mitigating Representation Bias in Action Recognition: Algorithms and Benchmarks [76.35271072704384]
Deep learning models perform poorly when applied to videos with rare scenes or objects.
We tackle this problem from two different angles: algorithm and dataset.
We show that the debiased representation can generalize better when transferred to other datasets and tasks.
arXiv Detail & Related papers (2022-09-20T00:30:35Z)
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are available only for the source dataset, not for the target dataset, during the training stage.
We utilize a self-supervision scheme to reduce the domain shift between the two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks (a toy permutation task is sketched below).
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
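
The toy sketch below, with invented dimensions, shows one way such a permutation-based pretext task can look: temporal segments of a sequence are shuffled and a classifier learns to recover the permutation. It illustrates the general idea only, not the paper's implementation.

```python
import itertools
import torch
import torch.nn as nn

# Toy setup: a sequence of T frames split into 3 segments; the pretext task
# is to predict which of the 3! = 6 permutations was applied. Dimensions are
# hypothetical stand-ins for skeleton features.
T, FEAT, SEGMENTS = 30, 48, 3
PERMS = list(itertools.permutations(range(SEGMENTS)))  # all 6 orderings

def permute_segments(seq: torch.Tensor, perm_idx: int) -> torch.Tensor:
    # seq: (T, FEAT); split into equal temporal segments and reorder them.
    segs = list(torch.chunk(seq, SEGMENTS, dim=0))
    return torch.cat([segs[i] for i in PERMS[perm_idx]], dim=0)

classifier = nn.Sequential(nn.Flatten(), nn.Linear(T * FEAT, len(PERMS)))

seq = torch.randn(T, FEAT)
perm_idx = torch.randint(0, len(PERMS), (1,)).item()
shuffled = permute_segments(seq, perm_idx).unsqueeze(0)  # add batch dim
loss = nn.functional.cross_entropy(
    classifier(shuffled), torch.tensor([perm_idx]))
loss.backward()  # no labels needed: the permutation itself is the target
```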
- TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z)
- SelfHAR: Improving Human Activity Recognition through Self-training with Unlabeled Data [9.270269467155547]
SelfHAR is a semi-supervised model that learns to leverage unlabeled datasets to complement small labeled datasets.
The approach uses teacher-student self-training to distill knowledge from both unlabeled and labeled datasets (a minimal pseudo-labeling loop is sketched below).
SelfHAR is data-efficient, reaching similar performance with up to 10 times less labeled data than supervised approaches.
arXiv Detail & Related papers (2021-02-11T15:40:35Z)
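
Below is a minimal, hypothetical pseudo-labeling loop in the teacher-student spirit described above. All model sizes, thresholds, and data are invented; SelfHAR's full pipeline is more involved, and this only illustrates the distillation step.

```python
import torch
import torch.nn as nn

# Toy teacher-student self-training on random stand-in sensor features.
def make_model():
    return nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

def train(model, x, y, steps=100):
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.cross_entropy(model(x), y)
        loss.backward()
        opt.step()

x_lab, y_lab = torch.randn(64, 16), torch.randint(0, 4, (64,))
x_unlab = torch.randn(512, 16)

teacher = make_model()
train(teacher, x_lab, y_lab)                 # 1) fit teacher on labeled data

with torch.no_grad():                        # 2) pseudo-label unlabeled data,
    probs = teacher(x_unlab).softmax(dim=-1) #    keeping confident samples only
    conf, pseudo = probs.max(dim=-1)
    keep = conf > 0.9                        # hypothetical threshold

student = make_model()                       # 3) train student on the union of
train(student,                               #    labeled and pseudo-labeled data
      torch.cat([x_lab, x_unlab[keep]]),
      torch.cat([y_lab, pseudo[keep]]))
```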
- Invariant Feature Learning for Sensor-based Human Activity Recognition [11.334750079923428]
We present an invariant feature learning framework (IFLF) that extracts common information shared across subjects and devices.
Experiments demonstrate that IFLF is effective in handling both subject and device diversity across popular open datasets and an in-house dataset.
arXiv Detail & Related papers (2020-12-14T21:56:17Z)