Towards Continual Egocentric Activity Recognition: A Multi-modal
Egocentric Activity Dataset for Continual Learning
- URL: http://arxiv.org/abs/2301.10931v1
- Date: Thu, 26 Jan 2023 04:32:00 GMT
- Title: Towards Continual Egocentric Activity Recognition: A Multi-modal
Egocentric Activity Dataset for Continual Learning
- Authors: Linfeng Xu, Qingbo Wu, Lili Pan, Fanman Meng, Hongliang Li, Chiyuan
He, Hanxin Wang, Shaoxu Cheng, Yu Dai
- Abstract summary: We present a multi-modal egocentric activity dataset for continual learning named UESTC-MMEA-CL.
It contains synchronized data of videos, accelerometers, and gyroscopes, for 32 types of daily activities, performed by 10 participants.
- Results of egocentric activity recognition are reported for the three modalities, RGB, acceleration, and gyroscope, used separately and jointly.
- Score: 21.68009790164824
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the rapid development of wearable cameras, a massive collection of
egocentric video for first-person visual perception becomes available. Using
egocentric videos to predict first-person activity faces many challenges,
including limited field of view, occlusions, and unstable motions. Observing
that sensor data from wearable devices facilitates human activity recognition,
multi-modal activity recognition is attracting increasing attention. However,
the lack of related datasets hinders the development of multi-modal deep
learning for egocentric activity recognition. Meanwhile, the deployment of deep
learning in the real world has brought continual learning into focus, which
often suffers from catastrophic forgetting. Yet the catastrophic forgetting
problem for egocentric activity recognition, especially in the context of
multiple modalities, remains unexplored due to the unavailability of a suitable
dataset. To assist this research,
we present a multi-modal egocentric activity dataset for continual learning
named UESTC-MMEA-CL, which is collected by self-developed glasses integrating a
first-person camera and wearable sensors. It contains synchronized data of
videos, accelerometers, and gyroscopes, for 32 types of daily activities,
performed by 10 participants. Its class types and scale are compared with other
publicly available datasets. A statistical analysis of the sensor data is given
to show its auxiliary effect on recognizing different behaviors. Results of
egocentric activity recognition are reported for the three modalities, RGB,
acceleration, and gyroscope, used separately and jointly on a base network
architecture. To explore catastrophic forgetting in continual learning tasks,
four baseline methods are extensively evaluated with different multi-modal
combinations. We hope that UESTC-MMEA-CL will promote future studies
on continual learning for first-person activity recognition in wearable
applications.
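
To make the multi-modal setup described in the abstract concrete, below is a minimal PyTorch sketch of late fusion over the three modalities (RGB, acceleration, gyroscope). The module names, feature dimensions, and choice of encoders are illustrative assumptions and do not reproduce the base network architecture used in the paper.

```python
# A minimal sketch of late fusion over RGB, accelerometer, and gyroscope
# streams for 32-way activity classification. Names and sizes are assumptions,
# not the paper's architecture.
import torch
import torch.nn as nn


class SensorEncoder(nn.Module):
    """Encodes a (batch, time, 3) accelerometer or gyroscope stream with a GRU."""

    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.gru = nn.GRU(input_size=3, hidden_size=feat_dim, batch_first=True)

    def forward(self, x):                 # x: (B, T, 3)
        _, h = self.gru(x)                # h: (1, B, feat_dim)
        return h.squeeze(0)               # (B, feat_dim)


class LateFusionNet(nn.Module):
    """Concatenates per-modality features and classifies 32 daily activities."""

    def __init__(self, rgb_dim: int = 512, sensor_dim: int = 128, num_classes: int = 32):
        super().__init__()
        self.acc_enc = SensorEncoder(sensor_dim)
        self.gyro_enc = SensorEncoder(sensor_dim)
        self.classifier = nn.Linear(rgb_dim + 2 * sensor_dim, num_classes)

    def forward(self, rgb_feat, acc, gyro):
        # rgb_feat: (B, rgb_dim) pooled features from any video backbone
        fused = torch.cat([rgb_feat, self.acc_enc(acc), self.gyro_enc(gyro)], dim=1)
        return self.classifier(fused)


# Example usage with random tensors standing in for one batch.
model = LateFusionNet()
logits = model(torch.randn(4, 512), torch.randn(4, 200, 3), torch.randn(4, 200, 3))
print(logits.shape)  # torch.Size([4, 32])
```

In this style, dropping one sensor encoder (or the RGB branch) yields the single- and dual-modality comparisons the abstract refers to.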
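
The continual learning evaluation hinges on measuring how much accuracy on earlier activity classes degrades as new ones are learned. The following is a hedged sketch of a commonly used average-forgetting measure together with a small experience-replay buffer; it is one generic baseline for illustration and does not correspond to the four baseline methods evaluated in the paper.

```python
# A sketch of how catastrophic forgetting is typically quantified in a
# class-incremental protocol: train on task splits in sequence and track
# accuracy on every split seen so far. The replay buffer is an illustrative
# rehearsal baseline, not one of the paper's baselines.
import random
import numpy as np


class ReplayBuffer:
    """Keeps a small random sample of past (input, label) pairs for rehearsal."""

    def __init__(self, capacity: int = 500):
        self.capacity = capacity
        self.data = []

    def add(self, samples):
        self.data.extend(samples)
        if len(self.data) > self.capacity:
            self.data = random.sample(self.data, self.capacity)

    def sample(self, k: int):
        return random.sample(self.data, min(k, len(self.data)))


def average_forgetting(acc_matrix: np.ndarray) -> float:
    """acc_matrix[i, j] = accuracy on task j after training on task i (j <= i).

    Forgetting of task j is the drop from its best earlier accuracy to its
    accuracy after the final task; the average is over all but the last task.
    """
    T = acc_matrix.shape[0]
    drops = [acc_matrix[:T - 1, j].max() - acc_matrix[T - 1, j] for j in range(T - 1)]
    return float(np.mean(drops))


# Example: 3 task splits, accuracy recorded after each training stage.
acc = np.array([[0.90, 0.0, 0.0],
                [0.55, 0.88, 0.0],
                [0.40, 0.60, 0.85]])
print(average_forgetting(acc))  # (0.90-0.40 + 0.88-0.60) / 2 ≈ 0.39
```

Running the example reports roughly 0.39, i.e., accuracy on the first two splits drops by about 39 points on average after the final split is learned, which is the kind of degradation the dataset is intended to expose and the baselines are meant to mitigate.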
Related papers
- Scaling Wearable Foundation Models [54.93979158708164]
We investigate the scaling properties of sensor foundation models across compute, data, and model size.
Using a dataset of up to 40 million hours of in-situ heart rate, heart rate variability, electrodermal activity, accelerometer, skin temperature, and altimeter per-minute data from over 165,000 people, we create LSM.
Our results establish scaling laws for LSM on tasks such as imputation and extrapolation, both across time and across sensor modalities.
arXiv Detail & Related papers (2024-10-17T15:08:21Z) - Learning State-Aware Visual Representations from Audible Interactions [39.08554113807464]
We propose a self-supervised algorithm to learn representations from egocentric video data.
We use audio signals to identify moments of likely interactions which are conducive to better learning.
We validate these contributions extensively on two large-scale egocentric datasets.
arXiv Detail & Related papers (2022-09-27T17:57:13Z) - MECCANO: A Multimodal Egocentric Dataset for Humans Behavior
Understanding in the Industrial-like Domain [23.598727613908853]
We present MECCANO, a dataset of egocentric videos to study human behavior understanding in industrial-like settings.
The multimodality is characterized by the presence of gaze signals, depth maps and RGB videos acquired simultaneously with a custom headset.
The dataset has been explicitly labeled for fundamental tasks in the context of human behavior understanding from a first person view.
arXiv Detail & Related papers (2022-09-19T00:52:42Z) - Classifying Human Activities using Machine Learning and Deep Learning
Techniques [0.0]
Human Activity Recognition (HAR) describes a machine's ability to recognize human actions.
The challenge in HAR is to overcome the difficulty of separating human activities based on the given data.
Deep learning techniques such as Long Short-Term Memory (LSTM), a Bi-Directional LSTM classifier, Recurrent Neural Network (RNN), and Gated Recurrent Unit (GRU) are trained.
Experimental results showed that the Linear Support Vector in machine learning and the Gated Recurrent Unit in deep learning provided better accuracy for human activity recognition.
arXiv Detail & Related papers (2022-05-19T05:20:04Z) - Self-Regulated Learning for Egocentric Video Activity Anticipation [147.9783215348252]
Self-Regulated Learning (SRL) aims to regulate the intermediate representation consecutively to produce a representation that emphasizes the novel information in the frame at the current time-stamp.
SRL sharply outperforms the existing state of the art in most cases on two egocentric video datasets and two third-person video datasets.
arXiv Detail & Related papers (2021-11-23T03:29:18Z) - JRDB-Act: A Large-scale Multi-modal Dataset for Spatio-temporal Action,
Social Group and Activity Detection [54.696819174421584]
We introduce JRDB-Act, a multi-modal dataset that reflects a real distribution of human daily life actions in a university campus environment.
JRDB-Act has been densely annotated with atomic actions and comprises over 2.8M action labels.
JRDB-Act comes with social group identification annotations conducive to the task of grouping individuals based on their interactions in the scene.
arXiv Detail & Related papers (2021-06-16T14:43:46Z) - Anomaly Detection in Video via Self-Supervised and Multi-Task Learning [113.81927544121625]
Anomaly detection in video is a challenging computer vision problem.
In this paper, we approach anomalous event detection in video through self-supervised and multi-task learning at the object level.
arXiv Detail & Related papers (2020-11-15T10:21:28Z) - Relational Graph Learning on Visual and Kinematics Embeddings for
Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online multi-modal graph network approach (MRG-Net) to dynamically integrate visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z) - Towards Deep Clustering of Human Activities from Wearables [21.198881633580797]
We develop an unsupervised end-to-end learning strategy for the fundamental problem of human activity recognition from wearables.
We show the effectiveness of our approach to jointly learn unsupervised representations for sensory data and generate cluster assignments with strong semantic correspondence to distinct human activities.
arXiv Detail & Related papers (2020-08-02T13:55:24Z) - The IKEA ASM Dataset: Understanding People Assembling Furniture through
Actions, Objects and Pose [108.21037046507483]
IKEA ASM is a three million frame, multi-view, furniture assembly video dataset that includes depth, atomic actions, object segmentation, and human pose.
We benchmark prominent methods for video action recognition, object segmentation and human pose estimation tasks on this challenging dataset.
The dataset enables the development of holistic methods, which integrate multi-modal and multi-view data to better perform on these tasks.
arXiv Detail & Related papers (2020-07-01T11:34:46Z) - IMUTube: Automatic Extraction of Virtual on-body Accelerometry from
Video for Human Activity Recognition [12.91206329972949]
We introduce IMUTube, an automated processing pipeline to convert videos of human activity into virtual streams of IMU data.
These virtual IMU streams represent accelerometry at a wide variety of locations on the human body.
We show how the virtually-generated IMU data improves the performance of a variety of models on known HAR datasets.
arXiv Detail & Related papers (2020-05-29T21:50:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.