ETRI-Activity3D: A Large-Scale RGB-D Dataset for Robots to Recognize
Daily Activities of the Elderly
- URL: http://arxiv.org/abs/2003.01920v2
- Date: Wed, 11 Mar 2020 05:01:07 GMT
- Authors: Jinhyeok Jang, Dohyung Kim, Cheonshu Park, Minsu Jang, Jaeyeon Lee,
Jaehong Kim
- Abstract summary: We introduce a new dataset called ETRI-Activity3D, focusing on the daily activities of the elderly in robot-view.
The proposed dataset contains 112,620 samples including RGB videos, depth maps, and skeleton sequences.
We also propose a novel network called four-stream adaptive CNN (FSA-CNN).
- Score: 6.597705088139007
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning, based on which many modern algorithms operate, is well known
to be data-hungry. In particular, the datasets appropriate for the intended
application are difficult to obtain. To cope with this situation, we introduce
a new dataset called ETRI-Activity3D, focusing on the daily activities of the
elderly in robot-view. The major characteristics of the new dataset are as
follows: 1) practical action categories that are selected from the close
observation of the daily lives of the elderly; 2) realistic data collection,
which reflects the robot's working environment and service situations; and 3) a
large-scale dataset that overcomes the limitations of the current 3D activity
analysis benchmark datasets. The proposed dataset contains 112,620 samples
including RGB videos, depth maps, and skeleton sequences. During the data
acquisition, 100 subjects were asked to perform 55 daily activities.
Additionally, we propose a novel network called four-stream adaptive CNN
(FSA-CNN). The proposed FSA-CNN has three main properties: robustness to
spatio-temporal variations, input-adaptive activation function, and extension
of the conventional two-stream approach. In the experiment section, we
confirmed the superiority of the proposed FSA-CNN using NTU RGB+D and
ETRI-Activity3D. Further, the domain difference between the two age groups was
verified experimentally. Finally, the extension of FSA-CNN to handle
multimodal data was investigated.
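As a rough sanity check on the scale figures quoted in the abstract, the short Python sketch below divides the total sample count by the number of subject-activity pairs. It assumes that every one of the 100 subjects performed every one of the 55 activities, which the abstract does not state explicitly; the function name `average_samples_per_pair` is hypothetical and only serves the illustration.

```python
# Figures quoted in the abstract.
TOTAL_SAMPLES = 112_620
NUM_SUBJECTS = 100
NUM_ACTIVITIES = 55


def average_samples_per_pair(total, subjects, activities):
    """Mean number of samples per (subject, activity) pair.

    Assumption (not stated in the abstract): each subject performed
    each activity, so there are subjects * activities pairs.
    """
    return total / (subjects * activities)


pairs = NUM_SUBJECTS * NUM_ACTIVITIES
avg = average_samples_per_pair(TOTAL_SAMPLES, NUM_SUBJECTS, NUM_ACTIVITIES)
print(f"{pairs} subject-activity pairs, about {avg:.2f} samples per pair")
```

Under that assumption, each subject-activity pair contributes roughly 20 samples on average. The abstract does not say how those samples are distributed (for example, across repetitions or recording conditions), so the figure is only an average.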
Related papers
- Dynamic Data Pruning for Automatic Speech Recognition [58.95758272440217]
We introduce Dynamic Data Pruning for ASR (DDP-ASR), which offers fine-grained pruning granularities specifically tailored for speech-related datasets.
Our experiments show that DDP-ASR can save up to 1.6x training time with negligible performance loss.
arXiv Detail & Related papers (2024-06-26T14:17:36Z) - ActiveAnno3D -- An Active Learning Framework for Multi-Modal 3D Object
Detection [15.885344033374393]
We propose ActiveAnno3D, an active learning framework to select data samples for labeling.
We perform experiments and ablation studies with BEVFusion and PV-RCNN on the nuScenes and TUM Traffic Intersection dataset.
We integrate our active learning framework into the proAnno labeling tool to enable AI-assisted data selection and labeling.
arXiv Detail & Related papers (2024-02-05T17:52:58Z) - Towards More Practical Group Activity Detection: A New Benchmark and Model [61.39427407758131]
Group activity detection (GAD) is the task of identifying members of each group and classifying the activity of the group at the same time in a video.
We present a new dataset, dubbed Café, which presents more practical scenarios and metrics.
We also propose a new GAD model that deals with an unknown number of groups and latent group members efficiently and effectively.
arXiv Detail & Related papers (2023-12-05T16:48:17Z) - Scaling Data Generation in Vision-and-Language Navigation [116.95534559103788]
We propose an effective paradigm for generating large-scale data for learning.
We apply 1200+ photo-realistic environments from the HM3D and Gibson datasets and synthesize 4.9 million instruction-trajectory pairs.
Thanks to our large-scale dataset, simple imitation learning pushes the performance of an existing agent (+11% absolute over the previous SoTA) to a new best of 80% single-run success rate on the R2R test split.
arXiv Detail & Related papers (2023-07-28T16:03:28Z) - TADIL: Task-Agnostic Domain-Incremental Learning through Task-ID
Inference using Transformer Nearest-Centroid Embeddings [0.0]
We propose a novel pipeline for identifying tasks in domain-incremental learning scenarios without supervision.
We leverage the lightweight computational requirements of the pipeline to devise an algorithm that decides in an online fashion when to learn a new task.
arXiv Detail & Related papers (2023-06-21T00:55:02Z) - Going beyond research datasets: Novel intent discovery in the industry
setting [60.90117614762879]
This paper proposes methods to improve the intent discovery pipeline deployed in a large e-commerce platform.
We show the benefit of pre-training language models on in-domain data: both self-supervised and with weak supervision.
We also devise the best method to utilize the conversational structure (i.e., question and answer) of real-life datasets during fine-tuning for clustering tasks, which we call Conv.
arXiv Detail & Related papers (2023-05-09T14:21:29Z) - Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based
Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z) - Feature Extraction for Machine Learning-based Intrusion Detection in IoT
Networks [6.6147550436077776]
This paper aims to discover whether Feature Reduction (FR) and Machine Learning (ML) techniques can be generalised across various datasets.
The detection accuracy of three Feature Extraction (FE) algorithms, namely Principal Component Analysis (PCA), Auto-encoder (AE), and Linear Discriminant Analysis (LDA), is evaluated.
arXiv Detail & Related papers (2021-08-28T23:52:18Z) - Diminishing Uncertainty within the Training Pool: Active Learning for
Medical Image Segmentation [6.3858225352615285]
We explore active learning for the task of segmentation of medical imaging data sets.
We propose three new strategies for active learning: increasing the frequency of uncertain data to bias the training data set, using mutual information among the input images as a regularizer, and adapting the Dice log-likelihood for Stein variational gradient descent (SVGD).
The results indicate an improvement in terms of data reduction by achieving full accuracy while only using 22.69% and 48.85% of the available data for each dataset, respectively.
arXiv Detail & Related papers (2021-01-07T01:55:48Z) - 2nd Place Scheme on Action Recognition Track of ECCV 2020 VIPriors
Challenges: An Efficient Optical Flow Stream Guided Framework [57.847010327319964]
We propose a data-efficient framework that can train the model from scratch on small datasets.
Specifically, by introducing a 3D central difference convolution operation, we propose a novel C3D neural network-based two-stream framework.
We show that our method can achieve promising results even without a model pre-trained on large-scale datasets.
arXiv Detail & Related papers (2020-08-10T09:50:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.