Toyota Smarthome Untrimmed: Real-World Untrimmed Videos for Activity
Detection
- URL: http://arxiv.org/abs/2010.14982v2
- Date: Fri, 10 Jun 2022 10:50:48 GMT
- Title: Toyota Smarthome Untrimmed: Real-World Untrimmed Videos for Activity
Detection
- Authors: Rui Dai, Srijan Das, Saurav Sharma, Luca Minciullo, Lorenzo Garattoni,
Francois Bremond, Gianpiero Francesca
- Abstract summary: We introduce a new untrimmed daily-living dataset that features several real-world challenges: Toyota Smarthome Untrimmed.
The dataset contains dense annotations, including elementary and composite activities as well as activities involving interactions with objects.
We show that current state-of-the-art methods fail to achieve satisfactory performance on the TSU dataset.
We propose a new baseline method for activity detection to tackle the novel challenges provided by our dataset.
- Score: 6.682959425576476
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Designing activity detection systems that can be successfully deployed in
daily-living environments requires datasets that pose the challenges typical of
real-world scenarios. In this paper, we introduce a new untrimmed daily-living
dataset that features several real-world challenges: Toyota Smarthome Untrimmed
(TSU). TSU contains a wide variety of activities performed in a spontaneous
manner. The dataset contains dense annotations, including elementary and composite
activities as well as activities involving interactions with objects. We provide an
analysis of the real-world challenges featured by our dataset, highlighting the
open issues for detection algorithms. We show that current state-of-the-art
methods fail to achieve satisfactory performance on the TSU dataset. Therefore,
we propose a new baseline method for activity detection to tackle the novel
challenges provided by our dataset. This method leverages one modality (i.e.
optical flow) to generate the attention weights to guide another modality (i.e.
RGB) to better detect the activity boundaries. This is particularly beneficial
to detect activities characterized by high temporal variance. We show that the
method we propose outperforms state-of-the-art methods on TSU and on another
popular challenging dataset, Charades.
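To make the flow-guided attention idea from the abstract concrete, below is a minimal, hypothetical PyTorch sketch of one way such cross-modal gating can be wired up: features from an optical-flow stream produce per-timestep attention weights that modulate the RGB stream before a frame-level classifier. This is not the authors' implementation; the feature dimensions, the sigmoid gate, and the 1D-convolutional classifier head are illustrative assumptions.

```python
# Sketch (assumed design, not the paper's architecture): flow features gate RGB features
# before per-timestep activity classification on untrimmed video feature sequences.
import torch
import torch.nn as nn


class FlowGuidedRGBDetector(nn.Module):
    def __init__(self, rgb_dim=1024, flow_dim=1024, hidden_dim=512, num_classes=51):
        super().__init__()
        # Attention branch: flow features -> per-timestep, per-channel gate in (0, 1)
        self.attention = nn.Sequential(
            nn.Conv1d(flow_dim, hidden_dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(hidden_dim, rgb_dim, kernel_size=1),
            nn.Sigmoid(),
        )
        # Frame-level classifier over the attended RGB features
        self.classifier = nn.Conv1d(rgb_dim, num_classes, kernel_size=1)

    def forward(self, rgb_feats, flow_feats):
        # rgb_feats, flow_feats: (batch, channels, time) clip-level feature sequences
        attn = self.attention(flow_feats)      # (batch, rgb_dim, time)
        attended = rgb_feats * attn            # flow-guided gating of the RGB stream
        return self.classifier(attended)       # (batch, num_classes, time) logits


if __name__ == "__main__":
    model = FlowGuidedRGBDetector()
    rgb = torch.randn(2, 1024, 128)    # e.g. I3D-style RGB features for 128 snippets
    flow = torch.randn(2, 1024, 128)   # matching optical-flow features
    print(model(rgb, flow).shape)      # torch.Size([2, 51, 128])
```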
Related papers
- Action Spotting and Precise Event Detection in Sports: Datasets, Methods, and Challenges [5.747955930615445]
Video event detection is central to modern sports analytics, enabling automated understanding of key moments for performance evaluation, content creation, and tactical feedback.
While deep learning has significantly advanced these tasks, existing surveys often overlook the fine-grained temporal demands and domain-specific challenges posed by sports.
This survey first provides a clear conceptual distinction between TAL, AS, and PES, then introduces a methods-based taxonomy covering recent deep learning approaches for AS and PES.
We outline open challenges and future directions toward more temporally precise, generalizable, and practical event spotting in sports video analysis.
arXiv Detail & Related papers (2025-05-06T22:02:30Z)
- Oriented Tiny Object Detection: A Dataset, Benchmark, and Dynamic Unbiased Learning [51.170479006249195]
We introduce a new dataset, benchmark, and a dynamic coarse-to-fine learning scheme in this study.
Our proposed dataset, AI-TOD-R, features the smallest object sizes among all oriented object detection datasets.
We present a benchmark spanning a broad range of detection paradigms, including both fully-supervised and label-efficient approaches.
arXiv Detail & Related papers (2024-12-16T09:14:32Z)
- Towards Student Actions in Classroom Scenes: New Dataset and Baseline [43.268586725768465]
We present a new multi-label student action video (SAV) dataset for complex classroom scenes.
The dataset consists of 4,324 carefully trimmed video clips from 758 different classrooms, each labeled with 15 different actions displayed by students in classrooms.
arXiv Detail & Related papers (2024-09-02T03:44:24Z)
- Deep Learning-Based Object Pose Estimation: A Comprehensive Survey [73.74933379151419]
We discuss the recent advances in deep learning-based object pose estimation.
Our survey also covers multiple input data modalities, degrees-of-freedom of output poses, object properties, and downstream tasks.
arXiv Detail & Related papers (2024-05-13T14:44:22Z)
- DOZE: A Dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments [28.23284296418962]
Zero-Shot Object Navigation (ZSON) requires agents to autonomously locate and approach unseen objects in unfamiliar environments.
Existing datasets for developing ZSON algorithms lack consideration of dynamic obstacles, object diversity, and scene texts.
We propose a dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments (DOZE).
DOZE comprises ten high-fidelity 3D scenes with over 18k tasks, aiming to mimic complex, dynamic real-world scenarios.
arXiv Detail & Related papers (2024-02-29T10:03:57Z)
- Accelerating Exploration with Unlabeled Prior Data [66.43995032226466]
We study how prior data without reward labels may be used to guide and accelerate exploration for an agent solving a new sparse reward task.
We propose a simple approach that learns a reward model from online experience, labels the unlabeled prior data with optimistic rewards, and then uses it concurrently alongside the online data for downstream policy and critic optimization.
arXiv Detail & Related papers (2023-11-09T00:05:17Z)
- Cross-Domain HAR: Few Shot Transfer Learning for Human Activity Recognition [0.2944538605197902]
We present an approach for economic use of publicly available labeled HAR datasets for effective transfer learning.
We introduce a novel transfer learning framework, Cross-Domain HAR, which follows the teacher-student self-training paradigm.
We demonstrate the effectiveness of our approach for practically relevant few shot activity recognition scenarios.
arXiv Detail & Related papers (2023-10-22T19:13:25Z)
- Single-Modal Entropy based Active Learning for Visual Question Answering [75.1682163844354]
We address Active Learning in the multi-modal setting of Visual Question Answering (VQA).
In light of the multi-modal inputs, image and question, we propose a novel method for effective sample acquisition.
Our novel idea is simple to implement, cost-efficient, and readily adaptable to other multi-modal tasks.
arXiv Detail & Related papers (2021-10-21T05:38:45Z)
- Diverse Complexity Measures for Dataset Curation in Self-driving [80.55417232642124]
We propose a new data selection method that exploits a diverse set of criteria that quantize interestingness of traffic scenes.
Our experiments show that the proposed curation pipeline is able to select datasets that lead to better generalization and higher performance.
arXiv Detail & Related papers (2021-01-16T23:45:02Z)
- Batch Exploration with Examples for Scalable Robotic Reinforcement Learning [63.552788688544254]
Batch Exploration with Examples (BEE) explores relevant regions of the state-space guided by a modest number of human provided images of important states.
BEE is able to tackle challenging vision-based manipulation tasks both in simulation and on a real Franka robot.
arXiv Detail & Related papers (2020-10-22T17:49:25Z)
- ZSTAD: Zero-Shot Temporal Activity Detection [107.63759089583382]
We propose a novel task setting called zero-shot temporal activity detection (ZSTAD), where activities that have never been seen in training can still be detected.
We design an end-to-end deep network based on R-C3D as the architecture for this solution.
Experiments on both the THUMOS14 and the Charades datasets show promising performance in terms of detecting unseen activities.
arXiv Detail & Related papers (2020-03-12T02:40:36Z)