TAO-Amodal: A Benchmark for Tracking Any Object Amodally
- URL: http://arxiv.org/abs/2312.12433v3
- Date: Tue, 2 Apr 2024 18:09:22 GMT
- Title: TAO-Amodal: A Benchmark for Tracking Any Object Amodally
- Authors: Cheng-Yen Hsieh, Kaihua Chen, Achal Dave, Tarasha Khurana, Deva Ramanan
- Abstract summary: We introduce TAO-Amodal, featuring 833 diverse categories in thousands of video sequences.
Our dataset includes amodal and modal bounding boxes for visible and partially or fully occluded objects, including those that are partially out of the camera frame.
- Score: 41.5396827282691
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Amodal perception, the ability to comprehend complete object structures from partial visibility, is a fundamental skill, even for infants. Its significance extends to applications like autonomous driving, where a clear understanding of heavily occluded objects is essential. However, modern detection and tracking algorithms often overlook this critical capability, perhaps due to the prevalence of \textit{modal} annotations in most benchmarks. To address the scarcity of amodal benchmarks, we introduce TAO-Amodal, featuring 833 diverse categories in thousands of video sequences. Our dataset includes \textit{amodal} and modal bounding boxes for visible and partially or fully occluded objects, including those that are partially out of the camera frame. We investigate the current lay of the land in both amodal tracking and detection by benchmarking state-of-the-art modal trackers and amodal segmentation methods. We find that existing methods, even when adapted for amodal tracking, struggle to detect and track objects under heavy occlusion. To mitigate this, we explore simple finetuning schemes that can increase the amodal tracking and detection metrics of occluded objects by 2.1\% and 3.3\%.
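To make the paired annotation scheme concrete, the sketch below shows one way an amodal (full-extent) box and a modal (visible) box could be combined into a per-object occlusion estimate. The record layout (`amodal_bbox`, `modal_bbox` in [x, y, w, h]) and the 0.5 threshold are illustrative assumptions, not the dataset's actual schema or the paper's evaluation protocol.

```python
# Illustrative sketch only: estimate how occluded an object is from its paired
# amodal (full-extent) and modal (visible) boxes. Field names and the
# [x, y, w, h] box format are assumptions, not TAO-Amodal's actual schema.

def box_area(box):
    """Area of an [x, y, w, h] box; amodal boxes may extend past the image border."""
    _, _, w, h = box
    return max(w, 0.0) * max(h, 0.0)

def visibility_ratio(ann):
    """Fraction of the amodal extent that is visible: ~1.0 fully visible, ~0.0 fully occluded."""
    amodal = box_area(ann["amodal_bbox"])
    modal = box_area(ann.get("modal_bbox", [0, 0, 0, 0]))  # fully occluded objects may lack a modal box
    return modal / amodal if amodal > 0 else 0.0

# Example: bucket an annotation by occlusion severity before computing metrics.
ann = {"amodal_bbox": [10, 20, 100, 200], "modal_bbox": [10, 20, 100, 50]}
ratio = visibility_ratio(ann)  # 0.25
print("heavy occlusion" if ratio < 0.5 else "mostly visible")
```

A visibility ratio of this kind is a common way to stratify objects by occlusion level when reporting occlusion-specific detection and tracking metrics.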
Related papers
- Amodal Ground Truth and Completion in the Wild [84.54972153436466]
We use 3D data to establish an automatic pipeline to determine authentic ground truth amodal masks for partially occluded objects in real images.
This pipeline is used to construct an amodal completion evaluation benchmark, MP3D-Amodal, consisting of a variety of object categories and labels.
arXiv Detail & Related papers (2023-12-28T18:59:41Z) - AmodalSynthDrive: A Synthetic Amodal Perception Dataset for Autonomous Driving [10.928470926399566]
We introduce AmodalSynthDrive, a synthetic multi-task multi-modal amodal perception dataset.
The dataset provides multi-view camera images, 3D bounding boxes, LiDAR data, and odometry for 150 driving sequences.
AmodalSynthDrive supports multiple amodal scene understanding tasks, including the newly introduced task of amodal depth estimation.
arXiv Detail & Related papers (2023-09-12T19:46:15Z) - OVTrack: Open-Vocabulary Multiple Object Tracking [64.73379741435255]
OVTrack is an open-vocabulary tracker capable of tracking arbitrary object classes.
It sets a new state-of-the-art on the large-scale, large-vocabulary TAO benchmark.
arXiv Detail & Related papers (2023-04-17T16:20:05Z) - End-to-end Tracking with a Multi-query Transformer [96.13468602635082]
Multiple-object tracking (MOT) is a challenging task that requires simultaneous reasoning about location, appearance, and identity of the objects in the scene over time.
Our aim in this paper is to move beyond tracking-by-detection approaches, toward class-agnostic tracking that also performs well for unknown object classes.
arXiv Detail & Related papers (2022-10-26T10:19:37Z) - RLM-Tracking: Online Multi-Pedestrian Tracking Supported by Relative Location Mapping [5.9669075749248774]
The problem of multi-object tracking is a fundamental computer vision research focus, widely used in public safety, transport, autonomous vehicles, robotics, and other domains involving artificial intelligence.
In this paper, we design a new multi-object tracker for the above issues that contains an object Relative Location Mapping (RLM) model and a Target Region Density (TRD) model.
The new tracker is more sensitive to differences in the positional relationships between objects.
It can introduce low-score detection frames into different regions in real time according to the density of objects.
arXiv Detail & Related papers (2022-10-19T11:37:14Z) - Amodal Cityscapes: A New Dataset, its Generation, and an Amodal Semantic Segmentation Challenge Baseline [38.8592627329447]
We consider the task of amodal semantic segmentation and propose a generic way to generate datasets to train amodal semantic segmentation methods.
We use this approach to generate an amodal Cityscapes dataset, showing its applicability for amodal semantic segmentation in automotive environment perception.
arXiv Detail & Related papers (2022-06-01T14:38:33Z) - AutoLay: Benchmarking amodal layout estimation for autonomous driving [18.152206533685412]
AutoLay is a dataset and benchmark for amodal layout estimation from monocular images.
In addition to fine-grained attributes such as lanes, sidewalks, and vehicles, we also provide semantically annotated 3D point clouds.
arXiv Detail & Related papers (2021-08-20T08:21:11Z) - Learning to Track with Object Permanence [61.36492084090744]
We introduce an end-to-end trainable approach for joint object detection and tracking.
Our model, trained jointly on synthetic and real data, outperforms the state of the art on the KITTI and MOT17 datasets.
arXiv Detail & Related papers (2021-03-26T04:43:04Z) - SoDA: Multi-Object Tracking with Soft Data Association [75.39833486073597]
Multi-object tracking (MOT) is a prerequisite for the safe deployment of self-driving cars.
We propose a novel approach to MOT that uses attention to compute track embeddings that encode dependencies between observed objects (a minimal illustrative sketch follows this list).
arXiv Detail & Related papers (2020-08-18T03:40:25Z)
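As a rough illustration of the attention-based track embeddings described in the SoDA entry above, the sketch below refines per-detection features with a single softmax attention step over all detections in a frame. The shapes, random projections, and single-head design are assumptions for illustration and do not reproduce SoDA's actual architecture.

```python
# Illustrative sketch: "attention over observed objects" for track embeddings.
# Not SoDA's actual model; random projections stand in for learned weights.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend_over_detections(det_feats, dim=32, rng=np.random.default_rng(0)):
    """Refine (N, D) per-detection features into (N, dim) track embeddings
    by letting every detection attend over all detections in the frame."""
    _, d = det_feats.shape
    # Query/key/value projections (random here, learned in a real tracker).
    wq, wk, wv = (rng.standard_normal((d, dim)) / np.sqrt(d) for _ in range(3))
    q, k, v = det_feats @ wq, det_feats @ wk, det_feats @ wv
    attn = softmax(q @ k.T / np.sqrt(dim))  # (N, N) soft association weights
    return attn @ v                         # each embedding mixes information from all objects

# Example: 5 detections with 16-dimensional appearance features.
tracks = attend_over_detections(np.random.default_rng(1).standard_normal((5, 16)))
print(tracks.shape)  # (5, 32)
```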