EgoTracks: A Long-term Egocentric Visual Object Tracking Dataset
- URL: http://arxiv.org/abs/2301.03213v5
- Date: Sun, 1 Oct 2023 22:54:53 GMT
- Title: EgoTracks: A Long-term Egocentric Visual Object Tracking Dataset
- Authors: Hao Tang, Kevin Liang, Matt Feiszli, Weiyao Wang
- Abstract summary: Embodied tracking is a key component of many egocentric vision problems.
EgoTracks is a new dataset for long-term egocentric visual object tracking.
We show improvements that can be made to a STARK tracker to significantly increase its performance on egocentric data.
- Score: 19.496721051685135
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual object tracking is a key component of many egocentric vision problems.
However, the full spectrum of challenges of egocentric tracking faced by an
embodied AI is underrepresented in many existing datasets; these tend to focus
on relatively short, third-person videos. Egocentric video has several
characteristics that distinguish it from the videos commonly found in past datasets:
frequent large camera motions and hand interactions with objects commonly lead
to occlusions or objects exiting the frame, and object appearance can change
rapidly due to widely different points of view, scale, or object states.
Embodied tracking is also naturally long-term, and being able to consistently
(re-)associate objects to their appearances and disappearances over as long as
a lifetime is critical. Previous datasets under-emphasize this re-detection
problem, and their "framed" nature has led to adoption of various
spatiotemporal priors that we find do not necessarily generalize to egocentric
video. We thus introduce EgoTracks, a new dataset for long-term egocentric
visual object tracking. Sourced from the Ego4D dataset, this new dataset
presents a significant challenge to recent state-of-the-art single-object
tracking models, which we find score poorly on traditional tracking metrics for
our new dataset, compared to popular benchmarks. We further show improvements
that can be made to a STARK tracker to significantly increase its performance
on egocentric data, resulting in a baseline model we call EgoSTARK. We publicly
release our annotations and benchmark, hoping our dataset leads to further
advancements in tracking.
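The abstract refers to scoring state-of-the-art trackers with traditional tracking metrics. As a rough, non-authoritative illustration of one such metric, the Python sketch below computes a per-sequence success curve from IoU overlaps and averages it into an AUC score, with a simple convention for frames where the target is absent (the re-detection setting the abstract emphasizes). The function names and the absent-frame handling are illustrative assumptions, not EgoTracks' official evaluation protocol.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as [x1, y1, x2, y2]."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def success_auc(pred_boxes, gt_boxes, num_thresholds=21):
    """Overlap-based success curve averaged into an AUC for one sequence.

    pred_boxes / gt_boxes are per-frame [x1, y1, x2, y2] boxes, or None when
    the tracker declares the target absent / the target is out of view.
    The absent-frame convention used here (full credit when tracker and
    ground truth agree the target is gone) is a simplification for
    illustration only.
    """
    overlaps = []
    for pred, gt in zip(pred_boxes, gt_boxes):
        if gt is None:
            overlaps.append(1.0 if pred is None else 0.0)
        elif pred is None:
            overlaps.append(0.0)
        else:
            overlaps.append(iou(pred, gt))
    overlaps = np.asarray(overlaps)
    thresholds = np.linspace(0.0, 1.0, num_thresholds)
    success = np.array([(overlaps >= t).mean() for t in thresholds])
    # AUC of the success plot equals mean success over evenly spaced thresholds.
    return float(success.mean())
```

In practice, long-term tracking benchmarks often report additional measures (for example, precision and recall of re-detections); the sketch only covers the overlap-based success/AUC popularized by short-term benchmarks.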
Related papers
- Ego3DT: Tracking Every 3D Object in Ego-centric Videos [20.96550148331019]
This paper introduces a novel zero-shot approach for the 3D reconstruction and tracking of all objects in ego-centric videos.
We present Ego3DT, a novel framework that initially identifies and extracts detection and segmentation information of objects within the ego environment.
We have also innovated a dynamic hierarchical association mechanism for creating stable 3D tracking trajectories of objects in ego-centric videos.
arXiv Detail & Related papers (2024-10-11T05:02:31Z)
- 3D-Aware Instance Segmentation and Tracking in Egocentric Videos [107.10661490652822]
Egocentric videos present unique challenges for 3D scene understanding.
This paper introduces a novel approach to instance segmentation and tracking in first-person video.
By incorporating spatial and temporal cues, we achieve superior performance compared to state-of-the-art 2D approaches.
arXiv Detail & Related papers (2024-08-19T10:08:25Z)
- Tracking Reflected Objects: A Benchmark [12.770787846444406]
We introduce TRO, a benchmark specifically for Tracking Reflected Objects.
TRO includes 200 sequences with around 70,000 frames, each carefully annotated with bounding boxes.
To provide a stronger baseline, we propose a new tracker, HiP-HaTrack, which uses hierarchical features to improve performance.
arXiv Detail & Related papers (2024-07-07T02:22:45Z)
- EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding [11.9023437362986]
EgoObjects is a large-scale egocentric dataset for fine-grained object understanding.
The pilot version contains over 9K videos collected by 250 participants from 50+ countries using 4 wearable devices.
EgoObjects also annotates each object with an instance-level identifier.
arXiv Detail & Related papers (2023-09-15T23:55:43Z)
- DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion [56.1428110894411]
We propose a large-scale dataset for multi-human tracking, where humans have similar appearance, diverse motion and extreme articulation.
As the dataset contains mostly group dancing videos, we name it "DanceTrack".
We benchmark several state-of-the-art trackers on our dataset and observe a significant performance drop on DanceTrack when compared against existing benchmarks.
arXiv Detail & Related papers (2021-11-29T16:49:06Z)
- Learning Target Candidate Association to Keep Track of What Not to Track [100.80610986625693]
We propose to keep track of distractor objects in order to continue tracking the target.
To tackle the problem of lacking ground-truth correspondences between distractor objects in visual tracking, we propose a training strategy that combines partial annotations with self-supervision.
Our tracker sets a new state-of-the-art on six benchmarks, achieving an AUC score of 67.2% on LaSOT and a +6.1% absolute gain on the OxUvA long-term dataset.
arXiv Detail & Related papers (2021-03-30T17:58:02Z)
- Learning to Track with Object Permanence [61.36492084090744]
We introduce an end-to-end trainable approach for joint object detection and tracking.
Our model, trained jointly on synthetic and real data, outperforms the state of the art on the KITTI and MOT17 datasets.
arXiv Detail & Related papers (2021-03-26T04:43:04Z)
- SoDA: Multi-Object Tracking with Soft Data Association [75.39833486073597]
Multi-object tracking (MOT) is a prerequisite for the safe deployment of self-driving cars.
We propose a novel approach to MOT that uses attention to compute track embeddings that encode dependencies between observed objects.
arXiv Detail & Related papers (2020-08-18T03:40:25Z)
- TAO: A Large-Scale Benchmark for Tracking Any Object [95.87310116010185]
The Tracking Any Object (TAO) dataset consists of 2,907 high-resolution videos, captured in diverse environments, which are half a minute long on average.
We ask annotators to label objects that move at any point in the video, and give names to them post factum.
Our vocabulary is both significantly larger and qualitatively different from existing tracking datasets.
arXiv Detail & Related papers (2020-05-20T21:07:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.