Track Anything Behind Everything: Zero-Shot Amodal Video Object Segmentation
- URL: http://arxiv.org/abs/2411.19210v1
- Date: Thu, 28 Nov 2024 15:30:56 GMT
- Title: Track Anything Behind Everything: Zero-Shot Amodal Video Object Segmentation
- Authors: Finlay G. C. Hudson, William A. P. Smith
- Abstract summary: Track Anything Behind Everything (TABE) is a novel dataset, pipeline, and evaluation framework for zero-shot amodal completion from visible masks. Unlike existing methods that require pretrained class labels, our approach uses a single query mask from the first frame where the object is visible. Our dataset, TABE-51, provides highly accurate ground truth amodal segmentation masks without the need for human estimation or 3D reconstruction.
- Score: 15.272149101494005
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present Track Anything Behind Everything (TABE), a novel dataset, pipeline, and evaluation framework for zero-shot amodal completion from visible masks. Unlike existing methods that require pretrained class labels, our approach uses a single query mask from the first frame where the object is visible, enabling flexible, zero-shot inference. Our dataset, TABE-51, provides highly accurate ground truth amodal segmentation masks without the need for human estimation or 3D reconstruction. Our TABE pipeline is specifically designed to handle amodal completion, even in scenarios where objects are completely occluded. We also introduce a specialised evaluation framework that isolates amodal completion performance, free from the influence of traditional visual segmentation metrics.
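The abstract's idea of isolating amodal completion performance can be illustrated with a simple metric sketch: score the prediction only over the occluded region (the ground-truth amodal mask minus the visible mask), so that ordinary visible-region segmentation quality does not dominate. This is a generic, hypothetical illustration of the concept; the actual TABE evaluation framework may define its metric differently.

```python
import numpy as np

def occlusion_iou(pred_amodal, gt_amodal, gt_visible):
    """IoU restricted to the occluded region (amodal minus visible pixels).

    A minimal sketch of isolating amodal-completion quality, assuming
    boolean HxW masks; not the exact metric used by TABE.
    """
    occluded = gt_amodal & ~gt_visible          # ground-truth hidden pixels
    pred_occluded = pred_amodal & ~gt_visible   # predicted pixels outside the visible mask
    inter = np.logical_and(pred_occluded, occluded).sum()
    union = np.logical_or(pred_occluded, occluded).sum()
    # Convention: if nothing is occluded, any prediction scores perfectly.
    return inter / union if union > 0 else 1.0

# Toy 4x4 example: the object's top row is visible, the rest is occluded.
gt_amodal = np.array([[0, 1, 1, 0],
                      [0, 1, 1, 0],
                      [0, 1, 1, 0],
                      [0, 0, 0, 0]], dtype=bool)
gt_visible = np.array([[0, 1, 1, 0],
                       [0, 0, 0, 0],
                       [0, 0, 0, 0],
                       [0, 0, 0, 0]], dtype=bool)
pred = gt_amodal.copy()  # a perfect amodal completion
print(occlusion_iou(pred, gt_amodal, gt_visible))  # prints 1.0
```

Restricting the union and intersection to pixels outside the visible mask means a tracker that only segments what is already visible scores zero here, which is the point of a completion-specific metric.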
Related papers
- Unveiling the Invisible: Reasoning Complex Occlusions Amodally with AURA [49.10341970643037]
Amodal segmentation aims to infer the complete shape of occluded objects, even when the occluded region's appearance is unavailable.
Current amodal segmentation methods lack the capability to interact with users through text input.
We propose a novel task named amodal reasoning segmentation, aiming to predict the complete amodal shape of occluded objects.
arXiv Detail & Related papers (2025-03-13T10:08:18Z) - Segment Anything, Even Occluded [35.150696061791805]
SAMEO is a novel framework that adapts the Segment Anything Model (SAM) as a versatile mask decoder.
We introduce Amodal-LVIS, a large-scale synthetic dataset comprising 300K images derived from the modal LVIS and LVVIS datasets.
Our results demonstrate that our approach, when trained on the newly extended dataset, achieves remarkable zero-shot performance on both COCOA-cls and D2SA benchmarks.
arXiv Detail & Related papers (2025-03-08T16:14:57Z) - LAC-Net: Linear-Fusion Attention-Guided Convolutional Network for Accurate Robotic Grasping Under the Occlusion [79.22197702626542]
This paper introduces a framework that explores amodal segmentation for robotic grasping in cluttered scenes.
We propose a Linear-fusion Attention-guided Convolutional Network (LAC-Net).
The results on different datasets show that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-08-06T14:50:48Z) - Object-level Scene Deocclusion [92.39886029550286]
We present a new self-supervised PArallel visible-to-COmplete diffusion framework, named PACO, for object-level scene deocclusion.
To train PACO, we create a large-scale dataset with 500k samples to enable self-supervised learning.
Experiments on COCOA and various real-world scenes demonstrate the superior capability of PACO for scene deocclusion, surpassing the state of the arts by a large margin.
arXiv Detail & Related papers (2024-06-11T20:34:10Z) - Amodal Ground Truth and Completion in the Wild [84.54972153436466]
We use 3D data to establish an automatic pipeline to determine authentic ground truth amodal masks for partially occluded objects in real images.
This pipeline is used to construct an amodal completion evaluation benchmark, MP3D-Amodal, consisting of a variety of object categories and labels.
arXiv Detail & Related papers (2023-12-28T18:59:41Z) - TAO-Amodal: A Benchmark for Tracking Any Object Amodally [41.5396827282691]
We introduce TAO-Amodal, featuring 833 diverse categories in thousands of video sequences.
Our dataset includes amodal and modal bounding boxes for visible and partially or fully occluded objects, including those that are partially out of the camera frame.
arXiv Detail & Related papers (2023-12-19T18:58:40Z) - Robust Visual Tracking by Segmentation [103.87369380021441]
Estimating the target extent poses a fundamental challenge in visual object tracking.
We propose a segmentation-centric tracking pipeline that produces a highly accurate segmentation mask.
Our tracker is able to better learn a target representation that clearly differentiates the target in the scene from background content.
arXiv Detail & Related papers (2022-03-21T17:59:19Z) - Self-Supervised Scene De-occlusion [186.89979151728636]
This paper investigates the problem of scene de-occlusion, which aims to recover the underlying occlusion ordering and complete the invisible parts of occluded objects.
We make the first attempt to address the problem through a novel and unified framework that recovers hidden scene structures without ordering or amodal annotations as supervision.
Based on two partial completion networks, PCNet-M and PCNet-C, we devise a novel inference scheme that accomplishes scene de-occlusion via progressive ordering recovery, amodal completion, and content completion.
arXiv Detail & Related papers (2020-04-06T16:31:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.