Target-Aware Object Discovery and Association for Unsupervised Video
Multi-Object Segmentation
- URL: http://arxiv.org/abs/2104.04782v1
- Date: Sat, 10 Apr 2021 14:39:44 GMT
- Title: Target-Aware Object Discovery and Association for Unsupervised Video
Multi-Object Segmentation
- Authors: Tianfei Zhou, Jianwu Li, Xueyi Li, Ling Shao
- Abstract summary: This paper addresses the task of unsupervised video multi-object segmentation.
We introduce a novel approach for more accurate and efficient spatio-temporal segmentation.
We evaluate the proposed approach on DAVIS$_{17}$ and YouTube-VIS, and the results demonstrate that it outperforms state-of-the-art methods both in segmentation accuracy and inference speed.
- Score: 79.6596425920849
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper addresses the task of unsupervised video multi-object
segmentation. Current approaches follow a two-stage paradigm: 1) detect object
proposals using pre-trained Mask R-CNN, and 2) conduct generic feature matching
for temporal association using re-identification techniques. However, the
generic features, widely used in both stages, are not reliable for
characterizing unseen objects, leading to poor generalization. To address this,
we introduce a novel approach for more accurate and efficient spatio-temporal
segmentation. In particular, to address \textbf{instance discrimination}, we
propose to combine foreground region estimation and instance grouping together
in one network, and additionally introduce temporal guidance for segmenting
each frame, enabling more accurate object discovery. For \textbf{temporal
association}, we complement current video object segmentation architectures
with a discriminative appearance model, capable of capturing more fine-grained
target-specific information. Given object proposals from the instance
discrimination network, three essential strategies are adopted to achieve
accurate segmentation: 1) target-specific tracking using a memory-augmented
appearance model; 2) target-agnostic verification to trace possible tracklets
for the proposal; 3) adaptive memory updating using the verified segments. We
evaluate the proposed approach on DAVIS$_{17}$ and YouTube-VIS, and the results
demonstrate that it outperforms state-of-the-art methods both in segmentation
accuracy and inference speed.
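The three association strategies in the abstract (target-specific tracking with a memory-augmented appearance model, target-agnostic verification of proposals, and adaptive memory updating) can be sketched as a minimal, illustrative loop. This is an assumption-laden toy, not the paper's method: the names `Track`, `associate`, and `match_score` are hypothetical, and a simple mask-IoU score stands in for the learned discriminative appearance model.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two binary masks (proxy for appearance matching)."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

class Track:
    """One tracked object with a bounded memory of verified appearance segments."""
    def __init__(self, mask, capacity=5):
        self.memory = [mask]      # memory-augmented appearance model (simplified)
        self.capacity = capacity

    def match_score(self, proposal):
        # Target-specific matching: best agreement against any stored segment.
        return max(iou(m, proposal) for m in self.memory)

    def update(self, mask):
        # Adaptive memory updating: keep only the most recent verified segments.
        self.memory.append(mask)
        if len(self.memory) > self.capacity:
            self.memory.pop(0)

def associate(tracks, proposals, accept_thr=0.5):
    """Greedy target-agnostic verification: each proposal either extends the
    best-matching existing tracklet or starts a new one."""
    for p in proposals:
        scores = [t.match_score(p) for t in tracks]
        if scores and max(scores) >= accept_thr:
            tracks[int(np.argmax(scores))].update(p)  # verified -> update memory
        else:
            tracks.append(Track(p))                   # unmatched -> new tracklet
    return tracks
```

In the paper, the proposals would come from the instance discrimination network and the matching score from the learned target-specific appearance model; only the overall control flow is suggested here.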
Related papers
- Appearance-Based Refinement for Object-Centric Motion Segmentation [85.2426540999329]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals.
Our approach involves a sequence-level selection mechanism that identifies accurate flow-predicted masks as exemplars.
Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTube, SegTrackv2, and FBMS-59.
arXiv Detail & Related papers (2023-12-18T18:59:51Z)
- ISAR: A Benchmark for Single- and Few-Shot Object Instance Segmentation and Re-Identification [24.709695178222862]
We propose ISAR, a benchmark and baseline method for single- and few-shot object identification.
We provide a semi-synthetic dataset of video sequences with ground-truth semantic annotations.
Our benchmark aligns with the emerging research trend of unifying Multi-Object Tracking, Video Object Segmentation, and Re-identification.
arXiv Detail & Related papers (2023-11-05T18:51:33Z) - Segment Anything Meets Point Tracking [116.44931239508578]
This paper presents a novel method for point-centric interactive video segmentation, empowered by SAM and long-term point tracking.
We highlight the merits of point-based tracking through direct evaluation on the zero-shot open-world Unidentified Video Objects (UVO) benchmark.
Our experiments on popular video object segmentation and multi-object segmentation tracking benchmarks, including DAVIS, YouTube-VOS, and BDD100K, suggest that a point-based segmentation tracker yields better zero-shot performance and efficient interactions.
arXiv Detail & Related papers (2023-07-03T17:58:01Z)
- Segmenting Moving Objects via an Object-Centric Layered Representation [100.26138772664811]
We introduce an object-centric segmentation model with a depth-ordered layer representation.
We introduce a scalable pipeline for generating synthetic training data with multiple objects.
We evaluate the model on standard video segmentation benchmarks.
arXiv Detail & Related papers (2022-07-05T17:59:43Z)
- SegTAD: Precise Temporal Action Detection via Semantic Segmentation [65.01826091117746]
We formulate the task of temporal action detection in a novel perspective of semantic segmentation.
Owing to the 1-dimensional property of TAD, we are able to convert the coarse-grained detection annotations to fine-grained semantic segmentation annotations for free.
We propose an end-to-end framework, SegTAD, composed of a 1D semantic segmentation network (1D-SSN) and a proposal detection network (PDN).
arXiv Detail & Related papers (2022-03-03T06:52:13Z)
- Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation [95.74244714914052]
Multiple object tracking and segmentation requires detecting, tracking, and segmenting objects belonging to a set of given classes.
We propose Prototypical Cross-Attention Network (PCAN), capable of leveraging rich spatio-temporal information online.
PCAN outperforms current video instance tracking and segmentation competition winners on the YouTube-VIS and BDD100K datasets.
arXiv Detail & Related papers (2021-06-22T17:57:24Z)
- An Exploration of Target-Conditioned Segmentation Methods for Visual Object Trackers [24.210580784051277]
We show how to transform a bounding-box tracker into a segmentation tracker.
Our analysis shows that such methods allow trackers to compete with recently proposed segmentation trackers.
arXiv Detail & Related papers (2020-08-03T16:21:18Z)