Related papers: Appearance-based Refinement for Object-Centric Motion Segmentation

Appearance-based Refinement for Object-Centric Motion Segmentation

URL: http://arxiv.org/abs/2312.11463v1
Date: Mon, 18 Dec 2023 18:59:51 GMT
Title: Appearance-based Refinement for Object-Centric Motion Segmentation
Authors: Junyu Xie, Weidi Xie, Andrew Zisserman
Abstract summary: We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals. Our approach involves a simple selection mechanism that identifies accurate flow-predicted masks as exemplars. Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTubeVOS, SegTrackv2, and FBMS-59.
Score: 95.80420062679104
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The goal of this paper is to discover, segment, and track independently moving objects in complex visual scenes. Previous approaches have explored the use of optical flow for motion segmentation, leading to imperfect predictions due to partial motion, background distraction, and object articulations and interactions. To address this issue, we introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals. Our approach involves a simple selection mechanism that identifies accurate flow-predicted masks as exemplars, and an object-centric architecture that refines problematic masks based on exemplar information. The model is pre-trained on synthetic data and then adapted to real-world videos in a self-supervised manner, eliminating the need for human annotations. Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTubeVOS, SegTrackv2, and FBMS-59. We achieve competitive performance on single-object segmentation, while significantly outperforming existing models on the more challenging problem of multi-object segmentation. Finally, we investigate the benefits of using our model as a prompt for a per-frame Segment Anything Model.

Related papers

Segment Any Motion in Videos [80.72424676419755]
We propose a novel approach for moving object segmentation that combines long-range trajectory motion cues with DINO-based semantic features. Our model employs Spatio-Temporal Trajectory Attention and Motion-Semantic Decoupled Embedding to prioritize motion while integrating semantic support.
arXiv Detail & Related papers (2025-03-28T09:34:11Z)
Learning Motion and Temporal Cues for Unsupervised Video Object Segmentation [49.113131249753714]
We propose an efficient algorithm, termed MTNet, which concurrently exploits motion and temporal cues. MTNet is devised by effectively merging appearance and motion features during the feature extraction process within encoders. We employ a cascade of decoders all feature levels across all feature levels to optimally exploit the derived features.
arXiv Detail & Related papers (2025-01-14T03:15:46Z)
What is Point Supervision Worth in Video Instance Segmentation? [119.71921319637748]
Video instance segmentation (VIS) is a challenging vision task that aims to detect, segment, and track objects in videos. We reduce the human annotations to only one point for each object in a video frame during training, and obtain high-quality mask predictions close to fully supervised models. Comprehensive experiments on three VIS benchmarks demonstrate competitive performance of the proposed framework, nearly matching fully supervised methods.
arXiv Detail & Related papers (2024-04-01T17:38:25Z)
LOCATE: Self-supervised Object Discovery via Flow-guided Graph-cut and Bootstrapped Self-training [13.985488693082981]
We propose a self-supervised object discovery approach that leverages motion and appearance information to produce high-quality object segmentation masks. We demonstrate the effectiveness of our approach, named LOCATE, on multiple standard video object segmentation, image saliency detection, and object segmentation benchmarks.
arXiv Detail & Related papers (2023-08-22T07:27:09Z)
FODVid: Flow-guided Object Discovery in Videos [12.792602427704395]
We focus on building a generalizable solution that avoids overfitting to the individual intricacies. To solve Video Object (VOS) in an unsupervised setting, we propose a new pipeline (FODVid) based on the idea of guiding segmentation outputs.
arXiv Detail & Related papers (2023-07-10T07:55:42Z)
RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation [53.4319652364256]
This paper presents the RefSAM model, which explores the potential of SAM for referring video object segmentation. Our proposed approach adapts the original SAM model to enhance cross-modality learning by employing a lightweight Cross-RValModal. We employ a parameter-efficient tuning strategy to align and fuse the language and vision features effectively.
arXiv Detail & Related papers (2023-07-03T13:21:58Z)
Efficient Unsupervised Video Object Segmentation Network Based on Motion Guidance [1.5736899098702974]
This paper proposes a video object segmentation network based on motion guidance. The model comprises a dual-stream network, motion guidance module, and multi-scale progressive fusion module. The experimental results prove the superior performance of the proposed method.
arXiv Detail & Related papers (2022-11-10T06:13:23Z)
Segmenting Moving Objects via an Object-Centric Layered Representation [100.26138772664811]
We introduce an object-centric segmentation model with a depth-ordered layer representation. We introduce a scalable pipeline for generating synthetic training data with multiple objects. We evaluate the model on standard video segmentation benchmarks.
arXiv Detail & Related papers (2022-07-05T17:59:43Z)
Weakly Supervised Instance Segmentation using Motion Information via Optical Flow [3.0763099528432263]
We propose a two-stream encoder that leverages appearance and motion features extracted from images and optical flows. Our results demonstrate that the proposed method improves the Average Precision of the state-of-the-art method by 3.1.
arXiv Detail & Related papers (2022-02-25T22:41:54Z)
Self-supervised Video Object Segmentation by Motion Grouping [79.13206959575228]
We develop a computer vision system able to segment objects by exploiting motion cues. We introduce a simple variant of the Transformer to segment optical flow frames into primary objects and the background. We evaluate the proposed architecture on public benchmarks (DAVIS2016, SegTrackv2, and FBMS59)
arXiv Detail & Related papers (2021-04-15T17:59:32Z)
Target-Aware Object Discovery and Association for Unsupervised Video Multi-Object Segmentation [79.6596425920849]
This paper addresses the task of unsupervised video multi-object segmentation. We introduce a novel approach for more accurate and efficient unseen-temporal segmentation. We evaluate the proposed approach on DAVIS$_17$ and YouTube-VIS, and the results demonstrate that it outperforms state-of-the-art methods both in segmentation accuracy and inference speed.
arXiv Detail & Related papers (2021-04-10T14:39:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.