FODVid: Flow-guided Object Discovery in Videos
- URL: http://arxiv.org/abs/2307.04392v1
- Date: Mon, 10 Jul 2023 07:55:42 GMT
- Title: FODVid: Flow-guided Object Discovery in Videos
- Authors: Silky Singh and Shripad Deshmukh and Mausoom Sarkar and Rishabh Jain
and Mayur Hemani and Balaji Krishnamurthy
- Abstract summary: We focus on building a generalizable solution that avoids overfitting to the individual intricacies.
To solve Video Object Segmentation (VOS) in an unsupervised setting, we propose a new pipeline (FODVid) based on the idea of guiding segmentation outputs.
- Score: 12.792602427704395
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Segmentation of objects in a video is challenging due to the nuances such as
motion blurring, parallax, occlusions, changes in illumination, etc. Instead of
addressing these nuances separately, we focus on building a generalizable
solution that avoids overfitting to the individual intricacies. Such a solution
would also help us save enormous resources involved in human annotation of
video corpora. To solve Video Object Segmentation (VOS) in an unsupervised
setting, we propose a new pipeline (FODVid) based on the idea of guiding
segmentation outputs using flow-guided graph-cut and temporal consistency.
Basically, we design a segmentation model incorporating intra-frame appearance
and flow similarities, and inter-frame temporal continuation of the objects
under consideration. We perform an extensive experimental analysis of our
straightforward methodology on the standard DAVIS16 video benchmark. Though
simple, our approach produces results comparable (within a range of ~2 mIoU) to
the existing top approaches in unsupervised VOS. The simplicity and
effectiveness of our technique opens up new avenues for research in the video
domain.
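The idea of combining intra-frame flow and appearance similarities with inter-frame temporal continuity can be illustrated with a toy energy-minimization sketch. This is not the paper's actual flow-guided graph-cut implementation: the 1-D "frame", the weights `lam` and `mu`, and the brute-force solver are all illustrative assumptions standing in for a proper graph-cut on a real image.

```python
import itertools

def appearance_sim(a, b):
    # Similar intensities -> high cost for splitting them across a boundary.
    return 1.0 - abs(a - b)

def segment_energy(labels, flow_mag, appearance, prev_labels, lam=1.0, mu=0.5):
    """Toy VOS-style energy over a 1-D row of pixels.

    Unary term: fast-moving pixels (high flow magnitude) lean foreground.
    Pairwise term: neighbors with similar appearance should share a label.
    Temporal term: discourage label flips relative to the previous frame.
    """
    n = len(labels)
    unary = sum(flow_mag[i] if labels[i] == 0 else 1.0 - flow_mag[i]
                for i in range(n))
    pairwise = sum(appearance_sim(appearance[i], appearance[i + 1])
                   for i in range(n - 1) if labels[i] != labels[i + 1])
    temporal = sum(labels[i] != prev_labels[i] for i in range(n))
    return unary + lam * pairwise + mu * temporal

def best_labels(flow_mag, appearance, prev_labels):
    # Brute force over all 2^n labelings; a real system would use graph-cut.
    n = len(flow_mag)
    return min(itertools.product((0, 1), repeat=n),
               key=lambda L: segment_energy(L, flow_mag, appearance, prev_labels))

# 1-D "frame": pixels 2-4 move fast and look alike -> a coherent object.
flow = [0.1, 0.1, 0.9, 0.8, 0.9, 0.1]
app  = [0.2, 0.2, 0.7, 0.7, 0.7, 0.2]
prev = [0, 0, 1, 1, 1, 0]
print(best_labels(flow, app, prev))  # -> (0, 0, 1, 1, 1, 0)
```

The minimizer recovers the fast-moving, appearance-coherent middle segment as foreground, consistent with the previous frame's mask, which is the qualitative behavior the combined intra-frame and inter-frame terms are meant to produce.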
Related papers
- Appearance-Based Refinement for Object-Centric Motion Segmentation [85.2426540999329]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals.
Our approach involves a sequence-level selection mechanism that identifies accurate flow-predicted masks as exemplars.
Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTube, SegTrackv2, and FBMS-59.
arXiv Detail & Related papers (2023-12-18T18:59:51Z) - Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation [76.68301884987348]
We propose a simple yet effective approach for self-supervised video object segmentation (VOS)
Our key insight is that the inherent structural dependencies present in DINO-pretrained Transformers can be leveraged to establish robust spatio-temporal segmentation correspondences in videos.
Our method demonstrates state-of-the-art performance across multiple unsupervised VOS benchmarks and excels in complex real-world multi-object video segmentation tasks.
arXiv Detail & Related papers (2023-11-29T18:47:17Z) - TSANet: Temporal and Scale Alignment for Unsupervised Video Object Segmentation [21.19216164433897]
Unsupervised Video Object Segmentation (UVOS) refers to the challenging task of segmenting the prominent object in videos without manual guidance.
We propose a novel framework for UVOS that can address the aforementioned limitations of the two approaches.
We present experimental results on public benchmark datasets, DAVIS 2016 and FBMS, which demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2023-03-08T04:59:43Z) - Tag-Based Attention Guided Bottom-Up Approach for Video Instance Segmentation [83.13610762450703]
Video instance segmentation is a fundamental computer vision task that deals with segmenting and tracking object instances across a video sequence.
We introduce a simple, end-to-end trainable bottom-up approach to achieve instance mask predictions at pixel-level granularity, instead of the typical region-proposal-based approach.
Our method provides competitive results on the YouTube-VIS and DAVIS-19 datasets, and has the lowest run-time compared to other contemporary state-of-the-art methods.
arXiv Detail & Related papers (2022-04-22T15:32:46Z) - Learning Video Salient Object Detection Progressively from Unlabeled Videos [8.224670666756193]
We propose a novel VSOD method via a progressive framework that locates and segments salient objects in sequence without utilizing any video annotation.
Specifically, an algorithm for generating spatiotemporal location labels, which consists of generating high-saliency location labels and tracking salient objects in adjacent frames, is proposed.
Although our method does not require labeled video at all, the experimental results on five public benchmarks of DAVIS, FBMS, ViSal, VOS, and DAVSOD demonstrate that our proposed method is competitive with fully supervised methods and outperforms the state-of-the-art weakly and unsupervised methods.
arXiv Detail & Related papers (2022-04-05T06:12:45Z) - Implicit Motion Handling for Video Camouflaged Object Detection [60.98467179649398]
We propose a new video camouflaged object detection (VCOD) framework.
It can exploit both short-term and long-term temporal consistency to detect camouflaged objects from video frames.
arXiv Detail & Related papers (2022-03-14T17:55:41Z) - Box Supervised Video Segmentation Proposal Network [3.384080569028146]
We propose a box-supervised video object segmentation proposal network, which takes advantage of intrinsic video properties.
The proposed method outperforms the state-of-the-art self-supervised benchmark by 16.4% and 6.9%.
We provide extensive tests and ablations on the datasets, demonstrating the robustness of our method.
arXiv Detail & Related papers (2022-02-14T20:38:28Z) - Video Salient Object Detection via Contrastive Features and Attention Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z) - Self-supervised Video Object Segmentation by Motion Grouping [79.13206959575228]
We develop a computer vision system able to segment objects by exploiting motion cues.
We introduce a simple variant of the Transformer to segment optical flow frames into primary objects and the background.
We evaluate the proposed architecture on public benchmarks (DAVIS2016, SegTrackv2, and FBMS59)
arXiv Detail & Related papers (2021-04-15T17:59:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.