TSANet: Temporal and Scale Alignment for Unsupervised Video Object
Segmentation
- URL: http://arxiv.org/abs/2303.04376v2
- Date: Wed, 21 Feb 2024 06:45:45 GMT
- Title: TSANet: Temporal and Scale Alignment for Unsupervised Video Object
Segmentation
- Authors: Seunghoon Lee, Suhwan Cho, Dogyoon Lee, Minhyeok Lee, Sangyoun Lee
- Abstract summary: Unsupervised Video Object Segmentation (UVOS) refers to the challenging task of segmenting the prominent object in videos without manual guidance.
We propose a novel framework for UVOS that can address the aforementioned limitations of the two approaches.
We present experimental results on public benchmark datasets, DAVIS 2016 and FBMS, which demonstrate the effectiveness of our method.
- Score: 21.19216164433897
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised Video Object Segmentation (UVOS) refers to the challenging task
of segmenting the prominent object in videos without manual guidance. Recent
works on UVOS fall into two approaches, appearance-based and
appearance-motion-based methods, each with its own limitations.
Appearance-based methods do not consider the motion of the target object
because they exploit correlation information between randomly paired frames.
Appearance-motion-based methods fuse appearance with motion and are therefore
dominated by their dependency on optical flow. In this
paper, we propose a novel framework for UVOS that can address the
aforementioned limitations of the two approaches in terms of both time and
scale. Temporal Alignment Fusion aligns the saliency information of adjacent
frames with the target frame to leverage the information of adjacent frames.
Scale Alignment Decoder predicts the target object mask by aggregating
multi-scale feature maps via continuous mapping with implicit neural
representation. We present experimental results on public benchmark datasets,
DAVIS 2016 and FBMS, which demonstrate the effectiveness of our method.
Furthermore, we outperform the state-of-the-art methods on DAVIS 2016.
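
The abstract names the two modules but gives no implementation detail. As a
minimal sketch of what Temporal Alignment Fusion could look like, the PyTorch
snippet below warps an adjacent frame's features into the target frame's
coordinate system with optical flow and fuses the two through a learned gate.
Every name here (`flow_warp`, `TemporalAlignmentFusion`, the gating design) is
an illustrative assumption, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def flow_warp(feat, flow):
    """Warp a feature map (B, C, H, W) toward the target frame using
    backward optical flow (B, 2, H, W) given in pixel offsets."""
    b, _, h, w = feat.shape
    # Base sampling grid of pixel coordinates.
    ys, xs = torch.meshgrid(
        torch.arange(h, device=feat.device, dtype=feat.dtype),
        torch.arange(w, device=feat.device, dtype=feat.dtype),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=0).unsqueeze(0)         # (1, 2, H, W)
    coords = grid + flow                                     # shifted coords
    # Normalize to [-1, 1] as required by grid_sample.
    coords_x = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    sample_grid = torch.stack((coords_x, coords_y), dim=-1)  # (B, H, W, 2)
    return F.grid_sample(feat, sample_grid, align_corners=True)

class TemporalAlignmentFusion(nn.Module):
    """Hypothetical module: align an adjacent frame's features to the
    target frame via flow warping, then fuse with a learned gate."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, target_feat, adjacent_feat, flow):
        aligned = flow_warp(adjacent_feat, flow)
        g = self.gate(torch.cat([target_feat, aligned], dim=1))
        return target_feat + g * aligned
```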
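
Likewise, "continuous mapping with implicit neural representation" in the
Scale Alignment Decoder reads like a LIIF-style formulation: sample the
multi-scale feature maps at continuous coordinates and let an MLP map each
sampled vector, together with its coordinate, to a mask logit. The sketch
below illustrates that general idea under the same caveat that module and
parameter names are assumptions, not the paper's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImplicitMaskDecoder(nn.Module):
    """Hypothetical LIIF-style decoder: aggregate multi-scale feature maps
    by sampling them at continuous (x, y) coordinates and predicting a
    mask logit per query point with a small MLP."""
    def __init__(self, feat_channels, hidden=256):
        super().__init__()
        in_dim = sum(feat_channels) + 2   # sampled features + (x, y) coord
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, 1),         # one logit per query point
        )

    def forward(self, feats, coords):
        # feats: list of (B, C_i, H_i, W_i); coords: (B, N, 2) in [-1, 1]
        grid = coords.unsqueeze(1)        # (B, 1, N, 2) for grid_sample
        sampled = [
            F.grid_sample(f, grid, mode="bilinear", align_corners=True)
             .squeeze(2).transpose(1, 2)  # (B, N, C_i)
            for f in feats
        ]
        x = torch.cat(sampled + [coords], dim=-1)
        return self.mlp(x)                # (B, N, 1) mask logits
```

Because the query coordinates are continuous, such a decoder can render the
mask at any output resolution from the same feature pyramid, which is one
plausible reading of aligning scales across feature maps.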
Related papers
- Appearance-Based Refinement for Object-Centric Motion Segmentation [85.2426540999329]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals.
Our approach involves a sequence-level selection mechanism that identifies accurate flow-predicted masks as exemplars.
Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTube, SegTrackv2, and FBMS-59.
arXiv Detail & Related papers (2023-12-18T18:59:51Z) - FODVid: Flow-guided Object Discovery in Videos [12.792602427704395]
We focus on building a generalizable solution that avoids overfitting to the individual intricacies.
To solve Video Object Segmentation (VOS) in an unsupervised setting, we propose a new pipeline (FODVid) based on the idea of guiding segmentation outputs.
arXiv Detail & Related papers (2023-07-10T07:55:42Z) - Online Unsupervised Video Object Segmentation via Contrastive Motion
Clustering [27.265597448266988]
Online unsupervised video object segmentation (UVOS) uses the previous frames as its input to automatically separate the primary object(s) from a streaming video without using any further manual annotation.
A major challenge is that the model has no access to the future and must rely solely on the history, i.e., the segmentation mask is predicted from the current frame as soon as it is captured.
In this work, a novel contrastive motion clustering algorithm with optical flow as its input is proposed for online UVOS by exploiting the common fate principle that visual elements tend to be perceived as a group if they possess the same motion.
arXiv Detail & Related papers (2023-06-21T06:40:31Z) - Improving Unsupervised Video Object Segmentation with Motion-Appearance
Synergy [52.03068246508119]
We present IMAS, a method that segments the primary objects in videos without manual annotation in training or inference.
IMAS achieves Improved UVOS with Motion-Appearance Synergy.
We demonstrate its effectiveness in tuning critical hyperparameters previously tuned with human annotation or hand-crafted hyperparameter-specific metrics.
arXiv Detail & Related papers (2022-12-17T06:47:30Z) - Motion-inductive Self-supervised Object Discovery in Videos [99.35664705038728]
We propose a model that processes consecutive RGB frames and infers the optical flow between any pair of frames using a layered representation.
We demonstrate superior performance over previous state-of-the-art methods on three public video segmentation datasets.
arXiv Detail & Related papers (2022-10-01T08:38:28Z) - Implicit Motion Handling for Video Camouflaged Object Detection [60.98467179649398]
We propose a new video camouflaged object detection (VCOD) framework.
It can exploit both short-term and long-term temporal consistency to detect camouflaged objects from video frames.
arXiv Detail & Related papers (2022-03-14T17:55:41Z) - Box Supervised Video Segmentation Proposal Network [3.384080569028146]
We propose a box-supervised video object segmentation proposal network, which takes advantage of intrinsic video properties.
The proposed method outperforms the state-of-the-art self-supervised benchmark by 16.4% and 6.9%.
We provide extensive tests and ablations on the datasets, demonstrating the robustness of our method.
arXiv Detail & Related papers (2022-02-14T20:38:28Z) - Exploring Motion and Appearance Information for Temporal Sentence
Grounding [52.01687915910648]
We propose a Motion-Appearance Reasoning Network (MARN) to solve temporal sentence grounding.
We develop separate motion and appearance branches to learn motion-guided and appearance-guided object relations.
Our proposed MARN significantly outperforms previous state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2022-01-03T02:44:18Z) - TimeLens: Event-based Video Frame Interpolation [54.28139783383213]
We introduce Time Lens, a novel method that leverages the advantages of both synthesis-based and flow-based approaches.
We show an up to 5.21 dB improvement in terms of PSNR over state-of-the-art frame-based and event-based methods.
arXiv Detail & Related papers (2021-06-14T10:33:47Z)