Flow-guided Semi-supervised Video Object Segmentation
- URL: http://arxiv.org/abs/2301.10492v1
- Date: Wed, 25 Jan 2023 10:02:31 GMT
- Title: Flow-guided Semi-supervised Video Object Segmentation
- Authors: Yushan Zhang, Andreas Robinson, Maria Magnusson, Michael Felsberg
- Abstract summary: We propose an optical flow-guided approach for semi-supervised video object segmentation.
A model to extract the combined information from optical flow and the image is proposed.
Experiments on DAVIS 2017 and YouTube-VOS 2019 show that integrating the information extracted from optical flow into the original image branch results in a strong performance gain.
- Score: 14.357395825753827
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose an optical flow-guided approach for semi-supervised video object
segmentation. Optical flow is usually exploited as additional guidance
information in unsupervised video object segmentation. However, its relevance
in semi-supervised video object segmentation has not been fully explored. In
this work, we follow an encoder-decoder approach to address the segmentation
task. A model to extract the combined information from optical flow and the
image is proposed, which is then used as input to the target model and the
decoder network. Unlike previous methods where concatenation is used to
integrate information from image data and optical flow, a simple yet effective
attention mechanism is exploited in our work. Experiments on DAVIS 2017 and
YouTube-VOS 2019 show that integrating the information extracted from optical
flow into the original image branch results in a strong performance gain, and
our method achieves state-of-the-art performance.
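The abstract contrasts two ways of combining image and optical-flow features: concatenation, and an attention mechanism. The sketch below illustrates that contrast only. It is not the authors' model: the sigmoid gate, the residual form `x * (1 + gate)`, and the names `concat_fusion` and `attention_fusion` are all assumptions made for illustration, and features are represented as plain per-pixel channel lists rather than tensors.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def concat_fusion(image_feats, flow_feats):
    """Baseline fusion: concatenate image and flow channels at every pixel."""
    return [img + flw for img, flw in zip(image_feats, flow_feats)]


def attention_fusion(image_feats, flow_feats):
    """Hypothetical attention-style fusion (an assumption, not the paper's design).

    The flow features at each pixel are reduced to one gate in (0, 1),
    which then rescales the image features at that pixel.
    """
    fused = []
    for img, flw in zip(image_feats, flow_feats):
        gate = sigmoid(sum(flw) / len(flw))  # mean flow response -> gate
        # Residual form: image features are kept and amplified where flow is strong.
        fused.append([x * (1.0 + gate) for x in img])
    return fused


# Toy input: 2 pixels, 2 image channels and 3 flow channels per pixel.
image_feats = [[1.0, 2.0], [3.0, 4.0]]
flow_feats = [[0.0, 0.0, 0.0], [4.0, 4.0, 4.0]]

concat_out = concat_fusion(image_feats, flow_feats)
attn_out = attention_fusion(image_feats, flow_feats)

print(len(concat_out[0]))  # channel count grows to 2 + 3
print(len(attn_out[0]))    # channel count stays at 2
```

One practical difference this makes visible: concatenation widens the feature vector, so every downstream layer must be resized to accept it, whereas the gated variant preserves the image branch's channel count and only modulates its values.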
Related papers
- Moving Object Proposals with Deep Learned Optical Flow for Video Object
Segmentation [1.551271936792451]
We propose a state-of-the-art neural network architecture for generating moving object proposals (MOP).
We first train an unsupervised convolutional neural network (UnFlow) to estimate optical flow.
Then we pass the output of the optical flow network to a fully convolutional SegNet model.
arXiv Detail & Related papers (2024-02-14T01:13:55Z)
- Appearance-Based Refinement for Object-Centric Motion Segmentation [85.2426540999329]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals.
Our approach involves a sequence-level selection mechanism that identifies accurate flow-predicted masks as exemplars.
Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTube, SegTrackv2, and FBMS-59.
arXiv Detail & Related papers (2023-12-18T18:59:51Z)
- FODVid: Flow-guided Object Discovery in Videos [12.792602427704395]
We focus on building a generalizable solution that avoids overfitting to the individual intricacies.
To solve Video Object Segmentation (VOS) in an unsupervised setting, we propose a new pipeline (FODVid) based on the idea of guiding segmentation outputs.
arXiv Detail & Related papers (2023-07-10T07:55:42Z)
- Co-attention Propagation Network for Zero-Shot Video Object Segmentation [91.71692262860323]
Zero-shot video object segmentation (ZS-VOS) aims to segment objects in a video sequence without prior knowledge of these objects.
Existing ZS-VOS methods often struggle to distinguish between foreground and background or to keep track of the foreground in complex scenarios.
We propose an encoder-decoder-based hierarchical co-attention propagation network (HCPN) capable of tracking and segmenting objects.
arXiv Detail & Related papers (2023-04-08T04:45:48Z)
- Weakly Supervised Instance Segmentation using Motion Information via
Optical Flow [3.0763099528432263]
We propose a two-stream encoder that leverages appearance and motion features extracted from images and optical flows.
Our results demonstrate that the proposed method improves the Average Precision of the state-of-the-art method by 3.1.
arXiv Detail & Related papers (2022-02-25T22:41:54Z)
- FAMINet: Learning Real-time Semi-supervised Video Object Segmentation
with Steepest Optimized Optical Flow [21.45623125216448]
Semi-supervised video object segmentation (VOS) aims to segment a few moving objects in a video sequence, where these objects are specified by annotation of the first frame.
The optical flow has been considered in many existing semi-supervised VOS methods to improve the segmentation accuracy.
A FAMINet, which consists of a feature extraction network (F), an appearance network (A), a motion network (M), and an integration network (I), is proposed in this study to address the abovementioned problem.
arXiv Detail & Related papers (2021-11-20T07:24:33Z)
- The Emergence of Objectness: Learning Zero-Shot Segmentation from Videos [59.12750806239545]
We show that a video has different views of the same scene related by moving components, and the right region segmentation and region flow would allow mutual view synthesis.
Our model starts with two separate pathways: an appearance pathway that outputs feature-based region segmentation for a single image, and a motion pathway that outputs motion features for a pair of images.
By training the model to minimize view synthesis errors based on segment flow, our appearance and motion pathways learn region segmentation and flow estimation automatically without building them up from low-level edges or optical flows respectively.
arXiv Detail & Related papers (2021-11-11T18:59:11Z)
- Optical Flow Estimation from a Single Motion-blurred Image [66.2061278123057]
Motion blur in an image may be of practical interest for fundamental computer vision problems.
We propose a novel framework to estimate optical flow from a single motion-blurred image in an end-to-end manner.
arXiv Detail & Related papers (2021-03-04T12:45:18Z)
- Beyond Single Stage Encoder-Decoder Networks: Deep Decoders for Semantic
Image Segmentation [56.44853893149365]
Single encoder-decoder methodologies for semantic segmentation are reaching their peak in terms of segmentation quality and efficiency per number of layers.
We propose a new architecture based on a decoder which uses a set of shallow networks for capturing more information content.
In order to further improve the architecture we introduce a weight function which aims to re-balance classes to increase the attention of the networks to under-represented objects.
arXiv Detail & Related papers (2020-07-19T18:44:34Z)
- Motion-Attentive Transition for Zero-Shot Video Object Segmentation [99.44383412488703]
We present a Motion-Attentive Transition Network (MATNet) for zero-shot video object segmentation.
An asymmetric attention block, called Motion-Attentive Transition (MAT), is designed within a two-stream encoder.
In this way, the encoder becomes deeply interleaved, allowing for closely hierarchical interactions between object motion and appearance.
arXiv Detail & Related papers (2020-03-09T16:58:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.