Efficient Unsupervised Video Object Segmentation Network Based on Motion
Guidance
- URL: http://arxiv.org/abs/2211.05364v1
- Date: Thu, 10 Nov 2022 06:13:23 GMT
- Title: Efficient Unsupervised Video Object Segmentation Network Based on Motion
Guidance
- Authors: Chao Hu, Liqiang Zhu
- Abstract summary: This paper proposes a video object segmentation network based on motion guidance.
The model comprises a dual-stream network, a motion guidance module, and a multi-scale progressive fusion module.
The experimental results demonstrate the superior performance of the proposed method.
- Score: 1.5736899098702974
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many unsupervised video object segmentation algorithms based on deep
learning suffer from large parameter counts and heavy computation, which
significantly limits their application in practice. This paper proposes a
video object segmentation network based on motion guidance that considerably
reduces the number of model parameters and the amount of computation while
improving video object segmentation performance. The model comprises a
dual-stream network, a motion guidance module, and a multi-scale progressive
fusion module. Specifically, RGB images and optical flow estimates are fed into
the dual-stream network to extract object appearance features and motion
features. Then, the motion guidance module extracts semantic information from
the motion features through local attention, which guides the appearance
features to learn rich semantic information. Finally, the multi-scale
progressive fusion module takes the output features at each stage of the
dual-stream network and gradually integrates the deep features into the shallow
ones, improving edge segmentation. Extensive evaluations are conducted on three
standard datasets, and the experimental results demonstrate the superior
performance of the proposed method.
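The data flow described in the abstract (motion features gating appearance features, then deep features merged step by step into shallow ones) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the sigmoid gate, the box-window "local attention", the nearest-neighbor upsampling, and the assumption that every stage has the same channel count and half the resolution of the previous one are all hypothetical choices made for illustration.

```python
import numpy as np


def local_attention_gate(motion, window=3):
    """Hypothetical stand-in for local attention.

    Averages motion features over channels and a local window, then
    squashes the result with a sigmoid to get a per-pixel gate in (0, 1).
    motion has shape (C, H, W); the gate has shape (H, W).
    """
    _, h, w = motion.shape
    pad = window // 2
    padded = np.pad(motion, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    pooled = np.zeros((h, w))
    for dy in range(window):
        for dx in range(window):
            pooled += padded[:, dy:dy + h, dx:dx + w].mean(axis=0)
    pooled /= window * window
    return 1.0 / (1.0 + np.exp(-pooled))


def guide_appearance(appearance, motion, window=3):
    """Re-weight appearance features with the motion-derived gate.

    The residual form (1 + gate) is an assumption; it keeps the original
    appearance signal and amplifies regions with strong motion response.
    """
    gate = local_attention_gate(motion, window)
    return appearance * (1.0 + gate[None, :, :])


def upsample2x(x):
    """Nearest-neighbor upsampling of a (C, H, W) array by a factor of 2."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def progressive_fusion(stages):
    """Fuse features from deep to shallow.

    stages is a list of (C, H_i, W_i) arrays ordered shallow to deep,
    each stage assumed to be half the resolution of the previous one.
    """
    fused = stages[-1]
    for feat in reversed(stages[:-1]):
        fused = feat + upsample2x(fused)
    return fused


# Toy usage: three stages of a hypothetical dual-stream encoder.
shapes = [(4, 16, 16), (4, 8, 8), (4, 4, 4)]
rng = np.random.default_rng(0)
appearance = [rng.normal(size=s) for s in shapes]
motion = [rng.normal(size=s) for s in shapes]
guided = [guide_appearance(a, m) for a, m in zip(appearance, motion)]
output = progressive_fusion(guided)
print(output.shape)
```

In this sketch the fused output has the resolution of the shallowest stage, which reflects the abstract's point that integrating deep features into shallow ones is meant to help edge segmentation. A real network would use learned convolutions in place of the fixed averaging and addition used here.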
Related papers
- Moving Object Proposals with Deep Learned Optical Flow for Video Object
Segmentation [1.551271936792451]
We propose a state-of-the-art neural network architecture to obtain moving object proposals (MOP).
We first train an unsupervised convolutional neural network (UnFlow) to generate optical flow estimates.
Then we feed the output of the optical flow network to a fully convolutional SegNet model.
arXiv Detail & Related papers (2024-02-14T01:13:55Z)
- Appearance-Based Refinement for Object-Centric Motion Segmentation [85.2426540999329]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals.
Our approach involves a sequence-level selection mechanism that identifies accurate flow-predicted masks as exemplars.
Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTube, SegTrackv2, and FBMS-59.
arXiv Detail & Related papers (2023-12-18T18:59:51Z)
- Semi-Weakly Supervised Object Kinematic Motion Prediction [56.282759127180306]
Given a 3D object, kinematic motion prediction aims to identify the mobile parts as well as the corresponding motion parameters.
We propose a graph neural network to learn the mapping between hierarchical part-level segmentation and mobile part parameters.
The network predictions yield a large set of 3D objects with pseudo-labeled mobility information.
arXiv Detail & Related papers (2023-03-31T02:37:36Z)
- Segmenting Moving Objects via an Object-Centric Layered Representation [100.26138772664811]
We introduce an object-centric segmentation model with a depth-ordered layer representation.
We introduce a scalable pipeline for generating synthetic training data with multiple objects.
We evaluate the model on standard video segmentation benchmarks.
arXiv Detail & Related papers (2022-07-05T17:59:43Z)
- EAN: Event Adaptive Network for Enhanced Action Recognition [66.81780707955852]
We propose a unified action recognition framework to investigate the dynamic nature of video content.
First, when extracting local cues, we generate dynamic-scale spatial-temporal kernels that adaptively fit the diverse events.
Second, to accurately aggregate these cues into a global video representation, we propose to mine the interactions only among a few selected foreground objects by a Transformer.
arXiv Detail & Related papers (2021-07-22T15:57:18Z)
- Learning to Segment Rigid Motions from Two Frames [72.14906744113125]
We propose a modular network, motivated by a geometric analysis of what independent object motions can be recovered from an egomotion field.
It takes two consecutive frames as input and predicts segmentation masks for the background and multiple rigidly moving objects, which are then parameterized by 3D rigid transformations.
Our method achieves state-of-the-art performance for rigid motion segmentation on KITTI and Sintel.
arXiv Detail & Related papers (2021-01-11T04:20:30Z)
- Unsupervised Learning Consensus Model for Dynamic Texture Videos
Segmentation [12.462608802359936]
We present an effective unsupervised learning consensus model (ULCM) for the segmentation of dynamic textures.
In the proposed model, the values of the requantized local binary pattern (LBP) histogram around the pixel to be classified are used as features.
Experiments conducted on the challenging SynthDB dataset show that ULCM is significantly faster, easier to implement, and has few parameters.
arXiv Detail & Related papers (2020-06-29T16:40:59Z)
- Motion-Attentive Transition for Zero-Shot Video Object Segmentation [99.44383412488703]
We present a Motion-Attentive Transition Network (MATNet) for zero-shot object segmentation.
An asymmetric attention block, called Motion-Attentive Transition (MAT), is designed within a two-stream encoder.
In this way, the encoder becomes deeply interleaved, allowing for closely hierarchical interactions between object motion and appearance.
arXiv Detail & Related papers (2020-03-09T16:58:42Z)