Hierarchical Graph Pattern Understanding for Zero-Shot VOS
- URL: http://arxiv.org/abs/2312.09525v1
- Date: Fri, 15 Dec 2023 04:13:21 GMT
- Title: Hierarchical Graph Pattern Understanding for Zero-Shot VOS
- Authors: Gensheng Pei, Fumin Shen, Yazhou Yao, Tao Chen, Xian-Sheng Hua, and
Heng-Tao Shen
- Abstract summary: This paper proposes a new hierarchical graph neural network (GNN) architecture for zero-shot video object segmentation (ZS-VOS).
Inspired by the strong ability of GNNs in capturing structural relations, HGPU innovatively leverages motion cues (i.e., optical flow) to enhance the high-order representations from the neighbors of target frames.
- Score: 102.21052200245457
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The optical flow guidance strategy is ideal for obtaining motion information
of objects in the video. It is widely utilized in video segmentation tasks.
However, existing optical flow-based methods have a significant dependency on
optical flow, which results in poor performance when the optical flow
estimation fails for a particular scene. The temporal consistency provided by
the optical flow could be effectively supplemented by modeling in a structural
form. This paper proposes a new hierarchical graph neural network (GNN)
architecture, dubbed hierarchical graph pattern understanding (HGPU), for
zero-shot video object segmentation (ZS-VOS). Inspired by the strong ability of
GNNs in capturing structural relations, HGPU innovatively leverages motion cues
(i.e., optical flow) to enhance the high-order representations from the
neighbors of target frames. Specifically, a hierarchical graph pattern encoder
with message aggregation is introduced to acquire different levels of motion
and appearance features in a sequential manner. Furthermore, a decoder is
designed for hierarchically parsing and understanding the transformed
multi-modal contexts to achieve more accurate and robust results. HGPU achieves
state-of-the-art performance on four publicly available benchmarks (DAVIS-16,
YouTube-Objects, Long-Videos and DAVIS-17). Code and a pre-trained model can be
found at https://github.com/NUST-Machine-Intelligence-Laboratory/HGPU.
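The abstract describes hierarchical, flow-guided message aggregation over a graph of frames but does not spell out the implementation. The following is only a minimal sketch of that general idea, propagating frame features through several aggregation levels with motion-affinity weights; the function name, the 0.5 mixing coefficient, and the toy affinity matrix are illustrative assumptions, not the authors' method:

```python
import numpy as np

def flow_guided_aggregation(node_feats, flow_sim, num_levels=3):
    """Illustrative flow-weighted message aggregation over a frame graph.

    node_feats: (N, D) array, one feature vector per frame (graph node).
    flow_sim:   (N, N) non-negative affinities derived from motion cues
                (e.g., optical-flow similarity between frames).
    Returns the node features after each of num_levels aggregation levels.
    """
    # Row-normalise the motion affinities into aggregation weights.
    weights = flow_sim / (flow_sim.sum(axis=1, keepdims=True) + 1e-8)
    levels = []
    h = node_feats
    for _ in range(num_levels):
        # Each frame mixes in its neighbours' features, weighted by how
        # consistent their motion cues are with the target frame.
        h = 0.5 * h + 0.5 * (weights @ h)
        levels.append(h)
    return levels

# Toy usage: 4 frames with 2-D features and a uniform affinity matrix.
feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
sim = np.ones((4, 4)) - np.eye(4)
outs = flow_guided_aggregation(feats, sim)
```

A real encoder would extract the affinities from an optical-flow network and use learned aggregation weights rather than a fixed convex mix.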
Related papers
- Moving Object Proposals with Deep Learned Optical Flow for Video Object Segmentation [1.551271936792451]
We propose a state-of-the-art neural network architecture for generating moving object proposals (MOP).
We first train an unsupervised convolutional neural network (UnFlow) to generate optical flow estimates.
Then we feed the output of the optical flow network into a fully convolutional SegNet model.
arXiv Detail & Related papers (2024-02-14T01:13:55Z)
- Pair-wise Layer Attention with Spatial Masking for Video Prediction [46.17429511620538]
We develop a Pair-wise Layer Attention (PLA) module to enhance the layer-wise semantic dependency of the feature maps.
We also present a Pair-wise Layer Attention with Spatial Masking (PLA-SM) framework for Translator prediction.
arXiv Detail & Related papers (2023-11-19T10:29:05Z)
- GAFlow: Incorporating Gaussian Attention into Optical Flow [62.646389181507764]
We push Gaussian Attention (GA) into the optical flow models to accentuate local properties during representation learning.
We introduce a novel Gaussian-Constrained Layer (GCL) which can be easily plugged into existing Transformer blocks.
For reliable motion analysis, we provide a new Gaussian-Guided Attention Module (GGAM).
arXiv Detail & Related papers (2023-09-28T07:46:01Z)
- From Hypergraph Energy Functions to Hypergraph Neural Networks [94.88564151540459]
We present an expressive family of parameterized, hypergraph-regularized energy functions.
We then demonstrate how minimizers of these energies effectively serve as node embeddings.
We draw parallels between the proposed bilevel hypergraph optimization, and existing GNN architectures in common use.
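The idea of energy minimizers serving as node embeddings can be made concrete with a small sketch. The quadratic energy below is only an illustrative special case of the paper's parameterized family (the fidelity/smoothness split, the step count, and the learning rate are all assumptions):

```python
import numpy as np

def hypergraph_energy_embeddings(x0, hyperedges, lam=0.5, steps=200, lr=0.02):
    """Node embeddings as minimizers of a hypergraph-regularized energy.

    Illustrative quadratic case:
      E(X) = ||X - X0||^2 + lam * sum_e sum_{i<j in e} ||x_i - x_j||^2
    minimized by plain gradient descent; the minimizer serves as the
    node-embedding matrix.
    """
    x = x0.astype(float).copy()
    for _ in range(steps):
        grad = 2.0 * (x - x0)  # gradient of the fidelity term
        for e in hyperedges:
            idx = np.asarray(e)
            mean = x[idx].mean(axis=0)
            # d/dx_i of sum_{i<j in e} ||x_i - x_j||^2 = 2|e| (x_i - mean)
            grad[idx] += lam * 2.0 * len(idx) * (x[idx] - mean)
        x -= lr * grad
    return x

# Toy usage: two nodes sharing one hyperedge are pulled together.
x0 = np.array([[0.0], [1.0]])
emb = hypergraph_energy_embeddings(x0, [(0, 1)])
```

For this two-node example the closed-form minimizer is (0.25, 0.75): the hyperedge smoothness term contracts the embeddings toward each other while the fidelity term anchors them to the inputs.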
arXiv Detail & Related papers (2023-06-16T04:40:59Z)
- Neighbor Correspondence Matching for Flow-based Video Frame Synthesis [90.14161060260012]
We introduce a neighbor correspondence matching (NCM) algorithm for flow-based frame synthesis.
NCM is performed in a current-frame-agnostic fashion to establish multi-scale correspondences in the spatial-temporal neighborhoods of each pixel.
The coarse-scale module leverages neighbor correspondences to capture large motion, while the fine-scale module speeds up the estimation process.
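The coarse/fine division of labor can be illustrated with a plain block-matching toy. This is not the NCM algorithm itself; the SSD cost, the 2x downsampling, and the window radii are simplifying assumptions used only to show why a coarse search handles large motion and a fine search stays cheap:

```python
import numpy as np

def block_match(a, b, radius):
    """Best integer displacement (dy, dx) minimising SSD between a and b."""
    best, best_d = np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(np.roll(b, dy, axis=0), dx, axis=1)
            err = float(((a - shifted) ** 2).sum())
            if err < best:
                best, best_d = err, (dy, dx)
    return best_d

def coarse_to_fine_match(a, b, coarse_radius=4, fine_radius=1):
    """Coarse search on a downsampled pair captures large motion;
    a small fine-scale search then refines the estimate cheaply."""
    a2, b2 = a[::2, ::2], b[::2, ::2]
    cy, cx = block_match(a2, b2, coarse_radius)
    # Apply the (upscaled) coarse displacement, then refine locally.
    base = np.roll(np.roll(b, 2 * cy, axis=0), 2 * cx, axis=1)
    fy, fx = block_match(a, base, fine_radius)
    return 2 * cy + fy, 2 * cx + fx

# Toy usage: recover a known circular shift between two frames.
np.random.seed(0)
a = np.random.rand(16, 16)
b = np.roll(np.roll(a, 4, axis=0), -2, axis=1)  # a shifted by (+4, -2)
d = coarse_to_fine_match(a, b)  # recovers the inverse displacement (-4, 2)
```

The coarse pass searches a 9x9 window on a quarter-size image (cheap), so the fine pass only needs a 3x3 refinement at full resolution.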
arXiv Detail & Related papers (2022-07-14T09:17:00Z)
- All-optical graph representation learning using integrated diffractive photonic computing units [51.15389025760809]
Photonic neural networks perform brain-inspired computations using photons instead of electrons.
We propose an all-optical graph representation learning architecture, termed the diffractive graph neural network (DGNN).
We demonstrate the use of DGNN extracted features for node and graph-level classification tasks with benchmark databases and achieve superior performance.
arXiv Detail & Related papers (2022-04-23T02:29:48Z)
- Motion-aware Dynamic Graph Neural Network for Video Compressive Sensing [14.67994875448175]
Video snapshot compressive imaging (SCI) utilizes a 2D detector to capture sequential video frames and compress them into a single measurement.
Most existing reconstruction methods are incapable of efficiently capturing long-range spatial and temporal dependencies.
We propose a flexible and robust approach based on the graph neural network (GNN) to efficiently model non-local interactions between pixels in space and time regardless of the distance.
arXiv Detail & Related papers (2022-03-01T12:13:46Z)
- A Coding Framework and Benchmark towards Low-Bitrate Video Understanding [63.05385140193666]
We propose a traditional-neural mixed coding framework that takes advantage of both traditional codecs and neural networks (NNs).
The framework is optimized by ensuring that a transportation-efficient semantic representation of the video is preserved.
We build a low-bitrate video understanding benchmark with three downstream tasks on eight datasets, demonstrating the notable superiority of our approach.
arXiv Detail & Related papers (2022-02-06T16:29:15Z)
- FAMINet: Learning Real-time Semi-supervised Video Object Segmentation with Steepest Optimized Optical Flow [21.45623125216448]
Semi-supervised video object segmentation (VOS) aims to segment a few moving objects in a video sequence, where these objects are specified by the annotation of the first frame.
The optical flow has been considered in many existing semi-supervised VOS methods to improve the segmentation accuracy.
FAMINet, which consists of a feature extraction network (F), an appearance network (A), a motion network (M), and an integration network (I), is proposed in this study to address the aforementioned problem.
arXiv Detail & Related papers (2021-11-20T07:24:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.