Zero-Shot Video Object Segmentation via Attentive Graph Neural Networks
- URL: http://arxiv.org/abs/2001.06807v1
- Date: Sun, 19 Jan 2020 10:45:27 GMT
- Title: Zero-Shot Video Object Segmentation via Attentive Graph Neural Networks
- Authors: Wenguan Wang, Xiankai Lu, Jianbing Shen, David Crandall, and Ling Shao
- Abstract summary: This work proposes a novel attentive graph neural network (AGNN) for zero-shot video object segmentation (ZVOS).
AGNN builds a fully connected graph to efficiently represent frames as nodes, and relations between arbitrary frame pairs as edges.
Experimental results on three video segmentation datasets show that AGNN sets a new state-of-the-art in each case.
- Score: 150.5425122989146
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work proposes a novel attentive graph neural network (AGNN) for
zero-shot video object segmentation (ZVOS). The suggested AGNN recasts this
task as a process of iterative information fusion over video graphs.
Specifically, AGNN builds a fully connected graph to efficiently represent
frames as nodes, and relations between arbitrary frame pairs as edges. The
underlying pair-wise relations are described by a differentiable attention
mechanism. Through parametric message passing, AGNN is able to efficiently
capture and mine much richer and higher-order relations between video frames,
thus enabling a more complete understanding of video content and more accurate
foreground estimation. Experimental results on three video segmentation
datasets show that AGNN sets a new state-of-the-art in each case. To further
demonstrate the generalizability of our framework, we extend AGNN to an
additional task: image object co-segmentation (IOCS). We perform experiments on
two famous IOCS datasets and observe again the superiority of our AGNN model.
The extensive experiments verify that AGNN is able to learn the underlying
semantic/appearance relationships among video frames or related images, and
discover the common objects.
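As a rough illustration of the mechanism the abstract describes, below is a minimal PyTorch-style sketch of attentive message passing over a fully connected frame graph. It is a simplification under stated assumptions: node states are flat per-frame embedding vectors rather than the spatial feature maps used in the paper, a plain GRUCell stands in for the paper's ConvGRU update, and all names and dimensions are illustrative rather than taken from the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveGraphLayer(nn.Module):
    """One round of attentive message passing over a fully connected frame graph."""
    def __init__(self, dim):
        super().__init__()
        # Bilinear weights for the differentiable pairwise attention e_ij = h_i W h_j^T
        # (a simplified stand-in for the paper's attention over feature maps).
        self.W = nn.Parameter(torch.randn(dim, dim) * dim ** -0.5)
        # GRUCell stands in for the ConvGRU node-state update used in the paper.
        self.update = nn.GRUCell(dim, dim)

    def forward(self, h):
        # h: (N, dim) node states, one per video frame.
        e = h @ self.W @ h.t()          # (N, N) edge scores for all frame pairs
        a = F.softmax(e, dim=-1)        # attention weights over each node's neighbors
        m = a @ h                       # (N, dim) aggregated messages
        return self.update(m, h)        # gated node-state update

# Iterative information fusion: K rounds of message passing over the graph.
frames = torch.randn(8, 256)            # e.g. embeddings of 8 sampled frames (illustrative)
layer = AttentiveGraphLayer(256)
h = frames
for _ in range(3):                      # K = 3 iterations (illustrative)
    h = layer(h)
# h now carries relation-aware frame representations; in the full model each
# node state would be decoded into a per-frame foreground mask.
```

The fully connected topology means every frame attends to every other frame on each round, which is what allows repeated message passing to capture the higher-order, clip-wide relations the abstract refers to.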
Related papers
- VideoSAGE: Video Summarization with Graph Representation Learning [9.21019970479227]
We propose a graph-based representation learning framework for video summarization.
A graph constructed this way aims to capture long-range interactions among video frames, and the sparsity ensures the model trains without hitting the memory and compute bottleneck.
arXiv Detail & Related papers (2024-04-14T15:49:02Z)
- Multi-Scene Generalized Trajectory Global Graph Solver with Composite Nodes for Multiple Object Tracking [61.69892497726235]
Composite Node Message Passing Network (CoNo-Link) is a framework for modeling information across ultra-long frame sequences for association.
In addition to the previous method of treating objects as nodes, the network innovatively treats object trajectories as nodes for information interaction.
Our model can learn better predictions on longer-time scales by adding composite nodes.
arXiv Detail & Related papers (2023-12-14T14:00:30Z)
- Semantic Scene Graph Generation Based on an Edge Dual Scene Graph and Message Passing Neural Network [3.9280441311534653]
Scene graph generation (SGG) captures the relationships between objects in an image and creates a structured graph-based representation.
Existing SGG methods have a limited ability to accurately predict detailed relationships.
A new approach to modeling multi-object relationships, called edge dual scene graph generation (EdgeSGG), is proposed herein.
arXiv Detail & Related papers (2023-11-02T12:36:52Z)
- STAR-GNN: Spatial-Temporal Video Representation for Content-based Retrieval [39.50179338831056]
We propose a video feature representation learning framework called STAR-GNN.
It applies a pluggable graph neural network component on a multi-scale lattice feature graph.
We show that STAR-GNN effectively implements a dynamic attention mechanism on video frame sequences.
arXiv Detail & Related papers (2022-08-15T01:47:50Z)
- End-to-end video instance segmentation via spatial-temporal graph neural networks [30.748756362692184]
Video instance segmentation is a challenging task that extends image instance segmentation to the video domain.
Existing methods either rely only on single-frame information for the detection and segmentation subproblems or handle tracking as a separate post-processing step.
We propose a novel graph-neural-network (GNN) based method to address these limitations.
arXiv Detail & Related papers (2022-03-07T05:38:08Z)
- Relation Regularized Scene Graph Generation [206.76762860019065]
Scene graph generation (SGG) is built on top of detected objects to predict object pairwise visual relations.
We propose a relation regularized network (R2-Net) which can predict whether there is a relationship between two objects.
Our R2-Net can effectively refine object labels and generate scene graphs.
arXiv Detail & Related papers (2022-02-22T11:36:49Z)
- Representing Videos as Discriminative Sub-graphs for Action Recognition [165.54738402505194]
We introduce a new design of sub-graphs to represent and encode the discriminative patterns of each action in the videos.
We present the MUlti-scale Sub-graph LEarning (MUSLE) framework, which novelly builds space-time graphs and clusters them into compact sub-graphs on each scale.
arXiv Detail & Related papers (2022-01-11T16:15:25Z)
- Target Adaptive Context Aggregation for Video Scene Graph Generation [36.669700084337045]
This paper deals with the challenging task of video scene graph generation (VidSGG).
We present a new detect-to-track paradigm for this task by decoupling the context modeling for relation prediction from the complicated low-level entity tracking.
arXiv Detail & Related papers (2021-08-18T12:46:28Z)
- Visual Relationship Forecasting in Videos [56.122037294234865]
We present a new task named Visual Relationship Forecasting (VRF) in videos to explore the prediction of visual relationships in a reasoning manner.
Given a subject-object pair with H existing frames, VRF aims to predict their future interactions for the next T frames without visual evidence.
To evaluate the VRF task, we introduce two video datasets, VRF-AG and VRF-VidOR, each with a series of temporally localized visual relation annotations.
arXiv Detail & Related papers (2021-07-02T16:43:19Z)
- SumGraph: Video Summarization via Recursive Graph Modeling [59.01856443537622]
We propose graph modeling networks for video summarization, termed SumGraph, to represent a relation graph.
We achieve state-of-the-art performance on several benchmarks for video summarization in both supervised and unsupervised manners.
arXiv Detail & Related papers (2020-07-17T08:11:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.