Cut-Based Graph Learning Networks to Discover Compositional Structure of Sequential Video Data
- URL: http://arxiv.org/abs/2001.07613v1
- Date: Fri, 17 Jan 2020 10:09:24 GMT
- Authors: Kyoung-Woon On, Eun-Sol Kim, Yu-Jung Heo and Byoung-Tak Zhang
- Abstract summary: We propose Cut-Based Graph Learning Networks (CB-GLNs) for learning video data by discovering complex structures of the video.
CB-GLNs represent video data as a graph, with nodes and edges corresponding to frames of the video and their dependencies respectively.
We evaluate the proposed method on two different tasks for video understanding: video theme classification (YouTube-8M dataset) and video question answering (TVQA dataset).
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conventional sequential learning methods such as Recurrent Neural Networks (RNNs) focus on interactions between consecutive inputs, i.e., first-order Markovian dependencies. However, most sequential data, such as videos, have complex dependency structures that imply variable-length semantic flows and their compositions, which are hard to capture with conventional methods. Here, we propose Cut-Based Graph Learning Networks (CB-GLNs) for
learning video data by discovering these complex structures of the video. The
CB-GLNs represent video data as a graph, with nodes and edges corresponding to
frames of the video and their dependencies respectively. The CB-GLNs find
compositional dependencies of the data in multilevel graph forms via a
parameterized kernel with graph-cut and a message passing framework. We
evaluate the proposed method on two different tasks for video understanding: video theme classification (YouTube-8M dataset) and video question answering (TVQA dataset). The experimental results show that our model efficiently learns the semantic compositional structure of video data. Furthermore, our model achieves the highest performance among all baseline methods.
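To make the pipeline concrete, here is a minimal, illustrative sketch (not the authors' implementation) of the three ingredients named above: a parameterized kernel scoring pairwise frame dependencies, a spectral graph-cut splitting the frame graph into segments, and message passing within each segment. Function names, the spectral min-cut relaxation, and the toy dimensions are all assumptions.

```python
import numpy as np

def affinity(frames, W):
    """Parameterized kernel: positive edge weights between all frame pairs.
    frames: (T, d) frame features; W: (d, d) learnable kernel matrix."""
    scores = frames @ W @ frames.T            # (T, T) pairwise dependency scores
    A = np.exp(scores - scores.max())         # keep weights positive and bounded
    return 0.5 * (A + A.T)                    # symmetrize: undirected graph

def min_cut_split(A):
    """Spectral relaxation of a min cut: split nodes by the sign of the
    Fiedler vector (second-smallest eigenvector of the graph Laplacian)."""
    L = np.diag(A.sum(axis=1)) - A
    _, vecs = np.linalg.eigh(L)               # eigenvectors, ascending eigenvalues
    return vecs[:, 1] >= 0                    # boolean segment-membership mask

def message_pass(h, A, steps=2):
    """Neighbor-averaging message passing within one segment's subgraph."""
    P = A / A.sum(axis=1, keepdims=True)      # row-stochastic transition matrix
    for _ in range(steps):
        h = P @ h
    return h

# toy usage: 12 frames with 8-dim features
rng = np.random.default_rng(0)
frames = rng.normal(size=(12, 8))
W = 0.1 * rng.normal(size=(8, 8))
A = affinity(frames, W)
mask = min_cut_split(A)                       # cut the frame graph into two segments
for seg in (mask, ~mask):
    if seg.sum() > 1:
        frames[seg] = message_pass(frames[seg], A[np.ix_(seg, seg)])
```

In the actual CB-GLNs these steps are applied repeatedly to build the multilevel graph forms the abstract mentions; the sketch shows a single level only.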
Related papers
- VideoSAGE: Video Summarization with Graph Representation Learning
We propose a graph-based representation learning framework for video summarization.
A graph constructed this way captures long-range interactions among video frames, while its sparsity lets the model train without hitting memory and compute bottlenecks (see the sketch below).
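As a rough illustration of how a sparse frame graph keeps the edge count manageable, here is a hypothetical connectivity pattern (a dense local window plus strided long-range links); the pattern and its parameters are assumptions, not taken from the paper.

```python
def sparse_frame_graph(num_frames, window=2, stride=8):
    """Sparse frame connectivity: dense short-range links within a local
    window plus strided long-range links, so the edge count grows roughly
    linearly in the number of frames instead of quadratically."""
    edges = set()
    for i in range(num_frames):
        for j in range(i + 1, min(i + window + 1, num_frames)):
            edges.add((i, j))                  # local, short-range edges
        for j in range(i + stride, num_frames, stride):
            edges.add((i, j))                  # sparse long-range edges
    return sorted(edges)

edges = sparse_frame_graph(256)
print(len(edges), "edges vs", 256 * 255 // 2, "in a fully connected graph")
```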
arXiv Detail & Related papers (2024-04-14T15:49:02Z)

- GraphEdit: Large Language Models for Graph Structure Learning
Graph Structure Learning (GSL) focuses on capturing intrinsic dependencies and interactions among nodes in graph-structured data.
Existing GSL methods heavily depend on explicit graph structural information as supervision signals.
We propose GraphEdit, an approach that leverages large language models (LLMs) to learn complex node relationships in graph-structured data.
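A hedged sketch of the general idea of querying an LLM as an edge oracle follows; `ask_llm`, the prompt wording, and the yes/no protocol are placeholders of my own, not GraphEdit's actual interface.

```python
def ask_llm(prompt: str) -> str:
    """Placeholder for any chat-completion call; plug in a real client."""
    raise NotImplementedError("wire up your LLM client here")

def llm_edge_decision(desc_a: str, desc_b: str) -> bool:
    """Ask the LLM whether two described nodes should be linked."""
    prompt = ("Two nodes in a graph are described below. "
              "Answer strictly 'yes' or 'no': should they be connected?\n"
              f"A: {desc_a}\nB: {desc_b}")
    return ask_llm(prompt).strip().lower().startswith("yes")

def refine_edges(node_descriptions, candidate_edges):
    """Keep only the candidate edges the LLM judges plausible."""
    return [(a, b) for (a, b) in candidate_edges
            if llm_edge_decision(node_descriptions[a], node_descriptions[b])]
```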
arXiv Detail & Related papers (2024-02-23T08:29:42Z)

- Semantic2Graph: Graph-based Multi-modal Feature Fusion for Action Segmentation in Videos
This study introduces a graph-structured approach named Semantic2Graph to model long-term dependencies in videos.
We have designed positive and negative semantic edges, accompanied by corresponding edge weights, to capture both long-term and short-term semantic relationships in video actions.
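One plausible way to realize signed semantic edges is thresholded feature similarity, as in the sketch below; the cosine-similarity choice and both thresholds are illustrative assumptions rather than the paper's construction.

```python
import numpy as np

def signed_semantic_edges(feats, pos_thresh=0.7, neg_thresh=0.2):
    """Signed, weighted edges from cosine similarity: very similar pairs
    get positive edges, very dissimilar pairs get negative edges."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = f @ f.T
    edges, n = [], len(feats)
    for i in range(n):
        for j in range(i + 1, n):
            if sim[i, j] >= pos_thresh:
                edges.append((i, j, float(sim[i, j])))        # positive edge
            elif sim[i, j] <= neg_thresh:
                edges.append((i, j, float(sim[i, j] - 1.0)))  # negative edge
    return edges

feats = np.random.default_rng(2).normal(size=(10, 32))
print(signed_semantic_edges(feats)[:3])
```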
arXiv Detail & Related papers (2022-09-13T00:01:23Z)

- TCGL: Temporal Contrastive Graph for Self-supervised Video Representation Learning
We propose a novel video self-supervised learning framework named Temporal Contrastive Graph Learning (TCGL). Our TCGL integrates prior knowledge about frame and snippet orders into graph structures, i.e., the intra-/inter-snippet Temporal Contrastive Graphs (TCGs).
To generate supervisory signals for unlabeled videos, we introduce an Adaptive Snippet Order Prediction (ASOP) module.
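The order-prediction idea can be sketched as a tiny self-supervision recipe: shuffle the snippets and ask a model to recover the permutation. The permutation-id labeling below is a common construction and an assumption about the details, not ASOP's exact formulation.

```python
import random
from itertools import permutations

def order_prediction_example(snippets):
    """Make one self-supervised training example: shuffle the snippets and
    keep the permutation id as the label a classifier must recover."""
    perms = list(permutations(range(len(snippets))))
    label = random.randrange(len(perms))
    shuffled = [snippets[i] for i in perms[label]]
    return shuffled, label

clips = ["snippet0", "snippet1", "snippet2"]   # stand-ins for feature tensors
x, y = order_prediction_example(clips)
print(x, "-> permutation id", y)
```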
arXiv Detail & Related papers (2021-12-07T09:27:56Z)

- Reconstructive Sequence-Graph Network for Video Summarization
Exploiting inner-shot and inter-shot dependencies is essential for key-shot-based video summarization. We propose a Reconstructive Sequence-Graph Network (RSGN) that encodes frames and shots hierarchically, as a sequence and a graph respectively.
A reconstructor is developed to reward the summary generator, so that the generator can be optimized in an unsupervised manner.
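A minimal sketch of the sequence-then-graph hierarchy and an unsupervised reconstruction signal follows; mean-pooled shot embeddings and the mean-embedding reconstruction target are assumptions, not RSGN's actual architecture.

```python
import numpy as np

def encode_hierarchy(frames, shot_bounds):
    """Sequence level: mean-pool frames into shot embeddings. Graph level:
    one round of message passing over a soft shot-similarity graph."""
    shots = np.stack([frames[s:e].mean(axis=0) for s, e in shot_bounds])
    sim = shots @ shots.T
    sim = np.exp(sim - sim.max(axis=1, keepdims=True))
    P = sim / sim.sum(axis=1, keepdims=True)   # soft adjacency over shots
    return P @ shots                           # graph-refined shot embeddings

def reconstruction_loss(frames, summary):
    """Unsupervised signal: a good summary should reconstruct the full
    video's mean embedding (this target choice is an assumption)."""
    diff = frames.mean(axis=0) - summary.mean(axis=0)
    return float((diff ** 2).sum())

frames = np.random.default_rng(3).normal(size=(30, 16))
shots = encode_hierarchy(frames, [(0, 10), (10, 22), (22, 30)])
print(shots.shape, reconstruction_loss(frames, shots[:2]))
```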
arXiv Detail & Related papers (2021-05-10T01:47:55Z)

- Temporal Contrastive Graph Learning for Video Action Recognition and Retrieval
This work takes advantage of the temporal dependencies within videos and proposes a novel self-supervised method named Temporal Contrastive Graph Learning (TCGL). Our TCGL is rooted in a hybrid graph contrastive learning strategy that jointly regards inter-snippet and intra-snippet temporal dependencies as self-supervision signals for temporal representation learning.
Experimental results demonstrate the superiority of our TCGL over the state-of-the-art methods on large-scale action recognition and video retrieval benchmarks.
arXiv Detail & Related papers (2021-01-04T08:11:39Z)

- SumGraph: Video Summarization via Recursive Graph Modeling
We propose recursive graph modeling networks for video summarization, termed SumGraph, to represent a relation graph over video frames.
We achieve state-of-the-art performance on several video summarization benchmarks in both supervised and unsupervised settings.
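As a guess at what recursive relation-graph modeling might look like: alternate between re-estimating the relation graph from node features and refining the features through that graph. The update rules and the importance proxy below are assumptions, not SumGraph's design.

```python
import numpy as np

def recursive_relation_graph(feats, rounds=3):
    """Alternately re-estimate the relation graph from node features and
    refine the features through that graph."""
    h = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    A = None
    for _ in range(rounds):
        logits = h @ h.T                                # pairwise relations
        logits = np.exp(logits - logits.max(axis=1, keepdims=True))
        A = logits / logits.sum(axis=1, keepdims=True)  # normalized relation graph
        h = 0.5 * h + 0.5 * (A @ h)                     # graph-conditioned refinement
        h /= np.linalg.norm(h, axis=1, keepdims=True)
    scores = A.sum(axis=0)                              # crude frame-importance proxy
    return A, scores

A, scores = recursive_relation_graph(np.random.default_rng(4).normal(size=(8, 12)))
print(scores.round(2))
```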
arXiv Detail & Related papers (2020-07-17T08:11:30Z)

- Zero-Shot Video Object Segmentation via Attentive Graph Neural Networks
This work proposes a novel attentive graph neural network (AGNN) for zero-shot video object segmentation (ZVOS).
AGNN builds a fully connected graph to efficiently represent frames as nodes, and relations between arbitrary frame pairs as edges.
Experimental results on three video segmentation datasets show that AGNN sets a new state-of-the-art in each case.
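A small sketch of attention-weighted message passing on a fully connected frame graph, in the spirit of AGNN; the scaled dot-product form of the attention is an assumption.

```python
import numpy as np

def attentive_message_passing(frame_feats, steps=2):
    """Fully connected frame graph whose edges are attention weights;
    each node aggregates attention-weighted messages from all frames."""
    h = frame_feats
    d = h.shape[1]
    for _ in range(steps):
        logits = h @ h.T / np.sqrt(d)                  # pairwise edge logits
        logits = np.exp(logits - logits.max(axis=1, keepdims=True))
        att = logits / logits.sum(axis=1, keepdims=True)
        h = att @ h                                    # aggregate messages
    return h

h = attentive_message_passing(np.random.default_rng(1).normal(size=(6, 16)))
print(h.shape)  # one refined embedding per frame node
```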
arXiv Detail & Related papers (2020-01-19T10:45:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.