Related papers: VideoSAGE: Video Summarization with Graph Representation Learning

URL: http://arxiv.org/abs/2404.10539v1
Date: Sun, 14 Apr 2024 15:49:02 GMT
Title: VideoSAGE: Video Summarization with Graph Representation Learning
Authors: Jose M. Rojas Chaves, Subarna Tripathi
Abstract summary: We propose a graph-based representation learning framework for video summarization. A graph constructed this way aims to capture long-range interactions among video frames, and the sparsity ensures the model trains without hitting the memory and compute bottleneck.
Score: 9.21019970479227
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We propose a graph-based representation learning framework for video summarization. First, we convert an input video to a graph in which each node corresponds to a video frame. Then, we impose sparsity on the graph by connecting only those pairs of nodes that are within a specified temporal distance. We then formulate the video summarization task as a binary node classification problem, classifying each video frame according to whether it should belong to the output summary video. A graph constructed this way aims to capture long-range interactions among video frames, and the sparsity ensures the model trains without hitting the memory and compute bottleneck. Experiments on two datasets (SumMe and TVSum) demonstrate the effectiveness of the proposed nimble model compared to existing state-of-the-art summarization approaches while being one order of magnitude more efficient in compute time and memory.
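The graph construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (temporal_edges, dense_edge_count, select_summary), the max_dist parameter, and the 0.5 threshold are all hypothetical, chosen only to show how limiting edges to a temporal window keeps the graph sparse and how per-frame scores could be turned into a binary summary decision.

```python
def temporal_edges(num_frames, max_dist):
    """Directed edges (i, j) between distinct frames with |i - j| <= max_dist.

    Hypothetical helper: one node per frame, edges only inside a temporal window.
    """
    edges = []
    for i in range(num_frames):
        lo = max(0, i - max_dist)
        hi = min(num_frames - 1, i + max_dist)
        for j in range(lo, hi + 1):
            if i != j:
                edges.append((i, j))
    return edges


def dense_edge_count(num_frames):
    """Number of directed edges in a fully connected graph, for comparison."""
    return num_frames * (num_frames - 1)


def select_summary(scores, threshold=0.5):
    """Binary node classification step: keep frames whose score exceeds the threshold.

    The scores would come from a trained model; here they are assumed inputs.
    """
    return [idx for idx, s in enumerate(scores) if s > threshold]


if __name__ == "__main__":
    edges = temporal_edges(num_frames=6, max_dist=2)
    print(len(edges), "sparse edges vs", dense_edge_count(6), "dense edges")
    print(select_summary([0.1, 0.9, 0.4, 0.7, 0.2, 0.6]))
```

With a fixed window, the edge count grows linearly in the number of frames rather than quadratically, which is the intuition behind the memory and compute savings the abstract mentions.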

Related papers

Memory Efficient Temporal & Visual Graph Model for Unsupervised Video Domain Adaptation [50.158454960223274]
Existing video domain adaptation (DA) methods need to store all temporal combinations of video frames or pair the source and target videos. We propose a memory-efficient graph-based video DA approach.
arXiv Detail & Related papers (2022-08-13T02:56:10Z)
GraphVid: It Only Takes a Few Nodes to Understand a Video [0.0]
We propose a concise representation of videos that encodes perceptually meaningful features into graphs. We construct superpixel-based graph representations of videos by considering superpixels as graph nodes. We leverage Graph Convolutional Networks to process this representation and predict the desired output.
arXiv Detail & Related papers (2022-07-04T12:52:54Z)
CGMN: A Contrastive Graph Matching Network for Self-Supervised Graph Similarity Learning [65.1042892570989]
We propose a contrastive graph matching network (CGMN) for self-supervised graph similarity learning. We employ two strategies, namely cross-view interaction and cross-graph interaction, for effective node representation learning. We transform node representations into graph-level representations via pooling operations for graph similarity computation.
arXiv Detail & Related papers (2022-05-30T13:20:26Z)
Line Graph Neural Networks for Link Prediction [71.00689542259052]
We consider the graph link prediction task, which is a classic graph analytical problem with many real-world applications. In this formalism, a link prediction problem is converted to a graph classification task. We propose to seek a radically different and novel path by making use of the line graphs in graph theory. In particular, each node in a line graph corresponds to a unique edge in the original graph. Therefore, link prediction problems in the original graph can be equivalently solved as a node classification problem in its corresponding line graph, instead of a graph classification task.
arXiv Detail & Related papers (2020-10-20T05:54:31Z)
Location-aware Graph Convolutional Networks for Video Question Answering [85.44666165818484]
We propose to represent the contents in the video as a location-aware graph. Based on the constructed graph, we propose to use graph convolution to infer both the category and temporal locations of an action. Our method significantly outperforms state-of-the-art methods on TGIF-QA, Youtube2Text-QA, and MSVD-QA datasets.
arXiv Detail & Related papers (2020-08-07T02:12:56Z)
Graph Neural Network for Video Relocalization [16.67309677191578]
We find that, in video relocalization datasets, feature similarity by frame and feature similarity by video are not consistently related. Taking this phenomenon into account, in this article, we treat video features as a graph by concatenating the query video feature and proposal video feature along the time dimension. With the power of graph neural networks, we propose a Multi-Graph Feature Fusion Module to fuse the relation feature of this graph.
arXiv Detail & Related papers (2020-07-20T04:01:40Z)
SumGraph: Video Summarization via Recursive Graph Modeling [59.01856443537622]
We propose graph modeling networks for video summarization, termed SumGraph, to represent a relation graph. We achieve state-of-the-art performance on several benchmarks for video summarization in both supervised and unsupervised manners.
arXiv Detail & Related papers (2020-07-17T08:11:30Z)
Zero-Shot Video Object Segmentation via Attentive Graph Neural Networks [150.5425122989146]
This work proposes a novel attentive graph neural network (AGNN) for zero-shot video object segmentation (ZVOS). AGNN builds a fully connected graph to efficiently represent frames as nodes, and relations between arbitrary frame pairs as edges. Experimental results on three video segmentation datasets show that AGNN sets a new state-of-the-art in each case.
arXiv Detail & Related papers (2020-01-19T10:45:27Z)
Cut-Based Graph Learning Networks to Discover Compositional Structure of Sequential Video Data [29.841574293529796]
We propose Cut-Based Graph Learning Networks (CB-GLNs) for learning video data by discovering complex structures of the video. CB-GLNs represent video data as a graph, with nodes and edges corresponding to frames of the video and their dependencies respectively. We evaluate the proposed method on two different tasks for video understanding: video theme classification (Youtube-8M dataset) and video question answering (TVQA dataset).
arXiv Detail & Related papers (2020-01-17T10:09:24Z)

This list is automatically generated from the titles and abstracts of the papers on this site.