Memory Efficient Temporal & Visual Graph Model for Unsupervised Video
Domain Adaptation
- URL: http://arxiv.org/abs/2208.06554v1
- Date: Sat, 13 Aug 2022 02:56:10 GMT
- Title: Memory Efficient Temporal & Visual Graph Model for Unsupervised Video
Domain Adaptation
- Authors: Xinyue Hu, Lin Gu, Liangchen Liu, Ruijiang Li, Chang Su, Tatsuya
Harada, Yingying Zhu
- Abstract summary: Existing video domain adaptation (DA) methods need to store all temporal combinations of video frames or pair the source and target videos.
We propose a memory-efficient graph-based video DA approach.
- Score: 50.158454960223274
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing video domain adaptation (DA) methods need to store all temporal combinations of video frames or pair the source and target videos, which is expensive in memory and cannot scale up to long videos. To address these limitations, we propose a memory-efficient graph-based video DA approach as follows. First, our method models each source or target video as a graph: nodes represent video frames and edges represent the temporal or visual similarity relationships between frames. We use a graph attention network to learn the weight of individual frames and simultaneously align the source and target videos into a domain-invariant graph feature space. Instead of storing a large number of sub-videos, our method constructs only one graph with a graph attention mechanism per video, reducing the memory cost substantially. Extensive experiments show that, compared with state-of-the-art methods, our approach achieves superior performance while reducing the memory cost significantly.
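As a concrete illustration of the described pipeline, here is a minimal sketch of the frame-graph construction and a single graph-attention layer. This is our reading of the abstract, not the authors' code; the per-frame features, the similarity threshold `tau`, and the single-head layer are illustrative assumptions.

```python
# Sketch of the frame-graph idea (our illustration, not the authors' code).
# Assumptions: per-frame features from any backbone; tau and the single
# attention head below are illustrative choices.
import torch
import torch.nn.functional as F

def build_frame_graph(feats: torch.Tensor, tau: float = 0.8) -> torch.Tensor:
    """feats: (T, D) per-frame features. Returns a (T, T) adjacency that
    connects temporal neighbours and visually similar frame pairs."""
    T = feats.size(0)
    sim = F.cosine_similarity(feats.unsqueeze(1), feats.unsqueeze(0), dim=-1)
    adj = (sim > tau).float()                      # visual-similarity edges
    idx = torch.arange(T - 1)
    adj[idx, idx + 1] = adj[idx + 1, idx] = 1.0    # temporal edges
    adj.fill_diagonal_(1.0)                        # self-loops
    return adj

class FrameGATLayer(torch.nn.Module):
    """Minimal single-head graph attention layer over the frame graph."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = torch.nn.Linear(dim, dim)
        self.att = torch.nn.Linear(2 * dim, 1)

    def forward(self, feats, adj):
        h = self.proj(feats)                       # (T, D)
        T = h.size(0)
        pairs = torch.cat([h.unsqueeze(1).expand(T, T, -1),
                           h.unsqueeze(0).expand(T, T, -1)], dim=-1)
        e = F.leaky_relu(self.att(pairs).squeeze(-1))
        e = e.masked_fill(adj == 0, float('-inf')) # attend only along edges
        return torch.softmax(e, dim=-1) @ h        # attention-weighted frames
```

One such graph per video replaces the enumeration of frame combinations; source/target alignment would add a domain loss on the pooled graph feature, which is omitted here.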
Related papers
- VideoPatchCore: An Effective Method to Memorize Normality for Video Anomaly Detection [1.9384004397336387]
Video anomaly detection (VAD) is a crucial task in video analysis and surveillance within computer vision.
We propose an effective memory method for VAD, called VideoPatchCore.
Our approach introduces a structure that prioritizes memory optimization and configures three types of memory tailored to the characteristics of video data.
arXiv Detail & Related papers (2024-09-24T16:38:41Z)
- VideoSAGE: Video Summarization with Graph Representation Learning [9.21019970479227]
We propose a graph-based representation learning framework for video summarization.
A graph constructed this way aims to capture long-range interactions among video frames, and the sparsity ensures the model trains without hitting the memory and compute bottleneck.
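A rough sketch of the sparsity idea described above, assuming edges are restricted to a temporal window (the window size is our illustrative choice, not taken from the paper):

```python
# Illustrative sparse frame graph (our sketch of the idea, not VideoSAGE code).
# Connecting each frame only to frames within a temporal window keeps the
# edge count O(T * window) instead of O(T^2), avoiding the memory bottleneck.
import numpy as np

def sparse_frame_edges(num_frames: int, window: int = 8) -> list[tuple[int, int]]:
    edges = []
    for i in range(num_frames):
        for j in range(i + 1, min(i + 1 + window, num_frames)):
            edges.append((i, j))
    return edges

print(len(sparse_frame_edges(1000, window=8)))  # ~8k edges vs ~500k for a dense graph
```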
arXiv Detail & Related papers (2024-04-14T15:49:02Z)
- A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames [54.90226700939778]
We build on the common paradigm of transferring large-scale image-text models to video via shallow temporal fusion.
We expose two limitations of this approach: (1) decreased spatial capabilities, likely due to poor video-language alignment in standard video datasets, and (2) higher memory consumption, bottlenecking the number of frames that can be processed.
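A minimal sketch of what shallow temporal fusion looks like in practice (our illustration; `image_encoder` stands in for any pretrained image-text backbone):

```python
# Shallow temporal fusion, illustrated (our sketch): run a frozen image
# encoder per frame, then fuse with a cheap temporal pool. Memory grows
# linearly with the number of frames, which is the bottleneck noted above.
import torch

def shallow_fusion(frames: torch.Tensor, image_encoder) -> torch.Tensor:
    """frames: (T, 3, H, W). Returns one video embedding of shape (D,)."""
    with torch.no_grad():
        per_frame = torch.stack([image_encoder(f.unsqueeze(0)).squeeze(0)
                                 for f in frames])   # (T, D), T activations kept
    return per_frame.mean(dim=0)                     # mean pooling = "shallow" fusion
```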
arXiv Detail & Related papers (2023-12-12T16:10:19Z)
- Is a Video worth $n\times n$ Images? A Highly Efficient Approach to Transformer-based Video Question Answering [14.659023742381777]
Conventional Transformer-based Video Question Answering (VideoQA) approaches generally encode frames independently through one or more image encoders, followed by interaction between the frames and the question.
We present a highly efficient approach for VideoQA based on existing vision-language pre-trained models, where we arrange video frames into an $n\times n$ matrix and then convert it to one image.
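Under that reading, the frame-to-image step is a simple grid montage; a hedged sketch (the frame resolution and n are our assumptions):

```python
# Our sketch of the n x n frame-to-image idea: tile n*n sampled frames into
# a single image so one image-encoder pass replaces n*n frame encodings.
import numpy as np

def frames_to_grid(frames: np.ndarray, n: int = 3) -> np.ndarray:
    """frames: (n*n, H, W, 3) uint8 -> one (n*H, n*W, 3) image."""
    _, H, W, C = frames.shape
    rows = [np.concatenate(list(frames[r * n:(r + 1) * n]), axis=1)
            for r in range(n)]
    return np.concatenate(rows, axis=0)

video = np.random.randint(0, 255, (9, 224, 224, 3), dtype=np.uint8)
grid = frames_to_grid(video, n=3)   # (672, 672, 3): one image for the encoder
```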
arXiv Detail & Related papers (2023-05-16T02:12:57Z)
- GraphVid: It Only Takes a Few Nodes to Understand a Video [0.0]
We propose a concise representation of videos that encodes perceptually meaningful features into graphs.
We construct superpixel-based graph representations of videos by considering superpixels as graph nodes.
We leverage Graph Convolutional Networks to process this representation and predict the desired output.
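A small sketch of superpixel-based node construction, assuming SLIC segmentation and mean-colour node features (both are our illustrative choices; the paper's exact features and edge construction may differ):

```python
# Our sketch of GraphVid-style superpixel nodes: segment each frame with SLIC
# and use the mean superpixel colour as the node feature. Edge construction
# and the GCN head are omitted; n_segments is an illustrative choice.
import numpy as np
from skimage.segmentation import slic

def superpixel_nodes(frame: np.ndarray, n_segments: int = 100) -> np.ndarray:
    """frame: (H, W, 3) float in [0, 1] -> (num_superpixels, 3) node features."""
    labels = slic(frame, n_segments=n_segments, compactness=10.0)
    return np.stack([frame[labels == s].mean(axis=0)
                     for s in np.unique(labels)])
```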
arXiv Detail & Related papers (2022-07-04T12:52:54Z)
- MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition [74.35009770905968]
We build a memory-augmented vision transformer that has a temporal support 30x longer than existing models.
MeMViT obtains state-of-the-art results on the AVA, EPIC-Kitchens-100 action classification, and action anticipation datasets.
arXiv Detail & Related papers (2022-01-20T18:59:54Z)
- Memory-Augmented Non-Local Attention for Video Super-Resolution [61.55700315062226]
We propose a novel video super-resolution method that aims at generating high-fidelity high-resolution (HR) videos from low-resolution (LR) ones.
Previous methods predominantly leverage temporal neighbor frames to assist the super-resolution of the current frame.
In contrast, we devise a cross-frame non-local attention mechanism that allows video super-resolution without frame alignment.
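A bare-bones version of cross-frame non-local attention under this description (our sketch: single head, 1x1-conv projections, and residual fusion are assumed design choices):

```python
# Our minimal sketch of cross-frame non-local attention: every position in the
# current LR frame attends to all positions of a neighbouring frame, so no
# explicit motion alignment is needed.
import torch

class CrossFrameNonLocal(torch.nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.q = torch.nn.Conv2d(channels, channels, 1)
        self.k = torch.nn.Conv2d(channels, channels, 1)
        self.v = torch.nn.Conv2d(channels, channels, 1)

    def forward(self, cur, ref):
        """cur, ref: (B, C, H, W) feature maps of current / neighbour frame."""
        B, C, H, W = cur.shape
        q = self.q(cur).flatten(2).transpose(1, 2)     # (B, HW, C)
        k = self.k(ref).flatten(2)                     # (B, C, HW)
        v = self.v(ref).flatten(2).transpose(1, 2)     # (B, HW, C)
        att = torch.softmax(q @ k / C ** 0.5, dim=-1)  # (B, HW, HW)
        out = (att @ v).transpose(1, 2).reshape(B, C, H, W)
        return cur + out                               # residual fusion
```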
arXiv Detail & Related papers (2021-08-25T05:12:14Z)
- SumGraph: Video Summarization via Recursive Graph Modeling [59.01856443537622]
We propose graph modeling networks for video summarization, termed SumGraph, to represent a relation graph.
We achieve state-of-the-art performance on several benchmarks for video summarization in both supervised and unsupervised manners.
arXiv Detail & Related papers (2020-07-17T08:11:30Z)
- Video Object Segmentation with Episodic Graph Memory Networks [198.74780033475724]
A graph memory network is developed to address the novel idea of "learning to update the segmentation model".
We exploit an episodic memory network, organized as a fully connected graph, to store frames as nodes and capture cross-frame correlations by edges.
The proposed graph memory network yields a neat yet principled framework, which generalizes well to both one-shot and zero-shot video object segmentation tasks.
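A minimal sketch of the fully connected frame-node memory described above (our illustration; the plain attention readout stands in for the paper's learned update rules):

```python
# Our sketch of a fully connected graph memory over stored frames: each memory
# node holds one frame embedding, and a query frame is answered by attention
# over all nodes (the cross-frame correlations the summary mentions).
import torch

class GraphMemory:
    def __init__(self):
        self.nodes: list[torch.Tensor] = []    # one (D,) embedding per stored frame

    def write(self, frame_embed: torch.Tensor) -> None:
        self.nodes.append(frame_embed)

    def read(self, query: torch.Tensor) -> torch.Tensor:
        mem = torch.stack(self.nodes)                             # (N, D)
        w = torch.softmax(mem @ query / mem.size(1) ** 0.5, dim=0)
        return w @ mem                                            # (D,) readout
```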
arXiv Detail & Related papers (2020-07-14T13:19:19Z)