Self-supervised Video Retrieval Transformer Network
- URL: http://arxiv.org/abs/2104.07993v1
- Date: Fri, 16 Apr 2021 09:43:45 GMT
- Title: Self-supervised Video Retrieval Transformer Network
- Authors: Xiangteng He, Yulin Pan, Mingqian Tang and Yiliang Lv
- Abstract summary: We propose SVRTN, which applies self-supervised training to learn video representation from unlabeled data.
It exploits a transformer structure to aggregate frame-level features into clip-level features, reducing both storage space and search complexity.
It learns complementary and discriminative information from the interactions among clip frames, and acquires invariance to frame permutation and missing frames to support more flexible retrieval.
- Score: 10.456881328982586
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Content-based video retrieval aims to find videos from a large video database
that are similar to or even near-duplicate of a given query video. Video
representation and similarity search algorithms are crucial to any video
retrieval system. To derive effective video representation, most video
retrieval systems require a large amount of manually annotated data for
training, which makes them costly and inefficient. In addition, most retrieval
systems rely on frame-level features for video similarity search, which makes
them expensive in both storage and search. We propose a novel video retrieval
system, termed SVRTN, that effectively addresses the above shortcomings. It
first applies self-supervised training to effectively learn video
representation from unlabeled data to avoid the expensive cost of manual
annotation. Then, it exploits a transformer structure to aggregate frame-level
features into clip-level features, reducing both storage space and search
complexity. It learns complementary and discriminative information from the
interactions among clip frames, and acquires invariance to frame permutation and
missing frames, which supports more flexible retrieval.
Comprehensive experiments on two challenging video retrieval datasets, namely
FIVR-200K and SVD, verify the effectiveness of our proposed SVRTN method, which
achieves the best video retrieval performance in both accuracy and efficiency.
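To make the core idea concrete, below is a minimal sketch of the clip-level aggregation step, assuming PyTorch; it is not the authors' released code, and the dimensions, depth, and mean-pooling readout are illustrative choices. A small transformer encoder, used without positional encodings so that the output does not depend on frame order, fuses frame-level features into one clip-level descriptor.

```python
# Minimal sketch of transformer-based frame-to-clip aggregation (illustrative only;
# dimensions, depth, and the pooling readout are assumptions, not the SVRTN release).
from typing import Optional

import torch
import torch.nn as nn
import torch.nn.functional as F

class ClipAggregator(nn.Module):
    def __init__(self, feat_dim: int = 512, num_layers: int = 2, num_heads: int = 8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        # No positional encoding is added, so the aggregation does not depend on
        # frame order, echoing the frame-permutation invariance described above.

    def forward(self, frame_feats: torch.Tensor,
                pad_mask: Optional[torch.Tensor] = None) -> torch.Tensor:
        # frame_feats: (batch, num_frames, feat_dim); pad_mask is True at missing frames.
        x = self.encoder(frame_feats, src_key_padding_mask=pad_mask)
        # Mean-pool the contextualized frame features into one clip-level descriptor
        # (a full implementation would exclude masked frames from the mean).
        clip = x.mean(dim=1)
        return F.normalize(clip, dim=-1)

# Hypothetical usage: 4 clips of 16 frames with 512-d frame features.
clip_vecs = ClipAggregator()(torch.randn(4, 16, 512))  # -> (4, 512)
```

Storing one vector per clip rather than one per frame is what shrinks both the index size and the nearest-neighbour search cost mentioned above.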
Related papers
- Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets [62.280729345770936]
We introduce the task of Alignable Video Retrieval (AVR).
Given a query video, our approach can identify well-alignable videos from a large collection of clips and temporally synchronize them to the query.
Our experiments on 3 datasets, including large-scale Kinetics700, demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-09-02T20:00:49Z)
- EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval [52.375143786641196]
EgoCVR is an evaluation benchmark for fine-grained Composed Video Retrieval.
EgoCVR consists of 2,295 queries that specifically focus on high-quality temporal video understanding.
arXiv Detail & Related papers (2024-07-23T17:19:23Z)
- VVS: Video-to-Video Retrieval with Irrelevant Frame Suppression [12.793922882841137]
We show that appropriate suppression of irrelevant frames can provide insight into the current obstacles of the video-level approaches.
We propose a Video-to-Video Suppression network (VVS) as a solution.
VVS is an end-to-end framework that consists of an easy distractor elimination stage to identify which frames to remove and a suppression weight generation stage to determine the extent to suppress the remaining frames.
arXiv Detail & Related papers (2023-03-15T20:02:54Z)
- Deep Unsupervised Key Frame Extraction for Efficient Video Classification [63.25852915237032]
This work presents an unsupervised method to retrieve key frames, combining a Convolutional Neural Network (CNN) with Temporal Segment Density Peaks Clustering (TSDPC).
The proposed TSDPC is a generic and powerful framework with two advantages over previous works, one being that it can determine the number of key frames automatically (see the key-frame selection sketch after this list).
Furthermore, a Long Short-Term Memory network (LSTM) is added on top of the CNN to further improve classification performance.
arXiv Detail & Related papers (2022-11-12T20:45:35Z)
- Hybrid Contrastive Quantization for Efficient Cross-View Video Retrieval [55.088635195893325]
We propose the first quantized representation learning method for cross-view video retrieval, namely Hybrid Contrastive Quantization (HCQ).
HCQ learns both coarse-grained and fine-grained quantizations with transformers, which provide complementary understandings for texts and videos.
Experiments on three Web video benchmark datasets demonstrate that HCQ achieves competitive performance with state-of-the-art non-compressed retrieval methods.
arXiv Detail & Related papers (2022-02-07T18:04:10Z)
- Video Corpus Moment Retrieval with Contrastive Learning [56.249924768243375]
Video corpus moment retrieval (VCMR) aims to retrieve a temporal moment that semantically corresponds to a given text query.
We propose a Retrieval and Localization Network with Contrastive Learning (ReLoCLNet) for VCMR.
Experimental results show that, although ReLoCLNet encodes text and video separately for efficiency, its retrieval accuracy is comparable with baselines that adopt cross-modal interaction learning.
arXiv Detail & Related papers (2021-05-13T12:54:39Z)
- Temporal Context Aggregation for Video Retrieval with Contrastive Learning [81.12514007044456]
We propose TCA, a video representation learning framework that incorporates long-range temporal information between frame-level features.
The proposed method shows a significant performance advantage (17% mAP on FIVR-200K) over state-of-the-art methods with video-level features.
arXiv Detail & Related papers (2020-08-04T05:24:20Z)
- Feature Re-Learning with Data Augmentation for Video Relevance Prediction [35.87597969685573]
Re-learning is realized by projecting a given deep feature into a new space via an affine transformation.
We propose a new data augmentation strategy which works directly on frame-level and video-level features.
arXiv Detail & Related papers (2020-04-08T05:22:41Z)
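For the Feature Re-Learning entry directly above, a minimal sketch of the affine re-learning idea, assuming PyTorch: a pre-extracted deep feature v is projected into a new space as Wv + b, and the projection is trained with a ranking objective. The layer sizes and the triplet loss here are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of affine feature re-learning (illustrative assumptions: layer
# sizes and the triplet ranking loss are not taken from the paper).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AffineReLearning(nn.Module):
    def __init__(self, in_dim: int = 2048, out_dim: int = 512):
        super().__init__()
        self.affine = nn.Linear(in_dim, out_dim, bias=True)  # v -> W v + b

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # L2-normalize so cosine similarity can rank video relevance.
        return F.normalize(self.affine(feats), dim=-1)

# Hypothetical usage: re-learn pre-extracted 2048-d video-level features with a
# standard triplet ranking loss over (anchor, relevant, irrelevant) videos.
model = AffineReLearning()
anchor, positive, negative = (torch.randn(8, 2048) for _ in range(3))
loss = F.triplet_margin_loss(model(anchor), model(positive), model(negative), margin=0.2)
loss.backward()
```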
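For the Deep Unsupervised Key Frame Extraction entry, the key-frame selection sketch referenced earlier: plain density-peaks selection over CNN frame features. The paper's TSDPC additionally works on temporal segments, which is omitted here, and the cutoff ratio and peak scoring are illustrative assumptions.

```python
# Minimal sketch of density-peaks key frame selection (plain density peaks over
# frame features; the temporal-segment part of TSDPC is intentionally omitted).
import numpy as np

def select_key_frames(frame_feats: np.ndarray, cutoff_ratio: float = 0.02,
                      num_key_frames: int = 5) -> np.ndarray:
    """Return indices of frames chosen as density peaks."""
    n = frame_feats.shape[0]
    # Pairwise Euclidean distances between frame features.
    dists = np.linalg.norm(frame_feats[:, None, :] - frame_feats[None, :, :], axis=-1)
    # Cutoff distance d_c: a small percentile of all pairwise distances.
    d_c = np.quantile(dists[np.triu_indices(n, k=1)], cutoff_ratio)
    # Local density rho: Gaussian-kernel neighbour count within roughly d_c.
    rho = np.exp(-(dists / (d_c + 1e-12)) ** 2).sum(axis=1) - 1.0
    # Delta: distance to the nearest frame of higher density (max distance for the peak).
    delta = np.zeros(n)
    for i in range(n):
        higher = np.where(rho > rho[i])[0]
        delta[i] = dists[i].max() if higher.size == 0 else dists[i, higher].min()
    # Density peaks score high on both rho and delta; take the top frames as key frames.
    return np.argsort(-(rho * delta))[:num_key_frames]

# Hypothetical usage: 64 frames with 512-d features -> 8 key frame indices.
key_idx = select_key_frames(np.random.randn(64, 512), num_key_frames=8)
```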