Feature-compatible Progressive Learning for Video Copy Detection
- URL: http://arxiv.org/abs/2304.10305v2
- Date: Fri, 12 May 2023 17:26:39 GMT
- Title: Feature-compatible Progressive Learning for Video Copy Detection
- Authors: Wenhao Wang, Yifan Sun, Yi Yang
- Abstract summary: Video Copy Detection (VCD) has been developed to identify instances of unauthorized or duplicated video content.
This paper presents our second place solutions to the Meta AI Video Similarity Challenge (VSC22), CVPR 2023.
- Score: 30.358206867280426
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Video Copy Detection (VCD) has been developed to identify instances of
unauthorized or duplicated video content. This paper presents our second place
solutions to the Meta AI Video Similarity Challenge (VSC22), CVPR 2023. In
order to compete in this challenge, we propose Feature-Compatible Progressive
Learning (FCPL) for VCD. FCPL trains various models that produce
mutually-compatible features, meaning that the features derived from multiple
distinct models can be directly compared with one another. We find this mutual
compatibility enables feature ensemble. By implementing progressive learning
and utilizing labeled ground-truth pairs, we gradually enhance
performance. Experimental results demonstrate the superiority of the proposed
FCPL over other competitors. Our code is available at
https://github.com/WangWenhao0716/VSC-DescriptorTrack-Submission and
https://github.com/WangWenhao0716/VSC-MatchingTrack-Submission.
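The abstract states that FCPL's models produce mutually-compatible features, so descriptors from distinct models can be directly compared and ensembled. The paper does not specify the fusion scheme here; the sketch below shows one plausible realization (concatenating L2-normalized descriptors and re-normalizing, so similarity remains a plain dot product). All function names and dimensions are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """L2-normalize descriptors so dot products equal cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def ensemble_descriptors(feats_per_model):
    """Fuse descriptors from feature-compatible models (hypothetical scheme):
    L2-normalize each model's output, concatenate, then re-normalize so the
    fused descriptor is again unit-length."""
    normed = [l2_normalize(f) for f in feats_per_model]
    fused = np.concatenate(normed, axis=-1)
    return l2_normalize(fused)

# Toy example: descriptors for 3 video frames from two hypothetical models.
rng = np.random.default_rng(0)
model_a = rng.standard_normal((3, 128))
model_b = rng.standard_normal((3, 128))
query = ensemble_descriptors([model_a, model_b])
reference = ensemble_descriptors([model_a, model_b])

# Because everything is unit-normalized, copy detection scoring reduces to
# a matrix of dot products; self-similarity on the diagonal is 1.0.
sim = query @ reference.T
print(np.round(np.diag(sim), 3))
```

This kind of fusion only makes sense when the per-model embedding spaces are mutually comparable, which is exactly the compatibility property the abstract claims FCPL trains for.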
Related papers
- Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets [62.280729345770936]
We introduce the task of Alignable Video Retrieval (AVR).
Given a query video, our approach can identify well-alignable videos from a large collection of clips and temporally synchronize them to the query.
Our experiments on 3 datasets, including large-scale Kinetics700, demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-09-02T20:00:49Z) - Comment-aided Video-Language Alignment via Contrastive Pre-training for Short-form Video Humor Detection [29.287017615414314]
We propose a novel model for short-form video humor detection, named Comment-aided Video-Language Alignment (CVLA).
CVLA not only operates on raw signals across various modal channels but also yields an appropriate multi-modal representation by aligning the video and language components within a consistent semantic space.
The experimental results on two humor detection datasets, including DY11k and UR-FUNNY, demonstrate that CVLA dramatically outperforms state-of-the-art and several competitive baseline approaches.
arXiv Detail & Related papers (2024-02-14T10:05:19Z) - A Similarity Alignment Model for Video Copy Segment Matching [13.517933749704866]
Meta AI held the Video Similarity Challenge at CVPR 2023 to push the technology forward.
We propose a Similarity Alignment Model (SAM) for video copy segment matching.
Our SAM exhibits superior performance compared to other competitors.
arXiv Detail & Related papers (2023-05-25T03:08:51Z) - VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending [78.1399386935455]
Large-scale image-text contrastive pre-training models, such as CLIP, have been demonstrated to effectively learn high-quality multimodal representations.
We propose a novel video-text pre-training method dubbed VLAB: Video Language pre-training by feature Adapting and Blending.
VLAB transfers CLIP representations to video pre-training tasks and develops unified video multimodal models for a wide range of video-text tasks.
arXiv Detail & Related papers (2023-05-22T15:54:22Z) - A Dual-level Detection Method for Video Copy Detection [13.517933749704866]
Meta AI held the Video Similarity Challenge at CVPR 2023 to push the technology forward.
We propose a dual-level detection method with Video Editing Detection (VED) and Frame Scenes Detection (FSD) to tackle the core challenges of Video Copy Detection.
arXiv Detail & Related papers (2023-05-21T06:19:08Z) - 3rd Place Solution to Meta AI Video Similarity Challenge [1.1470070927586016]
This paper presents our 3rd place solution in the Meta AI Video Similarity Challenge (VSC2022).
Our approach builds upon existing image copy detection techniques and incorporates several strategies to exploit the properties of video data.
arXiv Detail & Related papers (2023-04-24T10:00:09Z) - Few-Shot Video Object Detection [70.43402912344327]
We introduce Few-Shot Video Object Detection (FSVOD) with three important contributions.
FSVOD-500 comprises 500 classes with class-balanced videos in each category for few-shot learning.
Our TPN and TMN+ are jointly and end-to-end trained.
arXiv Detail & Related papers (2021-04-30T07:38:04Z) - CoCon: Cooperative-Contrastive Learning [52.342936645996765]
Self-supervised visual representation learning is key for efficient video analysis.
Recent success in learning image representations suggests contrastive learning is a promising framework to tackle this challenge.
We introduce a cooperative variant of contrastive learning to utilize complementary information across views.
arXiv Detail & Related papers (2021-04-30T05:46:02Z) - Semi-Supervised Action Recognition with Temporal Contrastive Learning [50.08957096801457]
We learn a two-pathway temporal contrastive model using unlabeled videos at two different speeds.
We considerably outperform video extensions of sophisticated state-of-the-art semi-supervised image recognition methods.
arXiv Detail & Related papers (2021-02-04T17:28:35Z) - Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework [43.002621928500425]
We propose a self-supervised method to learn feature representations from videos.
To learn stronger video representations, we extend negative samples by introducing intra-negative samples.
We conduct experiments on video retrieval and video recognition tasks using the learned video representation.
arXiv Detail & Related papers (2020-08-06T09:08:14Z) - Single Shot Video Object Detector [215.06904478667337]
Single Shot Video Object Detector (SSVD) is a new architecture that integrates feature aggregation into a one-stage detector for object detection in videos.
For $448 \times 448$ input, SSVD achieves 79.2% mAP on the ImageNet VID dataset.
arXiv Detail & Related papers (2020-07-07T15:36:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.