Feature-compatible Progressive Learning for Video Copy Detection
- URL: http://arxiv.org/abs/2304.10305v2
- Date: Fri, 12 May 2023 17:26:39 GMT
- Title: Feature-compatible Progressive Learning for Video Copy Detection
- Authors: Wenhao Wang, Yifan Sun, Yi Yang
- Abstract summary: Video Copy Detection (VCD) has been developed to identify instances of unauthorized or duplicated video content.
This paper presents our second place solutions to the Meta AI Video Similarity Challenge (VSC22), CVPR 2023.
- Score: 30.358206867280426
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Video Copy Detection (VCD) has been developed to identify instances of
unauthorized or duplicated video content. This paper presents our second place
solutions to the Meta AI Video Similarity Challenge (VSC22), CVPR 2023. In
order to compete in this challenge, we propose Feature-Compatible Progressive
Learning (FCPL) for VCD. FCPL trains various models that produce
mutually-compatible features, meaning that the features derived from multiple
distinct models can be directly compared with one another. We find this mutual
compatibility enables feature ensemble. By implementing progressive learning
and utilizing labeled ground-truth pairs, we gradually enhance
performance. Experimental results demonstrate the superiority of the proposed
FCPL over other competitors. Our code is available at
https://github.com/WangWenhao0716/VSC-DescriptorTrack-Submission and
https://github.com/WangWenhao0716/VSC-MatchingTrack-Submission.
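The abstract states that FCPL's models produce mutually-compatible features, so descriptors from distinct models can be directly compared and ensembled. The paper does not specify the fusion scheme here; the sketch below shows one plausible realization (concatenating L2-normalized descriptors and re-normalizing, so similarity remains a plain dot product). All function names and dimensions are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """L2-normalize descriptors so dot products equal cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def ensemble_descriptors(feats_per_model):
    """Fuse descriptors from feature-compatible models (hypothetical scheme):
    L2-normalize each model's output, concatenate, then re-normalize so the
    fused descriptor is again unit-length."""
    normed = [l2_normalize(f) for f in feats_per_model]
    fused = np.concatenate(normed, axis=-1)
    return l2_normalize(fused)

# Toy example: descriptors for 3 video frames from two hypothetical models.
rng = np.random.default_rng(0)
model_a = rng.standard_normal((3, 128))
model_b = rng.standard_normal((3, 128))
query = ensemble_descriptors([model_a, model_b])
reference = ensemble_descriptors([model_a, model_b])

# Because everything is unit-normalized, copy detection scoring reduces to
# a matrix of dot products; self-similarity on the diagonal is 1.0.
sim = query @ reference.T
print(np.round(np.diag(sim), 3))
```

This kind of fusion only makes sense when the per-model embedding spaces are mutually comparable, which is exactly the compatibility property the abstract claims FCPL trains for.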
Related papers
- Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets [62.280729345770936]
We introduce the task of Alignable Video Retrieval (AVR).
Given a query video, our approach can identify well-alignable videos from a large collection of clips and temporally synchronize them to the query.
Our experiments on 3 datasets, including large-scale Kinetics700, demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-09-02T20:00:49Z) - Comment-aided Video-Language Alignment via Contrastive Pre-training for Short-form Video Humor Detection [29.287017615414314]
We propose a novel model for short-form video humor detection, named Comment-aided Video-Language Alignment (CVLA).
CVLA not only operates on raw signals across various modal channels but also yields an appropriate multi-modal representation by aligning the video and language components within a consistent semantic space.
The experimental results on two humor detection datasets, including DY11k and UR-FUNNY, demonstrate that CVLA dramatically outperforms state-of-the-art and several competitive baseline approaches.
arXiv Detail & Related papers (2024-02-14T10:05:19Z) - A Similarity Alignment Model for Video Copy Segment Matching [13.517933749704866]
Meta AI held the Video Similarity Challenge at CVPR 2023 to push the technology forward.
We propose a Similarity Alignment Model (SAM) for video copy segment matching.
Our SAM exhibits superior performance compared to other competitors.
arXiv Detail & Related papers (2023-05-25T03:08:51Z) - VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending [78.1399386935455]
Large-scale image-text contrastive pre-training models, such as CLIP, have been demonstrated to effectively learn high-quality multimodal representations.
We propose a novel video-text pre-training method dubbed VLAB: Video Language pre-training by feature Adapting and Blending.
VLAB transfers CLIP representations to video pre-training tasks and develops unified video multimodal models for a wide range of video-text tasks.
arXiv Detail & Related papers (2023-05-22T15:54:22Z) - A Dual-level Detection Method for Video Copy Detection [13.517933749704866]
Meta AI held the Video Similarity Challenge at CVPR 2023 to push the technology forward.
We propose a dual-level detection method with Video Editing Detection (VED) and Frame Scenes Detection (FSD) to tackle the core challenges of Video Copy Detection.
arXiv Detail & Related papers (2023-05-21T06:19:08Z) - 3rd Place Solution to Meta AI Video Similarity Challenge [1.1470070927586016]
This paper presents our 3rd place solution in the Meta AI Video Similarity Challenge (VSC2022).
Our approach builds upon existing image copy detection techniques and incorporates several strategies to exploit the properties of video data.
arXiv Detail & Related papers (2023-04-24T10:00:09Z) - Few-Shot Video Object Detection [70.43402912344327]
We introduce Few-Shot Video Object Detection (FSVOD) with three important contributions.
FSVOD-500 comprises 500 classes with class-balanced videos in each category for few-shot learning.
Our TPN and TMN+ are jointly and end-to-end trained.
arXiv Detail & Related papers (2021-04-30T07:38:04Z) - CoCon: Cooperative-Contrastive Learning [52.342936645996765]
Self-supervised visual representation learning is key for efficient video analysis.
Recent success in learning image representations suggests contrastive learning is a promising framework to tackle this challenge.
We introduce a cooperative variant of contrastive learning to utilize complementary information across views.
arXiv Detail & Related papers (2021-04-30T05:46:02Z) - Semi-Supervised Action Recognition with Temporal Contrastive Learning [50.08957096801457]
We learn a two-pathway temporal contrastive model using unlabeled videos at two different speeds.
We considerably outperform video extensions of sophisticated state-of-the-art semi-supervised image recognition methods.
arXiv Detail & Related papers (2021-02-04T17:28:35Z) - Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework [43.002621928500425]
We propose a self-supervised method to learn feature representations from videos.
To learn stronger video representations, we extend negative samples by introducing intra-negative samples.
We conduct experiments on video retrieval and video recognition tasks using the learned video representation.
arXiv Detail & Related papers (2020-08-06T09:08:14Z) - Single Shot Video Object Detector [215.06904478667337]
Single Shot Video Object Detector (SSVD) is a new architecture that integrates feature aggregation into a one-stage detector for object detection in videos.
For $448 \times 448$ input, SSVD achieves 79.2% mAP on the ImageNet VID dataset.
arXiv Detail & Related papers (2020-07-07T15:36:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.