VPN: Video Provenance Network for Robust Content Attribution
- URL: http://arxiv.org/abs/2109.10038v1
- Date: Tue, 21 Sep 2021 09:07:05 GMT
- Title: VPN: Video Provenance Network for Robust Content Attribution
- Authors: Alexander Black, Tu Bui, Simon Jenni, Vishy Swaminathan, John
Collomosse
- Abstract summary: We present VPN - a content attribution method for recovering provenance information from videos shared online.
We learn a robust search embedding for matching such videos, using full-length or truncated video queries.
Once matched against a trusted database of video clips, associated information on the provenance of the clip is presented to the user.
- Score: 72.12494245048504
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present VPN - a content attribution method for recovering provenance
information from videos shared online. As videos are redistributed online, platforms
and users often transform them into different qualities, codecs, sizes, and shapes,
or slightly edit their content, for example by adding text or emoji. We learn a
robust search embedding for matching such videos, invariant to these transformations,
using full-length or truncated video queries. Once a query is matched against a
trusted database of video clips, associated information on the provenance of the clip
is presented to the user. We use an inverted index to match temporal chunks of video,
with late fusion to combine visual and audio features. In both modalities, features
are extracted via a deep neural network trained using contrastive learning on a
dataset of original and augmented video clips. We demonstrate high-accuracy recall
over a corpus of 100,000 videos.
Related papers
- Spatio-temporal Prompting Network for Robust Video Feature Extraction [74.54597668310707]
Frame quality deterioration is one of the main challenges in the field of video understanding.
Recent approaches exploit transformer-based integration modules to obtain spatio-temporal information.
We present a neat and unified framework called Spatio-Temporal Prompting Network (STPN).
It can efficiently extract video features by adjusting the input features in the network backbone.
arXiv Detail & Related papers (2024-02-04T17:52:04Z) - Video Infringement Detection via Feature Disentanglement and Mutual
Information Maximization [51.206398602941405]
We propose to disentangle an original high-dimensional feature into multiple sub-features.
On top of the disentangled sub-features, we learn an auxiliary feature to enhance the sub-features.
Our method achieves 90.1% TOP-100 mAP on the large-scale SVD dataset and also sets the new state-of-the-art on the VCSL benchmark dataset.
arXiv Detail & Related papers (2023-09-13T10:53:12Z) - VADER: Video Alignment Differencing and Retrieval [70.88247176534426]
VADER matches and aligns partial video fragments to candidate videos using a robust visual descriptor and scalable search over chunked video content.
A space-time comparator module identifies regions of manipulation between content, invariant to changes caused by residual temporal misalignment or by artifacts arising from non-editorial changes to the content.
arXiv Detail & Related papers (2023-03-23T11:50:44Z) - Partially Relevant Video Retrieval [39.747235541498135]
We propose a novel text-to-video retrieval (T2VR) subtask termed Partially Relevant Video Retrieval (PRVR).
PRVR aims to retrieve partially relevant videos from a large collection of untrimmed videos.
We formulate PRVR as a multiple instance learning (MIL) problem, where a video is simultaneously viewed as a bag of video clips and a bag of video frames.
arXiv Detail & Related papers (2022-08-26T09:07:16Z) - A Feature-space Multimodal Data Augmentation Technique for Text-video
Retrieval [16.548016892117083]
Text-video retrieval methods have received increased attention over the past few years.
Data augmentation techniques were introduced to increase the performance on unseen test examples.
We propose a multimodal data augmentation technique which works in the feature space and creates new videos and captions by mixing semantically similar samples (a hypothetical sketch of this mixing follows the list below).
arXiv Detail & Related papers (2022-08-03T14:05:20Z) - Self-Supervised Video Representation Learning by Video Incoherence
Detection [28.540645395066434]
This paper introduces a novel self-supervised method that leverages incoherence detection for video representation learning.
It is rooted in the observation that the human visual system can easily identify video incoherence based on a comprehensive understanding of videos.
arXiv Detail & Related papers (2021-09-26T04:58:13Z) - Less is More: ClipBERT for Video-and-Language Learning via Sparse
Sampling [98.41300980759577]
A canonical approach to video-and-language learning dictates a neural model to learn from offline-extracted dense video features.
We propose a generic framework ClipBERT that enables affordable end-to-end learning for video-and-language tasks.
Experiments on text-to-video retrieval and video question answering on six datasets demonstrate that ClipBERT outperforms existing methods.
arXiv Detail & Related papers (2021-02-11T18:50:16Z) - Efficient video integrity analysis through container characterization [77.45740041478743]
We introduce a container-based method to identify the software used to perform a video manipulation.
The proposed method is both efficient and effective and can also provide a simple explanation for its decisions.
It achieves an accuracy of 97.6% in distinguishing pristine from tampered videos and classifying the editing software.
arXiv Detail & Related papers (2021-01-26T14:13:39Z) - Self-supervised Video Representation Learning Using Inter-intra
Contrastive Framework [43.002621928500425]
We propose a self-supervised method to learn feature representations from videos.
We extend negative samples by introducing intra-negative samples generated from within the same video clip.
We conduct experiments on video retrieval and video recognition tasks using the learned video representation.
arXiv Detail & Related papers (2020-08-06T09:08:14Z) - Feature Re-Learning with Data Augmentation for Video Relevance
Prediction [35.87597969685573]
Re-learning is realized by projecting a given deep feature into a new space by an affine transformation.
We propose a new data augmentation strategy which works directly on frame-level and video-level features.
arXiv Detail & Related papers (2020-04-08T05:22:41Z)
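Referenced from the feature-space augmentation entry above: a minimal, hypothetical sketch of mixing the video and caption features of two semantically similar samples with a shared interpolation weight. The function name, feature dimensionality, and Beta-distributed mixing weight are assumptions, not the cited paper's implementation.

```python
import numpy as np

def mix_similar_pair(video_a, text_a, video_b, text_b, alpha=0.5):
    """Create a synthetic (video, caption) feature pair by interpolating two
    semantically similar samples with a single shared mixing weight."""
    lam = np.random.beta(alpha, alpha)  # mixup-style weight (assumed distribution)
    return lam * video_a + (1 - lam) * video_b, lam * text_a + (1 - lam) * text_b

# Toy usage with random stand-ins for pre-extracted video/caption features.
rng = np.random.default_rng(0)
v1, t1 = rng.normal(size=512), rng.normal(size=512)
v2, t2 = rng.normal(size=512), rng.normal(size=512)
aug_video, aug_text = mix_similar_pair(v1, t1, v2, t2)
```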