Near-duplicate video detection featuring coupled temporal and perceptual
visual structures and logical inference based matching
- URL: http://arxiv.org/abs/2005.07356v1
- Date: Fri, 15 May 2020 04:45:52 GMT
- Title: Near-duplicate video detection featuring coupled temporal and perceptual
visual structures and logical inference based matching
- Authors: B. Tahayna, M. Belkhatir
- Abstract summary: We propose an architecture for near-duplicate video detection based on: (i) index and query signature based structures integrating temporal and perceptual visual features.
For matching, we propose to instantiate a retrieval model based on logical inference through the coupling of an N-gram sliding window process and theoretically-sound lattice-based structures.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose in this paper an architecture for near-duplicate video detection
based on: (i) index and query signature based structures integrating temporal
and perceptual visual features and (ii) a matching framework computing the
logical inference between index and query documents. As far as indexing is
concerned, instead of concatenating low-level visual features in
high-dimensional spaces, which results in curse-of-dimensionality and redundancy
issues, we adopt a perceptual symbolic representation based on color and
texture concepts. For matching, we propose to instantiate a retrieval model
based on logical inference through the coupling of an N-gram sliding window
process and theoretically-sound lattice-based structures. The techniques we
cover are robust and insensitive to general video editing and/or degradation,
making them well suited to re-broadcast video search. Experiments are carried out on
large quantities of video data collected from the TRECVID 02, 03 and 04
collections and real-world video broadcasts recorded from two German TV
stations. An empirical comparison with two state-of-the-art dynamic programming
techniques is encouraging and demonstrates the advantage and feasibility of our
method.
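For concreteness, below is a minimal Python sketch of the two stages described in the abstract: frames are mapped to symbolic (color, texture) concepts to form a temporal signature, and a query is scored against an index document with an N-gram sliding window. The concept vocabularies, the feature inputs, and the overlap score are illustrative assumptions only; the paper's lattice-based logical inference model is not reproduced here.

```python
# Hypothetical sketch of the two-stage pipeline; all names, vocabularies,
# and scoring rules are illustrative assumptions, not the authors' code.
from collections import Counter

# Stage 1: symbolic perceptual signature. Each frame is mapped to a small
# vocabulary of color/texture concepts instead of a high-dimensional
# low-level feature vector (assumed concept sets for illustration).
COLOR_CONCEPTS = ["red", "green", "blue", "skin", "dark", "bright"]
TEXTURE_CONCEPTS = ["smooth", "coarse", "regular", "irregular"]

def frame_to_symbol(frame_features):
    """Map low-level frame features to a (color, texture) concept pair.
    `frame_features` holds per-concept distances; the nearest-concept
    rule is a placeholder for the paper's perceptual mapping."""
    color = min(COLOR_CONCEPTS, key=lambda c: frame_features["color_dist"][c])
    texture = min(TEXTURE_CONCEPTS, key=lambda t: frame_features["texture_dist"][t])
    return (color, texture)

def video_signature(frames):
    """Temporal sequence of perceptual symbols: the index/query structure."""
    return [frame_to_symbol(f) for f in frames]

# Stage 2: N-gram sliding-window matching. Consecutive symbols are grouped
# into N-grams so temporal order is preserved; the shared-N-gram fraction
# below is a simple stand-in for the lattice-based logical inference.
def ngrams(signature, n=3):
    return [tuple(signature[i:i + n]) for i in range(len(signature) - n + 1)]

def match_score(query_sig, index_sig, n=3):
    q, d = Counter(ngrams(query_sig, n)), Counter(ngrams(index_sig, n))
    shared = sum((q & d).values())  # N-grams common to query and document
    return shared / max(1, sum(q.values()))  # fraction of query N-grams found
```

The design point this illustrates is that matching operates on short symbol sequences rather than concatenated feature vectors, which keeps the signatures compact and makes the comparison sensitive to temporal order.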
Related papers
- Text-Video Retrieval via Variational Multi-Modal Hypergraph Networks [25.96897989272303]
The main obstacle for text-video retrieval is the semantic gap between the textual nature of queries and the visual richness of video content.
We propose chunk-level text-video matching, where the query chunks are extracted to describe a specific retrieval unit.
We formulate the chunk-level matching as n-ary correlations modeling between words of the query and frames of the video.
arXiv Detail & Related papers (2024-01-06T09:38:55Z) - Jointly Visual- and Semantic-Aware Graph Memory Networks for Temporal
Sentence Localization in Videos [67.12603318660689]
We propose a novel Hierarchical Visual- and Semantic-Aware Reasoning Network (HVSARN).
HVSARN enables both visual- and semantic-aware query reasoning from object-level to frame-level.
Experiments on three datasets demonstrate that our HVSARN achieves a new state-of-the-art performance.
arXiv Detail & Related papers (2023-03-02T08:00:22Z) - Correspondence Matters for Video Referring Expression Comprehension [64.60046797561455]
Video Referring Expression Comprehension (REC) aims to localize the referent objects described in a sentence to visual regions in the video frames.
Existing methods suffer from two problems: 1) inconsistent localization results across video frames; 2) confusion between the referent and contextual objects.
We propose a novel Dual Correspondence Network (dubbed DCNet) which explicitly enhances the dense associations in both inter-frame and cross-modal manners.
arXiv Detail & Related papers (2022-07-21T10:31:39Z) - Condensing a Sequence to One Informative Frame for Video Recognition [113.3056598548736]
This paper studies a two-step alternative that first condenses the video sequence to an informative "frame".
A valid question is how to define "useful information" and then distill it from a sequence down to one synthetic frame.
The proposed IFS consistently demonstrates clear improvements on image-based 2D networks and clip-based 3D networks.
arXiv Detail & Related papers (2022-01-11T16:13:43Z) - Video Imprint [107.1365846180187]
A new unified video analytics framework (ER3) is proposed for complex event retrieval, recognition and recounting.
The proposed video imprint representation exploits temporal correlations among image features across video frames.
The video imprint is fed into a reasoning network and a feature aggregation module, for event recognition/recounting and event retrieval tasks, respectively.
arXiv Detail & Related papers (2021-06-07T00:32:47Z) - Video Corpus Moment Retrieval with Contrastive Learning [56.249924768243375]
Video corpus moment retrieval (VCMR) aims to retrieve a temporal moment that semantically corresponds to a given text query.
We propose a Retrieval and Localization Network with Contrastive Learning (ReLoCLNet) for VCMR.
ReLoCLNet encodes text and video separately for efficiency; experimental results show that its retrieval accuracy is comparable with baselines adopting cross-modal interaction learning (a minimal sketch of this separate-encoder design follows this list).
arXiv Detail & Related papers (2021-05-13T12:54:39Z) - Adaptive Intermediate Representations for Video Understanding [50.64187463941215]
We introduce a new way to leverage semantic segmentation as an intermediate representation for video understanding.
We propose a general framework which learns the intermediate representations (optical flow and semantic segmentation) jointly with the final video understanding task.
We obtain more powerful visual representations for videos, which lead to performance gains over the state-of-the-art.
arXiv Detail & Related papers (2021-04-14T21:37:23Z) - FOCAL: A Forgery Localization Framework based on Video Coding
Self-Consistency [26.834506269499094]
This paper presents a video forgery localization framework that verifies the self-consistency of coding traces between and within video frames.
The overall framework was validated in two typical forgery scenarios: temporal and spatial splicing.
Experimental results show an improvement to the state-of-the-art on temporal splicing localization and also promising performance in the newly tackled case of spatial splicing.
arXiv Detail & Related papers (2020-08-24T13:55:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.