Selective Query-guided Debiasing Network for Video Corpus Moment Retrieval
- URL: http://arxiv.org/abs/2210.08714v1
- Date: Mon, 17 Oct 2022 03:10:21 GMT
- Title: Selective Query-guided Debiasing Network for Video Corpus Moment Retrieval
- Authors: Sunjae Yoon, Ji Woo Hong, Eunseop Yoon, Dahyun Kim, Junyeong Kim, Hee Suk Yoon, and Chang D. Yoo
- Abstract summary: Video moment retrieval aims to localize target moments in untrimmed videos pertinent to a given textual query.
Existing retrieval systems tend to rely on retrieval bias as a shortcut.
We propose a Selective Query-guided Debiasing network (SQuiDNet).
- Score: 19.51766089306712
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video moment retrieval (VMR) aims to localize target moments in untrimmed
videos pertinent to a given textual query. Existing retrieval systems tend to
rely on retrieval bias as a shortcut and thus, fail to sufficiently learn
multi-modal interactions between query and video. This retrieval bias stems
from learning frequent co-occurrence patterns between query and moments, which
spuriously correlate objects (e.g., a pencil) referred to in the query with
moments (e.g., scene of writing with a pencil) where the objects frequently
appear in the video, such that they converge into biased moment predictions.
Although recent debiasing methods have focused on removing this retrieval bias,
we argue that these biased predictions sometimes should be preserved because
there are many queries where biased predictions are rather helpful. To
harness this retrieval bias, we propose a Selective Query-guided Debiasing
network (SQuiDNet), which incorporates the following two main properties: (1)
Biased Moment Retrieval that intentionally uncovers the biased moments inherent
in objects of the query and (2) Selective Query-guided Debiasing that performs
selective debiasing guided by the meaning of the query. Our experimental
results on three moment retrieval benchmarks (i.e., TVR, ActivityNet, DiDeMo)
show the effectiveness of SQuiDNet and qualitative analysis shows improved
interpretability.
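To make the two properties concrete, the following is a minimal sketch of selective query-guided debiasing. It is not the authors' implementation: the two score branches (`naive_scores` from the main retrieval module, `biased_scores` from a biased-moment-retrieval module), the sigmoid gate, and the subtraction-style combination are all assumptions about how such a selective mechanism could look.

```python
import torch
import torch.nn as nn

class SelectiveDebiasing(nn.Module):
    """Hypothetical sketch: combine a main retrieval branch with a
    biased-retrieval branch, gated by the query representation."""

    def __init__(self, query_dim: int):
        super().__init__()
        # Query-conditioned gate deciding how much bias to remove (assumed).
        self.gate = nn.Sequential(nn.Linear(query_dim, 1), nn.Sigmoid())

    def forward(self, naive_scores, biased_scores, query_feat):
        # naive_scores, biased_scores: (batch, num_moments) matching scores
        # query_feat: (batch, query_dim) pooled query representation
        g = self.gate(query_feat)  # (batch, 1), values in [0, 1]
        # g near 1: the query is bias-prone, so suppress the biased scores;
        # g near 0: the bias is helpful for this query, so keep it.
        return naive_scores - g * biased_scores

# Toy usage with random tensors.
model = SelectiveDebiasing(query_dim=256)
naive = torch.randn(4, 10)   # scores for 10 candidate moments
biased = torch.randn(4, 10)  # scores from the biased-moment branch
query = torch.randn(4, 256)
print(model(naive, biased, query).shape)  # torch.Size([4, 10])
```

The point the abstract argues for sits in the gate: rather than always removing the bias, a query-conditioned scalar decides per query whether the biased predictions help or hurt.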
Related papers
- Bias-Conflict Sample Synthesis and Adversarial Removal Debias Strategy for Temporal Sentence Grounding in Video [67.24316233946381]
Temporal Sentence Grounding in Video (TSGV) suffers from dataset bias.
We propose the bias-conflict sample synthesis and adversarial removal debias strategy (BSSARD); a generic sketch of the adversarial-removal idea follows.
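The summary does not spell out the removal mechanism; below is a minimal sketch assuming a standard gradient-reversal setup, in which a bias discriminator is trained on grounding features while the encoder learns to defeat it. All names are illustrative and not taken from BSSARD.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips the gradient sign in backward."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None

def grad_reverse(x, lamb=1.0):
    return GradReverse.apply(x, lamb)

# Hypothetical discriminator predicting a bias attribute (e.g., a temporal
# location prior) from features. Training it through gradient reversal
# pushes the upstream encoder toward bias-invariant representations.
feat_dim, num_bias_classes = 128, 4
discriminator = nn.Linear(feat_dim, num_bias_classes)

features = torch.randn(8, feat_dim, requires_grad=True)  # encoder output
bias_labels = torch.randint(0, num_bias_classes, (8,))
adv_loss = F.cross_entropy(discriminator(grad_reverse(features)), bias_labels)
adv_loss.backward()  # encoder gradients arrive sign-flipped
```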
arXiv Detail & Related papers (2024-01-15T09:59:43Z)
- Towards Debiasing Frame Length Bias in Text-Video Retrieval via Causal Intervention [72.12974259966592]
We present a unique and systematic study of a temporal bias due to frame length discrepancy between training and test sets of trimmed video clips.
We propose a causal debiasing approach and perform extensive experiments and ablation studies on the Epic-Kitchens-100, YouCook2, and MSR-VTT datasets.
arXiv Detail & Related papers (2023-09-17T15:58:27Z)
- Towards Video Anomaly Retrieval from Video Anomaly Detection: New Benchmarks and Model [70.97446870672069]
Video anomaly detection (VAD) has received increasing attention due to its potential applications.
Video Anomaly Retrieval (VAR) aims to pragmatically retrieve relevant anomalous videos via cross-modal queries.
We present two benchmarks, UCFCrime-AR and XD-Violence, constructed on top of prevalent anomaly datasets.
arXiv Detail & Related papers (2023-07-24T06:22:37Z)
- Modal-specific Pseudo Query Generation for Video Corpus Moment Retrieval [20.493241098064665]
Video corpus moment retrieval (VCMR) is the task to retrieve the most relevant video moment from a large video corpus using a natural language query.
We propose a self-supervised learning framework: the Modal-specific Pseudo Query Generation Network (MPGN).
MPGN generates pseudo queries exploiting both visual and textual information from selected temporal moments.
We show that MPGN learns to localize video corpus moments without any explicit annotation; a rough sketch of the pseudo-query idea follows.
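The following is a rough, hypothetical sketch of pseudo-query generation, not MPGN itself: it assumes subtitles time-aligned with the video, samples a temporal moment, and pairs it with the overlapping subtitle text to obtain a training pair for free. The data layout and every name here are assumptions.

```python
import random

def generate_pseudo_pair(subtitles, video_duration, moment_len=10.0):
    """Sample a moment and build a textual pseudo query from the subtitles
    overlapping it (MPGN additionally exploits visual information)."""
    start = random.uniform(0.0, max(0.0, video_duration - moment_len))
    end = start + moment_len
    words = [s["text"] for s in subtitles
             if s["start"] < end and s["end"] > start]
    pseudo_query = " ".join(words)
    return pseudo_query, (start, end)  # (query, target moment) pair

subtitles = [
    {"start": 2.0, "end": 6.0, "text": "she picks up a pencil"},
    {"start": 7.0, "end": 12.0, "text": "and starts writing a letter"},
]
print(generate_pseudo_pair(subtitles, video_duration=30.0))
```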
arXiv Detail & Related papers (2022-10-23T05:05:18Z)
- A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric, "dR@n,IoU@m", which discounts the basic recall scores to alleviate the inflated evaluation caused by biased datasets; a hedged sketch of the discounting appears below.
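The exact discount is not given in the summary; the sketch below assumes a common formulation in which a standard recall hit (IoU above the threshold m) is multiplied by start/end alignment factors normalized by video duration.

```python
def temporal_iou(pred, gt):
    """IoU between two temporal spans given as (start, end) in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def discounted_hit(pred, gt, duration, iou_thresh=0.5):
    """One sample's contribution to dR@1,IoU@m (assumed formulation):
    a hit is discounted by how precisely the boundaries align."""
    if temporal_iou(pred, gt) < iou_thresh:
        return 0.0
    alpha_s = 1.0 - abs(pred[0] - gt[0]) / duration
    alpha_e = 1.0 - abs(pred[1] - gt[1]) / duration
    return alpha_s * alpha_e

# A loose prediction counts fully under plain R@1 but is discounted here.
print(discounted_hit(pred=(10.0, 30.0), gt=(12.0, 28.0), duration=60.0))
```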
arXiv Detail & Related papers (2022-03-10T08:58:18Z)
- Learning Sample Importance for Cross-Scenario Video Temporal Grounding [30.82619216537177]
The paper investigates some superficial biases specific to the temporal grounding task.
We propose a novel method called Debiased Temporal Language Localizer (DebiasTLL) to prevent the model from naively memorizing the biases.
We evaluate the proposed model in cross-scenario temporal grounding, where the train/test data are heterogeneously sourced.
arXiv Detail & Related papers (2022-01-08T15:41:38Z)
- Towards Debiasing Temporal Sentence Grounding in Video [59.42702544312366]
The temporal sentence grounding in video (TSGV) task is to locate a temporal moment in an untrimmed video that matches a language query.
Without considering bias in moment annotations, many models simply capture statistical regularities of the annotations.
We propose two debiasing strategies, data debiasing and model debiasing, to "force" a TSGV model to capture cross-modal interactions.
arXiv Detail & Related papers (2021-11-08T08:18:25Z)
- Deconfounded Video Moment Retrieval with Causal Intervention [80.90604360072831]
We tackle the task of video moment retrieval (VMR), which aims to localize a specific moment in a video according to a textual query.
Existing methods primarily model the matching relationship between query and moment by complex cross-modal interactions.
We propose a causality-inspired VMR framework that builds a structural causal model to capture the true effect of the query and video content on the prediction; a sketch of the underlying backdoor adjustment follows.
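Causal intervention in such frameworks is commonly realized with backdoor adjustment, P(Y | do(X)) = sum_z P(Y | X, z) P(z), which averages over the confounder instead of conditioning on it. The toy numeric sketch below illustrates the adjustment formula only; the discrete confounder and the probability tables are made up.

```python
# Toy backdoor adjustment: Z confounds both X (query/video input) and Y.
# p_y_given_xz[x][z] = P(Y=1 | X=x, Z=z); p_z[z] = P(Z=z).
p_y_given_xz = {
    0: {0: 0.20, 1: 0.60},
    1: {0: 0.30, 1: 0.70},
}
p_z = {0: 0.5, 1: 0.5}

def p_y_do_x(x):
    """P(Y=1 | do(X=x)) = sum_z P(Y=1 | X=x, Z=z) * P(Z=z)."""
    return sum(p_y_given_xz[x][z] * p_z[z] for z in p_z)

# Intervening on X cuts the confounding path through Z.
print(p_y_do_x(0), p_y_do_x(1))  # 0.4 0.5
```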
arXiv Detail & Related papers (2021-06-03T01:33:26Z)
- Uncovering Hidden Challenges in Query-Based Video Moment Retrieval [29.90001703587512]
We present a series of experiments assessing how well the benchmark results reflect the true progress in solving the moment retrieval task.
Our results indicate substantial biases in the popular datasets and unexpected behaviour of the state-of-the-art models.
We suggest possible directions for improving temporal sentence grounding in the future.
arXiv Detail & Related papers (2020-09-01T10:07:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.