Learning Sample Importance for Cross-Scenario Video Temporal Grounding
- URL: http://arxiv.org/abs/2201.02848v1
- Date: Sat, 8 Jan 2022 15:41:38 GMT
- Title: Learning Sample Importance for Cross-Scenario Video Temporal Grounding
- Authors: Peijun Bao, Yadong Mu
- Abstract summary: The paper investigates some superficial biases specific to the temporal grounding task.
We propose a novel method called Debiased Temporal Language Localizer (DebiasTLL) to prevent the model from naively memorizing the biases.
We evaluate the proposed model in cross-scenario temporal grounding, where the train / test data are heterogeneously sourced.
- Score: 30.82619216537177
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The task of temporal grounding aims to locate a video moment in an
untrimmed video, given a sentence query. This paper for the first time
investigates superficial biases that are specific to the temporal grounding
task, and proposes a novel targeted solution. Most alarmingly, we observe that
existing temporal grounding models heavily rely on certain biases (e.g., a high
preference for frequent concepts or for certain temporal intervals) in the
visual modality. This leads to inferior performance when generalizing the model
to the cross-scenario test setting. To this end, we propose a novel method
called Debiased Temporal Language Localizer (DebiasTLL) to prevent the model
from naively memorizing the biases and to enforce grounding the query sentence
on the true inter-modal relationship. DebiasTLL simultaneously trains two
models. By our design, a large discrepancy between these two models'
predictions when judging a sample indicates a higher probability of it being a
biased sample. Harnessing this informative discrepancy, we devise a data
re-weighting scheme for mitigating the data biases. We evaluate the proposed
model on cross-scenario temporal grounding, where the train/test data are
heterogeneously sourced. Experiments show the large-margin superiority of the
proposed method in comparison with state-of-the-art competitors.
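The co-training idea in the abstract, where samples on which two simultaneously trained models disagree are down-weighted as likely biased, can be sketched as follows. This is an illustrative sketch only: the function name, the exponential weighting form, and the `temperature` parameter are assumptions for exposition, not the paper's exact scheme.

```python
import numpy as np

def discrepancy_weights(preds_a, preds_b, temperature=1.0):
    """Down-weight samples on which two co-trained models disagree.

    A large prediction discrepancy is treated as evidence that a sample
    is biased, so its training weight is reduced. Hypothetical sketch of
    the re-weighting idea, not the authors' exact formulation.
    """
    preds_a = np.asarray(preds_a, dtype=float)
    preds_b = np.asarray(preds_b, dtype=float)
    # Per-sample discrepancy between the two models' predictions.
    disc = np.abs(preds_a - preds_b).mean(axis=-1)
    # Larger discrepancy -> smaller weight (soft down-weighting).
    weights = np.exp(-disc / temperature)
    # Normalize so the average per-sample weight stays at 1,
    # keeping the overall loss scale unchanged.
    return weights * len(weights) / weights.sum()
```

For example, a sample where both models predict the same moment score keeps a relatively high weight, while a sample with strongly divergent predictions is suppressed in the training loss.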
Related papers
- Looking at Model Debiasing through the Lens of Anomaly Detection [11.113718994341733]
Deep neural networks are sensitive to bias in the data.
We propose a new bias identification method based on anomaly detection.
We reach state-of-the-art performance on synthetic and real benchmark datasets.
arXiv Detail & Related papers (2024-07-24T17:30:21Z) - Debiased Model-based Interactive Recommendation [22.007617148466807]
We develop a model called identifiable Debiased Model-based Interactive Recommendation (iDMIR in short).
For the first drawback, we devise a debiased causal world model based on the causal mechanism of the time-varying recommendation generation process, with identification guarantees.
For the second drawback, we devise a debiased contrastive policy, which coincides with debiased contrastive learning and avoids sampling bias.
arXiv Detail & Related papers (2024-02-24T14:10:04Z) - Bias-Conflict Sample Synthesis and Adversarial Removal Debias Strategy
for Temporal Sentence Grounding in Video [67.24316233946381]
Temporal Sentence Grounding in Video (TSGV) is troubled by the dataset bias issue.
We propose the bias-conflict sample synthesis and adversarial removal debias strategy (BSSARD)
arXiv Detail & Related papers (2024-01-15T09:59:43Z) - Towards Debiasing Frame Length Bias in Text-Video Retrieval via Causal
Intervention [72.12974259966592]
We present a unique and systematic study of a temporal bias due to frame length discrepancy between training and test sets of trimmed video clips.
We propose a causal debiasing approach and perform extensive experiments and ablation studies on the Epic-Kitchens-100, YouCook2, and MSR-VTT datasets.
arXiv Detail & Related papers (2023-09-17T15:58:27Z) - MomentDiff: Generative Video Moment Retrieval from Random to Real [71.40038773943638]
We provide a generative diffusion-based framework called MomentDiff.
MomentDiff simulates a typical human retrieval process from random browsing to gradual localization.
We show that MomentDiff consistently outperforms state-of-the-art methods on three public benchmarks.
arXiv Detail & Related papers (2023-07-06T09:12:13Z) - Echoes: Unsupervised Debiasing via Pseudo-bias Labeling in an Echo
Chamber [17.034228910493056]
This paper presents experimental analyses revealing that the existing biased models overfit to bias-conflicting samples in the training data.
We propose a straightforward and effective method called Echoes, which trains a biased model and a target model with a different strategy.
Our approach achieves superior debiasing results compared to the existing baselines on both synthetic and real-world datasets.
arXiv Detail & Related papers (2023-05-06T13:13:18Z) - A Closer Look at Debiased Temporal Sentence Grounding in Videos:
Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric "dR@n,IoU@m" that discounts the basic recall scores to alleviate the inflating evaluation caused by biased datasets.
arXiv Detail & Related papers (2022-03-10T08:58:18Z) - Towards Debiasing Temporal Sentence Grounding in Video [59.42702544312366]
The temporal sentence grounding in video (TSGV) task is to locate a temporal moment in an untrimmed video that matches a language query.
Without considering bias in moment annotations, many models tend to capture statistical regularities of the moment annotations.
We propose two debiasing strategies, data debiasing and model debiasing, to "force" a TSGV model to capture cross-modal interactions.
arXiv Detail & Related papers (2021-11-08T08:18:25Z) - Deconfounded Video Moment Retrieval with Causal Intervention [80.90604360072831]
We tackle the task of video moment retrieval (VMR), which aims to localize a specific moment in a video according to a textual query.
Existing methods primarily model the matching relationship between query and moment by complex cross-modal interactions.
We propose a causality-inspired VMR framework that builds structural causal model to capture the true effect of query and video content on the prediction.
arXiv Detail & Related papers (2021-06-03T01:33:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.