Towards Debiasing Temporal Sentence Grounding in Video
- URL: http://arxiv.org/abs/2111.04321v1
- Date: Mon, 8 Nov 2021 08:18:25 GMT
- Title: Towards Debiasing Temporal Sentence Grounding in Video
- Authors: Hao Zhang and Aixin Sun and Wei Jing and Joey Tianyi Zhou
- Abstract summary: The temporal sentence grounding in video (TSGV) task is to locate a temporal moment in an untrimmed video that matches a language query.
Without considering bias in moment annotations, many models tend to capture statistical regularities of the annotations rather than genuine cross-modal reasoning.
We propose two debiasing strategies, data debiasing and model debiasing, to "force" a TSGV model to capture cross-modal interactions.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The temporal sentence grounding in video (TSGV) task is to locate a temporal
moment in an untrimmed video that matches a language query, i.e., a sentence.
Without considering bias in moment annotations (e.g., start and end positions
in a video), many models tend to capture statistical regularities of the moment
annotations and do not learn cross-modal reasoning between video and language
query well. In this paper, we propose two debiasing strategies, data debiasing
and model debiasing, to "force" a TSGV model to capture cross-modal
interactions. Data debiasing performs data oversampling through video
truncation to balance the temporal distribution of moments in the training set.
Model debiasing leverages video-only and query-only models to capture the
distribution bias, forcing the full model to learn cross-modal interactions.
Using VSLNet as the base model, we evaluate the impact of the two strategies on
two datasets that contain out-of-distribution test instances. Results show that
both strategies are effective in improving model generalization. Equipped with
both debiasing strategies, VSLNet achieves the best results on both datasets.
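As a concrete illustration of the data-debiasing strategy, the sketch below oversamples training moments by truncating videos so that normalized moment start positions spread more evenly across the timeline. The sample format, the bin-balancing criterion, and all function names are assumptions for illustration, not the paper's actual implementation.

```python
import random

def retarget_start(sample, p):
    """Truncate a clip so the annotated moment's normalized start position
    becomes p, keeping the moment intact. `sample` has keys: duration,
    start, end (seconds). Returns a new sample with a (trim_from, trim_to)
    window, or None if p is infeasible for this clip."""
    D, s, e = sample["duration"], sample["start"], sample["end"]
    cur = s / D
    if p < cur:
        # Cut `o` seconds from the front: solve (s - o) / (D - o) = p.
        o = (s - p * D) / (1 - p)
        return {"trim_from": o, "trim_to": D,
                "duration": D - o, "start": s - o, "end": e - o}
    if p > cur and e > 0 and p <= s / e:
        # Cut the tail instead: solve s / d = p (moment stays inside).
        d = s / p
        return {"trim_from": 0.0, "trim_to": d,
                "duration": d, "start": s, "end": e}
    return None

def oversample_truncated(samples, num_bins=10, seed=0):
    """Oversample via truncation until every start-position bin is as
    populated as the largest one (a simple balancing criterion)."""
    rng = random.Random(seed)
    bins = [[] for _ in range(num_bins)]
    for smp in samples:
        b = min(int(smp["start"] / smp["duration"] * num_bins), num_bins - 1)
        bins[b].append(smp)
    target = max(len(b) for b in bins)
    out = list(samples)
    for b, members in enumerate(bins):
        p = (b + 0.5) / num_bins  # aim retargeted moments at the bin centre
        candidates = [t for smp in samples
                      if (t := retarget_start(smp, p)) is not None]
        if not candidates:
            continue
        for _ in range(target - len(members)):
            out.append(rng.choice(candidates))
    return out
```

Front truncation can only move a moment earlier in normalized time and tail truncation can only move it later, so some target bins are infeasible for a given clip; the sketch simply skips those.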
Related papers
- Bias-Conflict Sample Synthesis and Adversarial Removal Debias Strategy for Temporal Sentence Grounding in Video [67.24316233946381]
Temporal Sentence Grounding in Video (TSGV) suffers from a dataset bias issue.
We propose the bias-conflict sample synthesis and adversarial removal debias strategy (BSSARD)
arXiv Detail & Related papers (2024-01-15T09:59:43Z)
- Mitigating Representation Bias in Action Recognition: Algorithms and Benchmarks [76.35271072704384]
Deep learning models perform poorly when applied to videos with rare scenes or objects.
We tackle this problem from two different angles: algorithm and dataset.
We show that the debiased representation can generalize better when transferred to other datasets and tasks.
arXiv Detail & Related papers (2022-09-20T00:30:35Z)
- Meta Spatio-Temporal Debiasing for Video Scene Graph Generation [22.216881800098726]
We propose a novel Meta Video Scene Graph Generation (MVSGG) framework to address the bias problem.
Our framework first constructs a support set and a group of query sets from the training data.
Then, by performing a meta training and testing process to optimize the model, our framework can effectively guide the model to learn well against biases.
arXiv Detail & Related papers (2022-07-23T07:06:06Z)
- A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric "dR@n,IoU@m" that discounts the basic recall scores to alleviate the inflated evaluation caused by biased datasets.
arXiv Detail & Related papers (2022-03-10T08:58:18Z)
- Learning Sample Importance for Cross-Scenario Video Temporal Grounding [30.82619216537177]
The paper investigates some superficial biases specific to the temporal grounding task.
We propose a novel method called Debiased Temporal Language Localizer (DebiasTLL) to prevent the model from naively memorizing the biases.
We evaluate the proposed model in cross-scenario temporal grounding, where the train / test data are heterogeneously sourced.
arXiv Detail & Related papers (2022-01-08T15:41:38Z)
- Greedy Gradient Ensemble for Robust Visual Question Answering [163.65789778416172]
We stress the language bias in Visual Question Answering (VQA) that comes from two aspects, i.e., distribution bias and shortcut bias.
We propose a new de-bias framework, Greedy Gradient Ensemble (GGE), which combines multiple biased models for unbiased base model learning.
GGE forces the biased models to over-fit the biased data distribution first, thus making the base model pay more attention to examples that are hard to solve with the biased models.
arXiv Detail & Related papers (2021-07-27T08:02:49Z)
- Interventional Video Grounding with Dual Contrastive Learning [16.0734337895897]
Video grounding aims to localize a moment from an untrimmed video for a given textual query.
We propose a novel paradigm from the perspective of causal inference to uncover the causality behind the model and data.
We also introduce a dual contrastive learning approach to better align the text and video.
arXiv Detail & Related papers (2021-06-21T12:11:28Z)
- Towards Robustifying NLI Models Against Lexical Dataset Biases [94.79704960296108]
This paper explores both data-level and model-level debiasing methods to robustify models against lexical dataset biases.
First, we debias the dataset through data augmentation and enhancement, but show that the model bias cannot be fully removed via this method.
The second approach employs a bag-of-words sub-model to capture the features that are likely to exploit the bias and prevents the original model from learning these biased features.
arXiv Detail & Related papers (2020-05-10T17:56:10Z)
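A common way to realize the biased-sub-model idea above (also used, in different forms, by GGE and by the model-debiasing strategy in the main paper) is a product-of-experts-style ensemble: during training, combine the main model's logits with a frozen biased model's log-probabilities, so that examples the biased model already solves contribute little gradient. The sketch below is a generic illustration of that ensemble under these assumptions, not the exact formulation of any paper listed here.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def debiased_loss(main_logits, bias_logits, label):
    """Product-of-experts-style debiasing: add the frozen biased model's
    log-probabilities to the main model's logits, then take cross-entropy.
    When the biased model is confident and correct, the combined loss is
    small, steering the main model toward bias-conflicting examples."""
    bias_logp = [math.log(p) for p in softmax(bias_logits)]
    combined = [a + b for a, b in zip(main_logits, bias_logp)]
    probs = softmax(combined)
    return -math.log(probs[label])
```

In practice only the main model receives gradients through this loss; the biased model (e.g. a query-only or bag-of-words sub-model) is trained separately and held fixed.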
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.