Moment of Untruth: Dealing with Negative Queries in Video Moment Retrieval
- URL: http://arxiv.org/abs/2502.08544v2
- Date: Thu, 13 Feb 2025 15:04:42 GMT
- Title: Moment of Untruth: Dealing with Negative Queries in Video Moment Retrieval
- Authors: Kevin Flanagan, Dima Damen, Michael Wray
- Abstract summary: Video Moment Retrieval is a common task for evaluating the performance of visual-language models.
We propose the task of Negative-Aware Video Moment Retrieval (NA-VMR), which considers both moment retrieval accuracy and negative query rejection accuracy.
We analyse the ability of current SOTA video moment retrieval approaches to adapt to Negative-Aware Video Moment Retrieval and propose UniVTG-NA, an adaptation of UniVTG designed to tackle NA-VMR.
- Abstract: Video Moment Retrieval is a common task for evaluating the performance of visual-language models: it involves localising the start and end times of moments in videos from query sentences. The current task formulation assumes that the queried moment is present in the video, resulting in false positive moment predictions when irrelevant query sentences are provided. In this paper we propose the task of Negative-Aware Video Moment Retrieval (NA-VMR), which considers both moment retrieval accuracy and negative query rejection accuracy. We make the distinction between In-Domain and Out-of-Domain negative queries and provide new evaluation benchmarks for two popular video moment retrieval datasets: QVHighlights and Charades-STA. We analyse the ability of current SOTA video moment retrieval approaches to adapt to Negative-Aware Video Moment Retrieval and propose UniVTG-NA, an adaptation of UniVTG designed to tackle NA-VMR. UniVTG-NA achieves high negative rejection accuracy scores (avg. $98.4\%$) while retaining moment retrieval scores to within $3.87\%$ Recall@1. Dataset splits and code are available at https://github.com/keflanagan/MomentofUntruth
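The two evaluation axes of NA-VMR can be made concrete with a small sketch. The helper below is illustrative only (the function names, data layout, and threshold are assumptions, not the authors' released code): positive queries are scored with Recall@1 at an IoU threshold, negative queries with rejection accuracy.

```python
# Minimal sketch of Negative-Aware VMR (NA-VMR) evaluation. Hypothetical
# helper, not the authors' released code: a model must (a) localise moments
# for positive queries and (b) reject negative queries outright.

def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) spans in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def evaluate_na_vmr(predictions, iou_threshold=0.5):
    """predictions: dicts with 'is_negative' (ground-truth flag),
    'rejected' (model decision), and for positives 'pred_span'/'gt_span'."""
    pos = [p for p in predictions if not p["is_negative"]]
    neg = [p for p in predictions if p["is_negative"]]
    # Recall@1: the top-1 span overlaps ground truth above the IoU threshold
    # and the (positive) query was not wrongly rejected.
    r1 = sum(
        (not p["rejected"])
        and temporal_iou(p["pred_span"], p["gt_span"]) >= iou_threshold
        for p in pos
    ) / max(len(pos), 1)
    # Negative rejection accuracy: fraction of negative queries refused.
    rej = sum(p["rejected"] for p in neg) / max(len(neg), 1)
    return {"R@1": r1, "negative_rejection_accuracy": rej}
```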
Related papers
- A Flexible and Scalable Framework for Video Moment Search [51.47907684209207]
This paper introduces a flexible framework for retrieving a ranked list of moments from a collection of videos of any length to match a text query.
Our framework, called Segment-Proposal-Ranking (SPR), simplifies the search process into three independent stages: segment retrieval, proposal generation, and moment refinement with re-ranking.
Evaluations on the TVR-Ranking dataset demonstrate that our framework achieves state-of-the-art performance with significant reductions in computational cost and processing time.
arXiv Detail & Related papers (2025-01-09T08:54:19Z)
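The three SPR stages in the entry above read naturally as a pipeline. The skeleton below is a hedged sketch under assumed data shapes; the merging heuristic, `refine_and_rank`, and all other names are placeholders, not the paper's API.

```python
# Skeleton of the three SPR stages described above. All names and data
# shapes are assumptions for illustration, not the paper's API.

def group_into_proposals(candidates, gap=1.0):
    """Stage 2: merge temporally adjacent retrieved segments of the same
    video into moment proposals. candidates: [(video_id, (start, end), score)]."""
    proposals = []
    for vid, (start, end), score in sorted(candidates):
        if proposals and proposals[-1][0] == vid and start - proposals[-1][1][1] <= gap:
            pv, (ps, pe), psc = proposals[-1]
            proposals[-1] = (pv, (ps, max(pe, end)), max(psc, score))
        else:
            proposals.append((vid, (start, end), score))
    return proposals

def spr_search(segment_scores, refine_and_rank, top_k=50):
    # Stage 1: segment retrieval -- keep the k best-scoring segments overall,
    # independent of how long each source video is.
    candidates = sorted(segment_scores, key=lambda c: c[2], reverse=True)[:top_k]
    # Stage 2: proposal generation from the retrieved segments.
    proposals = group_into_proposals(candidates)
    # Stage 3: moment refinement with re-ranking (model-dependent, injected).
    return refine_and_rank(proposals)
```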
- Video Referring Expression Comprehension via Transformer with Content-conditioned Query [68.06199031102526]
Video Referring Expression Comprehension (REC) aims to localize a target object in videos based on a natural language query.
Recent improvements in video REC have been made using Transformer-based methods with learnable queries.
arXiv Detail & Related papers (2023-10-25T06:38:42Z)
- Background-aware Moment Detection for Video Moment Retrieval [19.11524416308641]
Video moment retrieval (VMR) identifies a specific moment in an untrimmed video for a given natural language query.
Due to ambiguity, a query may not fully cover the relevant details of the corresponding moment.
We propose a background-aware moment detection transformer (BM-DETR).
Our model learns to predict the target moment from the joint probability of each frame given the positive query and the complement of negative queries.
arXiv Detail & Related papers (2023-06-05T09:26:33Z)
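One plausible reading of the joint-probability sentence in the entry above, written as a sketch rather than the BM-DETR implementation: a frame is scored by its probability under the positive query times the complement of its probabilities under negative queries.

```python
# Sketch of frame scoring from a positive query and the complement of
# negative queries (illustrative reading, not the authors' code).

import numpy as np

def joint_frame_probability(p_pos, p_negs):
    """p_pos: (num_frames,) frame probabilities given the positive query.
    p_negs: (num_neg, num_frames) probabilities given negative queries."""
    complement = np.prod(1.0 - np.asarray(p_negs), axis=0)
    return np.asarray(p_pos) * complement

# Toy example: frames 2-3 match the positive query but not the negative one,
# so the joint probability peaks there.
p_pos = np.array([0.1, 0.2, 0.9, 0.8, 0.1])
p_neg = np.array([[0.7, 0.6, 0.1, 0.1, 0.8]])
print(joint_frame_probability(p_pos, p_neg))
```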
- Multi-video Moment Ranking with Multimodal Clue [69.81533127815884]
State-of-the-art work for video corpus moment retrieval (VCMR) is based on a two-stage method.
MINUTE outperforms baselines on the TVR and DiDeMo datasets.
arXiv Detail & Related papers (2023-01-29T18:38:13Z)
- Selective Query-guided Debiasing Network for Video Corpus Moment Retrieval [19.51766089306712]
Video moment retrieval aims to localize target moments in untrimmed videos pertinent to a given textual query.
Existing retrieval systems tend to rely on retrieval bias as a shortcut.
We propose a Selective Query-guided Debiasing network (SQuiDNet).
arXiv Detail & Related papers (2022-10-17T03:10:21Z)
- A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric, "dR@n,IoU@m", which discounts the basic recall scores to alleviate the inflated evaluation caused by biased datasets.
arXiv Detail & Related papers (2022-03-10T08:58:18Z)
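The discounting idea behind "dR@n,IoU@m" can be sketched as below: a standard R@1,IoU@m hit is down-weighted by how far the predicted boundaries drift from ground truth, normalised by video duration. The exact discount factors in the paper may differ, so treat this as illustrative only.

```python
# Hedged sketch of a discounted recall in the spirit of "dR@n,IoU@m"
# (illustrative; the paper's precise definition may differ).

def discounted_recall_at_1(samples, iou_m=0.5):
    """samples: dicts with 'pred' and 'gt' (start, end) spans in seconds,
    plus 'duration' (video length in seconds)."""
    total = 0.0
    for s in samples:
        (ps, pe), (gs, ge), d = s["pred"], s["gt"], s["duration"]
        inter = max(0.0, min(pe, ge) - max(ps, gs))
        union = (pe - ps) + (ge - gs) - inter
        if union > 0 and inter / union >= iou_m:
            alpha_s = 1.0 - abs(ps - gs) / d  # start-boundary alignment
            alpha_e = 1.0 - abs(pe - ge) / d  # end-boundary alignment
            total += alpha_s * alpha_e        # discounted hit instead of 1.0
    return total / max(len(samples), 1)
```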
- QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries [89.24431389933703]
We present the Query-based Video Highlights (QVHighlights) dataset.
It consists of over 10,000 YouTube videos, covering a wide range of topics.
Each video in the dataset is annotated with: (1) a human-written free-form NL query, (2) relevant moments in the video w.r.t. the query, and (3) five-point scale saliency scores for all query-relevant clips.
arXiv Detail & Related papers (2021-07-20T16:42:58Z)
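A record in the QVHighlights annotations described above is roughly shaped as follows. The field names are believed to follow the publicly released JSONL files, but the values here are invented; verify the exact schema against the dataset repository.

```python
# Illustrative QVHighlights-style annotation record (values invented).

example_annotation = {
    "qid": 0,
    "query": "A person unpacks camping gear by a lake.",  # free-form NL query
    "vid": "placeholder_video_id",
    "duration": 150,                            # video length in seconds
    "relevant_windows": [[42, 68]],             # query-relevant moments (s)
    "relevant_clip_ids": list(range(21, 34)),   # 2-second clips covering 42-68s
    "saliency_scores": [[3, 4, 2]] * 13,        # per-clip five-point saliency
                                                # scores from three annotators
}
```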
- Uncovering Hidden Challenges in Query-Based Video Moment Retrieval [29.90001703587512]
We present a series of experiments assessing how well benchmark results reflect true progress in solving the moment retrieval task.
Our results indicate substantial biases in the popular datasets and unexpected behaviour of the state-of-the-art models.
We suggest possible directions for improving temporal sentence grounding in the future.
arXiv Detail & Related papers (2020-09-01T10:07:23Z)
- Regularized Two-Branch Proposal Networks for Weakly-Supervised Moment Retrieval in Videos [108.55320735031721]
Video moment retrieval aims to localize the target moment in a video according to the given sentence.
Most existing weakly-supervised methods apply a multiple-instance learning (MIL) based framework to develop inter-sample confrontment.
We propose a novel Regularized Two-Branch Proposal Network to simultaneously consider the inter-sample and intra-sample confrontments.
arXiv Detail & Related papers (2020-08-19T04:42:46Z)
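The inter-/intra-sample "confrontment" terminology in the entry above maps naturally onto margin ranking losses. The sketch below is an illustration under that reading, not the paper's exact objective.

```python
# Hedged sketch mapping inter-/intra-sample "confrontment" onto margin
# ranking losses (illustrative only, not the paper's formulation).

import torch
import torch.nn.functional as F

def mil_confrontment_losses(pos_scores, neg_video_scores, margin=0.4):
    """pos_scores: (>=2,) proposal-sentence scores for the matched video.
    neg_video_scores: proposal scores for a mismatched (negative) video."""
    best_pos = pos_scores.max()
    # Inter-sample confrontment: the matched video's best proposal should
    # outscore the best proposal from a mismatched video by a margin.
    inter = F.relu(margin + neg_video_scores.max() - best_pos)
    # Intra-sample confrontment: the best proposal should also outscore the
    # runner-up within the same video, suppressing plausible backgrounds.
    intra = F.relu(margin + pos_scores.topk(2).values[1] - best_pos)
    return inter, intra

# Toy example: the best positive proposal clears both margins, so both
# losses are zero here.
pos = torch.tensor([0.9, 0.4, 0.2])
neg = torch.tensor([0.5, 0.3])
print(mil_confrontment_losses(pos, neg))
```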
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.