TVR-Ranking: A Dataset for Ranked Video Moment Retrieval with Imprecise Queries
- URL: http://arxiv.org/abs/2407.06597v2
- Date: Wed, 24 Jul 2024 03:54:53 GMT
- Title: TVR-Ranking: A Dataset for Ranked Video Moment Retrieval with Imprecise Queries
- Authors: Renjie Liang, Li Li, Chongzhi Zhang, Jing Wang, Xizhou Zhu, Aixin Sun
- Abstract summary: We propose the task of Ranked Video Moment Retrieval (RVMR) to locate a ranked list of matching moments from a collection of videos, through queries in natural language.
We develop the TVR-Ranking dataset, based on the raw videos and existing moment annotations provided in the TVR dataset.
Our experiments show that the new RVMR task brings new challenges to existing models, and we believe this new dataset contributes to the research on multi-modality search.
- Score: 46.492091661862034
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we propose the task of Ranked Video Moment Retrieval (RVMR) to locate a ranked list of matching moments from a collection of videos, through queries in natural language. Although a few related tasks have been proposed and studied by CV, NLP, and IR communities, RVMR is the task that best reflects the practical setting of moment search. To facilitate research in RVMR, we develop the TVR-Ranking dataset, based on the raw videos and existing moment annotations provided in the TVR dataset. Our key contribution is the manual annotation of relevance levels for 94,442 query-moment pairs. We then develop the NDCG@K, IoU ≥ μ evaluation metric for this new task and conduct experiments to evaluate three baseline models. Our experiments show that the new RVMR task brings new challenges to existing models, and we believe this new dataset contributes to the research on multi-modality search. The dataset is available at https://github.com/Ranking-VMR/TVR-Ranking
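As a rough illustration of the NDCG@K, IoU ≥ μ metric described in the abstract, the sketch below matches each ranked prediction to an annotated moment at temporal IoU ≥ μ, takes that annotation's graded relevance as the gain, and normalizes by the ideal DCG. This is a minimal sketch, not the dataset's official evaluation code; the data layout (video IDs, (start, end) spans in seconds, integer relevance levels) is an assumption for illustration.

```python
import math

def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) moments in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def ndcg_at_k(predictions, ground_truth, k=10, mu=0.5):
    """NDCG@K with an IoU >= mu matching constraint (illustrative).

    predictions:  ranked list of (video_id, (start, end)) tuples.
    ground_truth: list of (video_id, (start, end), relevance) tuples,
                  where relevance is a graded label such as 0-3.
    """
    matched, gains = set(), []
    for video_id, span in predictions[:k]:
        best_rel, best_idx = 0, None
        for i, (gt_vid, gt_span, rel) in enumerate(ground_truth):
            if i in matched or gt_vid != video_id:
                continue
            if temporal_iou(span, gt_span) >= mu and rel > best_rel:
                best_rel, best_idx = rel, i
        if best_idx is not None:
            matched.add(best_idx)  # each annotation rewards one prediction
        gains.append(best_rel)

    dcg = sum(g / math.log2(r + 2) for r, g in enumerate(gains))
    ideal = sorted((rel for *_, rel in ground_truth), reverse=True)[:k]
    idcg = sum(g / math.log2(r + 2) for r, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0
```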
Related papers
- When One Moment Isn't Enough: Multi-Moment Retrieval with Cross-Moment Interactions [20.739538870657913]
Existing moment retrieval (MR) methods focus on Single-Moment Retrieval (SMR).
This makes existing datasets and methods insufficient for video temporal grounding.
We introduce a high-quality dataset called the QVHighlights Multi-Moment dataset (QV-M²), along with new evaluation metrics tailored for multi-moment retrieval (MMR).
arXiv Detail & Related papers (2025-10-20T07:01:16Z)
- Rank-R1: Enhancing Reasoning in LLM-based Document Rerankers via Reinforcement Learning [76.50690734636477]
We introduce Rank-R1, a novel LLM-based reranker that reasons over both the user query and the candidate documents before performing the ranking.
Our experiments on the TREC DL and BRIGHT datasets show that Rank-R1 is highly effective, especially for complex queries.
arXiv Detail & Related papers (2025-03-08T03:14:26Z)
- MomentSeeker: A Comprehensive Benchmark and A Strong Baseline For Moment Retrieval Within Long Videos [62.01402470874109]
We present MomentSeeker, a benchmark to evaluate retrieval models' performance in handling general long-video moment retrieval tasks.
It incorporates long videos of over 500 seconds on average, making it the first benchmark specialized for long-video moment retrieval.
It covers a wide range of task categories (including Moment Search, Caption Alignment, Image-conditioned Moment Search, and Video-conditioned Moment Search) and diverse application scenarios.
We further fine-tune an MLLM-based LVMR retriever on synthetic data, which demonstrates strong performance on our benchmark.
arXiv Detail & Related papers (2025-02-18T05:50:23Z)
- A Dataset for Evaluating LLM-based Evaluation Functions for Research Question Extraction Task [6.757249766769395]
This dataset consists of machine learning papers, research questions (RQs) extracted from those papers by GPT-4, and human evaluations of the extracted RQs.
Using this dataset, we systematically compared recently proposed LLM-based evaluation functions for summarization.
We found that none of the functions showed sufficiently high correlations with human evaluations.
arXiv Detail & Related papers (2024-09-10T21:54:46Z)
- Zero-Shot Video Moment Retrieval from Frozen Vision-Language Models [58.17315970207874]
We propose a zero-shot method for adapting generalisable visual-textual priors from an arbitrary VLM to facilitate moment-text alignment.
Experiments conducted on three VMR benchmark datasets demonstrate the notable performance advantages of our zero-shot algorithm.
arXiv Detail & Related papers (2023-09-01T13:06:50Z)
- CoVR-2: Automatic Data Construction for Composed Video Retrieval [59.854331104466254]
Composed Image Retrieval (CoIR) has recently gained popularity as a task that considers both text and image queries together.
We propose a scalable automatic dataset creation methodology that generates triplets given video-caption pairs.
We also expand the scope of the task to include composed video retrieval (CoVR); a toy sketch of the triplet idea follows this entry.
arXiv Detail & Related papers (2023-08-28T17:55:33Z)
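The CoVR-2 entry above builds retrieval triplets automatically from video-caption pairs. The toy sketch below only illustrates the triplet structure, under a strong simplifying assumption: it pairs captions that differ in exactly one word and derives the modification text from that difference, whereas the actual pipeline phrases modifications with language models. All names here are illustrative.

```python
from itertools import combinations

def mine_covr_triplets(video_captions):
    """Toy (source video, modification text, target video) miner.

    video_captions: list of (video_id, caption) pairs. Quadratic over
    the input, so suitable only for small illustrative collections.
    """
    triplets = []
    for (vid_a, cap_a), (vid_b, cap_b) in combinations(video_captions, 2):
        words_a, words_b = cap_a.lower().split(), cap_b.lower().split()
        if len(words_a) != len(words_b):
            continue
        diffs = [(wa, wb) for wa, wb in zip(words_a, words_b) if wa != wb]
        if len(diffs) == 1:  # captions differ by a single word
            old, new = diffs[0]
            triplets.append((vid_a, f"change {old} to {new}", vid_b))
            triplets.append((vid_b, f"change {new} to {old}", vid_a))
    return triplets
```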
- MVMR: A New Framework for Evaluating Faithfulness of Video Moment Retrieval against Multiple Distractors [24.858928681280634]
We propose the MVMR (Massive Videos Moment Retrieval for Faithfulness Evaluation) task.
It aims to retrieve video moments within a massive video set, including multiple distractors, to evaluate the faithfulness of VMR models.
For this task, we suggest an automated massive video pool construction framework to categorize negative (distractors) and positive (false-negative) video sets.
arXiv Detail & Related papers (2023-08-15T17:38:55Z)
- A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric, "dR@n,IoU@m", which discounts the basic recall scores to alleviate the inflated evaluation caused by biased datasets (a rough sketch of one possible reading follows this entry).
arXiv Detail & Related papers (2022-03-10T08:58:18Z)
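The dR@n,IoU@m metric above is described only as discounting recall; the sketch below is one plausible reading, scaling each IoU hit by factors that shrink as the predicted boundaries drift from the ground truth (normalized by video duration). Treat the discount form as an assumption and consult the paper for the exact definition; temporal_iou is reused from the NDCG sketch earlier.

```python
def discounted_recall(preds, gts, durations, n=1, m=0.7):
    """Sketch of a dR@n,IoU@m-style discounted recall (illustrative).

    preds:     per-query ranked lists of (start, end) predictions.
    gts:       per-query ground-truth (start, end) moments.
    durations: per-query video durations in seconds.
    """
    total = 0.0
    for pred_list, gt, dur in zip(preds, gts, durations):
        best = 0.0
        for span in pred_list[:n]:
            if temporal_iou(span, gt) >= m:
                # assumed discount: 1 - normalized boundary distance
                a_s = 1.0 - abs(span[0] - gt[0]) / dur
                a_e = 1.0 - abs(span[1] - gt[1]) / dur
                best = max(best, a_s * a_e)
        total += best
    return total / len(gts)
```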
- AssistSR: Affordance-centric Question-driven Video Segment Retrieval [4.047098915826058]
We present a new task called Affordance-centric Question-driven Video Segment Retrieval (AQVSR).
arXiv Detail & Related papers (2021-11-30T01:14:10Z)
- QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries [89.24431389933703]
We present the Query-based Video Highlights (QVHighlights) dataset.
It consists of over 10,000 YouTube videos, covering a wide range of topics.
Each video in the dataset is annotated with: (1) a human-written free-form NL query, (2) relevant moments in the video w.r.t. the query, and (3) five-point scale saliency scores for all query-relevant clips (a sketch of such a record follows this entry).
arXiv Detail & Related papers (2021-07-20T16:42:58Z)
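To make the QVHighlights annotation scheme above concrete, here is a hypothetical record in that style; the field names are illustrative rather than the dataset's actual schema.

```python
# Hypothetical QVHighlights-style record (illustrative field names).
annotation = {
    "query": "A dog plays fetch on the beach",  # free-form NL query
    "video_id": "abc123",
    "duration": 150.0,  # video length in seconds
    # relevant moments w.r.t. the query, as (start, end) in seconds
    "relevant_windows": [[30.0, 52.0], [88.0, 96.0]],
    # five-point saliency scores for each query-relevant clip,
    # one inner list per clip, one score per annotator
    "saliency_scores": [[3, 4, 2], [4, 4, 3]],
}
```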
- TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval [111.93601253692165]
TV show Retrieval (TVR) is a new multimodal retrieval dataset.
TVR requires systems to understand both videos and their associated subtitle (dialogue) texts.
The dataset contains 109K queries collected on 21.8K videos from 6 TV shows of diverse genres.
arXiv Detail & Related papers (2020-01-24T17:09:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.