Interactive Video Corpus Moment Retrieval using Reinforcement Learning
- URL: http://arxiv.org/abs/2302.09522v1
- Date: Sun, 19 Feb 2023 09:48:23 GMT
- Title: Interactive Video Corpus Moment Retrieval using Reinforcement Learning
- Authors: Zhixin Ma and Chong-Wah Ngo
- Abstract summary: This paper tackles the problem by reinforcement learning, aiming to reach a search target within a few rounds of interaction by long-term learning from user feedback.
We conduct experiments for the challenging task of video corpus moment retrieval (VCMR) to localize moments from a large video corpus.
- Score: 35.38916770127218
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Known-item video search is effective with a human in the loop who
interactively investigates the search results and refines the initial query.
Nevertheless, when
the first few pages of results are swamped with visually similar items, or the
search target is hidden deep in the ranked list, finding the known-item target
usually requires a long duration of browsing and result inspection. This paper
tackles the problem by reinforcement learning, aiming to reach a search target
within a few rounds of interaction through long-term learning from user
feedback. Specifically, the system interactively plans a navigation path based
on feedback and recommends, for user comment, a potential target that
maximizes the long-term reward. We conduct experiments on the challenging task of video
corpus moment retrieval (VCMR) to localize moments from a large video corpus.
The experimental results on TVR and DiDeMo datasets verify that our proposed
work is effective in retrieving the moments that are hidden deep inside the
ranked lists of CONQUER and HERO, which are the state-of-the-art auto-search
engines for VCMR.
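To make the interaction loop concrete, here is a minimal, self-contained sketch of a search session trained with tabular Q-learning. It is an illustration only, not the paper's system: the state encoding, the simulated binary feedback, and every name in it (`interactive_search`, `q_table`, the toy ranked list) are invented for the example, while the actual model plans over learned video and query embeddings.

```python
# Toy sketch of interactive retrieval with Q-learning. All names and the
# state encoding are hypothetical; the paper's agent operates on learned
# video/query embeddings, not a lookup table.
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2  # learning rate, discount, exploration

def interactive_search(ranked_list, target, q_table, max_rounds=5):
    """One session: recommend an item, observe feedback, update Q-values."""
    state = ()  # toy state: history of (item, feedback) pairs
    for _ in range(max_rounds):
        if random.random() < EPSILON:                       # explore
            action = random.choice(ranked_list)
        else:                                               # exploit
            action = max(ranked_list, key=lambda a: q_table[(state, a)])
        feedback = int(action == target)    # simulated binary user comment
        reward = 1.0 if feedback else -0.1  # small cost per extra round
        next_state = state + ((action, feedback),)
        best_next = max(q_table[(next_state, a)] for a in ranked_list)
        q_table[(state, action)] += ALPHA * (
            reward + GAMMA * best_next - q_table[(state, action)])
        if feedback:
            return True   # target reached; the session ends
        state = next_state
    return False

q = defaultdict(float)
items = list(range(20))    # stand-in for a ranked list of moment candidates
for _ in range(2000):      # long-term learning across many sessions
    interactive_search(items, target=13, q_table=q)
```

The point of the toy is the reward structure: feedback gathered over many sessions shifts the recommendation policy so that a target buried deep in the initial ranking is reached in fewer rounds.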
Related papers
- MomentSeeker: A Comprehensive Benchmark and A Strong Baseline For Moment Retrieval Within Long Videos [62.01402470874109]
We present MomentSeeker, a benchmark to evaluate retrieval models' performance in handling general long-video moment retrieval tasks.
It incorporates long videos of over 500 seconds on average, making it the first benchmark specialized for long-video moment retrieval.
It covers a wide range of task categories (including Moment Search, Caption Alignment, Image-conditioned Moment Search, and Video-conditioned Moment Search) and diverse application scenarios.
We further fine-tune an MLLM-based LVMR retriever on synthetic data, which demonstrates strong performance on our benchmark.
arXiv Detail & Related papers (2025-02-18T05:50:23Z)
- A Flexible and Scalable Framework for Video Moment Search [51.47907684209207]
This paper introduces a flexible framework for retrieving a ranked list of moments from a collection of videos of any length to match a text query.
Our framework, called Segment-Proposal-Ranking (SPR), simplifies the search process into three independent stages: segment retrieval, proposal generation, and moment refinement with re-ranking.
Evaluations on the TVR-Ranking dataset demonstrate that our framework achieves state-of-the-art performance with significant reductions in computational cost and processing time.
arXiv Detail & Related papers (2025-01-09T08:54:19Z)
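The three independent stages named in the SPR entry above suggest a natural pipeline shape. The sketch below is a toy stand-in under that reading, using keyword overlap in place of learned scorers; the paper's actual segment retriever, proposal generator, and re-ranker are learned models whose interfaces may differ.

```python
# Toy segment-proposal-ranking pipeline in the spirit of SPR. All three
# stage functions are keyword-overlap stand-ins, not the paper's models.
from dataclasses import dataclass

@dataclass
class Segment:
    video_id: str
    start: float   # seconds
    end: float     # seconds
    text: str      # e.g., caption or transcript attached to the segment

def retrieve_segments(query, corpus, k=50):
    """Stage 1: score fixed-length segments against the query (toy scorer)."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(s.text.lower().split())), s) for s in corpus]
    scored.sort(key=lambda pair: -pair[0])
    return [s for score, s in scored[:k] if score > 0]

def generate_proposals(segments, gap=5.0):
    """Stage 2: merge nearby segments of the same video into proposals."""
    proposals = []
    for s in sorted(segments, key=lambda s: (s.video_id, s.start)):
        last = proposals[-1] if proposals else None
        if last and last.video_id == s.video_id and s.start - last.end <= gap:
            last.end = max(last.end, s.end)   # extend the open proposal
            last.text += " " + s.text
        else:
            proposals.append(Segment(s.video_id, s.start, s.end, s.text))
    return proposals

def refine_and_rerank(query, proposals, top_k=10):
    """Stage 3: re-score proposals against the query and rank them."""
    terms = set(query.lower().split())
    def overlap(p):
        return len(terms & set(p.text.lower().split()))
    return sorted(proposals, key=overlap, reverse=True)[:top_k]
```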
- Query-centric Audio-Visual Cognition Network for Moment Retrieval, Segmentation and Step-Captioning [56.873534081386]
A new topic, HIREST, is presented, comprising video retrieval, moment retrieval, moment segmentation, and step-captioning.
We propose a query-centric audio-visual cognition network to construct a reliable multi-modal representation for the three tasks.
This network identifies user-preferred content and thus attains a query-centric audio-visual representation shared across the three tasks.
arXiv Detail & Related papers (2024-12-18T06:43:06Z)
- VCA: Video Curious Agent for Long Video Understanding [44.19323180593379]
We introduce a curiosity-driven video agent with self-exploration capability, dubbed VCA.
Built upon VLMs, VCA autonomously navigates video segments and efficiently builds a comprehensive understanding of complex video sequences.
arXiv Detail & Related papers (2024-12-12T23:39:54Z)
- Improving Video Corpus Moment Retrieval with Partial Relevance Enhancement [72.7576395034068]
Video Corpus Moment Retrieval (VCMR) is a new video retrieval task aimed at retrieving a relevant moment from a large corpus of untrimmed videos using a text query.
We argue that effectively capturing the partial relevance between the query and video is essential for the VCMR task.
For video retrieval, we introduce a multi-modal collaborative video retriever, generating different query representations for the two modalities.
For moment localization, we propose the focus-then-fuse moment localizer, utilizing modality-specific gates to capture essential content.
arXiv Detail & Related papers (2024-02-21T07:16:06Z)
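The modality-specific gating mentioned in the entry above can be illustrated with a generic gated-fusion module. This is a sketch of the general technique only, not the paper's focus-then-fuse localizer; the dimensions, layer names, and the choice of sigmoid gates are assumptions.

```python
# Generic sketch of modality-specific gating for fusing visual and subtitle
# features, conditioned on the query. Not the paper's architecture; all
# dimensions and names are invented for illustration.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        # One gate per modality, conditioned on the query representation.
        self.visual_gate = nn.Linear(2 * dim, dim)
        self.subtitle_gate = nn.Linear(2 * dim, dim)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, visual, subtitle, query):
        # visual, subtitle: (batch, time, dim); query: (batch, dim)
        q = query.unsqueeze(1).expand_as(visual)
        g_v = torch.sigmoid(self.visual_gate(torch.cat([visual, q], -1)))
        g_s = torch.sigmoid(self.subtitle_gate(torch.cat([subtitle, q], -1)))
        # Gates decide, per time step, how much of each modality to keep.
        fused = self.fuse(torch.cat([g_v * visual, g_s * subtitle], -1))
        return fused  # (batch, time, dim), consumed by a moment localizer

# Example: fuse 32 time steps of toy features for a batch of 2 queries.
m = GatedFusion(dim=256)
out = m(torch.randn(2, 32, 256), torch.randn(2, 32, 256), torch.randn(2, 256))
```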
- Deep Learning for Video-Text Retrieval: a Review [13.341694455581363]
Video-Text Retrieval (VTR) aims to search for the most relevant video related to the semantics in a given sentence.
In this survey, we review and summarize over 100 research papers related to VTR.
arXiv Detail & Related papers (2023-02-24T10:14:35Z)
- Uncovering Hidden Challenges in Query-Based Video Moment Retrieval [29.90001703587512]
We present a series of experiments assessing how well the benchmark results reflect the true progress in solving the moment retrieval task.
Our results indicate substantial biases in the popular datasets and unexpected behaviour of the state-of-the-art models.
We suggest possible directions for improving temporal sentence grounding in the future.
arXiv Detail & Related papers (2020-09-01T10:07:23Z)
- Temporal Context Aggregation for Video Retrieval with Contrastive Learning [81.12514007044456]
We propose TCA, a video representation learning framework that incorporates long-range temporal information between frame-level features.
The proposed method shows a significant performance advantage (17% mAP on FIVR-200K) over state-of-the-art methods with video-level features.
arXiv Detail & Related papers (2020-08-04T05:24:20Z)
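The core idea attributed to TCA above, mixing long-range temporal information into frame-level features and training the pooled video representation contrastively, can be sketched with standard components. The self-attention aggregator and the InfoNCE-style loss below are generic stand-ins, not TCA's exact architecture.

```python
# Generic sketch of temporal context aggregation plus a contrastive
# (InfoNCE-style) objective, in the spirit of TCA. Standard stand-ins
# only; the paper's exact design may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalAggregator(nn.Module):
    """Self-attention over frame features, pooled to one video vector."""
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, frames):                      # frames: (batch, time, dim)
        ctx, _ = self.attn(frames, frames, frames)  # long-range temporal mixing
        return F.normalize(ctx.mean(dim=1), dim=-1) # (batch, dim), unit norm

def info_nce(anchors, positives, temperature=0.07):
    """Contrastive loss: match each video to its positive (e.g., augmented) copy."""
    logits = anchors @ positives.t() / temperature  # (batch, batch) similarities
    labels = torch.arange(anchors.size(0))          # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

agg = TemporalAggregator()
a = agg(torch.randn(4, 64, 512))   # four clips, 64 frames each
p = agg(torch.randn(4, 64, 512))   # e.g., augmented views of the same clips
loss = info_nce(a, p)
```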