Related papers: IVEBench: Modern Benchmark Suite for Instruction-Guided Video Editing Assessment

URL: http://arxiv.org/abs/2510.11647v1
Date: Mon, 13 Oct 2025 17:27:08 GMT
Title: IVEBench: Modern Benchmark Suite for Instruction-Guided Video Editing Assessment
Authors: Yinan Chen, Jiangning Zhang, Teng Hu, Yuxiang Zeng, Zhucun Xue, Qingdong He, Chengjie Wang, Yong Liu, Xiaobin Hu, Shuicheng Yan
Abstract summary: IVEBench is a benchmark suite specifically designed for instruction-guided video editing assessment. It comprises a diverse database of 600 high-quality source videos, spanning seven semantic dimensions, and covering video lengths ranging from 32 to 1,024 frames. IVEBench establishes a three-dimensional evaluation protocol encompassing video quality, instruction compliance and video fidelity.
Score: 108.8652018167452
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Instruction-guided video editing has emerged as a rapidly advancing research direction, offering new opportunities for intuitive content transformation while also posing significant challenges for systematic evaluation. Existing video editing benchmarks fail to support the evaluation of instruction-guided video editing adequately and further suffer from limited source diversity, narrow task coverage and incomplete evaluation metrics. To address the above limitations, we introduce IVEBench, a modern benchmark suite specifically designed for instruction-guided video editing assessment. IVEBench comprises a diverse database of 600 high-quality source videos, spanning seven semantic dimensions, and covering video lengths ranging from 32 to 1,024 frames. It further includes 8 categories of editing tasks with 35 subcategories, whose prompts are generated and refined through large language models and expert review. Crucially, IVEBench establishes a three-dimensional evaluation protocol encompassing video quality, instruction compliance and video fidelity, integrating both traditional metrics and multimodal large language model-based assessments. Extensive experiments demonstrate the effectiveness of IVEBench in benchmarking state-of-the-art instruction-guided video editing methods, showing its ability to provide comprehensive and human-aligned evaluation outcomes.

Related papers

UniVBench: Towards Unified Evaluation for Video Foundation Models [29.73247324829126]
Video foundation models aim to integrate video understanding, generation, editing, and instruction following within a single framework. We introduce UniVBench, a benchmark for evaluating video foundation models across four core abilities. Our benchmark substantially expands the complexity of evaluation by incorporating 200 high-quality, diverse and multi-shot videos.
arXiv Detail & Related papers (2026-02-25T12:08:53Z)
ReViSE: Towards Reason-Informed Video Editing in Unified Models with Self-Reflective Learning [57.08352504712699]
Video unified models exhibit strong capabilities in understanding and generation, yet they struggle with reason-informed visual editing. We introduce the Reason-Informed Video Editing (RVE) task, which requires reasoning about physical plausibility and causal dynamics during editing. We propose ReViSE, a framework that unifies generation and evaluation within a single architecture.
arXiv Detail & Related papers (2025-12-10T18:57:09Z)
In-Context Learning with Unpaired Clips for Instruction-based Video Editing [51.943707933717185]
We introduce a low-cost pretraining strategy for instruction-based video editing. Our framework first pretrains on approximately 1M real video clips to learn basic editing concepts. Our method surpasses existing instruction-based video editing approaches in both instruction alignment and visual fidelity.
arXiv Detail & Related papers (2025-10-16T13:02:11Z)
VideoScore2: Think before You Score in Generative Video Evaluation [69.43069741467603]
VideoScore2 is a multi-dimensional, interpretable, and human-aligned framework that explicitly evaluates visual quality, text-to-video alignment, and physical/common-sense consistency. Our model is trained on a large-scale dataset VideoFeedback2 containing 27,168 human-annotated videos.
arXiv Detail & Related papers (2025-09-26T18:09:03Z)
TDVE-Assessor: Benchmarking and Evaluating the Quality of Text-Driven Video Editing with LMMs [54.44479359918971]
We introduce TDVE-DB, a large-scale benchmark dataset for text-driven video editing. TDVE-DB consists of 3,857 edited videos generated from 12 diverse models across 8 editing categories. We propose TDVE-Assessor, a novel VQA model specifically designed for text-driven video editing assessment.
arXiv Detail & Related papers (2025-05-26T05:47:09Z)
SAMA: Towards Multi-Turn Referential Grounded Video Chat with Large Language Models [80.3895950009792]
Achieving fine-grained-temporal understanding in videos remains a major challenge for current Video Large Multimodels (Video LMMs). We contribute in three core aspects: dataset, model, and benchmark. First, we introduce SAMA-239K, a large-scale dataset comprising 15K videos specifically to enable joint learning of video understanding, grounding, and multi-turn video chat. Second, we propose the SAMA model, which incorporates a versatile-temporal context aggregator and a Segment Model to jointly enhance fine-grained video comprehension and precise grounding capabilities.
arXiv Detail & Related papers (2025-05-24T18:13:16Z)
VEU-Bench: Towards Comprehensive Understanding of Video Editing [4.9254235505057835]
We introduce VEU-Bench (Video Editing Understanding Benchmark), a comprehensive benchmark that categorizes video editing components across various dimensions. Unlike previous video editing understanding benchmarks that focus mainly on editing element classification, VEU-Bench encompasses 19 fine-grained tasks across three stages: recognition, reasoning, and judging. We develop Oscars, a VEU expert model fine-tuned on the curated VEU-Bench dataset. It outperforms existing open-source Vid-LLMs on VEU-Bench by over 28.3% in accuracy and achieves performance comparable to commercial models like GPT-4o.
arXiv Detail & Related papers (2025-04-24T04:36:28Z)
EditBoard: Towards a Comprehensive Evaluation Benchmark for Text-Based Video Editing Models [16.045012576543474]
Text-based video editing has emerged as a promising field, enabling precise modifications to videos based on text prompts. Existing evaluations are limited and inconsistent, typically summarizing overall performance with a single score. We propose EditBoard, the first comprehensive evaluation benchmark for text-based video editing models.
arXiv Detail & Related papers (2024-09-15T08:43:18Z)
VE-Bench: Subjective-Aligned Benchmark Suite for Text-Driven Video Editing Quality Assessment [6.627422081288281]
We introduce VE-Bench, a benchmark suite tailored to the assessment of text-driven video editing. This suite includes VE-Bench DB, a video quality assessment (VQA) database for video editing. VE-Bench QA focuses on the text-video alignment and the relevance modeling between source and edited videos.
arXiv Detail & Related papers (2024-08-21T09:49:32Z)
Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models [81.84810348214113]
Video-based large language models (Video-LLMs) have been recently introduced, targeting both fundamental improvements in perception and comprehension, and a diverse range of user inquiries. To guide the development of such a model, the establishment of a robust and comprehensive evaluation system becomes crucial. This paper proposes Video-Bench, a new comprehensive benchmark along with a toolkit specifically designed for evaluating Video-LLMs.
arXiv Detail & Related papers (2023-11-27T18:59:58Z)

This list is automatically generated from the titles and abstracts of the papers on this site.