Fugu-MT Paper Translation (Abstract): OpenVE-3M: A Large-Scale High-Quality Dataset for Instruction-Guided Video Editing

Paper abstract: OpenVE-3M: A Large-Scale High-Quality Dataset for Instruction-Guided Video Editing

arxiv url: http://arxiv.org/abs/2512.07826v2
Date: Tue, 16 Dec 2025 16:37:57 GMT
Status: Translation complete
Last updated in system: 2025-12-17 14:48:05.90004
Title: OpenVE-3M: A Large-Scale High-Quality Dataset for Instruction-Guided Video Editing
Title (reference translation): OpenVE-3M: A large-scale, high-quality dataset for instruction-guided video editing
Authors: Haoyang He, Jie Wang, Jiangning Zhang, Zhucun Xue, Xingyuan Bu, Qiangpeng Yang, Shilei Wen, Lei Xie
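Abstract summary: OpenVE-3M is an open-source, large-scale, high-quality dataset for instruction-based video editing. To address the lack of a unified benchmark in the field, the authors construct OpenVE-Bench, which contains 431 video-edit pairs. They also present OpenVE-Edit, a 5B model trained on the dataset.
Reference score (independently calculated attention score): 40.87442780303236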
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The quality and diversity of instruction-based image editing datasets are continuously increasing, yet large-scale, high-quality datasets for instruction-based video editing remain scarce. To address this gap, we introduce OpenVE-3M, an open-source, large-scale, and high-quality dataset for instruction-based video editing. It comprises two primary categories: spatially-aligned edits (Global Style, Background Change, Local Change, Local Remove, Local Add, and Subtitles Edit) and non-spatially-aligned edits (Camera Multi-Shot Edit and Creative Edit). All edit types are generated via a meticulously designed data pipeline with rigorous quality filtering. OpenVE-3M surpasses existing open-source datasets in terms of scale, diversity of edit types, instruction length, and overall quality. Furthermore, to address the lack of a unified benchmark in the field, we construct OpenVE-Bench, containing 431 video-edit pairs that cover a diverse range of editing tasks with three key metrics highly aligned with human judgment. We present OpenVE-Edit, a 5B model trained on our dataset that demonstrates remarkable efficiency and effectiveness by setting a new state-of-the-art on OpenVE-Bench, outperforming all prior open-source models including a 14B baseline. Project page is at https://lewandofskee.github.io/projects/OpenVE.
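Abstract (reference translation): The quality and diversity of instruction-based image editing datasets continue to increase, but large-scale, high-quality datasets for instruction-based video editing remain scarce. To address this gap, we introduce OpenVE-3M, an open-source, large-scale, high-quality dataset for instruction-based video editing. It comprises two main categories: spatially-aligned edits (Global Style, Background Change, Local Change, Local Remove, Local Add, and Subtitles Edit) and non-spatially-aligned edits (Camera Multi-Shot Edit and Creative Edit). All edit types are generated through a carefully designed data pipeline with rigorous quality filtering. OpenVE-3M surpasses existing open-source datasets in scale, diversity of edit types, instruction length, and overall quality. Furthermore, to address the lack of a unified benchmark in the field, we construct OpenVE-Bench, which contains 431 video-edit pairs covering a diverse range of editing tasks, with three key metrics highly aligned with human judgment. OpenVE-Edit, a 5B model trained on the dataset, sets a new state of the art on OpenVE-Bench and outperforms all prior open-source models, including a 14B baseline, demonstrating strong efficiency and effectiveness. The project page is at https://lewandofskee.github.io/projects/OpenVE.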
