The Anatomy of Video Editing: A Dataset and Benchmark Suite for
AI-Assisted Video Editing
- URL: http://arxiv.org/abs/2207.09812v2
- Date: Thu, 21 Jul 2022 06:53:02 GMT
- Title: The Anatomy of Video Editing: A Dataset and Benchmark Suite for
AI-Assisted Video Editing
- Authors: Dawit Mureja Argaw, Fabian Caba Heilbron, Joon-Young Lee, Markus
Woodson, In So Kweon
- Abstract summary: This work introduces the Anatomy of Video Editing, a dataset and benchmark to foster research in AI-assisted video editing.
Our benchmark suite focuses on video editing tasks beyond visual effects, such as automatic footage organization and assisted video assembly.
To enable research on these fronts, we annotate more than 1.5M tags, with concepts relevant to cinematography, from 196,176 shots sampled from movie scenes.
- Score: 90.59584961661345
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning is transforming the video editing industry. Recent advances
in computer vision have leveled up video editing tasks such as intelligent
reframing, rotoscoping, color grading, and applying digital makeup. However,
most of these solutions have focused on video manipulation and VFX. This work
introduces the Anatomy of Video Editing, a dataset and benchmark to foster
research in AI-assisted video editing. Our benchmark suite focuses on video
editing tasks beyond visual effects, such as automatic footage organization
and assisted video assembly. To enable research on these fronts, we annotate
more than 1.5M tags, with concepts relevant to cinematography, from 196,176
shots sampled from movie scenes. We establish competitive baseline methods and
detailed analyses for each of the tasks. We hope our work sparks innovative
research towards underexplored areas of AI-assisted video editing.
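Since the annotations are shot-level cinematography tags, a minimal sketch of consuming them might look as follows; the file name, JSON schema, and attribute names here are hypothetical illustrations, not the dataset's published format.

```python
import json
from collections import Counter

# Hypothetical schema: one record per shot, each carrying cinematography
# tags (shot size, camera angle/movement, etc.). The file name and field
# names below are illustrative, not the dataset's actual format.
with open("ave_shot_annotations.json") as f:
    shots = json.load(f)

# Count how often each shot-size tag appears across the corpus.
shot_sizes = Counter(s["shot_size"] for s in shots if "shot_size" in s)
for size, count in shot_sizes.most_common():
    print(f"{size}: {count}")
```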
Related papers
- A Reinforcement Learning-Based Automatic Video Editing Method Using Pre-trained Vision-Language Model [10.736207095604414]
We propose a two-stage scheme for general video editing. First, unlike previous works that extract scene-specific features, we leverage a pre-trained Vision-Language Model (VLM) to extract general-purpose, editing-relevant representations.
We also propose a Reinforcement Learning (RL)-based editing framework to formulate the editing problem and train the virtual editor to make better sequential editing decisions.
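The sequential-decision framing can be sketched as a small REINFORCE loop; the policy network, stand-in VLM features, and stub reward below are illustrative assumptions, not the paper's actual components.

```python
import torch
import torch.nn as nn

# Hypothetical setup: each candidate shot is represented by a fixed VLM
# feature vector; the "virtual editor" is a small policy network that
# sequentially picks the next shot. All dimensions, the reward function,
# and the episode structure are illustrative stand-ins.
FEAT_DIM, N_CANDIDATES, EPISODE_LEN = 512, 10, 5

policy = nn.Sequential(nn.Linear(FEAT_DIM, 128), nn.ReLU(), nn.Linear(128, 1))
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

def reward_fn(chosen_feat):
    # Stub: a real system would score how well the chosen shot fits the edit.
    return torch.randn(())

for episode in range(100):
    log_probs, rewards = [], []
    for t in range(EPISODE_LEN):
        feats = torch.randn(N_CANDIDATES, FEAT_DIM)   # stand-in VLM features
        logits = policy(feats).squeeze(-1)            # score each candidate
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()                        # pick the next shot
        log_probs.append(dist.log_prob(action))
        rewards.append(reward_fn(feats[action]))
    # REINFORCE: scale log-probs of taken actions by the episode return.
    ret = torch.stack(rewards).sum()
    loss = -(torch.stack(log_probs).sum() * ret)
    opt.zero_grad()
    loss.backward()
    opt.step()
```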
arXiv Detail & Related papers (2024-11-07T18:20:28Z)
- Action Reimagined: Text-to-Pose Video Editing for Dynamic Human Actions [49.14827857853878]
ReimaginedAct comprises video understanding, reasoning, and editing modules.
Our method can accept not only direct instructional text prompts but also 'what if' questions to predict possible action changes.
arXiv Detail & Related papers (2024-03-11T22:46:46Z)
- Reframe Anything: LLM Agent for Open World Video Reframing [0.8424099022563256]
We introduce Reframe Any Video Agent (RAVA), an AI-based agent that restructures visual content for video reframing.
RAVA operates in three stages: perception, where it interprets user instructions and video content; planning, where it determines aspect ratios and reframing strategies; and execution, where it invokes the editing tools to produce the final video.
Our experiments validate the effectiveness of RAVA in video salient object detection and real-world reframing tasks, demonstrating its potential as a tool for AI-powered video editing.
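The three-stage loop is easy to picture as an agent skeleton; the class, methods, and return values below are hypothetical stand-ins, not RAVA's actual interface.

```python
from dataclasses import dataclass

@dataclass
class ReframePlan:
    aspect_ratio: str   # e.g. "9:16"
    strategy: str       # e.g. "track salient object"

class ReframingAgent:
    """Hypothetical skeleton of a perceive -> plan -> execute agent."""

    def perceive(self, instruction: str, video_path: str) -> dict:
        # Interpret the user instruction and summarize the video content
        # (in practice: an LLM plus perception tools such as a salient
        # object detector).
        return {"instruction": instruction, "video": video_path}

    def plan(self, context: dict) -> ReframePlan:
        # Decide the target aspect ratio and reframing strategy.
        return ReframePlan(aspect_ratio="9:16", strategy="track salient object")

    def execute(self, context: dict, plan: ReframePlan) -> str:
        # Invoke editing tools (cropping, tracking) and render the output.
        suffix = plan.aspect_ratio.replace(":", "x")
        return context["video"].replace(".mp4", f"_{suffix}.mp4")

agent = ReframingAgent()
ctx = agent.perceive("Make this vertical for mobile", "clip.mp4")
print(agent.execute(ctx, agent.plan(ctx)))  # clip_9x16.mp4
```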
arXiv Detail & Related papers (2024-03-10T03:29:56Z)
- Neural Video Fields Editing [56.558490998753456]
NVEdit is a text-driven video editing framework designed to mitigate memory overhead and improve consistency.
We construct a neural video field, powered by tri-plane and sparse grid representations, to encode long videos with hundreds of frames.
Next, we update the video field through off-the-shelf Text-to-Image (T2I) models to impose text-driven editing effects.
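For intuition on the tri-plane part, here is a minimal sketch of a field that maps an (x, y, t) query to a color; the resolutions, feature sizes, and omission of the sparse grid are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TriPlaneVideoField(nn.Module):
    """Minimal tri-plane field: (x, y, t) in [-1, 1]^3 -> RGB.

    Features are bilinearly sampled from three learned planes
    (xy, xt, yt), summed, and decoded by a small MLP. The sparse
    grid used in the paper is omitted for brevity.
    """

    def __init__(self, res=64, feat=16):
        super().__init__()
        self.planes = nn.ParameterList(
            [nn.Parameter(0.1 * torch.randn(1, feat, res, res)) for _ in range(3)]
        )
        self.decoder = nn.Sequential(nn.Linear(feat, 64), nn.ReLU(), nn.Linear(64, 3))

    def forward(self, xyt):                                   # xyt: (N, 3)
        coords = [xyt[:, [0, 1]], xyt[:, [0, 2]], xyt[:, [1, 2]]]  # xy, xt, yt
        feats = 0
        for plane, c in zip(self.planes, coords):
            grid = c.view(1, -1, 1, 2)                        # sampling grid
            f = F.grid_sample(plane, grid, align_corners=True)  # (1, feat, N, 1)
            feats = feats + f.squeeze(-1).squeeze(0).t()        # (N, feat)
        return torch.sigmoid(self.decoder(feats))

field = TriPlaneVideoField()
rgb = field(torch.rand(8, 3) * 2 - 1)  # 8 random space-time queries
print(rgb.shape)                        # torch.Size([8, 3])
```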
arXiv Detail & Related papers (2023-12-12T14:48:48Z)
- VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models [96.55004961251889]
Video Instruction Diffusion (VIDiff) is a unified foundation model designed for a wide range of video tasks.
Our model can edit and translate the desired results within seconds based on user instructions.
We provide convincing generative results for diverse input videos and written instructions, both qualitatively and quantitatively.
arXiv Detail & Related papers (2023-11-30T18:59:52Z)
- AutoTransition: Learning to Recommend Video Transition Effects [20.384463765702417]
We present the premier work on automatic video transition recommendation (VTR).
Given a sequence of raw video shots and companion audio, VTR recommends a video transition for each pair of neighboring shots.
We propose a novel multi-modal matching framework which consists of two parts.
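To show what "matching" could mean here, a minimal sketch: fuse video and audio features around each cut point and score them against learned transition embeddings. The dimensions, encoders, and fusion scheme are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class TransitionMatcher(nn.Module):
    """Hypothetical matcher: shot-pair context vs. transition embeddings."""

    def __init__(self, vid_dim=512, aud_dim=128, embed_dim=256, n_transitions=30):
        super().__init__()
        # One learned embedding per transition effect (cut, dissolve, wipe, ...).
        self.transitions = nn.Embedding(n_transitions, embed_dim)
        # Fuse the two neighboring shots' video features plus companion audio.
        self.fuse = nn.Sequential(
            nn.Linear(2 * vid_dim + aud_dim, embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, shot_a, shot_b, audio):
        ctx = self.fuse(torch.cat([shot_a, shot_b, audio], dim=-1))  # (B, E)
        # Dot-product scores against every transition embedding.
        return ctx @ self.transitions.weight.t()                     # (B, T)

matcher = TransitionMatcher()
scores = matcher(torch.randn(4, 512), torch.randn(4, 512), torch.randn(4, 128))
print(scores.argmax(dim=-1))  # recommended transition index per shot pair
```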
arXiv Detail & Related papers (2022-07-27T12:00:42Z)
- Learning to Cut by Watching Movies [114.57935905189416]
This paper focuses on a new task for computational video editing, namely the task of ranking cut plausibility.
Our key idea is to leverage content that has already been edited to learn fine-grained audiovisual patterns that trigger cuts.
We devise a model that learns to discriminate between real and artificial cuts via contrastive learning.
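A minimal sketch of the contrastive idea: score audiovisual windows around real editor-made cuts higher than windows around randomly placed artificial cuts. The encoder and loss below are a generic contrastive setup, not the paper's exact model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Generic setup: an encoder maps a short audiovisual window around a
# candidate cut point to a scalar plausibility score. The feature
# dimension is illustrative.
encoder = nn.Sequential(nn.Linear(1024, 256), nn.ReLU(), nn.Linear(256, 1))

def contrastive_cut_loss(real_feats, fake_feats):
    """Push real-cut scores above artificial-cut scores (InfoNCE-style).

    real_feats: (B, 1024) windows around cuts made by real editors.
    fake_feats: (B, K, 1024) windows around K randomly placed cuts.
    """
    s_real = encoder(real_feats)                  # (B, 1)
    s_fake = encoder(fake_feats).squeeze(-1)      # (B, K)
    logits = torch.cat([s_real, s_fake], dim=1)   # real cut is class 0
    labels = torch.zeros(logits.size(0), dtype=torch.long)
    return F.cross_entropy(logits, labels)

loss = contrastive_cut_loss(torch.randn(8, 1024), torch.randn(8, 5, 1024))
loss.backward()
print(float(loss))
```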
arXiv Detail & Related papers (2021-08-09T18:37:17Z)
- Where to look at the movies: Analyzing visual attention to understand movie editing [75.16856363008128]
We propose a new eye-tracking database containing gaze pattern information on movie sequences.
We show how state-of-the-art computational saliency techniques behave on this dataset.
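As an example of how saliency models are typically scored against gaze data, here is the standard Normalized Scanpath Saliency (NSS) metric; it is a common evaluation choice, not necessarily the one used in this paper.

```python
import numpy as np

def nss(saliency_map, fixation_points):
    """Normalized Scanpath Saliency: mean z-scored saliency at fixations.

    saliency_map: 2D array of predicted saliency.
    fixation_points: list of (row, col) gaze fixation coordinates.
    """
    s = (saliency_map - saliency_map.mean()) / (saliency_map.std() + 1e-8)
    return float(np.mean([s[r, c] for r, c in fixation_points]))

sal = np.random.rand(270, 480)              # toy saliency prediction
fix = [(100, 200), (135, 240), (90, 300)]   # toy gaze fixations
print(nss(sal, fix))                         # > 0 means above-chance
```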
arXiv Detail & Related papers (2021-02-26T09:54:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.