Learning to Cut by Watching Movies
- URL: http://arxiv.org/abs/2108.04294v1
- Date: Mon, 9 Aug 2021 18:37:17 GMT
- Title: Learning to Cut by Watching Movies
- Authors: Alejandro Pardo, Fabian Caba Heilbron, Juan León Alcázar, Ali Thabet, Bernard Ghanem
- Abstract summary: This paper focuses on a new task for computational video editing, namely the task of ranking cut plausibility.
Our key idea is to leverage content that has already been edited to learn fine-grained audiovisual patterns that trigger cuts.
We devise a model that learns to discriminate between real and artificial cuts via contrastive learning.
- Score: 114.57935905189416
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Video content creation keeps growing at an incredible pace; yet, creating
engaging stories remains challenging and requires non-trivial video editing
expertise. Many video editing components are astonishingly hard to automate
primarily due to the lack of raw video materials. This paper focuses on a new
task for computational video editing, namely the task of ranking cut
plausibility. Our key idea is to leverage content that has already been edited
to learn fine-grained audiovisual patterns that trigger cuts. To do this, we
first collected a data source of more than 10K videos, from which we extract
more than 255K cuts. We devise a model that learns to discriminate between real
and artificial cuts via contrastive learning. We set up a new task and a set of
baselines to benchmark video cut generation. We observe that our proposed model
outperforms the baselines by large margins. To demonstrate our model in
real-world applications, we conduct human studies in a collection of unedited
videos. The results show that our model does a better job at cutting than
random and alternative baselines.
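The contrastive setup described in the abstract can be illustrated with a short sketch. The snippet below is not the authors' released code; the feature dimensions, the scorer architecture, and the use of in-batch mismatched snippet pairs as "artificial" cuts are illustrative assumptions. It trains a scorer so that real cuts taken from edited content out-score artificial ones.

# Minimal sketch, assuming precomputed audiovisual features for the snippets
# on either side of a candidate cut; not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CutScorer(nn.Module):
    """Scores the plausibility of a cut from features of its two sides."""
    def __init__(self, feat_dim: int = 512, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, left: torch.Tensor, right: torch.Tensor) -> torch.Tensor:
        # left/right: (batch, feat_dim) features of the snippets immediately
        # before and after the candidate cut point.
        return self.mlp(torch.cat([left, right], dim=-1)).squeeze(-1)

def contrastive_cut_loss(scorer: CutScorer,
                         left: torch.Tensor,
                         right: torch.Tensor) -> torch.Tensor:
    """InfoNCE-style loss: the matching right-side snippet (the real cut)
    must out-score right-side snippets borrowed from other cuts in the
    batch (artificial cuts)."""
    batch = left.shape[0]
    # Score every (left_i, right_j) pair; the diagonal holds the real cuts.
    left_rep = left.unsqueeze(1).expand(-1, batch, -1).reshape(batch * batch, -1)
    right_rep = right.unsqueeze(0).expand(batch, -1, -1).reshape(batch * batch, -1)
    logits = scorer(left_rep, right_rep).view(batch, batch)
    targets = torch.arange(batch)
    return F.cross_entropy(logits, targets)

if __name__ == "__main__":
    scorer = CutScorer()
    left = torch.randn(8, 512)   # placeholder pre-cut features
    right = torch.randn(8, 512)  # placeholder post-cut features
    loss = contrastive_cut_loss(scorer, left, right)
    loss.backward()
    print(f"contrastive loss: {loss.item():.4f}")

At inference time, such a scorer could rank candidate cut points in unedited footage by plausibility, which matches how the paper evaluates cut generation against random and alternative baselines.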
Related papers
- V-LASIK: Consistent Glasses-Removal from Videos Using Synthetic Data [20.23001319056999]
Diffusion-based generative models have recently shown remarkable image and video editing capabilities.
We focus on consistent and identity-preserving removal of glasses in videos, using it as a case study for consistent local attribute removal in videos.
We show that despite data imperfection, our model is able to perform the desired edit consistently while preserving the original video content.
arXiv Detail & Related papers (2024-06-20T17:14:43Z)
- Neural Video Fields Editing [56.558490998753456]
NVEdit is a text-driven video editing framework designed to mitigate memory overhead and improve consistency.
We construct a neural video field, powered by tri-plane and sparse grid, to enable encoding long videos with hundreds of frames.
Next, we update the video field with off-the-shelf Text-to-Image (T2I) models to achieve text-driven editing effects.
arXiv Detail & Related papers (2023-12-12T14:48:48Z)
- VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models [96.55004961251889]
Video Instruction Diffusion (VIDiff) is a unified foundation model designed for a wide range of video tasks.
Our model can edit and translate the desired results within seconds based on user instructions.
We provide convincing generative results for diverse input videos and written instructions, both qualitatively and quantitatively.
arXiv Detail & Related papers (2023-11-30T18:59:52Z)
- Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models [68.31777975873742]
Recent attempts at video editing require large amounts of text-to-video data and substantial computational resources for training.
We propose vid2vid-zero, a simple yet effective method for zero-shot video editing.
Experiments and analyses show promising results in editing attributes, subjects, places, etc., in real-world videos.
arXiv Detail & Related papers (2023-03-30T17:59:25Z)
- The Anatomy of Video Editing: A Dataset and Benchmark Suite for AI-Assisted Video Editing [90.59584961661345]
This work introduces the Anatomy of Video Editing, a dataset and benchmark to foster research in AI-assisted video editing.
Our benchmark suite focuses on video editing tasks beyond visual effects, such as automatic footage organization and assisted video assembly.
To enable research on these fronts, we annotate more than 1.5M tags with concepts relevant to cinematography across 196,176 shots sampled from movie scenes.
arXiv Detail & Related papers (2022-07-20T10:53:48Z)
- MovieCuts: A New Dataset and Benchmark for Cut Type Recognition [114.57935905189416]
This paper introduces the cut type recognition task, which requires modeling of multi-modal information.
We construct a large-scale dataset called MovieCuts, which contains more than 170K video clips labeled across ten cut types.
Our best model achieves 45.7% mAP, which suggests that the task is challenging and that attaining highly accurate cut type recognition is an open research problem.
arXiv Detail & Related papers (2021-09-12T17:36:55Z)