Learning to Cut by Watching Movies
- URL: http://arxiv.org/abs/2108.04294v1
- Date: Mon, 9 Aug 2021 18:37:17 GMT
- Title: Learning to Cut by Watching Movies
- Authors: Alejandro Pardo, Fabian Caba Heilbron, Juan León Alcázar, Ali Thabet, Bernard Ghanem
- Abstract summary: This paper focuses on a new task for computational video editing, namely the task of ranking cut plausibility.
Our key idea is to leverage content that has already been edited to learn fine-grained audiovisual patterns that trigger cuts.
We devise a model that learns to discriminate between real and artificial cuts via contrastive learning.
- Score: 114.57935905189416
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Video content creation keeps growing at an incredible pace; yet, creating
engaging stories remains challenging and requires non-trivial video editing
expertise. Many video editing components are astonishingly hard to automate
primarily due to the lack of raw video materials. This paper focuses on a new
task for computational video editing, namely the task of ranking cut
plausibility. Our key idea is to leverage content that has already been edited
to learn fine-grained audiovisual patterns that trigger cuts. To do this, we
first collected a data source of more than 10K videos, from which we extract
more than 255K cuts. We devise a model that learns to discriminate between real
and artificial cuts via contrastive learning. We set up a new task and a set of
baselines to benchmark video cut generation. We observe that our proposed model
outperforms the baselines by large margins. To demonstrate our model in
real-world applications, we conduct human studies in a collection of unedited
videos. The results show that our model does a better job at cutting than
random and alternative baselines.
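The contrastive setup described in the abstract can be illustrated with a short sketch. The snippet below is not the authors' released code; the feature dimensions, the scorer architecture, and the use of in-batch mismatched snippet pairs as "artificial" cuts are illustrative assumptions. It trains a scorer so that real cuts taken from edited content out-score artificial ones.

# Minimal sketch, assuming precomputed audiovisual features for the snippets
# on either side of a candidate cut; not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CutScorer(nn.Module):
    """Scores the plausibility of a cut from features of its two sides."""
    def __init__(self, feat_dim: int = 512, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, left: torch.Tensor, right: torch.Tensor) -> torch.Tensor:
        # left/right: (batch, feat_dim) features of the snippets immediately
        # before and after the candidate cut point.
        return self.mlp(torch.cat([left, right], dim=-1)).squeeze(-1)

def contrastive_cut_loss(scorer: CutScorer,
                         left: torch.Tensor,
                         right: torch.Tensor) -> torch.Tensor:
    """InfoNCE-style loss: the matching right-side snippet (the real cut)
    must out-score right-side snippets borrowed from other cuts in the
    batch (artificial cuts)."""
    batch = left.shape[0]
    # Score every (left_i, right_j) pair; the diagonal holds the real cuts.
    left_rep = left.unsqueeze(1).expand(-1, batch, -1).reshape(batch * batch, -1)
    right_rep = right.unsqueeze(0).expand(batch, -1, -1).reshape(batch * batch, -1)
    logits = scorer(left_rep, right_rep).view(batch, batch)
    targets = torch.arange(batch)
    return F.cross_entropy(logits, targets)

if __name__ == "__main__":
    scorer = CutScorer()
    left = torch.randn(8, 512)   # placeholder pre-cut features
    right = torch.randn(8, 512)  # placeholder post-cut features
    loss = contrastive_cut_loss(scorer, left, right)
    loss.backward()
    print(f"contrastive loss: {loss.item():.4f}")

At inference time, such a scorer could rank candidate cut points in unedited footage by plausibility, which matches how the paper evaluates cut generation against random and alternative baselines.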
Related papers
- V-LASIK: Consistent Glasses-Removal from Videos Using Synthetic Data [20.23001319056999]
Diffusion-based generative models have recently shown remarkable image and video editing capabilities.
We focus on consistent and identity-preserving removal of glasses in videos, using it as a case study for consistent local attribute removal in videos.
We show that despite data imperfection, our model is able to perform the desired edit consistently while preserving the original video content.
arXiv Detail & Related papers (2024-06-20T17:14:43Z)
- Neural Video Fields Editing [56.558490998753456]
NVEdit is a text-driven video editing framework designed to mitigate memory overhead and improve consistency.
We construct a neural video field, powered by tri-plane and sparse grid, to enable encoding long videos with hundreds of frames.
Next, we update the video field with off-the-shelf Text-to-Image (T2I) models to achieve text-driven editing effects.
arXiv Detail & Related papers (2023-12-12T14:48:48Z)
- VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models [96.55004961251889]
Video Instruction Diffusion (VIDiff) is a unified foundation model designed for a wide range of video tasks.
Our model can edit and translate the desired results within seconds based on user instructions.
We provide convincing generative results for diverse input videos and written instructions, both qualitatively and quantitatively.
arXiv Detail & Related papers (2023-11-30T18:59:52Z)
- Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models [68.31777975873742]
Recent attempts at video editing require large amounts of text-to-video data and substantial computational resources for training.
We propose vid2vid-zero, a simple yet effective method for zero-shot video editing.
Experiments and analyses show promising results in editing attributes, subjects, places, etc., in real-world videos.
arXiv Detail & Related papers (2023-03-30T17:59:25Z)
- The Anatomy of Video Editing: A Dataset and Benchmark Suite for AI-Assisted Video Editing [90.59584961661345]
This work introduces the Anatomy of Video Editing, a dataset and benchmark to foster research in AI-assisted video editing.
Our benchmark suite focuses on video editing tasks beyond visual effects, such as automatic footage organization and assisted video assembly.
To enable research on these fronts, we annotate more than 1.5M tags with concepts relevant to cinematography across 196,176 shots sampled from movie scenes.
arXiv Detail & Related papers (2022-07-20T10:53:48Z)
- MovieCuts: A New Dataset and Benchmark for Cut Type Recognition [114.57935905189416]
This paper introduces the cut type recognition task, which requires modeling of multi-modal information.
We construct a large-scale dataset called MovieCuts, which contains more than 170K video clips labeled across ten cut types.
Our best model achieves 45.7% mAP, which suggests that the task is challenging and that attaining highly accurate cut type recognition is an open research problem.
arXiv Detail & Related papers (2021-09-12T17:36:55Z)