Intelligent Video Editing: Incorporating Modern Talking Face Generation Algorithms in a Video Editor
- URL: http://arxiv.org/abs/2110.08580v1
- Date: Sat, 16 Oct 2021 14:19:12 GMT
- Title: Intelligent Video Editing: Incorporating Modern Talking Face Generation Algorithms in a Video Editor
- Authors: Anchit Gupta, Faizan Farooq Khan, Rudrabha Mukhopadhyay, Vinay P. Namboodiri, C. V. Jawahar
- Abstract summary: This paper proposes a video editor based on OpenShot with several state-of-the-art facial video editing algorithms as added functionalities.
Our editor provides an easy-to-use interface to apply modern lip-syncing algorithms interactively.
Our evaluations show a clear improvement in the efficiency of human editors and in the quality of the generated videos.
- Score: 44.36920938661454
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper proposes a video editor based on OpenShot with several
state-of-the-art facial video editing algorithms as added functionalities. Our
editor provides an easy-to-use interface to apply modern lip-syncing algorithms
interactively. Apart from lip-syncing, the editor also uses audio and facial
re-enactment to generate expressive talking faces. The manual control improves
the overall experience of video editing without missing out on the benefits of
modern synthetic video generation algorithms. This control enables us to
lip-sync complex dubbed movie scenes, interviews, television shows, and other
visual content. Furthermore, our editor can automatically translate lectures:
it translates the spoken content, lip-syncs the professor to the translated
speech, and translates background content such as slides. While doing so, we
also tackle the critical aspect of synchronizing the background content with
the translated speech. We evaluate the usefulness of the proposed editor
through qualitative human evaluations, which show a clear improvement in the
efficiency of human editors and in the quality of the generated videos. Demo
videos in the supplementary material explain the tool and showcase multiple
results.
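
To make the interactive workflow concrete, the sketch below shows how a lip-sync pass over a user-selected timeline region might be wired up. This is a minimal illustration, not the editor's actual API: `model.sync`, the file paths, and the ffmpeg-based segment cut are all assumptions.

```python
# Hypothetical sketch: lip-syncing one user-selected region of a clip.
# `model` stands in for a modern lip-syncing network; `model.sync` is an
# assumed interface, not the editor's real API.
import subprocess

def cut_segment(src: str, start: float, end: float, dst: str) -> None:
    """Extract the selected timeline region [start, end] with ffmpeg."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", src, "-ss", str(start), "-to", str(end), dst],
        check=True,
    )

def lipsync_region(video: str, dubbed_audio: str,
                   start: float, end: float, model) -> str:
    """Lip-sync only the selected region, leaving the rest of `video` intact."""
    segment = "segment.mp4"
    cut_segment(video, start, end, segment)
    # Assumed to return the path of a segment whose mouth region matches
    # `dubbed_audio`; the editor then splices it back onto the timeline.
    return model.sync(video_path=segment, audio_path=dubbed_audio)
```

Manual control here amounts to choosing `start` and `end` per shot, which is what lets complex dubbed scenes be synced shot by shot rather than in one monolithic pass.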
Related papers
- DeCo: Decoupled Human-Centered Diffusion Video Editing with Motion Consistency [66.49423641279374]
We introduce DeCo, a novel video editing framework specifically designed to treat humans and the background as separate editable targets.
We propose a decoupled dynamic human representation that uses a human-body prior to generate tailored humans.
We extend the calculation of score distillation sampling into normal space and image space to enhance the texture of humans during optimization (a generic SDS step is sketched after this entry).
arXiv Detail & Related papers (2024-08-14T11:53:40Z)
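
For reference, the sketch below shows a generic score distillation sampling (SDS) step of the kind DeCo extends to normal space and image space. The DreamFusion-style weighting w(t) = 1 - alpha_bar_t and the `unet` noise predictor are illustrative assumptions, not DeCo's exact formulation.

```python
# Generic SDS step (illustrative): pushes a rendered image toward what a
# pretrained diffusion model considers likely, without backprop through it.
import torch

def sds_grad(unet, x, cond, alpha_bar):
    """Gradient of the SDS loss w.r.t. rendered images x of shape (B, C, H, W)."""
    b = x.shape[0]
    t = torch.randint(0, alpha_bar.numel(), (b,), device=x.device)
    a = alpha_bar[t].view(b, 1, 1, 1)
    eps = torch.randn_like(x)
    x_t = a.sqrt() * x + (1.0 - a).sqrt() * eps  # forward-diffuse the render
    with torch.no_grad():                        # frozen diffusion model
        eps_pred = unet(x_t, t, cond)
    return (1.0 - a) * (eps_pred - eps)          # w(t) * noise residual

# Usage: x.backward(gradient=sds_grad(unet, x, cond, alpha_bar)), then step
# the renderer's optimizer.
```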
- ExpressEdit: Video Editing with Natural Language and Sketching [28.814923641627825]
We explore how multimodality, namely natural language (NL) and sketching, two modalities humans naturally use for expression, can be utilized to support video editors.
We present ExpressEdit, a system that enables editing videos via NL text and sketching on the video frame.
arXiv Detail & Related papers (2024-03-26T13:34:21Z)
- UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing [28.140945021777878]
We present UniEdit, a tuning-free framework that supports both video motion and appearance editing.
To realize motion editing while preserving source video content, we introduce auxiliary motion-reference and reconstruction branches.
The obtained features are then injected into the main editing path via temporal and spatial self-attention layers (see the sketch after this entry).
arXiv Detail & Related papers (2024-02-20T17:52:12Z)
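
Below is a minimal sketch of the attention-level injection UniEdit describes: the main editing path attends to keys and values computed from an auxiliary branch's features. The single-head form, batch-first shapes, and separate projection matrices are simplifying assumptions.

```python
# Sketch: injecting auxiliary-branch features into a self-attention layer
# of the main editing path (single-head, illustrative shapes).
import torch
import torch.nn.functional as F

def injected_attention(x_main: torch.Tensor, x_aux: torch.Tensor,
                       wq: torch.Tensor, wk: torch.Tensor,
                       wv: torch.Tensor) -> torch.Tensor:
    """x_main, x_aux: (B, L, d) token features; wq/wk/wv: (d, d) projections."""
    q = x_main @ wq  # queries from the editing path
    k = x_aux @ wk   # keys from the reconstruction/motion-reference branch
    v = x_aux @ wv   # values from the same auxiliary branch
    return F.scaled_dot_product_attention(q, k, v)
```

Swapping where `x_aux` comes from (reconstruction branch versus motion-reference branch) is one way such a design can steer appearance and motion independently.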
- Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts [116.05656635044357]
We propose a generic video editing framework called Make-A-Protagonist.
Specifically, we leverage multiple experts to parse the source video and the target visual and textual clues, and we propose a visual-textual video generation model.
Results demonstrate the versatile and remarkable editing capabilities of Make-A-Protagonist.
arXiv Detail & Related papers (2023-05-15T17:59:03Z)
- Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert [89.07178484337865]
Talking face generation, also known as speech-to-lip generation, reconstructs the facial motions around the lips given coherent speech input.
Previous studies revealed the importance of lip-speech synchronization and visual quality.
We propose using a lip-reading expert to improve the intelligibility of the generated lip regions (see the sketch after this entry).
arXiv Detail & Related papers (2023-03-29T07:51:07Z)
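
One plausible reading of the lip-reading-expert idea, sketched below, is an auxiliary loss: a frozen lip-reading network transcribes the generated mouth crops, and the generator is penalized when that transcription diverges from the spoken text. The CTC formulation and the `lip_expert` interface are assumptions, not necessarily the paper's exact objective.

```python
# Illustrative intelligibility loss from a frozen lip-reading expert.
import torch
import torch.nn.functional as F

def intelligibility_loss(lip_expert, mouth_frames, transcripts):
    """CTC loss between expert readings of generated lips and the transcript.

    mouth_frames: (T, B, ...) generated mouth crops
    transcripts:  list of B token-id lists (ids assumed to exclude blank=0)
    """
    logits = lip_expert(mouth_frames)      # assumed shape (T, B, vocab)
    log_probs = logits.log_softmax(dim=-1)
    input_lens = torch.full((logits.size(1),), logits.size(0),
                            dtype=torch.long)
    target_lens = torch.tensor([len(t) for t in transcripts])
    targets = torch.cat([torch.tensor(t) for t in transcripts])
    return F.ctc_loss(log_probs, targets, input_lens, target_lens)
```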
- VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild [37.93856291026653]
VideoReTalking is a new system to edit the faces of a real-world talking head video according to input audio.
It produces a high-quality, lip-synced output video even when a different emotion is requested.
arXiv Detail & Related papers (2022-11-27T08:14:23Z)
- The Anatomy of Video Editing: A Dataset and Benchmark Suite for AI-Assisted Video Editing [90.59584961661345]
This work introduces the Anatomy of Video Editing, a dataset and benchmark to foster research in AI-assisted video editing.
Our benchmark suite focuses on video editing tasks, beyond visual effects, such as automatic footage organization and assisted video assembling.
To enable research on these fronts, we annotate more than 1.5M tags with concepts relevant to cinematography, drawn from 196,176 shots sampled from movie scenes.
arXiv Detail & Related papers (2022-07-20T10:53:48Z)
- Transcript to Video: Efficient Clip Sequencing from Texts [65.87890762420922]
We present Transcript-to-Video -- a weakly-supervised framework that uses texts as input to automatically create video sequences from an extensive collection of shots.
Specifically, we propose a Content Retrieval Module and a Temporal Coherent Module to learn visual-language representations and model shot sequencing styles (a toy retrieval step is sketched after this entry).
For fast inference, we introduce an efficient search strategy for real-time video clip sequencing.
arXiv Detail & Related papers (2021-07-25T17:24:50Z)
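
The retrieval step implied by a Content Retrieval Module can be sketched as nearest-neighbor search in a shared text-video embedding space. The CLIP-style shared space and precomputed shot embeddings are assumptions here, not the paper's actual module.

```python
# Toy text-to-shot retrieval in a shared embedding space (illustrative).
import numpy as np

def rank_shots(text_emb: np.ndarray, shot_embs: np.ndarray, k: int = 5):
    """Return indices of the k shots most similar to the sentence embedding.

    text_emb:  (d,) embedding of one transcript sentence
    shot_embs: (N, d) precomputed embeddings for N candidate shots
    """
    t = text_emb / np.linalg.norm(text_emb)
    s = shot_embs / np.linalg.norm(shot_embs, axis=1, keepdims=True)
    return np.argsort(-(s @ t))[:k]  # cosine similarity, descending
```

Sequencing then amounts to running this per sentence and letting a temporal model rescore candidate orderings, which is where an efficient search strategy matters for real-time use.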
- Context-Aware Prosody Correction for Text-Based Speech Editing [28.459695630420832]
A major drawback of current systems is that edited recordings often sound unnatural because of prosody mismatches around edited regions.
We propose a new context-aware method for more natural sounding text-based editing of speech.
arXiv Detail & Related papers (2021-02-16T18:16:30Z)
- Iterative Text-based Editing of Talking-heads Using Neural Retargeting [42.964779538134714]
We present a text-based tool for editing talking-head video that enables an iterative editing workflow.
On each iteration, users can edit the wording of the speech, refine mouth motions to reduce artifacts where necessary, and manipulate non-verbal aspects of the performance.
Our tool requires only 2-3 minutes of video of the target actor and synthesizes the video for each iteration in about 40 seconds.
arXiv Detail & Related papers (2020-11-21T01:05:55Z)