Intelligent Video Editing: Incorporating Modern Talking Face Generation Algorithms in a Video Editor
- URL: http://arxiv.org/abs/2110.08580v1
- Date: Sat, 16 Oct 2021 14:19:12 GMT
- Title: Intelligent Video Editing: Incorporating Modern Talking Face Generation Algorithms in a Video Editor
- Authors: Anchit Gupta, Faizan Farooq Khan, Rudrabha Mukhopadhyay, Vinay P. Namboodiri, C. V. Jawahar
- Abstract summary: This paper proposes a video editor based on OpenShot with several state-of-the-art facial video editing algorithms as added functionalities.
Our editor provides an easy-to-use interface to apply modern lip-syncing algorithms interactively.
Our evaluations show a clear improvement in the efficiency of human editors and in the quality of the generated videos.
- Score: 44.36920938661454
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper proposes a video editor based on OpenShot with several
state-of-the-art facial video editing algorithms as added functionalities. Our
editor provides an easy-to-use interface to apply modern lip-syncing algorithms
interactively. Apart from lip-syncing, the editor also uses audio and facial
re-enactment to generate expressive talking faces. The manual control improves
the overall experience of video editing without missing out on the benefits of
modern synthetic video generation algorithms. This control enables us to
lip-sync complex dubbed movie scenes, interviews, television shows, and other
visual content. Furthermore, our editor can automatically translate lectures:
it translates the spoken content, lip-syncs the professor to the translated
speech, and translates background content such as slides. While doing so, we
also tackle the critical aspect of synchronizing the background content with
the translated speech. We evaluate the usefulness of the proposed editor
through qualitative human evaluations, which show a clear improvement in the
efficiency of human editors and in the quality of the generated videos. Demo
videos in the supplementary material explain the tool and showcase multiple
results.
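
To make the interactive workflow concrete, the sketch below shows how a lip-sync pass over a user-selected timeline region might be wired up. This is a minimal illustration, not the editor's actual API: `model.sync`, the file paths, and the ffmpeg-based segment cut are all assumptions.

```python
# Hypothetical sketch: lip-syncing one user-selected region of a clip.
# `model` stands in for a modern lip-syncing network; `model.sync` is an
# assumed interface, not the editor's real API.
import subprocess

def cut_segment(src: str, start: float, end: float, dst: str) -> None:
    """Extract the selected timeline region [start, end] with ffmpeg."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", src, "-ss", str(start), "-to", str(end), dst],
        check=True,
    )

def lipsync_region(video: str, dubbed_audio: str,
                   start: float, end: float, model) -> str:
    """Lip-sync only the selected region, leaving the rest of `video` intact."""
    segment = "segment.mp4"
    cut_segment(video, start, end, segment)
    # Assumed to return the path of a segment whose mouth region matches
    # `dubbed_audio`; the editor then splices it back onto the timeline.
    return model.sync(video_path=segment, audio_path=dubbed_audio)
```

Manual control here amounts to choosing `start` and `end` per shot, which is what lets complex dubbed scenes be synced shot by shot rather than in one monolithic pass.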
Related papers
- DeCo: Decoupled Human-Centered Diffusion Video Editing with Motion Consistency [66.49423641279374]
We introduce DeCo, a novel video editing framework specifically designed to treat humans and the background as separate editable targets.
We propose a decoupled dynamic human representation that uses a human-body prior to generate tailored humans.
We extend the calculation of score distillation sampling into normal space and image space to enhance the texture of humans during optimization (a generic SDS step is sketched after this entry).
arXiv Detail & Related papers (2024-08-14T11:53:40Z)
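
For reference, the sketch below shows a generic score distillation sampling (SDS) step of the kind DeCo extends to normal space and image space. The DreamFusion-style weighting w(t) = 1 - alpha_bar_t and the `unet` noise predictor are illustrative assumptions, not DeCo's exact formulation.

```python
# Generic SDS step (illustrative): pushes a rendered image toward what a
# pretrained diffusion model considers likely, without backprop through it.
import torch

def sds_grad(unet, x, cond, alpha_bar):
    """Gradient of the SDS loss w.r.t. rendered images x of shape (B, C, H, W)."""
    b = x.shape[0]
    t = torch.randint(0, alpha_bar.numel(), (b,), device=x.device)
    a = alpha_bar[t].view(b, 1, 1, 1)
    eps = torch.randn_like(x)
    x_t = a.sqrt() * x + (1.0 - a).sqrt() * eps  # forward-diffuse the render
    with torch.no_grad():                        # frozen diffusion model
        eps_pred = unet(x_t, t, cond)
    return (1.0 - a) * (eps_pred - eps)          # w(t) * noise residual

# Usage: x.backward(gradient=sds_grad(unet, x, cond, alpha_bar)), then step
# the renderer's optimizer.
```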
- ExpressEdit: Video Editing with Natural Language and Sketching [28.814923641627825]
We explore how multimodality, namely natural language (NL) and sketching, two modalities humans naturally use for expression, can be utilized to support video editors.
We present ExpressEdit, a system that enables editing videos via NL text and sketching on the video frame.
arXiv Detail & Related papers (2024-03-26T13:34:21Z)
- UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing [28.140945021777878]
We present UniEdit, a tuning-free framework that supports both video motion and appearance editing.
To realize motion editing while preserving source video content, we introduce auxiliary motion-reference and reconstruction branches.
The obtained features are then injected into the main editing path via temporal and spatial self-attention layers (see the sketch after this entry).
arXiv Detail & Related papers (2024-02-20T17:52:12Z)
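
Below is a minimal sketch of the attention-level injection UniEdit describes: the main editing path attends to keys and values computed from an auxiliary branch's features. The single-head form, batch-first shapes, and separate projection matrices are simplifying assumptions.

```python
# Sketch: injecting auxiliary-branch features into a self-attention layer
# of the main editing path (single-head, illustrative shapes).
import torch
import torch.nn.functional as F

def injected_attention(x_main: torch.Tensor, x_aux: torch.Tensor,
                       wq: torch.Tensor, wk: torch.Tensor,
                       wv: torch.Tensor) -> torch.Tensor:
    """x_main, x_aux: (B, L, d) token features; wq/wk/wv: (d, d) projections."""
    q = x_main @ wq  # queries from the editing path
    k = x_aux @ wk   # keys from the reconstruction/motion-reference branch
    v = x_aux @ wv   # values from the same auxiliary branch
    return F.scaled_dot_product_attention(q, k, v)
```

Swapping where `x_aux` comes from (reconstruction branch versus motion-reference branch) is one way such a design can steer appearance and motion independently.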
- Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts [116.05656635044357]
We propose a generic video editing framework called Make-A-Protagonist.
Specifically, we leverage multiple experts to parse the source video and the target visual and textual clues, and we propose a visual-textual video generation model.
Results demonstrate the versatile and remarkable editing capabilities of Make-A-Protagonist.
arXiv Detail & Related papers (2023-05-15T17:59:03Z)
- Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert [89.07178484337865]
Talking face generation, also known as speech-to-lip generation, reconstructs the facial motions around the lips given coherent speech input.
Previous studies revealed the importance of lip-speech synchronization and visual quality.
We propose using a lip-reading expert to improve the intelligibility of the generated lip regions (see the sketch after this entry).
arXiv Detail & Related papers (2023-03-29T07:51:07Z)
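
One plausible reading of the lip-reading-expert idea, sketched below, is an auxiliary loss: a frozen lip-reading network transcribes the generated mouth crops, and the generator is penalized when that transcription diverges from the spoken text. The CTC formulation and the `lip_expert` interface are assumptions, not necessarily the paper's exact objective.

```python
# Illustrative intelligibility loss from a frozen lip-reading expert.
import torch
import torch.nn.functional as F

def intelligibility_loss(lip_expert, mouth_frames, transcripts):
    """CTC loss between expert readings of generated lips and the transcript.

    mouth_frames: (T, B, ...) generated mouth crops
    transcripts:  list of B token-id lists (ids assumed to exclude blank=0)
    """
    logits = lip_expert(mouth_frames)      # assumed shape (T, B, vocab)
    log_probs = logits.log_softmax(dim=-1)
    input_lens = torch.full((logits.size(1),), logits.size(0),
                            dtype=torch.long)
    target_lens = torch.tensor([len(t) for t in transcripts])
    targets = torch.cat([torch.tensor(t) for t in transcripts])
    return F.ctc_loss(log_probs, targets, input_lens, target_lens)
```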
- VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild [37.93856291026653]
VideoReTalking is a new system to edit the faces of a real-world talking head video according to input audio.
It produces a high-quality, lip-synced output video even when a different emotion is requested.
arXiv Detail & Related papers (2022-11-27T08:14:23Z)
- The Anatomy of Video Editing: A Dataset and Benchmark Suite for AI-Assisted Video Editing [90.59584961661345]
This work introduces the Anatomy of Video Editing, a dataset and benchmark to foster research in AI-assisted video editing.
Our benchmark suite focuses on video editing tasks, beyond visual effects, such as automatic footage organization and assisted video assembling.
To enable research on these fronts, we annotate more than 1.5M tags with concepts relevant to cinematography, drawn from 196,176 shots sampled from movie scenes.
arXiv Detail & Related papers (2022-07-20T10:53:48Z)
- Transcript to Video: Efficient Clip Sequencing from Texts [65.87890762420922]
We present Transcript-to-Video -- a weakly-supervised framework that uses texts as input to automatically create video sequences from an extensive collection of shots.
Specifically, we propose a Content Retrieval Module and a Temporal Coherent Module to learn visual-language representations and model shot sequencing styles (a toy retrieval step is sketched after this entry).
For fast inference, we introduce an efficient search strategy for real-time video clip sequencing.
arXiv Detail & Related papers (2021-07-25T17:24:50Z)
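
The retrieval step implied by a Content Retrieval Module can be sketched as nearest-neighbor search in a shared text-video embedding space. The CLIP-style shared space and precomputed shot embeddings are assumptions here, not the paper's actual module.

```python
# Toy text-to-shot retrieval in a shared embedding space (illustrative).
import numpy as np

def rank_shots(text_emb: np.ndarray, shot_embs: np.ndarray, k: int = 5):
    """Return indices of the k shots most similar to the sentence embedding.

    text_emb:  (d,) embedding of one transcript sentence
    shot_embs: (N, d) precomputed embeddings for N candidate shots
    """
    t = text_emb / np.linalg.norm(text_emb)
    s = shot_embs / np.linalg.norm(shot_embs, axis=1, keepdims=True)
    return np.argsort(-(s @ t))[:k]  # cosine similarity, descending
```

Sequencing then amounts to running this per sentence and letting a temporal model rescore candidate orderings, which is where an efficient search strategy matters for real-time use.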
- Context-Aware Prosody Correction for Text-Based Speech Editing [28.459695630420832]
A major drawback of current systems is that edited recordings often sound unnatural because of prosody mismatches around edited regions.
We propose a new context-aware method for more natural sounding text-based editing of speech.
arXiv Detail & Related papers (2021-02-16T18:16:30Z)
- Iterative Text-based Editing of Talking-heads Using Neural Retargeting [42.964779538134714]
We present a text-based tool for editing talking-head video that enables an iterative editing workflow.
On each iteration, users can edit the wording of the speech, refine mouth motions to reduce artifacts where necessary, and manipulate non-verbal aspects of the performance.
Our tool requires only 2-3 minutes of video of the target actor and synthesizes the video for each iteration in about 40 seconds.
arXiv Detail & Related papers (2020-11-21T01:05:55Z)