MobileFaceSwap: A Lightweight Framework for Video Face Swapping
- URL: http://arxiv.org/abs/2201.03808v1
- Date: Tue, 11 Jan 2022 06:48:12 GMT
- Title: MobileFaceSwap: A Lightweight Framework for Video Face Swapping
- Authors: Zhiliang Xu, Zhibin Hong, Changxing Ding, Zhen Zhu, Junyu Han, Jingtuo
Liu, Errui Ding
- Abstract summary: We propose a lightweight Identity-aware Dynamic Network (IDN) for subject-agnostic face swapping.
The presented IDN contains only 0.50M parameters and needs 0.33G FLOPs per frame, making it capable of real-time video face swapping on mobile phones.
- Score: 56.87690462046143
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Advanced face swapping methods have achieved appealing results. However, most
of these methods have many parameters and computations, which makes it
challenging to apply them in real-time applications or deploy them on edge
devices like mobile phones. In this work, we propose a lightweight
Identity-aware Dynamic Network (IDN) for subject-agnostic face swapping by
dynamically adjusting the model parameters according to the identity
information. In particular, we design an efficient Identity Injection Module
(IIM) by introducing two dynamic neural network techniques, including the
weights prediction and weights modulation. Once the IDN is updated, it can be
applied to swap faces given any target image or video. The presented IDN
contains only 0.50M parameters and needs 0.33G FLOPs per frame, making it
capable of real-time video face swapping on mobile phones. In addition, we
introduce a knowledge distillation-based method for stable training, and a loss
reweighting module is employed to obtain better synthesized results. Finally,
our method achieves results comparable to the teacher models and other
state-of-the-art methods.
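The Identity Injection Module's two techniques can be illustrated with a minimal sketch. The shapes, the linear prediction head, and the per-channel scaling below are hypothetical simplifications, not the paper's actual architecture: an identity embedding is mapped to one scale per output channel (weights prediction), and the shared convolution weights are rescaled with it (weights modulation), so the generator's parameters become identity-conditioned without growing the network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: identity embedding dim, conv out/in channels, kernel size
id_dim, c_out, c_in, k = 16, 8, 4, 3

# Base (identity-agnostic) convolution weights of the lightweight generator
base_w = rng.standard_normal((c_out, c_in, k, k)).astype(np.float32)

# Weights prediction: a small linear head maps the identity embedding
# to one modulation scale per output channel (assumed design)
pred_w = (rng.standard_normal((id_dim, c_out)) * 0.01).astype(np.float32)

def inject_identity(identity_emb):
    """Weights modulation: rescale base weights per output channel."""
    scales = 1.0 + identity_emb @ pred_w          # shape (c_out,)
    return base_w * scales[:, None, None, None]   # broadcast over c_in, k, k

identity_emb = rng.standard_normal(id_dim).astype(np.float32)
dynamic_w = inject_identity(identity_emb)
print(dynamic_w.shape)  # conv weights now conditioned on the identity
```

Once the weights are updated for a given source identity, they stay fixed, so every target frame only pays the cost of the small modulated network, which is what makes the per-frame FLOPs budget feasible on a phone.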
Related papers
- DynamicAvatars: Accurate Dynamic Facial Avatars Reconstruction and Precise Editing with Diffusion Models [4.851981427563145]
We present DynamicAvatars, a dynamic model that generates photorealistic, moving 3D head avatars from video clips.
Our approach enables precise editing through a novel prompt-based editing model.
arXiv Detail & Related papers (2024-11-24T06:22:30Z)
- FineCLIPER: Multi-modal Fine-grained CLIP for Dynamic Facial Expression Recognition with AdaptERs [5.35588281968644]
We propose a novel framework, named Multi-modal Fine-grained CLIP for Dynamic Facial Expression Recognition with AdaptERs (Fine CLIPER).
Our Fine CLIPER achieves tunable SOTA performance on the DFEW, FERV39k, and MAFW datasets with few parameters.
arXiv Detail & Related papers (2024-07-02T10:55:43Z)
- Adapter-X: A Novel General Parameter-Efficient Fine-Tuning Framework for Vision [52.80792724919329]
We introduce a novel framework named Adapter-X to improve fine-tuning in 2D image and 3D point cloud modalities.
It is the first to outperform full fine-tuning in both 2D image and 3D point cloud modalities with significantly fewer parameters, i.e., only 0.20% and 1.88% of original trainable parameters for 2D and 3D classification tasks.
arXiv Detail & Related papers (2024-06-05T08:26:44Z)
- Fiducial Focus Augmentation for Facial Landmark Detection [4.433764381081446]
We propose a novel image augmentation technique to enhance the model's understanding of facial structures.
We employ a Siamese architecture-based training mechanism with a Deep Canonical Correlation Analysis (DCCA)-based loss.
Our approach outperforms multiple state-of-the-art approaches across various benchmark datasets.
arXiv Detail & Related papers (2024-02-23T01:34:00Z)
- From Static to Dynamic: Adapting Landmark-Aware Image Models for Facial Expression Recognition in Videos [88.08209394979178]
Dynamic facial expression recognition (DFER) in the wild is still hindered by data limitations.
We introduce a novel Static-to-Dynamic model (S2D) that leverages existing SFER knowledge and dynamic information implicitly encoded in extracted facial landmark-aware features.
arXiv Detail & Related papers (2023-12-09T03:16:09Z)
- Migrating Face Swap to Mobile Devices: A lightweight Framework and A Supervised Training Solution [7.572886749166295]
MobileFSGAN is a novel lightweight GAN for face swapping that runs on mobile devices with far fewer parameters while achieving competitive performance.
A lightweight encoder-decoder structure is designed especially for image synthesis tasks; it is only 10.2MB and can run on mobile devices at real-time speed.
arXiv Detail & Related papers (2022-04-13T05:35:11Z)
- One to Many: Adaptive Instrument Segmentation via Meta Learning and Dynamic Online Adaptation in Robotic Surgical Video [71.43912903508765]
MDAL is a dynamic online adaptive learning scheme for instrument segmentation in robot-assisted surgery.
It learns the general knowledge of instruments and the fast adaptation ability through the video-specific meta-learning paradigm.
It outperforms other state-of-the-art methods on two datasets.
arXiv Detail & Related papers (2021-03-24T05:02:18Z)
- MVFNet: Multi-View Fusion Network for Efficient Video Recognition [79.92736306354576]
We introduce a multi-view fusion (MVF) module to exploit video complexity using separable convolution for efficiency.
MVFNet can be thought of as a generalized video modeling framework.
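The efficiency argument behind separable convolution can be made concrete with a parameter count. This is a generic illustration of depthwise-separable convolution, not MVFNet's actual module; the channel and kernel sizes are arbitrary choices for the example.

```python
def conv_params(c_in, c_out, k):
    """Parameters of a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k

def separable_conv_params(c_in, c_out, k):
    """Depthwise k x k conv followed by a pointwise 1 x 1 conv (bias omitted)."""
    depthwise = c_in * k * k       # one k x k filter per input channel
    pointwise = c_in * c_out       # 1 x 1 conv mixes channels
    return depthwise + pointwise

c_in, c_out, k = 64, 128, 3
standard = conv_params(c_in, c_out, k)             # 64 * 128 * 9 = 73728
separable = separable_conv_params(c_in, c_out, k)  # 576 + 8192 = 8768
print(standard, separable, round(standard / separable, 2))
```

The roughly 8x reduction here is why separable convolutions are a common building block for efficient video models.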
arXiv Detail & Related papers (2020-12-13T06:34:18Z)
- The FaceChannel: A Fast & Furious Deep Neural Network for Facial Expression Recognition [71.24825724518847]
Current state-of-the-art models for automatic Facial Expression Recognition (FER) are based on very deep neural networks that are effective but rather expensive to train.
We formalize the FaceChannel, a light-weight neural network that has far fewer parameters than common deep neural networks.
We demonstrate how our model achieves a comparable, if not better, performance to the current state-of-the-art in FER.
arXiv Detail & Related papers (2020-09-15T09:25:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.