Facial Expression Translation using Landmark Guided GANs
- URL: http://arxiv.org/abs/2209.02136v1
- Date: Mon, 5 Sep 2022 20:52:42 GMT
- Title: Facial Expression Translation using Landmark Guided GANs
- Authors: Hao Tang, Nicu Sebe
- Abstract summary: We propose a powerful Landmark guided Generative Adversarial Network (LandmarkGAN) for facial expression-to-expression translation.
The proposed LandmarkGAN achieves better results than state-of-the-art approaches while using only a single image.
- Score: 84.64650795005649
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a simple yet powerful Landmark guided Generative Adversarial
Network (LandmarkGAN) for facial expression-to-expression translation using a
single image, an important and challenging task in computer vision since
expression-to-expression translation is a non-linear and non-aligned problem.
Moreover, it requires a high-level semantic understanding between the input and
output images, since the objects in images can have arbitrary poses, sizes,
locations, backgrounds, and self-occlusions. To tackle this problem, we propose
utilizing facial landmark information explicitly. Since it is a challenging
problem, we split it into two sub-tasks: (i) category-guided landmark
generation, and (ii) landmark-guided expression-to-expression translation. The
two sub-tasks are trained in an end-to-end fashion so that the generated
landmarks and expressions mutually benefit each other. Compared with current
keypoint-guided approaches, the proposed LandmarkGAN needs only a single facial
image to generate various expressions. Extensive experimental results on four
public datasets demonstrate that the proposed LandmarkGAN achieves better
results than state-of-the-art approaches while using only a single image. The
code is available at https://github.com/Ha0Tang/LandmarkGAN.
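The abstract's two-stage design can be read as a landmark generator conditioned on the target expression category, followed by an image generator conditioned on the predicted landmarks. Below is a minimal PyTorch sketch of that decomposition; the module internals, the 68-point landmark convention, and the number of expression categories are illustrative assumptions, not the authors' architecture (see the linked repository for the real implementation).

```python
import torch
import torch.nn as nn

# Minimal sketch of the two-stage design described in the abstract.
# Module internals, landmark count, and category count are assumptions;
# see https://github.com/Ha0Tang/LandmarkGAN for the authors' code.

NUM_LANDMARKS = 68   # assumed 68-point facial landmark convention
NUM_CLASSES = 8      # assumed number of expression categories

class LandmarkGenerator(nn.Module):
    """Sub-task (i): category-guided landmark generation."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * NUM_LANDMARKS + NUM_CLASSES, 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, 2 * NUM_LANDMARKS),  # (x, y) per landmark
        )

    def forward(self, src_landmarks, expr_onehot):
        # src_landmarks: (B, 68, 2), expr_onehot: (B, NUM_CLASSES)
        x = torch.cat([src_landmarks.flatten(1), expr_onehot], dim=1)
        return self.net(x).view(-1, NUM_LANDMARKS, 2)

class TranslationGenerator(nn.Module):
    """Sub-task (ii): landmark-guided expression-to-expression translation."""
    def __init__(self):
        super().__init__()
        # the predicted landmarks are assumed rasterized into a 1-channel
        # heatmap and concatenated with the RGB source image (4 channels)
        self.net = nn.Sequential(
            nn.Conv2d(4, 64, 7, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, 7, padding=3), nn.Tanh(),
        )

    def forward(self, src_image, landmark_map):
        return self.net(torch.cat([src_image, landmark_map], dim=1))
```

In an end-to-end setup, the adversarial and reconstruction losses on the output image also backpropagate into the landmark generator, which is the mutual benefit between the two sub-tasks that the abstract refers to.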
Related papers
- Enhanced Unsupervised Image-to-Image Translation Using Contrastive Learning and Histogram of Oriented Gradients [0.0]
This paper proposes an enhanced unsupervised image-to-image translation method based on the Contrastive Unpaired Translation (CUT) model.
This approach preserves the semantic structure of images even without semantic labels.
The method was tested on translating synthetic game environments from the GTA5 dataset to realistic urban scenes from the Cityscapes dataset.
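The histogram-of-oriented-gradients cue used here can be turned into a structure-preservation loss between the input and the translated image. The sketch below is a soft, batched HOG in PyTorch; the cell size, bin count, and L1 comparison are assumptions for illustration rather than the paper's settings.

```python
import torch
import torch.nn.functional as F

def soft_hog(img, cell=8, bins=9):
    """Soft histogram of oriented gradients for grayscale images
    (B, 1, H, W). Cell size and bin count are assumed values."""
    # image gradients via simple finite differences
    gx = F.pad(img[..., :, 1:] - img[..., :, :-1], (0, 1, 0, 0))
    gy = F.pad(img[..., 1:, :] - img[..., :-1, :], (0, 0, 0, 1))
    mag = torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)
    ang = torch.atan2(gy, gx) % torch.pi          # unsigned orientation
    # soft-assign each pixel's magnitude to orientation bins
    centers = torch.arange(bins, device=img.device, dtype=img.dtype)
    centers = centers * torch.pi / bins
    weight = torch.clamp(
        1 - torch.abs(ang - centers.view(1, bins, 1, 1)) * bins / torch.pi,
        min=0)
    hist = weight * mag                            # (B, bins, H, W)
    # pool per cell to get the descriptor
    return F.avg_pool2d(hist, cell)

def hog_structure_loss(real, fake):
    """L1 distance between HOG descriptors; encourages the translation
    to preserve the structure of the input image."""
    return F.l1_loss(soft_hog(fake), soft_hog(real))
```

In use, this term would be added to the CUT-style contrastive objective, e.g. `loss = loss_cut + lambda_hog * hog_structure_loss(src_gray, fake_gray)` with an assumed weighting factor.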
arXiv Detail & Related papers (2024-09-24T12:44:27Z)
- SCONE-GAN: Semantic Contrastive learning-based Generative Adversarial Network for an end-to-end image translation [18.93434486338439]
SCONE-GAN is shown to be effective for learning to generate realistic and diverse scenery images.
For more realistic and diverse image generation, we introduce a style reference image.
We validate the proposed algorithm for image-to-image translation and stylizing outdoor images.
arXiv Detail & Related papers (2023-11-07T10:29:16Z)
- Towards Better Multi-modal Keyphrase Generation via Visual Entity Enhancement and Multi-granularity Image Noise Filtering [79.44443231700201]
Multi-modal keyphrase generation aims to produce a set of keyphrases that represent the core points of the input text-image pair.
The input text and image are often not perfectly matched, and thus the image may introduce noise into the model.
We propose a novel multi-modal keyphrase generation model, which not only enriches the model input with external knowledge, but also effectively filters image noise.
arXiv Detail & Related papers (2023-09-09T09:41:36Z)
- IR-GAN: Image Manipulation with Linguistic Instruction by Increment Reasoning [110.7118381246156]
The Increment Reasoning Generative Adversarial Network (IR-GAN) aims to reason about the consistency between the visual increment in images and the semantic increment in instructions.
First, we introduce word-level and instruction-level instruction encoders to learn the user's intention from history-correlated instructions as the semantic increment.
Second, we embed the representation of the semantic increment into that of the source image to generate the target image, where the source image serves as a referring auxiliary.
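A rough reading of this second step, with the encoded instruction injected into the source-image features, might look like the following PyTorch sketch; the GRU encoder, the dimensions, and the concatenation-based fusion are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

# Hedged sketch of embedding the semantic increment (encoded instruction)
# into the source-image representation. Vocabulary size, dimensions, and
# the 1x1-conv fusion are assumptions for illustration.

class IncrementFusion(nn.Module):
    def __init__(self, vocab_size=5000, txt_dim=256, img_ch=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, txt_dim)
        self.gru = nn.GRU(txt_dim, txt_dim, batch_first=True)  # word-level encoder
        self.fuse = nn.Conv2d(img_ch + txt_dim, img_ch, 1)     # inject increment

    def forward(self, img_feat, instruction_ids):
        # img_feat: (B, C, H, W) features of the source (referring) image
        # instruction_ids: (B, T) token ids of the current instruction
        _, h = self.gru(self.embed(instruction_ids))            # h: (1, B, txt_dim)
        sem = h[-1][:, :, None, None].expand(-1, -1, *img_feat.shape[2:])
        return self.fuse(torch.cat([img_feat, sem], dim=1))     # target features
```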
arXiv Detail & Related papers (2022-04-02T07:48:39Z)
- Fully Context-Aware Image Inpainting with a Learned Semantic Pyramid [102.24539566851809]
Restoring reasonable and realistic content for arbitrary missing regions in images is an important yet challenging task.
Recent image inpainting models have made significant progress in generating vivid visual details, but they can still lead to texture blurring or structural distortions.
We propose the Semantic Pyramid Network (SPN), motivated by the idea that learning multi-scale semantic priors can greatly benefit the recovery of locally missing content in images.
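One plausible reading of such multi-scale semantic priors is a pyramid of pooled semantic features that are projected and fused back into the decoder features. The sketch below illustrates this under assumed scales and channel sizes; it is not SPN's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sketch of the multi-scale semantic-prior idea: features
# pooled at several scales are projected and added back to guide the
# recovery of missing regions. Scales and channels are assumptions.

class SemanticPyramid(nn.Module):
    def __init__(self, ch=64, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.proj = nn.ModuleList(nn.Conv2d(ch, ch, 1) for _ in scales)

    def forward(self, feat):
        # feat: (B, C, H, W) encoder features of the masked image
        h, w = feat.shape[2:]
        priors = []
        for s, proj in zip(self.scales, self.proj):
            p = F.adaptive_avg_pool2d(feat, (h // s, w // s))
            p = F.interpolate(proj(p), size=(h, w), mode='bilinear',
                              align_corners=False)
            priors.append(p)
        return feat + torch.stack(priors).sum(0)  # fuse multi-scale priors
```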
arXiv Detail & Related papers (2021-12-08T04:33:33Z)
- Two-stage Visual Cues Enhancement Network for Referring Image Segmentation [89.49412325699537]
Referring Image Segmentation (RIS) aims at segmenting the target object in an image referred to by a given natural language expression.
In this paper, we tackle this problem by devising a Two-stage Visual cues enhancement Network (TV-Net).
Through the two-stage enhancement, our proposed TV-Net achieves better performance in learning fine-grained matching behaviors between the natural language expression and the image.
arXiv Detail & Related papers (2021-10-09T02:53:39Z)
- Exploring Explicit and Implicit Visual Relationships for Image Captioning [11.82805641934772]
In this paper, we explore explicit and implicit visual relationships to enrich region-level representations for image captioning.
Explicitly, we build a semantic graph over object pairs and exploit gated graph convolutional networks (Gated GCN) to selectively aggregate information from local neighbors.
Implicitly, we draw global interactions among the detected objects through region-based bidirectional encoder representations from transformers.
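The gated aggregation over the semantic graph might be sketched as follows; the sigmoid gating form, the residual update, and the dimensions are assumptions for illustration rather than the paper's exact formulation.

```python
import torch
import torch.nn as nn

# Minimal sketch of gated graph convolution over region features, in the
# spirit of the "Gated GCN" aggregation described above. The gating form
# and sizes are illustrative assumptions.

class GatedGCNLayer(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.msg = nn.Linear(dim, dim)       # transform neighbor features
        self.gate = nn.Linear(2 * dim, dim)  # per-edge gate from node pair

    def forward(self, x, adj):
        # x: (N, D) region features; adj: (N, N) 0/1 semantic-graph edges
        n = x.size(0)
        pair = torch.cat([x[:, None].expand(n, n, -1),
                          x[None, :].expand(n, n, -1)], dim=-1)
        g = torch.sigmoid(self.gate(pair)) * adj[..., None]  # gated edges
        agg = (g * self.msg(x)[None, :, :]).sum(1)           # neighbor sum
        return torch.relu(x + agg)                           # residual update
```

The gate lets each region decide how much of each neighbor's message to absorb, which is one way to realize "selectively aggregate" in the summary above.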
arXiv Detail & Related papers (2021-05-06T01:47:51Z)
- Learning Semantic Person Image Generation by Region-Adaptive Normalization [81.52223606284443]
We propose a new two-stage framework to handle the pose and appearance translation.
In the first stage, we predict the target semantic parsing maps to eliminate the difficulties of pose transfer.
In the second stage, we propose a new person image generation method that incorporates region-adaptive normalization.
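Region-adaptive normalization is in the spirit of spatially-adaptive normalization layers such as SPADE, where the predicted parsing map yields per-pixel scale and shift for the normalized activations. A minimal sketch, assuming illustrative channel sizes rather than the paper's configuration:

```python
import torch
import torch.nn as nn

# SPADE-style sketch: the target semantic parsing map predicts per-pixel
# scale (gamma) and shift (beta) that modulate normalized features.
# Channel sizes are illustrative assumptions.

class RegionAdaptiveNorm(nn.Module):
    def __init__(self, feat_ch=256, parsing_ch=20):
        super().__init__()
        self.norm = nn.InstanceNorm2d(feat_ch, affine=False)
        self.shared = nn.Sequential(
            nn.Conv2d(parsing_ch, 128, 3, padding=1),
            nn.ReLU(inplace=True))
        self.gamma = nn.Conv2d(128, feat_ch, 3, padding=1)
        self.beta = nn.Conv2d(128, feat_ch, 3, padding=1)

    def forward(self, feat, parsing):
        # parsing: one-hot semantic parsing map at feat's resolution
        h = self.shared(parsing)
        return self.norm(feat) * (1 + self.gamma(h)) + self.beta(h)
```

Conditioning the scale and shift on the parsing map lets each semantic region (hair, skin, clothing) receive its own modulation, which is what makes the normalization region-adaptive.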
arXiv Detail & Related papers (2021-04-14T06:51:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.