ReDirTrans: Latent-to-Latent Translation for Gaze and Head Redirection
- URL: http://arxiv.org/abs/2305.11452v1
- Date: Fri, 19 May 2023 06:13:26 GMT
- Title: ReDirTrans: Latent-to-Latent Translation for Gaze and Head Redirection
- Authors: Shiwei Jin, Zhen Wang, Lei Wang, Ning Bi, Truong Nguyen
- Abstract summary: Learning-based gaze estimation methods require large amounts of training data with accurate gaze annotations.
We present a portable network, called ReDirTrans, achieving latent-to-latent translation for redirecting gaze directions.
We also present improvements for the downstream learning-based gaze estimation task, using redirected samples as dataset augmentation.
- Score: 12.474515318770237
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning-based gaze estimation methods require large amounts of training data
with accurate gaze annotations. Facing such demanding requirements of gaze data
collection and annotation, several image synthesis methods were proposed, which
successfully redirected gaze directions precisely given the assigned
conditions. However, these methods focused on changing gaze directions of the
images that only include eyes or restricted ranges of faces with low resolution
(less than $128\times128$) to largely reduce interference from other attributes
such as hair, which limits application scenarios. To cope with this
limitation, we propose a portable network, called ReDirTrans, that achieves
latent-to-latent translation for redirecting gaze directions and head
orientations in an interpretable manner. ReDirTrans projects input latent
vectors into aimed-attribute embeddings only and redirects these embeddings
with assigned pitch and yaw values. Then both the initial and edited embeddings
are projected back (deprojected) to the initial latent space as residuals to
modify the input latent vectors by subtraction and addition, representing old
status removal and new status addition. The projection of aimed attributes only
and subtraction-addition operations for status replacement essentially mitigate
impacts on other attributes and the distribution of latent vectors. Thus, by
combining ReDirTrans with a pretrained fixed e4e-StyleGAN pair, we created
ReDirTrans-GAN, which enables accurately redirecting gaze in full-face images
with $1024\times1024$ resolution while preserving other attributes such as
identity, expression, and hairstyle. Furthermore, we present improvements for
the downstream learning-based gaze estimation task, using redirected samples as
dataset augmentation.
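The editing mechanism described in the abstract (project the latent code to an attribute-only embedding, redirect that embedding with assigned pitch and yaw, then deproject both the old and new embeddings back as residuals that are subtracted and added) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the projector `P`, deprojector `D`, embedding size, and the blockwise 3-D rotation are all stand-in assumptions; in the paper these components are learned.

```python
import numpy as np

LATENT_DIM, N_VEC = 512, 6          # embedding = 6 stacked 3-D vectors (illustrative choice)
rng = np.random.default_rng(0)

# Hypothetical stand-ins for the learned projector P and deprojector D.
P = rng.standard_normal((N_VEC * 3, LATENT_DIM)) * 0.05
D = rng.standard_normal((LATENT_DIM, N_VEC * 3)) * 0.05

def rot(pitch, yaw):
    """Rotation matrix from pitch (about x-axis) and yaw (about y-axis)."""
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    return Ry @ Rx

def redirect(w, pitch, yaw):
    """Edit latent w by old-status removal and new-status addition."""
    e_old = P @ w                                   # project to attribute embedding only
    # Redirect the embedding by rotating its stacked 3-D vectors:
    e_new = (rot(pitch, yaw) @ e_old.reshape(N_VEC, 3).T).T.reshape(-1)
    # Deproject both embeddings back to latent space as residuals:
    return w - D @ e_old + D @ e_new               # subtraction removes, addition replaces

w = rng.standard_normal(LATENT_DIM)
w_edited = redirect(w, pitch=0.1, yaw=-0.2)
assert w_edited.shape == w.shape
# A zero-degree redirection cancels exactly, leaving the latent unchanged:
assert np.allclose(redirect(w, 0.0, 0.0), w)
```

Because only the aimed-attribute embedding is touched and the residuals cancel when no redirection is applied, the rest of the latent code (identity, hairstyle, etc.) is left largely intact, which is the property the abstract emphasizes.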
Related papers
- PseudoNeg-MAE: Self-Supervised Point Cloud Learning using Conditional Pseudo-Negative Embeddings [55.55445978692678]
PseudoNeg-MAE enhances global feature representation of point cloud masked autoencoders by making them both discriminative and sensitive to transformations. We propose a novel loss that explicitly penalizes invariant collapse, enabling the network to capture richer transformation cues while preserving discriminative representations.
arXiv Detail & Related papers (2024-09-24T07:57:21Z)
- Adaptive Nonlinear Latent Transformation for Conditional Face Editing [40.32385363670918]
We propose a novel adaptive nonlinear latent transformation for disentangled and conditional face editing, termed AdaTrans.
AdaTrans divides the manipulation process into several finer steps; i.e., the direction and size at each step are conditioned on both the facial attributes and the latent codes.
AdaTrans enables a controllable face editing with the advantages of disentanglement, flexibility with non-binary attributes, and high fidelity.
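The multi-step manipulation that AdaTrans describes, where the direction and step size of each edit are recomputed from both the target attributes and the current latent code, can be illustrated with a toy loop. This is a hedged sketch of the general idea only: the `predict_step` function below is a hypothetical stand-in (here it simply shrinks toward a target code), whereas AdaTrans learns this predictor from facial attributes.

```python
import numpy as np

LATENT_DIM = 512
rng = np.random.default_rng(1)

def predict_step(w, target):
    """Hypothetical per-step predictor: direction and size are conditioned
    on both the current latent code and the target (stand-in logic)."""
    residual = target - w
    direction = residual / (np.linalg.norm(residual) + 1e-8)
    size = 0.3 * np.linalg.norm(residual)   # take 30% of the remaining distance
    return direction, size

def adatrans_edit(w, target, n_steps=10):
    """Nonlinear editing path: each step re-evaluates direction and size,
    unlike a single fixed linear direction applied once."""
    for _ in range(n_steps):
        d, s = predict_step(w, target)
        w = w + s * d
    return w

w0 = rng.standard_normal(LATENT_DIM)
target = rng.standard_normal(LATENT_DIM)
w_edit = adatrans_edit(w0, target)
# The iterative path ends closer to the target than where it started:
assert np.linalg.norm(w_edit - target) < np.linalg.norm(w0 - target)
```

The design point is that a curved, state-dependent path through latent space can respect entangled attribute geometry better than one global linear offset.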
arXiv Detail & Related papers (2023-07-15T12:36:50Z)
- LatentGaze: Cross-Domain Gaze Estimation through Gaze-Aware Analytic Latent Code Manipulation [0.0]
We propose a gaze-aware analytic manipulation method, based on a data-driven approach that exploits the disentanglement characteristics of generative adversarial network inversion.
By utilizing a GAN-based encoder-generator process, we shift the input image from the target domain to the source domain, with which a gaze estimator is sufficiently familiar.
arXiv Detail & Related papers (2022-09-21T08:05:53Z) - Towards Self-Supervised Gaze Estimation [32.91601919228028]
We propose SwAT, an equivariant version of the online clustering-based self-supervised approach SwAV, to learn more informative representations for gaze estimation.
We achieve up to 57% and 25% improvements in cross-dataset and within-dataset evaluation tasks on existing benchmarks.
arXiv Detail & Related papers (2022-03-21T13:35:16Z) - CUDA-GR: Controllable Unsupervised Domain Adaptation for Gaze
Redirection [3.0141238193080295]
The aim of gaze redirection is to manipulate the gaze in an image to the desired direction.
Advancement in generative adversarial networks has shown excellent results in generating photo-realistic images.
To enable such fine-tuned control, one needs to obtain ground truth annotations for the training data which can be very expensive.
arXiv Detail & Related papers (2021-06-21T04:39:42Z) - Do Generative Models Know Disentanglement? Contrastive Learning is All
You Need [59.033559925639075]
We propose an unsupervised and model-agnostic method: Disentanglement via Contrast (DisCo) in the Variation Space.
DisCo achieves the state-of-the-art disentanglement given pretrained non-disentangled generative models, including GAN, VAE, and Flow.
arXiv Detail & Related papers (2021-02-21T08:01:20Z) - Weakly-Supervised Cross-Domain Adaptation for Endoscopic Lesions
Segmentation [79.58311369297635]
We propose a new weakly-supervised lesions transfer framework, which can explore transferable domain-invariant knowledge across different datasets.
A Wasserstein-quantified transferability framework is developed to highlight wide-range transferable contextual dependencies.
A novel self-supervised pseudo label generator is designed to equally provide confident pseudo pixel labels for both hard-to-transfer and easy-to-transfer target samples.
arXiv Detail & Related papers (2020-12-08T02:26:03Z)
- Unsupervised Discovery of Disentangled Manifolds in GANs [74.24771216154105]
Interpretable generation process is beneficial to various image editing applications.
We propose a framework to discover interpretable directions in the latent space given arbitrary pre-trained generative adversarial networks.
arXiv Detail & Related papers (2020-11-24T02:18:08Z)
- Self-Learning Transformations for Improving Gaze and Head Redirection [49.61091281780071]
We propose a novel generative model for images of faces, that is capable of producing high-quality images under fine-grained control over eye gaze and head orientation angles.
This requires disentangling many appearance-related factors, including gaze and head orientation as well as lighting, hue, etc.
We show that explicitly disentangling task-irrelevant factors results in more accurate modelling of gaze and head orientation.
arXiv Detail & Related papers (2020-10-23T11:18:37Z)
- Adversarial Semantic Data Augmentation for Human Pose Estimation [96.75411357541438]
We propose Semantic Data Augmentation (SDA), a method that augments images by pasting segmented body parts with various semantic granularity.
We also propose Adversarial Semantic Data Augmentation (ASDA), which exploits a generative network to dynamically predict a tailored pasting configuration.
State-of-the-art results are achieved on challenging benchmarks.
arXiv Detail & Related papers (2020-08-03T07:56:04Z)
- Coarse-to-Fine Gaze Redirection with Numerical and Pictorial Guidance [74.27389895574422]
We propose a novel gaze redirection framework which exploits both a numerical and a pictorial direction guidance.
The proposed method outperforms the state-of-the-art approaches in terms of both image quality and redirection precision.
arXiv Detail & Related papers (2020-04-07T01:17:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.