Photorealistic and Identity-Preserving Image-Based Emotion Manipulation
with Latent Diffusion Models
- URL: http://arxiv.org/abs/2308.03183v1
- Date: Sun, 6 Aug 2023 18:28:26 GMT
- Authors: Ioannis Pikoulis, Panagiotis P. Filntisis, Petros Maragos
- Abstract summary: We investigate the emotion manipulation capabilities of diffusion models with "in-the-wild" images.
We conduct extensive evaluations on AffectNet, demonstrating the superiority of our approach in terms of image quality and realism.
- Score: 31.55798962786664
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we investigate the emotion manipulation capabilities of
diffusion models with "in-the-wild" images, a rather unexplored application
area relative to the vast and rapidly growing literature for image-to-image
translation tasks. Our proposed method encapsulates several pieces of prior
work, with the most important being Latent Diffusion models and text-driven
manipulation with CLIP latents. We conduct extensive qualitative and
quantitative evaluations on AffectNet, demonstrating the superiority of our
approach in terms of image quality and realism, while achieving
emotion-translation results competitive with a variety of GAN-based
counterparts. Code is released in a publicly available repository.
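The overall recipe the abstract describes (a latent-diffusion backbone steered by a CLIP text direction) can be caricatured with linear stand-ins. Everything below is a toy sketch: `W`/`W_inv` stand in for the VAE encoder/decoder, the `-0.1 * z` term for the learned denoiser, and `clip_dir` for a CLIP text-embedding direction (e.g. "happy face" minus "neutral face"); none of these names or values come from the paper itself.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IMG, D_LAT = 16, 4
W = rng.normal(size=(D_IMG, D_LAT)) / np.sqrt(D_IMG)  # stand-in VAE encoder
W_inv = np.linalg.pinv(W)                             # stand-in VAE decoder

def edit_emotion(image, clip_dir, t0=0.5, steps=20, guidance=0.2):
    z = image @ W                                      # encode to the latent space
    z = (1 - t0) * z + t0 * rng.normal(size=z.shape)   # partial forward noising
    for _ in range(steps):
        denoise = -0.1 * z                             # stand-in for the learned denoiser
        z = z + denoise + guidance * clip_dir          # steer toward the target emotion
    return z @ W_inv                                   # decode back to image space

image = rng.normal(size=D_IMG)
clip_dir = rng.normal(size=D_LAT)                      # stand-in CLIP text direction
edited = edit_emotion(image, clip_dir)
```

The point of the sketch is only the shape of the loop: encode, partially noise, denoise while nudging along a text-derived direction, then decode.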
Related papers
- DILLEMA: Diffusion and Large Language Models for Multi-Modal Augmentation [0.13124513975412253]
We present a novel framework for testing vision neural networks that leverages Large Language Models and control-conditioned Diffusion Models.
Our approach begins by translating images into detailed textual descriptions using a captioning model.
These descriptions are then used to produce new test images through a text-to-image diffusion process.
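The three-stage pipeline just described (caption the image, rewrite the caption, regenerate) can be sketched with placeholders; `caption_model`, `edit_with_llm`, and `text_to_image` are hypothetical stand-ins for a real captioner, LLM, and control-conditioned diffusion model, and the example strings are invented.

```python
def caption_model(image):
    # stand-in for an image captioning model
    return "a red car on a sunny street"

def edit_with_llm(caption, condition):
    # stand-in for an LLM rewriting the caption under a test condition
    return caption.replace("sunny", condition)

def text_to_image(prompt):
    # stand-in for a control-conditioned diffusion model; returns the
    # prompt as a placeholder artifact instead of pixels
    return {"prompt": prompt}

def augment(image, conditions):
    """Caption the image, rewrite the caption per condition, regenerate."""
    caption = caption_model(image)
    return [text_to_image(edit_with_llm(caption, c)) for c in conditions]

test_images = augment(None, ["rainy", "foggy"])
```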
arXiv Detail & Related papers (2025-02-05T16:35:42Z)
- Weak Supervision Dynamic KL-Weighted Diffusion Models Guided by Large Language Models [0.0]
We present a novel method for improving text-to-image generation by combining Large Language Models with diffusion models.
Our approach incorporates semantic understanding from pre-trained LLMs to guide the generation process.
Our method significantly improves both the visual quality and alignment of generated images with text descriptions.
arXiv Detail & Related papers (2025-02-02T15:43:13Z)
- DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling [6.7206291284535125]
We present an effective data augmentation framework leveraging the Large Language Model (LLM) and Diffusion Model (DM)
Our approach addresses the issue of increasing the diversity of synthetic images.
Our method produces synthetic images with enhanced diversity while maintaining adherence to the target distribution.
arXiv Detail & Related papers (2024-09-25T14:02:43Z)
- Coherent and Multi-modality Image Inpainting via Latent Space Optimization [61.99406669027195]
PILOT (inPainting vIa Latent OpTimization) is an optimization approach grounded on a novel semantic centralization and background preservation loss.
Our method searches latent spaces capable of generating inpainted regions that exhibit high fidelity to user-provided prompts while maintaining coherence with the background.
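A minimal sketch of that idea: gradient descent in latent space balancing a prompt-fidelity term inside the inpainting mask against a background-preservation term outside it. The quadratic losses, the mask, and all variable names are illustrative assumptions, not PILOT's actual objective.

```python
import numpy as np

rng = np.random.default_rng(0)
z_bg = rng.normal(size=8)    # latent of the original image (background)
z_tgt = rng.normal(size=8)   # latent that matches the user prompt
mask = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=float)  # 1 = region to inpaint

def grad(z, lam=1.0):
    # gradient of: sum(mask*(z - z_tgt)^2) + lam * sum((1-mask)*(z - z_bg)^2)
    fidelity = 2 * mask * (z - z_tgt)             # match the prompt inside the mask
    preserve = 2 * lam * (1 - mask) * (z - z_bg)  # keep the background outside it
    return fidelity + preserve

z = z_bg.copy()              # start from the original latent
for _ in range(200):
    z -= 0.05 * grad(z)      # plain gradient descent in latent space
```

After optimization the masked components move to the prompt latent while the unmasked components stay pinned to the background, which is the coherence/fidelity trade-off the entry describes.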
arXiv Detail & Related papers (2024-07-10T19:58:04Z)
- Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling [49.41822427811098]
We present Kaleido, a novel approach that enhances the diversity of samples by incorporating autoregressive latent priors.
Kaleido integrates an autoregressive language model that encodes the original caption and generates latent variables.
We show that Kaleido adheres closely to the guidance provided by the generated latent variables, demonstrating its capability to effectively control and direct the image generation process.
arXiv Detail & Related papers (2024-05-31T17:41:11Z)
- Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers [120.49126407479717]
This paper explores text-to-image diffusion models for Zero-Shot Sketch-based Image Retrieval (ZS-SBIR)
We highlight a pivotal discovery: the capacity of text-to-image diffusion models to seamlessly bridge the gap between sketches and photos.
arXiv Detail & Related papers (2024-03-12T00:02:03Z)
- Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis [65.7968515029306]
We propose a novel Coarse-to-Fine Latent Diffusion (CFLD) method for Pose-Guided Person Image Synthesis (PGPIS)
A perception-refined decoder is designed to progressively refine a set of learnable queries and extract semantic understanding of person images as a coarse-grained prompt.
arXiv Detail & Related papers (2024-02-28T06:07:07Z)
- Improving Diffusion-based Image Translation using Asymmetric Gradient Guidance [51.188396199083336]
We present an approach that guides the reverse process of diffusion sampling by applying asymmetric gradient guidance.
Our model's adaptability allows it to be implemented with both image-fusion and latent-diffusion models.
Experiments show that our method outperforms various state-of-the-art models in image translation tasks.
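One way to picture asymmetric gradient guidance is a reverse-sampling step whose source-preservation and target-alignment gradients carry unequal weights. The update rule, the weights, and all names below are an illustrative assumption, not the paper's exact formulation.

```python
import numpy as np

def guided_step(x, score_fn, grad_src, grad_tgt, step=0.05, w_src=0.3, w_tgt=1.0):
    """One reverse step with asymmetrically weighted guidance.

    grad_src pulls toward the source image (structure preservation) and
    grad_tgt toward the target description; the unequal weights
    w_src < w_tgt are what makes the guidance asymmetric in this toy.
    """
    return x + step * (score_fn(x) + w_src * grad_src(x) + w_tgt * grad_tgt(x))

src = np.zeros(4)            # stand-in source feature
tgt = np.ones(4)             # stand-in target feature
x = np.full(4, 0.5)
for _ in range(200):
    x = guided_step(x,
                    lambda v: 0.0 * v,   # stand-in model score (zero in this toy)
                    lambda v: src - v,   # pull toward the source
                    lambda v: tgt - v)   # pull toward the target
```

With equal weights the iterate would settle midway between source and target; the heavier target weight moves the fixed point toward the target, which is the intuition behind weighting the two guidance terms asymmetrically.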
arXiv Detail & Related papers (2023-06-07T12:56:56Z)
- Semantic Image Synthesis via Diffusion Models [159.4285444680301]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks.
Recent work on semantic image synthesis mainly follows the de facto Generative Adversarial Nets (GANs).
arXiv Detail & Related papers (2022-06-30T18:31:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.