Improving Personalized Image Generation through Social Context Feedback
- URL: http://arxiv.org/abs/2507.16095v1
- Date: Mon, 21 Jul 2025 22:36:30 GMT
- Title: Improving Personalized Image Generation through Social Context Feedback
- Authors: Parul Gupta, Abhinav Dhall, Thanh-Toan Do
- Abstract summary: We propose to overcome these shortcomings through feedback-based fine-tuning of existing personalized generation methods. State-of-the-art detectors of pose, human-object interaction, human facial recognition and human gaze-point estimation are used to refine the diffusion model. The images generated in this manner show an improvement in the generated interactions, facial identities and image quality over three benchmark datasets.
- Score: 15.582260415127935
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Personalized image generation, where reference images of one or more subjects are used to generate their image according to a scene description, has gathered significant interest in the community. However, such generated images suffer from three major limitations: complex activities, such as <man, pushing, motorcycle>, are not generated properly and exhibit incorrect human poses; reference human identities are not preserved; and generated human gaze patterns are unnatural or inconsistent with the scene description. In this work, we propose to overcome these shortcomings through feedback-based fine-tuning of existing personalized generation methods, wherein state-of-the-art detectors of pose, human-object interaction, human facial recognition and human gaze-point estimation are used to refine the diffusion model. We also propose timestep-based incorporation of the different feedback modules, depending on whether the signal is low-level (such as human pose) or high-level (such as gaze point). The images generated in this manner show an improvement in the generated interactions, facial identities and image quality over three benchmark datasets.
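As a rough illustration of the feedback loop the abstract describes, the sketch below gates detector-based losses by diffusion timestep. The detector stubs, the split point `t_mid`, and the assignment of low-level versus high-level signals to early versus late timesteps are all assumptions for illustration, not the paper's exact design.

```python
import torch

# Stub detector scores; in practice these would wrap pretrained pose,
# human-object-interaction, face-recognition and gaze-estimation models
# (hypothetical placeholders, not the paper's actual detectors).
def pose_feedback(x0):     return x0.float().mean()        # low-level signal
def hoi_feedback(x0):      return x0.float().std()         # high-level signal
def face_id_feedback(x0):  return x0.float().abs().mean()  # high-level signal
def gaze_feedback(x0):     return x0.float().var()         # high-level signal

def timestep_gated_feedback(x0_pred: torch.Tensor, t: int,
                            t_mid: int = 500) -> torch.Tensor:
    """Combine detector feedback on the predicted clean image x0_pred,
    selecting modules by diffusion timestep t (assumed schedule: low-level
    pose feedback at noisy early steps, high-level interaction/identity/gaze
    feedback at detail-refining late steps)."""
    loss = torch.zeros((), device=x0_pred.device)
    if t >= t_mid:                       # coarse structure still forming
        loss = loss + pose_feedback(x0_pred)
    else:                                # fine details already present
        loss = loss + hoi_feedback(x0_pred)
        loss = loss + face_id_feedback(x0_pred)
        loss = loss + gaze_feedback(x0_pred)
    return loss

# Usage during fine-tuning: decode the model's x0 estimate at timestep t and
# add timestep_gated_feedback(x0_pred, t) to the training loss.
```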
Related papers
- Realistic Clothed Human and Object Joint Reconstruction from a Single Image [26.57698106821237]
We introduce a novel implicit approach for jointly reconstructing realistic 3D clothed humans and objects from a monocular view. For the first time, we model both the human and the object with an implicit representation, allowing us to capture more realistic details such as clothing.
arXiv Detail & Related papers (2025-02-25T12:26:36Z)
- MoLE: Enhancing Human-centric Text-to-image Diffusion via Mixture of Low-rank Experts [61.274246025372044]
We study human-centric text-to-image generation in the context of faces and hands.
We propose a method called Mixture of Low-rank Experts (MoLE) by considering low-rank modules trained on close-up hand and face images respectively as experts.
This concept draws inspiration from our observation of low-rank refinement, where a low-rank module trained on a customized close-up dataset has the potential to enhance the corresponding image part when applied at an appropriate scale.
arXiv Detail & Related papers (2024-10-30T17:59:57Z)
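To make the MoLE summary above concrete, here is a toy mixture of low-rank experts: two LoRA-style updates (imagine one trained on close-up faces, one on hands) added to a frozen linear layer under a learned mixing weight. All names, the rank, and the global gating scheme are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class LoRAExpert(nn.Module):
    """One low-rank update delta_W = B @ A to a frozen weight."""
    def __init__(self, in_f: int, out_f: int, rank: int = 4):
        super().__init__()
        self.A = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_f, rank))  # zero init: no-op at start

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ self.A.T @ self.B.T

class MixtureOfLowRankExperts(nn.Module):
    """Frozen base linear plus a softly weighted sum of LoRA experts
    (e.g., a 'face' expert and a 'hand' expert). A single learned gate
    stands in for whatever gating/scaling the paper actually uses."""
    def __init__(self, base: nn.Linear, num_experts: int = 2, rank: int = 4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)          # keep pretrained weights frozen
        self.experts = nn.ModuleList(
            LoRAExpert(base.in_features, base.out_features, rank)
            for _ in range(num_experts))
        self.gate = nn.Parameter(torch.zeros(num_experts))  # mixing logits

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = torch.softmax(self.gate, dim=0)  # per-expert scales
        delta = sum(wi * e(x) for wi, e in zip(w, self.experts))
        return self.base(x) + delta

# e.g., wrap an attention projection of a diffusion U-Net:
layer = MixtureOfLowRankExperts(nn.Linear(320, 320))
out = layer(torch.randn(1, 77, 320))
```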
- Single Image, Any Face: Generalisable 3D Face Generation [59.9369171926757]
We propose a novel model, Gen3D-Face, which generates 3D human faces from unconstrained single-image input. To the best of our knowledge, this is the first attempt and benchmark for creating photorealistic 3D human face avatars from single images.
arXiv Detail & Related papers (2024-09-25T14:56:37Z)
- HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance [80.97360194728705]
AbHuman is the first large-scale synthesized human benchmark focusing on anatomical anomalies.
HumanRefiner is a novel plug-and-play approach for the coarse-to-fine refinement of human anomalies in text-to-image generation.
arXiv Detail & Related papers (2024-07-09T15:14:41Z)
- From Parts to Whole: A Unified Reference Framework for Controllable Human Image Generation [19.096741614175524]
Parts2Whole is a novel framework designed for generating customized portraits from multiple reference images.
We first develop a semantic-aware appearance encoder to retain details of different human parts.
Second, our framework supports multi-image conditioned generation through a shared self-attention mechanism.
arXiv Detail & Related papers (2024-04-23T17:56:08Z)
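As a sketch of the "shared self-attention" idea in the Parts2Whole summary above: target-image tokens attend over a concatenation of themselves and reference-image tokens, so reference appearance can flow into the generation. The shapes, projection layers, and single-head form are simplifying assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedSelfAttention(nn.Module):
    """Single-head toy version: keys/values come from target tokens
    concatenated with reference-image tokens (multi-image conditioning)."""
    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, target: torch.Tensor, refs: torch.Tensor) -> torch.Tensor:
        # target: (B, Nt, D); refs: (B, Nr, D) tokens from reference images
        kv = torch.cat([target, refs], dim=1)   # shared key/value pool
        return F.scaled_dot_product_attention(
            self.q(target), self.k(kv), self.v(kv))  # (B, Nt, D)

attn = SharedSelfAttention(dim=320)
out = attn(torch.randn(2, 64, 320), torch.randn(2, 128, 320))
```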
- Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation [24.49857926071974]
Vanilla text-to-image diffusion models struggle with generating accurate human images.
Existing methods address this issue mostly by fine-tuning the model with extra images or adding additional controls.
This paper explores the integration of human-centric priors directly into the model fine-tuning stage.
arXiv Detail & Related papers (2024-03-08T11:59:32Z)
- HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion [114.15397904945185]
We propose a unified framework, HyperHuman, that generates in-the-wild human images of high realism and diverse layouts.
Our model enforces the joint learning of image appearance, spatial relationship, and geometry in a unified network.
Our framework yields state-of-the-art performance, generating hyper-realistic human images under diverse scenarios.
arXiv Detail & Related papers (2023-10-12T17:59:34Z)
- Semantically Consistent Person Image Generation [18.73832646369506]
We propose a data-driven approach for context-aware person image generation. In our method, the position, scale, and appearance of the generated person are semantically conditioned on the existing persons in the scene.
arXiv Detail & Related papers (2023-02-28T16:34:55Z)
- Neural Novel Actor: Learning a Generalized Animatable Neural Representation for Human Actors [98.24047528960406]
We propose a new method for learning a generalized animatable neural representation from a sparse set of multi-view imagery of multiple persons.
The learned representation can be used to synthesize novel view images of an arbitrary person from a sparse set of cameras, and further animate them with the user's pose control.
arXiv Detail & Related papers (2022-08-25T07:36:46Z)
- Generating Person Images with Appearance-aware Pose Stylizer [66.44220388377596]
We present a novel end-to-end framework to generate realistic person images based on given person poses and appearances.
The core of our framework is a novel generator called Appearance-aware Pose Stylizer (APS) which generates human images by coupling the target pose with the conditioned person appearance progressively.
arXiv Detail & Related papers (2020-07-17T15:58:05Z)