Related papers: Human-Aligned Generative Perception: Bridging Psychophysics and Generative Models

URL: http://arxiv.org/abs/2512.22272v1
Date: Thu, 25 Dec 2025 01:26:11 GMT
Title: Human-Aligned Generative Perception: Bridging Psychophysics and Generative Models
Authors: Antara Titikhsha, Om Kulkarni, Dharun Muthaiah
Abstract summary: This paper investigates whether geometric understanding can be introduced without specialized training by using lightweight, off-the-shelf discriminators as external guidance signals. We propose a Human Perception Embedding (HPE) teacher trained on the THINGS triplet dataset, which captures human sensitivity to object shape. Our results show that small teacher models can reliably guide large generative systems, enabling stronger geometric control and broadening the creative range of text-to-image synthesis.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Text-to-image diffusion models generate highly detailed textures, yet they often rely on surface appearance and fail to follow strict geometric constraints, particularly when those constraints conflict with the style implied by the text prompt. This reflects a broader semantic gap between human perception and current generative models. We investigate whether geometric understanding can be introduced without specialized training by using lightweight, off-the-shelf discriminators as external guidance signals. We propose a Human Perception Embedding (HPE) teacher trained on the THINGS triplet dataset, which captures human sensitivity to object shape. By injecting gradients from this teacher into the latent diffusion process, we show that geometry and style can be separated in a controllable manner. We evaluate this approach across three architectures: Stable Diffusion v1.5 with a U-Net backbone, the flow-matching model SiT-XL/2, and the diffusion transformer PixArt-Σ. Our experiments reveal that flow models tend to drift back toward their default trajectories without continuous guidance, and we demonstrate zero-shot transfer of complex three-dimensional shapes, such as an Eames chair, onto conflicting materials such as pink metal. This guided generation improves semantic alignment by about 80 percent compared to unguided baselines. Overall, our results show that small teacher models can reliably guide large generative systems, enabling stronger geometric control and broadening the creative range of text-to-image synthesis.
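The gradient-injection idea in the abstract can be illustrated with a deliberately tiny sketch. This is not the authors' implementation: the "sampler" is a hypothetical update that pulls a 2-D latent toward the model's default output, and the "teacher" is an assumed quadratic loss standing in for the HPE model. All names (`toy_denoise_step`, `teacher_grad`, `guide_until`) are invented for illustration. It only shows the general pattern of subtracting a scaled teacher gradient after each sampler step, and why guidance that stops early lets the latent drift back toward the default trajectory.

```python
from typing import List, Optional

Vector = List[float]


def toy_denoise_step(z: Vector, prior_mode: Vector, rate: float = 0.2) -> Vector:
    """Hypothetical sampler update: move the latent toward the model's default output."""
    return [zi + rate * (pi - zi) for zi, pi in zip(z, prior_mode)]


def teacher_grad(z: Vector, target: Vector) -> Vector:
    """Gradient of the assumed teacher loss 0.5 * ||z - target||^2 with respect to z."""
    return [zi - ti for zi, ti in zip(z, target)]


def sample(
    z0: Vector,
    prior_mode: Vector,
    target: Vector,
    steps: int = 50,
    scale: float = 0.3,
    guide_until: Optional[int] = None,
) -> Vector:
    """Run the toy sampler, injecting the teacher gradient for the first `guide_until` steps."""
    guide_until = steps if guide_until is None else guide_until
    z = list(z0)
    for t in range(steps):
        z = toy_denoise_step(z, prior_mode)
        if t < guide_until:
            g = teacher_grad(z, target)
            # Guidance: step against the teacher's loss gradient.
            z = [zi - scale * gi for zi, gi in zip(z, g)]
    return z


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


z0 = [0.0, 0.0]
prior_mode = [1.0, 0.0]    # where the unguided model ends up
shape_target = [0.0, 1.0]  # what the teacher prefers

unguided = sample(z0, prior_mode, shape_target, scale=0.0)
guided = sample(z0, prior_mode, shape_target, scale=0.3)
early_only = sample(z0, prior_mode, shape_target, scale=0.3, guide_until=25)

for name, z in [("unguided", unguided), ("guided", guided), ("early_only", early_only)]:
    print(name, round(sq_dist(z, shape_target), 3))
```

In this toy setting the continuously guided run settles between the default output and the teacher's target, while the run guided only for the first 25 steps ends almost where the unguided run does. A real system would replace the quadratic loss with gradients backpropagated through a learned teacher and the toy update with an actual diffusion or flow sampler.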

Related papers

StdGEN++: A Comprehensive System for Semantic-Decomposed 3D Character Generation [57.06461272772509]
StdGEN++ is a novel and comprehensive system for generating high-fidelity, semantically decomposed 3D characters from diverse inputs. It achieves state-of-the-art performance, significantly outperforming existing methods in geometric accuracy and semantic disentanglement. The resulting structural independence unlocks advanced downstream capabilities, including non-destructive editing, physics-compliant animation, and gaze tracking.
arXiv Detail & Related papers (2026-01-12T15:41:27Z)
Joint Geometry-Appearance Human Reconstruction in a Unified Latent Space via Bridge Diffusion [57.09673862519791]
This paper introduces JGA-LBD, a novel framework that unifies the modeling of geometry and appearance into a joint latent representation. Experiments demonstrate that JGA-LBD outperforms current state-of-the-art approaches in terms of both geometry fidelity and appearance quality.
arXiv Detail & Related papers (2026-01-01T12:48:56Z)
PointDico: Contrastive 3D Representation Learning Guided by Diffusion Models [5.077352707415241]
PointDico learns from both denoising generative modeling and cross-modal contrastive learning through knowledge distillation. PointDico achieves a new state-of-the-art in 3D representation learning, e.g., 94.32% accuracy on ScanObjectNN and 86.5% Inst. mIoU on ShapeNetPart.
arXiv Detail & Related papers (2025-12-09T07:57:56Z)
Generative Human Geometry Distribution [49.58025398670139]
We build upon Geometry distributions, a recently proposed representation that can model a single human geometry with high fidelity. We propose a new geometry distribution model by two key techniques: encoding distributions as 2D feature maps rather than network parameters, and using SMPL models as the domain instead of Gaussian. Experimental results demonstrate that our method outperforms existing state-of-the-art methods, achieving a 57% improvement in geometry quality.
arXiv Detail & Related papers (2025-03-03T11:55:19Z)
JADE: Joint-aware Latent Diffusion for 3D Human Generative Modeling [62.77347895550087]
We introduce JADE, a generative framework that learns the variations of human shapes with fine-grained control. Our key insight is a joint-aware latent representation that decomposes human bodies into skeleton structures. To generate coherent and plausible human shapes under our proposed decomposition, we also present a cascaded pipeline.
arXiv Detail & Related papers (2024-12-29T14:18:35Z)
MultiGO: Towards Multi-level Geometry Learning for Monocular 3D Textured Human Reconstruction [4.457326808146675]
This paper investigates the research task of reconstructing the 3D clothed body from a monocular image. Existing approaches leverage pre-trained SMPL(-X) estimation models or generative models to provide auxiliary information for human reconstruction. We propose a multi-level geometry learning framework. Technically, we design three key components: skeleton-level enhancement, joint-level augmentation, and wrinkle-level refinement modules.
arXiv Detail & Related papers (2024-12-04T08:06:06Z)
A Lesson in Splats: Teacher-Guided Diffusion for 3D Gaussian Splats Generation with 2D Supervision [65.33043028101471]
We present a novel framework for training 3D image-conditioned diffusion models using only 2D supervision. Most existing 3D generative models rely on full 3D supervision, which is impractical due to the scarcity of large-scale 3D datasets.
arXiv Detail & Related papers (2024-12-01T00:29:57Z)
What Secrets Do Your Manifolds Hold? Understanding the Local Geometry of Generative Models [17.273596999339077]
We study the local geometry of the learned manifold and its relationship to generation outcomes for a wide range of generative models. We provide quantitative and qualitative evidence showing that for a given latent-image pair, the local descriptors are indicative of generation aesthetics, diversity, and memorization by the generative model.
arXiv Detail & Related papers (2024-08-15T17:59:06Z)
Sketch2Human: Deep Human Generation with Disentangled Geometry and Appearance Control [27.23770287587972]
This work presents Sketch2Human, the first system for controllable full-body human image generation guided by a semantic sketch. We present a sketch encoder trained with a large synthetic dataset sampled from StyleGAN-Human's latent space. Although our method is trained with synthetic data, it can handle hand-drawn sketches as well.
arXiv Detail & Related papers (2024-04-24T14:24:57Z)
GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image [94.56927147492738]
We introduce GeoWizard, a new generative foundation model designed for estimating geometric attributes from single images. We show that leveraging diffusion priors can markedly improve generalization, detail preservation, and efficiency in resource usage. We propose a simple yet effective strategy to segregate the complex data distribution of various scenes into distinct sub-distributions.
arXiv Detail & Related papers (2024-03-18T17:50:41Z)
Surf-D: Generating High-Quality Surfaces of Arbitrary Topologies Using Diffusion Models [83.35835521670955]
Surf-D is a novel method for generating high-quality 3D shapes as Surfaces with arbitrary topologies. We use the Unsigned Distance Field (UDF) as our surface representation to accommodate arbitrary topologies. We also propose a new pipeline that employs a point-based AutoEncoder to learn a compact and continuous latent space for accurately encoding UDF.
arXiv Detail & Related papers (2023-11-28T18:56:01Z)
Learning Versatile 3D Shape Generation with Improved AR Models [91.87115744375052]
Auto-regressive (AR) models have achieved impressive results in 2D image generation by modeling joint distributions in the grid space. We propose the Improved Auto-regressive Model (ImAM) for 3D shape generation, which applies discrete representation learning based on a latent vector instead of volumetric grids.
arXiv Detail & Related papers (2023-03-26T12:03:18Z)
Deformable Model-Driven Neural Rendering for High-Fidelity 3D Reconstruction of Human Heads Under Low-View Settings [20.07788905506271]
Reconstructing 3D human heads in low-view settings presents technical challenges. We propose geometry decomposition and adopt a two-stage, coarse-to-fine training strategy. Our method outperforms existing neural rendering approaches in terms of reconstruction accuracy and novel view synthesis under low-view settings.
arXiv Detail & Related papers (2023-03-24T08:32:00Z)

This list is automatically generated from the titles and abstracts of the papers on this site.