Odo: Depth-Guided Diffusion for Identity-Preserving Body Reshaping
- URL: http://arxiv.org/abs/2508.13065v3
- Date: Thu, 25 Sep 2025 05:05:11 GMT
- Title: Odo: Depth-Guided Diffusion for Identity-Preserving Body Reshaping
- Authors: Siddharth Khandelwal, Sridhar Kamath, Arjun Jain
- Abstract summary: We introduce the first large-scale dataset of 18,573 images across 1523 subjects, specifically designed for controlled human shape editing.
We propose Odo, an end-to-end diffusion-based method that enables realistic and intuitive body reshaping guided by simple semantic attributes.
Our approach combines a frozen UNet that preserves fine-grained appearance and background details from the input image with a ControlNet that guides shape transformation using target SMPL depth maps.
- Score: 13.8541976816066
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Human shape editing enables controllable transformation of a person's body shape, such as making the subject appear thin, muscular, or overweight, while preserving pose, identity, clothing, and background. Unlike human pose editing, which has advanced rapidly, shape editing remains relatively under-explored. Current approaches typically rely on 3D morphable models or image warping, often introducing unrealistic body proportions, texture distortions, and background inconsistencies due to alignment errors and deformations. A key limitation is the lack of large-scale, publicly available datasets for training and evaluating body shape manipulation methods. In this work, we introduce the first large-scale dataset of 18,573 images across 1523 subjects, specifically designed for controlled human shape editing. It features diverse variations in body shape, including fat, muscular, and thin, captured under consistent identity, clothing, and background conditions. Using this dataset, we propose Odo, an end-to-end diffusion-based method that enables realistic and intuitive body reshaping guided by simple semantic attributes. Our approach combines a frozen UNet that preserves fine-grained appearance and background details from the input image with a ControlNet that guides shape transformation using target SMPL depth maps. Extensive experiments demonstrate that our method outperforms prior approaches, achieving per-vertex reconstruction errors as low as 7.5 mm, significantly lower than the 13.6 mm observed in baseline methods, while producing realistic results that accurately match the desired target shapes.
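The Odo weights themselves are not linked here, but the architecture the abstract describes (a frozen base UNet for appearance and background, plus a ControlNet conditioned on a target SMPL depth map for shape) maps closely onto the standard diffusers ControlNet image-to-image setup. The sketch below is a hedged approximation only: the model identifiers and the generic depth ControlNet are stand-ins, not the authors' SMPL-depth-trained checkpoint.

```python
# Minimal sketch of the general recipe: frozen img2img UNet + depth ControlNet.
# Model names and the generic depth ControlNet are illustrative assumptions.
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline

# Generic depth ControlNet as a stand-in for a ControlNet trained on SMPL depth renders.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # example base model, not the Odo checkpoint
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")
pipe.unet.requires_grad_(False)  # the base UNet stays frozen; only the conditioning changes

source = Image.open("person.png").convert("RGB")    # input photo (appearance, background)
target_depth = Image.open("smpl_target_depth.png")  # depth render of the desired SMPL body

result = pipe(
    prompt="a photo of a person",   # Odo instead conditions on semantic shape attributes
    image=source,                   # identity and background come from the source image
    control_image=target_depth,     # body shape is steered by the target depth map
    strength=0.75,
    num_inference_steps=30,
).images[0]
result.save("reshaped.png")
```

In Odo the ControlNet is trained on paired images with SMPL depth maps from the new dataset, whereas this sketch reuses an off-the-shelf depth ControlNet purely to show how the two branches are wired together.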
Related papers
- Make-It-Poseable: Feed-forward Latent Posing Model for 3D Humanoid Character Animation [74.6792422278706]
We introduce Make-It-Poseable, a novel feed-forward framework that reformulates character posing as a latent-space transformation problem.
Our method reconstructs the character in new poses by directly manipulating its latent representation.
It also naturally extends to 3D editing applications like part replacement and refinement.
arXiv Detail & Related papers (2025-12-18T17:01:44Z) - PHD: Personalized 3D Human Body Fitting with Point Diffusion [19.282384138333537]
PHD is a novel approach for personalized 3D human mesh recovery (HMR) and body fitting.
It leverages user-specific shape information to improve pose estimation accuracy from videos.
arXiv Detail & Related papers (2025-08-28T23:03:35Z) - Canonical Pose Reconstruction from Single Depth Image for 3D Non-rigid Pose Recovery on Limited Datasets [55.84702107871358]
3D reconstruction from 2D inputs, especially for non-rigid objects like humans, presents unique challenges.
Traditional methods often struggle with non-rigid shapes, which require extensive training data to cover the entire deformation space.
This study proposes a canonical pose reconstruction model that transforms single-view depth images of deformable shapes into a canonical form.
arXiv Detail & Related papers (2025-05-23T14:58:34Z) - ShapeBoost: Boosting Human Shape Estimation with Part-Based Parameterization and Clothing-Preserving Augmentation [58.50613393500561]
We propose ShapeBoost, a new human shape recovery framework.
It achieves pixel-level alignment even for rare body shapes and high accuracy for people wearing different types of clothes.
arXiv Detail & Related papers (2024-03-02T23:40:23Z) - DiffBody: Diffusion-based Pose and Shape Editing of Human Images [1.7188280334580193]
We propose a one-shot approach that enables large edits with identity preservation.
To enable large edits, we fit a 3D body model, project the input image onto the 3D model, and change the body's pose and shape.
We further enhance the realism by fine-tuning text embeddings via self-supervised learning.
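The "fit a 3D body model, project the input image onto it, and change the body's pose and shape" step can be made concrete with the public smplx package. The sketch below is illustrative only: the model path and the specific shape-coefficient offset are assumptions, and a real pipeline would obtain the betas from a fitting stage (e.g., SMPLify-style optimization) rather than setting them by hand.

```python
# Hedged sketch: load an SMPL body, keep the pose fixed, and edit the shape betas.
import torch
import smplx

# Path to locally downloaded SMPL model files (placeholder).
model = smplx.create("models/", model_type="smpl", gender="neutral")

betas = torch.zeros(1, 10)      # shape coefficients, normally recovered by fitting to the image
body_pose = torch.zeros(1, 69)  # axis-angle body pose, kept fixed to preserve the pose

# The leading principal components roughly control overall body proportions, so a
# hypothetical offset on the first beta noticeably changes the body's build.
target_betas = betas.clone()
target_betas[0, 0] += 2.0

output = model(betas=target_betas, body_pose=body_pose, return_verts=True)
vertices = output.vertices      # (1, 6890, 3) mesh, e.g. for projecting the image or rendering depth
print(vertices.shape)
```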
arXiv Detail & Related papers (2024-01-05T13:36:19Z) - DragD3D: Realistic Mesh Editing with Rigidity Control Driven by 2D Diffusion Priors [10.355568895429588]
Direct mesh editing and deformation are key components in the geometric modeling and animation pipeline.
The geometric regularizers used by existing editing methods, however, are not aware of the global context and semantics of the object.
We show that our deformations can be controlled to yield realistic shape deformations aware of the global context.
arXiv Detail & Related papers (2023-10-06T19:55:40Z) - Neural Shape Deformation Priors [14.14047635248036]
We present Neural Shape Deformation Priors, a novel method for shape manipulation.
We learn the deformation behavior based on the underlying geometric properties of a shape.
Our method can be applied to challenging deformations and generalizes well to unseen deformations.
arXiv Detail & Related papers (2022-10-11T17:03:25Z) - Pop-Out Motion: 3D-Aware Image Deformation via Learning the Shape Laplacian [58.704089101826774]
We present a 3D-aware image deformation method with minimal restrictions on shape category and deformation type.
We take a supervised learning-based approach to predict the shape Laplacian of the underlying volume of a 3D reconstruction represented as a point cloud.
In the experiments, we present our results of deforming 2D character and clothed human images.
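As a rough illustration of the object this entry predicts, the sketch below assembles an unnormalized graph Laplacian over a k-nearest-neighbor graph of a point cloud. This hand-built operator is only an assumption-level stand-in for the shape Laplacian that the paper regresses with a supervised network.

```python
# Hedged sketch: build L = D - W over a k-NN graph of a point cloud.
import numpy as np
from scipy.sparse import coo_matrix, diags
from scipy.spatial import cKDTree

def knn_graph_laplacian(points: np.ndarray, k: int = 8):
    """Unnormalized graph Laplacian of a k-nearest-neighbor graph with Gaussian weights."""
    tree = cKDTree(points)
    dists, idx = tree.query(points, k=k + 1)       # first neighbor of each point is itself
    rows = np.repeat(np.arange(len(points)), k)
    cols = idx[:, 1:].reshape(-1)
    w = np.exp(-dists[:, 1:].reshape(-1) ** 2)     # Gaussian edge weights
    W = coo_matrix((w, (rows, cols)), shape=(len(points), len(points)))
    W = (W + W.T) * 0.5                            # symmetrize the adjacency matrix
    D = diags(np.asarray(W.sum(axis=1)).ravel())   # degree matrix
    return (D - W).tocsr()

# Toy usage on a random cloud standing in for a reconstructed body volume.
pts = np.random.rand(500, 3)
L = knn_graph_laplacian(pts)
print(L.shape, L.nnz)
```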
arXiv Detail & Related papers (2022-03-29T04:57:18Z) - NeuralReshaper: Single-image Human-body Retouching with Deep Neural Networks [50.40798258968408]
We present NeuralReshaper, a novel method for semantic reshaping of human bodies in single images using deep generative networks.
Our approach follows a fit-then-reshape pipeline, which first fits a parametric 3D human model to a source human image.
To deal with the lack of paired training data, we introduce a novel self-supervised strategy to train our network.
arXiv Detail & Related papers (2022-03-20T09:02:13Z) - Detailed Avatar Recovery from Single Image [50.82102098057822]
This paper presents a novel framework to recover a detailed avatar from a single image.
We use the deep neural networks to refine the 3D shape in a Hierarchical Mesh Deformation framework.
Our method can restore detailed human body shapes with complete textures beyond skinned models.
arXiv Detail & Related papers (2021-08-06T03:51:26Z) - SNARF: Differentiable Forward Skinning for Animating Non-Rigid Neural Implicit Shapes [117.76767853430243]
We introduce SNARF, which combines the advantages of linear blend skinning for polygonal meshes with neural implicit surfaces.
We propose a forward skinning model that finds all canonical correspondences of any deformed point using iterative root finding (a toy sketch of this step follows the list).
Compared to state-of-the-art neural implicit representations, our approach generalizes better to unseen poses while preserving accuracy.
arXiv Detail & Related papers (2021-04-08T17:54:59Z)
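The iterative root finding mentioned in the SNARF entry can be illustrated with a toy example: given a forward linear-blend-skinning map from canonical to posed space, the canonical correspondence of a posed query point is recovered by solving lbs(x_c) - x_d = 0. The two bone transforms and the Gaussian weight field below are invented stand-ins for SNARF's learned neural weight field, and scipy's generic solver replaces the Broyden iteration used in the paper.

```python
# Toy sketch of forward-skinning inversion by iterative root finding.
import numpy as np
from scipy.optimize import root

# Two rigid bone transforms (4x4 homogeneous matrices) for a toy skeleton.
T = np.stack([np.eye(4), np.eye(4)])
T[1, :3, 3] = [0.3, 0.0, 0.0]  # second bone translated along x

def weights(x):
    """Soft skinning weights from distance to two hypothetical bone centers."""
    centers = np.array([[0.0, 0.0, 0.0], [0.5, 0.0, 0.0]])
    d = np.linalg.norm(x[None, :] - centers, axis=1)
    w = np.exp(-d / 0.2)
    return w / w.sum()

def lbs(x_c):
    """Forward linear blend skinning: blend bone transforms at x_c and apply them."""
    T_blend = (weights(x_c)[:, None, None] * T).sum(axis=0)
    return (T_blend @ np.append(x_c, 1.0))[:3]

x_d = np.array([0.55, 0.0, 0.0])            # query point in posed (deformed) space
sol = root(lambda x: lbs(x) - x_d, x0=x_d)  # iterative root finding, initialized at x_d
x_c = sol.x                                 # recovered canonical correspondence
print(x_c, lbs(x_c))                        # lbs(x_c) should approximately equal x_d
```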