PhyCustom: Towards Realistic Physical Customization in Text-to-Image Generation
- URL: http://arxiv.org/abs/2512.02794v1
- Date: Mon, 01 Dec 2025 16:57:02 GMT
- Title: PhyCustom: Towards Realistic Physical Customization in Text-to-Image Generation
- Authors: Fan Wu, Cheng Chen, Zhoujie Fu, Jiacheng Wei, Yi Xu, Deheng Ye, Guosheng Lin
- Abstract summary: We propose a fine-tuning framework comprising two novel regularization losses that activate diffusion models to perform physical customization. Specifically, the proposed isometric loss activates diffusion models to learn physical concepts, while the decouple loss helps eliminate the mixed learning of independent concepts.
- Score: 58.02373668073258
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent diffusion-based text-to-image customization methods have achieved significant success in understanding concrete concepts, such as styles and shapes, to control generation processes. However, few efforts dive into the realistic yet challenging customization of physical concepts. The core limitation of current methods arises from the absence of explicitly introduced physical knowledge during training. Even when physics-related words appear in the input text prompts, our experiments consistently demonstrate that these methods fail to accurately reflect the corresponding physical properties in the generated results. In this paper, we propose PhyCustom, a fine-tuning framework comprising two novel regularization losses that activate diffusion models to perform physical customization. Specifically, the proposed isometric loss activates diffusion models to learn physical concepts, while the decouple loss helps eliminate the mixed learning of independent concepts. Experiments on a diverse dataset and our benchmark demonstrate that PhyCustom outperforms previous state-of-the-art and popular methods in physical customization, both quantitatively and qualitatively.
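The abstract names the two regularizers but does not spell out their formulations. As a rough illustration only, the sketch below reads the isometric loss as preserving pairwise distances between concept embeddings under the physical edit, and the decouple loss as penalizing similarity between embeddings of independent concepts. All function names, the distance-preservation reading, and the cosine penalty are assumptions for illustration, not the authors' definitions.

```python
import numpy as np

def isometric_loss(src: np.ndarray, tgt: np.ndarray) -> float:
    """Hypothetical sketch: encourage pairwise distances among concept
    embeddings to be preserved (an isometry) under the physical edit."""
    d_src = np.linalg.norm(src[:, None] - src[None, :], axis=-1)
    d_tgt = np.linalg.norm(tgt[:, None] - tgt[None, :], axis=-1)
    return float(np.mean((d_src - d_tgt) ** 2))

def decouple_loss(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Hypothetical sketch: penalize squared cosine similarity between the
    embeddings of two independent concepts so they are not learned as a mix."""
    a = emb_a / np.linalg.norm(emb_a)
    b = emb_b / np.linalg.norm(emb_b)
    return float(np.dot(a, b) ** 2)

rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))          # toy "concept embeddings"
iso_same = isometric_loss(emb, emb)    # identical sets: perfectly isometric
dec_orth = decouple_loss(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```

In an actual fine-tuning loop such terms would be added, with weights, to the standard diffusion denoising objective rather than used on their own.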
Related papers
- PhysDrape: Learning Explicit Forces and Collision Constraints for Physically Realistic Garment Draping [4.854753036255255]
Deep learning-based garment draping has emerged as a promising alternative to traditional Physics-Based Simulation (PBS). We present PhysDrape, a hybrid neural-physical solver for physically realistic garment draping driven by explicit forces and constraints. This differentiable design guarantees physical validity through explicit constraints, while enabling end-to-end learning to optimize the network for physically consistent predictions.
arXiv Detail & Related papers (2026-02-08T15:46:01Z) - ProPhy: Progressive Physical Alignment for Dynamic World Simulation [55.456455952212416]
ProPhy is a Progressive Physical Alignment Framework that enables explicit physics-aware conditioning and anisotropic generation. We show that ProPhy produces more realistic, dynamic, and physically coherent results than existing state-of-the-art methods.
arXiv Detail & Related papers (2025-12-05T09:39:26Z) - LikePhys: Evaluating Intuitive Physics Understanding in Video Diffusion Models via Likelihood Preference [57.086932851733145]
We introduce LikePhys, a training-free method that evaluates intuitive physics in video diffusion models. We benchmark intuitive physics understanding in current video diffusion models. Empirical results show that, although current models struggle with complex and chaotic dynamics, there is a clear trend of improvement in physics understanding as model capacity and inference settings scale.
arXiv Detail & Related papers (2025-10-13T15:19:07Z) - Enhancing Physical Plausibility in Video Generation by Reasoning the Implausibility [37.011366226968]
Diffusion models can generate realistic videos, but existing methods rely on implicitly learning physical reasoning from large-scale text-video datasets. We introduce a training-free framework that improves physical plausibility at inference time by explicitly reasoning about implausibility and guiding the generation away from it.
arXiv Detail & Related papers (2025-09-29T12:32:54Z) - PhyRecon: Physically Plausible Neural Scene Reconstruction [81.73129450090684]
We introduce PHYRECON, the first approach to leverage both differentiable rendering and differentiable physics simulation to learn implicit surface representations.
Central to this design is an efficient transformation between SDF-based implicit representations and explicit surface points.
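The abstract does not detail this transformation, but a standard way to obtain explicit surface points from an SDF is to project query points along the SDF gradient by their signed distance. The sketch below is illustrative, not PHYRECON's implementation: it uses an analytic sphere SDF and finite-difference gradients as stand-ins (a learned SDF would use autodiff), and all names are hypothetical.

```python
import numpy as np

def sdf_sphere(p: np.ndarray, r: float = 1.0) -> np.ndarray:
    # toy analytic SDF: sphere of radius r centered at the origin
    return np.linalg.norm(p, axis=-1) - r

def grad_sdf(p: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    # central finite differences; a learned SDF would use autodiff instead
    g = np.zeros_like(p)
    for i in range(p.shape[-1]):
        d = np.zeros(p.shape[-1])
        d[i] = eps
        g[..., i] = (sdf_sphere(p + d) - sdf_sphere(p - d)) / (2 * eps)
    return g

def project_to_surface(p: np.ndarray) -> np.ndarray:
    # move each query point along the normalized SDF gradient by its
    # signed distance, landing (approximately) on the zero level set
    g = grad_sdf(p)
    g = g / np.linalg.norm(g, axis=-1, keepdims=True)
    return p - sdf_sphere(p)[..., None] * g

pts = np.array([[2.0, 0.0, 0.0], [0.0, 0.5, 0.0]])  # one outside, one inside
surf = project_to_surface(pts)
radii = np.linalg.norm(surf, axis=-1)  # both land on the unit sphere
```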
Our results also exhibit superior physical stability in physical simulators, with at least a 40% improvement across all datasets.
arXiv Detail & Related papers (2024-04-25T15:06:58Z) - Infusion: Preventing Customized Text-to-Image Diffusion from Overfitting [51.606819347636076]
We analyze concept-agnostic overfitting, which undermines non-customized concept knowledge, and concept-specific overfitting, which confines customization to limited modalities.
We propose Infusion, a T2I customization method that enables the learning of target concepts to avoid being constrained by limited training modalities.
arXiv Detail & Related papers (2024-04-22T09:16:25Z) - Deep Physics-aware Inference of Cloth Deformation for Monocular Human Performance Capture [84.73946704272113]
We show how integrating physics into the training process improves the learned cloth deformations and allows modeling clothing as a separate piece of geometry.
Our approach leads to a significant improvement over current state-of-the-art methods and is thus a clear step towards realistic monocular capture of the entire deforming surface of a clothed human.
arXiv Detail & Related papers (2020-11-25T16:46:00Z) - Augmenting Physical Models with Deep Networks for Complex Dynamics Forecasting [34.61959169976758]
APHYNITY is a principled approach for augmenting incomplete physical dynamics described by differential equations with deep data-driven models.
It decomposes the dynamics into two components: a physical component accounting for the dynamics for which we have prior knowledge, and a data-driven component accounting for the errors of the physical model.
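This additive decomposition can be illustrated on a toy damped pendulum: a physical prior covers the frictionless dynamics, and a residual term (standing in for APHYNITY's learned data-driven component) supplies the damping the prior omits. The specific dynamics, parameter values, and function names below are illustrative assumptions, not the paper's experiments.

```python
import numpy as np

def f_phys(state: np.ndarray, omega2: float = 1.0) -> np.ndarray:
    # known physical prior: frictionless small-angle pendulum
    theta, dtheta = state
    return np.array([dtheta, -omega2 * theta])

def f_data(state: np.ndarray, damping: float = 0.1) -> np.ndarray:
    # stands in for the learned residual: the damping the prior omits
    _, dtheta = state
    return np.array([0.0, -damping * dtheta])

def step(state: np.ndarray, dt: float = 0.01) -> np.ndarray:
    # APHYNITY-style additive decomposition: dx/dt = F_phys(x) + F_data(x)
    return state + dt * (f_phys(state) + f_data(state))

state = np.array([0.5, 0.0])   # initial angle 0.5 rad, at rest
for _ in range(1000):          # integrate 10 time units with forward Euler
    state = step(state)
amplitude = float(np.hypot(state[0], state[1]))  # decays below the start
```

In APHYNITY the residual component is a trained network, constrained so the physical component explains as much of the dynamics as possible; here it is a fixed damping term purely for illustration.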
arXiv Detail & Related papers (2020-10-09T09:31:03Z) - Visual Grounding of Learned Physical Models [66.04898704928517]
Humans intuitively recognize objects' physical properties and predict their motion, even when the objects are engaged in complicated interactions.
We present a neural model that simultaneously reasons about physics and makes future predictions based on visual and dynamics priors.
Experiments show that our model can infer the physical properties within a few observations, which allows the model to quickly adapt to unseen scenarios and make accurate predictions into the future.
arXiv Detail & Related papers (2020-04-28T17:06:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.