Generative Perception of Shape and Material from Differential Motion
- URL: http://arxiv.org/abs/2506.02473v1
- Date: Tue, 03 Jun 2025 05:43:20 GMT
- Title: Generative Perception of Shape and Material from Differential Motion
- Authors: Xinran Nicole Han, Ko Nishino, Todd Zickler
- Abstract summary: We introduce a novel conditional denoising-diffusion model that generates shape-and-material maps from a short video of an object undergoing differential motions. Our work suggests a generative perception approach for improving visual reasoning in physically-embodied systems.
- Score: 17.090405682103167
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Perceiving the shape and material of an object from a single image is inherently ambiguous, especially when lighting is unknown and unconstrained. Despite this, humans can often disentangle shape and material, and when they are uncertain, they often move their head slightly or rotate the object to help resolve the ambiguities. Inspired by this behavior, we introduce a novel conditional denoising-diffusion model that generates samples of shape-and-material maps from a short video of an object undergoing differential motions. Our parameter-efficient architecture allows training directly in pixel-space, and it generates many disentangled attributes of an object simultaneously. Trained on a modest number of synthetic object-motion videos with supervision on shape and material, the model exhibits compelling emergent behavior: For static observations, it produces diverse, multimodal predictions of plausible shape-and-material maps that capture the inherent ambiguities; and when objects move, the distributions quickly converge to more accurate explanations. The model also produces high-quality shape-and-material estimates for less ambiguous, real-world objects. By moving beyond single-view to continuous motion observations, our work suggests a generative perception approach for improving visual reasoning in physically-embodied systems.
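To make the core idea concrete, below is a minimal PyTorch sketch of a video-conditioned denoising-diffusion sampler of this kind. It is not the authors' code: the denoiser `eps_model`, its signature, the linear noise schedule, and the step count are all illustrative assumptions, and the paper's parameter-efficient pixel-space architecture is not reproduced here.

```python
# Minimal sketch of conditional diffusion sampling for shape-and-material maps.
# NOT the authors' implementation; names and schedule are assumptions.
import torch

T = 1000                                   # number of diffusion steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)      # standard DDPM linear schedule (assumed)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

@torch.no_grad()
def sample_shape_material(eps_model, video_frames, map_shape):
    """Draw one shape-and-material map sample conditioned on a short video.

    eps_model(x_t, t, video_frames) -> predicted noise, where `video_frames`
    is a (num_frames, C, H, W) tensor of differential-motion observations.
    """
    x = torch.randn(map_shape)             # start from pure Gaussian noise
    for t in reversed(range(T)):
        eps = eps_model(x, torch.tensor([t]), video_frames)
        # DDPM posterior mean for x_{t-1} given x_t and the predicted noise
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x                               # e.g. stacked normal + material maps
```

Because each call starts from fresh Gaussian noise, repeated sampling on a static observation naturally yields the diverse, multimodal predictions the abstract describes; conditioning on frames of a moving object is what tightens that distribution toward a single accurate explanation.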
Related papers
- Multi-Object Discovery by Low-Dimensional Object Motion [0.0]
We propose to model pixel-wise geometry and object motion to remove ambiguity in reconstructing flow from a single image.
We achieve state-of-the-art results in unsupervised multi-object segmentation on synthetic and real-world datasets by modeling the scene structure and object motion.
arXiv Detail & Related papers (2023-07-16T12:35:46Z)
- Learning Motion-Dependent Appearance for High-Fidelity Rendering of Dynamic Humans from a Single Camera [49.357174195542854]
A key challenge in learning the dynamics of appearance is the prohibitively large number of observations required.
We show that our method can generate a temporally coherent video of dynamic humans for unseen body poses and novel views given a single-view video.
arXiv Detail & Related papers (2022-03-24T00:22:03Z)
- A Bayesian Treatment of Real-to-Sim for Deformable Object Manipulation [59.29922697476789]
We propose a novel methodology for extracting state information from image sequences via a technique to represent the state of a deformable object as a distribution embedding.
Our experiments confirm that we can estimate posterior distributions of physical properties, such as elasticity, friction and scale of highly deformable objects, such as cloth and ropes.
arXiv Detail & Related papers (2021-12-09T17:50:54Z)
- DiffSDFSim: Differentiable Rigid-Body Dynamics With Implicit Shapes [9.119424247289857]
Differentiable physics is a powerful tool in computer vision and robotics for scene understanding and reasoning about interactions.
Existing approaches have frequently been limited to objects with simple shapes or shapes that are known in advance.
arXiv Detail & Related papers (2021-11-30T11:56:24Z)
- Visual Vibration Tomography: Estimating Interior Material Properties from Monocular Video [66.94502090429806]
An object's interior material properties, while invisible to the human eye, determine motion observed on its surface.
We propose an approach that estimates heterogeneous material properties of an object from a monocular video of its surface vibrations.
arXiv Detail & Related papers (2021-04-06T18:05:27Z)
- Occlusion resistant learning of intuitive physics from videos [52.25308231683798]
A key ability for artificial systems is to understand physical interactions between objects and to predict the future outcome of a situation.
This ability, often referred to as intuitive physics, has recently received attention, and several methods have been proposed to learn these physical rules from video sequences.
arXiv Detail & Related papers (2020-04-30T19:35:54Z)
- Visual Grounding of Learned Physical Models [66.04898704928517]
Humans intuitively recognize objects' physical properties and predict their motion, even when the objects are engaged in complicated interactions.
We present a neural model that simultaneously reasons about physics and makes future predictions based on visual and dynamics priors.
Experiments show that our model can infer the physical properties within a few observations, which allows the model to quickly adapt to unseen scenarios and make accurate predictions into the future.
arXiv Detail & Related papers (2020-04-28T17:06:38Z)
- Cloth in the Wind: A Case Study of Physical Measurement through Simulation [50.31424339972478]
We propose to measure latent physical properties for cloth in the wind without ever having seen a real example before.
Our solution is an iterative refinement procedure with simulation at its core.
The correspondence between simulation and observation is measured using an embedding function that maps physically similar examples to nearby points.
arXiv Detail & Related papers (2020-03-09T21:32:23Z)