Multistable Shape from Shading Emerges from Patch Diffusion
- URL: http://arxiv.org/abs/2405.14530v1
- Date: Thu, 23 May 2024 13:15:24 GMT
- Title: Multistable Shape from Shading Emerges from Patch Diffusion
- Authors: Xinran Nicole Han, Todd Zickler, Ko Nishino
- Abstract summary: We introduce a model that reconstructs a multimodal distribution of shapes from a single shading image.
We show that multistable shape explanations emerge from this model for "ambiguous" test images.
This may inspire new architectures for 3D shape perception that are more efficient and better aligned with human experience.
- Score: 17.090405682103167
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Models for monocular shape reconstruction of surfaces with diffuse reflection -- shape from shading -- ought to produce distributions of outputs, because there are fundamental mathematical ambiguities of both continuous (e.g., bas-relief) and discrete (e.g., convex/concave) varieties which are also experienced by humans. Yet, the outputs of current models are limited to point estimates or tight distributions around single modes, which prevent them from capturing these effects. We introduce a model that reconstructs a multimodal distribution of shapes from a single shading image, which aligns with the human experience of multistable perception. We train a small denoising diffusion process to generate surface normal fields from $16\times 16$ patches of synthetic images of everyday 3D objects. We deploy this model patch-wise at multiple scales, with guidance from inter-patch shape consistency constraints. Despite its relatively small parameter count and predominantly bottom-up structure, we show that multistable shape explanations emerge from this model for "ambiguous" test images that humans experience as being multistable. At the same time, the model produces veridical shape estimates for object-like images that include distinctive occluding contours and appear less ambiguous. This may inspire new architectures for stochastic 3D shape perception that are more efficient and better aligned with human experience.
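To make the sampling procedure described in the abstract concrete, here is a minimal sketch of patch-wise reverse diffusion with inter-patch consistency guidance, assuming a standard DDPM ancestral update. The zero-output denoiser stub, the boundary-agreement guidance term `consistency_grad`, the 4x4 patch grid, the step count, and the guidance weight are all illustrative stand-ins, not the authors' choices.

```python
# A minimal sketch of guided patch-wise diffusion sampling, not the
# authors' code. The denoiser is an untrained stand-in; the consistency
# term and guidance weight are illustrative assumptions.
import numpy as np

P = 16          # patch size used in the paper
GRID = 4        # hypothetical 4x4 grid of patches covering the image
STEPS = 50      # number of reverse-diffusion steps (assumed)
betas = np.linspace(1e-4, 0.02, STEPS)   # standard DDPM noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def denoiser(x_t, t):
    """Stand-in for the trained patch denoiser: predicts the noise in x_t.

    In the paper this is a small network trained on 16x16 surface-normal
    patches of synthetic objects; here it returns zeros so the sketch
    runs end to end.
    """
    return np.zeros_like(x_t)

def consistency_grad(patches):
    """Illustrative inter-patch guidance: penalize disagreement between
    horizontally adjacent patches along their shared boundary column."""
    grad = np.zeros_like(patches)
    for i in range(GRID):
        for j in range(GRID - 1):
            diff = patches[i, j, :, -1] - patches[i, j + 1, :, 0]
            grad[i, j, :, -1] += diff      # pull left patch's edge inward
            grad[i, j + 1, :, 0] -= diff   # pull right patch's edge inward
    return grad

# One normal field (3 channels) per patch; start from Gaussian noise.
rng = np.random.default_rng(0)
patches = rng.standard_normal((GRID, GRID, P, P, 3))

for t in reversed(range(STEPS)):
    eps = denoiser(patches, t)
    # Standard DDPM ancestral mean, computed independently per patch ...
    mean = (patches - betas[t] / np.sqrt(1 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
    # ... plus a guidance step nudging neighboring patches to agree.
    guided = mean - 0.1 * consistency_grad(mean)   # 0.1: assumed weight
    noise = rng.standard_normal(patches.shape) if t > 0 else 0.0
    patches = guided + np.sqrt(betas[t]) * noise

print("sampled patch grid:", patches.shape)  # (4, 4, 16, 16, 3)
```

Because the guidance acts only through a gradient on the per-step mean, the denoiser itself stays small and purely local, which matches the paper's framing of multistability emerging from a bottom-up patch model rather than a global network.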
Related papers
- MultiDiff: Consistent Novel View Synthesis from a Single Image [60.04215655745264]
MultiDiff is a novel approach for consistent novel view synthesis of scenes from a single RGB image.
Our results demonstrate that MultiDiff outperforms state-of-the-art methods on the challenging, real-world datasets RealEstate10K and ScanNet.
arXiv Detail & Related papers (2024-06-26T17:53:51Z) - FDGaussian: Fast Gaussian Splatting from Single Image via Geometric-aware Diffusion Model [81.03553265684184]
We introduce FDGaussian, a novel two-stage framework for single-image 3D reconstruction.
Recent methods typically utilize pre-trained 2D diffusion models to generate plausible novel views from the input image.
We demonstrate that FDGaussian generates images with high consistency across different views and reconstructs high-quality 3D objects.
arXiv Detail & Related papers (2024-03-15T12:24:36Z) - PolyDiff: Generating 3D Polygonal Meshes with Diffusion Models [15.846449180313778]
PolyDiff is the first diffusion-based approach capable of directly generating realistic and diverse 3D polygonal meshes.
Our model is capable of producing high-quality 3D polygonal meshes, ready for integration into downstream 3D applications.
arXiv Detail & Related papers (2023-12-18T18:19:26Z) - 3D shape reconstruction of semi-transparent worms [0.950214811819847]
3D shape reconstruction typically requires identifying object features or textures in multiple images of a subject.
Here we overcome these challenges by rendering a candidate shape with adaptive blurring and transparency for comparison with the images.
We model the slender Caenorhabditis elegans as a 3D curve using an intrinsic parametrisation that naturally admits biologically-informed constraints and regularisation.
arXiv Detail & Related papers (2023-04-28T13:29:36Z) - Learning to Generate 3D Representations of Building Roofs Using
Single-View Aerial Imagery [68.3565370706598]
We present a novel pipeline for learning the conditional distribution of a building roof mesh given pixels from an aerial image.
Unlike alternative methods that require multiple images of the same object, our approach enables estimating 3D roof meshes from only a single image.
arXiv Detail & Related papers (2023-03-20T15:47:05Z) - DiffPose: Multi-hypothesis Human Pose Estimation using Diffusion models [5.908471365011943]
We propose DiffPose, a conditional diffusion model that predicts multiple hypotheses for a given input image.
We show that DiffPose slightly improves upon the state of the art for multi-hypothesis pose estimation for simple poses and outperforms it by a large margin for highly ambiguous poses.
arXiv Detail & Related papers (2022-11-29T18:55:13Z) - SinDiffusion: Learning a Diffusion Model from a Single Natural Image [159.4285444680301]
We present SinDiffusion, leveraging denoising diffusion models to capture internal distribution of patches from a single natural image.
It is based on two core designs. First, SinDiffusion is trained with a single model at a single scale instead of multiple models with progressive growing of scales.
Second, we identify that a patch-level receptive field of the diffusion network is crucial and effective for capturing the image's patch statistics.
arXiv Detail & Related papers (2022-11-22T18:00:03Z) - Pixel2Mesh++: 3D Mesh Generation and Refinement from Multi-View Images [82.32776379815712]
We study the problem of shape generation in 3D mesh representation from a small number of color images with or without camera poses.
We further improve the shape quality by leveraging cross-view information with a graph convolutional network.
Our model is robust to the quality of the initial mesh and the error of camera pose, and can be combined with a differentiable renderer for test-time optimization.
arXiv Detail & Related papers (2022-04-21T03:42:31Z) - SNARF: Differentiable Forward Skinning for Animating Non-Rigid Neural
Implicit Shapes [117.76767853430243]
We introduce SNARF, which combines the advantages of linear blend skinning for polygonal meshes with neural implicit surfaces.
We propose a forward skinning model that finds all canonical correspondences of any deformed point using iterative root finding (a minimal sketch of this idea follows the list below).
Compared to state-of-the-art neural implicit representations, our approach generalizes better to unseen poses while preserving accuracy.
arXiv Detail & Related papers (2021-04-08T17:54:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.