DiffAssemble: A Unified Graph-Diffusion Model for 2D and 3D Reassembly
- URL: http://arxiv.org/abs/2402.19302v1
- Date: Thu, 29 Feb 2024 16:09:12 GMT
- Title: DiffAssemble: A Unified Graph-Diffusion Model for 2D and 3D Reassembly
- Authors: Gianluca Scarpellini, Stefano Fiorini, Francesco Giuliari, Pietro
Morerio, Alessio Del Bue
- Abstract summary: We introduce DiffAssemble, a Graph Neural Network (GNN)-based architecture that learns to solve reassembly tasks.
Our method treats the elements of a set, whether 2D image patches or 3D object fragments, as nodes of a spatial graph.
We highlight its remarkable reduction in run-time, performing 11 times faster than the quickest optimization-based method for puzzle solving.
- Score: 21.497180110855975
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reassembly tasks play a fundamental role in many fields and multiple
approaches exist to solve specific reassembly problems. In this context, we
posit that a general unified model can effectively address them all,
irrespective of the input data type (images, 3D, etc.). We introduce
DiffAssemble, a Graph Neural Network (GNN)-based architecture that learns to
solve reassembly tasks using a diffusion model formulation. Our method treats
the elements of a set, whether 2D image patches or 3D object fragments, as
nodes of a spatial graph. Training is performed by introducing noise into the
position and rotation of the elements and iteratively denoising them to
reconstruct the coherent initial pose. DiffAssemble achieves state-of-the-art
(SOTA) results in most 2D and 3D reassembly tasks and is the first
learning-based approach that solves 2D puzzles for both rotation and
translation. Furthermore, we highlight its remarkable reduction in run-time,
performing 11 times faster than the quickest optimization-based method for
puzzle solving. Code available at https://github.com/IIT-PAVIS/DiffAssemble
Related papers
- Learning Structure-from-Motion with Graph Attention Networks [23.87562683118926]
We tackle the problem of learning Structure-from-Motion (SfM) through the use of graph attention networks.
In this work we learn a model that takes as input the 2D keypoints detected across multiple views, and outputs the corresponding camera poses and 3D keypoint coordinates.
Our model takes advantage of graph neural networks to learn SfM-specific primitives, and we show that it can be used for fast inference of the reconstruction for new and unseen sequences.
arXiv Detail & Related papers (2023-08-30T12:13:13Z) - Sampling is Matter: Point-guided 3D Human Mesh Reconstruction [0.0]
This paper presents a simple yet powerful method for 3D human mesh reconstruction from a single RGB image.
Experimental results on benchmark datasets show that the proposed method efficiently improves the performance of 3D human mesh reconstruction.
arXiv Detail & Related papers (2023-04-19T08:45:26Z) - In-N-Out: Faithful 3D GAN Inversion with Volumetric Decomposition for Face Editing [28.790900756506833]
3D-aware GANs offer new capabilities for view synthesis while preserving the editing functionalities of their 2D counterparts.
GAN inversion is a crucial step that seeks the latent code to reconstruct input images or videos, subsequently enabling diverse editing tasks through manipulation of this latent code.
We address this issue by explicitly modeling OOD objects from the input in 3D-aware GANs.
arXiv Detail & Related papers (2023-02-09T18:59:56Z) - FvOR: Robust Joint Shape and Pose Optimization for Few-view Object
Reconstruction [37.81077373162092]
Reconstructing an accurate 3D object model from a few image observations remains a challenging problem in computer vision.
We present FvOR, a learning-based object reconstruction method that predicts accurate 3D models given a few images with noisy input poses.
arXiv Detail & Related papers (2022-05-16T15:39:27Z) - Multi-initialization Optimization Network for Accurate 3D Human Pose and
Shape Estimation [75.44912541912252]
We propose a three-stage framework named Multi-Initialization Optimization Network (MION)
In the first stage, we strategically select different coarse 3D reconstruction candidates which are compatible with the 2D keypoints of the input sample.
In the second stage, we design a mesh refinement transformer (MRT) to respectively refine each coarse reconstruction result via a self-attention mechanism.
Finally, a Consistency Estimation Network (CEN) is proposed to find the best result from multiple candidates by evaluating whether the visual evidence in the RGB image matches a given 3D reconstruction.
arXiv Detail & Related papers (2021-12-24T02:43:58Z) - RandomRooms: Unsupervised Pre-training from Synthetic Shapes and
Randomized Layouts for 3D Object Detection [138.2892824662943]
A promising solution is to make better use of the synthetic dataset, which consists of CAD object models, to boost the learning on real datasets.
Recent work on 3D pre-training exhibits failure when transferring features learned on synthetic objects to other real-world applications.
In this work, we put forward a new method called RandomRooms to accomplish this objective.
arXiv Detail & Related papers (2021-08-17T17:56:12Z) - Neural Articulated Radiance Field [90.91714894044253]
We present Neural Articulated Radiance Field (NARF), a novel deformable 3D representation for articulated objects learned from images.
Experiments show that the proposed method is efficient and can generalize well to novel poses.
arXiv Detail & Related papers (2021-04-07T13:23:14Z) - Adjoint Rigid Transform Network: Task-conditioned Alignment of 3D Shapes [86.2129580231191]
Adjoint Rigid Transform (ART) Network is a neural module which can be integrated with a variety of 3D networks.
ART learns to rotate input shapes to a learned canonical orientation, which is crucial for a lot of tasks.
We will release our code and pre-trained models for further research.
arXiv Detail & Related papers (2021-02-01T20:58:45Z) - Pix2Vox++: Multi-scale Context-aware 3D Object Reconstruction from
Single and Multiple Images [56.652027072552606]
We propose a novel framework for single-view and multi-view 3D object reconstruction, named Pix2Vox++.
By using a well-designed encoder-decoder, it generates a coarse 3D volume from each input image.
A multi-scale context-aware fusion module is then introduced to adaptively select high-quality reconstructions for different parts from all coarse 3D volumes to obtain a fused 3D volume.
arXiv Detail & Related papers (2020-06-22T13:48:09Z) - Implicit Functions in Feature Space for 3D Shape Reconstruction and
Completion [53.885984328273686]
Implicit Feature Networks (IF-Nets) deliver continuous outputs, can handle multiple topologies, and complete shapes for missing or sparse input data.
IF-Nets clearly outperform prior work in 3D object reconstruction in ShapeNet, and obtain significantly more accurate 3D human reconstructions.
arXiv Detail & Related papers (2020-03-03T11:14:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.