DiffPoint: Single and Multi-view Point Cloud Reconstruction with ViT Based Diffusion Model
- URL: http://arxiv.org/abs/2402.11241v1
- Date: Sat, 17 Feb 2024 10:18:40 GMT
- Title: DiffPoint: Single and Multi-view Point Cloud Reconstruction with ViT Based Diffusion Model
- Authors: Yu Feng, Xing Shi, Mengli Cheng, Yun Xiong
- Abstract summary: We propose a neat and powerful architecture called DiffPoint that combines ViT and diffusion models for the task of point cloud reconstruction.
We evaluate DiffPoint on both single-view and multi-view reconstruction tasks and achieve state-of-the-art results.
- Score: 10.253402444122084
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As the task of 2D-to-3D reconstruction has gained significant attention in
various real-world scenarios, it becomes crucial to be able to generate
high-quality point clouds. Despite the recent success of deep learning models
in generating point clouds, there are still challenges in producing
high-fidelity results due to the disparities between images and point clouds.
While vision transformers (ViT) and diffusion models have shown promise in
various vision tasks, their benefits for reconstructing point clouds from
images have not been demonstrated yet. In this paper, we first propose a neat
and powerful architecture called DiffPoint that combines ViT and diffusion
models for the task of point cloud reconstruction. At each diffusion step, we
divide the noisy point clouds into irregular patches. Then, using a standard
ViT backbone that treats all inputs as tokens (including time information,
image embeddings, and noisy patches), we train our model to predict target
points based on input images. We evaluate DiffPoint on both single-view and
multi-view reconstruction tasks and achieve state-of-the-art results.
Additionally, we introduce a unified and flexible feature fusion module for
aggregating image features from single or multiple input images. Furthermore,
our work demonstrates the feasibility of applying unified architectures across
languages and images to improve 3D reconstruction tasks.
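The abstract describes a denoiser in which timestep information, image embeddings, and noisy point patches all enter a standard ViT as one token sequence. The sketch below illustrates that tokenization in PyTorch. It is a minimal illustration under assumed shapes and hyperparameters, not the authors' implementation: the class name, the contiguous patch split, and the token layout are all hypothetical, and the paper itself uses irregular patches and a dedicated fusion module.

```python
# Minimal sketch of a DiffPoint-style ViT denoiser (assumption-laden, not the
# authors' code): timestep, image embeddings, and noisy point patches are all
# fed to a standard Transformer encoder as tokens, as the abstract describes.
import torch
import torch.nn as nn

class DiffPointDenoiserSketch(nn.Module):  # hypothetical name
    def __init__(self, num_points=2048, points_per_patch=32, dim=384,
                 depth=6, heads=6, img_dim=512):
        super().__init__()
        self.points_per_patch = points_per_patch
        num_patches = num_points // points_per_patch
        # One token per patch of noisy points (the paper uses irregular
        # patches; a contiguous split is used here only for simplicity).
        self.patch_embed = nn.Linear(points_per_patch * 3, dim)
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, dim))
        # Diffusion timestep and per-view image features become extra tokens.
        self.time_embed = nn.Sequential(
            nn.Linear(1, dim), nn.GELU(), nn.Linear(dim, dim))
        self.img_proj = nn.Linear(img_dim, dim)
        layer = nn.TransformerEncoderLayer(
            dim, heads, dim * 4, batch_first=True, norm_first=True)
        self.vit = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, points_per_patch * 3)

    def forward(self, noisy_points, t, img_feats):
        # noisy_points: (B, N, 3); t: (B,); img_feats: (B, V, img_dim),
        # where V >= 1 covers both single- and multi-view inputs.
        B, N, _ = noisy_points.shape
        patches = noisy_points.reshape(B, -1, self.points_per_patch * 3)
        tokens = self.patch_embed(patches) + self.pos_embed
        t_tok = self.time_embed(t.float().unsqueeze(-1)).unsqueeze(1)
        img_tok = self.img_proj(img_feats)
        seq = torch.cat([t_tok, img_tok, tokens], dim=1)  # all inputs as tokens
        out = self.vit(seq)[:, 1 + img_feats.shape[1]:]   # keep patch tokens
        return self.head(out).reshape(B, N, 3)            # predicted points

# Usage with dummy tensors (two views):
model = DiffPointDenoiserSketch()
pred = model(torch.randn(2, 2048, 3),
             torch.randint(0, 1000, (2,)),
             torch.randn(2, 2, 512))
print(pred.shape)  # torch.Size([2, 2048, 3])
```

Because every view simply contributes additional tokens, single- and multi-view reconstruction share one architecture in this sketch, which is consistent in spirit with the unified feature fusion module the abstract mentions; the paper's actual module may aggregate view features differently.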
Related papers
- LAM3D: Large Image-Point-Cloud Alignment Model for 3D Reconstruction from Single Image [64.94932577552458]
Large Reconstruction Models have made significant strides in the realm of automated 3D content generation from single or multiple input images.
Despite their success, these models often produce 3D meshes with geometric inaccuracies, stemming from the inherent challenges of deducing 3D shapes solely from image data.
We introduce a novel framework, the Large Image and Point Cloud Alignment Model (LAM3D), which utilizes 3D point cloud data to enhance the fidelity of generated 3D meshes.
arXiv Detail & Related papers (2024-05-24T15:09:12Z)
- Few-shot point cloud reconstruction and denoising via learned Gaussian splats renderings and fine-tuned diffusion features [52.62053703535824]
We propose a method to reconstruct point clouds from few images and to denoise point clouds from their rendering.
To improve reconstruction in constrained settings, we regularize the training of a differentiable renderer with a hybrid surface and appearance representation.
We demonstrate how these learned filters can be used to remove point cloud noise without 3D supervision.
arXiv Detail & Related papers (2024-04-01T13:38:16Z)
- HVDistill: Transferring Knowledge from Images to Point Clouds via Unsupervised Hybrid-View Distillation [106.09886920774002]
We present a hybrid-view-based knowledge distillation framework, termed HVDistill, to guide the feature learning of a point cloud neural network.
Our method achieves consistent improvements over the baseline trained from scratch and significantly outperforms the existing schemes.
arXiv Detail & Related papers (2024-03-18T14:18:08Z)
- Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models [97.58685709663287]
Generative pre-training can boost the performance of fundamental models in 2D vision.
In 3D vision, the over-reliance on Transformer-based backbones and the unordered nature of point clouds have restricted the further development of generative pre-training.
We propose a novel 3D-to-2D generative pre-training method that is adaptable to any point cloud model.
arXiv Detail & Related papers (2023-07-27T16:07:03Z)
- Ponder: Point Cloud Pre-training via Neural Rendering [93.34522605321514]
We propose a novel approach to self-supervised learning of point cloud representations via differentiable neural rendering.
The learned point-cloud representation can be easily integrated into various downstream tasks, including not only high-level tasks like 3D detection and segmentation, but also low-level tasks like 3D reconstruction and image rendering.
arXiv Detail & Related papers (2022-12-31T08:58:39Z)
- PointVST: Self-Supervised Pre-training for 3D Point Clouds via View-Specific Point-to-Image Translation [64.858505571083]
This paper proposes a translative pre-training framework, namely PointVST.
It is driven by a novel self-supervised pretext task of cross-modal translation from 3D point clouds to their corresponding diverse forms of 2D rendered images.
arXiv Detail & Related papers (2022-12-29T07:03:29Z)
- Let Images Give You More: Point Cloud Cross-Modal Training for Shape Analysis [43.13887916301742]
This paper introduces a simple but effective point cloud cross-modality training (PointCMT) strategy to boost point cloud analysis.
To effectively acquire auxiliary knowledge from view images, we develop a teacher-student framework and formulate the cross-modal learning as a knowledge distillation problem (a generic distillation loss is sketched after this list).
Equipped with PointCMT, appealing backbones such as PointNet++ and PointMLP show significant gains on various datasets.
arXiv Detail & Related papers (2022-10-09T09:35:22Z)
- Flow-based GAN for 3D Point Cloud Generation from a Single Image [16.04710129379503]
We introduce a hybrid explicit-implicit generative modeling scheme, which retains the flow-based explicit generative model's ability to sample point clouds at arbitrary resolutions.
We evaluate our method on the large-scale synthetic dataset ShapeNet; the experimental results demonstrate its superior performance.
arXiv Detail & Related papers (2022-10-08T17:58:20Z)
- Shrinking unit: a Graph Convolution-Based Unit for CNN-like 3D Point Cloud Feature Extractors [0.0]
We argue that a lack of inspiration from the image domain might be the primary cause of the gap between point cloud processing and the more mature image domain.
We propose a graph convolution-based unit, dubbed Shrinking unit, that can be stacked vertically and horizontally for the design of CNN-like 3D point cloud feature extractors.
arXiv Detail & Related papers (2022-09-26T15:28:31Z)
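The teacher-student formulation referenced in the PointCMT entry above can be illustrated with a standard knowledge-distillation loss. The sketch below is a generic example, not the PointCMT objective itself; the function name, temperature T, and mixing weight alpha are all hypothetical choices.

```python
# Generic knowledge-distillation loss: soft targets from an image-based
# teacher guide a point cloud student, mixed with the hard-label loss.
import torch
import torch.nn.functional as F

def cross_modal_kd_loss(student_logits, teacher_logits, labels,
                        T=4.0, alpha=0.9):
    # KL divergence between temperature-softened distributions; the T*T
    # factor keeps gradient magnitudes comparable across temperatures.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Dummy check: 8 samples, 40 classes (e.g., a ModelNet40-sized output).
loss = cross_modal_kd_loss(torch.randn(8, 40), torch.randn(8, 40),
                           torch.randint(0, 40, (8,)))
print(loss.item())
```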