A Strong View-Free Baseline Approach for Single-View Image Guided Point Cloud Completion
- URL: http://arxiv.org/abs/2506.15747v1
- Date: Wed, 18 Jun 2025 04:10:35 GMT
- Title: A Strong View-Free Baseline Approach for Single-View Image Guided Point Cloud Completion
- Authors: Fangzhou Lin, Zilin Dai, Rigved Sanku, Songlin Hou, Kazunori D Yamada, Haichong K. Zhang, Ziming Zhang
- Abstract summary: We propose a strong baseline approach for SVIPC based on an attention-based multi-branch encoder-decoder network. Our hierarchical self-fusion mechanism, driven by cross-attention and self-attention layers, effectively integrates information across multiple streams. Our experiments and ablation studies on the ShapeNet-ViPC dataset demonstrate that our view-free framework outperforms state-of-the-art SVIPC methods.
- Score: 11.617131779171933
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The single-view image guided point cloud completion (SVIPC) task aims to reconstruct a complete point cloud from a partial input with the help of a single-view image. While previous works have demonstrated the effectiveness of this multimodal approach, the fundamental necessity of image guidance remains largely unexamined. To explore this, we propose a strong baseline approach for SVIPC based on an attention-based multi-branch encoder-decoder network that takes only partial point clouds as input, making it view-free. Our hierarchical self-fusion mechanism, driven by cross-attention and self-attention layers, effectively integrates information across multiple streams, enriching feature representations and strengthening the network's ability to capture geometric structures. Extensive experiments and ablation studies on the ShapeNet-ViPC dataset demonstrate that our view-free framework outperforms state-of-the-art SVIPC methods. We hope our findings provide new insights into the development of multimodal learning in SVIPC. Our demo code will be available at https://github.com/Zhang-VISLab.
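The fusion mechanism the abstract describes can be illustrated in miniature: cross-attention lets each feature stream query the other, and self-attention then integrates the merged stream. The sketch below is a minimal, dependency-free illustration of that general pattern, not the authors' implementation; the function names, the two-branch setup, and the tiny dimensions are all illustrative assumptions.

```python
import math

def matmul(A, B):
    """Multiply an m x d matrix A by a d x n matrix B (lists of lists)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(Q[0])
    K_T = [list(col) for col in zip(*K)]
    scores = matmul(Q, K_T)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, V)

def fuse_branches(branch_a, branch_b):
    """Toy two-stream fusion: cross-attention in both directions,
    then self-attention over the merged token set."""
    a_attends_b = attention(branch_a, branch_b, branch_b)  # A queries B
    b_attends_a = attention(branch_b, branch_a, branch_a)  # B queries A
    merged = a_attends_b + b_attends_a  # concatenate the two token sets
    return attention(merged, merged, merged)  # self-attention over merged stream
```

Because every attention output row is a convex combination of value rows, the fused features stay within the convex hull of the input features, which is one way to see why such fusion "integrates" rather than distorts the per-stream representations.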
Related papers
- Instruction-Guided Fusion of Multi-Layer Visual Features in Large Vision-Language Models [50.98559225639266]
We investigate the contributions of visual features from different encoder layers using 18 benchmarks spanning 6 task categories. Our findings reveal that multi-layer features provide complementary strengths with varying task dependencies, and uniform fusion leads to suboptimal performance. We propose the instruction-guided vision aggregator, a module that dynamically integrates multi-layer visual features based on textual instructions.
arXiv Detail & Related papers (2024-12-26T05:41:31Z) - CLIP-based Point Cloud Classification via Point Cloud to Image Translation [19.836264118079573]
The Contrastive Vision-Language Pre-training (CLIP) based point cloud classification model, i.e. PointCLIP, has added a new direction to the point cloud classification research domain.
We propose a Pretrained Point Cloud to Image Translation Network (PPCITNet) that produces generalized colored images, along with additional salient visual cues, from point cloud depth maps.
arXiv Detail & Related papers (2024-08-07T04:50:05Z) - Multi-view Aggregation Network for Dichotomous Image Segmentation [76.75904424539543]
Dichotomous Image Segmentation (DIS) has recently emerged, targeting high-precision object segmentation from high-resolution natural images.
Existing methods rely on tedious multiple encoder-decoder streams and stages to gradually complete the global localization and local refinement.
Inspired by this, we model DIS as a multi-view object perception problem and provide a parsimonious multi-view aggregation network (MVANet).
Experiments on the popular DIS-5K dataset show that our MVANet significantly outperforms state-of-the-art methods in both accuracy and speed.
arXiv Detail & Related papers (2024-04-11T03:00:00Z) - HVDistill: Transferring Knowledge from Images to Point Clouds via Unsupervised Hybrid-View Distillation [106.09886920774002]
We present a hybrid-view-based knowledge distillation framework, termed HVDistill, to guide the feature learning of a point cloud neural network.
Our method achieves consistent improvements over the baseline trained from scratch and significantly outperforms the existing schemes.
arXiv Detail & Related papers (2024-03-18T14:18:08Z) - DiffPoint: Single and Multi-view Point Cloud Reconstruction with ViT-Based Diffusion Model [10.253402444122084]
We propose a neat and powerful architecture called DiffPoint that combines ViT and diffusion models for the task of point cloud reconstruction.
We evaluate DiffPoint on both single-view and multi-view reconstruction tasks and achieve state-of-the-art results.
arXiv Detail & Related papers (2024-02-17T10:18:40Z) - Multi-Spectral Image Stitching via Spatial Graph Reasoning [52.27796682972484]
We propose a spatial graph reasoning based multi-spectral image stitching method.
We embed multi-scale complementary features from the same view position into a set of nodes.
By introducing long-range coherence along spatial and channel dimensions, the complementarity of pixel relations and channel interdependencies aids in the reconstruction of aligned multi-view features.
arXiv Detail & Related papers (2023-07-31T15:04:52Z) - Ponder: Point Cloud Pre-training via Neural Rendering [93.34522605321514]
We propose a novel approach to self-supervised learning of point cloud representations via differentiable neural rendering.
The learned point cloud representation can be easily integrated into various downstream tasks, including not only high-level tasks like 3D detection and segmentation, but also low-level tasks like 3D reconstruction and image rendering.
arXiv Detail & Related papers (2022-12-31T08:58:39Z) - Cross-modal Learning for Image-Guided Point Cloud Shape Completion [23.779985842891705]
We show how it is possible to combine the information from the two modalities in a localized latent space.
We also investigate a novel weakly-supervised setting where the auxiliary image provides a supervisory signal.
Experiments show significant improvements over state-of-the-art supervised methods for both unimodal and multimodal completion.
arXiv Detail & Related papers (2022-09-20T08:37:05Z) - Multi-scale Network with Attentional Multi-resolution Fusion for Point Cloud Semantic Segmentation [2.964101313270572]
We present a comprehensive point cloud semantic segmentation network that aggregates both local and global multi-scale information.
We introduce an Angle Correlation Point Convolution module to effectively learn the local shapes of points.
Inspired by HRNet, which has excellent performance on 2D image vision tasks, we build an HRNet customized for point clouds to learn global multi-scale context.
arXiv Detail & Related papers (2022-06-27T21:03:33Z) - Series Photo Selection via Multi-view Graph Learning [52.33318426088579]
Series photo selection (SPS) is an important branch of image aesthetics quality assessment.
We leverage a graph neural network to construct the relationships between multi-view features.
A siamese network is proposed to select the best one from a series of nearly identical photos.
arXiv Detail & Related papers (2022-03-18T04:23:25Z) - Vis2Mesh: Efficient Mesh Reconstruction from Unstructured Point Clouds of Large Scenes with Learned Virtual View Visibility [17.929307870456416]
We present a novel framework for mesh reconstruction from unstructured point clouds.
We combine the learned visibility of the 3D points in the virtual views with traditional graph-cut based mesh generation.
arXiv Detail & Related papers (2021-08-18T20:28:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.