Hierarchical Image-Guided 3D Point Cloud Segmentation in Industrial Scenes via Multi-View Bayesian Fusion
- URL: http://arxiv.org/abs/2512.06882v1
- Date: Sun, 07 Dec 2025 15:15:52 GMT
- Title: Hierarchical Image-Guided 3D Point Cloud Segmentation in Industrial Scenes via Multi-View Bayesian Fusion
- Authors: Yu Zhu, Naoya Chiba, Koichi Hashimoto
- Abstract summary: 3D segmentation is critical for understanding complex scenes with dense layouts and multi-scale objects. Existing 3D point-based methods require costly annotations, while image-guided methods often suffer from semantic inconsistencies across views. We propose a hierarchical image-guided 3D segmentation framework that progressively refines segmentation from instance-level to part-level.
- Score: 4.679314646805623
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Reliable 3D segmentation is critical for understanding complex scenes with dense layouts and multi-scale objects, as commonly seen in industrial environments. In such scenarios, heavy occlusion weakens geometric boundaries between objects, and large differences in object scale cause end-to-end models to fail to capture both coarse and fine details accurately. Existing 3D point-based methods require costly annotations, while image-guided methods often suffer from semantic inconsistencies across views. To address these challenges, we propose a hierarchical image-guided 3D segmentation framework that progressively refines segmentation from instance-level to part-level. Instance segmentation involves rendering a top-view image and projecting SAM-generated masks, prompted by YOLO-World, back onto the 3D point cloud. Part-level segmentation is subsequently performed by rendering multi-view images of each instance obtained from the previous stage and applying the same 2D segmentation and back-projection process at each view, followed by Bayesian updating fusion to ensure semantic consistency across views. Experiments on real-world factory data demonstrate that our method effectively handles occlusion and structural complexity, achieving consistently high per-class mIoU scores. Additional evaluations on public datasets confirm the generalization ability of our framework, highlighting its robustness, annotation efficiency, and adaptability to diverse 3D environments.
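The Bayesian updating fusion step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes each 3D point receives a per-view class likelihood (e.g. a softmax score from the back-projected 2D mask) and that views are treated as conditionally independent, so the posterior over labels is the normalized product of the per-view likelihoods. The function name and array shapes are hypothetical.

```python
import numpy as np

def bayesian_label_fusion(view_likelihoods, prior=None):
    """Fuse per-view class likelihoods for one 3D point via Bayesian updating.

    view_likelihoods: (V, C) array of hypothetical per-view class scores
                      (V views, C classes), each row summing to 1.
    prior: optional (C,) prior over classes; defaults to uniform.
    Returns the (C,) posterior over classes after seeing all views.
    """
    V, C = view_likelihoods.shape
    posterior = np.full(C, 1.0 / C) if prior is None else prior.astype(float)
    for lik in view_likelihoods:
        posterior *= lik              # Bayes update, views assumed independent
        posterior /= posterior.sum()  # renormalize to a valid distribution
    return posterior

# Example: 3 views, 2 classes; two views favor class 0, one favors class 1.
views = np.array([[0.8, 0.2],
                  [0.7, 0.3],
                  [0.4, 0.6]])
post = bayesian_label_fusion(views)
```

Because the update is multiplicative, a single dissenting view lowers but does not overturn a label supported by the majority of views, which is how this kind of fusion suppresses view-to-view semantic inconsistencies.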
Related papers
- MV-SAM: Multi-view Promptable Segmentation using Pointmap Guidance [79.57732829495843]
We introduce MV-SAM, a framework for multi-view segmentation that achieves 3D consistency using pointmaps. MV-SAM lifts images and prompts into 3D space, eliminating the need for explicit 3D networks or annotated 3D data.
arXiv Detail & Related papers (2026-01-25T15:00:37Z) - Towards 3D Object-Centric Feature Learning for Semantic Scene Completion [18.41627244498394]
Vision-based 3D Semantic Scene Completion (SSC) has received growing attention due to its potential in autonomous driving. We propose Ocean, an object-centric prediction framework that decomposes the scene into individual object instances. We show that Ocean achieves state-of-the-art performance, with mIoU scores of 17.40 and 20.28, respectively.
arXiv Detail & Related papers (2025-11-17T06:28:26Z) - IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction [82.53307702809606]
Humans naturally perceive the geometric structure and semantic content of a 3D world as intertwined dimensions. We propose Instance-Grounded Geometry Transformer (IGGT) to unify the knowledge for both spatial reconstruction and instance-level contextual understanding.
arXiv Detail & Related papers (2025-10-26T14:57:44Z) - IGFuse: Interactive 3D Gaussian Scene Reconstruction via Multi-Scans Fusion [15.837932667195037]
IGFuse is a novel framework that reconstructs interactive Gaussian scenes by fusing observations from multiple scans. Our method constructs segmentation-aware Gaussian fields and enforces bi-directional photometric and semantic consistency across scans. IGFuse enables high-fidelity rendering and object-level scene manipulation without dense observations or complex pipelines.
arXiv Detail & Related papers (2025-08-18T17:59:47Z) - HiScene: Creating Hierarchical 3D Scenes with Isometric View Generation [50.206100327643284]
HiScene is a novel hierarchical framework that bridges the gap between 2D image generation and 3D object generation. We generate 3D content that aligns with 2D representations while maintaining compositional structure.
arXiv Detail & Related papers (2025-04-17T16:33:39Z) - IAAO: Interactive Affordance Learning for Articulated Objects in 3D Environments [56.85804719947]
We present IAAO, a framework that builds an explicit 3D model for intelligent agents to gain understanding of articulated objects in their environment through interaction. We first build hierarchical features and label fields for each object state using 3D Gaussian Splatting (3DGS) by distilling mask features and view-consistent labels from multi-view images. We then perform object- and part-level queries on the 3D Gaussian primitives to identify static and articulated elements, estimating global transformations and local articulation parameters along with affordances.
arXiv Detail & Related papers (2025-04-09T12:36:48Z) - MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation [91.94869042117621]
Reasoning segmentation aims to segment target objects in complex scenes based on human intent and spatial reasoning. Recent multimodal large language models (MLLMs) have demonstrated impressive 2D image reasoning segmentation. We introduce MLLM-For3D, a framework that transfers knowledge from 2D MLLMs to 3D scene understanding.
arXiv Detail & Related papers (2025-03-23T16:40:20Z) - View-Consistent Hierarchical 3D Segmentation Using Ultrametric Feature Fields [52.08335264414515]
We learn a novel feature field within a Neural Radiance Field (NeRF) representing a 3D scene.
Our method takes view-inconsistent multi-granularity 2D segmentations as input and produces a hierarchy of 3D-consistent segmentations as output.
We evaluate our method and several baselines on synthetic datasets with multi-view images and multi-granular segmentation, showcasing improved accuracy and viewpoint-consistency.
arXiv Detail & Related papers (2024-05-30T04:14:58Z) - OmniSeg3D: Omniversal 3D Segmentation via Hierarchical Contrastive Learning [31.234212614311424]
We propose OmniSeg3D, an omniversal segmentation method for segmenting anything in 3D all at once.
In tackling the challenges posed by inconsistent 2D segmentations, this framework yields a global consistent 3D feature field.
Experiments demonstrate the effectiveness of our method on high-quality 3D segmentation and accurate hierarchical structure understanding.
arXiv Detail & Related papers (2023-11-20T11:04:59Z) - 3D Instance Segmentation of MVS Buildings [5.2517244720510305]
We present a novel framework for instance segmentation of 3D buildings from Multi-view Stereo (MVS) urban scenes.
The emphasis of this work lies in detecting and segmenting 3D building instances even if they are attached and embedded in a large and imprecise 3D surface model.
arXiv Detail & Related papers (2021-12-18T11:12:38Z) - Weakly Supervised Semantic Segmentation in 3D Graph-Structured Point Clouds of Wild Scenes [36.07733308424772]
The deficiency of 3D segmentation labels is one of the main obstacles to effective point cloud segmentation.
We propose a novel deep graph convolutional network-based framework for large-scale semantic scene segmentation in point clouds with sole 2D supervision.
arXiv Detail & Related papers (2020-04-26T23:02:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.