PanSt3R: Multi-view Consistent Panoptic Segmentation
- URL: http://arxiv.org/abs/2506.21348v1
- Date: Thu, 26 Jun 2025 15:02:00 GMT
- Title: PanSt3R: Multi-view Consistent Panoptic Segmentation
- Authors: Lojze Zust, Yohann Cabon, Juliette Marrie, Leonid Antsfeld, Boris Chidlovskii, Jerome Revaud, Gabriela Csurka
- Abstract summary: We argue that relying on 2D panoptic segmentation for a problem inherently 3D and multi-view is likely suboptimal. We propose a unified and integrated approach PanSt3R, which eliminates the need for test-time optimization. PanSt3R is conceptually simple, yet fast and scalable, and achieves state-of-the-art performance on several benchmarks.
- Score: 10.781185925397493
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Panoptic segmentation of 3D scenes, involving the segmentation and classification of object instances in a dense 3D reconstruction of a scene, is a challenging problem, especially when relying solely on unposed 2D images. Existing approaches typically leverage off-the-shelf models to extract per-frame 2D panoptic segmentations, before optimizing an implicit geometric representation (often based on NeRF) to integrate and fuse the 2D predictions. We argue that relying on 2D panoptic segmentation for a problem inherently 3D and multi-view is likely suboptimal as it fails to leverage the full potential of spatial relationships across views. In addition to requiring camera parameters, these approaches also necessitate computationally expensive test-time optimization for each scene. Instead, in this work, we propose a unified and integrated approach PanSt3R, which eliminates the need for test-time optimization by jointly predicting 3D geometry and multi-view panoptic segmentation in a single forward pass. Our approach builds upon recent advances in 3D reconstruction, specifically upon MUSt3R, a scalable multi-view version of DUSt3R, and enhances it with semantic awareness and multi-view panoptic segmentation capabilities. We additionally revisit the standard post-processing mask merging procedure and introduce a more principled approach for multi-view segmentation. We also introduce a simple method for generating novel-view predictions based on the predictions of PanSt3R and vanilla 3DGS. Overall, the proposed PanSt3R is conceptually simple, yet fast and scalable, and achieves state-of-the-art performance on several benchmarks, while being orders of magnitude faster than existing methods.
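To make the pipeline described in the abstract concrete, below is a minimal PyTorch sketch of the idea: a single network consumes N unposed views and jointly predicts per-view 3D pointmaps and panoptic masks driven by one shared set of instance queries, so masks are consistent across views by construction and no per-scene test-time optimization is needed. This is not the authors' implementation; every name here (MultiViewPanopticNet, point_head, etc.) is hypothetical, and the backbone is reduced to a toy patch encoder.

```python
import torch
import torch.nn as nn

class MultiViewPanopticNet(nn.Module):
    """Toy joint 3D-reconstruction + panoptic-segmentation net (hypothetical sketch)."""

    def __init__(self, dim=256, num_queries=64, num_classes=20, patch=16):
        super().__init__()
        self.patch = patch
        # Stand-in for a ViT backbone: a single patchify convolution.
        self.encoder = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        # Tokens from ALL views attend to each other in one pass;
        # this joint attention is where cross-view consistency comes from.
        self.decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True),
            num_layers=2)
        # Regresses 3D points for every pixel of each patch (pointmap head);
        # the patch-to-pixel layout is simplified for brevity.
        self.point_head = nn.Linear(dim, 3 * patch * patch)
        # One shared set of instance queries for the whole scene, so the
        # same query yields the same instance's mask in every view.
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        self.class_head = nn.Linear(dim, num_classes + 1)  # +1 for "no object"

    def forward(self, views):
        # views: (N, 3, H, W) unposed RGB images of one scene
        n, _, h, w = views.shape
        tok = self.encoder(views).flatten(2).transpose(1, 2)      # (N, T, dim)
        tok = self.decoder(tok.reshape(1, n * tok.shape[1], -1))  # joint multi-view attention
        tok = tok.reshape(n, -1, tok.shape[-1])
        pts = self.point_head(tok).reshape(n, 3, h, w)            # pointmaps in a shared frame
        logits = torch.einsum("qd,ntd->nqt", self.queries, tok)   # per-query mask logits
        masks = logits.reshape(n, -1, h // self.patch, w // self.patch)
        classes = self.class_head(self.queries)                   # (Q, num_classes + 1)
        return pts, masks, classes

net = MultiViewPanopticNet()
pts, masks, classes = net(torch.rand(4, 3, 64, 64))  # 4 unposed 64x64 views
print(pts.shape, masks.shape, classes.shape)
# torch.Size([4, 3, 64, 64]) torch.Size([4, 64, 4, 4]) torch.Size([64, 21])
```

The masks here come out at patch resolution and would be upsampled and merged across queries in practice; the sketch only illustrates the single-forward-pass structure the abstract describes: no camera poses in, geometry plus cross-view-consistent instance masks out.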
Related papers
- SeqAffordSplat: Scene-level Sequential Affordance Reasoning on 3D Gaussian Splatting [85.87902260102652]
We introduce the novel task of Sequential 3D Gaussian Affordance Reasoning. We then propose SeqSplatNet, an end-to-end framework that directly maps an instruction to a sequence of 3D affordance masks. Our method sets a new state-of-the-art on our challenging benchmark, effectively advancing affordance reasoning from single-step interactions to complex, sequential tasks at the scene level.
arXiv Detail & Related papers (2025-07-31T17:56:55Z)
- econSG: Efficient and Multi-view Consistent Open-Vocabulary 3D Semantic Gaussians [56.85804719947]
We propose econSG for open-vocabulary semantic segmentation with 3DGS. Our econSG shows state-of-the-art performance on four benchmark datasets compared to existing methods.
arXiv Detail & Related papers (2025-04-08T13:12:31Z)
- PanopticSplatting: End-to-End Panoptic Gaussian Splatting [20.04251473153725]
We propose PanopticSplatting, an end-to-end system for open-vocabulary panoptic reconstruction. Our method introduces query-guided Gaussian segmentation with local cross-attention, lifting 2D instance masks without cross-frame association. Our method demonstrates strong performance in 3D scene panoptic reconstruction on the ScanNet-V2 and ScanNet++ datasets.
arXiv Detail & Related papers (2025-03-23T13:45:39Z)
- Leverage Cross-Attention for End-to-End Open-Vocabulary Panoptic Reconstruction [24.82894136068243]
PanopticRecon++ is an end-to-end method that formulates panoptic reconstruction through a novel cross-attention perspective. This perspective models the relationship between 3D instances (as queries) and the scene's 3D embedding field (as keys) through their attention map. PanopticRecon++ shows competitive 3D and 2D segmentation and reconstruction performance on both simulation and real-world datasets.
arXiv Detail & Related papers (2025-01-02T07:37:09Z)
- PanoSLAM: Panoptic 3D Scene Reconstruction via Gaussian SLAM [105.01907579424362]
PanoSLAM is the first SLAM system to integrate geometric reconstruction, 3D semantic segmentation, and 3D instance segmentation within a unified framework. For the first time, it achieves panoptic 3D reconstruction of open-world environments directly from RGB-D video.
arXiv Detail & Related papers (2024-12-31T08:58:10Z)
- View-Consistent Hierarchical 3D Segmentation Using Ultrametric Feature Fields [52.08335264414515]
We learn a novel feature field within a Neural Radiance Field (NeRF) representing a 3D scene.
Our method takes view-inconsistent multi-granularity 2D segmentations as input and produces a hierarchy of 3D-consistent segmentations as output.
We evaluate our method and several baselines on synthetic datasets with multi-view images and multi-granular segmentation, showcasing improved accuracy and viewpoint-consistency.
arXiv Detail & Related papers (2024-05-30T04:14:58Z)
- InstantSplat: Sparse-view Gaussian Splatting in Seconds [91.77050739918037]
We introduce InstantSplat, a novel approach for addressing sparse-view 3D scene reconstruction at lightning-fast speed. InstantSplat employs a self-supervised framework that optimizes the 3D scene representation and camera poses. It achieves an acceleration of over 30x in reconstruction and improves visual quality (SSIM) from 0.3755 to 0.7624 compared to traditional SfM with 3D-GS.
arXiv Detail & Related papers (2024-03-29T17:29:58Z)
- SD-MVS: Segmentation-Driven Deformation Multi-View Stereo with Spherical Refinement and EM optimization [6.886220026399106]
We introduce Segmentation-Driven Deformation Multi-View Stereo (SD-MVS) to tackle challenges in 3D reconstruction of textureless areas.
We are the first to adopt the Segment Anything Model (SAM) to distinguish semantic instances in scenes.
We propose a unique refinement strategy that combines spherical coordinates and gradient descent on normals and pixelwise search interval on depths.
arXiv Detail & Related papers (2024-01-12T05:25:57Z)
- SAI3D: Segment Any Instance in 3D Scenes [68.57002591841034]
We introduce SAI3D, a novel zero-shot 3D instance segmentation approach.
Our method partitions a 3D scene into geometric primitives, which are then progressively merged into 3D instance segmentations.
Empirical evaluations on ScanNet, Matterport3D and the more challenging ScanNet++ datasets demonstrate the superiority of our approach.
arXiv Detail & Related papers (2023-12-17T09:05:47Z)
- Panoptic Lifting for 3D Scene Understanding with Neural Fields [32.59498558663363]
We propose a novel approach for learning panoptic 3D representations from images of in-the-wild scenes.
Our method requires only machine-generated 2D panoptic segmentation masks inferred from a pre-trained network.
Experimental results validate our approach on the challenging Hypersim, Replica, and ScanNet datasets.
arXiv Detail & Related papers (2022-12-19T19:15:36Z)
- H3D-Net: Few-Shot High-Fidelity 3D Head Reconstruction [27.66008315400462]
Recent learning approaches that implicitly represent surface geometry have shown impressive results in the problem of multi-view 3D reconstruction.
We tackle these limitations for the specific problem of few-shot full 3D head reconstruction.
We learn a shape model of 3D heads from thousands of incomplete raw scans using implicit representations.
arXiv Detail & Related papers (2021-07-26T23:04:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.