Related papers: LMSeg: An end-to-end geometric message-passing network on barycentric dual graphs for large-scale landscape mesh segmentation

LMSeg: An end-to-end geometric message-passing network on barycentric dual graphs for large-scale landscape mesh segmentation

URL: http://arxiv.org/abs/2407.04326v3
Date: Sat, 08 Nov 2025 19:42:06 GMT
Title: LMSeg: An end-to-end geometric message-passing network on barycentric dual graphs for large-scale landscape mesh segmentation
Authors: Zexian Huang, Kourosh Khoshelham, Martin Tomko,
Abstract summary: We introduce the BudjBim Wall dataset, a large-scale annotated mesh dataset from the UNESCO World Heritage-listed Budj Bim cultural landscape in Victoria, Australia.<n>We propose LMSeg, a deep graph message-passing network for semantic segmentation of large-scale meshes.<n>Experiments on three large-scale benchmarks (SUM, H3D, and BBW) show that LMSeg achieves 75.1% mIoU on SUM, 78.4% O.A. on H3D, and 62.4% mIoU on BBW, using only 2.4M lightweight parameters.
Score: 5.698076346359676
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Semantic segmentation of large-scale 3D landscape meshes is critical for geospatial analysis in complex environments, yet existing approaches face persistent challenges of scalability, end-to-end trainability, and accurate segmentation of small and irregular objects. To address these issues, we introduce the BudjBim Wall (BBW) dataset, a large-scale annotated mesh dataset derived from high-resolution LiDAR scans of the UNESCO World Heritage-listed Budj Bim cultural landscape in Victoria, Australia. The BBW dataset captures historic dry-stone wall structures that are difficult to detect under vegetation occlusion, supporting research in underrepresented cultural heritage contexts. Building on this dataset, we propose LMSeg, a deep graph message-passing network for semantic segmentation of large-scale meshes. LMSeg employs a barycentric dual graph representation of mesh faces and introduces the Geometry Aggregation+ (GA+) module, a learnable softmax-based operator that adaptively combines neighborhood features and captures high-frequency geometric variations. A hierarchical-local dual pooling integrates hierarchical and local geometric aggregation to balance global context with fine-detail preservation. Experiments on three large-scale benchmarks (SUM, H3D, and BBW) show that LMSeg achieves 75.1% mIoU on SUM, 78.4% O.A. on H3D, and 62.4% mIoU on BBW, using only 2.4M lightweight parameters. In particular, LMSeg demonstrates accurate segmentation across both urban and natural scenes-capturing small-object classes such as vehicles and high vegetation in complex city environments, while also reliably detecting dry-stone walls in dense, occluded rural landscapes. Together, the BBW dataset and LMSeg provide a practical and extensible method for advancing 3D mesh segmentation in cultural heritage, environmental monitoring, and urban applications.

Related papers

SSR: Pushing the Limit of Spatial Intelligence with Structured Scene Reasoning [30.87517633729756]
SSR is a framework designed for Structured Scene Reasoning.<n>It seamlessly integrates 2D and 3D representations via a lightweight alignment mechanism.<n>It achieves state-of-the-art performance on multiple spatial intelligence benchmarks.
arXiv Detail & Related papers (2026-02-28T02:05:35Z)
Scalable Adaptation of 3D Geometric Foundation Models via Weak Supervision from Internet Video [76.32954467706581]
We propose SAGE, a framework for Scalable Adaptation of GEometric foundation models from raw video streams.<n>We use a hierarchical mining pipeline to transform videos into training trajectories and hybrid supervision.<n>Experiments show that SAGE significantly enhances zero-shot generalization, reducing Chamfer Distance by 20-42% on unseen benchmarks.
arXiv Detail & Related papers (2026-02-08T09:53:21Z)
Geodiffussr: Generative Terrain Texturing with Elevation Fidelity [48.82552523546255]
We introduce Geodiffussr, a flow-matching pipeline that synthesizes text-guided texture maps.<n>The core mechanism is multi-scale content aggregation (MCA): DEM features are injected into UNet blocks at multiple resolutions to enforce global-to-local elevation consistency.<n>To train and evaluate Geodiffussr, we assemble a globally distributed, biome- and climate-stratified corpus of triplets pairing SRTM-derived DEMs with Sentinel-2 imagery and vision-grounded natural-appearance captions.
arXiv Detail & Related papers (2025-11-28T09:52:44Z)
OpenUrban3D: Annotation-Free Open-Vocabulary Semantic Segmentation of Large-Scale Urban Point Clouds [23.982606719607702]
We present OpenUrban3D, the first 3D open-vocabulary semantic segmentation framework for large-scale urban scenes.<n>Our approach generates robust semantic features directly from raw point clouds through multi-view, multi-granularity rendering, mask-level vision-language feature extraction, and sample-balanced fusion.<n>This design enables zero-shot segmentation for arbitrary text queries while capturing both semantic richness and geometric priors.
arXiv Detail & Related papers (2025-09-13T15:03:28Z)
Self-Attention Based Multi-Scale Graph Auto-Encoder Network of 3D Meshes [1.573038298640368]
3D Geometric Mesh Network (3DGeoMeshNet), is a novel GCN-based framework that uses anisotropic convolution layers to learn both global and local features directly in the spatial domain.<n>Our architecture features a multi-scale encoder-decoder structure, where separate global and local pathways capture both large-scale geometric structures and fine-grained local details.
arXiv Detail & Related papers (2025-07-07T07:36:03Z)
Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation [54.04601077224252]
Embodied scene understanding requires not only comprehending visual-spatial information but also determining where to explore next in the 3D physical world.<n>underlinetextbf3D vision-language learning enables embodied agents to effectively explore and understand their environment.<n>model's versatility enables navigation using diverse input modalities, including categories, language descriptions, and reference images.
arXiv Detail & Related papers (2025-07-05T14:15:52Z)
Textured Mesh Saliency: Bridging Geometry and Texture for Human Perception in 3D Graphics [50.23625950905638]
We present a new dataset for textured mesh saliency, created through an innovative eye-tracking experiment in a six degrees of freedom (6-DOF) VR environment. Our proposed model predicts saliency maps for textured mesh surfaces by treating each triangular face as an individual unit and assigning a saliency density value to reflect the importance of each local surface region.
arXiv Detail & Related papers (2024-12-11T08:27:33Z)
Large Spatial Model: End-to-end Unposed Images to Semantic 3D [79.94479633598102]
Large Spatial Model (LSM) processes unposed RGB images directly into semantic radiance fields. LSM simultaneously estimates geometry, appearance, and semantics in a single feed-forward operation. It can generate versatile label maps by interacting with language at novel viewpoints.
arXiv Detail & Related papers (2024-10-24T17:54:42Z)
MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations [55.022519020409405]
This paper builds the first largest ever multi-modal 3D scene dataset and benchmark with hierarchical grounded language annotations, MMScan.<n>The resulting multi-modal 3D dataset encompasses 1.4M meta-annotated captions on 109k objects and 7.7k regions as well as over 3.04M diverse samples for 3D visual grounding and question-answering benchmarks.
arXiv Detail & Related papers (2024-06-13T17:59:30Z)
ALSTER: A Local Spatio-Temporal Expert for Online 3D Semantic Reconstruction [62.599588577671796]
We propose an online 3D semantic segmentation method that incrementally reconstructs a 3D semantic map from a stream of RGB-D frames. Unlike offline methods, ours is directly applicable to scenarios with real-time constraints, such as robotics or mixed reality.
arXiv Detail & Related papers (2023-11-29T20:30:18Z)
Semi-Weakly Supervised Object Kinematic Motion Prediction [56.282759127180306]
Given a 3D object, kinematic motion prediction aims to identify the mobile parts as well as the corresponding motion parameters. We propose a graph neural network to learn the map between hierarchical part-level segmentation and mobile parts parameters. The network predictions yield a large scale of 3D objects with pseudo labeled mobility information.
arXiv Detail & Related papers (2023-03-31T02:37:36Z)
Flattening-Net: Deep Regular 2D Representation for 3D Point Cloud Analysis [66.49788145564004]
We present an unsupervised deep neural architecture called Flattening-Net to represent irregular 3D point clouds of arbitrary geometry and topology. Our methods perform favorably against the current state-of-the-art competitors.
arXiv Detail & Related papers (2022-12-17T15:05:25Z)
GraphCSPN: Geometry-Aware Depth Completion via Dynamic GCNs [49.55919802779889]
We propose a Graph Convolution based Spatial Propagation Network (GraphCSPN) as a general approach for depth completion. In this work, we leverage convolution neural networks as well as graph neural networks in a complementary way for geometric representation learning. Our method achieves the state-of-the-art performance, especially when compared in the case of using only a few propagation steps.
arXiv Detail & Related papers (2022-10-19T17:56:03Z)
City-scale Incremental Neural Mapping with Three-layer Sampling and Panoptic Representation [5.682979644056021]
We build a city-scale continual neural mapping system with a panoptic representation that consists of environment-level and instance-level modelling. Given a stream of sparse LiDAR point cloud, it maintains a dynamic generative model that maps 3D coordinates to signed distance field (SDF) values. To realize high fidelity mapping of instance under incomplete observation, category-specific prior is introduced to better model the geometric details.
arXiv Detail & Related papers (2022-09-28T13:14:40Z)
Deep residential representations: Using unsupervised learning to unlock elevation data for geo-demographic prediction [0.0]
LiDAR technology can be used to provide detailed three-dimensional elevation maps of urban and rural landscapes. To date, airborne LiDAR imaging has been predominantly confined to the environmental and archaeological domains. We consider the suitability of this data not just on its own but also as a source of data in combination with demographic features, thus providing a realistic use case for the embeddings.
arXiv Detail & Related papers (2021-12-02T17:10:52Z)
MDA-Net: Multi-Dimensional Attention-Based Neural Network for 3D Image Segmentation [4.221871357181261]
We propose a multi-dimensional attention network (MDA-Net) to efficiently integrate slice-wise, spatial, and channel-wise attention into a U-Net based network. We evaluate our model on the MICCAI iSeg and IBSR datasets, and the experimental results demonstrate consistent improvements over existing methods.
arXiv Detail & Related papers (2021-05-10T16:58:34Z)
PIG-Net: Inception based Deep Learning Architecture for 3D Point Cloud Segmentation [0.9137554315375922]
We propose a inception based deep network architecture called PIG-Net, that effectively characterizes the local and global geometric details of the point clouds. We perform an exhaustive experimental analysis of the PIG-Net architecture on two state-of-the-art datasets.
arXiv Detail & Related papers (2021-01-28T13:27:55Z)
Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges [52.624157840253204]
We present an urban-scale photogrammetric point cloud dataset with nearly three billion richly annotated points. Our dataset consists of large areas from three UK cities, covering about 7.6 km2 of the city landscape. We evaluate the performance of state-of-the-art algorithms on our dataset and provide a comprehensive analysis of the results.
arXiv Detail & Related papers (2020-09-07T14:47:07Z)
Weakly Supervised Semantic Segmentation in 3D Graph-Structured Point Clouds of Wild Scenes [36.07733308424772]
The deficiency of 3D segmentation labels is one of the main obstacles to effective point cloud segmentation. We propose a novel deep graph convolutional network-based framework for large-scale semantic scene segmentation in point clouds with sole 2D supervision.
arXiv Detail & Related papers (2020-04-26T23:02:23Z)
Real-Time High-Performance Semantic Image Segmentation of Urban Street Scenes [98.65457534223539]
We propose a real-time high-performance DCNN-based method for robust semantic segmentation of urban street scenes. The proposed method achieves the accuracy of 73.6% and 68.0% mean Intersection over Union (mIoU) with the inference speed of 51.0 fps and 39.3 fps.
arXiv Detail & Related papers (2020-03-11T08:45:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.