Related papers: LLM-Guided Material Inference for 3D Point Clouds

LLM-Guided Material Inference for 3D Point Clouds

URL: http://arxiv.org/abs/2512.03237v1
Date: Tue, 02 Dec 2025 21:14:04 GMT
Title: LLM-Guided Material Inference for 3D Point Clouds
Authors: Nafiseh Izadyar, Teseo Schneider,
Abstract summary: We introduce a two-stage large language model (LLM) based method for inferring material composition directly from 3D point clouds.<n>Across 1,000 shapes from Fusion/ABS and ShapeNet, our method achieves high semantic and material plausibility.<n>These results demonstrate that language models can serve as general-purpose priors for bridging geometric reasoning and material understanding in 3D data.
Score: 4.3442696471034985
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Most existing 3D shape datasets and models focus solely on geometry, overlooking the material properties that determine how objects appear. We introduce a two-stage large language model (LLM) based method for inferring material composition directly from 3D point clouds with coarse segmentations. Our key insight is to decouple reasoning about what an object is from what it is made of. In the first stage, an LLM predicts the object's semantic; in the second stage, it assigns plausible materials to each geometric segment, conditioned on the inferred semantics. Both stages operate in a zero-shot manner, without task-specific training. Because existing datasets lack reliable material annotations, we evaluate our method using an LLM-as-a-Judge implemented in DeepEval. Across 1,000 shapes from Fusion/ABS and ShapeNet, our method achieves high semantic and material plausibility. These results demonstrate that language models can serve as general-purpose priors for bridging geometric reasoning and material understanding in 3D data.

Related papers

Point Linguist Model: Segment Any Object via Bridged Large 3D-Language Model [51.02616473941499]
3D object segmentation with Large Language Models (LLMs) has become a prevailing paradigm due to its broad semantics, task flexibility, and strong generalization.<n>However, this paradigm is hindered by representation misalignment: LLMs process high-level semantic tokens, whereas 3D point clouds convey only dense geometric structures.<n>We present the Point Linguist Model (PLM), a general framework that bridges the representation gap between LLMs and dense 3D point clouds.
arXiv Detail & Related papers (2025-09-09T15:01:28Z)
MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation [91.94869042117621]
Reasoning segmentation aims to segment target objects in complex scenes based on human intent and spatial reasoning.<n>Recent multimodal large language models (MLLMs) have demonstrated impressive 2D image reasoning segmentation.<n>We introduce MLLM-For3D, a framework that transfers knowledge from 2D MLLMs to 3D scene understanding.
arXiv Detail & Related papers (2025-03-23T16:40:20Z)
Toward General Object-level Mapping from Sparse Views with 3D Diffusion Priors [8.701106353658346]
General Object-level mapping builds a 3D map of objects in a scene with detailed shapes and poses from multi-view sensor observations. Recent work introduces generative shape priors for object-level mapping from sparse views, but is limited to single-category objects. In this work, we propose a General Object-level Mapping system, GOM, which leverages a 3D diffusion model as shape prior with multi-category support and outputs Neural Radiance Fields (NeRFs) for both texture and geometry for all objects in a scene.
arXiv Detail & Related papers (2024-10-07T21:33:30Z)
MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations [55.022519020409405]
This paper builds the first largest ever multi-modal 3D scene dataset and benchmark with hierarchical grounded language annotations, MMScan.<n>The resulting multi-modal 3D dataset encompasses 1.4M meta-annotated captions on 109k objects and 7.7k regions as well as over 3.04M diverse samples for 3D visual grounding and question-answering benchmarks.
arXiv Detail & Related papers (2024-06-13T17:59:30Z)
MaterialSeg3D: Segmenting Dense Materials from 2D Priors for 3D Assets [63.284244910964475]
We propose a 3D asset material generation framework to infer underlying material from the 2D semantic prior. Based on such a prior model, we devise a mechanism to parse material in 3D space.
arXiv Detail & Related papers (2024-04-22T07:00:17Z)
Unified Scene Representation and Reconstruction for 3D Large Language Models [40.693839066536505]
Existing approaches extract point clouds either from ground truth (GT) geometry or 3D scenes reconstructed by auxiliary models. We introduce Uni3DR2 extracts 3D geometric and semantic aware representation features via the frozen 2D foundation models. Our learned 3D representations not only contribute to the reconstruction process but also provide valuable knowledge for LLMs.
arXiv Detail & Related papers (2024-04-19T17:58:04Z)
Leveraging Large-Scale Pretrained Vision Foundation Models for Label-Efficient 3D Point Cloud Segmentation [67.07112533415116]
We present a novel framework that adapts various foundational models for the 3D point cloud segmentation task. Our approach involves making initial predictions of 2D semantic masks using different large vision models. To generate robust 3D semantic pseudo labels, we introduce a semantic label fusion strategy that effectively combines all the results via voting.
arXiv Detail & Related papers (2023-11-03T15:41:15Z)
PointLLM: Empowering Large Language Models to Understand Point Clouds [63.39876878899682]
PointLLM understands colored object point clouds with human instructions. It generates contextually appropriate responses, illustrating its grasp of point clouds and common sense.
arXiv Detail & Related papers (2023-08-31T17:59:46Z)
3DLatNav: Navigating Generative Latent Spaces for Semantic-Aware 3D Object Manipulation [2.8661021832561757]
3D generative models have been recently successful in generating realistic 3D objects in the form of point clouds. Most models do not offer controllability to manipulate the shape semantics of component object parts without extensive semantic labels or other reference point clouds. We propose 3DLatNav; a novel approach to navigating pretrained generative latent spaces to enable controlled part-level semantic manipulation of 3D objects.
arXiv Detail & Related papers (2022-11-17T18:47:56Z)
RandomRooms: Unsupervised Pre-training from Synthetic Shapes and Randomized Layouts for 3D Object Detection [138.2892824662943]
A promising solution is to make better use of the synthetic dataset, which consists of CAD object models, to boost the learning on real datasets. Recent work on 3D pre-training exhibits failure when transfer features learned on synthetic objects to other real-world applications. In this work, we put forward a new method called RandomRooms to accomplish this objective.
arXiv Detail & Related papers (2021-08-17T17:56:12Z)

This list is automatically generated from the titles and abstracts of the papers in this site.