SRAM: Shape-Realism Alignment Metric for No Reference 3D Shape Evaluation
- URL: http://arxiv.org/abs/2512.01373v1
- Date: Mon, 01 Dec 2025 07:40:11 GMT
- Title: SRAM: Shape-Realism Alignment Metric for No Reference 3D Shape Evaluation
- Authors: Sheng Liu, Tianyu Luan, Phani Nuney, Xuelu Feng, Junsong Yuan
- Abstract summary: We propose a Shape-Realism Alignment Metric that leverages a large language model (LLM) as a bridge between mesh shape information and realism evaluation. A dedicated realism decoder is designed to align the language model's output with human perception of realism. We introduce a new dataset, RealismGrading, which provides human-annotated realism scores without the need for ground truth shapes.
- Score: 35.38154061503059
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D generation and reconstruction techniques are widely used in computer games, film, and other content-creation areas. As these applications grow, so does the demand for 3D shapes that look truly realistic. Traditional evaluation methods rely on a ground truth to measure mesh fidelity. However, in many practical cases, a shape's realism does not depend on having a ground-truth reference. In this work, we propose a Shape-Realism Alignment Metric that leverages a large language model (LLM) as a bridge between mesh shape information and realism evaluation. To achieve this, we adopt a mesh encoding approach that converts 3D shapes into the language token space. A dedicated realism decoder is designed to align the language model's output with human perception of realism. Additionally, we introduce a new dataset, RealismGrading, which provides human-annotated realism scores without the need for ground truth shapes. Our dataset includes shapes generated by 16 different algorithms on over a dozen objects, making it more representative of practical 3D shape distributions. We validate our metric's performance and generalizability through k-fold cross-validation across different objects. Experimental results show that our metric correlates well with human perception, outperforms existing methods, and generalizes well.
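The evaluation protocol the abstract describes, k-fold cross-validation across objects with the predicted scores correlated against human realism annotations, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: the fold-splitting helper, the rank-correlation routine, and all names are hypothetical stand-ins for the actual model and the RealismGrading dataset.

```python
# Sketch of object-wise k-fold evaluation with a rank correlation
# against human realism scores. All functions and data here are
# illustrative stand-ins, not the paper's implementation.

def object_kfold_splits(objects, k):
    """Partition the distinct object names into k folds, so every
    shape of a held-out object is scored by a metric trained
    without that object."""
    names = sorted(set(objects))
    return [names[i::k] for i in range(k)]

def rank(values):
    """Simple ranks (ties broken by input order)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for pos, i in enumerate(order):
        ranks[i] = pos
    return ranks

def spearman(pred, human):
    """Spearman correlation: Pearson correlation of the ranks."""
    ra, rb = rank(pred), rank(human)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    sa = sum((x - ma) ** 2 for x in ra) ** 0.5
    sb = sum((y - mb) ** 2 for y in rb) ** 0.5
    return cov / (sa * sb)

# Toy usage: three objects split into two folds, and a perfectly
# monotone prediction giving correlation 1.0.
objects = ["chair", "chair", "lamp", "table", "table", "lamp"]
print(object_kfold_splits(objects, 2))        # [['chair', 'table'], ['lamp']]
print(spearman([0.1, 0.4, 0.9], [1, 3, 5]))   # 1.0
```

Splitting by object rather than by shape is what makes the protocol test generalizability: a fold's test shapes come from objects the metric never saw during training.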
Related papers
- Textured Geometry Evaluation: Perceptual 3D Textured Shape Metric via 3D Latent-Geometry Network [47.25343775786203]
Existing metrics, such as Chamfer Distance, often fail to align with how humans evaluate the fidelity of 3D shapes. Recent learning-based metrics attempt to improve this by relying on rendered images and 2D image quality metrics. We propose a new fidelity evaluation method that is based directly on 3D meshes with texture, without relying on rendering.
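For reference, the Chamfer Distance mentioned above, the reference-based baseline this line of work critiques, compares a generated point set against a ground-truth set by averaging nearest-neighbor distances in both directions. A brute-force sketch of the squared-distance variant (practical implementations use k-d trees or GPU batching):

```python
# Brute-force symmetric Chamfer Distance between two point sets,
# using squared Euclidean distances. O(|A| * |B|): fine as a sketch,
# too slow for densely sampled meshes.

def chamfer_distance(A, B):
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    a_to_b = sum(min(sq_dist(p, q) for q in B) for p in A) / len(A)
    b_to_a = sum(min(sq_dist(q, p) for p in A) for q in B) / len(B)
    return a_to_b + b_to_a

# Identical sets score 0; any displacement increases the distance.
print(chamfer_distance([(0, 0, 0)], [(1, 0, 0)]))  # 2.0
```

Note the metric is purely geometric: two shapes can be close in Chamfer Distance while one looks far less realistic to a human, which is exactly the gap no-reference metrics such as SRAM aim to close.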
arXiv Detail & Related papers (2025-12-01T07:53:03Z)
- AlignGS: Aligning Geometry and Semantics for Robust Indoor Reconstruction from Sparse Views [18.361136390711415]
The demand for semantically rich 3D models of indoor scenes is rapidly growing, driven by applications in augmented reality, virtual reality, and robotics. Existing methods often treat semantics as a passive feature painted on an already-formed, and potentially flawed, geometry. This paper introduces AlignGS, a novel framework that actualizes this vision by pioneering a synergistic, end-to-end optimization of geometry and semantics.
arXiv Detail & Related papers (2025-10-09T06:30:20Z)
- Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments robustly display our method's consistent superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z)
- Probing the 3D Awareness of Visual Foundation Models [56.68380136809413]
We analyze the 3D awareness of visual foundation models.
We conduct experiments using task-specific probes and zero-shot inference procedures on frozen features.
arXiv Detail & Related papers (2024-04-12T17:58:04Z)
- FMGS: Foundation Model Embedded 3D Gaussian Splatting for Holistic 3D Scene Understanding [11.118857208538039]
We present Foundation Model Embedded Gaussian Splatting (FMGS), which incorporates vision-language embeddings of foundation models into 3D Gaussian Splatting (GS).
Results demonstrate remarkable multi-view semantic consistency, facilitating diverse downstream tasks, beating state-of-the-art methods by 10.2 percent on open-vocabulary language-based object detection.
This research explores the intersection of vision, language, and 3D scene representation, paving the way for enhanced scene understanding in uncontrolled real-world environments.
arXiv Detail & Related papers (2024-01-03T20:39:02Z)
- Shape, Pose, and Appearance from a Single Image via Bootstrapped Radiance Field Inversion [54.151979979158085]
We introduce a principled end-to-end reconstruction framework for natural images, where accurate ground-truth poses are not available.
We leverage an unconditional 3D-aware generator, to which we apply a hybrid inversion scheme where a model produces a first guess of the solution.
Our framework can de-render an image in as few as 10 steps, enabling its use in practical scenarios.
arXiv Detail & Related papers (2022-11-21T17:42:42Z)
- A Shading-Guided Generative Implicit Model for Shape-Accurate 3D-Aware Image Synthesis [163.96778522283967]
We propose a shading-guided generative implicit model that is able to learn a starkly improved shape representation.
An accurate 3D shape should also yield a realistic rendering under different lighting conditions.
Our experiments on multiple datasets show that the proposed approach achieves photorealistic 3D-aware image synthesis.
arXiv Detail & Related papers (2021-10-29T10:53:12Z)
- Fully Understanding Generic Objects: Modeling, Segmentation, and Reconstruction [33.95791350070165]
Inferring 3D structure of a generic object from a 2D image is a long-standing objective of computer vision.
We take an alternative approach with semi-supervised learning. That is, for a 2D image of a generic object, we decompose it into latent representations of category, shape and albedo.
We show that the complete shape and albedo modeling enables us to leverage real 2D images in both modeling and model fitting.
arXiv Detail & Related papers (2021-04-02T02:39:29Z)
- SHAD3S: A model to Sketch, Shade and Shadow [20.209172586699175]
Hatching is a common method used by artists to accentuate the third dimension of a sketch, and to illuminate the scene.
Our system SHAD3S attempts to compete with a human at hatching generic three-dimensional (3D) shapes, and also tries to assist the artist in a form-exploration exercise.
arXiv Detail & Related papers (2020-11-13T09:25:46Z)
- Learning to Reconstruct and Segment 3D Objects [4.709764624933227]
We aim to understand scenes and the objects within them by learning general and robust representations using deep neural networks.
This thesis makes three core contributions, ranging from object-level 3D shape estimation from single or multiple views to scene-level semantic understanding.
arXiv Detail & Related papers (2020-10-19T15:09:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.