Gen3DEval: Using vLLMs for Automatic Evaluation of Generated 3D Objects
- URL: http://arxiv.org/abs/2504.08125v1
- Date: Thu, 10 Apr 2025 20:57:40 GMT
- Title: Gen3DEval: Using vLLMs for Automatic Evaluation of Generated 3D Objects
- Authors: Shalini Maiti, Lourdes Agapito, Filippos Kokkinos
- Abstract summary: We introduce Gen3DEval, a novel evaluation framework for 3D object quality assessment. Gen3DEval evaluates text fidelity, appearance, and surface quality by analyzing 3D surface normals. Compared to state-of-the-art task-agnostic models, Gen3DEval demonstrates superior performance in user-aligned evaluations.
- Score: 13.333670988010864
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Rapid advancements in text-to-3D generation require robust and scalable evaluation metrics that align closely with human judgment, a need unmet by current metrics such as PSNR and CLIP, which require ground-truth data or focus only on prompt fidelity. To address this, we introduce Gen3DEval, a novel evaluation framework that leverages vision large language models (vLLMs) specifically fine-tuned for 3D object quality assessment. Gen3DEval evaluates text fidelity, appearance, and surface quality by analyzing 3D surface normals, without requiring ground-truth comparisons, bridging the gap between automated metrics and user preferences. Compared to state-of-the-art task-agnostic models, Gen3DEval demonstrates superior performance in user-aligned evaluations, placing it as a comprehensive and accessible benchmark for future research on text-to-3D generation. The project page can be found here: https://shalini-maiti.github.io/gen3deval.github.io/.
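As a rough, hypothetical sketch of the evaluation loop this abstract describes, the code below renders RGB views and surface-normal maps of a generated object and asks a vision LLM for per-axis scores. All function names (`render_views`, `render_normal_maps`, `query_vlm`) are placeholders, not the authors' released API, and the fine-tuning of the vLLM is omitted:

```python
# Minimal sketch of a Gen3DEval-style evaluation loop. Everything here is a
# placeholder reconstruction from the abstract, not the paper's actual code.
from dataclasses import dataclass


@dataclass
class QualityScores:
    text_fidelity: float    # agreement between object and prompt
    appearance: float       # texture and color plausibility
    surface_quality: float  # geometric smoothness, judged from normal maps


def render_views(mesh_path: str, n_views: int = 4) -> list[bytes]:
    """Hypothetical: render RGB images of the mesh from n_views cameras."""
    raise NotImplementedError


def render_normal_maps(mesh_path: str, n_views: int = 4) -> list[bytes]:
    """Hypothetical: render per-pixel surface-normal maps from the same cameras."""
    raise NotImplementedError


def query_vlm(images: list[bytes], question: str) -> float:
    """Hypothetical: send images plus a question to the fine-tuned vision LLM
    and parse a scalar score from its textual answer."""
    raise NotImplementedError


def evaluate(mesh_path: str, prompt: str) -> QualityScores:
    rgb = render_views(mesh_path)
    normals = render_normal_maps(mesh_path)
    return QualityScores(
        text_fidelity=query_vlm(
            rgb, f"Does this object match the prompt '{prompt}'? Score 0-10."),
        appearance=query_vlm(
            rgb, "Rate the visual appearance of this object from 0 to 10."),
        # Surface quality is judged from the normal maps alone, so no
        # ground-truth reference object is required.
        surface_quality=query_vlm(
            normals, "Rate the surface smoothness of this object from 0 to 10."),
    )
```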
Related papers
- Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation [134.53804996949287]
We introduce Eval3D, a fine-grained, interpretable evaluation tool that can faithfully evaluate the quality of generated 3D assets.
Our key observation is that many desired properties of 3D generation, such as semantic and geometric consistency, can be effectively captured.
Compared to prior work, Eval3D provides pixel-wise measurement, enables accurate 3D spatial feedback, and aligns more closely with human judgments.
arXiv Detail & Related papers (2025-04-25T17:22:05Z)
- 3DGen-Bench: Comprehensive Benchmark Suite for 3D Generative Models [94.48803082248872]
3D generation is experiencing rapid advancements, while the development of 3D evaluation has not kept pace. We develop a large-scale human preference dataset, 3DGen-Bench. We then train a CLIP-based scoring model, 3DGen-Score, and an MLLM-based automatic evaluator, 3DGen-Eval.
arXiv Detail & Related papers (2025-03-27T17:53:00Z)
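As a rough illustration of the CLIP-based scoring idea in the 3DGen-Bench entry above, the sketch below averages plain CLIP image-text similarity over rendered views of an object. The actual 3DGen-Score is additionally fine-tuned on the human preference data, which this sketch does not attempt; view image paths are assumed inputs:

```python
# Untrained CLIP backbone of a prompt-fidelity scorer; a sketch, not the
# 3DGen-Score model itself.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")


def clip_prompt_fidelity(prompt: str, view_paths: list[str]) -> float:
    """Mean CLIP image-text similarity over the rendered views."""
    images = [Image.open(p).convert("RGB") for p in view_paths]
    inputs = processor(text=[prompt], images=images,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # logits_per_image holds scaled cosine similarities, one row per view.
    return out.logits_per_image.squeeze(-1).mean().item()
```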
- IRef-VLA: A Benchmark for Interactive Referential Grounding with Imperfect Language in 3D Scenes [10.139461308573336]
IRef-VLA is the largest real-world dataset for the referential grounding task, consisting of over 11.5K scanned 3D rooms. We aim to provide a resource for 3D scene understanding that aids the development of robust, interactive navigation systems.
arXiv Detail & Related papers (2025-03-20T16:16:10Z)
- Benchmarking and Learning Multi-Dimensional Quality Evaluator for Text-to-3D Generation [26.0726219629689]
Text-to-3D generation has achieved remarkable progress in recent years, yet evaluating these methods remains challenging. Existing benchmarks lack fine-grained evaluation on different prompt categories and evaluation dimensions. We first propose a comprehensive benchmark named MATE-3D. The benchmark contains eight well-designed prompt categories that cover single and multiple object generation, resulting in 1,280 generated textured meshes.
arXiv Detail & Related papers (2024-12-15T12:41:44Z)
- Grounded 3D-LLM with Referent Tokens [58.890058568493096]
We propose Grounded 3D-LLM to consolidate various 3D vision tasks within a unified generative framework.
The model uses scene referent tokens as special noun phrases to reference 3D scenes.
Per-task instruction-following templates are employed to ensure naturalness and diversity when translating 3D vision tasks into language formats.
arXiv Detail & Related papers (2024-05-16T18:03:41Z)
- GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation [93.55550787058012]
This paper presents an automatic, versatile, and human-aligned evaluation metric for text-to-3D generative models.
To this end, we first develop a prompt generator using GPT-4V to generate evaluation prompts.
We then design a method instructing GPT-4V to compare two 3D assets according to user-defined criteria.
arXiv Detail & Related papers (2024-01-08T18:52:09Z)
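The GPT-4V entry above describes a pairwise comparison protocol: show a judge model renders of two candidate assets and ask which better satisfies a user-defined criterion. The sketch below is a hedged approximation using the OpenAI Python SDK; the model name, prompt wording, and helper names are assumptions, not the paper's exact setup:

```python
# Hypothetical reconstruction of a GPT-4V-style pairwise 3D-asset judge.
import base64

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def _as_data_url(path: str) -> str:
    """Encode a local PNG render as a base64 data URL for the vision API."""
    with open(path, "rb") as f:
        return "data:image/png;base64," + base64.b64encode(f.read()).decode()


def compare_assets(prompt: str, render_a: str, render_b: str,
                   criterion: str = "text-asset alignment") -> str:
    """Ask the judge which render (A or B) better meets the criterion."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumed stand-in; the paper used GPT-4V
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"Two 3D assets were generated for the prompt "
                         f"'{prompt}'. Judging by {criterion}, answer with "
                         f"exactly 'A' or 'B': which is better?"},
                {"type": "image_url", "image_url": {"url": _as_data_url(render_a)}},
                {"type": "image_url", "image_url": {"url": _as_data_url(render_b)}},
            ],
        }],
    )
    return resp.choices[0].message.content.strip()
```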
- T$^3$Bench: Benchmarking Current Progress in Text-to-3D Generation [52.029698642883226]
Methods in text-to-3D leverage powerful pretrained diffusion models to optimize NeRF.
Most studies evaluate their results with subjective case studies and user experiments.
We introduce T$^3$Bench, the first comprehensive text-to-3D benchmark.
arXiv Detail & Related papers (2023-10-04T17:12:18Z)
- From 2D to 3D: Re-thinking Benchmarking of Monocular Depth Prediction [80.67873933010783]
We argue that MDP research currently suffers from benchmark over-fitting and relies on metrics that are only partially helpful for gauging how useful the predictions are in 3D applications.
This limits the design and development of novel methods that are truly aware of, and improve towards estimating, the 3D structure of the scene rather than optimizing 2D-based distances.
We propose a set of metrics well suited to evaluate the 3D geometry of MDP approaches and a novel indoor benchmark, RIO-D3D, crucial for the proposed evaluation methodology.
arXiv Detail & Related papers (2022-03-15T17:50:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.