Benchmarking and Learning Multi-Dimensional Quality Evaluator for Text-to-3D Generation
- URL: http://arxiv.org/abs/2412.11170v1
- Date: Sun, 15 Dec 2024 12:41:44 GMT
- Title: Benchmarking and Learning Multi-Dimensional Quality Evaluator for Text-to-3D Generation
- Authors: Yujie Zhang, Bingyang Cui, Qi Yang, Zhu Li, Yiling Xu
- Abstract summary: Text-to-3D generation has achieved remarkable progress in recent years, yet evaluating these methods remains challenging.
Existing benchmarks lack fine-grained evaluation on different prompt categories and evaluation dimensions.
We first propose a comprehensive benchmark named MATE-3D.
The benchmark contains eight well-designed prompt categories that cover single and multiple object generation, resulting in 1,280 generated textured meshes.
- Score: 26.0726219629689
- Abstract: Text-to-3D generation has achieved remarkable progress in recent years, yet evaluating these methods remains challenging for two reasons: i) Existing benchmarks lack fine-grained evaluation on different prompt categories and evaluation dimensions. ii) Previous evaluation metrics only focus on a single aspect (e.g., text-3D alignment) and fail to perform multi-dimensional quality assessment. To address these problems, we first propose a comprehensive benchmark named MATE-3D. The benchmark contains eight well-designed prompt categories that cover single and multiple object generation, resulting in 1,280 generated textured meshes. We have conducted a large-scale subjective experiment from four different evaluation dimensions and collected 107,520 annotations, followed by detailed analyses of the results. Based on MATE-3D, we propose a novel quality evaluator named HyperScore. Utilizing a hypernetwork to generate specified mapping functions for each evaluation dimension, our metric can effectively perform multi-dimensional quality assessment. HyperScore presents superior performance over existing metrics on MATE-3D, making it a promising metric for assessing and improving text-to-3D generation. The project is available at https://mate-3d.github.io/.
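The hypernetwork idea in the abstract can be illustrated with a minimal sketch: a small hypernetwork maps a one-hot evaluation-dimension condition to the weights of a per-dimension scoring head, which is then applied to shared features. This is an illustrative toy, not the authors' implementation; all sizes, names (`FEAT_DIM`, `COND_DIM`, the four dimension labels), and the random features are assumptions for the demo.

```python
import numpy as np

# Hedged sketch of a hypernetwork-based multi-dimensional evaluator.
# Not the HyperScore implementation; sizes and dimension names are assumed.
rng = np.random.default_rng(0)
FEAT_DIM = 16   # shared feature size (stand-in for mesh + prompt features)
COND_DIM = 4    # one evaluation dimension per one-hot condition
DIMENSIONS = ["alignment", "geometry", "texture", "overall"]

# Hypernetwork parameters: map a dimension condition to head weights + bias.
W_hyper = rng.normal(scale=0.1, size=(COND_DIM, FEAT_DIM + 1))

def hyperscore(features: np.ndarray, dim_index: int) -> float:
    """Score one sample along one evaluation dimension.

    The hypernetwork generates a linear mapping (w, b) specific to the
    requested dimension, which is then applied to the shared features.
    """
    cond = np.eye(COND_DIM)[dim_index]   # one-hot dimension condition
    params = cond @ W_hyper              # generated head parameters
    w, b = params[:FEAT_DIM], params[FEAT_DIM]
    return float(features @ w + b)

features = rng.normal(size=FEAT_DIM)     # stand-in for extracted features
scores = {name: hyperscore(features, i) for i, name in enumerate(DIMENSIONS)}
print(scores)
```

The point of the design is that the backbone features are computed once, while each evaluation dimension gets its own generated mapping function instead of a single shared regression head.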
Related papers
- GT23D-Bench: A Comprehensive General Text-to-3D Generation Benchmark [111.81516104467039]
GT23D-Bench is the first comprehensive benchmark for General Text-to-3D (GT23D) generation.
Our dataset annotates each 3D object with 64-view depth maps, normal maps, rendered images, and coarse-to-fine captions.
Our metrics are dissected into a) Textual-3D Alignment measures textual alignment with multi-granularity visual 3D representations; and b) 3D Visual Quality which considers texture fidelity, multi-view consistency, and geometry correctness.
arXiv Detail & Related papers (2024-12-13T09:32:08Z)
- 3DGCQA: A Quality Assessment Database for 3D AI-Generated Contents [50.730468291265886]
This paper introduces a novel 3DGC quality assessment dataset, 3DGCQA, built using 7 representative Text-to-3D generation methods.
The visualization intuitively reveals the presence of 6 common distortion categories in the generated 3DGCs.
Subjective quality assessment is conducted by evaluators, whose ratings reveal significant variation in quality across different generation methods.
Several objective quality assessment algorithms are tested on the 3DGCQA dataset.
arXiv Detail & Related papers (2024-09-11T12:47:40Z)
- TAPVid-3D: A Benchmark for Tracking Any Point in 3D [63.060421798990845]
We introduce a new benchmark, TAPVid-3D, for evaluating the task of Tracking Any Point in 3D.
This benchmark will serve as a guidepost to improve our ability to understand precise 3D motion and surface deformation from monocular video.
arXiv Detail & Related papers (2024-07-08T13:28:47Z)
- MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations [55.022519020409405]
This paper builds the largest multi-modal 3D scene dataset and benchmark to date with hierarchical grounded language annotations, MMScan.
The resulting multi-modal 3D dataset encompasses 1.4M meta-annotated captions on 109k objects and 7.7k regions as well as over 3.04M diverse samples for 3D visual grounding and question-answering benchmarks.
arXiv Detail & Related papers (2024-06-13T17:59:30Z)
- T$^3$Bench: Benchmarking Current Progress in Text-to-3D Generation [52.029698642883226]
Methods in text-to-3D leverage powerful pretrained diffusion models to optimize NeRF.
Most studies evaluate their results with subjective case studies and user experiments.
We introduce T$^3$Bench, the first comprehensive text-to-3D benchmark.
arXiv Detail & Related papers (2023-10-04T17:12:18Z)
- From 2D to 3D: Re-thinking Benchmarking of Monocular Depth Prediction [80.67873933010783]
We argue that MDP is currently witnessing benchmark over-fitting and relying on metrics that are only partially helpful to gauge the usefulness of the predictions for 3D applications.
This limits the design and development of novel methods that are truly aware of - and improving towards estimating - the 3D structure of the scene rather than optimizing 2D-based distances.
We propose a set of metrics well suited to evaluate the 3D geometry of MDP approaches and a novel indoor benchmark, RIO-D3D, crucial for the proposed evaluation methodology.
arXiv Detail & Related papers (2022-03-15T17:50:54Z)
- Subjective and Objective Visual Quality Assessment of Textured 3D Meshes [3.738515725866836]
We present a new subjective study to evaluate the perceptual quality of textured meshes, based on a paired comparison protocol.
We propose two new metrics for visual quality assessment of textured meshes, as optimized linear combinations of accurate geometry and texture quality measurements.
arXiv Detail & Related papers (2021-02-08T03:26:41Z)
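The "optimized linear combination" idea in the last entry can be sketched as fitting weights that combine a geometry measure and a texture measure to match subjective scores via least squares. This is a hypothetical example with synthetic data, not the paper's actual metric or data.

```python
import numpy as np

# Hedged sketch: fit weights combining a geometry and a texture quality
# measurement to synthetic mean opinion scores (MOS) via least squares.
rng = np.random.default_rng(1)
geom = rng.uniform(0, 1, size=50)   # per-mesh geometry quality measurement
tex = rng.uniform(0, 1, size=50)    # per-mesh texture quality measurement
# Synthetic MOS generated from known weights plus small noise.
mos = 0.7 * geom + 0.3 * tex + rng.normal(scale=0.01, size=50)

X = np.column_stack([geom, tex])
weights, *_ = np.linalg.lstsq(X, mos, rcond=None)  # optimized combination
pred = X @ weights
corr = np.corrcoef(pred, mos)[0, 1]                # fit quality check
print(weights, corr)
```

With clean synthetic data the recovered weights land near the generating values, which is the essence of calibrating an objective metric against subjective ratings.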
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences of its use.