Towards Fine-Grained Text-to-3D Quality Assessment: A Benchmark and A Two-Stage Rank-Learning Metric
- URL: http://arxiv.org/abs/2509.23841v2
- Date: Wed, 05 Nov 2025 00:30:54 GMT
- Title: Towards Fine-Grained Text-to-3D Quality Assessment: A Benchmark and A Two-Stage Rank-Learning Metric
- Authors: Bingyang Cui, Yujie Zhang, Qi Yang, Zhu Li, Yiling Xu
- Abstract summary: Text-to-3D (T23D) generative models have enabled the synthesis of diverse, high-fidelity 3D assets from textual prompts. Existing challenges restrict the development of reliable T23D quality assessment (T23DQA). We introduce T23D-CompBench, a comprehensive benchmark for compositional T23D generation. We also propose Rank2Score, an effective evaluator with two-stage training for T23DQA.
- Score: 40.31630401986677
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in Text-to-3D (T23D) generative models have enabled the synthesis of diverse, high-fidelity 3D assets from textual prompts. However, existing challenges restrict the development of reliable T23D quality assessment (T23DQA). First, existing benchmarks are outdated, fragmented, and coarse-grained, making fine-grained metric training infeasible. Moreover, current objective metrics exhibit inherent design limitations, resulting in non-representative feature extraction and diminished metric robustness. To address these limitations, we introduce T23D-CompBench, a comprehensive benchmark for compositional T23D generation. We define five components with twelve sub-components for compositional prompts, which are used to generate 3,600 textured meshes from ten state-of-the-art generative models. A large-scale subjective experiment is conducted to collect 129,600 reliable human ratings across different perspectives. Based on T23D-CompBench, we further propose Rank2Score, an effective evaluator with two-stage training for T23DQA. Rank2Score enhances pairwise training via supervised contrastive regression and curriculum learning in the first stage, and subsequently refines predictions using mean opinion scores to achieve closer alignment with human judgments in the second stage. Extensive experiments and downstream applications demonstrate that Rank2Score consistently outperforms existing metrics across multiple dimensions and can additionally serve as a reward function to optimize generative models. The project is available at https://cbysjtu.github.io/Rank2Score/.
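The two-stage recipe described in the abstract — pairwise training via supervised contrastive regression in stage one, then regression onto mean opinion scores (MOS) in stage two — can be sketched in a few lines. The following is a minimal NumPy illustration assuming a SupCR-style contrastive regression loss; the function names, the exact loss formulation, and the stage-two objective are assumptions for illustration, not the authors' released implementation.

```python
import numpy as np

def supcr_loss(feats, labels, tau=0.1):
    """Stage 1 (sketch): supervised contrastive regression loss.

    For each anchor i and candidate j, the denominator sums over all
    samples k whose label distance to i is at least d(y_i, y_j), so the
    embedding space is pushed to be ordered consistently with the
    continuous quality labels (here, human ratings).
    """
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)  # L2-normalize
    sim = feats @ feats.T / tau                                   # scaled cosine similarity
    n = len(labels)
    dist = np.abs(labels[:, None] - labels[None, :])              # label distances d(y_i, y_j)
    loss, count = 0.0, 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # "negatives" for the pair (i, j): samples at least as far
            # from i in label space as j is (j itself is included)
            mask = dist[i] >= dist[i, j]
            mask[i] = False
            denom = np.sum(np.exp(sim[i][mask]))
            loss += -np.log(np.exp(sim[i, j]) / denom)
            count += 1
    return loss / count

def mos_regression_loss(pred, mos):
    """Stage 2 (sketch): refine score predictions against MOS with MSE."""
    return np.mean((np.asarray(pred) - np.asarray(mos)) ** 2)
```

Because the denominator always contains the positive pair itself, each term of `supcr_loss` is non-negative, and it shrinks as embedding similarity becomes consistent with the label ordering — the property the ranking stage is after.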
Related papers
- Preference Score Distillation: Leveraging 2D Rewards to Align Text-to-3D Generation with Human Preference [69.34278282513593]
Preference Score Distillation (PSD) is an optimization-based framework for human-aligned text-to-3D synthesis without 3D training data. Our key insight stems from the incompatibility of pixel-level gradients. We introduce an adaptive strategy to co-optimize preference scores and negative text embeddings.
arXiv Detail & Related papers (2026-03-02T08:23:36Z) - D2D: Detector-to-Differentiable Critic for Improved Numeracy in Text-to-Image Generation [26.820694706602236]
Detector-to-Differentiable (D2D) is a novel framework that transforms non-differentiable detection models into differentiable critics. Our experiments on SDXL-Turbo, SD-Turbo, and Pixart-DMD demonstrate consistent and substantial improvements in object counting accuracy.
arXiv Detail & Related papers (2025-10-22T06:27:05Z) - Multi-Dimensional Quality Assessment for Text-to-3D Assets: Dataset and Model [54.71130068043388]
Despite the growing popularity of text-to-3D asset generation, its evaluation has not been well considered and studied. Given the significant quality discrepancies among various text-to-3D assets, there is a pressing need for quality assessment models aligned with human subjective judgments. We first establish the largest text-to-3D asset quality assessment database to date, termed the AIGC-T23DAQA database.
arXiv Detail & Related papers (2025-02-24T07:20:13Z) - Benchmarking and Learning Multi-Dimensional Quality Evaluator for Text-to-3D Generation [26.0726219629689]
Text-to-3D generation has achieved remarkable progress in recent years, yet evaluating these methods remains challenging. Existing benchmarks lack fine-grained evaluation on different prompt categories and evaluation dimensions. We first propose a comprehensive benchmark named MATE-3D. The benchmark contains eight well-designed prompt categories that cover single and multiple object generation, resulting in 1,280 generated textured meshes.
arXiv Detail & Related papers (2024-12-15T12:41:44Z) - GT23D-Bench: A Comprehensive General Text-to-3D Generation Benchmark [111.81516104467039]
GT23D-Bench is the first comprehensive benchmark for General Text-to-3D (GT23D). Our dataset annotates each 3D object with 64-view depth maps, normal maps, rendered images, and coarse-to-fine captions. Our metrics are dissected into a) Textual-3D Alignment, which measures textual alignment with multi-granularity visual 3D representations, and b) 3D Visual Quality, which considers texture fidelity, multi-view consistency, and geometry correctness.
arXiv Detail & Related papers (2024-12-13T09:32:08Z) - A Contrastive Compositional Benchmark for Text-to-Image Synthesis: A Study with Unified Text-to-Image Fidelity Metrics [58.83242220266935]
We introduce Winoground-T2I, a benchmark designed to evaluate the compositionality of T2I models.
This benchmark includes 11K complex, high-quality contrastive sentence pairs spanning 20 categories.
We use Winoground-T2I with a dual objective: to evaluate the performance of T2I models and the metrics used for their evaluation.
arXiv Detail & Related papers (2023-12-04T20:47:48Z) - Text-to-3D with Classifier Score Distillation [80.14832887529259]
Classifier-free guidance has typically been regarded as an auxiliary trick rather than as an essential component.
We name this method Classifier Score Distillation (CSD), which can be interpreted as using an implicit classification model for generation.
We validate the effectiveness of CSD across a variety of text-to-3D tasks including shape generation, texture synthesis, and shape editing.
arXiv Detail & Related papers (2023-10-30T10:25:40Z)
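The core move in Classifier Score Distillation can be contrasted with standard Score Distillation Sampling (SDS) in a few lines. The sketch below is a simplified illustration based on the abstract above: timestep weighting and tensor shapes are omitted, and the function names are assumptions rather than the paper's released code.

```python
import numpy as np

def sds_gradient(eps_cond, eps_uncond, noise, cfg_scale=7.5):
    """Standard SDS gradient (simplified): the classifier-free-guidance
    combined noise prediction minus the injected noise."""
    eps_cfg = eps_uncond + cfg_scale * (eps_cond - eps_uncond)
    return eps_cfg - noise

def csd_gradient(eps_cond, eps_uncond):
    """CSD gradient (simplified): keep only the implicit classifier
    score, i.e. the gap between the conditional and unconditional
    noise predictions, dropping the reconstruction residual."""
    return eps_cond - eps_uncond
```

The difference `eps_cond - eps_uncond` is exactly the direction that classifier-free guidance amplifies, which is why the paper can interpret CSD as generation driven by an implicit classifier.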
This list is automatically generated from the titles and abstracts of the papers in this site.