GGBench: A Geometric Generative Reasoning Benchmark for Unified Multimodal Models
- URL: http://arxiv.org/abs/2511.11134v1
- Date: Fri, 14 Nov 2025 10:07:53 GMT
- Title: GGBench: A Geometric Generative Reasoning Benchmark for Unified Multimodal Models
- Authors: Jingxuan Wei, Caijun Jia, Xi Bai, Xinglong Xu, Siyuan Li, Linzhuang Sun, Bihui Yu, Conghui He, Lijun Wu, Cheng Tan,
- Abstract summary: GGBench is a benchmark designed specifically to evaluate geometric generative reasoning. It provides a comprehensive framework for systematically diagnosing a model's ability not only to understand and reason but also to actively construct a solution.
- Score: 37.832076253514735
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The advent of Unified Multimodal Models (UMMs) signals a paradigm shift in artificial intelligence, moving from passive perception to active, cross-modal generation. Despite their unprecedented ability to synthesize information, a critical gap persists in evaluation: existing benchmarks primarily assess discriminative understanding or unconstrained image generation separately, failing to measure the integrated cognitive process of generative reasoning. To bridge this gap, we propose that geometric construction provides an ideal testbed as it inherently demands a fusion of language comprehension and precise visual generation. We introduce GGBench, a benchmark designed specifically to evaluate geometric generative reasoning. It provides a comprehensive framework for systematically diagnosing a model's ability to not only understand and reason but to actively construct a solution, thereby setting a more rigorous standard for the next generation of intelligent systems. Project website: https://opendatalab-raiser.github.io/GGBench/.
Related papers
- UniG2U-Bench: Do Unified Models Advance Multimodal Understanding? [50.92401586025528]
Unified multimodal models have recently demonstrated strong generative capabilities, yet whether and when generation improves understanding remains unclear. We introduce UniG2U-Bench, a comprehensive benchmark categorizing generation-to-understanding (G2U) evaluation into 7 regimes and 30 subtasks.
arXiv Detail & Related papers (2026-03-03T18:36:16Z) - From Laboratory to Real-World Applications: Benchmarking Agentic Code Reasoning at the Repository Level [38.24989792739013]
We present RepoReason, a diagnostic benchmark centered on abductive assertion verification. We implement an execution-driven mutation framework that utilizes the environment as a semantic to regenerate ground-truth states. Our findings provide granular white-box insights for optimizing the next generation of agentic software engineering.
arXiv Detail & Related papers (2026-01-07T09:22:28Z) - Generative Adversarial Gumbel MCTS for Abstract Visual Composition Generation [29.755551944026738]
We study abstract visual composition, in which identity is determined by the configuration and relations among a small set of geometric primitives. An AlphaGo-style search enforces feasibility, while a fine-tuned vision-language model scores semantic alignment as a reward signal. Inspired by Generative Adversarial Networks, we use the generated instances for adversarial reward refinement.
arXiv Detail & Related papers (2025-12-01T03:38:44Z) - A Survey on Generative Recommendation: Data, Model, and Tasks [55.36322811257545]
Generative recommendation reconceptualizes recommendation as a generation task rather than discriminative scoring. This survey provides a comprehensive examination through a unified tripartite framework spanning data, model, and task dimensions. We identify five key advantages: world knowledge integration, natural language understanding, reasoning capabilities, scaling laws, and creative generation.
arXiv Detail & Related papers (2025-10-31T04:02:58Z) - GIR-Bench: Versatile Benchmark for Generating Images with Reasoning [40.09327641816171]
Unified multimodal models integrate the reasoning capacity of large language models with both image understanding and generation. We introduce GIR-Bench, a comprehensive benchmark that evaluates unified models across three complementary perspectives.
arXiv Detail & Related papers (2025-10-13T05:50:44Z) - RealUnify: Do Unified Models Truly Benefit from Unification? A Comprehensive Benchmark [71.3555284685426]
We introduce RealUnify, a benchmark designed to evaluate bidirectional capability synergy. RealUnify comprises 1,000 meticulously human-annotated instances spanning 10 categories and 32 subtasks. We find that current unified models still struggle to achieve effective synergy, indicating that architectural unification alone is insufficient.
arXiv Detail & Related papers (2025-09-29T15:07:28Z) - Riemannian-Geometric Fingerprints of Generative Models [10.098284109691138]
We propose a new definition of artifacts and fingerprints of generative models (GMs). We apply our theory in a new gradient-based algorithm for computing fingerprints in practice. Results show that it is more effective at distinguishing a large array of GMs, spanning 4 different datasets at 2 different resolutions.
arXiv Detail & Related papers (2025-06-28T08:08:16Z) - NeSyGeo: A Neuro-Symbolic Framework for Multimodal Geometric Reasoning Data Generation [23.592137999309546]
NeSyGeo is a novel neuro-symbolic framework for generating geometric reasoning data. We release a new benchmark, NeSyGeo-Test, for evaluating the geometric reasoning abilities of MLLMs.
arXiv Detail & Related papers (2025-05-21T16:45:49Z) - MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO [87.52631406241456]
Recent text-to-image systems face limitations in handling multimodal inputs and complex reasoning tasks. We introduce MindOmni, a unified multimodal large language model that addresses these challenges by incorporating reasoning generation through reinforcement learning.
arXiv Detail & Related papers (2025-05-19T12:17:04Z) - Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines [74.42485647685272]
We focus on Generative Masked Language Models (GMLMs).
We train a model to fit conditional probabilities of the data distribution via masking, which are subsequently used as inputs to a Markov chain to draw samples from the model.
We adapt the T5 model for iteratively-refined parallel decoding, achieving 2-3x speedup in machine translation with minimal sacrifice in quality.
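The GMLM sampling procedure described above (fit conditional probabilities via masking, then run a Markov chain over those conditionals to draw samples) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the uniform `toy_conditional` stands in for a trained masked language model, and all names are hypothetical.

```python
import random

MASK = "<mask>"
VOCAB = ["a", "b", "c"]

def toy_conditional(seq, pos):
    # Hypothetical stand-in for a trained GMLM: given the partially
    # masked sequence, return a distribution over VOCAB for position
    # `pos`. Uniform here, so the sketch stays self-contained.
    return {tok: 1.0 / len(VOCAB) for tok in VOCAB}

def gmlm_sample(length, rounds, rng):
    # Start from a fully masked sequence, then run a masking Markov
    # chain: each step re-masks one position and resamples it from the
    # model's conditional distribution given the rest of the sequence.
    seq = [MASK] * length
    for _ in range(rounds):
        for pos in range(length):
            seq[pos] = MASK
            dist = toy_conditional(seq, pos)
            tokens, weights = zip(*dist.items())
            seq[pos] = rng.choices(tokens, weights=weights, k=1)[0]
    return seq

rng = random.Random(0)
sample = gmlm_sample(length=5, rounds=3, rng=rng)
```

With a real trained model in place of `toy_conditional`, this sequential resampling loop is a Gibbs-style instance of the Markov chain the abstract describes; the paper's parallel iterative decoding resamples many positions per step instead of one.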
arXiv Detail & Related papers (2024-07-22T18:00:00Z) - Generalization Metrics for Practical Quantum Advantage in Generative Models [68.8204255655161]
Generative modeling is a widely accepted natural use case for quantum computers.
We construct a simple and unambiguous approach to probe practical quantum advantage for generative modeling by measuring the algorithm's generalization performance.
Our simulation results show that our quantum-inspired models have up to a $68\times$ enhancement in generating unseen, unique, and valid samples.
arXiv Detail & Related papers (2022-01-21T16:35:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.