Related papers: GENIUS: Generative Fluid Intelligence Evaluation Suite

URL: http://arxiv.org/abs/2602.11144v1
Date: Wed, 11 Feb 2026 18:55:54 GMT
Title: GENIUS: Generative Fluid Intelligence Evaluation Suite
Authors: Ruichuan An, Sihan Yang, Ziyu Guo, Wei Dai, Zijun Shen, Haodong Li, Renrui Zhang, Xinyu Wei, Guopeng Li, Wenshan Wu, Wentao Zhang,
Abstract summary: We introduce $\textbf{GENIUS}$ ($\textbf{GEN}$ Fluid $\textbf{I}$ntelligence Eval$\textbf{U}$ation $\textbf{S}$uite). We formalize $\textit{GFI}$ as a synthesis of three primitives. These include $\textit{Inducing Implicit Patterns}$ (e.g., inferring personalized visual preferences), $\textit{Executing Ad-hoc Constraints}$ (e.g., visualizing abstract metaphors), and
Score: 45.98061608718251
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Unified Multimodal Models (UMMs) have shown remarkable progress in visual generation. Yet, existing benchmarks predominantly assess $\textit{Crystallized Intelligence}$, which relies on recalling accumulated knowledge and learned schemas. This focus overlooks $\textit{Generative Fluid Intelligence (GFI)}$: the capacity to induce patterns, reason through constraints, and adapt to novel scenarios on the fly. To rigorously assess this capability, we introduce $\textbf{GENIUS}$ ($\textbf{GEN}$ Fluid $\textbf{I}$ntelligence Eval$\textbf{U}$ation $\textbf{S}$uite). We formalize $\textit{GFI}$ as a synthesis of three primitives. These include $\textit{Inducing Implicit Patterns}$ (e.g., inferring personalized visual preferences), $\textit{Executing Ad-hoc Constraints}$ (e.g., visualizing abstract metaphors), and $\textit{Adapting to Contextual Knowledge}$ (e.g., simulating counter-intuitive physics). Collectively, these primitives challenge models to solve problems grounded entirely in the immediate context. Our systematic evaluation of 12 representative models reveals significant performance deficits in these tasks. Crucially, our diagnostic analysis disentangles these failure modes. It demonstrates that deficits stem from limited context comprehension rather than insufficient intrinsic generative capability. To bridge this gap, we propose a training-free attention intervention strategy. Ultimately, $\textbf{GENIUS}$ establishes a rigorous standard for $\textit{GFI}$, guiding the field beyond knowledge utilization toward dynamic, general-purpose reasoning. Our dataset and code will be released at: $\href{https://github.com/arctanxarc/GENIUS}{https://github.com/arctanxarc/GENIUS}$.

Related papers

ConvexBench: Can LLMs Recognize Convex Functions? [70.53167848190624]
Convex analysis is a modern branch of mathematics with many applications. As Large Language Models (LLMs) start to automate research-level math and sciences, it is important for LLMs to demonstrate the ability to understand and reason with convexity. We introduce ConvexBench, a scalable and mechanically verifiable benchmark for testing whether LLMs can identify the convexity of a symbolic objective under deep functional composition.
arXiv Detail & Related papers (2026-02-01T07:41:17Z)
VK-Det: Visual Knowledge Guided Prototype Learning for Open-Vocabulary Aerial Object Detection [6.72903082348742]
We propose a text-guided open-vocabulary object $\textbf{Det}$ection framework. We discover and leverage the vision encoder's inherent informative region perception to attain fine-grained localization and adaptive distillation. Experiments show state-of-the-art performance, achieving 30.1 $\mathrm{mAP}^{N}$ on DIOR and 23.3 $\mathrm{mAP}^{N}$ on DOTA, outperforming even extra supervised methods.
arXiv Detail & Related papers (2025-11-22T14:19:59Z)
Reliability, Embeddedness, and Agency: A Utility-Driven Mathematical Framework for Agent-Centric AI Adoption [0.0]
We formalize three axioms for sustained adoption of agent-centric AI systems executing multi-step tasks. We model adoption as a sum of a decaying novelty term and a growing utility term.
arXiv Detail & Related papers (2025-08-18T12:53:38Z)
UniF$^2$ace: A Unified Fine-grained Face Understanding and Generation Model [62.66515621965686]
We introduce a novel theoretical framework with a Dual Discrete Diffusion (D3Diff) loss, unifying masked generative models with discrete score matching diffusion. This D3Diff significantly enhances the model's ability to synthesize high-fidelity facial details aligned with text input. We construct UniF$^2$aceD-1M, a large-scale dataset comprising 130K fine-grained image-caption pairs and 1M visual question-answering pairs.
arXiv Detail & Related papers (2025-03-11T07:34:59Z)
Uncovering Untapped Potential in Sample-Efficient World Model Agents [51.65485693709418]
Simulus is a highly modular TBWM agent that integrates a multi-modality tokenization framework, intrinsic motivation, prioritized WM replay, and regression-as-classification. Simulus achieves state-of-the-art sample efficiency for planning-free WMs across three diverse benchmarks.
arXiv Detail & Related papers (2025-02-17T08:06:10Z)
Self-Ensembling Gaussian Splatting for Few-Shot Novel View Synthesis [55.561961365113554]
3D Gaussian Splatting (3DGS) has demonstrated remarkable effectiveness in novel view synthesis (NVS). In this paper, we introduce Self-Ensembling Gaussian Splatting (SE-GS). We achieve self-ensembling by incorporating an uncertainty-aware perturbation strategy during training. Experimental results on the LLFF, Mip-NeRF360, DTU, and MVImgNet datasets demonstrate that our approach enhances NVS quality under few-shot training conditions.
arXiv Detail & Related papers (2024-10-31T18:43:48Z)
FLARE: Faithful Logic-Aided Reasoning and Exploration [47.46564769245296]
We introduce a novel approach for traversing the problem space using task decompositions. We use Large Language Models to plan a solution and soft-formalise the query into facts and predicates using logic programming code. Our method allows us to compute the faithfulness of the reasoning process w.r.t. the generated code and analyse the steps of the multi-hop search without relying on external solvers.
arXiv Detail & Related papers (2024-10-14T19:39:11Z)
Inertial Confinement Fusion Forecasting via Large Language Models [48.76222320245404]
In this study, we introduce $\textbf{LPI-LLM}$, a novel integration of Large Language Models (LLMs) with classical reservoir computing paradigms. We propose the $\textit{LLM-anchored Reservoir}$, augmented with a $\textit{Fusion-specific Prompt}$, enabling accurate forecasting of $\texttt{LPI}$-generated-hot electron dynamics during implosion. We also present $\textbf{LPI4AI}$, the first $\texttt{LPI}$ benchmark based
arXiv Detail & Related papers (2024-07-15T05:46:44Z)
Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training [42.89066583603415]
This work identifies three critical $\textit{O}$bstacles: ($\textit{O}$1) lack of comprehensive evaluation, ($\textit{O}$2) untested viability for scaling, and ($\textit{O}$3) lack of empirical guidelines. We show that a depthwise stacking operator, called $G_{\text{stack}}$, exhibits remarkable acceleration in training, leading to decreased loss and improved overall performance.
arXiv Detail & Related papers (2024-05-24T08:00:00Z)
Mechanics of Next Token Prediction with Self-Attention [41.82477691012942]
Transformer-based language models are trained on large datasets to predict the next token given an input sequence. We show that training self-attention with gradient descent learns an automaton which generates the next token in two distinct steps. We hope that these findings shed light on how self-attention processes sequential data and pave the path toward demystifying more complex architectures.
arXiv Detail & Related papers (2024-03-12T21:15:38Z)
Unsupervised Semantic Segmentation by Distilling Feature Correspondences [94.73675308961944]
Unsupervised semantic segmentation aims to discover and localize semantically meaningful categories within image corpora without any form of annotation. We present STEGO, a novel framework that distills unsupervised features into high-quality discrete semantic labels. STEGO yields a significant improvement over the prior state of the art, on both the CocoStuff and Cityscapes challenges.
arXiv Detail & Related papers (2022-03-16T06:08:47Z)
Learning to extrapolate using continued fractions: Predicting the critical temperature of superconductor materials [5.905364646955811]
In the field of Artificial Intelligence (AI) and Machine Learning (ML), the approximation of unknown target functions $y=f(\mathbf{x})$ is a common objective. We refer to $S$ as the training set and aim to identify a low-complexity mathematical model that can effectively approximate this target function for new instances $\mathbf{x}$.
arXiv Detail & Related papers (2020-11-27T04:57:40Z)

This list is automatically generated from the titles and abstracts of the papers on this site.