Quantifying Model Uniqueness in Heterogeneous AI Ecosystems
- URL: http://arxiv.org/abs/2601.22977v1
- Date: Fri, 30 Jan 2026 13:41:53 GMT
- Title: Quantifying Model Uniqueness in Heterogeneous AI Ecosystems
- Authors: Lei You,
- Abstract summary: We introduce a statistical framework for auditing model uniqueness based on In-Silico Quasi-Experimental Design.<n>By enforcing matched interventions across models, we isolate intrinsic model identity and quantify uniqueness as the Peer-Inexpressible Residual (PIER)<n>These results move trustworthy AI beyond explaining single models.
- Score: 1.1162481475388237
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As AI systems evolve from isolated predictors into complex, heterogeneous ecosystems of foundation models and specialized adapters, distinguishing genuine behavioral novelty from functional redundancy becomes a critical governance challenge. Here, we introduce a statistical framework for auditing model uniqueness based on In-Silico Quasi-Experimental Design (ISQED). By enforcing matched interventions across models, we isolate intrinsic model identity and quantify uniqueness as the Peer-Inexpressible Residual (PIER), i.e. the component of a target's behavior strictly irreducible to any stochastic convex combination of its peers, with vanishing PIER characterizing when such a routing-based substitution becomes possible. We establish the theoretical foundations of ecosystem auditing through three key contributions. First, we prove a fundamental limitation of observational logs: uniqueness is mathematically non-identifiable without intervention control. Second, we derive a scaling law for active auditing, showing that our adaptive query protocol achieves minimax-optimal sample efficiency ($dσ^2γ^{-2}\log(Nd/δ)$). Third, we demonstrate that cooperative game-theoretic methods, such as Shapley values, fundamentally fail to detect redundancy. We implement this framework via the DISCO (Design-Integrated Synthetic Control) estimator and deploy it across diverse ecosystems, including computer vision models (ResNet/ConvNeXt/ViT), large language models (BERT/RoBERTa), and city-scale traffic forecasters. These results move trustworthy AI beyond explaining single models: they establish a principled, intervention-based science of auditing and governing heterogeneous model ecosystems.
Related papers
- Hierarchical Inference and Closure Learning via Adaptive Surrogates for ODEs and PDEs [15.38864225184245]
Inverse problems are the task of calibrating models to match data.<n>We develop a principled methodology for leveraging data from collections of distinct yet related physical systems.<n>We learn the shared unknown dynamics in the form of an ML-based closure model.
arXiv Detail & Related papers (2026-03-04T10:30:08Z) - The Emergence of Lab-Driven Alignment Signatures: A Psychometric Framework for Auditing Latent Bias and Compounding Risk in Generative AI [0.0]
This paper introduces a novel auditing framework to quantify latent trait estimation under ordinal uncertainty.<n>The research audits nine leading models across dimensions including Optimization Bias, Sycophancy, and Status-Quo Legitimization.
arXiv Detail & Related papers (2026-02-19T06:56:01Z) - AI-NativeBench: An Open-Source White-Box Agentic Benchmark Suite for AI-Native Systems [52.65695508605237]
We introduce AI-NativeBench, the first application-centric and white-box AI-Native benchmark suite grounded in Model Context Protocol (MCP) and Agent-to-Agent (A2A) standards.<n>By treating agentic spans as first-class citizens within distributed traces, our methodology enables granular analysis of engineering characteristics beyond simple capabilities.<n>This work provides the first systematic evidence to guide the transition from measuring model capability to engineering reliable AI-Native systems.
arXiv Detail & Related papers (2026-01-14T11:32:07Z) - On the Limits of Self-Improving in LLMs and Why AGI, ASI and the Singularity Are Not Near Without Symbolic Model Synthesis [0.01269104766024433]
We formalise self-training in Large Language Models (LLMs) and Generative AI as a discrete-time dynamical system.<n>We derive two fundamental failure modes: (1) Entropy Decay, where finite sampling effects cause a monotonic loss of distributional diversity (mode collapse), and (2) Variance Amplification, where the loss of external grounding causes the model's representation of truth to drift as a random walk.
arXiv Detail & Related papers (2026-01-05T19:50:49Z) - STORE: Semantic Tokenization, Orthogonal Rotation and Efficient Attention for Scaling Up Ranking Models [11.965535230928372]
Store is a unified and scalable token-based ranking framework built upon three core innovations.<n>Our framework consistently improves prediction accuracy(online CTR by 2.71%, AUC by 1.195%) and training effeciency (1.84 throughput)
arXiv Detail & Related papers (2025-11-24T06:20:02Z) - RealUnify: Do Unified Models Truly Benefit from Unification? A Comprehensive Benchmark [71.3555284685426]
We introduce RealUnify, a benchmark designed to evaluate bidirectional capability synergy.<n>RealUnify comprises 1,000 meticulously human-annotated instances spanning 10 categories and 32 subtasks.<n>We find that current unified models still struggle to achieve effective synergy, indicating that architectural unification alone is insufficient.
arXiv Detail & Related papers (2025-09-29T15:07:28Z) - Model Correlation Detection via Random Selection Probing [62.093777777813756]
Existing similarity-based methods require access to model parameters or produce scores without thresholds.<n>We introduce Random Selection Probing (RSP), a hypothesis-testing framework that formulates model correlation detection as a statistical test.<n>RSP produces rigorous p-values that quantify evidence of correlation.
arXiv Detail & Related papers (2025-09-29T01:40:26Z) - A2R: An Asymmetric Two-Stage Reasoning Framework for Parallel Reasoning [57.727084580884075]
Asymmetric Two-Stage Reasoning framework designed to bridge gap between a model's potential and its actual performance.<n>A2R-Efficient is a "small-to-big" variant that combines a Qwen3-4B explorer with a Qwen3-8B synthesizer.<n>Results show A2R is not only a performance-boosting framework but also an efficient and practical solution for real-world applications.
arXiv Detail & Related papers (2025-09-26T08:27:03Z) - On the Reasoning Capacity of AI Models and How to Quantify It [0.0]
Large Language Models (LLMs) have intensified the debate surrounding the fundamental nature of their reasoning capabilities.<n>While achieving high performance on benchmarks such as GPQA and MMLU, these models exhibit limitations in more complex reasoning tasks.<n>We propose a novel phenomenological approach that goes beyond traditional accuracy metrics to probe the underlying mechanisms of model behavior.
arXiv Detail & Related papers (2025-01-23T16:58:18Z) - Identifiable Representation and Model Learning for Latent Dynamic Systems [0.0]
We study the problem of identifiable representation and model learning for latent dynamic systems.<n>We prove that, for linear and affine nonlinear latent dynamic systems with sparse input matrices, it is possible to identify the latent variables up to scaling.
arXiv Detail & Related papers (2024-10-23T13:55:42Z) - Correct-by-Construction Control for Stochastic and Uncertain Dynamical
Models via Formal Abstractions [44.99833362998488]
We develop an abstraction framework that can be used to solve this problem under various modeling assumptions.
We use state-of-the-art verification techniques to compute an optimal policy on the iMDP with guarantees for satisfying the given specification.
We then show that, by construction, we can refine this policy into a feedback controller for which these guarantees carry over to the dynamical model.
arXiv Detail & Related papers (2023-11-16T11:03:54Z) - Understanding, Predicting and Better Resolving Q-Value Divergence in
Offline-RL [86.0987896274354]
We first identify a fundamental pattern, self-excitation, as the primary cause of Q-value estimation divergence in offline RL.
We then propose a novel Self-Excite Eigenvalue Measure (SEEM) metric to measure the evolving property of Q-network at training.
For the first time, our theory can reliably decide whether the training will diverge at an early stage.
arXiv Detail & Related papers (2023-10-06T17:57:44Z) - Explaining a Series of Models by Propagating Local Feature Attributions [9.66840768820136]
Pipelines involving several machine learning models improve performance in many domains but are difficult to understand.
We introduce a framework to propagate local feature attributions through complex pipelines of models based on a connection to the Shapley value.
Our framework enables us to draw higher-level conclusions based on groups of gene expression features for Alzheimer's and breast cancer histologic grade prediction.
arXiv Detail & Related papers (2021-04-30T22:20:58Z) - How Faithful is your Synthetic Data? Sample-level Metrics for Evaluating
and Auditing Generative Models [95.8037674226622]
We introduce a 3-dimensional evaluation metric that characterizes the fidelity, diversity and generalization performance of any generative model in a domain-agnostic fashion.
Our metric unifies statistical divergence measures with precision-recall analysis, enabling sample- and distribution-level diagnoses of model fidelity and diversity.
arXiv Detail & Related papers (2021-02-17T18:25:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.