Related papers: Quantifying Model Uniqueness in Heterogeneous AI Ecosystems

Quantifying Model Uniqueness in Heterogeneous AI Ecosystems

URL: http://arxiv.org/abs/2601.22977v1
Date: Fri, 30 Jan 2026 13:41:53 GMT
Title: Quantifying Model Uniqueness in Heterogeneous AI Ecosystems
Authors: Lei You,
Abstract summary: We introduce a statistical framework for auditing model uniqueness based on In-Silico Quasi-Experimental Design.<n>By enforcing matched interventions across models, we isolate intrinsic model identity and quantify uniqueness as the Peer-Inexpressible Residual (PIER)<n>These results move trustworthy AI beyond explaining single models.
Score: 1.1162481475388237
License: http://creativecommons.org/licenses/by/4.0/
Abstract: As AI systems evolve from isolated predictors into complex, heterogeneous ecosystems of foundation models and specialized adapters, distinguishing genuine behavioral novelty from functional redundancy becomes a critical governance challenge. Here, we introduce a statistical framework for auditing model uniqueness based on In-Silico Quasi-Experimental Design (ISQED). By enforcing matched interventions across models, we isolate intrinsic model identity and quantify uniqueness as the Peer-Inexpressible Residual (PIER), i.e. the component of a target's behavior strictly irreducible to any stochastic convex combination of its peers, with vanishing PIER characterizing when such a routing-based substitution becomes possible. We establish the theoretical foundations of ecosystem auditing through three key contributions. First, we prove a fundamental limitation of observational logs: uniqueness is mathematically non-identifiable without intervention control. Second, we derive a scaling law for active auditing, showing that our adaptive query protocol achieves minimax-optimal sample efficiency ($dσ^2γ^{-2}\log(Nd/δ)$). Third, we demonstrate that cooperative game-theoretic methods, such as Shapley values, fundamentally fail to detect redundancy. We implement this framework via the DISCO (Design-Integrated Synthetic Control) estimator and deploy it across diverse ecosystems, including computer vision models (ResNet/ConvNeXt/ViT), large language models (BERT/RoBERTa), and city-scale traffic forecasters. These results move trustworthy AI beyond explaining single models: they establish a principled, intervention-based science of auditing and governing heterogeneous model ecosystems.

Related papers

Hierarchical Inference and Closure Learning via Adaptive Surrogates for ODEs and PDEs [15.38864225184245]
Inverse problems are the task of calibrating models to match data.<n>We develop a principled methodology for leveraging data from collections of distinct yet related physical systems.<n>We learn the shared unknown dynamics in the form of an ML-based closure model.
arXiv Detail & Related papers (2026-03-04T10:30:08Z)
The Emergence of Lab-Driven Alignment Signatures: A Psychometric Framework for Auditing Latent Bias and Compounding Risk in Generative AI [0.0]
This paper introduces a novel auditing framework to quantify latent trait estimation under ordinal uncertainty.<n>The research audits nine leading models across dimensions including Optimization Bias, Sycophancy, and Status-Quo Legitimization.
arXiv Detail & Related papers (2026-02-19T06:56:01Z)
AI-NativeBench: An Open-Source White-Box Agentic Benchmark Suite for AI-Native Systems [52.65695508605237]
We introduce AI-NativeBench, the first application-centric and white-box AI-Native benchmark suite grounded in Model Context Protocol (MCP) and Agent-to-Agent (A2A) standards.<n>By treating agentic spans as first-class citizens within distributed traces, our methodology enables granular analysis of engineering characteristics beyond simple capabilities.<n>This work provides the first systematic evidence to guide the transition from measuring model capability to engineering reliable AI-Native systems.
arXiv Detail & Related papers (2026-01-14T11:32:07Z)
On the Limits of Self-Improving in LLMs and Why AGI, ASI and the Singularity Are Not Near Without Symbolic Model Synthesis [0.01269104766024433]
We formalise self-training in Large Language Models (LLMs) and Generative AI as a discrete-time dynamical system.<n>We derive two fundamental failure modes: (1) Entropy Decay, where finite sampling effects cause a monotonic loss of distributional diversity (mode collapse), and (2) Variance Amplification, where the loss of external grounding causes the model's representation of truth to drift as a random walk.
arXiv Detail & Related papers (2026-01-05T19:50:49Z)
STORE: Semantic Tokenization, Orthogonal Rotation and Efficient Attention for Scaling Up Ranking Models [11.965535230928372]
Store is a unified and scalable token-based ranking framework built upon three core innovations.<n>Our framework consistently improves prediction accuracy(online CTR by 2.71%, AUC by 1.195%) and training effeciency (1.84 throughput)
arXiv Detail & Related papers (2025-11-24T06:20:02Z)
RealUnify: Do Unified Models Truly Benefit from Unification? A Comprehensive Benchmark [71.3555284685426]
We introduce RealUnify, a benchmark designed to evaluate bidirectional capability synergy.<n>RealUnify comprises 1,000 meticulously human-annotated instances spanning 10 categories and 32 subtasks.<n>We find that current unified models still struggle to achieve effective synergy, indicating that architectural unification alone is insufficient.
arXiv Detail & Related papers (2025-09-29T15:07:28Z)
Model Correlation Detection via Random Selection Probing [62.093777777813756]
Existing similarity-based methods require access to model parameters or produce scores without thresholds.<n>We introduce Random Selection Probing (RSP), a hypothesis-testing framework that formulates model correlation detection as a statistical test.<n>RSP produces rigorous p-values that quantify evidence of correlation.
arXiv Detail & Related papers (2025-09-29T01:40:26Z)
A2R: An Asymmetric Two-Stage Reasoning Framework for Parallel Reasoning [57.727084580884075]
Asymmetric Two-Stage Reasoning framework designed to bridge gap between a model's potential and its actual performance.<n>A2R-Efficient is a "small-to-big" variant that combines a Qwen3-4B explorer with a Qwen3-8B synthesizer.<n>Results show A2R is not only a performance-boosting framework but also an efficient and practical solution for real-world applications.
arXiv Detail & Related papers (2025-09-26T08:27:03Z)
On the Reasoning Capacity of AI Models and How to Quantify It [0.0]
Large Language Models (LLMs) have intensified the debate surrounding the fundamental nature of their reasoning capabilities.<n>While achieving high performance on benchmarks such as GPQA and MMLU, these models exhibit limitations in more complex reasoning tasks.<n>We propose a novel phenomenological approach that goes beyond traditional accuracy metrics to probe the underlying mechanisms of model behavior.
arXiv Detail & Related papers (2025-01-23T16:58:18Z)
Identifiable Representation and Model Learning for Latent Dynamic Systems [0.0]
We study the problem of identifiable representation and model learning for latent dynamic systems.<n>We prove that, for linear and affine nonlinear latent dynamic systems with sparse input matrices, it is possible to identify the latent variables up to scaling.
arXiv Detail & Related papers (2024-10-23T13:55:42Z)
Correct-by-Construction Control for Stochastic and Uncertain Dynamical Models via Formal Abstractions [44.99833362998488]
We develop an abstraction framework that can be used to solve this problem under various modeling assumptions. We use state-of-the-art verification techniques to compute an optimal policy on the iMDP with guarantees for satisfying the given specification. We then show that, by construction, we can refine this policy into a feedback controller for which these guarantees carry over to the dynamical model.
arXiv Detail & Related papers (2023-11-16T11:03:54Z)
Understanding, Predicting and Better Resolving Q-Value Divergence in Offline-RL [86.0987896274354]
We first identify a fundamental pattern, self-excitation, as the primary cause of Q-value estimation divergence in offline RL. We then propose a novel Self-Excite Eigenvalue Measure (SEEM) metric to measure the evolving property of Q-network at training. For the first time, our theory can reliably decide whether the training will diverge at an early stage.
arXiv Detail & Related papers (2023-10-06T17:57:44Z)
Explaining a Series of Models by Propagating Local Feature Attributions [9.66840768820136]
Pipelines involving several machine learning models improve performance in many domains but are difficult to understand. We introduce a framework to propagate local feature attributions through complex pipelines of models based on a connection to the Shapley value. Our framework enables us to draw higher-level conclusions based on groups of gene expression features for Alzheimer's and breast cancer histologic grade prediction.
arXiv Detail & Related papers (2021-04-30T22:20:58Z)
How Faithful is your Synthetic Data? Sample-level Metrics for Evaluating and Auditing Generative Models [95.8037674226622]
We introduce a 3-dimensional evaluation metric that characterizes the fidelity, diversity and generalization performance of any generative model in a domain-agnostic fashion. Our metric unifies statistical divergence measures with precision-recall analysis, enabling sample- and distribution-level diagnoses of model fidelity and diversity.
arXiv Detail & Related papers (2021-02-17T18:25:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.