The Interpolating Information Criterion for Overparameterized Models
- URL: http://arxiv.org/abs/2307.07785v1
- Date: Sat, 15 Jul 2023 12:09:54 GMT
- Title: The Interpolating Information Criterion for Overparameterized Models
- Authors: Liam Hodgkinson, Chris van der Heide, Robert Salomone, Fred Roosta,
Michael W. Mahoney
- Abstract summary: We show that the Interpolating Information Criterion is a measure of model quality that naturally incorporates the choice of prior into the model selection.
Our new information criterion accounts for prior misspecification, geometric and spectral properties of the model, and is numerically consistent with known empirical and theoretical behavior.
- Score: 49.283527214211446
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The problem of model selection is considered for the setting of interpolating
estimators, where the number of model parameters exceeds the size of the
dataset. Classical information criteria typically consider the large-data
limit, penalizing model size. However, these criteria are not appropriate in
modern settings where overparameterized models tend to perform well. For any
overparameterized model, we show that there exists a dual underparameterized
model that possesses the same marginal likelihood, thus establishing a form of
Bayesian duality. This enables more classical methods to be used in the
overparameterized setting, revealing the Interpolating Information Criterion, a
measure of model quality that naturally incorporates the choice of prior into
the model selection. Our new information criterion accounts for prior
misspecification, geometric and spectral properties of the model, and is
numerically consistent with known empirical and theoretical behavior in this
regime.
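As a point of reference for the "classical information criteria" the abstract contrasts against (this is not the Interpolating Information Criterion itself), the sketch below shows how AIC and BIC penalize parameter count, and why those penalties become uninformative once the parameter count `k` exceeds the dataset size `n`:

```python
import numpy as np

def aic(log_likelihood, k):
    """Akaike Information Criterion: penalty grows linearly in model size k."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian Information Criterion: penalty scales with log(dataset size n)."""
    return k * np.log(n) - 2 * log_likelihood

# Toy comparison: two fits with identical likelihood but different sizes.
ll = -120.0
print(aic(ll, k=5), aic(ll, k=500))          # the larger model is penalized heavily
print(bic(ll, k=5, n=100), bic(ll, k=500, n=100))
```

When k > n (the interpolating regime this paper targets), both penalties blow up even though overparameterized models often generalize well, which is the gap the proposed criterion is meant to close.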
Related papers
- SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z) - Continuous Language Model Interpolation for Dynamic and Controllable Text Generation [7.535219325248997]
We focus on the challenging case where the model must dynamically adapt to diverse -- and often changing -- user preferences.
We leverage adaptation methods based on linear weight interpolation, casting them as continuous multi-domain interpolators.
We show that varying the weights yields predictable and consistent change in the model outputs.
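The linear weight interpolation idea above can be sketched minimally as follows; the weight vectors here are hypothetical stand-ins for fine-tuned model parameters, not values from the paper:

```python
import numpy as np

# Hypothetical parameter vectors for two fine-tuned variants of one model.
w_formal = np.array([0.2, 1.0, -0.5])
w_casual = np.array([0.8, -0.3, 0.1])

def interpolate(alpha):
    """Linearly blend the two weight sets; alpha in [0, 1] tracks user preference."""
    return (1 - alpha) * w_formal + alpha * w_casual

# Sweeping alpha moves smoothly between the two behaviors.
for alpha in (0.0, 0.5, 1.0):
    print(alpha, interpolate(alpha))
```

Because the blend is linear in `alpha`, small changes to the preference produce proportionally small changes in the weights, which is consistent with the predictable output changes the summary describes.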
arXiv Detail & Related papers (2024-04-10T15:55:07Z) - Conditional Karhunen-Loève regression model with Basis Adaptation
for high-dimensional problems: uncertainty quantification and inverse
modeling [62.997667081978825]
We propose a methodology for improving the accuracy of surrogate models of the observable response of physical systems.
We apply the proposed methodology to constructing surrogate models of the stationary hydraulic head response via the Basis Adaptation (BA) method.
arXiv Detail & Related papers (2023-07-05T18:14:38Z) - Active-Learning-Driven Surrogate Modeling for Efficient Simulation of
Parametric Nonlinear Systems [0.0]
In the absence of governing equations, we need to construct the parametric reduced-order surrogate model in a non-intrusive fashion.
Our work provides a non-intrusive optimality criterion to efficiently populate the parameter snapshots.
We propose an active-learning-driven surrogate model using kernel-based shallow neural networks.
arXiv Detail & Related papers (2023-06-09T18:01:14Z) - Are Neural Topic Models Broken? [81.15470302729638]
We study the relationship between automated and human evaluation of topic models.
We find that neural topic models fare worse in both respects compared to an established classical method.
arXiv Detail & Related papers (2022-10-28T14:38:50Z) - Likelihood-free Model Choice for Simulator-based Models with the
Jensen--Shannon Divergence [0.9884867402204268]
We derive a consistent model scoring criterion for the likelihood-free setting called JSD-Razor.
Relationships of JSD-Razor with established scoring criteria for the likelihood-based approach are analyzed.
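The JSD-Razor criterion itself is not given in this summary; as background only, the sketch below implements the standard Jensen-Shannon divergence for discrete distributions that the criterion builds on:

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence between discrete distributions p and q."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0                       # terms with p_i = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def jsd(p, q):
    """Jensen-Shannon divergence: symmetric, finite, bounded above by log(2)."""
    m = 0.5 * (np.asarray(p, float) + np.asarray(q, float))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = [0.5, 0.5]
q = [0.9, 0.1]
print(jsd(p, q))    # small positive value
print(jsd(p, p))    # zero for identical distributions
```

Unlike the KL divergence, the JSD is symmetric and always finite, which makes it usable in likelihood-free settings where simulator and data distributions may have mismatched support.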
arXiv Detail & Related papers (2022-06-08T18:16:00Z) - Post-mortem on a deep learning contest: a Simpson's paradox and the
complementary roles of scale metrics versus shape metrics [61.49826776409194]
We analyze a corpus of models made publicly available for a contest to predict the generalization accuracy of neural network (NN) models.
We identify what amounts to a Simpson's paradox: "scale" metrics perform well overall but perform poorly on subpartitions of the data.
We present two novel shape metrics, one data-independent, and the other data-dependent, which can predict trends in the test accuracy of a series of NNs.
arXiv Detail & Related papers (2021-06-01T19:19:49Z) - How Faithful is your Synthetic Data? Sample-level Metrics for Evaluating
and Auditing Generative Models [95.8037674226622]
We introduce a 3-dimensional evaluation metric that characterizes the fidelity, diversity and generalization performance of any generative model in a domain-agnostic fashion.
Our metric unifies statistical divergence measures with precision-recall analysis, enabling sample- and distribution-level diagnoses of model fidelity and diversity.
arXiv Detail & Related papers (2021-02-17T18:25:30Z) - Semi-nonparametric Latent Class Choice Model with a Flexible Class
Membership Component: A Mixture Model Approach [6.509758931804479]
The proposed model formulates the latent classes using mixture models as an alternative approach to the traditional random utility specification.
Results show that mixture models improve the overall performance of latent class choice models.
arXiv Detail & Related papers (2020-07-06T13:19:26Z) - Assumption-lean inference for generalised linear model parameters [0.0]
We propose nonparametric definitions of main effect estimands and effect modification estimands.
These reduce to standard main effect and effect modification parameters in generalised linear models when these models are correctly specified.
We achieve assumption-lean inference for these estimands.
arXiv Detail & Related papers (2020-06-15T13:49:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.