A Two-Scale Complexity Measure for Deep Learning Models
- URL: http://arxiv.org/abs/2401.09184v1
- Date: Wed, 17 Jan 2024 12:50:50 GMT
- Title: A Two-Scale Complexity Measure for Deep Learning Models
- Authors: Massimiliano Datres, Gian Paolo Leonardi, Alessio Figalli, David
Sutter
- Abstract summary: We introduce a novel capacity measure 2sED for statistical models based on the effective dimension.
The new quantity provably bounds the generalization error under mild assumptions on the model.
Simulations on standard data sets and popular model architectures show that 2sED correlates well with the training error.
- Score: 2.7446241148152257
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a novel capacity measure 2sED for statistical models based on
the effective dimension. The new quantity provably bounds the generalization
error under mild assumptions on the model. Furthermore, simulations on standard
data sets and popular model architectures show that 2sED correlates well with
the training error. For Markovian models, we show how to efficiently
approximate 2sED from below through a layerwise iterative approach, which
allows us to tackle deep learning models with a large number of parameters.
Simulation results suggest that the approximation is good for different
prominent models and data sets.
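The paper's exact 2sED definition is not reproduced here, but the underlying idea of a scale-dependent effective dimension can be illustrated with a hypothetical quantity computed from Fisher-information eigenvalues: directions whose curvature is resolvable at a given scale each count as roughly one dimension, while flat directions count as roughly zero. The formula and function name below are illustrative assumptions, not the paper's measure:

```python
import numpy as np

def effective_dimension(fisher_eigvals, kappa):
    """Illustrative scale-dependent effective dimension:
    d_eff(kappa) = sum_i log(1 + kappa * lam_i) / log(1 + kappa).
    Eigen-directions with curvature well above 1/kappa contribute ~1,
    nearly flat directions contribute ~0."""
    lam = np.asarray(fisher_eigvals, dtype=float)
    return float(np.sum(np.log1p(kappa * lam)) / np.log1p(kappa))

# A model with two dominant curvature directions and three nearly flat
# ones looks two-dimensional at a coarse scale; only at a much finer
# scale do the flat directions begin to count.
eigs = [1.0, 1.0, 1e-6, 1e-6, 1e-6]
coarse = effective_dimension(eigs, kappa=10.0)   # close to 2
fine = effective_dimension(eigs, kappa=1e8)      # larger than coarse
```

This mirrors the two-scale intuition: the capacity relevant for generalization depends on the resolution at which the parameter space is probed.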
Related papers
- EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) achieves outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, thus requiring no data availability or any additional training while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z) - Diffusion posterior sampling for simulation-based inference in tall data settings [53.17563688225137]
Simulation-based inference (SBI) is capable of approximating the posterior distribution that relates input parameters to a given observation.
In this work, we consider a tall data extension in which multiple observations are available to better infer the parameters of the model.
We compare our method to recently proposed competing approaches on various numerical experiments and demonstrate its superiority in terms of numerical stability and computational cost.
arXiv Detail & Related papers (2024-04-11T09:23:36Z) - Towards Learning Stochastic Population Models by Gradient Descent [0.0]
We show that simultaneous estimation of parameters and structure poses major challenges for optimization procedures.
We demonstrate accurate estimation of models but find that enforcing the inference of parsimonious, interpretable models drastically increases the difficulty.
arXiv Detail & Related papers (2024-04-10T14:38:58Z) - Representer Point Selection for Explaining Regularized High-dimensional
Models [105.75758452952357]
We introduce a class of sample-based explanations we term high-dimensional representers.
Our workhorse is a novel representer theorem for general regularized high-dimensional models.
We study the empirical performance of our proposed methods on three real-world binary classification datasets and two recommender system datasets.
arXiv Detail & Related papers (2023-05-31T16:23:58Z) - Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
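The simplest form of parameter-space merging is a weighted average of corresponding weights across models sharing an architecture. The sketch below shows only this baseline; the paper's fusion method is more refined, and the dictionary layout is an assumption:

```python
import numpy as np

def average_merge(state_dicts, weights=None):
    """Merge models with identical architectures by averaging each
    named parameter tensor across models. Requires no data, only
    the model weights themselves."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    return {
        name: sum(w * sd[name] for w, sd in zip(weights, state_dicts))
        for name in state_dicts[0]
    }

# Two toy single-layer "models":
m1 = {"layer.weight": np.array([[1.0, 2.0]])}
m2 = {"layer.weight": np.array([[3.0, 4.0]])}
merged = average_merge([m1, m2])
```

Because the merge operates entirely on parameters, no training data or fine-tuning corpus needs to be shared between model owners.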
arXiv Detail & Related papers (2022-12-19T20:46:43Z) - Dynamically-Scaled Deep Canonical Correlation Analysis [77.34726150561087]
Canonical Correlation Analysis (CCA) is a method for extracting features from two views by finding maximally correlated linear projections of them.
We introduce a novel dynamic scaling method for training an input-dependent canonical correlation model.
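The classical (static) linear CCA that this dynamic variant builds on can be sketched directly: the canonical directions come from an SVD of the whitened cross-covariance of the two views. The ridge term and function name are illustrative choices:

```python
import numpy as np

def cca_first_pair(X, Y, reg=1e-8):
    """Classical linear CCA: directions a, b maximizing the correlation
    between X @ a and Y @ b, via the SVD of the whitened
    cross-covariance. A small ridge keeps the covariances invertible."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    n = len(X)
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n

    def inv_sqrt(C):  # symmetric inverse square root via eigendecomposition
        w, V = np.linalg.eigh(C)
        return V @ np.diag(w ** -0.5) @ V.T

    U, s, Vt = np.linalg.svd(inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy))
    return inv_sqrt(Cxx) @ U[:, 0], inv_sqrt(Cyy) @ Vt[0], s[0]

# Two views sharing one latent signal: the first canonical
# correlation is close to 1.
rng = np.random.default_rng(0)
z = rng.normal(size=500)
X = np.column_stack([z + 0.1 * rng.normal(size=500), rng.normal(size=500)])
Y = np.column_stack([z + 0.1 * rng.normal(size=500), rng.normal(size=500)])
a, b, rho = cca_first_pair(X, Y)
```

The dynamic-scaling model replaces these fixed projections with input-dependent ones; the fixed-projection solver above is the baseline being generalized.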
arXiv Detail & Related papers (2022-03-23T12:52:49Z) - Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z) - Using machine learning to correct model error in data assimilation and
forecast applications [0.0]
We propose to use this method to correct the error of an existing, knowledge-based model.
The resulting surrogate model is a hybrid model between the original (knowledge-based) model and the ML model.
Using the hybrid surrogate models for DA yields a significantly better analysis than using the original model.
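The hybrid-surrogate idea can be sketched in one dimension: keep the knowledge-based model and add a learned correction fitted to its residual error. The toy truth, the crude linear "physical" model, and the polynomial stand-in for the ML component are all illustrative assumptions:

```python
import numpy as np

# Hypothetical setup: the "knowledge-based" model is a first-order
# approximation of an unknown truth; a small polynomial fit plays the
# role of the ML model, learning the residual error.
rng = np.random.default_rng(0)
x = np.linspace(-1.5, 1.5, 200)
truth = np.sin(np.pi * x)
physical = lambda z: np.pi * z                 # crude linear model
residual = truth - physical(x)                 # the model error to learn
coef = np.polyfit(x, residual, deg=5)          # learned correction
hybrid = lambda z: physical(z) + np.polyval(coef, z)

err_physical = np.mean((physical(x) - truth) ** 2)
err_hybrid = np.mean((hybrid(x) - truth) ** 2)
# The hybrid surrogate tracks the truth far better than the
# knowledge-based model alone.
```

Using such a hybrid inside a data assimilation cycle is exactly the setting where the paper reports significantly better analyses than with the original model.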
arXiv Detail & Related papers (2020-10-23T18:30:45Z) - Combining data assimilation and machine learning to infer unresolved
scale parametrisation [0.0]
In recent years, machine learning has been proposed to devise data-driven parametrisations of unresolved processes in dynamical numerical models.
Our goal is to go beyond the use of high-resolution simulations and train an ML-based parametrisation using direct data.
We show that in both cases the hybrid model yields forecasts with better skill than the truncated model.
arXiv Detail & Related papers (2020-09-09T14:12:11Z) - Amortized Bayesian model comparison with evidential deep learning [0.12314765641075436]
We propose a novel method for performing Bayesian model comparison using specialized deep learning architectures.
Our method is purely simulation-based and circumvents the step of explicitly fitting all alternative models under consideration to each observed dataset.
We show that our method achieves excellent results in terms of accuracy, calibration, and efficiency across the examples considered in this work.
arXiv Detail & Related papers (2020-04-22T15:15:46Z) - Predicting Multidimensional Data via Tensor Learning [0.0]
We develop a model that retains the intrinsic multidimensional structure of the dataset.
To estimate the model parameters, an Alternating Least Squares algorithm is developed.
The proposed model is able to outperform benchmark models present in the forecasting literature.
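The Alternating Least Squares principle mentioned above, fixing all factors but one and solving that one in closed form, can be shown on the simplest case, a rank-1 CP decomposition of a 3-way tensor. This is a minimal sketch, not the paper's estimator:

```python
import numpy as np

def cp_rank1_als(T, n_iter=20):
    """Rank-1 CP decomposition of a 3-way tensor by Alternating Least
    Squares: with two factors fixed, the least-squares update for the
    third is a simple tensor contraction."""
    rng = np.random.default_rng(1)
    a, b, c = (rng.normal(size=d) for d in T.shape)
    for _ in range(n_iter):
        a = np.einsum("ijk,j,k->i", T, b, c) / ((b @ b) * (c @ c))
        b = np.einsum("ijk,i,k->j", T, a, c) / ((a @ a) * (c @ c))
        c = np.einsum("ijk,i,j->k", T, a, b) / ((a @ a) * (b @ b))
    return a, b, c

# Recover the factors of an exactly rank-1 tensor:
a0 = np.array([1.0, 2.0])
b0 = np.array([1.0, -1.0, 3.0])
c0 = np.array([2.0, 0.5])
T = np.einsum("i,j,k->ijk", a0, b0, c0)
a, b, c = cp_rank1_als(T)
recon = np.einsum("i,j,k->ijk", a, b, c)
```

Each factor update is an ordinary least-squares problem, which is why ALS scales to higher ranks and more modes by the same fix-all-but-one scheme.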
arXiv Detail & Related papers (2020-02-11T11:57:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.