Statistical Inference of Minimally Complex Models
- URL: http://arxiv.org/abs/2008.00520v2
- Date: Mon, 27 Sep 2021 22:32:38 GMT
- Title: Statistical Inference of Minimally Complex Models
- Authors: Clélia de Mulatier, Paolo P. Mazza, Matteo Marsili
- Abstract summary: Minimally Complex Models (MCMs) are spin models with interactions of arbitrary order.
We show that Bayesian model selection restricted to these models is computationally feasible.
Their evidence, which trades off goodness-of-fit against model complexity, can be computed easily without any parameter fitting.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Finding the model that best describes a high dimensional dataset is a
daunting task. For binary data, we show that this becomes feasible when
restricting the search to a family of simple models that we call Minimally
Complex Models (MCMs). These are spin models, with interactions of arbitrary
order, that are composed of independent components of minimal complexity
(Beretta et al., 2018). They tend to be simple in information-theoretic terms,
which means that they fit well only specific types of data, and are
therefore easy to falsify. We show that Bayesian model selection restricted to
these models is computationally feasible and has many other advantages. First,
their evidence, which trades off goodness-of-fit against model complexity, can
be computed easily without any parameter fitting. This allows selecting the
best MCM among all, even though the number of models is astronomically large.
Furthermore, MCMs can be inferred and sampled from without any computational
effort. Finally, model selection among MCMs is invariant with respect to
changes in the representation of the data. MCMs portray the structure of
dependencies among variables in a simple way, as illustrated in several
examples, and thus provide robust predictions on dependencies in the data. MCMs
contain interactions of any order between variables, and thus may reveal the
presence of interactions of order higher than pairwise.
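Because an MCM factorizes over independent components, its evidence is a product of per-component terms, each a closed-form marginal likelihood over that component's spin sub-states, so no parameter fitting is needed. The following is a minimal sketch of this selection step, assuming a Dirichlet(1/2) (Jeffreys-type) prior so that each component's evidence is the standard Dirichlet-multinomial marginal likelihood; the paper's exact prior and normalization may differ, and the function names are illustrative:

```python
from math import lgamma
from collections import Counter

def log_evidence_component(data, cols):
    """Log marginal likelihood of a complete model on the sub-states
    spanned by the variables in `cols`, under a Dirichlet(1/2) prior:
    log G(q/2) - log G(N + q/2) + sum_s [log G(k_s + 1/2) - log G(1/2)],
    where q = 2^r is the number of sub-states, N the sample size,
    and k_s the observed count of sub-state s."""
    q = 2 ** len(cols)
    N = len(data)
    counts = Counter(tuple(row[c] for c in cols) for row in data)
    out = lgamma(q / 2) - lgamma(N + q / 2)
    for k in counts.values():
        out += lgamma(k + 0.5) - lgamma(0.5)
    return out

def log_evidence_mcm(data, partition):
    """An MCM factorizes over independent components, so its
    log-evidence is the sum of per-component log-evidences."""
    return sum(log_evidence_component(data, comp) for comp in partition)

# Toy comparison: two perfectly correlated binary variables.
data = [(0, 0), (0, 0), (1, 1), (1, 1)] * 5
joint = log_evidence_mcm(data, [[0, 1]])       # one 2-spin component
indep = log_evidence_mcm(data, [[0], [1]])     # two independent spins
```

Comparing `log_evidence_mcm` across partitions of the variables then selects the MCM with the highest evidence; on strongly correlated data such as the toy sample above, the joint component beats the independent-spin model.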
Related papers
- UniTST: Effectively Modeling Inter-Series and Intra-Series Dependencies for Multivariate Time Series Forecasting [98.12558945781693]
We propose a transformer-based model UniTST containing a unified attention mechanism on the flattened patch tokens.
Although our proposed model employs a simple architecture, it offers compelling performance as shown in our experiments on several datasets for time series forecasting.
arXiv Detail & Related papers (2024-06-07T14:39:28Z)
- EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) delivers outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, thus requiring no data availability or any additional training while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
- Induced Model Matching: How Restricted Models Can Help Larger Ones [1.7676816383911753]
We consider scenarios where a very accurate predictive model using restricted features is available at the time of training of a larger, full-featured, model.
How can the restricted model be useful to the full model?
We propose an approach for transferring the knowledge of the restricted model to the full model, by aligning the full model's context-restricted performance with that of the restricted model.
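The alignment idea above can be sketched as an auxiliary matching loss; the KL-divergence form and the function names below are illustrative assumptions, not the paper's actual objective:

```python
import math

def kl(p, q):
    """KL divergence between two discrete distributions (assumes q > 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def matching_loss(full_restricted_preds, restricted_preds):
    """Average KL between the restricted model's predictions and the full
    model's predictions computed on the restricted context only. Added to
    the full model's training loss, this nudges the full model's
    context-restricted behaviour toward the accurate restricted model."""
    pairs = zip(full_restricted_preds, restricted_preds)
    return sum(kl(r, f) for f, r in pairs) / len(restricted_preds)
```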
arXiv Detail & Related papers (2024-02-19T20:21:09Z)
- Representation Surgery for Multi-Task Model Merging [57.63643005215592]
Multi-task learning (MTL) compresses the information from multiple tasks into a unified backbone to improve computational efficiency and generalization.
Recent work directly merges multiple independently trained models to perform MTL instead of collecting their raw data for joint training.
By visualizing the representation distribution of existing model merging schemes, we find that the merged model often suffers from the dilemma of representation bias.
arXiv Detail & Related papers (2024-02-05T03:39:39Z)
- Sample Complexity Characterization for Linear Contextual MDPs [67.79455646673762]
Contextual Markov decision processes (CMDPs) describe a class of reinforcement learning problems in which the transition kernels and reward functions can change over time, with different MDPs indexed by a context variable.
CMDPs serve as an important framework to model many real-world applications with time-varying environments.
We study CMDPs under two linear function approximation models: Model I with context-varying representations and common linear weights for all contexts; and Model II with common representations for all contexts and context-varying linear weights.
arXiv Detail & Related papers (2024-02-05T03:25:04Z)
- Exact and general decoupled solutions of the LMC Multitask Gaussian Process model [28.32223907511862]
The Linear Model of Co-regionalization (LMC) is a very general model of multitask Gaussian processes for regression or classification.
Recent work has shown that under some conditions the latent processes of the model can be decoupled, leading to a complexity that is only linear in the number of said processes.
Here we extend these results, showing under the most general assumptions that the only condition necessary for an efficient exact computation of the LMC is a mild hypothesis on the noise model.
arXiv Detail & Related papers (2023-10-18T15:16:24Z)
- Low-Rank Constraints for Fast Inference in Structured Models [110.38427965904266]
This work demonstrates a simple approach to reduce the computational and memory complexity of a large class of structured models.
Experiments with neural parameterized structured models for language modeling, polyphonic music modeling, unsupervised grammar induction, and video modeling show that our approach matches the accuracy of standard models at large state spaces.
arXiv Detail & Related papers (2022-01-08T00:47:50Z)
- Revisiting minimum description length complexity in overparameterized models [38.21167656112762]
We provide an extensive theoretical characterization of MDL-COMP for linear models and kernel methods.
For kernel methods, we show that MDL-COMP informs minimax in-sample error, and can decrease as the dimensionality of the input increases.
We also prove that MDL-COMP bounds the in-sample mean squared error (MSE).
arXiv Detail & Related papers (2020-06-17T22:45:14Z)
- When Ensembling Smaller Models is More Efficient than Single Large Models [52.38997176317532]
We show that ensembles can outperform single models, achieving higher accuracy while requiring fewer total FLOPs to compute.
This presents an interesting observation that output diversity in ensembling can often be more efficient than training larger models.
arXiv Detail & Related papers (2020-05-01T18:56:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.