Scaling Laws with Hidden Structure
- URL: http://arxiv.org/abs/2411.01375v2
- Date: Tue, 05 Nov 2024 09:57:44 GMT
- Title: Scaling Laws with Hidden Structure
- Authors: Charles Arnal, Clement Berenfeld, Simon Rosenberg, Vivien Cabannes
- Abstract summary: Recent advances suggest that text and image data contain such hidden structures, which help mitigate the curse of dimensionality.
In this paper, we present a controlled experimental framework to test whether neural networks can indeed exploit such ``hidden factorial structures.''
We find that they do leverage these latent patterns to learn discrete distributions more efficiently, and derive scaling laws linking model sizes, hidden factorizations, and accuracy.
- Score: 2.474908349649168
- Abstract: Statistical learning in high-dimensional spaces is challenging without a strong underlying data structure. Recent advances with foundational models suggest that text and image data contain such hidden structures, which help mitigate the curse of dimensionality. Inspired by results from nonparametric statistics, we hypothesize that this phenomenon can be partially explained in terms of decomposition of complex tasks into simpler subtasks. In this paper, we present a controlled experimental framework to test whether neural networks can indeed exploit such ``hidden factorial structures.'' We find that they do leverage these latent patterns to learn discrete distributions more efficiently, and derive scaling laws linking model sizes, hidden factorizations, and accuracy. We also study the interplay between our structural assumptions and the models' capacity for generalization.
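The abstract's key premise is that a high-dimensional discrete distribution with a hidden factorial structure requires far fewer parameters (and hence fewer samples) to learn than an unstructured joint. A minimal numerical sketch of this parameter-counting argument, with a toy factorization chosen purely for illustration (the function names and block sizes are assumptions, not the paper's setup):

```python
# Toy illustration (not the paper's code): a "hidden factorial structure"
# where a joint distribution over d discrete variables factorizes into
# independent blocks, shrinking the number of free parameters.

def num_params_full(d, k):
    """Free parameters of an unstructured joint over d variables with k states each."""
    return k**d - 1

def num_params_factored(blocks, k):
    """Free parameters when the joint factorizes over independent blocks
    of the given sizes (each block modeled as its own small joint)."""
    return sum(k**b - 1 for b in blocks)

d, k = 12, 2
print(num_params_full(d, k))                 # 4095 for the full joint
print(num_params_factored([3, 3, 3, 3], k))  # 28 when it splits into 4 blocks of 3
```

Under this toy assumption, the factorized model's parameter count grows with the block sizes rather than with the full dimension, which is one way the curse of dimensionality can be mitigated when such structure exists and is exploited.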
Related papers
- Probing the Latent Hierarchical Structure of Data via Diffusion Models [47.56642214162824]
We show that experiments in diffusion-based models are a promising tool to probe the latent structure of data.
We confirm this prediction in both text and image datasets using state-of-the-art diffusion models.
Our results show how latent variable changes manifest in the data and establish how to measure these effects in real data.
arXiv Detail & Related papers (2024-10-17T17:08:39Z)
- Shallow diffusion networks provably learn hidden low-dimensional structure [17.563546018565468]
Diffusion-based generative models provide a powerful framework for learning to sample from a complex target distribution.
We show that these models provably adapt to simple forms of low dimensional structure, thereby avoiding the curse of dimensionality.
We combine our results with recent analyses of sampling with diffusion models to provide an end-to-end sample complexity bound for learning to sample from structured distributions.
arXiv Detail & Related papers (2024-10-15T04:55:56Z)
- ComboStoc: Combinatorial Stochasticity for Diffusion Generative Models [65.82630283336051]
We show that the space spanned by the combination of dimensions and attributes is insufficiently sampled by existing training schemes of diffusion generative models.
We present a simple fix to this problem by constructing processes that fully exploit the structures, hence the name ComboStoc.
arXiv Detail & Related papers (2024-05-22T15:23:10Z)
- Bayesian Semi-structured Subspace Inference [0.0]
Semi-structured regression models enable the joint modeling of interpretable structured and complex unstructured feature effects.
We present a Bayesian approximation for semi-structured regression models using subspace inference.
Our approach exhibits competitive predictive performance across simulated and real-world datasets.
arXiv Detail & Related papers (2024-01-23T18:15:58Z)
- Homological Convolutional Neural Networks [4.615338063719135]
We propose a novel deep learning architecture that exploits the data structural organization through topologically constrained network representations.
We test our model on 18 benchmark datasets against 5 classic machine learning and 3 deep learning models.
arXiv Detail & Related papers (2023-08-26T08:48:51Z)
- Amortized Inference for Causal Structure Learning [72.84105256353801]
Learning causal structure poses a search problem that typically involves evaluating structures using a score or independence test.
We train a variational inference model to predict the causal structure from observational/interventional data.
Our models exhibit robust generalization capabilities under substantial distribution shift.
arXiv Detail & Related papers (2022-05-25T17:37:08Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- Low-Rank Constraints for Fast Inference in Structured Models [110.38427965904266]
This work demonstrates a simple approach to reduce the computational and memory complexity of a large class of structured models.
Experiments with neural parameterized structured models for language modeling, polyphonic music modeling, unsupervised grammar induction, and video modeling show that our approach matches the accuracy of standard models at large state spaces.
arXiv Detail & Related papers (2022-01-08T00:47:50Z)
- Structural Causal Models Are (Solvable by) Credal Networks [70.45873402967297]
Causal inferences can be obtained by standard algorithms for the updating of credal nets.
This contribution should be regarded as a systematic approach to represent structural causal models by credal networks.
Experiments show that approximate algorithms for credal networks can immediately be used to do causal inference in real-size problems.
arXiv Detail & Related papers (2020-08-02T11:19:36Z)
- Generalising Recursive Neural Models by Tensor Decomposition [12.069862650316262]
We introduce a general approach to model aggregation of structural context leveraging a tensor-based formulation.
We show how the exponential growth in the size of the parameter space can be controlled through an approximation based on the Tucker decomposition.
By this means, we can effectively regulate the trade-off between the expressivity of the encoding (controlled by the hidden size), computational complexity, and model generalisation.
arXiv Detail & Related papers (2020-06-17T17:28:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.