A Minimum Description Length Approach to Regularization in Neural Networks
- URL: http://arxiv.org/abs/2505.13398v1
- Date: Mon, 19 May 2025 17:34:56 GMT
- Title: A Minimum Description Length Approach to Regularization in Neural Networks
- Authors: Matan Abudy, Orr Well, Emmanuel Chemla, Roni Katzir, Nur Lan,
- Abstract summary: We show that the choice of regularization method plays a crucial role when trained on formal languages.<n>We propose that unlike existing regularization techniques, MDL introduces the appropriate inductive bias to effectively counteract overfitting and promote generalization.
- Score: 2.446672595462589
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State-of-the-art neural networks can be trained to become remarkable solutions to many problems. But while these architectures can express symbolic, perfect solutions, trained models often arrive at approximations instead. We show that the choice of regularization method plays a crucial role: when trained on formal languages with standard regularization ($L_1$, $L_2$, or none), expressive architectures not only fail to converge to correct solutions but are actively pushed away from perfect initializations. In contrast, applying the Minimum Description Length (MDL) principle to balance model complexity with data fit provides a theoretically grounded regularization method. Using MDL, perfect solutions are selected over approximations, independently of the optimization algorithm. We propose that unlike existing regularization techniques, MDL introduces the appropriate inductive bias to effectively counteract overfitting and promote generalization.
Related papers
- Leveraging Importance Sampling to Detach Alignment Modules from Large Language Models [50.19188692497892]
Traditional alignment methods often require retraining large pretrained models.<n>We propose a novel textitResidual Alignment Model (textitRAM) that formalizes the alignment process as a type of importance sampling.<n>We develop a resampling algorithm with iterative token-level decoding to address the common first-token latency issue in comparable methods.
arXiv Detail & Related papers (2025-05-26T08:53:02Z) - Training Deep Learning Models with Norm-Constrained LMOs [56.00317694850397]
We study optimization methods that leverage the linear minimization oracle (LMO) over a norm-ball.<n>We propose a new family of algorithms that uses the LMO to adapt to the geometry of the problem and, perhaps surprisingly, show that they can be applied to unconstrained problems.
arXiv Detail & Related papers (2025-02-11T13:10:34Z) - Learning Constrained Optimization with Deep Augmented Lagrangian Methods [54.22290715244502]
A machine learning (ML) model is trained to emulate a constrained optimization solver.
This paper proposes an alternative approach, in which the ML model is trained to predict dual solution estimates directly.
It enables an end-to-end training scheme is which the dual objective is as a loss function, and solution estimates toward primal feasibility, emulating a Dual Ascent method.
arXiv Detail & Related papers (2024-03-06T04:43:22Z) - Bridging the Empirical-Theoretical Gap in Neural Network Formal Language Learning Using Minimum Description Length [2.867517731896504]
We show that the theoretically correct solution is in fact not an optimum of commonly used objectives.
We focus on one simple formal language and show that the theoretically correct solution is in fact not an optimum of commonly used objectives.
arXiv Detail & Related papers (2024-02-15T15:25:30Z) - Reducing the Need for Backpropagation and Discovering Better Optima With
Explicit Optimizations of Neural Networks [4.807347156077897]
We propose a computationally efficient alternative for optimizing neural networks.
We derive an explicit solution to a simple feed-forward language model.
We show that explicit solutions perform near-optimality in experiments.
arXiv Detail & Related papers (2023-11-13T17:38:07Z) - Making Linear MDPs Practical via Contrastive Representation Learning [101.75885788118131]
It is common to address the curse of dimensionality in Markov decision processes (MDPs) by exploiting low-rank representations.
We consider an alternative definition of linear MDPs that automatically ensures normalization while allowing efficient representation learning.
We demonstrate superior performance over existing state-of-the-art model-based and model-free algorithms on several benchmarks.
arXiv Detail & Related papers (2022-07-14T18:18:02Z) - Reinforcement Learning for Adaptive Mesh Refinement [63.7867809197671]
We propose a novel formulation of AMR as a Markov decision process and apply deep reinforcement learning to train refinement policies directly from simulation.
The model sizes of these policy architectures are independent of the mesh size and hence scale to arbitrarily large and complex simulations.
arXiv Detail & Related papers (2021-03-01T22:55:48Z) - Joint Continuous and Discrete Model Selection via Submodularity [1.332560004325655]
In model selection problems for machine learning, the desire for a well-performing model with meaningful structure is typically expressed through a regularized optimization problem.
In many scenarios, however, numerically meaningful structure is specified in some discrete space leading to difficult non optimization problems.
We show how simple continuous or discrete constraints can also be handled for certain problem classes, motivated by robust optimization.
arXiv Detail & Related papers (2021-02-17T21:14:47Z) - Offline Model-Based Optimization via Normalized Maximum Likelihood
Estimation [101.22379613810881]
We consider data-driven optimization problems where one must maximize a function given only queries at a fixed set of points.
This problem setting emerges in many domains where function evaluation is a complex and expensive process.
We propose a tractable approximation that allows us to scale our method to high-capacity neural network models.
arXiv Detail & Related papers (2021-02-16T06:04:27Z) - Meta-learning based Alternating Minimization Algorithm for Non-convex
Optimization [9.774392581946108]
We propose a novel solution for challenging non-problems of multiple variables.
Our proposed approach is able to achieve effective iterations in cases while other methods would typically fail.
arXiv Detail & Related papers (2020-09-09T10:45:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.