Bridging the Empirical-Theoretical Gap in Neural Network Formal Language Learning Using Minimum Description Length
- URL: http://arxiv.org/abs/2402.10013v2
- Date: Thu, 6 Jun 2024 16:16:12 GMT
- Title: Bridging the Empirical-Theoretical Gap in Neural Network Formal Language Learning Using Minimum Description Length
- Authors: Nur Lan, Emmanuel Chemla, Roni Katzir
- Abstract summary: Focusing on one simple formal language, we show that the theoretically correct solution is in fact not an optimum of commonly used objectives.
- Score: 2.867517731896504
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural networks offer good approximation to many tasks but consistently fail to reach perfect generalization, even when theoretical work shows that such perfect solutions can be expressed by certain architectures. Using the task of formal language learning, we focus on one simple formal language and show that the theoretically correct solution is in fact not an optimum of commonly used objectives -- even with regularization techniques that according to common wisdom should lead to simple weights and good generalization (L1, L2) or other meta-heuristics (early-stopping, dropout). On the other hand, replacing standard targets with the Minimum Description Length objective (MDL) results in the correct solution being an optimum.
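To make the contrast concrete, the sketch below compares a standard L2-regularized cross-entropy loss with an MDL-style objective that sums the description length of the data given the network and the description length of the network itself. This is a minimal illustration, not the authors' implementation; the fixed-precision weight encoding (`bits_per_weight`) and the helper names are assumptions rather than the paper's actual encoding scheme.

```python
# Minimal sketch (assumed encoding, not the paper's) contrasting an
# L2-regularized loss with an MDL-style objective measured in bits.
import math

import numpy as np


def data_encoding_length_bits(probs, targets):
    """Bits needed to encode the targets under the model's predicted
    distribution (cross-entropy in base 2)."""
    eps = 1e-12
    return -sum(math.log2(probs[i, t] + eps) for i, t in enumerate(targets))


def model_encoding_length_bits(weights, bits_per_weight=16):
    """Toy encoding cost: a fixed number of bits per nonzero weight,
    so sparser networks are cheaper to describe (an assumption)."""
    flat = np.asarray(weights).ravel()
    return float(np.count_nonzero(flat)) * bits_per_weight


def l2_loss(probs, targets, weights, lam=1e-3):
    """Common objective: cross-entropy plus an L2 penalty on the weights."""
    penalty = lam * float(np.sum(np.square(np.asarray(weights))))
    return data_encoding_length_bits(probs, targets) + penalty


def mdl_loss(probs, targets, weights):
    """MDL objective: |data given model| + |model|, both in bits."""
    return data_encoding_length_bits(probs, targets) + model_encoding_length_bits(weights)


# Tiny usage example on made-up numbers.
probs = np.array([[0.9, 0.1], [0.2, 0.8]])  # predicted next-symbol distributions
targets = [0, 1]                            # observed next symbols
weights = np.array([1.0, 0.0, -2.0])        # flattened network weights
print(l2_loss(probs, targets, weights), mdl_loss(probs, targets, weights))
```

Under an objective of this shape, a smaller network that still assigns full probability to every legal continuation strictly reduces the total description length, which is the sense in which the theoretically correct solution can become an optimum; under cross-entropy with L1/L2 penalties, the abstract reports that it need not be.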
Related papers
- AdaReasoner: Adaptive Reasoning Enables More Flexible Thinking in Large Language Models [32.51746551988431]
AdaReasoner is an LLM-agnostic plugin designed for any LLM to automate adaptive reasoning configurations.
AdaReasoner is trained using a reinforcement learning (RL) framework, combining a factorized action space with a targeted exploration strategy.
It consistently outperforms standard baselines, preserves out-of-distribution robustness, and yields gains on knowledge-intensive tasks through tailored prompts.
arXiv Detail & Related papers (2025-05-22T22:06:11Z) - A MIND for Reasoning: Meta-learning for In-context Deduction [3.4383794581359184]
We propose Meta-learning for In-context Deduction (MIND), a novel few-shot meta-learning fine-tuning approach.
Our results show that MIND significantly improves generalization in small LMs ranging from 1.5B to 7B parameters.
Remarkably, small models fine-tuned with MIND outperform state-of-the-art LLMs, such as GPT-4o and o3-mini, on this task.
arXiv Detail & Related papers (2025-05-20T13:00:48Z) - A Minimum Description Length Approach to Regularization in Neural Networks [2.446672595462589]
We show that the choice of regularization method plays a crucial role when networks are trained on formal languages.
We propose that, unlike existing regularization techniques, MDL introduces the appropriate inductive bias to counteract overfitting and promote generalization.
arXiv Detail & Related papers (2025-05-19T17:34:56Z) - Rethinking Multi-Objective Learning through Goal-Conditioned Supervised Learning [8.593384839118658]
Multi-objective learning aims to optimize multiple objectives simultaneously with a single model.
It suffers from the difficulty of formalizing and conducting the exact learning process.
We propose a general framework for automatically learning to achieve multiple objectives based on existing sequential data.
arXiv Detail & Related papers (2024-12-12T03:47:40Z) - Autoformalization of Game Descriptions using Large Language Models [3.5083201638203154]
We introduce a framework for the autoformalization of game-theoretic scenarios.
This translates natural language descriptions into formal logic representations suitable for formal solvers.
We evaluate the framework using GPT-4o and a dataset of natural language problem descriptions.
arXiv Detail & Related papers (2024-09-18T20:18:53Z) - Benchmarking Neural Network Generalization for Grammar Induction [3.2228025627337864]
We provide a measure of neural network generalization based on fully specified formal languages.
The benchmark includes languages such as $a^nb^n$, $a^nb^nc^n$, $a^nb^mc^{n+m}$, and Dyck-1 and Dyck-2 (membership-test sketches for these languages appear after this list).
arXiv Detail & Related papers (2023-08-16T09:45:06Z) - Synergies between Disentanglement and Sparsity: Generalization and Identifiability in Multi-Task Learning [79.83792914684985]
We prove a new identifiability result that provides conditions under which maximally sparse base-predictors yield disentangled representations.
Motivated by this theoretical result, we propose a practical approach to learn disentangled representations based on a sparsity-promoting bi-level optimization problem.
arXiv Detail & Related papers (2022-11-26T21:02:09Z) - Scalable PAC-Bayesian Meta-Learning via the PAC-Optimal Hyper-Posterior: From Theory to Practice [54.03076395748459]
A central question in the meta-learning literature is how to regularize to ensure generalization to unseen tasks.
We present a generalization bound for meta-learning, which was first derived by Rothfuss et al.
We provide a theoretical analysis and an empirical case study of the conditions under which, and the extent to which, these guarantees for meta-learning improve upon PAC-Bayesian per-task learning bounds.
arXiv Detail & Related papers (2022-11-14T08:51:04Z) - Understanding Robust Generalization in Learning Regular Languages [85.95124524975202]
We study robust generalization in the context of using recurrent neural networks to learn regular languages.
We propose a compositional strategy to address this.
We theoretically prove that the compositional strategy generalizes significantly better than the end-to-end strategy.
arXiv Detail & Related papers (2022-02-20T02:50:09Z) - Learning Proximal Operators to Discover Multiple Optima [66.98045013486794]
We present an end-to-end method to learn the proximal operator across a family of non-convex problems.
We show that for weakly convex objectives and under mild conditions, the method converges globally.
arXiv Detail & Related papers (2022-01-28T05:53:28Z) - Meta-Learning with Neural Tangent Kernels [58.06951624702086]
We propose the first meta-learning paradigm in the Reproducing Kernel Hilbert Space (RKHS) induced by the meta-model's Neural Tangent Kernel (NTK).
Within this paradigm, we introduce two meta-learning algorithms, which no longer need a sub-optimal iterative inner-loop adaptation as in the MAML framework.
We achieve this goal by 1) replacing the adaptation with a fast-adaptive regularizer in the RKHS; and 2) solving the adaptation analytically based on the NTK theory.
arXiv Detail & Related papers (2021-02-07T20:53:23Z) - On the Global Optimality of Model-Agnostic Meta-Learning [133.16370011229776]
Model-agnostic meta-learning (MAML) formulates meta-learning as a bilevel optimization problem, where the inner level solves each subtask based on a shared prior.
We characterize the optimality of the stationary points attained by MAML for both reinforcement learning and supervised learning, where the inner-level and outer-level problems are solved via first-order optimization methods.
arXiv Detail & Related papers (2020-06-23T17:33:14Z) - Parallel processor scheduling: formulation as multi-objective linguistic optimization and solution using Perceptual Reasoning based methodology [13.548237279353408]
The aim of the scheduling policy is to achieve the optimal value of an objective, like production time, cost, etc.
Experts generally provide their opinions about various scheduling criteria (pertaining to the scheduling policies) in linguistic terms or words.
We have also compared the results of the PR based solution methodology with those obtained from the 2-tuple based solution methodology.
arXiv Detail & Related papers (2020-04-30T17:04:49Z) - Local Nonparametric Meta-Learning [28.563015766188478]
A central goal of meta-learning is to find a learning rule that enables fast adaptation across a set of tasks.
We show that global, fixed-size representations often fail when confronted with certain types of out-of-distribution tasks.
We propose a novel nonparametric meta-learning algorithm that utilizes a meta-trained local learning rule.
arXiv Detail & Related papers (2020-02-09T03:28:27Z)
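For concreteness, here is a minimal sketch (assumed definitions, not the benchmark's code) of membership tests for the formal languages named in the grammar-induction entry above; the function names and the treatment of edge cases such as the empty string are assumptions.

```python
# Minimal, assumed membership tests for a^n b^n, a^n b^n c^n,
# a^n b^m c^(n+m), and Dyck-1 (balanced parentheses).


def is_anbn(s: str) -> bool:
    """True iff s = a^n b^n for some n >= 1 (e.g. 'aabb')."""
    n = len(s) // 2
    return n >= 1 and s == "a" * n + "b" * n


def is_anbncn(s: str) -> bool:
    """True iff s = a^n b^n c^n for some n >= 1 (e.g. 'aabbcc')."""
    n = len(s) // 3
    return n >= 1 and s == "a" * n + "b" * n + "c" * n


def is_anbmcnm(s: str) -> bool:
    """True iff s = a^n b^m c^(n+m) (e.g. 'aabccc' with n=2, m=1)."""
    n = len(s) - len(s.lstrip("a"))   # leading run of a's
    rest = s[n:]
    m = len(rest) - len(rest.lstrip("b"))  # following run of b's
    return (n + m) >= 1 and rest[m:] == "c" * (n + m)


def is_dyck1(s: str) -> bool:
    """True iff s is a balanced string over '(' and ')' (Dyck-1)."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        else:
            return False
        if depth < 0:
            return False
    return depth == 0
```

Because such languages are fully specified, a network's generalization can be scored exactly against the language definition rather than against held-out samples, which is the point of the benchmark above.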
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.