Generalization Through the Lens of Learning Dynamics
- URL: http://arxiv.org/abs/2212.05377v1
- Date: Sun, 11 Dec 2022 00:07:24 GMT
- Title: Generalization Through the Lens of Learning Dynamics
- Authors: Clare Lyle
- Abstract summary: A machine learning (ML) system must learn to generalize to novel situations in order to yield accurate predictions at deployment.
The impressive generalization performance of deep neural networks has stymied theoreticians.
This thesis will study the learning dynamics of deep neural networks in both supervised and reinforcement learning tasks.
- Score: 11.009483845261958
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A machine learning (ML) system must learn not only to match the output of a
target function on a training set, but also to generalize to novel situations
in order to yield accurate predictions at deployment. In most practical
applications, the user cannot exhaustively enumerate every possible input to
the model; strong generalization performance is therefore crucial to the
development of ML systems which are performant and reliable enough to be
deployed in the real world. While generalization is well-understood
theoretically in a number of hypothesis classes, the impressive generalization
performance of deep neural networks has stymied theoreticians. In deep
reinforcement learning (RL), our understanding of generalization is further
complicated by the conflict between generalization and stability in widely-used
RL algorithms. This thesis will provide insight into generalization by studying
the learning dynamics of deep neural networks in both supervised and
reinforcement learning tasks.
Related papers
- Feature Contamination: Neural Networks Learn Uncorrelated Features and Fail to Generalize [5.642322814965062]
Learning representations that generalize under distribution shifts is critical for building robust machine learning models.
We show that even allowing a neural network to explicitly fit the representations obtained from a teacher network that can generalize out-of-distribution is insufficient for the generalization of the student network.
arXiv Detail & Related papers (2024-06-05T15:04:27Z)
- Gaussian Universality in Neural Network Dynamics with Generalized Structured Input Distributions [2.3020018305241337]
We analyze the behavior of a deep learning system trained on inputs modeled as Gaussian mixtures to better simulate more general structured inputs.
Under certain standardization schemes, the deep learning model converges toward Gaussian setting behavior, even when the input data follow more complex or real-world distributions.
arXiv Detail & Related papers (2024-05-01T17:10:55Z)
- On the Generalization Ability of Unsupervised Pretraining [53.06175754026037]
Recent advances in unsupervised learning have shown that unsupervised pre-training, followed by fine-tuning, can improve model generalization.
This paper introduces a novel theoretical framework that illuminates the critical factor influencing the transferability of knowledge acquired during unsupervised pre-training to the subsequent fine-tuning phase.
Our results contribute to a better understanding of the unsupervised pre-training and fine-tuning paradigm, and can shed light on the design of more effective pre-training algorithms.
arXiv Detail & Related papers (2024-03-11T16:23:42Z)
- Machine Learning vs Deep Learning: The Generalization Problem [0.0]
This study investigates the comparative abilities of traditional machine learning (ML) models and deep learning (DL) algorithms in terms of extrapolation.
We present an empirical analysis where both ML and DL models are trained on an exponentially growing function and then tested on values outside the training domain.
Our findings suggest that deep learning models possess inherent capabilities to generalize beyond the training scope.
arXiv Detail & Related papers (2024-03-03T21:42:55Z)
- A General Framework for Learning from Weak Supervision [93.89870459388185]
This paper introduces a general framework for learning from weak supervision (GLWS) with a novel algorithm.
Central to GLWS is an Expectation-Maximization (EM) formulation, adeptly accommodating various weak supervision sources.
We also present an advanced algorithm that significantly simplifies the EM computational demands.
arXiv Detail & Related papers (2024-02-02T21:48:50Z)
- Neural Networks and the Chomsky Hierarchy [27.470857324448136]
We study whether insights from the theory of computation can predict the limits of neural network generalization in practice.
We show negative results where even extensive amounts of data and training time never led to any non-trivial generalization.
Our results show that, for our subset of tasks, RNNs and Transformers fail to generalize on non-regular tasks, and only networks augmented with structured memory can successfully generalize on context-free and context-sensitive tasks.
arXiv Detail & Related papers (2022-07-05T15:06:11Z)
- Understanding Robust Generalization in Learning Regular Languages [85.95124524975202]
We study robust generalization in the context of using recurrent neural networks to learn regular languages.
We propose a compositional strategy to address this.
We theoretically prove that the compositional strategy generalizes significantly better than the end-to-end strategy.
arXiv Detail & Related papers (2022-02-20T02:50:09Z)
- Deep Active Learning by Leveraging Training Dynamics [57.95155565319465]
We propose a theory-driven deep active learning method (dynamicAL) which selects samples to maximize training dynamics.
We show that dynamicAL not only outperforms other baselines consistently but also scales well on large deep learning models.
arXiv Detail & Related papers (2021-10-16T16:51:05Z)
- Dynamics Generalization via Information Bottleneck in Deep Reinforcement Learning [90.93035276307239]
We propose an information theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents.
We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks.
This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving.
arXiv Detail & Related papers (2020-08-03T02:24:20Z)
- Self-organizing Democratized Learning: Towards Large-scale Distributed Learning Systems [71.14339738190202]
Democratized learning (Dem-AI) lays out a holistic philosophy with underlying principles for building large-scale distributed and democratized machine learning systems.
Inspired by the Dem-AI philosophy, a novel distributed learning approach is proposed in this paper.
The proposed algorithms demonstrate better results in the generalization performance of learning models in agents compared to the conventional FL algorithms.
arXiv Detail & Related papers (2020-07-07T08:34:48Z)