Related papers: Generalization Through the Lens of Learning Dynamics

URL: http://arxiv.org/abs/2212.05377v1
Date: Sun, 11 Dec 2022 00:07:24 GMT
Title: Generalization Through the Lens of Learning Dynamics
Authors: Clare Lyle
Abstract summary: A machine learning (ML) system must learn to generalize to novel situations in order to yield accurate predictions at deployment. The impressive generalization performance of deep neural networks has stymied theoreticians. This thesis will study the learning dynamics of deep neural networks in both supervised and reinforcement learning tasks.
Score: 11.009483845261958
License: http://creativecommons.org/licenses/by/4.0/
Abstract: A machine learning (ML) system must learn not only to match the output of a target function on a training set, but also to generalize to novel situations in order to yield accurate predictions at deployment. In most practical applications, the user cannot exhaustively enumerate every possible input to the model; strong generalization performance is therefore crucial to the development of ML systems which are performant and reliable enough to be deployed in the real world. While generalization is well-understood theoretically in a number of hypothesis classes, the impressive generalization performance of deep neural networks has stymied theoreticians. In deep reinforcement learning (RL), our understanding of generalization is further complicated by the conflict between generalization and stability in widely-used RL algorithms. This thesis will provide insight into generalization by studying the learning dynamics of deep neural networks in both supervised and reinforcement learning tasks.

Related papers

Sample-Efficient Neurosymbolic Deep Reinforcement Learning [49.60927398960061]
We propose a neuro-symbolic Deep RL approach that integrates background symbolic knowledge to improve sample efficiency. Online reasoning is performed to guide the training process through two mechanisms. We show improved performance over a state-of-the-art reward machine baseline.
arXiv Detail & Related papers (2026-01-06T09:28:53Z)
Generalist++: A Meta-learning Framework for Mitigating Trade-off in Adversarial Training [105.74524789405514]
Adversarial training (AT) is currently the most effective defense for neural networks against adversarial examples. We propose to partition the overall generalization goal into multiple sub-tasks, each assigned to a dedicated base learner. In the later stages of training, we interpolate their parameters to form a knowledgeable global learner. We term this framework Generalist and introduce three variants tailored to different application scenarios.
arXiv Detail & Related papers (2025-10-15T09:47:54Z)
Generalizability of Neural Networks Minimizing Empirical Risk Based on Expressive Ability [20.371836553400232]
This paper investigates the generalizability of neural networks that minimize or approximately minimize empirical risk. We provide theoretical insights into several phenomena in deep learning, including robust generalization.
arXiv Detail & Related papers (2025-03-06T05:36:35Z)
Feature Contamination: Neural Networks Learn Uncorrelated Features and Fail to Generalize [5.642322814965062]
Learning representations that generalize under distribution shifts is critical for building robust machine learning models. We show that even allowing a neural network to explicitly fit the representations obtained from a teacher network that can generalize out-of-distribution is insufficient for the generalization of the student network.
arXiv Detail & Related papers (2024-06-05T15:04:27Z)
Gaussian Universality in Neural Network Dynamics with Generalized Structured Input Distributions [2.3020018305241337]
We analyze the behavior of a deep learning system trained on inputs modeled as Gaussian mixtures to better simulate more general structured inputs. Under certain standardization schemes, the deep learning model converges toward Gaussian setting behavior, even when the input data follow more complex or real-world distributions.
arXiv Detail & Related papers (2024-05-01T17:10:55Z)
On the Generalization Ability of Unsupervised Pretraining [53.06175754026037]
Recent advances in unsupervised learning have shown that unsupervised pre-training, followed by fine-tuning, can improve model generalization. This paper introduces a novel theoretical framework that illuminates the critical factor influencing the transferability of knowledge acquired during unsupervised pre-training to the subsequent fine-tuning phase. Our results contribute to a better understanding of the unsupervised pre-training and fine-tuning paradigm, and can shed light on the design of more effective pre-training algorithms.
arXiv Detail & Related papers (2024-03-11T16:23:42Z)
Machine Learning vs Deep Learning: The Generalization Problem [0.0]
This study investigates the comparative abilities of traditional machine learning (ML) models and deep learning (DL) algorithms in terms of extrapolation. We present an empirical analysis where both ML and DL models are trained on an exponentially growing function and then tested on values outside the training domain. Our findings suggest that deep learning models possess inherent capabilities to generalize beyond the training scope.
arXiv Detail & Related papers (2024-03-03T21:42:55Z)
A General Framework for Learning from Weak Supervision [93.89870459388185]
This paper introduces a general framework for learning from weak supervision (GLWS) with a novel algorithm. Central to GLWS is an Expectation-Maximization (EM) formulation, adeptly accommodating various weak supervision sources. We also present an advanced algorithm that significantly simplifies the EM computational demands.
arXiv Detail & Related papers (2024-02-02T21:48:50Z)
Neural Networks and the Chomsky Hierarchy [27.470857324448136]
We study whether insights from the theory of computation can predict the limits of neural network generalization in practice. We show negative results where even extensive amounts of data and training time never led to any non-trivial generalization. Our results show that, for our subset of tasks, RNNs and Transformers fail to generalize on non-regular tasks, and only networks augmented with structured memory can successfully generalize on context-free and context-sensitive tasks.
arXiv Detail & Related papers (2022-07-05T15:06:11Z)
Understanding Robust Generalization in Learning Regular Languages [85.95124524975202]
We study robust generalization in the context of using recurrent neural networks to learn regular languages. We propose a compositional strategy to address this. We theoretically prove that the compositional strategy generalizes significantly better than the end-to-end strategy.
arXiv Detail & Related papers (2022-02-20T02:50:09Z)
Deep Active Learning by Leveraging Training Dynamics [57.95155565319465]
We propose a theory-driven deep active learning method (dynamicAL) which selects samples to maximize training dynamics. We show that dynamicAL not only outperforms other baselines consistently but also scales well on large deep learning models.
arXiv Detail & Related papers (2021-10-16T16:51:05Z)
Dynamics Generalization via Information Bottleneck in Deep Reinforcement Learning [90.93035276307239]
We propose an information theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents. We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks. This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving.
arXiv Detail & Related papers (2020-08-03T02:24:20Z)
Self-organizing Democratized Learning: Towards Large-scale Distributed Learning Systems [71.14339738190202]
Democratized learning (Dem-AI) lays out a holistic philosophy with underlying principles for building large-scale distributed and democratized machine learning systems. Inspired by the Dem-AI philosophy, a novel distributed learning approach is proposed in this paper. The proposed algorithms demonstrate better generalization performance of the agents' learning models than conventional federated learning (FL) algorithms.
arXiv Detail & Related papers (2020-07-07T08:34:48Z)

This list is automatically generated from the titles and abstracts of the papers on this site.