Empirical Tests of Optimization Assumptions in Deep Learning
- URL: http://arxiv.org/abs/2407.01825v1
- Date: Mon, 1 Jul 2024 21:56:54 GMT
- Title: Empirical Tests of Optimization Assumptions in Deep Learning
- Authors: Hoang Tran, Qinzi Zhang, Ashok Cutkosky
- Abstract summary: This paper develops new empirical metrics to track the key quantities that must be controlled in theoretical analysis.
All of our tested assumptions fail to reliably capture optimization performance.
This highlights a need for new empirical verification of analytical assumptions used in theoretical analysis.
- Score: 41.05664717242051
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There is a significant gap between our theoretical understanding of optimization algorithms used in deep learning and their practical performance. Theoretical development usually focuses on proving convergence guarantees under a variety of different assumptions, which are themselves often chosen based on a rough combination of intuitive match to practice and analytical convenience. The theory/practice gap may then arise because of the failure to prove a theorem under such assumptions, or because the assumptions do not reflect reality. In this paper, we carefully measure the degree to which these assumptions are capable of explaining modern optimization algorithms by developing new empirical metrics that closely track the key quantities that must be controlled in theoretical analysis. All of our tested assumptions (including typical modern assumptions based on bounds on the Hessian) fail to reliably capture optimization performance. This highlights a need for new empirical verification of analytical assumptions used in theoretical analysis.
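To make the idea of empirically tracking theoretical assumptions concrete, below is a minimal sketch (not the paper's actual metric suite) of one such check: estimating a local gradient-Lipschitz (smoothness) constant along a training trajectory as L_t ≈ ||∇f(x_{t+1}) − ∇f(x_t)|| / ||x_{t+1} − x_t||. The model, data, and optimizer are illustrative placeholders, not taken from the paper.

```python
# Minimal sketch: empirically estimate a local smoothness (gradient-Lipschitz) constant
# along an SGD trajectory. All components here are placeholders for illustration only.
import torch

torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Linear(20, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1))
loss_fn = torch.nn.MSELoss()
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
X, y = torch.randn(256, 20), torch.randn(256, 1)  # synthetic stand-in data

def flat_grad():
    """Recompute the full-batch gradient and return it flattened into one vector."""
    opt.zero_grad()
    loss_fn(model(X), y).backward()
    return torch.cat([p.grad.reshape(-1) for p in model.parameters()])

def flat_params():
    """Return the current parameters flattened into one vector."""
    return torch.cat([p.detach().reshape(-1) for p in model.parameters()])

prev_x, prev_g = flat_params(), flat_grad()
for step in range(100):
    opt.step()  # uses the gradients populated by the last flat_grad() call
    x, g = flat_params(), flat_grad()
    # Local smoothness estimate between consecutive iterates.
    L_hat = (g - prev_g).norm() / (x - prev_x).norm()
    if step % 20 == 0:
        print(f"step {step:3d}  estimated local L = {L_hat.item():.3f}")
    prev_x, prev_g = x, g
```

Tracking such a quantity over training is one way to see whether a global smoothness bound assumed in an analysis is even approximately respected in practice.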
Related papers
- A Practical Theory of Generalization in Selectivity Learning [8.268822578361824]
Query-driven machine learning models have emerged as a promising estimation technique for query selectivities.
We bridge gaps in state-of-the-art (SOTA) theory based on the Probably Approximately Correct (PAC) learning framework.
We show that selectivity predictors induced by signed measures are learnable, which relaxes the reliance on probability measures in SOTA theory.
arXiv Detail & Related papers (2024-09-11T05:10:32Z) - Uncertainty Regularized Evidential Regression [5.874234972285304]
The Evidential Regression Network (ERN) represents a novel approach that integrates deep learning with Dempster-Shafer's theory.
Specific activation functions must be employed to enforce non-negative values, a constraint that compromises model performance.
This paper provides a theoretical analysis of this limitation and introduces an improvement to overcome it.
arXiv Detail & Related papers (2024-01-03T01:18:18Z) - Advancing Counterfactual Inference through Nonlinear Quantile Regression [77.28323341329461]
We propose a framework for efficient and effective counterfactual inference implemented with neural networks.
The proposed approach enhances the capacity to generalize estimated counterfactual outcomes to unseen data.
Empirical results conducted on multiple datasets offer compelling support for our theoretical assertions.
arXiv Detail & Related papers (2023-06-09T08:30:51Z) - Provable Reward-Agnostic Preference-Based Reinforcement Learning [61.39541986848391]
Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pairwise preference-based feedback over trajectories.
We propose a theoretical reward-agnostic PbRL framework where exploratory trajectories that enable accurate learning of hidden reward functions are acquired.
arXiv Detail & Related papers (2023-05-29T15:00:09Z) - Synergies between Disentanglement and Sparsity: Generalization and Identifiability in Multi-Task Learning [79.83792914684985]
We prove a new identifiability result that provides conditions under which maximally sparse base-predictors yield disentangled representations.
Motivated by this theoretical result, we propose a practical approach to learn disentangled representations based on a sparsity-promoting bi-level optimization problem.
arXiv Detail & Related papers (2022-11-26T21:02:09Z) - Scalable PAC-Bayesian Meta-Learning via the PAC-Optimal Hyper-Posterior: From Theory to Practice [54.03076395748459]
A central question in the meta-learning literature is how to regularize to ensure generalization to unseen tasks.
We present a generalization bound for meta-learning, which was first derived by Rothfuss et al.
We provide a theoretical analysis and an empirical case study of the conditions under which, and the extent to which, these meta-learning guarantees improve upon PAC-Bayesian per-task learning bounds.
arXiv Detail & Related papers (2022-11-14T08:51:04Z) - On the generalization of learning algorithms that do not converge [54.122745736433856]
Generalization analyses of deep learning typically assume that the training converges to a fixed point.
Recent results indicate that in practice, the weights of deep neural networks optimized with gradient descent often oscillate indefinitely.
arXiv Detail & Related papers (2022-08-16T21:22:34Z) - A theoretical and empirical study of new adaptive algorithms with additional momentum steps and shifted updates for stochastic non-convex optimization [0.0]
It is widely thought that adaptive optimization algorithms are a key pillar of the deep learning field.
In this paper we introduce adaptive momentum techniques for different non-smooth objective problems.
arXiv Detail & Related papers (2021-10-16T09:47:57Z) - Can convolutional ResNets approximately preserve input distances? A frequency analysis perspective [31.897568775099558]
We show that the theoretical link between the regularisation scheme used and bi-Lipschitzness is only valid under conditions which do not hold in practice.
We present a simple constructive algorithm to search for counterexamples to the distance preservation condition.
arXiv Detail & Related papers (2021-06-04T13:12:42Z) - Beyond variance reduction: Understanding the true impact of baselines on policy optimization [24.09670734037029]
We show that learning dynamics are governed by the curvature of the loss function and the noise of the gradient estimates.
We present theoretical results showing that, at least for bandit problems, curvature and noise are not sufficient to explain the learning dynamics.
arXiv Detail & Related papers (2020-08-31T17:52:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.