On Training Implicit Meta-Learning With Applications to Inductive
Weighing in Consistency Regularization
- URL: http://arxiv.org/abs/2310.18741v1
- Date: Sat, 28 Oct 2023 15:50:03 GMT
- Title: On Training Implicit Meta-Learning With Applications to Inductive
Weighing in Consistency Regularization
- Authors: Fady Rezk
- Abstract summary: Implicit meta-learning (IML) requires computing $2nd$ order gradients, particularly the Hessian.
Various approximations for the Hessian were proposed but a systematic comparison of their compute cost, stability, generalization of solution found and estimation accuracy were largely overlooked.
We show how training a "Confidence Network" to extract domain specific features can learn to up-weigh useful images and down-weigh out-of-distribution samples.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Meta-learning that uses implicit gradient have provided an exciting
alternative to standard techniques which depend on the trajectory of the inner
loop training. Implicit meta-learning (IML), however, require computing
$2^{nd}$ order gradients, particularly the Hessian which is impractical to
compute for modern deep learning models. Various approximations for the Hessian
were proposed but a systematic comparison of their compute cost, stability,
generalization of solution found and estimation accuracy were largely
overlooked. In this study, we start by conducting a systematic comparative
analysis of the various approximation methods and their effect when
incorporated into IML training routines. We establish situations where
catastrophic forgetting is exhibited in IML and explain their cause in terms of
the inability of the approximations to estimate the curvature at convergence
points. Sources of IML training instability are demonstrated and remedied. A
detailed analysis of the effeciency of various inverse Hessian-vector product
approximation methods is also provided. Subsequently, we use the insights
gained to propose and evaluate a novel semi-supervised learning algorithm that
learns to inductively weigh consistency regularization losses. We show how
training a "Confidence Network" to extract domain specific features can learn
to up-weigh useful images and down-weigh out-of-distribution samples. Results
outperform the baseline FixMatch performance.
Related papers
- Scalable Bayesian Meta-Learning through Generalized Implicit Gradients [64.21628447579772]
Implicit Bayesian meta-learning (iBaML) method broadens the scope of learnable priors, but also quantifies the associated uncertainty.
Analytical error bounds are established to demonstrate the precision and efficiency of the generalized implicit gradient over the explicit one.
arXiv Detail & Related papers (2023-03-31T02:10:30Z) - Variational Linearized Laplace Approximation for Bayesian Deep Learning [11.22428369342346]
We propose a new method for approximating Linearized Laplace Approximation (LLA) using a variational sparse Gaussian Process (GP)
Our method is based on the dual RKHS formulation of GPs and retains, as the predictive mean, the output of the original DNN.
It allows for efficient optimization, which results in sub-linear training time in the size of the training dataset.
arXiv Detail & Related papers (2023-02-24T10:32:30Z) - Adaptive Meta-learner via Gradient Similarity for Few-shot Text
Classification [11.035878821365149]
We propose a novel Adaptive Meta-learner via Gradient Similarity (AMGS) to improve the model generalization ability to a new task.
Experimental results on several benchmarks demonstrate that the proposed AMGS consistently improves few-shot text classification performance.
arXiv Detail & Related papers (2022-09-10T16:14:53Z) - Adaptive neighborhood Metric learning [184.95321334661898]
We propose a novel distance metric learning algorithm, named adaptive neighborhood metric learning (ANML)
ANML can be used to learn both the linear and deep embeddings.
The emphlog-exp mean function proposed in our method gives a new perspective to review the deep metric learning methods.
arXiv Detail & Related papers (2022-01-20T17:26:37Z) - Learning to Estimate Without Bias [57.82628598276623]
Gauss theorem states that the weighted least squares estimator is a linear minimum variance unbiased estimation (MVUE) in linear models.
In this paper, we take a first step towards extending this result to non linear settings via deep learning with bias constraints.
A second motivation to BCE is in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z) - Deep learning: a statistical viewpoint [120.94133818355645]
Deep learning has revealed some major surprises from a theoretical perspective.
In particular, simple gradient methods easily find near-perfect solutions to non-optimal training problems.
We conjecture that specific principles underlie these phenomena.
arXiv Detail & Related papers (2021-03-16T16:26:36Z) - Reintroducing Straight-Through Estimators as Principled Methods for
Stochastic Binary Networks [85.94999581306827]
Training neural networks with binary weights and activations is a challenging problem due to the lack of gradients and difficulty of optimization over discrete weights.
Many successful experimental results have been achieved with empirical straight-through (ST) approaches.
At the same time, ST methods can be truly derived as estimators in the binary network (SBN) model with Bernoulli weights.
arXiv Detail & Related papers (2020-06-11T23:58:18Z) - Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and or binary weights the training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z) - Sparse Methods for Automatic Relevance Determination [0.0]
We first review automatic relevance determination (ARD) and analytically demonstrate the need to additional regularization or thresholding to achieve sparse models.
We then discuss two classes of methods, regularization based and thresholding based, which build on ARD to learn parsimonious solutions to linear problems.
arXiv Detail & Related papers (2020-05-18T14:08:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.