Learning Hyperparameters via a Data-Emphasized Variational Objective
- URL: http://arxiv.org/abs/2502.01861v1
- Date: Mon, 03 Feb 2025 22:19:35 GMT
- Title: Learning Hyperparameters via a Data-Emphasized Variational Objective
- Authors: Ethan Harvey, Mikhail Petrov, Michael C. Hughes
- Abstract summary: Grid search is computationally expensive, requires carving out a validation set, and requires users to specify candidate values.
We propose an alternative: directly learning regularization hyperparameters on the full training set via the evidence lower bound ("ELBo") objective.
We show how our method reduces the 88+ hour grid search of past work to under 3 hours while delivering comparable accuracy.
- Score: 4.453137996095194
- Abstract: When training large flexible models, practitioners often rely on grid search to select hyperparameters that control over-fitting. This grid search has several disadvantages: the search is computationally expensive, requires carving out a validation set that reduces the available data for training, and requires users to specify candidate values. In this paper, we propose an alternative: directly learning regularization hyperparameters on the full training set via the evidence lower bound ("ELBo") objective from variational methods. For deep neural networks with millions of parameters, we recommend a modified ELBo that upweights the influence of the data likelihood relative to the prior. Our proposed technique overcomes all three disadvantages of grid search. In a case study on transfer learning of image classifiers, we show how our method reduces the 88+ hour grid search of past work to under 3 hours while delivering comparable accuracy. We further demonstrate how our approach enables efficient yet accurate approximations of Gaussian processes with learnable length-scale kernels.
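To make the idea concrete, below is a minimal sketch of learning a regularization hyperparameter with a data-emphasized ELBo: a factorized Gaussian variational posterior over the weights, a zero-mean Gaussian prior whose variance is the hyperparameter being learned, and an objective that upweights the likelihood term relative to the KL term by a factor `kappa`. The toy linear model, synthetic data, and the particular value of `kappa` are illustrative assumptions, not the paper's exact recipe.

```python
# Sketch: learn the prior variance (regularization strength) by maximizing
# a data-emphasized ELBo. Everything below is a toy stand-in: a linear
# classifier, synthetic data, and an illustrative emphasis factor kappa.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
D_in, C, N = 20, 3, 1000
P = D_in * C + C                                      # number of weights
X, y = torch.randn(N, D_in), torch.randint(0, C, (N,))

mu = torch.zeros(P, requires_grad=True)               # variational mean
log_sig = torch.full((P,), -3.0, requires_grad=True)  # variational log-std
log_tau = torch.zeros((), requires_grad=True)         # log prior variance (learned hyperparameter)
kappa = 10.0                                          # data-emphasis factor > 1 (assumption)
opt = torch.optim.Adam([mu, log_sig, log_tau], lr=1e-2)

def log_lik(w):                                       # log p(y | x, w), summed over data
    W, b = w[:D_in * C].view(C, D_in), w[D_in * C:]
    return -F.cross_entropy(X @ W.t() + b, y, reduction="sum")

for step in range(500):
    opt.zero_grad()
    sig, tau = log_sig.exp(), log_tau.exp()
    w = mu + sig * torch.randn(P)                     # one reparameterized sample from q
    # KL( N(mu, diag(sig^2)) || N(0, tau I) ) in closed form
    kl = 0.5 * ((sig**2 + mu**2) / tau - 1.0 + log_tau - 2.0 * log_sig).sum()
    de_elbo = kappa * log_lik(w) - kl                 # upweighted likelihood vs. plain ELBo
    (-de_elbo).backward()
    opt.step()

print("learned prior variance:", log_tau.exp().item())
```

Because the prior variance receives gradients directly through the objective, no validation set or candidate grid is needed.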
Related papers
- Learning the Regularization Strength for Deep Fine-Tuning via a Data-Emphasized Variational Objective [4.453137996095194]
Grid search is computationally expensive, requires carving out a validation set, and requires practitioners to specify candidate values.
Our proposed technique overcomes all three disadvantages of grid search.
We demonstrate effectiveness on image classification tasks on several datasets, yielding heldout accuracy comparable to existing approaches.
arXiv Detail & Related papers (2024-10-25T16:32:11Z)
- Improving Hyperparameter Optimization with Checkpointed Model Weights [16.509585437768063]
In this work, we propose an HPO method for neural networks using logged checkpoints of the trained weights.
Our method, Forecasting Model Search (FMS), embeds weights into a Gaussian process deep kernel surrogate model.
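A minimal sketch of the deep-kernel surrogate idea, under the assumption that each logged checkpoint is summarized by a fixed-length feature vector of its weights: a small network embeds the features, an RBF kernel with a learnable length-scale acts on the embeddings, and both are fit by maximizing the exact GP marginal likelihood. FMS's actual feature extraction and kernel design may differ.

```python
# Sketch: a deep-kernel GP surrogate that maps checkpoint weight features
# to a predicted final metric. Feature dimensions, the embedding net, and
# the synthetic data are illustrative assumptions.
import math
import torch
import torch.nn as nn

torch.manual_seed(0)
n, d = 32, 128                               # 32 logged checkpoints, 128-dim features
feats = torch.randn(n, d)                    # e.g., pooled per-layer weight statistics
score = torch.randn(n)                       # final validation metric per run

phi = nn.Sequential(nn.Linear(d, 64), nn.Tanh(), nn.Linear(64, 8))
log_ls = torch.zeros((), requires_grad=True)         # RBF length-scale (learned)
log_noise = torch.tensor(-2.0, requires_grad=True)   # observation noise (learned)
opt = torch.optim.Adam(list(phi.parameters()) + [log_ls, log_noise], lr=1e-2)

def rbf(A, B):                               # kernel on embedded features
    return torch.exp(-0.5 * torch.cdist(A, B).pow(2) / log_ls.exp() ** 2)

for step in range(300):                      # maximize the exact GP marginal likelihood
    opt.zero_grad()
    Z = phi(feats)
    K = rbf(Z, Z) + log_noise.exp() * torch.eye(n)
    L = torch.linalg.cholesky(K)
    alpha = torch.cholesky_solve(score.unsqueeze(1), L)
    log_ml = (-0.5 * score @ alpha.squeeze(1)
              - L.diagonal().log().sum()     # equals -0.5 * logdet(K)
              - 0.5 * n * math.log(2 * math.pi))
    (-log_ml).backward()
    opt.step()

with torch.no_grad():                        # predictive mean for a new checkpoint
    Z = phi(feats)
    L = torch.linalg.cholesky(rbf(Z, Z) + log_noise.exp() * torch.eye(n))
    alpha = torch.cholesky_solve(score.unsqueeze(1), L)
    print(rbf(phi(torch.randn(1, d)), Z) @ alpha)
```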
arXiv Detail & Related papers (2024-06-26T17:59:54Z)
- Just How Flexible are Neural Networks in Practice? [89.80474583606242]
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters.
In practice, however, we only find solutions reachable by the training procedure, including the optimizer and regularizers, which limits flexibility.
arXiv Detail & Related papers (2024-06-17T12:24:45Z)
- Towards Free Data Selection with General-Purpose Models [71.92151210413374]
A desirable data selection algorithm can efficiently choose the most informative samples to maximize the utility of limited annotation budgets.
Current approaches, represented by active learning methods, typically follow a cumbersome pipeline that repeatedly alternates between time-consuming model training and batch data selection.
FreeSel bypasses the heavy batch selection process, achieving a significant efficiency improvement: it is 530x faster than existing active learning methods.
arXiv Detail & Related papers (2023-09-29T15:50:14Z)
- Hyperparameter Optimization through Neural Network Partitioning [11.6941692990626]
We propose a simple and efficient way to optimize hyperparameters in neural networks.
Our method partitions the training data and a neural network model into $K$ data shards and parameter partitions.
We demonstrate that we can apply this objective to optimize a variety of different hyperparameters in a single training run.
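A hedged toy interpretation of optimizing a hyperparameter from partitioned data in a single run (the paper's actual partitioned objective differs): data are split into K shards with one small model per shard; each model takes a gradient step on its own shard, and a shared weight-decay strength receives gradients from the out-of-shard loss by differentiating through that unrolled step.

```python
# Toy sketch: K data shards, one small model per shard, and a shared
# weight-decay strength updated from out-of-shard losses in a single run.
# This unrolled-step scheme is an illustrative stand-in for the paper's
# partitioned objective, not its actual formulation.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
K, N, D, C = 4, 400, 10, 2
X, y = torch.randn(N, D), torch.randint(0, C, (N,))
shards = list(torch.chunk(torch.randperm(N), K))

Ws = [torch.zeros(D, C, requires_grad=True) for _ in range(K)]
log_wd = torch.tensor(-2.0, requires_grad=True)    # shared hyperparameter (learned)
hyper_opt = torch.optim.Adam([log_wd], lr=1e-2)
lr = 0.1

for step in range(200):
    hyper_opt.zero_grad()
    for k in range(K):
        idx_in = shards[k]
        idx_out = torch.cat([shards[j] for j in range(K) if j != k])
        W = Ws[k]
        # Inner step on the in-shard loss plus the learnable weight decay.
        inner = F.cross_entropy(X[idx_in] @ W, y[idx_in]) + log_wd.exp() * W.pow(2).sum()
        (g,) = torch.autograd.grad(inner, W, create_graph=True)
        W_new = W - lr * g
        # Out-of-shard loss after the step: its gradient w.r.t. log_wd
        # flows back through the unrolled update.
        F.cross_entropy(X[idx_out] @ W_new, y[idx_out]).backward()
        Ws[k] = W_new.detach().requires_grad_(True)  # commit the inner step
    hyper_opt.step()

print("learned weight decay:", log_wd.exp().item())
```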
arXiv Detail & Related papers (2023-04-28T11:24:41Z)
- Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z)
- AdaGrid: Adaptive Grid Search for Link Prediction Training Objective [58.79804082133998]
The training objective crucially influences the model's performance and generalization capabilities.
We propose Adaptive Grid Search (AdaGrid) which dynamically adjusts the edge message ratio during training.
We show that AdaGrid can boost the performance of the models by up to $1.9\%$ while being nine times more time-efficient than a complete search.
arXiv Detail & Related papers (2022-03-30T09:24:17Z)
- Invariance Learning in Deep Neural Networks with Differentiable Laplace Approximations [76.82124752950148]
We develop a convenient gradient-based method for selecting the data augmentation.
We use a differentiable Kronecker-factored Laplace approximation to the marginal likelihood as our objective.
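A hedged sketch of the underlying recipe of gradient ascent on a differentiable Laplace approximation to the marginal likelihood; for simplicity it learns a prior precision on a tiny logistic-regression model with an exact Hessian, whereas the paper learns augmentation parameters with a Kronecker-factored approximation.

```python
# Sketch: alternate weight updates with gradient steps on a hyperparameter
# through a differentiable Laplace evidence. Toy assumptions: logistic
# regression, exact Hessian, and a learned prior precision standing in
# for the paper's augmentation parameters.
import torch
import torch.nn.functional as F
from torch.autograd.functional import hessian

torch.manual_seed(0)
N, D = 200, 5
X = torch.randn(N, D)
y = (X[:, 0] > 0).float()

w = torch.zeros(D, requires_grad=True)
log_prec = torch.zeros((), requires_grad=True)     # log prior precision (learned)
w_opt = torch.optim.Adam([w], lr=0.05)
h_opt = torch.optim.Adam([log_prec], lr=0.05)

def neg_log_joint(wv, lp):
    nll = F.binary_cross_entropy_with_logits(X @ wv, y, reduction="sum")
    # Gaussian prior N(0, prec^-1 I), dropping additive constants
    return nll + 0.5 * lp.exp() * wv.pow(2).sum() - 0.5 * D * lp

for step in range(100):
    # (1) move the weights toward the MAP under the current hyperparameter
    w_opt.zero_grad()
    neg_log_joint(w, log_prec.detach()).backward()
    w_opt.step()
    # (2) hyperparameter step on the Laplace evidence at the current weights:
    #     log p(D) ~ -neg_log_joint(w*) - 0.5 * logdet(Hessian), up to constants
    h_opt.zero_grad()
    w_star = w.detach()
    H = hessian(lambda wv: neg_log_joint(wv, log_prec), w_star, create_graph=True)
    evidence = -neg_log_joint(w_star, log_prec) - 0.5 * torch.logdet(H + 1e-4 * torch.eye(D))
    (-evidence).backward()
    h_opt.step()

print("learned prior precision:", log_prec.exp().item())
```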
arXiv Detail & Related papers (2022-02-22T02:51:11Z)
- Training Neural Networks with Fixed Sparse Masks [19.58969772430058]
Recent work has shown that it is possible to update only a small subset of the model's parameters during training.
We show that it is possible to induce a fixed sparse mask on the model's parameters that selects a subset to update over many iterations.
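A minimal sketch of the fixed-sparse-mask idea, with an illustrative toy model and mask budget: parameters are scored once with an approximate (empirical) Fisher diagonal, the top fraction is kept, and gradients of all other parameters are zeroed for the rest of training. The scoring rule and the 10% budget here are assumptions.

```python
# Sketch: pick a fixed sparse mask once via an approximate Fisher score,
# then train while zeroing gradients outside the mask. The toy model,
# the empirical-Fisher scoring, and the 10% budget are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 3))
X, y = torch.randn(256, 20), torch.randint(0, 3, (256,))

# 1) Score parameters with squared gradients (empirical Fisher diagonal).
scores = [torch.zeros_like(p) for p in model.parameters()]
for i in range(0, 256, 32):
    model.zero_grad()
    F.cross_entropy(model(X[i:i + 32]), y[i:i + 32]).backward()
    for s, p in zip(scores, model.parameters()):
        s += p.grad.pow(2)

# 2) Keep the top 10% of parameters; the mask stays fixed for all of training.
flat = torch.cat([s.flatten() for s in scores])
thresh = flat.kthvalue(int(0.9 * flat.numel())).values
masks = [(s > thresh).float() for s in scores]

# 3) Train, masking gradients so only the selected subset is ever updated.
opt = torch.optim.SGD(model.parameters(), lr=0.1)
for step in range(100):
    opt.zero_grad()
    F.cross_entropy(model(X), y).backward()
    for m, p in zip(masks, model.parameters()):
        p.grad.mul_(m)
    opt.step()
```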
arXiv Detail & Related papers (2021-11-18T18:06:01Z)
- How much progress have we made in neural network training? A New Evaluation Protocol for Benchmarking Optimizers [86.36020260204302]
We propose a new benchmarking protocol to evaluate both end-to-end efficiency and data-addition training efficiency.
A human study is conducted to show that our evaluation protocol matches human tuning behavior better than random search.
We then apply the proposed benchmarking framework to 7 optimizers and various tasks, including computer vision, natural language processing, reinforcement learning, and graph mining.
arXiv Detail & Related papers (2020-10-19T21:46:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.