Learning to Rank Learning Curves
- URL: http://arxiv.org/abs/2006.03361v1
- Date: Fri, 5 Jun 2020 10:49:52 GMT
- Title: Learning to Rank Learning Curves
- Authors: Martin Wistuba and Tejaswini Pedapati
- Abstract summary: We present a new method that saves computational budget by terminating poor configurations early on in the training.
We show that our model is able to effectively rank learning curves without having to observe many or very long learning curves.
- Score: 15.976034696758148
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many automated machine learning methods, such as those for hyperparameter and
neural architecture optimization, are computationally expensive because they
involve training many different model configurations. In this work, we present
a new method that saves computational budget by terminating poor configurations
early on in the training. In contrast to existing methods, we consider this
task as a ranking and transfer learning problem. We qualitatively show that by
optimizing a pairwise ranking loss and leveraging learning curves from other
datasets, our model is able to effectively rank learning curves without having
to observe many or very long learning curves. We further demonstrate that our
method can be used to accelerate a neural architecture search by a factor of up
to 100 without a significant performance degradation of the discovered
architecture. In further experiments we analyze the quality of ranking, the
influence of different model components as well as the predictive behavior of
the model.
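The abstract frames early stopping as a pairwise ranking problem over partial learning curves, with transfer from curves observed on other datasets. As a rough illustration of that idea only (not the authors' actual architecture), the sketch below scores learning-curve prefixes with a small recurrent encoder and trains it with a logistic pairwise ranking loss; the encoder, tensor shapes, and loss formulation are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class CurveRanker(nn.Module):
    """Scores a partial learning curve; a higher score means the configuration
    is expected to reach a better final accuracy.

    Illustrative stand-in, not the model from the paper: an LSTM reads the
    observed validation accuracies of the first few epochs and a linear head
    emits a single ranking score.
    """

    def __init__(self, hidden_size: int = 32):
        super().__init__()
        self.rnn = nn.LSTM(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, curves: torch.Tensor) -> torch.Tensor:
        # curves: (batch, observed_epochs, 1) validation accuracies seen so far
        _, (h, _) = self.rnn(curves)
        return self.head(h[-1]).squeeze(-1)  # (batch,) ranking scores


def pairwise_ranking_loss(score_a, score_b, a_better_than_b):
    """Logistic pairwise loss: pushes score_a above score_b when curve A
    ultimately outperforms curve B (label 1), and vice versa (label 0)."""
    return nn.functional.binary_cross_entropy_with_logits(
        score_a - score_b, a_better_than_b.float()
    )


# Toy usage: two partial curves where curve A is known (e.g. from finished
# runs on other datasets) to end up better than curve B.
model = CurveRanker()
curve_a = torch.tensor([[[0.52], [0.61], [0.66]]])  # 3 observed epochs
curve_b = torch.tensor([[[0.50], [0.55], [0.57]]])
loss = pairwise_ranking_loss(model(curve_a), model(curve_b), torch.tensor([1]))
loss.backward()
```

In a search procedure, configurations whose partial curves are ranked below enough already-observed incumbents could then be terminated early, which is the source of the computational savings the abstract describes.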
Related papers
- Optimizing Dense Feed-Forward Neural Networks [0.0]
We propose a novel feed-forward neural network construction method based on pruning and transfer learning.
Our approach can reduce the number of parameters by more than 70%.
We also evaluate the level of transfer learning by comparing the refined model with the original network trained from scratch.
arXiv Detail & Related papers (2023-12-16T23:23:16Z)
- Accelerating Multiframe Blind Deconvolution via Deep Learning [0.0]
Ground-based solar image restoration is a computationally expensive procedure.
We propose a new method to accelerate the restoration based on algorithm unrolling.
We show that both methods significantly reduce the restoration time compared to the standard optimization procedure.
arXiv Detail & Related papers (2023-06-21T07:53:00Z)
- Learning Large-scale Neural Fields via Context Pruned Meta-Learning [60.93679437452872]
We introduce an efficient optimization-based meta-learning technique for large-scale neural field training.
We show how gradient re-scaling at meta-test time allows the learning of extremely high-quality neural fields.
Our framework is model-agnostic, intuitive, straightforward to implement, and shows significant reconstruction improvements for a wide range of signals.
arXiv Detail & Related papers (2023-02-01T17:32:16Z)
- Towards Robust Dataset Learning [90.2590325441068]
We propose a principled, tri-level optimization to formulate the robust dataset learning problem.
Under an abstraction model that characterizes robust vs. non-robust features, the proposed method provably learns a robust dataset.
arXiv Detail & Related papers (2022-11-19T17:06:10Z)
- Gradient-Based Trajectory Optimization With Learned Dynamics [80.41791191022139]
We use machine learning techniques to learn a differentiable dynamics model of the system from data.
We show that a neural network can model highly nonlinear behaviors accurately for large time horizons.
In our hardware experiments, we demonstrate that our learned model can represent complex dynamics for both the Spot robot and a radio-controlled (RC) car.
arXiv Detail & Related papers (2022-04-09T22:07:34Z)
- Training Efficiency and Robustness in Deep Learning [2.6451769337566406]
We study approaches to improve the training efficiency and robustness of deep learning models.
We find that prioritizing learning on more informative training data increases convergence speed and improves generalization performance on test data.
We show that a redundancy-aware modification to the sampling of training data improves training speed, and we develop an efficient method for detecting the diversity of the training signal.
arXiv Detail & Related papers (2021-12-02T17:11:33Z)
- Learning from Mistakes based on Class Weighting with Application to Neural Architecture Search [12.317568257671427]
We propose a simple and effective multi-level optimization framework called learning from mistakes (LFM).
The primary objective is to train a model to perform effectively on target tasks by using a re-weighting technique to prevent similar mistakes in the future.
In this formulation, the class weights are learned by minimizing the validation loss of the model, and the model is re-trained on real data together with synthetic data from the image generator, weighted by class-wise performance.
arXiv Detail & Related papers (2021-12-01T04:56:49Z)
- Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z)
- Interleaving Learning, with Application to Neural Architecture Search [12.317568257671427]
We propose a novel machine learning framework referred to as interleaving learning (IL).
In our framework, a set of models collaboratively learn a data encoder in an interleaving fashion.
We apply interleaving learning to search neural architectures for image classification on CIFAR-10, CIFAR-100, and ImageNet.
arXiv Detail & Related papers (2021-03-12T00:54:22Z)
- Model-Based Deep Learning [155.063817656602]
Signal processing, communications, and control have traditionally relied on classical statistical modeling techniques.
Deep neural networks (DNNs) use generic architectures which learn to operate from data, and demonstrate excellent performance.
We are interested in hybrid techniques that combine principled mathematical models with data-driven systems to benefit from the advantages of both approaches.
arXiv Detail & Related papers (2020-12-15T16:29:49Z)
- Multi-fidelity Neural Architecture Search with Knowledge Distillation [69.09782590880367]
We propose a Bayesian multi-fidelity method for neural architecture search: MF-KD.
Knowledge distillation adds to the loss function a term forcing a network to mimic some teacher network; a minimal sketch of such a loss appears after this list.
We show that training for a few epochs with such a modified loss function leads to a better selection of neural architectures than training for a few epochs with a logistic loss.
arXiv Detail & Related papers (2020-06-15T12:32:38Z)
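The MF-KD entry above mentions adding a distillation term to the loss so that a candidate network mimics a teacher. As a hedged illustration, the sketch below combines a cross-entropy term with the common softened-softmax KL distillation term; the temperature, weighting factor, and exact form are assumptions and not necessarily the term used by MF-KD.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 4.0, alpha: float = 0.5):
    """Cross-entropy on the true labels plus a term pushing the student's
    softened predictions towards the teacher's (standard knowledge
    distillation; the exact term used by MF-KD may differ)."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return (1 - alpha) * hard + alpha * soft


# Toy usage with random logits for a batch of 8 examples and 10 classes.
student = torch.randn(8, 10, requires_grad=True)
teacher = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
distillation_loss(student, teacher, labels).backward()
```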
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.