Training Over-parameterized Models with Non-decomposable Objectives
- URL: http://arxiv.org/abs/2107.04641v1
- Date: Fri, 9 Jul 2021 19:29:33 GMT
- Title: Training Over-parameterized Models with Non-decomposable Objectives
- Authors: Harikrishna Narasimhan, Aditya Krishna Menon
- Abstract summary: We propose new cost-sensitive losses that extend the classical idea of logit adjustment to handle more general cost matrices.
Our losses are calibrated, and can be further improved with distilled labels from a teacher model.
- Score: 46.62273918807789
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many modern machine learning applications come with complex and nuanced
design goals such as minimizing the worst-case error, satisfying a given
precision or recall target, or enforcing group-fairness constraints. Popular
techniques for optimizing such non-decomposable objectives reduce the problem
into a sequence of cost-sensitive learning tasks, each of which is then solved
by re-weighting the training loss with example-specific costs. We point out
that the standard approach of re-weighting the loss to incorporate label costs
can produce unsatisfactory results when used to train over-parameterized
models. As a remedy, we propose new cost-sensitive losses that extend the
classical idea of logit adjustment to handle more general cost matrices. Our
losses are calibrated, and can be further improved with distilled labels from a
teacher model. Through experiments on benchmark image datasets, we showcase the
effectiveness of our approach in training ResNet models with common robust and
constrained optimization objectives.
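As a rough illustration of the underlying recipe, the sketch below applies the classical logit-adjustment idea that the proposed losses build on: per-class offsets are added to the logits before the softmax cross-entropy. The prior-based offsets and names such as `class_priors` and `tau` are illustrative assumptions for a long-tail setting; the paper generalizes the offsets to arbitrary cost matrices, so this is a sketch of the starting point rather than the authors' loss.
```python
# Minimal sketch of a logit-adjusted loss (illustrative only): logits are
# shifted by per-class offsets before the softmax cross-entropy. Prior-based
# offsets correspond to the classical long-tail recipe; the paper's losses
# generalize the offsets to arbitrary cost matrices.
import torch
import torch.nn.functional as F


def logit_adjusted_loss(logits, labels, class_priors, tau=1.0):
    """Cross-entropy on logits shifted by tau * log(prior) per class.

    logits:       (batch, num_classes) raw model outputs
    labels:       (batch,) integer class labels
    class_priors: (num_classes,) empirical class frequencies
    """
    offsets = tau * torch.log(class_priors + 1e-12)   # per-class additive adjustment
    return F.cross_entropy(logits + offsets, labels)  # offsets broadcast over the batch


# Hypothetical usage on a 3-class, long-tailed problem.
logits = torch.randn(8, 3, requires_grad=True)
labels = torch.randint(0, 3, (8,))
priors = torch.tensor([0.7, 0.2, 0.1])
logit_adjusted_loss(logits, labels, priors).backward()
```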
Related papers
- EsaCL: Efficient Continual Learning of Sparse Models [10.227171407348326]
A key challenge in the continual learning setting is to efficiently learn a sequence of tasks without forgetting how to perform previously learned tasks.
We propose a new method for efficient continual learning of sparse models (EsaCL) that can automatically prune redundant parameters without adversely impacting the model's predictive power.
arXiv Detail & Related papers (2024-01-11T04:59:44Z)
- Cost-Effective Retraining of Machine Learning Models [2.9461360639852914]
It is important to retrain a machine learning (ML) model in order to maintain its performance as the data changes over time.
This creates a trade-off between retraining too frequently, which incurs unnecessary computing costs, and not retraining often enough, which lets the model grow stale and lose accuracy.
We propose ML systems that make automated and cost-effective decisions about when to retrain an ML model.
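One way to picture such a decision rule, purely as a hypothetical sketch and not the system proposed in the paper, is to retrain only when the estimated cost of serving a stale model outweighs the cost of retraining; all quantities below are assumed inputs.
```python
# Hypothetical, simplified retraining trigger (not the paper's method):
# retrain once the estimated cost of errors caused by model staleness
# exceeds the fixed cost of retraining.
def should_retrain(error_rate_now: float,
                   error_rate_at_last_train: float,
                   queries_since_last_train: int,
                   cost_per_error: float,
                   retrain_cost: float) -> bool:
    # Extra errors attributable to staleness since the last retrain.
    extra_errors = max(error_rate_now - error_rate_at_last_train, 0.0) * queries_since_last_train
    return extra_errors * cost_per_error > retrain_cost


# Example: drift pushed the error rate from 5% to 9% over 10,000 queries.
print(should_retrain(0.09, 0.05, 10_000, cost_per_error=0.5, retrain_cost=150.0))  # True
```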
arXiv Detail & Related papers (2023-10-06T13:02:29Z)
- Gradient constrained sharpness-aware prompt learning for vision-language models [99.74832984957025]
This paper targets a novel trade-off problem in generalizable prompt learning for vision-language models (VLMs).
By analyzing the loss landscapes of the state-of-the-art method and vanilla Sharpness-aware Minimization (SAM) based method, we conclude that the trade-off performance correlates to both loss value and loss sharpness.
We propose a novel SAM-based method for prompt learning, denoted Gradient Constrained Sharpness-aware Context Optimization (GCSCoOp).
arXiv Detail & Related papers (2023-09-14T17:13:54Z)
- Optimizing Data Collection for Machine Learning [87.37252958806856]
Modern deep learning systems require huge data sets to achieve impressive performance.
Over-collecting data incurs unnecessary present costs, while under-collecting may incur future costs and delay.
We propose a new paradigm for modeling the data collection as a formal optimal data collection problem.
arXiv Detail & Related papers (2022-10-03T21:19:05Z)
- Rethinking Cost-sensitive Classification in Deep Learning via Adversarial Data Augmentation [4.479834103607382]
Cost-sensitive classification is critical in applications where misclassification errors widely vary in cost.
This paper proposes a cost-sensitive adversarial data augmentation framework to make over-parameterized models cost-sensitive.
Our method can effectively minimize the overall cost and reduce critical errors, while achieving comparable performance in terms of overall accuracy.
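As a generic, hedged illustration of the idea (an FGSM-style one-step perturbation, not necessarily the framework in the paper), one can perturb inputs in the direction that increases a cost-weighted loss and train on the perturbed examples; `cost_matrix` here is an assumed (C, C) tensor of misclassification costs.
```python
# Generic FGSM-style sketch of cost-sensitive adversarial augmentation (an
# illustration of the idea, not necessarily the paper's framework): inputs are
# perturbed in the direction that increases a cost-weighted loss, so training
# on them penalizes high-cost mistakes more heavily.
import torch
import torch.nn.functional as F


def cost_weighted_loss(logits, labels, cost_matrix):
    # Expected misclassification cost under the model's softmax distribution.
    probs = F.softmax(logits, dim=1)   # (batch, C)
    costs = cost_matrix[labels]        # (batch, C): cost row for each true label
    return (probs * costs).sum(dim=1).mean()


def cost_sensitive_augment(model, x, y, cost_matrix, eps=0.03):
    x_adv = x.clone().detach().requires_grad_(True)
    cost_weighted_loss(model(x_adv), y, cost_matrix).backward()
    # One-step perturbation toward higher cost-weighted loss.
    return (x_adv + eps * x_adv.grad.sign()).detach()
```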
arXiv Detail & Related papers (2022-08-24T19:00:30Z)
- Controlled Sparsity via Constrained Optimization or: How I Learned to Stop Tuning Penalties and Love Constraints [81.46143788046892]
We focus on the task of controlling the level of sparsity when performing sparse learning.
Existing methods based on sparsity-inducing penalties involve expensive trial-and-error tuning of the penalty factor.
We propose a constrained formulation where sparsification is guided by the training objective and the desired sparsity target in an end-to-end fashion.
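A generic sketch of this constrained recipe is shown below: a Lagrange multiplier penalizes violation of the sparsity target and is updated by dual ascent, replacing manual penalty tuning. The differentiable density proxy (mean absolute weight) is an assumption for illustration; the paper's exact constraint and sparsification mechanism may differ.
```python
# Generic sketch of sparsity control via a constrained formulation: a Lagrange
# multiplier penalizes violation of a target and is updated by dual ascent,
# instead of hand-tuning a fixed penalty factor. The mean-absolute-weight
# density proxy is an assumption for illustration.
import torch


def constrained_sparsity_loss(model, data_loss, density_target, lmbda, lmbda_lr=0.01):
    weights = torch.cat([p.flatten() for p in model.parameters()])
    density = weights.abs().mean()                 # differentiable density proxy
    violation = density - density_target           # > 0 while the model is too dense
    loss = data_loss + lmbda * violation           # Lagrangian: minimized over weights
    # Dual ascent: the multiplier grows while the constraint is violated.
    new_lmbda = max(lmbda + lmbda_lr * violation.item(), 0.0)
    return loss, new_lmbda
```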
arXiv Detail & Related papers (2022-08-08T21:24:20Z)
- Sharpness-Aware Minimization for Efficiently Improving Generalization [36.87818971067698]
We introduce a novel, effective procedure for simultaneously minimizing loss value and loss sharpness.
Sharpness-Aware Minimization (SAM) seeks parameters that lie in neighborhoods having uniformly low loss.
We present empirical results showing that SAM improves model generalization across a variety of benchmark datasets.
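For reference, a minimal two-step sketch of the SAM update is given below (simplified; the official implementation wraps a base optimizer and adds details such as optional per-layer scaling): first perturb the weights toward higher loss within a radius-rho ball, then apply the gradient taken at the perturbed point to the original weights.
```python
# Minimal two-step SAM sketch (simplified relative to the full method).
import torch


def sam_step(model, loss_fn, x, y, base_optimizer, rho=0.05):
    params = [p for p in model.parameters() if p.requires_grad]
    # 1) Ascent: perturb weights toward higher loss within an L2 ball of radius rho.
    grads = torch.autograd.grad(loss_fn(model(x), y), params)
    grad_norm = torch.norm(torch.stack([g.norm() for g in grads])) + 1e-12
    eps = [rho * g / grad_norm for g in grads]
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.add_(e)
    # 2) Descent: gradient at the perturbed point, applied to the original weights.
    base_optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)                              # restore the original weights
    base_optimizer.step()
```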
arXiv Detail & Related papers (2020-10-03T19:02:10Z)
- Automatically Learning Compact Quality-aware Surrogates for Optimization Problems [55.94450542785096]
Solving optimization problems with unknown parameters requires learning a predictive model to predict the values of the unknown parameters and then solving the problem using these values.
Recent work has shown that including the optimization problem as a layer in the model training pipeline results in predictions of the unobserved parameters that lead to higher decision quality.
We show that we can improve solution quality by learning a low-dimensional surrogate model of a large optimization problem.
arXiv Detail & Related papers (2020-06-18T19:11:54Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of such extrapolation-based variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
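As a rough sketch of the extrapolation (extragradient) idea such schemes are built around, and not the paper's exact method, the step below first takes a lookahead step and then updates the original iterate with the gradient evaluated at the lookahead point.
```python
# Rough extrapolation (extragradient) sketch: take a lookahead step, then
# update the original iterate with the gradient taken at the lookahead point.
# A simplification of the schemes unified in the paper.
import torch


def extragradient_step(params, loss_fn, lr=0.1):
    grads = torch.autograd.grad(loss_fn(params), params)
    lookahead = [p - lr * g for p, g in zip(params, grads)]      # extrapolated point
    grads_ahead = torch.autograd.grad(loss_fn(lookahead), lookahead)
    with torch.no_grad():
        for p, g in zip(params, grads_ahead):
            p.sub_(lr * g)                                       # update the original iterate


# Hypothetical usage on a toy quadratic objective.
w = [torch.randn(3, requires_grad=True)]
for _ in range(100):
    extragradient_step(w, lambda ps: (ps[0] ** 2).sum())
```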
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.