Training Over-parameterized Models with Non-decomposable Objectives
- URL: http://arxiv.org/abs/2107.04641v1
- Date: Fri, 9 Jul 2021 19:29:33 GMT
- Title: Training Over-parameterized Models with Non-decomposable Objectives
- Authors: Harikrishna Narasimhan, Aditya Krishna Menon
- Abstract summary: We propose new cost-sensitive losses that extend the classical idea of logit adjustment to handle more general cost matrices.
Our losses are calibrated, and can be further improved with distilled labels from a teacher model.
- Score: 46.62273918807789
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many modern machine learning applications come with complex and nuanced
design goals such as minimizing the worst-case error, satisfying a given
precision or recall target, or enforcing group-fairness constraints. Popular
techniques for optimizing such non-decomposable objectives reduce the problem
into a sequence of cost-sensitive learning tasks, each of which is then solved
by re-weighting the training loss with example-specific costs. We point out
that the standard approach of re-weighting the loss to incorporate label costs
can produce unsatisfactory results when used to train over-parameterized
models. As a remedy, we propose new cost-sensitive losses that extend the
classical idea of logit adjustment to handle more general cost matrices. Our
losses are calibrated, and can be further improved with distilled labels from a
teacher model. Through experiments on benchmark image datasets, we showcase the
effectiveness of our approach in training ResNet models with common robust and
constrained optimization objectives.
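As a rough illustration of the underlying recipe, the sketch below applies the classical logit-adjustment idea that the proposed losses build on: per-class offsets are added to the logits before the softmax cross-entropy. The prior-based offsets and names such as `class_priors` and `tau` are illustrative assumptions for a long-tail setting; the paper generalizes the offsets to arbitrary cost matrices, so this is a sketch of the starting point rather than the authors' loss.
```python
# Minimal sketch of a logit-adjusted loss (illustrative only): logits are
# shifted by per-class offsets before the softmax cross-entropy. Prior-based
# offsets correspond to the classical long-tail recipe; the paper's losses
# generalize the offsets to arbitrary cost matrices.
import torch
import torch.nn.functional as F


def logit_adjusted_loss(logits, labels, class_priors, tau=1.0):
    """Cross-entropy on logits shifted by tau * log(prior) per class.

    logits:       (batch, num_classes) raw model outputs
    labels:       (batch,) integer class labels
    class_priors: (num_classes,) empirical class frequencies
    """
    offsets = tau * torch.log(class_priors + 1e-12)   # per-class additive adjustment
    return F.cross_entropy(logits + offsets, labels)  # offsets broadcast over the batch


# Hypothetical usage on a 3-class, long-tailed problem.
logits = torch.randn(8, 3, requires_grad=True)
labels = torch.randint(0, 3, (8,))
priors = torch.tensor([0.7, 0.2, 0.1])
logit_adjusted_loss(logits, labels, priors).backward()
```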
Related papers
- EsaCL: Efficient Continual Learning of Sparse Models [10.227171407348326]
A key challenge in the continual learning setting is to efficiently learn a sequence of tasks without forgetting how to perform previously learned tasks.
We propose a new method for efficient continual learning of sparse models (EsaCL) that can automatically prune redundant parameters without adversely impacting the model's predictive power.
arXiv Detail & Related papers (2024-01-11T04:59:44Z)
- Cost-Effective Retraining of Machine Learning Models [2.9461360639852914]
It is important to retrain a machine learning (ML) model in order to maintain its performance as the data changes over time.
This creates a trade-off between retraining too frequently, which incurs unnecessary computing costs, and not retraining often enough, which lets the model grow stale and lose accuracy.
We propose ML systems that make automated and cost-effective decisions about when to retrain an ML model.
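One way to picture such a decision rule, purely as a hypothetical sketch and not the system proposed in the paper, is to retrain only when the estimated cost of serving a stale model outweighs the cost of retraining; all quantities below are assumed inputs.
```python
# Hypothetical, simplified retraining trigger (not the paper's method):
# retrain once the estimated cost of errors caused by model staleness
# exceeds the fixed cost of retraining.
def should_retrain(error_rate_now: float,
                   error_rate_at_last_train: float,
                   queries_since_last_train: int,
                   cost_per_error: float,
                   retrain_cost: float) -> bool:
    # Extra errors attributable to staleness since the last retrain.
    extra_errors = max(error_rate_now - error_rate_at_last_train, 0.0) * queries_since_last_train
    return extra_errors * cost_per_error > retrain_cost


# Example: drift pushed the error rate from 5% to 9% over 10,000 queries.
print(should_retrain(0.09, 0.05, 10_000, cost_per_error=0.5, retrain_cost=150.0))  # True
```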
arXiv Detail & Related papers (2023-10-06T13:02:29Z)
- Gradient constrained sharpness-aware prompt learning for vision-language models [99.74832984957025]
This paper targets a novel trade-off problem in generalizable prompt learning for vision-language models (VLMs).
By analyzing the loss landscapes of the state-of-the-art method and vanilla Sharpness-aware Minimization (SAM) based method, we conclude that the trade-off performance correlates to both loss value and loss sharpness.
We propose a novel SAM-based method for prompt learning, denoted Gradient Constrained Sharpness-aware Context Optimization (GCSCoOp).
arXiv Detail & Related papers (2023-09-14T17:13:54Z)
- Optimizing Data Collection for Machine Learning [87.37252958806856]
Modern deep learning systems require huge data sets to achieve impressive performance.
Over-collecting data incurs unnecessary present costs, while under-collecting may incur future costs and delay.
We propose a new paradigm for modeling the data collection as a formal optimal data collection problem.
arXiv Detail & Related papers (2022-10-03T21:19:05Z)
- Rethinking Cost-sensitive Classification in Deep Learning via Adversarial Data Augmentation [4.479834103607382]
Cost-sensitive classification is critical in applications where misclassification errors widely vary in cost.
This paper proposes a cost-sensitive adversarial data augmentation framework to make over-parameterized models cost-sensitive.
Our method can effectively minimize the overall cost and reduce critical errors, while achieving comparable performance in terms of overall accuracy.
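As a generic, hedged illustration of the idea (an FGSM-style one-step perturbation, not necessarily the framework in the paper), one can perturb inputs in the direction that increases a cost-weighted loss and train on the perturbed examples; `cost_matrix` here is an assumed (C, C) tensor of misclassification costs.
```python
# Generic FGSM-style sketch of cost-sensitive adversarial augmentation (an
# illustration of the idea, not necessarily the paper's framework): inputs are
# perturbed in the direction that increases a cost-weighted loss, so training
# on them penalizes high-cost mistakes more heavily.
import torch
import torch.nn.functional as F


def cost_weighted_loss(logits, labels, cost_matrix):
    # Expected misclassification cost under the model's softmax distribution.
    probs = F.softmax(logits, dim=1)   # (batch, C)
    costs = cost_matrix[labels]        # (batch, C): cost row for each true label
    return (probs * costs).sum(dim=1).mean()


def cost_sensitive_augment(model, x, y, cost_matrix, eps=0.03):
    x_adv = x.clone().detach().requires_grad_(True)
    cost_weighted_loss(model(x_adv), y, cost_matrix).backward()
    # One-step perturbation toward higher cost-weighted loss.
    return (x_adv + eps * x_adv.grad.sign()).detach()
```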
arXiv Detail & Related papers (2022-08-24T19:00:30Z)
- Controlled Sparsity via Constrained Optimization or: How I Learned to Stop Tuning Penalties and Love Constraints [81.46143788046892]
We focus on the task of controlling the level of sparsity when performing sparse learning.
Existing methods based on sparsity-inducing penalties involve expensive trial-and-error tuning of the penalty factor.
We propose a constrained formulation where sparsification is guided by the training objective and the desired sparsity target in an end-to-end fashion.
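A generic sketch of this constrained recipe is shown below: a Lagrange multiplier penalizes violation of the sparsity target and is updated by dual ascent, replacing manual penalty tuning. The differentiable density proxy (mean absolute weight) is an assumption for illustration; the paper's exact constraint and sparsification mechanism may differ.
```python
# Generic sketch of sparsity control via a constrained formulation: a Lagrange
# multiplier penalizes violation of a target and is updated by dual ascent,
# instead of hand-tuning a fixed penalty factor. The mean-absolute-weight
# density proxy is an assumption for illustration.
import torch


def constrained_sparsity_loss(model, data_loss, density_target, lmbda, lmbda_lr=0.01):
    weights = torch.cat([p.flatten() for p in model.parameters()])
    density = weights.abs().mean()                 # differentiable density proxy
    violation = density - density_target           # > 0 while the model is too dense
    loss = data_loss + lmbda * violation           # Lagrangian: minimized over weights
    # Dual ascent: the multiplier grows while the constraint is violated.
    new_lmbda = max(lmbda + lmbda_lr * violation.item(), 0.0)
    return loss, new_lmbda
```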
arXiv Detail & Related papers (2022-08-08T21:24:20Z)
- Sharpness-Aware Minimization for Efficiently Improving Generalization [36.87818971067698]
We introduce a novel, effective procedure for simultaneously minimizing loss value and loss sharpness.
Sharpness-Aware Minimization (SAM) seeks parameters that lie in neighborhoods having uniformly low loss.
We present empirical results showing that SAM improves model generalization across a variety of benchmark datasets.
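For reference, a minimal two-step sketch of the SAM update is given below (simplified; the official implementation wraps a base optimizer and adds details such as optional per-layer scaling): first perturb the weights toward higher loss within a radius-rho ball, then apply the gradient taken at the perturbed point to the original weights.
```python
# Minimal two-step SAM sketch (simplified relative to the full method).
import torch


def sam_step(model, loss_fn, x, y, base_optimizer, rho=0.05):
    params = [p for p in model.parameters() if p.requires_grad]
    # 1) Ascent: perturb weights toward higher loss within an L2 ball of radius rho.
    grads = torch.autograd.grad(loss_fn(model(x), y), params)
    grad_norm = torch.norm(torch.stack([g.norm() for g in grads])) + 1e-12
    eps = [rho * g / grad_norm for g in grads]
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.add_(e)
    # 2) Descent: gradient at the perturbed point, applied to the original weights.
    base_optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)                              # restore the original weights
    base_optimizer.step()
```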
arXiv Detail & Related papers (2020-10-03T19:02:10Z)
- Automatically Learning Compact Quality-aware Surrogates for Optimization Problems [55.94450542785096]
Solving optimization problems with unknown parameters requires learning a predictive model to predict the values of the unknown parameters and then solving the problem using these values.
Recent work has shown that including the optimization problem as a layer in the model training pipeline results in predictions of the unobserved parameters that lead to higher decision quality.
We show that we can improve solution quality by learning a low-dimensional surrogate model of a large optimization problem.
arXiv Detail & Related papers (2020-06-18T19:11:54Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of such extrapolation-based variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
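As a rough sketch of the extrapolation (extragradient) idea such schemes are built around, and not the paper's exact method, the step below first takes a lookahead step and then updates the original iterate with the gradient evaluated at the lookahead point.
```python
# Rough extrapolation (extragradient) sketch: take a lookahead step, then
# update the original iterate with the gradient taken at the lookahead point.
# A simplification of the schemes unified in the paper.
import torch


def extragradient_step(params, loss_fn, lr=0.1):
    grads = torch.autograd.grad(loss_fn(params), params)
    lookahead = [p - lr * g for p, g in zip(params, grads)]      # extrapolated point
    grads_ahead = torch.autograd.grad(loss_fn(lookahead), lookahead)
    with torch.no_grad():
        for p, g in zip(params, grads_ahead):
            p.sub_(lr * g)                                       # update the original iterate


# Hypothetical usage on a toy quadratic objective.
w = [torch.randn(3, requires_grad=True)]
for _ in range(100):
    extragradient_step(w, lambda ps: (ps[0] ** 2).sum())
```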
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.