Relational Surrogate Loss Learning
- URL: http://arxiv.org/abs/2202.13197v1
- Date: Sat, 26 Feb 2022 17:32:57 GMT
- Title: Relational Surrogate Loss Learning
- Authors: Tao Huang, Zekang Li, Hua Lu, Yong Shan, Shusheng Yang, Yang Feng, Fei
Wang, Shan You, Chang Xu
- Abstract summary: This paper revisits surrogate loss learning, where a deep neural network is employed to approximate the evaluation metrics.
In this paper, we show that it suffices to directly maintain the relation between the surrogate losses and metrics of different models.
Our method is much easier to optimize and enjoys significant efficiency and performance gains.
- Score: 41.61184221367546
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Evaluation metrics in machine learning can often hardly be used as loss
functions, as they may be non-differentiable and non-decomposable, e.g.,
average precision and F1 score. This paper aims to address this problem by
revisiting surrogate loss learning, where a deep neural network is employed
to approximate the evaluation metrics. Instead of pursuing an exact recovery of
the evaluation metric through a deep neural network, we recall the purpose of
these evaluation metrics, which is to distinguish whether one model is better
or worse than another. In this paper, we show that it suffices to directly
maintain the relation between the surrogate losses and metrics of different
models, and propose a rank correlation-based optimization method to maximize
this relation and learn the surrogate losses. Compared to previous works, our
method is much easier to optimize and enjoys significant efficiency and
performance gains. Extensive experiments show that our method achieves
improvements on various tasks including image classification and neural machine
translation, and even outperforms state-of-the-art methods on human pose
estimation and machine reading comprehension tasks. Code is available at:
https://github.com/hunto/ReLoss.
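As a rough, hedged illustration of the idea in the abstract, the sketch below trains a small network-based surrogate loss so that its values across model checkpoints preserve the ordering given by the evaluation metric. The surrogate architecture, the input features, and the pairwise ranking proxy for rank correlation are assumptions for illustration; the official implementation in the linked repository may differ.

```python
# Minimal sketch of relation-preserving surrogate loss learning.
# Assumptions (not taken from the paper): the surrogate is a small MLP over
# per-sample prediction statistics, and rank correlation is maximized via a
# pairwise ranking proxy.
import torch
import torch.nn as nn


class SurrogateLoss(nn.Module):
    """Maps (logits, labels) to a scalar surrogate loss value."""

    def __init__(self, num_classes: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_classes + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, logits, labels):
        # labels: LongTensor of shape (B,)
        probs = logits.softmax(dim=-1)
        tgt = probs.gather(1, labels.unsqueeze(1))   # probability of the true class
        feats = torch.cat([probs, tgt], dim=1)       # per-sample features
        return self.net(feats).mean()                # scalar surrogate loss


def relation_loss(surrogate_vals, metric_vals, margin: float = 0.0):
    """Pairwise proxy for rank correlation: if model i has a higher metric
    than model j, its surrogate loss should be lower."""
    s_i, s_j = surrogate_vals.unsqueeze(1), surrogate_vals.unsqueeze(0)
    m_i, m_j = metric_vals.unsqueeze(1), metric_vals.unsqueeze(0)
    sign = torch.sign(m_i - m_j)                     # +1 where model i is better
    # hinge on (s_i - s_j): we want s_i < s_j whenever sign > 0
    return torch.relu(margin + sign * (s_i - s_j)).mean()


# Usage idea: evaluate the surrogate on snapshots from several checkpoints,
# collect their true metric values, and minimize relation_loss to train it.
```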
Related papers
- Newton Losses: Using Curvature Information for Learning with Differentiable Algorithms [80.37846867546517]
We show how to train eight different neural networks with custom objectives.
We exploit their second-order information via their empirical Fisher and Hessian matrices.
We apply Newton Losses to these differentiable algorithms, achieving significant improvements for the hard-to-optimize ones.
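A minimal, hedged sketch of the general idea (use second-order information of the loss with respect to the network outputs) is given below; the diagonal empirical-Fisher-style curvature estimate, the damping term, and the regression-to-target formulation are illustrative assumptions, not the paper's exact method.

```python
# Hedged sketch: take a Newton-style step on a hard-to-optimize loss with
# respect to the network *outputs* (not the weights), using a crude diagonal
# curvature estimate in the spirit of the empirical Fisher, then regress the
# network outputs onto that target.
import torch


def newton_style_loss(outputs, hard_loss_fn, damping: float = 1.0):
    # Detach a copy of the outputs so the second-order step is computed
    # only on the loss part.
    y = outputs.detach().requires_grad_(True)
    loss = hard_loss_fn(y)                     # hard_loss_fn must return a scalar
    (grad,) = torch.autograd.grad(loss, y)
    curvature = grad.pow(2) + damping          # diagonal curvature estimate + damping
    target = (y - grad / curvature).detach()   # Newton-style step on the outputs
    # Train the network by pulling its outputs toward the improved target.
    return 0.5 * ((outputs - target) ** 2).sum()
```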
arXiv Detail & Related papers (2024-10-24T18:02:11Z) - AnyLoss: Transforming Classification Metrics into Loss Functions [21.34290540936501]
Evaluation metrics can be used to assess the performance of models in binary classification tasks.
Most metrics are derived from a confusion matrix in a non-differentiable form, making it difficult to generate a differentiable loss function that could directly optimize them.
We propose a general-purpose approach that transforms any confusion-matrix-based metric into a loss function, AnyLoss, that can be used directly in optimization processes.
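To make the idea concrete, here is a hedged sketch of turning a confusion-matrix metric into a differentiable loss by building soft confusion-matrix counts from predicted probabilities, with binary F1 as the example; the paper's exact construction is not reproduced here.

```python
# Soft confusion-matrix counts from probabilities make F1 differentiable.
import torch


def soft_f1_loss(probs, targets, eps: float = 1e-8):
    """probs: predicted probabilities in [0, 1]; targets: {0, 1} labels."""
    tp = (probs * targets).sum()          # soft true positives
    fp = (probs * (1 - targets)).sum()    # soft false positives
    fn = ((1 - probs) * targets).sum()    # soft false negatives
    f1 = 2 * tp / (2 * tp + fp + fn + eps)
    return 1.0 - f1                       # minimize 1 - F1
```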
arXiv Detail & Related papers (2024-05-23T16:14:16Z) - Reduced Jeffries-Matusita distance: A Novel Loss Function to Improve
Generalization Performance of Deep Classification Models [0.0]
We introduce a distance called Reduced Jeffries-Matusita as a loss function for training deep classification models to reduce the over-fitting issue.
The results show that the new distance measure stabilizes the training process significantly, enhances the generalization ability, and improves the performance of the models in terms of accuracy and F1-score.
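For orientation only, the sketch below uses the classical (squared) Jeffries-Matusita/Matusita distance between the predicted class distribution and the one-hot target as a training loss; the paper's reduced variant is not given here and is not reproduced.

```python
# Classical squared Matusita distance between distributions as a loss.
import torch
import torch.nn.functional as F


def jm_distance_loss(logits, labels, eps: float = 1e-8):
    p = logits.softmax(dim=-1)                                   # predicted distribution
    q = F.one_hot(labels, num_classes=logits.size(-1)).float()   # one-hot target
    # sum_i (sqrt(p_i) - sqrt(q_i))^2 = 2 - 2 * Bhattacharyya coefficient
    d2 = ((p + eps).sqrt() - q.sqrt()).pow(2).sum(dim=-1)
    return d2.mean()
```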
arXiv Detail & Related papers (2024-03-13T10:51:38Z) - Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple
Logits Retargeting Approach [102.0769560460338]
We develop a simple logits retargeting approach (LORT) that does not require prior knowledge of the number of samples per class.
Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
arXiv Detail & Related papers (2024-03-01T03:27:08Z) - SuSana Distancia is all you need: Enforcing class separability in metric
learning via two novel distance-based loss functions for few-shot image
classification [0.9236074230806579]
We propose two loss functions which consider the importance of the embedding vectors by looking at the intra-class and inter-class distance between the few data.
Our results show a significant improvement in accuracy on the miniImageNet benchmark compared to other metric-based few-shot learning methods, by a margin of 2%.
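As a generic illustration of distance-based class separability (not the paper's two specific loss functions), the sketch below contrasts intra-class distances to class prototypes with inter-class distances between prototypes; the prototype formulation and margin are assumptions.

```python
# Generic intra-/inter-class distance loss over embeddings.
import torch


def intra_inter_loss(embeddings, labels, margin: float = 1.0):
    # Assumes the episode contains at least two classes.
    classes = labels.unique()
    protos = torch.stack([embeddings[labels == c].mean(dim=0) for c in classes])
    # intra-class: mean distance of each sample to its own class prototype
    d_intra = torch.stack([
        torch.cdist(embeddings[labels == c], protos[i:i + 1]).mean()
        for i, c in enumerate(classes)
    ]).mean()
    # inter-class: mean pairwise distance between prototypes
    d_inter = torch.pdist(protos).mean()
    # tighten clusters while pushing prototypes apart
    return torch.relu(margin + d_intra - d_inter)
```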
arXiv Detail & Related papers (2023-05-15T23:12:09Z) - Recall@k Surrogate Loss with Large Batches and Similarity Mixup [62.67458021725227]
Direct optimization, by gradient descent, of an evaluation metric is not possible when it is non-differentiable.
In this work, a differentiable surrogate loss for the recall is proposed.
The proposed method achieves state-of-the-art results in several image retrieval benchmarks.
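A hedged sketch of a sigmoid-smoothed recall@k surrogate for a single query is shown below; the temperatures, the soft top-k indicator, and the omission of Similarity Mixup are illustrative simplifications rather than the paper's formulation.

```python
# Smooth the hard rank of the positive item with sigmoids so recall@k
# becomes differentiable.
import torch


def smooth_recall_at_k(query_sims, positive_idx, k: int = 1,
                       temp_rank: float = 0.01, temp_k: float = 1.0):
    """query_sims: (N,) similarities of one query to N gallery items."""
    pos_sim = query_sims[positive_idx]
    mask = torch.ones_like(query_sims, dtype=torch.bool)
    mask[positive_idx] = False
    # soft count of gallery items scored above the positive (1-based rank)
    soft_rank = 1.0 + torch.sigmoid((query_sims[mask] - pos_sim) / temp_rank).sum()
    # soft indicator that the positive falls within the top-k
    recall = torch.sigmoid((k + 0.5 - soft_rank) / temp_k)
    return 1.0 - recall   # minimize
```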
arXiv Detail & Related papers (2021-08-25T11:09:11Z) - Learning with Multiclass AUC: Theory and Algorithms [141.63211412386283]
Area under the ROC curve (AUC) is a well-known ranking metric for problems such as imbalanced learning and recommender systems.
In this paper, we make an early attempt at the problem of learning multiclass scoring functions by optimizing multiclass AUC metrics.
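As a concrete but generic example of the kind of relaxation involved, the sketch below replaces the 0/1 pairwise indicator in a one-vs-rest multiclass AUC with a logistic loss over positive-negative score differences; this is a standard relaxation, not the specific algorithms studied in the paper.

```python
# Pairwise logistic surrogate for one-vs-rest multiclass AUC.
import torch
import torch.nn.functional as F


def ovr_auc_surrogate(scores, labels):
    """scores: (N, C) class scores; labels: (N,) integer class labels."""
    losses = []
    for c in range(scores.size(1)):
        pos = scores[labels == c, c]
        neg = scores[labels != c, c]
        if len(pos) == 0 or len(neg) == 0:
            continue
        margins = pos.unsqueeze(1) - neg.unsqueeze(0)   # all positive-negative pairs
        losses.append(F.softplus(-margins).mean())      # log(1 + exp(-margin))
    return torch.stack(losses).mean()
```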
arXiv Detail & Related papers (2021-07-28T05:18:10Z) - MetricOpt: Learning to Optimize Black-Box Evaluation Metrics [21.608384691401238]
We study the problem of optimizing arbitrary non-differentiable task evaluation metrics such as misclassification rate and recall.
Our method, named MetricOpt, operates in a black-box setting where the computational details of the target metric are unknown.
We achieve this by learning a differentiable value function, which maps compact task-specific model parameters to metric observations.
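A minimal sketch of the black-box idea described above: fit a small differentiable value function from compact task-specific parameters to observed metric values, then use its gradients in place of the non-differentiable metric. The parameterization, network size, and fitting loop below are assumptions, not MetricOpt's implementation.

```python
# Learn a differentiable value function over compact model parameters,
# then follow its gradients as a proxy for the black-box metric.
import torch
import torch.nn as nn


class ValueFunction(nn.Module):
    def __init__(self, param_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(param_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, compact_params):
        return self.net(compact_params).squeeze(-1)   # predicted metric value


def fit_value_function(vf, params_buffer, metrics_buffer,
                       steps: int = 200, lr: float = 1e-3):
    """params_buffer: (M, param_dim) compact parameters; metrics_buffer: (M,) observed metrics."""
    opt = torch.optim.Adam(vf.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((vf(params_buffer) - metrics_buffer) ** 2).mean()
        loss.backward()
        opt.step()
    return vf

# Once fitted, gradients of the value function w.r.t. the compact parameters
# give a descent (or ascent) direction for the otherwise non-differentiable metric.
```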
arXiv Detail & Related papers (2021-04-21T16:50:01Z) - A Unified Framework of Surrogate Loss by Refactoring and Interpolation [65.60014616444623]
We introduce UniLoss, a unified framework to generate surrogate losses for training deep networks with gradient descent.
We validate the effectiveness of UniLoss on three tasks and four datasets.
arXiv Detail & Related papers (2020-07-27T21:16:51Z) - On the Outsized Importance of Learning Rates in Local Update Methods [2.094022863940315]
We study a family of algorithms, which we refer to as local update methods, that generalize many federated learning and meta-learning algorithms.
We prove that for quadratic objectives, local update methods perform gradient descent on a surrogate loss function which we exactly characterize.
We show that the choice of client learning rate controls the condition number of that surrogate loss, as well as the distance between the minimizers of the surrogate and true loss functions.
arXiv Detail & Related papers (2020-07-02T04:45:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.