EnsLoss: Stochastic Calibrated Loss Ensembles for Preventing Overfitting in Classification
- URL: http://arxiv.org/abs/2409.00908v2
- Date: Wed, 4 Sep 2024 03:26:58 GMT
- Title: EnsLoss: Stochastic Calibrated Loss Ensembles for Preventing Overfitting in Classification
- Authors: Ben Dai
- Abstract summary: We propose a novel ensemble method, namely EnsLoss, to combine loss functions within the empirical risk minimization (ERM) framework.
We first transform the CC conditions of losses into loss-derivatives, thereby bypassing the need for explicit loss functions.
We theoretically establish the statistical consistency of our approach and provide insights into its benefits.
- Score: 1.3778851745408134
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Empirical risk minimization (ERM) with a computationally feasible surrogate loss is a widely accepted approach for classification. Notably, the convexity and calibration (CC) properties of a loss function ensure consistency of ERM in maximizing accuracy, thereby offering a wide range of options for surrogate losses. In this article, we propose a novel ensemble method, namely EnsLoss, which extends the ensemble learning concept to combine loss functions within the ERM framework. A key feature of our method is that it preserves the "legitimacy" of the combined losses, i.e., ensures the CC properties. Specifically, we first transform the CC conditions of losses into conditions on loss-derivatives, thereby bypassing the need for explicit loss functions and directly generating calibrated loss-derivatives. Accordingly, inspired by Dropout, EnsLoss enables loss ensembles through one training process with doubly stochastic gradient descent (i.e., random batch samples and random calibrated loss-derivatives). We theoretically establish the statistical consistency of our approach and provide insights into its benefits. The numerical effectiveness of EnsLoss compared with fixed-loss methods is demonstrated through experiments on 14 OpenML tabular datasets and 46 image datasets with various deep learning architectures. The Python repository and source code are available on GitHub at https://github.com/statmlben/ensloss.
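As a reading aid, below is a minimal PyTorch sketch of the doubly stochastic idea described in the abstract: at each batch, a randomly sampled calibrated loss-derivative, rather than the gradient of a fixed surrogate loss, drives the update. This is not the authors' implementation (the actual EnsLoss code lives in the linked GitHub repository); the helper names (`sample_calibrated_derivative`, `ensloss_step`) and the particular catalog of derivatives are illustrative assumptions, whereas the paper generates calibrated loss-derivatives directly rather than drawing from a fixed list. The sketch relies on the standard fact that, for a convex margin loss $\phi$, classification-calibration is equivalent to differentiability at 0 with $\phi'(0) < 0$, so convexity and calibration can be checked on the derivative alone (non-decreasing and negative at zero).

```python
# A minimal sketch (not the authors' EnsLoss implementation) of one doubly
# stochastic update: random batch + randomly sampled calibrated loss-derivative.
import torch


def sample_calibrated_derivative():
    """Randomly pick a loss-derivative phi'(z) that is non-decreasing (convexity)
    and negative at z = 0 (calibration). The catalog below is illustrative only."""
    choice = torch.randint(0, 3, (1,)).item()
    if choice == 0:
        return lambda z: -torch.sigmoid(-z)                     # logistic loss
    elif choice == 1:
        return lambda z: -torch.exp(-z).clamp(max=10.0)         # exponential loss (clamped)
    else:
        return lambda z: -2.0 * torch.clamp(1.0 - z, min=0.0)   # squared hinge loss


def ensloss_step(model, optimizer, x, y):
    """One update; y is assumed to take values in {-1, +1} and model(x) returns real scores."""
    z = y * model(x).squeeze(-1)            # margins z_i = y_i * f(x_i)
    g = sample_calibrated_derivative()(z)   # phi'(z_i), used as fixed per-sample weights
    surrogate = (g.detach() * z).mean()     # gradient equals mean(phi'(z_i) * dz_i/dtheta)
    optimizer.zero_grad()
    surrogate.backward()
    optimizer.step()
```

The `detach` makes the backward pass apply exactly the sampled derivative to each margin, so every batch is effectively trained with a different calibrated loss while only one model is fit.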
Related papers
- A Unified Contrastive Loss for Self-Training [3.3454373538792552]
Self-training methods have proven to be effective in exploiting abundant unlabeled data in semi-supervised learning.
We propose a general framework to enhance self-training methods, which replaces all instances of cross-entropy (CE) losses with a unique contrastive loss.
Our framework results in significant performance improvements across three different datasets with a limited number of labeled data.
arXiv Detail & Related papers (2024-09-11T14:22:41Z)
- LEARN: An Invex Loss for Outlier Oblivious Robust Online Optimization [56.67706781191521]
We present a robust online optimization framework in which an adversary can introduce outliers by corrupting the loss functions in an arbitrary number k of rounds, unknown to the learner.
arXiv Detail & Related papers (2024-08-12T17:08:31Z)
- Mitigating Privacy Risk in Membership Inference by Convex-Concave Loss [16.399746814823025]
Machine learning models are susceptible to membership inference attacks (MIAs), which aim to infer whether a sample is in the training set.
Existing work utilizes gradient ascent to enlarge the loss variance of training data, alleviating the privacy risk.
We propose a novel method -- Convex-Concave Loss, which enables a high variance of training loss distribution by gradient descent.
arXiv Detail & Related papers (2024-02-08T07:14:17Z)
- Label Distributionally Robust Losses for Multi-class Classification: Consistency, Robustness and Adaptivity [55.29408396918968]
We study a family of loss functions named label-distributionally robust (LDR) losses for multi-class classification.
Our contributions cover both consistency and robustness, including establishing top-$k$ consistency of LDR losses for multi-class classification.
We propose a new adaptive LDR loss that automatically adapts the individualized temperature parameter to the noise degree of class label of each instance.
arXiv Detail & Related papers (2021-12-30T00:27:30Z)
- Ensemble of Loss Functions to Improve Generalizability of Deep Metric Learning methods [0.609170287691728]
We propose novel approaches to combine different losses built on top of a shared deep feature extractor.
We evaluate our methods on some popular datasets from the machine vision domain in conventional Zero-Shot-Learning (ZSL) settings.
arXiv Detail & Related papers (2021-07-02T15:19:46Z)
- Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning [57.88785630755165]
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data.
We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class.
For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
arXiv Detail & Related papers (2021-06-03T09:50:13Z)
- A Symmetric Loss Perspective of Reliable Machine Learning [87.68601212686086]
We review how a symmetric loss can yield robust classification from corrupted labels in balanced error rate (BER) minimization.
We demonstrate how the robust AUC method can benefit natural language processing in the problem where we want to learn only from relevant keywords.
arXiv Detail & Related papers (2021-01-05T06:25:47Z)
- Shaping Deep Feature Space towards Gaussian Mixture for Visual Classification [74.48695037007306]
We propose a Gaussian mixture (GM) loss function for deep neural networks for visual classification.
With a classification margin and a likelihood regularization, the GM loss facilitates both high classification performance and accurate modeling of the feature distribution.
The proposed model can be implemented easily and efficiently without using extra trainable parameters.
arXiv Detail & Related papers (2020-11-18T03:32:27Z)
- Unbiased Risk Estimators Can Mislead: A Case Study of Learning with Complementary Labels [92.98756432746482]
We study a weakly supervised problem called learning with complementary labels.
We show that the quality of gradient estimation matters more in risk minimization.
We propose a novel surrogate complementary loss (SCL) framework that trades zero bias for reduced variance.
arXiv Detail & Related papers (2020-07-05T04:19:37Z)
- All your loss are belong to Bayes [28.393499629583786]
Loss functions are a cornerstone of machine learning and the starting point of most algorithms.
We introduce a trick on squared Gaussian Processes to obtain a random process whose paths are compliant source functions.
Experimental results demonstrate substantial improvements over the state of the art.
arXiv Detail & Related papers (2020-06-08T14:31:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.