Classification vs regression in overparameterized regimes: Does the loss
function matter?
- URL: http://arxiv.org/abs/2005.08054v2
- Date: Thu, 14 Oct 2021 15:33:01 GMT
- Title: Classification vs regression in overparameterized regimes: Does the loss
function matter?
- Authors: Vidya Muthukumar, Adhyyan Narang, Vignesh Subramanian, Mikhail Belkin,
Daniel Hsu, Anant Sahai
- Abstract summary: We show that solutions obtained by least-squares minimum-norm interpolation, typically used for regression, are identical to those produced by the hard-margin support vector machine (SVM).
Our results demonstrate the very different roles and properties of loss functions used at the training phase (optimization) and the testing phase (generalization).
- Score: 21.75115239010008
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We compare classification and regression tasks in an overparameterized linear
model with Gaussian features. On the one hand, we show that with sufficient
overparameterization all training points are support vectors: solutions
obtained by least-squares minimum-norm interpolation, typically used for
regression, are identical to those produced by the hard-margin support vector
machine (SVM) that minimizes the hinge loss, typically used for training
classifiers. On the other hand, we show that there exist regimes where these
interpolating solutions generalize well when evaluated by the 0-1 test loss
function, but do not generalize if evaluated by the square loss function, i.e.
they approach the null risk. Our results demonstrate the very different roles
and properties of loss functions used at the training phase (optimization) and
the testing phase (generalization).
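The abstract's first claim can be illustrated numerically. Below is a minimal sketch (not taken from the paper's code): it draws isotropic Gaussian features with d much larger than n, computes the minimum-norm least-squares interpolator of +/-1 labels, approximates the hard-margin SVM with scikit-learn's linear SVC and a very large C, and checks that every training point is a support vector and that the two solutions align. The sample sizes, the balanced +/-1 labels, and the use of SVC as a hard-margin proxy are illustrative assumptions.

```python
# Minimal numerical sketch: with heavy overparameterization (d >> n), the
# minimum-norm least-squares interpolator of +/-1 labels should coincide with
# the hard-margin SVM solution, and every training point should be a support
# vector.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n, d = 20, 5000                       # n samples, d >> n Gaussian features
X = rng.standard_normal((n, d))
y = np.tile([-1.0, 1.0], n // 2)      # balanced, otherwise arbitrary labels

# Minimum-norm least-squares interpolator: w = X^T (X X^T)^{-1} y
w_ls = X.T @ np.linalg.solve(X @ X.T, y)

# Hard-margin SVM approximated by a linear soft-margin SVM with a huge C
svm = SVC(kernel="linear", C=1e10).fit(X, y)
w_svm = svm.coef_.ravel()

print("support vectors:", svm.n_support_.sum(), "of", n)
cos = w_ls @ w_svm / (np.linalg.norm(w_ls) * np.linalg.norm(w_svm))
print("cosine similarity between the two solutions:", round(float(cos), 6))
```

On typical draws the printed cosine similarity should be very close to 1 and all n points should be reported as support vectors; the equivalence is expected to degrade as d shrinks toward n.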
Related papers
- Generalization bounds for regression and classification on adaptive covering input domains [1.4141453107129398]
We focus on the generalization bound, which serves as an upper limit for the generalization error.
In the case of classification tasks, we treat the target function as a one-hot, piecewise-constant function and employ the 0/1 loss for error measurement.
arXiv Detail & Related papers (2024-07-29T05:40:08Z)
- A unified law of robustness for Bregman divergence losses [2.014089835498735]
We show that Bregman divergence losses form a common generalization of square loss and cross-entropy loss.
Our generalization relies on identifying a bias-variance-type decomposition that lies at the heart of the proof of Bubeck and Sellke.
arXiv Detail & Related papers (2024-05-26T17:30:44Z)
- Robust Capped lp-Norm Support Vector Ordinal Regression [85.84718111830752]
Ordinal regression is a specialized supervised problem where the labels show an inherent order.
Support Vector Ordinal Regression, as an outstanding ordinal regression model, is widely used in many ordinal regression tasks.
We introduce a new model, Capped $\ell_p$-Norm Support Vector Ordinal Regression (CSVOR), that is robust to outliers.
arXiv Detail & Related papers (2024-04-25T13:56:05Z)
- Theoretical Characterization of the Generalization Performance of Overfitted Meta-Learning [70.52689048213398]
This paper studies the performance of overfitted meta-learning under a linear regression model with Gaussian features.
We find new and interesting properties that do not exist in single-task linear regression.
Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large.
arXiv Detail & Related papers (2023-04-09T20:36:13Z)
- Cut your Losses with Squentropy [19.924900110707284]
We propose the "squentropy" loss, which is the sum of two terms: the cross-entropy loss and the average square loss over the incorrect classes.
We show that the squentropy loss outperforms both the pure cross-entropy and rescaled square losses in terms of classification accuracy (a minimal sketch of this loss appears after this list).
arXiv Detail & Related papers (2023-02-08T09:21:13Z)
- Regularized ERM on random subspaces [17.927376388967144]
We consider possibly data-dependent subspaces spanned by a random subset of the data, recovering Nystrom approaches for kernel methods as a special case.
Considering random subspaces naturally leads to computational savings, but the question is whether the corresponding learning accuracy is degraded.
arXiv Detail & Related papers (2022-12-04T16:12:11Z)
- Do Lessons from Metric Learning Generalize to Image-Caption Retrieval? [67.45267657995748]
The triplet loss with semi-hard negatives has become the de facto choice for image-caption retrieval (ICR) methods that are optimized from scratch.
Recent progress in metric learning has given rise to new loss functions that outperform the triplet loss on tasks such as image retrieval and representation learning.
We ask whether these findings generalize to the setting of ICR by comparing three loss functions on two ICR methods.
arXiv Detail & Related papers (2022-02-14T15:18:00Z)
- A Precise Performance Analysis of Support Vector Regression [105.94855998235232]
We study the hard and soft support vector regression techniques applied to a set of $n$ linear measurements.
Our results are then used to optimally tune the parameters intervening in the design of hard and soft support vector regression algorithms.
arXiv Detail & Related papers (2021-05-21T14:26:28Z)
- Learning by Minimizing the Sum of Ranked Range [58.24935359348289]
We introduce the sum of ranked range (SoRR) as a general approach to form learning objectives.
A ranked range is a consecutive sequence of sorted values of a set of real numbers.
We explore two machine-learning applications of minimizing SoRR: the AoRR aggregate loss for binary classification and the TKML individual loss for multi-label/multi-class classification (a small numerical sketch of SoRR appears after this list).
arXiv Detail & Related papers (2020-10-05T01:58:32Z)
- Piecewise Linear Regression via a Difference of Convex Functions [50.89452535187813]
We present a new piecewise linear regression methodology that fits a difference of convex (DC) functions to the data.
We empirically validate the method, showing it to be practically implementable, and to have comparable performance to existing regression/classification methods on real-world datasets.
arXiv Detail & Related papers (2020-07-05T18:58:47Z)
- A Loss-Function for Causal Machine-Learning [0.0]
Causal machine-learning is about predicting the net-effect (true-lift) of treatments.
Unlike standard regression or classification, there is no similarly well-defined loss function, due to the lack of point-wise true values in the data.
We propose a novel method to define a loss function in this context, which is equal to the mean-square error (MSE) in a standard regression problem.
arXiv Detail & Related papers (2020-01-02T21:22:18Z)
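Following up on the "Cut your Losses with Squentropy" entry above, here is a minimal NumPy sketch of a squentropy-style loss: cross-entropy plus the average squared logit over the incorrect classes. The plain averaging of the squared incorrect-class logits is an assumption; the paper's exact normalization or rescaling may differ.

```python
# Hedged sketch of a squentropy-style loss: cross-entropy on the true class
# plus the average square loss of the logits of the incorrect classes.
import numpy as np

def squentropy_loss(logits: np.ndarray, labels: np.ndarray) -> float:
    """logits: (batch, num_classes); labels: (batch,) integer class indices."""
    batch, num_classes = logits.shape
    # Numerically stable cross-entropy via log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    ce = -log_probs[np.arange(batch), labels].mean()
    # Average square loss over the incorrect classes (pushes their logits to 0).
    mask = np.ones_like(logits, dtype=bool)
    mask[np.arange(batch), labels] = False
    sq = (logits[mask] ** 2).reshape(batch, num_classes - 1).mean()
    return float(ce + sq)

# Example: three classes, two samples.
logits = np.array([[2.0, -1.0, 0.5], [0.1, 3.0, -0.2]])
labels = np.array([0, 1])
print(squentropy_loss(logits, labels))
```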
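Following up on the "Learning by Minimizing the Sum of Ranked Range" entry above, here is a small numerical sketch of a SoRR-style quantity. The indexing convention is an assumption for illustration: SoRR(k, m) below sums the (k+1)-th through m-th largest values, i.e., the top-m sum minus the top-k sum.

```python
# Sketch of a sum-of-ranked-range (SoRR) style objective: sum a consecutive
# range of the values sorted in decreasing order.
import numpy as np

def sum_of_ranked_range(values: np.ndarray, k: int, m: int) -> float:
    """Sum of the values ranked k+1 through m in decreasing order."""
    assert 0 <= k < m <= values.size
    ranked = np.sort(values)[::-1]     # decreasing order
    return float(ranked[k:m].sum())    # drop the top-k, keep up to rank m

# Example: per-sample losses. An AoRR-style aggregate averages such a range,
# ignoring the largest (possibly outlier) losses and the easiest samples.
losses = np.array([0.1, 2.5, 0.7, 9.0, 0.3, 1.2])
print(sum_of_ranked_range(losses, k=1, m=4))   # 2.5 + 1.2 + 0.7 = 4.4
```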