On the Implicit Geometry of Cross-Entropy Parameterizations for
Label-Imbalanced Data
- URL: http://arxiv.org/abs/2303.07608v1
- Date: Tue, 14 Mar 2023 03:04:37 GMT
- Title: On the Implicit Geometry of Cross-Entropy Parameterizations for
Label-Imbalanced Data
- Authors: Tina Behnia, Ganesh Ramachandra Kini, Vala Vakilian, Christos
Thrampoulidis
- Abstract summary: Various logit-adjusted parameterizations of the cross-entropy (CE) loss have been proposed as alternatives to weighted CE for training large models on label-imbalanced data.
We show that logit-adjusted parameterizations can be appropriately tuned to learn symmetric geometries irrespective of the imbalance ratio.
- Score: 26.310275682709776
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Various logit-adjusted parameterizations of the cross-entropy (CE) loss have
been proposed as alternatives to weighted CE for training large models on
label-imbalanced data far beyond the zero train error regime. The driving force
behind those designs has been the theory of implicit bias, which, for
linear(ized) models, explains why they successfully induce bias on the
optimization path towards solutions that favor minorities. Aiming to extend
this theory to non-linear models, we investigate the implicit geometry of
classifiers and embeddings that are learned by different CE parameterizations.
Our main result characterizes the global minimizers of a non-convex
cost-sensitive SVM classifier for the unconstrained features model, which
serves as an abstraction of deep nets. We derive closed-form formulas for the
angles and norms of classifiers and embeddings as a function of the number of
classes, the imbalance and the minority ratios, and the loss hyperparameters.
Using these, we show that logit-adjusted parameterizations can be appropriately
tuned to learn symmetric geometries irrespective of the imbalance ratio. We
complement our analysis with experiments and an empirical study of convergence
accuracy in deep-nets.
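To make the loss family discussed above concrete, here is a minimal Python sketch of a generic logit-adjusted cross-entropy with additive log-prior offsets and optional per-class multiplicative factors (in the spirit of logit-adjustment/VS-loss-style parameterizations), plus a helper for inspecting the learned classifier geometry via pairwise angles. The function names and the hyperparameters `tau` and `gammas` are illustrative and do not follow the paper's notation.

```python
import numpy as np

def logit_adjusted_ce(logits, labels, class_priors, tau=1.0, gammas=None):
    """Cross-entropy on adjusted logits: gamma_c * f_c + tau * log(prior_c).

    Illustrative sketch only; `tau` and `gammas` are hypothetical names for
    the additive and multiplicative adjustment hyperparameters.
    """
    logits = np.asarray(logits, dtype=float)        # shape (n, k)
    labels = np.asarray(labels)                     # shape (n,), integer class ids
    priors = np.asarray(class_priors, dtype=float)  # shape (k,), class frequencies
    if gammas is None:
        gammas = np.ones_like(priors)               # purely additive adjustment

    # Multiplicative scaling plus additive log-prior offsets.
    adjusted = gammas[None, :] * logits + tau * np.log(priors)[None, :]

    # Standard softmax cross-entropy on the adjusted logits.
    adjusted = adjusted - adjusted.max(axis=1, keepdims=True)   # numerical stability
    log_probs = adjusted - np.log(np.exp(adjusted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def pairwise_classifier_angles(W):
    """Pairwise angles (degrees) between classifier rows.

    A symmetric geometry (e.g. a simplex ETF) shows up as equal row norms
    and equal pairwise angles regardless of class frequencies.
    """
    W = np.asarray(W, dtype=float)
    W_unit = W / np.linalg.norm(W, axis=1, keepdims=True)
    cosines = np.clip(W_unit @ W_unit.T, -1.0, 1.0)
    return np.degrees(np.arccos(cosines))
```

With uniform class priors and unit `gammas` the loss reduces to plain CE; making `tau` larger lowers the adjusted logits of minority classes, which forces larger raw logits (and hence larger margins) on minority examples during training.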
Related papers
- Lie Algebra Canonicalization: Equivariant Neural Operators under arbitrary Lie Groups [11.572188414440436]
We propose Lie aLgebrA Canonicalization (LieLAC), a novel approach that exploits only the action of infinitesimal generators of the symmetry group.
Operating within the framework of canonicalization, LieLAC can easily be integrated with unconstrained pre-trained models.
arXiv Detail & Related papers (2024-10-03T17:21:30Z) - Variational Bayesian surrogate modelling with application to robust design optimisation [0.9626666671366836]
Surrogate models provide a quick-to-evaluate approximation to complex computational models.
We consider Bayesian inference for constructing statistical surrogates with input uncertainties and dimensionality reduction.
We demonstrate the approach on robust structural optimisation problems where cost functions depend on a weighted sum of the mean and standard deviation of model outputs.
arXiv Detail & Related papers (2024-04-23T09:22:35Z) - Symmetry Breaking and Equivariant Neural Networks [17.740760773905986]
We introduce a novel notion of 'relaxed equivariance'.
We show how to incorporate this relaxation into equivariant multilayer perceptrons (E-MLPs).
The relevance of symmetry breaking is then discussed in various application domains.
arXiv Detail & Related papers (2023-12-14T15:06:48Z) - Learning Layer-wise Equivariances Automatically using Gradients [66.81218780702125]
Convolutions encode equivariance symmetries into neural networks leading to better generalisation performance.
However, symmetries provide fixed hard constraints on the functions a network can represent; they need to be specified in advance and cannot be adapted.
Our goal is to allow flexible symmetry constraints that can automatically be learned from data using gradients.
arXiv Detail & Related papers (2023-10-09T20:22:43Z) - An iterative multi-fidelity approach for model order reduction of
multi-dimensional input parametric PDE systems [0.0]
We propose a parametric sampling strategy for the reduction of large-scale PDE systems with multidimensional input parametric spaces.
It is achieved by exploiting low-fidelity models throughout the parametric space to sample points using an efficient sampling strategy.
Since the proposed methodology leverages the use of low-fidelity models to assimilate the solution database, it significantly reduces the computational cost in the offline stage.
arXiv Detail & Related papers (2023-01-23T15:25:58Z) - Learning Graphical Factor Models with Riemannian Optimization [70.13748170371889]
This paper proposes a flexible algorithmic framework for graph learning under low-rank structural constraints.
The problem is expressed as penalized maximum likelihood estimation of an elliptical distribution.
We leverage geometries of positive definite matrices and positive semi-definite matrices of fixed rank that are well suited to elliptical models.
arXiv Detail & Related papers (2022-10-21T13:19:45Z) - On how to avoid exacerbating spurious correlations when models are
overparameterized [33.315813572333745]
We show that VS-loss learns a model that is fair towards minorities even when spurious features are strong.
Compared to previous works, our bounds hold for more general models, are non-asymptotic, and apply even in scenarios of extreme imbalance.
arXiv Detail & Related papers (2022-06-25T21:53:44Z) - Evaluating Sensitivity to the Stick-Breaking Prior in Bayesian
Nonparametrics [85.31247588089686]
We show that variational Bayesian methods can yield sensitivities with respect to parametric and nonparametric aspects of Bayesian models.
We provide both theoretical and empirical support for our variational approach to Bayesian sensitivity analysis.
arXiv Detail & Related papers (2021-07-08T03:40:18Z) - Post-mortem on a deep learning contest: a Simpson's paradox and the
complementary roles of scale metrics versus shape metrics [61.49826776409194]
We analyze a corpus of models made publicly available for a contest to predict the generalization accuracy of neural network (NN) models.
We identify what amounts to a Simpson's paradox: "scale" metrics perform well overall but poorly on sub-partitions of the data.
We present two novel shape metrics, one data-independent, and the other data-dependent, which can predict trends in the test accuracy of a series of NNs.
arXiv Detail & Related papers (2021-06-01T19:19:49Z) - Understanding Overparameterization in Generative Adversarial Networks [56.57403335510056]
Generative Adversarial Networks (GANs) are trained by solving non-concave min-max optimization problems.
Existing theory has shown the importance of gradient descent (GD) for reaching globally optimal solutions.
We show that in an overparameterized GAN with a $1$-layer neural network generator and a linear discriminator, gradient descent-ascent (GDA) converges to a global saddle point of the underlying non-concave min-max problem.
arXiv Detail & Related papers (2021-04-12T16:23:37Z) - Understanding Implicit Regularization in Over-Parameterized Single Index
Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.