Related papers: Neural Collapse in Cumulative Link Models for Ordinal Regression: An Analysis with Unconstrained Feature Model

URL: http://arxiv.org/abs/2506.05801v1
Date: Fri, 06 Jun 2025 06:57:02 GMT
Title: Neural Collapse in Cumulative Link Models for Ordinal Regression: An Analysis with Unconstrained Feature Model
Authors: Chuang Ma, Tomoyuki Obuchi, Toshiyuki Tanaka
Abstract summary: We show that a phenomenon we call Ordinal Neural Collapse (ONC) indeed emerges and is characterized by the following three properties. In particular, in the zero-regularization limit, a highly local and simple geometric relationship emerges between the latent variables and the threshold values.
Score: 4.958659914612866
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: A phenomenon known as "Neural Collapse (NC)" in deep classification tasks, in which the penultimate-layer features and the final classifiers exhibit an extremely simple geometric structure, has recently attracted considerable attention, with the expectation that it can deepen our understanding of how deep neural networks behave. The Unconstrained Feature Model (UFM) has been proposed to explain NC theoretically, and there emerges a growing body of work that extends NC to tasks other than classification and leverages it for practical applications. In this study, we investigate whether a similar phenomenon arises in deep Ordinal Regression (OR) tasks, via combining the cumulative link model for OR and UFM. We show that a phenomenon we call Ordinal Neural Collapse (ONC) indeed emerges and is characterized by the following three properties: (ONC1) all optimal features in the same class collapse to their within-class mean when regularization is applied; (ONC2) these class means align with the classifier, meaning that they collapse onto a one-dimensional subspace; (ONC3) the optimal latent variables (corresponding to logits or preactivations in classification tasks) are aligned according to the class order, and in particular, in the zero-regularization limit, a highly local and simple geometric relationship emerges between the latent variables and the threshold values. We prove these properties analytically within the UFM framework with fixed threshold values and corroborate them empirically across a variety of datasets. We also discuss how these insights can be leveraged in OR, highlighting the use of fixed thresholds.
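As a rough illustration of the setting the abstract describes, the sketch below evaluates a cumulative link model with fixed thresholds on freely chosen features, in the spirit of the UFM. It is not the authors' code. The logistic link, the sign convention P(y <= k | z) = sigmoid(b_k - z), the function names, and the toy data are all assumptions made for illustration, and the regularization terms mentioned in the abstract are omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def clm_class_probs(z, thresholds):
    """Class probabilities P(y = k | z), k = 0..K-1, under a logistic
    cumulative link model (assumed link; the abstract does not fix one).
    z: (n,) latent variables; thresholds: (K-1,) strictly increasing."""
    z = np.asarray(z, dtype=float)
    b = np.asarray(thresholds, dtype=float)
    # Cumulative probabilities P(y <= k | z) = sigmoid(b_k - z).
    cum = sigmoid(b[None, :] - z[:, None])
    # Pad with P(y <= -1) = 0 and P(y <= K-1) = 1, then difference.
    n = len(z)
    cum = np.concatenate([np.zeros((n, 1)), cum, np.ones((n, 1))], axis=1)
    return np.diff(cum, axis=1)

def clm_nll(z, y, thresholds):
    """Mean negative log-likelihood of ordinal labels y given latents z."""
    probs = clm_class_probs(z, thresholds)
    return -np.mean(np.log(probs[np.arange(len(y)), y]))

# UFM-style toy setup: features H are treated as free variables and the
# latent variable is the projection z = H @ w onto a single classifier w.
rng = np.random.default_rng(0)
H = rng.normal(size=(6, 4))          # hypothetical features, 6 samples
w = rng.normal(size=4)               # hypothetical classifier vector
y = np.array([0, 0, 1, 1, 2, 2])     # ordinal labels with K = 3 classes
thresholds = np.array([-1.0, 1.0])   # fixed thresholds, not learned

z = H @ w
probs = clm_class_probs(z, thresholds)
loss = clm_nll(z, y, thresholds)
```

Here only `H` and `w` would be optimized while `thresholds` stays fixed, which mirrors the fixed-threshold analysis the abstract highlights.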

Related papers

Sequential-Parallel Duality in Prefix Scannable Models [68.39855814099997]
Recent developments have given rise to various models, such as Gated Linear Attention (GLA) and Mamba. This raises a natural question: can we characterize the full class of neural sequence models that support near-constant-time parallel evaluation and linear-time, constant-space sequential inference?
arXiv Detail & Related papers (2025-06-12T17:32:02Z)
Towards Demystifying the Generalization Behaviors When Neural Collapse Emerges [132.62934175555145]
Neural Collapse (NC) is a well-known phenomenon of deep neural networks in the terminal phase of training (TPT). We propose a theoretical explanation for why continuing training can still lead to accuracy improvement on the test set, even after the train accuracy has reached 100%. We refer to this newly discovered property as "non-conservative generalization".
arXiv Detail & Related papers (2023-10-12T14:29:02Z)
Exploring Learned Representations of Neural Networks with Principal Component Analysis [1.0923877073891446]
In certain layers, as little as 20% of the intermediate feature-space variance is necessary for high-accuracy classification. We relate our findings to neural collapse and provide partial evidence for the related phenomenon of intermediate neural collapse.
arXiv Detail & Related papers (2023-09-27T00:18:25Z)
A Neural Collapse Perspective on Feature Evolution in Graph Neural Networks [44.31777384413466]
Graph neural networks (GNNs) have become increasingly popular for classification tasks on graph-structured data. In this paper, we focus on node-wise classification and explore the feature evolution through the lens of the "Neural Collapse" phenomenon. We show that even an "optimistic" mathematical model requires that the graphs obey a strict structural condition in order to possess a minimizer with exact collapse.
arXiv Detail & Related papers (2023-07-04T23:03:21Z)
Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification. Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
Understanding Imbalanced Semantic Segmentation Through Neural Collapse [81.89121711426951]
We show that semantic segmentation naturally brings contextual correlation and imbalanced distribution among classes. We introduce a regularizer on feature centers to encourage the network to learn features closer to the appealing structure. Our method ranks 1st and sets a new record on the ScanNet200 test leaderboard.
arXiv Detail & Related papers (2023-01-03T13:51:51Z)
Neural Collapse in Deep Linear Networks: From Balanced to Imbalanced Data [12.225207401994737]
We show that complex systems with massive amounts of parameters exhibit the same structural properties when training until convergence. In particular, it has been observed that the last-layer features collapse to their class-means. Our results demonstrate the convergence of the last-layer features and classifiers to a geometry consisting of vectors.
arXiv Detail & Related papers (2023-01-01T16:29:56Z)
Neural Collapse with Normalized Features: A Geometric Analysis over the Riemannian Manifold [30.3185037354742]
When training over normalized deep networks for classification tasks, the learned features exhibit a so-called "neural collapse" phenomenon. We show that better representations can be learned faster via feature normalization.
arXiv Detail & Related papers (2022-09-19T17:26:32Z)
Do We Really Need a Learnable Classifier at the End of Deep Neural Network? [118.18554882199676]
We study the potential of learning a neural network for classification with the classifier randomly initialized as an ETF and fixed during training. Our experimental results show that our method is able to achieve similar performance on image classification for balanced datasets.
arXiv Detail & Related papers (2022-03-17T04:34:28Z)
Unifying supervised learning and VAEs -- coverage, systematics and goodness-of-fit in normalizing-flow based neural network models for astro-particle reconstructions [0.0]
Statistical uncertainties, coverage, systematic uncertainties or a goodness-of-fit measure are often not calculated. We show that a KL-divergence objective of the joint distribution of data and labels allows us to unify supervised learning and variational autoencoders. We discuss how to calculate coverage probabilities without numerical integration for specific "base-ordered" contours.
arXiv Detail & Related papers (2020-08-13T11:28:57Z)
Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs). We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent. For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)

This list is automatically generated from the titles and abstracts of the papers on this site.