Geometric Properties of Neural Multivariate Regression
- URL: http://arxiv.org/abs/2510.01105v1
- Date: Wed, 01 Oct 2025 16:50:57 GMT
- Title: Geometric Properties of Neural Multivariate Regression
- Authors: George Andriopoulos, Zixuan Dong, Bimarsha Adhikari, Keith Ross
- Abstract summary: Collapsed models exhibit ID_H < ID_Y, leading to over-compression and poor generalization. We identify two regimes that determine when expanding or reducing feature dimensionality improves performance.
- Score: 3.259067345005505
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural multivariate regression underpins a wide range of domains such as control, robotics, and finance, yet the geometry of its learned representations remains poorly characterized. While neural collapse has been shown to benefit generalization in classification, we find that analogous collapse in regression consistently degrades performance. To explain this contrast, we analyze models through the lens of intrinsic dimension. Across control tasks and synthetic datasets, we estimate the intrinsic dimension of last-layer features (ID_H) and compare it with that of the regression targets (ID_Y). Collapsed models exhibit ID_H < ID_Y, leading to over-compression and poor generalization, whereas non-collapsed models typically maintain ID_H > ID_Y. For the non-collapsed models, performance with respect to ID_H depends on the data quantity and noise levels. From these observations, we identify two regimes (over-compressed and under-compressed) that determine when expanding or reducing feature dimensionality improves performance. Our results provide new geometric insights into neural regression and suggest practical strategies for enhancing generalization.
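The abstract does not say which intrinsic-dimension estimator is used; the TwoNN estimator (Facco et al., 2017) is one common choice. Below is a minimal sketch of how one might compare ID_H and ID_Y, assuming features `H` and targets `Y` are row-matrices of samples; the placeholder arrays and the 10% trimming fraction are assumptions, not the paper's settings.

```python
import numpy as np
from scipy.spatial import cKDTree

def twonn_id(X, discard_frac=0.1):
    """TwoNN intrinsic-dimension estimate (Facco et al., 2017): the ratio
    mu = r2/r1 of each point's two nearest-neighbor distances is
    Pareto-distributed with shape equal to the intrinsic dimension d,
    so the maximum-likelihood estimate is d = N / sum(log mu)."""
    dists, _ = cKDTree(X).query(X, k=3)  # k=3: each point is its own 0-distance neighbor
    r1, r2 = dists[:, 1], dists[:, 2]
    keep = r1 > 0                        # drop exact duplicates to avoid division by zero
    mu = r2[keep] / r1[keep]
    mu = np.sort(mu)[: int(len(mu) * (1 - discard_frac))]  # trim the noisiest largest ratios
    return len(mu) / np.log(mu).sum()

# Hypothetical shapes: H = last-layer features (N x d_H), Y = targets (N x d_Y).
H = np.random.randn(5000, 64)
Y = np.random.randn(5000, 3)
id_h, id_y = twonn_id(H), twonn_id(Y)
regime = "over-compressed (ID_H < ID_Y)" if id_h < id_y else "under-compressed (ID_H >= ID_Y)"
print(f"ID_H = {id_h:.2f}, ID_Y = {id_y:.2f} -> {regime}")
```

The regime label then follows directly from the sign of ID_H - ID_Y; in practice `H` would be the network's last-layer activations on held-out data rather than a random placeholder.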
Related papers
- Balanced Sharpness-Aware Minimization for Imbalanced Regression [29.03225426559032]
Real-world data often exhibits imbalanced distributions, making regression models perform poorly, especially for target values with rare observations. We propose Balanced Sharpness-Aware Minimization (BSAM) to enforce uniform generalization ability in regression models. In particular, we start from traditional sharpness-aware minimization and then introduce a novel targeted reweighting strategy to homogenize the generalization ability across the observation space.
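The blurb does not spell out BSAM's update rule; the sketch below is a generic sharpness-aware minimization step in which per-sample `weights` (e.g., up-weighting rarely observed target regions) stand in for the targeted reweighting. The function name, `rho`, and the weighting scheme are assumptions, not BSAM's exact method.

```python
import torch

def weighted_sam_step(model, loss_fn, x, y, weights, opt, rho=0.05):
    """One sharpness-aware minimization step with per-sample weights
    (a generic SAM sketch, not BSAM's exact update). `loss_fn` must be
    constructed with reduction='none' so each sample can be reweighted."""

    def weighted_loss():
        per_sample = loss_fn(model(x), y).mean(dim=-1)  # (N,) loss per sample
        return (weights * per_sample).mean()

    # 1) Gradient at the current weights gives the worst-case ascent direction.
    weighted_loss().backward()
    params = [p for p in model.parameters() if p.grad is not None]
    grad_norm = torch.sqrt(sum((p.grad ** 2).sum() for p in params))
    eps = [rho * p.grad / (grad_norm + 1e-12) for p in params]
    opt.zero_grad()

    # 2) Gradient at the perturbed point w + eps drives the actual update.
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.add_(e)
    weighted_loss().backward()
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)  # restore the original weights before stepping
    opt.step()
    opt.zero_grad()
```

Here `loss_fn` would be e.g. `torch.nn.MSELoss(reduction='none')` on multivariate targets, and `weights` a tensor that is larger for rare target values.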
arXiv Detail & Related papers (2025-08-23T09:57:07Z)
- Augmented Regression Models using Neurochaos Learning [1.534667887016089]
We present novel Augmented Regression Models using Neurochaos Learning (NL), where Tracemean features derived from the Neurochaos Learning framework are integrated with traditional regression algorithms. Our approach was evaluated on ten diverse real-life datasets and a synthetically generated dataset of the form $y = mx + c + \epsilon$.
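For concreteness, a dataset of that form takes only a few lines to generate; the slope, intercept, noise scale, and sample size below are arbitrary placeholders, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
m, c, n = 2.0, 0.5, 1000                       # placeholder slope, intercept, sample size
x = rng.uniform(-1.0, 1.0, size=n)
y = m * x + c + rng.normal(0.0, 0.1, size=n)   # y = mx + c + epsilon
```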
arXiv Detail & Related papers (2025-05-19T11:02:14Z)
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
We present a unifying perspective on recent results on ridge regression. We use the basic tools of random matrix theory and free probability, aimed at readers with backgrounds in physics and deep learning. Our results extend and provide a unifying perspective on earlier models of scaling laws.
arXiv Detail & Related papers (2024-05-01T15:59:00Z)
- Theoretical Characterization of the Generalization Performance of Overfitted Meta-Learning [70.52689048213398]
This paper studies the performance of overfitted meta-learning under a linear regression model with Gaussian features.
We find new and interesting properties that do not exist in single-task linear regression.
Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large.
arXiv Detail & Related papers (2023-04-09T20:36:13Z)
- Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z)
- Post-mortem on a deep learning contest: a Simpson's paradox and the complementary roles of scale metrics versus shape metrics [61.49826776409194]
We analyze a corpus of models made publicly available for a contest to predict the generalization accuracy of neural network (NN) models.
We identify what amounts to a Simpson's paradox, where "scale" metrics perform well overall but poorly on sub-partitions of the data.
We present two novel shape metrics, one data-independent and the other data-dependent, which can predict trends in the test accuracy of a series of NNs.
arXiv Detail & Related papers (2021-06-01T19:19:49Z)
- Regression Bugs Are In Your Model! Measuring, Reducing and Analyzing Regressions In NLP Model Updates [68.09049111171862]
This work focuses on quantifying, reducing, and analyzing regression errors in NLP model updates.
We formulate the regression-free model updates into a constrained optimization problem.
We empirically analyze how model ensemble reduces regression.
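One plausible shape for that constrained problem caps the negative-flip rate, i.e., examples the old model $f_{\mathrm{old}}$ predicted correctly but the updated model $f_{\theta}$ gets wrong. This is a sketch; the paper's exact formulation may differ.

```latex
\min_{\theta}\; \mathbb{E}_{(x,y)}\bigl[\ell\bigl(f_{\theta}(x),\,y\bigr)\bigr]
\quad\text{s.t.}\quad
\mathbb{E}_{(x,y)}\bigl[\mathbf{1}\{f_{\theta}(x)\neq y \;\wedge\; f_{\mathrm{old}}(x)=y\}\bigr]\;\le\;\epsilon .
```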
arXiv Detail & Related papers (2021-05-07T03:33:00Z)
- Sparse Symmetric Tensor Regression for Functional Connectivity Analysis [13.482969034243581]
We propose a sparse symmetric tensor regression that further reduces the number of free parameters and achieves superior performance over symmetrized and ordinary CP regression.
We apply the proposed method to a study of Alzheimer's disease (AD) and normal ageing from the Berkeley Aging Cohort Study (BACS) and detect two regions of interest that have been identified important to AD.
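In symmetrized CP form, such a regression might read as follows, with a sparsity penalty imposed on the factors $\boldsymbol{\beta}_r$; this is a sketch of the general idea, not the paper's exact model or penalty.

```latex
g\bigl(\mathbb{E}[y \mid \mathbf{X}]\bigr)
= \alpha + \Bigl\langle \sum_{r=1}^{R} \lambda_r\, \boldsymbol{\beta}_r \circ \boldsymbol{\beta}_r,\; \mathbf{X} \Bigr\rangle ,
\qquad \mathbf{X} \in \mathbb{R}^{p \times p} \ \text{symmetric (e.g., a connectivity matrix)} .
```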
arXiv Detail & Related papers (2020-10-28T02:07:39Z)
- The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization [34.235007566913396]
Modern deep learning models employ considerably more parameters than required to fit the training data. Whereas conventional statistical wisdom suggests such models should drastically overfit, in practice these models generalize remarkably well.
An emerging paradigm for describing this unexpected behavior is in terms of a double descent curve.
We provide a precise high-dimensional analysis of generalization with the Neural Tangent Kernel, which characterizes the behavior of wide neural networks with gradient descent.
arXiv Detail & Related papers (2020-08-15T20:55:40Z)
- Hyperbolic Neural Networks++ [66.16106727715061]
We generalize the fundamental components of neural networks in a single hyperbolic geometry model, namely the Poincaré ball model.
Experiments show the superior parameter efficiency of our methods compared to conventional hyperbolic components, as well as stability and better performance relative to their Euclidean counterparts.
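The basic operation behind such Poincaré-ball layers is Möbius addition; below is a minimal sketch of that standard formula, not the paper's full gyrovector toolkit (the clamping constant is an implementation assumption).

```python
import torch

def mobius_add(x: torch.Tensor, y: torch.Tensor, c: float = 1.0) -> torch.Tensor:
    """Moebius addition x (+)_c y on the Poincare ball of curvature -c.
    Reduces to ordinary vector addition as c -> 0."""
    xy = (x * y).sum(dim=-1, keepdim=True)   # <x, y>
    x2 = (x * x).sum(dim=-1, keepdim=True)   # ||x||^2
    y2 = (y * y).sum(dim=-1, keepdim=True)   # ||y||^2
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    den = 1 + 2 * c * xy + c**2 * x2 * y2
    return num / den.clamp_min(1e-15)        # clamp to avoid division by zero
```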
arXiv Detail & Related papers (2020-06-15T08:23:20Z)
- Dimension Independent Generalization Error by Stochastic Gradient Descent [12.474236773219067]
We present a theory on the generalization error of stochastic gradient descent (SGD) solutions for both convex and locally convex loss functions.
We show that the generalization error does not depend on the dimension $p$, or depends only on a low effective dimension, up to a logarithmic factor of $p$.
arXiv Detail & Related papers (2020-03-25T03:08:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.