Joint Optimization of Neural Autoregressors via Scoring Rules
- URL: http://arxiv.org/abs/2601.05683v1
- Date: Fri, 09 Jan 2026 10:05:07 GMT
- Title: Joint Optimization of Neural Autoregressors via Scoring Rules
- Authors: Jonas Landsgesell
- Abstract summary: Tabular Prior-Data Fitted Network (TabPFN) has demonstrated state-of-the-art performance on various benchmarks. In a naive non-parametric discretization with $N$ bins per dimension, the complexity of an explicit joint grid scales exponentially.
- Score: 0.7877961820015923
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Non-parametric distributional regression has achieved significant milestones in recent years. Among these, the Tabular Prior-Data Fitted Network (TabPFN) has demonstrated state-of-the-art performance on various benchmarks. However, a challenge remains in extending these grid-based approaches to a truly multivariate setting. In a naive non-parametric discretization with $N$ bins per dimension, the complexity of an explicit joint grid scales exponentially and the parameter count of the neural network rises sharply. This scaling is particularly detrimental in low-data regimes, as the final projection layer would require many parameters, leading to severe overfitting and intractability.
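To make the scaling argument concrete, here is a back-of-envelope sketch (not from the paper; the hidden size and bin count are illustrative assumptions) that counts the parameters of a final linear projection whose outputs enumerate an explicit joint grid with $N$ bins per output dimension:

```python
# Back-of-envelope sketch (illustrative, not from the paper): parameter count of
# a final linear projection layer whose outputs enumerate an explicit joint grid
# with `n_bins` bins per output dimension.

def joint_grid_projection_params(hidden_dim: int, n_bins: int, n_dims: int) -> int:
    """Weights + biases of a dense layer mapping hidden_dim -> n_bins ** n_dims logits."""
    n_outputs = n_bins ** n_dims          # explicit joint grid: exponential in n_dims
    return hidden_dim * n_outputs + n_outputs

if __name__ == "__main__":
    for d in (1, 2, 3, 4):
        # 512-dimensional embedding, 100 bins per dimension (assumed values)
        print(d, joint_grid_projection_params(hidden_dim=512, n_bins=100, n_dims=d))
```

With these assumed sizes, the layer grows from roughly 5x10^4 parameters at one output dimension to over 5x10^10 at four, which is the blow-up the abstract refers to.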
Related papers
- From Overfitting to Reliability: Introducing the Hierarchical Approximate Bayesian Neural Network [3.632251954989679]
HABNN is a novel approach that uses a Gaussian-inverse-Wishart distribution as a hyperprior of the network's weights. Results indicate that HABNN not only matches but often outperforms state-of-the-art models.
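As a purely illustrative sketch of the Gaussian-inverse-Wishart construction mentioned above (generic, not HABNN's actual hierarchy; the block size, degrees of freedom, and scale matrix are assumptions), one can draw a covariance from an inverse-Wishart hyperprior and then weights from the corresponding Gaussian:

```python
# Illustrative sketch only: a Gaussian prior over a small weight block whose
# covariance is drawn from an inverse-Wishart hyperprior. This mirrors the
# generic Gaussian-inverse-Wishart construction, not HABNN's actual hierarchy.
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(0)

dim = 4                                   # size of one weight block (toy value)
nu, Psi = dim + 2, np.eye(dim)            # inverse-Wishart df and scale (assumed)

Sigma = invwishart(df=nu, scale=Psi).rvs(random_state=rng)    # covariance hyperprior draw
w = rng.multivariate_normal(mean=np.zeros(dim), cov=Sigma)    # weight sample given Sigma

print("sampled covariance:\n", Sigma)
print("sampled weights:", w)
```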
arXiv Detail & Related papers (2025-12-15T09:08:42Z) - Closed-Form Last Layer Optimization [72.49151473937319]
Under a squared loss, the optimal solution to the linear last layer weights is known in closed-form. We show this is equivalent to alternating between gradient descent steps on the backbone and closed-form updates on the last layer.
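A minimal toy sketch of the alternating scheme described above, assuming a one-layer tanh backbone and a ridge-regularized squared loss (all shapes and hyperparameters are illustrative, not the paper's setup): the last layer is solved in closed form at every step while the backbone takes a gradient step.

```python
# Toy numpy-only sketch: alternate a gradient step on a one-layer "backbone"
# with a closed-form ridge solve for the linear last layer under squared loss.
import numpy as np

rng = np.random.default_rng(0)
X, Y = rng.normal(size=(256, 10)), rng.normal(size=(256, 3))   # toy data (assumed)

B = rng.normal(scale=0.1, size=(10, 16))     # backbone weights; features = tanh(X @ B)
lam, lr = 1e-2, 1e-2                          # ridge strength, backbone step size (assumed)

for step in range(200):
    H = np.tanh(X @ B)                                        # backbone features
    # Closed-form last layer under squared loss (ridge-regularized least squares)
    W = np.linalg.solve(H.T @ H + lam * np.eye(H.shape[1]), H.T @ Y)
    # One gradient step on the backbone, holding the just-solved last layer fixed
    R = H @ W - Y                                             # residuals
    grad_B = X.T @ ((R @ W.T) * (1.0 - H ** 2)) / len(X)      # chain rule through tanh
    B -= lr * grad_B

print("final squared loss:", float(np.mean((np.tanh(X @ B) @ W - Y) ** 2)))
```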
arXiv Detail & Related papers (2025-10-06T09:14:39Z) - Decentralized Nonconvex Composite Federated Learning with Gradient Tracking and Momentum [78.27945336558987]
Decentralized federated learning (DFL) eliminates reliance on the client-server architecture. Non-smooth regularization is often incorporated into machine learning tasks. We propose a novel DNCFL algorithm to solve these problems.
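The gradient-tracking idea named in the summary can be illustrated on a toy decentralized quadratic problem. The sketch below is a generic gradient-tracking loop (momentum and the non-smooth composite term are omitted, and the ring topology, mixing matrix, and step size are assumptions), not the paper's DNCFL algorithm.

```python
# Generic gradient-tracking sketch on a toy decentralized quadratic problem
# (ring of 4 agents). Illustrative only; not the paper's DNCFL algorithm.
import numpy as np

n_agents, dim, alpha = 4, 3, 0.05
# Doubly-stochastic mixing matrix for a ring (each agent averages with its neighbors)
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])

rng = np.random.default_rng(0)
targets = rng.normal(size=(n_agents, dim))            # agent i minimizes ||x - targets[i]||^2 / 2
grad = lambda X: X - targets                          # stacked local gradients

X = np.zeros((n_agents, dim))                         # local iterates
Y = grad(X)                                           # gradient trackers, initialized to local grads
for _ in range(300):
    X_new = W @ X - alpha * Y                         # mix with neighbors, step along the tracker
    Y = W @ Y + grad(X_new) - grad(X)                 # track the network-average gradient
    X = X_new

print("consensus average:", X.mean(axis=0))           # should approach the mean of the targets
print("true minimizer:   ", targets.mean(axis=0))
```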
arXiv Detail & Related papers (2025-04-17T08:32:25Z) - Covering Numbers for Deep ReLU Networks with Applications to Function Approximation and Nonparametric Regression [4.297070083645049]
We develop tight (up to a multiplicative constant) lower and upper bounds on the covering numbers of fully-connected networks.
Thanks to the tightness of the bounds, a fundamental understanding of the impact of sparsity, quantization, bounded vs. unbounded weights, and network output truncation can be developed.
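For context, the standard definition of an $\epsilon$-covering number of a function class $\mathcal{F}$ under a metric $\rho$ (generic notation, not necessarily the paper's) is:

```latex
% Standard definition of an epsilon-covering number (generic notation).
\[
  N(\epsilon, \mathcal{F}, \rho)
  \;=\;
  \min\Bigl\{\, m \in \mathbb{N} \;:\;
    \exists\, f_1, \dots, f_m \in \mathcal{F}
    \text{ such that }
    \sup_{f \in \mathcal{F}} \min_{1 \le i \le m} \rho(f, f_i) \le \epsilon
  \,\Bigr\}.
\]
```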
arXiv Detail & Related papers (2024-10-08T21:23:14Z) - Optimization Guarantees of Unfolded ISTA and ADMM Networks With Smooth Soft-Thresholding [57.71603937699949]
We study optimization guarantees, i.e., achieving near-zero training loss with the increase in the number of learning epochs.
We show that the threshold on the number of training samples increases with the increase in the network width.
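The unfolded-ISTA structure the result refers to can be sketched in a few lines; the softplus-based smooth surrogate of soft-thresholding below is an assumption for illustration and not necessarily the paper's exact operator, and the problem sizes are toy values.

```python
# Minimal sketch: a few unfolded ISTA iterations for sparse recovery, using a
# smooth surrogate of soft-thresholding (assumed form, for illustration only).
import numpy as np

def soft_threshold(z, lam):
    """Exact soft-thresholding: sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def smooth_soft_threshold(z, lam, beta=50.0):
    """Smooth surrogate: softplus replaces max(., 0); approaches the exact operator as beta grows."""
    return np.sign(z) * np.logaddexp(0.0, beta * (np.abs(z) - lam)) / beta

rng = np.random.default_rng(0)
A = rng.normal(size=(30, 100)) / np.sqrt(30)          # toy measurement matrix
x_true = np.zeros(100)
x_true[rng.choice(100, 5, replace=False)] = rng.normal(size=5)
y = A @ x_true

eta, lam = 1.0 / np.linalg.norm(A, 2) ** 2, 0.05      # step size, l1 weight (assumed)
x = np.zeros(100)
for _ in range(20):                                   # 20 "unfolded" layers
    x = smooth_soft_threshold(x - eta * A.T @ (A @ x - y), eta * lam)

print("recovery error:", np.linalg.norm(x - x_true))
```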
arXiv Detail & Related papers (2023-09-12T13:03:47Z) - Generalization Error Bounds for Deep Neural Networks Trained by SGD [3.148524502470734]
Generalization error bounds for deep neural networks trained by stochastic gradient descent (SGD) are derived.
The bounds explicitly depend on the loss along the training trajectory.
Results show that our bounds are non-vacuous and robust to changes in the network architecture and hyperparameters.
arXiv Detail & Related papers (2022-06-07T13:46:10Z) - A Biased Graph Neural Network Sampler with Near-Optimal Regret [57.70126763759996]
Graph neural networks (GNN) have emerged as a vehicle for applying deep network architectures to graph and relational data.
In this paper, we build upon existing work and treat GNN neighbor sampling as a multi-armed bandit problem.
We introduce a newly-designed reward function that introduces some degree of bias designed to reduce variance and avoid unstable, possibly-unbounded payouts.
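A toy sketch of treating neighbor selection as a multi-armed bandit with bounded payouts (an illustrative UCB-style sampler with clipped rewards; the reward model, clipping range, and sampling budget are assumptions, not the paper's design):

```python
# Toy sketch: neighbor selection for one target node as a multi-armed bandit,
# with rewards clipped to a fixed range so payouts stay bounded. Illustrative
# UCB-style sampler only; not the paper's actual sampler or reward function.
import numpy as np

rng = np.random.default_rng(0)
n_neighbors, budget, rounds = 20, 4, 500
true_value = rng.uniform(0.0, 1.0, size=n_neighbors)   # hidden "usefulness" of each neighbor

counts = np.ones(n_neighbors)                           # pull counts (init 1 to avoid div by zero)
means = np.zeros(n_neighbors)                           # running reward estimates

for t in range(1, rounds + 1):
    ucb = means + np.sqrt(2.0 * np.log(t + 1) / counts)      # optimism bonus
    chosen = np.argsort(-ucb)[:budget]                        # sample `budget` neighbors per round
    noisy = true_value[chosen] + rng.normal(scale=0.3, size=budget)
    reward = np.clip(noisy, 0.0, 1.0)                         # clipping keeps payouts bounded
    means[chosen] += (reward - means[chosen]) / (counts[chosen] + 1)
    counts[chosen] += 1

print("top neighbors by estimate:", np.argsort(-means)[:budget])
print("top neighbors by truth:   ", np.argsort(-true_value)[:budget])
```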
arXiv Detail & Related papers (2021-03-01T15:55:58Z) - A Fully Tensorized Recurrent Neural Network [48.50376453324581]
We introduce a "fully tensorized" RNN architecture which jointly encodes the separate weight matrices within each recurrent cell.
This approach reduces model size by several orders of magnitude, while still maintaining similar or better performance compared to standard RNNs.
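The compression effect can be illustrated with a simple parameter count. The shared low-rank factorization below is only a stand-in for the joint-compression idea, since the paper's actual construction is a tensor-train factorization of the recurrent weights, and the dimensions are assumed values.

```python
# Parameter-count sketch: a standard RNN cell vs. a jointly factorized cell in
# which the input-to-hidden and hidden-to-hidden matrices share low-rank factors.
# Illustrative stand-in only; not the paper's tensor-train parameterization.
def standard_rnn_params(d_in: int, d_hid: int) -> int:
    return d_in * d_hid + d_hid * d_hid + d_hid          # W_x, W_h, bias

def shared_lowrank_rnn_params(d_in: int, d_hid: int, rank: int) -> int:
    # [W_x; W_h] ~= U @ V with U of shape (d_in + d_hid, rank) and V of shape (rank, d_hid)
    return (d_in + d_hid) * rank + rank * d_hid + d_hid

if __name__ == "__main__":
    d_in, d_hid = 512, 1024                              # assumed sizes
    print("standard:", standard_rnn_params(d_in, d_hid))
    print("rank-32 :", shared_lowrank_rnn_params(d_in, d_hid, rank=32))
```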
arXiv Detail & Related papers (2020-10-08T18:24:12Z) - The Heavy-Tail Phenomenon in SGD [7.366405857677226]
We show that depending on the structure of the Hessian of the loss at the minimum, the SGD iterates will converge to a heavy-tailed stationary distribution.
We translate our results into insights about the behavior of SGD in deep learning.
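A toy simulation illustrates the phenomenon: constant step-size SGD on one-dimensional least squares follows a multiplicative recursion whose stationary distribution can have heavy tails, which the excess kurtosis of the collected iterates (0 for a Gaussian) makes visible. The step size and data distribution below are assumptions chosen for illustration.

```python
# Toy simulation: constant step-size SGD on a 1-D least-squares problem. Each
# update has the multiplicative form x <- (1 - eta * a^2) * x + eta * a * b,
# whose stationary distribution can be heavy-tailed. Step size and data scales
# are arbitrary assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
eta, burn_in, n_steps = 0.5, 10_000, 200_000

x, samples = 0.0, []
for k in range(n_steps):
    a, b = rng.normal(), rng.normal()            # fresh data point each step
    x = (1.0 - eta * a * a) * x + eta * a * b    # SGD step on 0.5 * (a * x - b) ** 2
    if k >= burn_in:
        samples.append(x)

samples = np.asarray(samples)
kurt = np.mean((samples - samples.mean()) ** 4) / samples.var() ** 2 - 3.0
print("excess kurtosis of SGD iterates:", kurt)   # well above 0 -> heavier than Gaussian tails
```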
arXiv Detail & Related papers (2020-06-08T16:43:56Z) - Beyond Dropout: Feature Map Distortion to Regularize Deep Neural Networks [107.77595511218429]
In this paper, we investigate the empirical Rademacher complexity related to intermediate layers of deep neural networks.
We propose a feature distortion method (Disout) for addressing the aforementioned problem.
The superiority of the proposed feature map distortion for producing deep neural networks with higher test performance is analyzed and demonstrated.
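As a loose illustration of the feature-map-distortion idea (this is not the actual Disout operator; the block size, distortion rate, and noise strength are invented for the sketch), one can perturb randomly selected patches of a feature map with noise rather than zeroing them as dropout does.

```python
# Illustrative only: a dropout-like regularizer that perturbs randomly chosen
# feature-map blocks with noise instead of zeroing them. Not the actual Disout
# operator; all shapes and parameters are assumptions.
import numpy as np

def distort_features(feat, block: int = 2, rate: float = 0.1, strength: float = 0.5,
                     rng=np.random.default_rng(0)):
    """Add noise to randomly selected (block x block) patches of a (C, H, W) feature map."""
    out = feat.copy()
    C, H, W = feat.shape
    for c in range(C):
        for i in range(0, H - block + 1, block):
            for j in range(0, W - block + 1, block):
                if rng.random() < rate:
                    out[c, i:i + block, j:j + block] += strength * rng.normal(size=(block, block))
    return out

feat = np.random.default_rng(1).normal(size=(8, 16, 16))   # toy feature map
print("mean abs change:", np.mean(np.abs(distort_features(feat) - feat)))
```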
arXiv Detail & Related papers (2020-02-23T13:59:13Z)