Related papers: Investigating the Histogram Loss in Regression

URL: http://arxiv.org/abs/2402.13425v2
Date: Sat, 19 Oct 2024 21:53:25 GMT
Title: Investigating the Histogram Loss in Regression
Authors: Ehsan Imani, Kai Luedemann, Sam Scholnick-Hughes, Esraa Elelimy, Martha White
Abstract summary: Histogram Loss is a regression approach to learning the conditional distribution of a target variable. We show that the benefits of learning distributions in this setup come from improvements in optimization rather than modelling extra information.
Score: 16.83443393563771
License: http://creativecommons.org/licenses/by/4.0/
Abstract: It is becoming increasingly common in regression to train neural networks that model the entire distribution even if only the mean is required for prediction. This additional modeling often comes with performance gain and the reasons behind the improvement are not fully known. This paper investigates a recent approach to regression, the Histogram Loss, which involves learning the conditional distribution of the target variable by minimizing the cross-entropy between a target distribution and a flexible histogram prediction. We design theoretical and empirical analyses to determine why and when this performance gain appears, and how different components of the loss contribute to it. Our results suggest that the benefits of learning distributions in this setup come from improvements in optimization rather than modelling extra information. We then demonstrate the viability of the Histogram Loss in common deep learning applications without a need for costly hyperparameter tuning.
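To make the abstract's description concrete, here is a minimal NumPy sketch of a histogram-style loss: the network's logits define a histogram prediction over fixed bins, and the loss is the cross-entropy between a target distribution and that histogram. The Gaussian-smoothed target, the bin layout, and every function name here (`gaussian_target_histogram`, `histogram_loss`, `predicted_mean`) are illustrative assumptions, not the authors' implementation.

```python
import math
import numpy as np

def gaussian_target_histogram(y, edges, sigma):
    """Project a Gaussian N(y, sigma^2) onto the bins and renormalize.

    Assumption: a Gaussian-smoothed target is one possible choice of
    target distribution; the abstract does not fix this choice.
    """
    cdf = np.array([0.5 * (1.0 + math.erf((e - y) / (sigma * math.sqrt(2.0))))
                    for e in edges])
    mass = np.diff(cdf)          # probability mass falling in each bin
    return mass / mass.sum()     # renormalize over the finite support

def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = logits - np.max(logits)
    p = np.exp(z)
    return p / p.sum()

def histogram_loss(logits, target_probs, eps=1e-12):
    """Cross-entropy between the target histogram and the predicted one."""
    pred = softmax(logits)
    return float(-np.sum(target_probs * np.log(pred + eps)))

def predicted_mean(logits, edges):
    """Point prediction: expected value of the predicted histogram."""
    centers = 0.5 * (edges[:-1] + edges[1:])
    return float(np.dot(softmax(logits), centers))

# Hypothetical setup: 10 equal-width bins on [0, 10], scalar target y = 3.2.
edges = np.linspace(0.0, 10.0, 11)
target = gaussian_target_histogram(y=3.2, edges=edges, sigma=1.0)
logits = np.zeros(10)            # untrained model: uniform histogram
loss = histogram_loss(logits, target)
mean = predicted_mean(logits, edges)
```

In training, the logits would come from the network's final layer and the loss would be minimized by gradient descent; only the mean of the predicted histogram is used as the regression output.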

Related papers

Rectifying Regression in Reinforcement Learning [51.28909745713678]
We show that mean absolute error is a better prediction objective than the traditional mean squared error for controlling the learned policy's suboptimality gap. We present results that different loss functions are better aligned with these different regression objectives.
arXiv Detail & Related papers (2025-10-01T13:32:07Z)
Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing [58.52119063742121]
Retraining a model using its own predictions together with the original, potentially noisy labels is a well-known strategy for improving the model performance. This paper addresses the question of how to optimally combine the model's predictions and the provided labels. Our main contribution is the derivation of the Bayes optimal aggregator function to combine the current model's predictions and the given labels.
arXiv Detail & Related papers (2025-05-21T07:16:44Z)
Continuous Visual Autoregressive Generation via Score Maximization [69.67438563485887]
We introduce a Continuous VAR framework that enables direct visual autoregressive generation without vector quantization. Within this framework, all we need is to select a strictly proper score and set it as the training objective to optimize.
arXiv Detail & Related papers (2025-05-12T17:58:14Z)
Towards Robust Out-of-Distribution Generalization: Data Augmentation and Neural Architecture Search Approaches [4.577842191730992]
We study ways toward robust OoD generalization for deep learning. We first propose a novel and effective approach to disentangle the spurious correlation between features that are not essential for recognition. We then study the problem of strengthening neural architecture search in OoD scenarios.
arXiv Detail & Related papers (2024-10-25T20:50:32Z)
Learning Latent Graph Structures and their Uncertainty [63.95971478893842]
Graph Neural Networks (GNNs) use relational information as an inductive bias to enhance the model's accuracy. As task-relevant relations might be unknown, graph structure learning approaches have been proposed to learn them while solving the downstream prediction task.
arXiv Detail & Related papers (2024-05-30T10:49:22Z)
Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions. We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance. Our framework is tested empirically over clean and noisy datasets.
arXiv Detail & Related papers (2024-05-29T01:32:17Z)
Out of the Ordinary: Spectrally Adapting Regression for Covariate Shift [12.770658031721435]
We propose a method for adapting the weights of the last layer of a pre-trained neural regression model to perform better on input data originating from a different distribution. We demonstrate how this lightweight spectral adaptation procedure can improve out-of-distribution performance for synthetic and real-world datasets.
arXiv Detail & Related papers (2023-12-29T04:15:58Z)
Learning to Reweight for Graph Neural Network [63.978102332612906]
Graph Neural Networks (GNNs) show promising results for graph tasks. Existing GNNs' generalization ability will degrade when there exist distribution shifts between testing and training graph data. We propose a novel nonlinear graph decorrelation method, which can substantially improve the out-of-distribution generalization ability.
arXiv Detail & Related papers (2023-12-19T12:25:10Z)
Learning Rate Schedules in the Presence of Distribution Shift [18.310336156637774]
We design learning rate schedules that minimize cumulative regret when learning in the presence of a changing data distribution. We provide experiments for high-dimensional regression models to illustrate these schedules and their regret.
arXiv Detail & Related papers (2023-03-27T23:29:02Z)
Deep Autoregressive Regression [5.257719744958367]
We show that a major limitation of regression using a mean-squared error loss is its sensitivity to the scale of its targets. We propose a novel approach to training deep learning models on real-valued regression targets, autoregressive regression.
arXiv Detail & Related papers (2022-11-14T15:22:20Z)
Optimal Propagation for Graph Neural Networks [51.08426265813481]
We propose a bi-level optimization approach for learning the optimal graph structure. We also explore a low-rank approximation model for further reducing the time complexity.
arXiv Detail & Related papers (2022-05-06T03:37:00Z)
OOD-GNN: Out-of-Distribution Generalized Graph Neural Network [73.67049248445277]
Graph neural networks (GNNs) have achieved impressive performance when testing and training graph data come from identical distribution. Existing GNNs lack out-of-distribution generalization abilities so that their performance substantially degrades when there exist distribution shifts between testing and training graph data. We propose an out-of-distribution generalized graph neural network (OOD-GNN) for achieving satisfactory performance on unseen testing graphs that have different distributions with training graphs.
arXiv Detail & Related papers (2021-12-07T16:29:10Z)
Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose. We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
A Locally Adaptive Interpretable Regression [7.4267694612331905]
Linear regression is one of the most interpretable prediction models. In this work, we introduce a locally adaptive interpretable regression (LoAIR). Our model achieves comparable or better predictive performance than the other state-of-the-art baselines.
arXiv Detail & Related papers (2020-05-07T09:26:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.