Deep Autoregressive Regression
- URL: http://arxiv.org/abs/2211.07447v2
- Date: Wed, 16 Nov 2022 00:25:11 GMT
- Title: Deep Autoregressive Regression
- Authors: Adam Khakhar, Jacob Buckman
- Abstract summary: We show that a major limitation of regression using a mean-squared error loss is its sensitivity to the scale of its targets.
We propose a novel approach to training deep learning models on real-valued regression targets, autoregressive regression.
- Score: 5.257719744958367
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we demonstrate that a major limitation of regression using a
mean-squared error loss is its sensitivity to the scale of its targets. This
makes learning settings consisting of several subtasks with differently-scaled
targets challenging, and causes algorithms to require task-specific learning
rate tuning. A recently-proposed alternative loss function, known as histogram
loss, avoids this issue. However, its computational cost grows linearly with
the number of buckets in the histogram, which renders prediction with
real-valued targets intractable. To address this issue, we propose a novel
approach to training deep learning models on real-valued regression targets,
autoregressive regression, which learns a high-fidelity distribution by
utilizing an autoregressive target decomposition. We demonstrate that this
training objective allows us to solve regression tasks involving multiple
targets with different scales.
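The abstract does not include an implementation, but the core idea can be sketched: decompose each real-valued target into a short sequence of discrete tokens (for example, base-10 digits at decreasing scales) and train a small head to predict the tokens autoregressively with a cross-entropy loss. The decomposition, base, network sizes, and all names below are illustrative assumptions, not the authors' code.

```python
# Illustrative sketch of autoregressive regression (not the authors' code):
# a target in [0, 1) is decomposed into base-B "digits", and a small head
# predicts each digit conditioned on the digits decoded so far.
import torch
import torch.nn as nn
import torch.nn.functional as F

BASE, N_DIGITS = 10, 4  # assumed: 4 base-10 digits, i.e. ~1e-4 resolution

def to_digits(y, base=BASE, n_digits=N_DIGITS):
    """Decompose targets in [0, 1) into a (batch, n_digits) tensor of digits."""
    digits, frac = [], y.clone()
    for _ in range(n_digits):
        frac = frac * base
        d = frac.floor().clamp(0, base - 1)
        digits.append(d.long())
        frac = frac - d
    return torch.stack(digits, dim=1)

class AutoregressiveHead(nn.Module):
    def __init__(self, feat_dim, base=BASE, n_digits=N_DIGITS, hidden=64):
        super().__init__()
        self.base, self.n_digits = base, n_digits
        self.digit_emb = nn.Embedding(base, hidden)      # embeds the previous digit
        self.step_emb = nn.Embedding(n_digits, hidden)   # embeds the step index
        self.proj = nn.Linear(feat_dim, hidden)
        self.rnn = nn.GRUCell(hidden, hidden)
        self.out = nn.Linear(hidden, base)

    def forward(self, feats, target_digits):
        """Teacher-forced training pass; returns the mean per-digit cross-entropy."""
        h = torch.tanh(self.proj(feats))  # initial state from backbone features
        prev = torch.zeros(feats.size(0), dtype=torch.long, device=feats.device)
        loss = 0.0
        for t in range(self.n_digits):
            inp = self.digit_emb(prev) + self.step_emb(torch.full_like(prev, t))
            h = self.rnn(inp, h)
            logits = self.out(h)  # (batch, base): one small softmax per step
            loss = loss + F.cross_entropy(logits, target_digits[:, t])
            prev = target_digits[:, t]  # teacher forcing
        return loss / self.n_digits

# toy usage: random backbone features and targets rescaled to [0, 1)
feats, y = torch.randn(32, 128), torch.rand(32)
head = AutoregressiveHead(feat_dim=128)
head(feats, to_digits(y)).backward()
```

Under these assumptions, reaching the same resolution with a single histogram head would require base**n_digits buckets in one softmax, whereas the factorization above uses n_digits softmaxes of size base; this is, roughly, the linear-in-buckets cost the abstract identifies and the autoregressive decomposition avoids. At inference, the digits would be decoded greedily (or sampled) and recombined into a real value.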
Related papers
- A Fresh Take on Stale Embeddings: Improving Dense Retriever Training with Corrector Networks [81.2624272756733]
In dense retrieval, deep encoders provide embeddings for both inputs and targets.
We train a small parametric corrector network that adjusts stale cached target embeddings.
Our approach matches state-of-the-art results even when no target embedding updates are made during training.
arXiv Detail & Related papers (2024-09-03T13:29:13Z)
- Investigating the Histogram Loss in Regression [16.83443393563771]
Histogram Loss is a regression approach to learning the conditional distribution of a target variable.
We show that the benefits of learning distributions in this setup come from improvements in optimization rather than modelling extra information (a minimal sketch of a histogram loss head appears after this list).
arXiv Detail & Related papers (2024-02-20T23:29:41Z)
- Deep Imbalanced Regression via Hierarchical Classification Adjustment [50.19438850112964]
Regression tasks in computer vision are often formulated as classification by quantizing the target space into classes (a toy coarse-to-fine version is sketched after this list).
The majority of training samples lie in a head range of target values, while a minority of samples span a usually larger tail range.
We propose to construct hierarchical classifiers for solving imbalanced regression tasks.
Our novel hierarchical classification adjustment (HCA) for imbalanced regression shows superior results on three diverse tasks.
arXiv Detail & Related papers (2023-10-26T04:54:39Z)
- RegExplainer: Generating Explanations for Graph Neural Networks in Regression Tasks [10.473178462412584]
We propose a novel explanation method to interpret graph regression models (XAIG-R).
Our method addresses the distribution-shift problem and the issue of continuously ordered decision boundaries.
We present a self-supervised learning strategy to tackle the continuously ordered labels in regression tasks.
arXiv Detail & Related papers (2023-07-15T16:16:22Z)
- Semi-Supervised Deep Regression with Uncertainty Consistency and Variational Model Ensembling via Bayesian Neural Networks [31.67508478764597]
We propose a novel approach to semi-supervised regression, namely Uncertainty-Consistent Variational Model Ensembling (UCVME).
Our consistency loss significantly improves uncertainty estimates and allows higher-quality pseudo-labels to be assigned greater importance under heteroscedastic regression (a sketch of this uncertainty-weighting idea appears after this list).
Experiments show that our method outperforms state-of-the-art alternatives on different tasks and can be competitive with supervised methods that use full labels.
arXiv Detail & Related papers (2023-02-15T10:40:51Z)
- Training trajectories, mini-batch losses and the curious role of the learning rate [13.848916053916618]
We show that stochastic gradient descent plays a fundamental role in nearly all applications of deep learning.
We propose a simple model and a geometric interpretation that allows us to analyze the relationship between the gradients of mini-batches and the full batch.
In particular, a very low loss value can be reached in just one step of descent with a large enough learning rate.
arXiv Detail & Related papers (2023-01-05T21:58:46Z)
- X-model: Improving Data Efficiency in Deep Learning with A Minimax Model [78.55482897452417]
We aim to improve data efficiency for both classification and regression setups in deep learning.
To combine the strengths of both worlds, we propose a novel X-model.
X-model plays a minimax game between the feature extractor and task-specific heads.
arXiv Detail & Related papers (2021-10-09T13:56:48Z)
- ReLU Regression with Massart Noise [52.10842036932169]
We study the fundamental problem of ReLU regression, where the goal is to fit Rectified Linear Units (ReLUs) to data.
We focus on ReLU regression in the Massart noise model, a natural and well-studied semi-random noise model.
We develop an efficient algorithm that achieves exact parameter recovery in this model.
arXiv Detail & Related papers (2021-09-10T02:13:22Z)
- Regression Bugs Are In Your Model! Measuring, Reducing and Analyzing Regressions In NLP Model Updates [68.09049111171862]
This work focuses on quantifying, reducing, and analyzing regression errors in NLP model updates.
We formulate the regression-free model updates into a constrained optimization problem.
We empirically analyze how model ensemble reduces regression.
arXiv Detail & Related papers (2021-05-07T03:33:00Z)
- Regressive Domain Adaptation for Unsupervised Keypoint Detection [67.2950306888855]
Domain adaptation (DA) aims at transferring knowledge from a labeled source domain to an unlabeled target domain.
We present a method of regressive domain adaptation (RegDA) for unsupervised keypoint detection.
Our method brings large improvements of 8% to 11% in terms of PCK on different datasets.
arXiv Detail & Related papers (2021-03-10T16:45:22Z)
- Deep Ordinal Regression with Label Diversity [19.89482062012177]
We propose that using several discrete data representations simultaneously can improve neural network learning, as sketched below.
Our approach is end-to-end differentiable and can be added as a simple extension to conventional learning methods.
arXiv Detail & Related papers (2020-06-29T08:23:43Z)
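For the label-diversity entry just above, here is a toy sketch of the general idea: quantize the same scalar target under several different binnings, train one classification head per binning, and average the decoded predictions. The bin counts, offsets, decoding rule, and all names are assumptions for illustration, not the paper's construction.

```python
# Toy sketch of "label diversity": one scalar target, several quantizations,
# each trained as a classification task; bin settings are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LabelDiversityHead(nn.Module):
    def __init__(self, feat_dim, y_min=0.0, y_max=1.0,
                 n_bins=(10, 20, 40), offsets=(0.0, 0.3, 0.7)):
        super().__init__()
        self.y_min, self.y_max = y_min, y_max
        self.n_bins, self.offsets = n_bins, offsets
        self.heads = nn.ModuleList(nn.Linear(feat_dim, b) for b in n_bins)

    def _bin_index(self, y, b, off):
        # shift-then-quantize; different (b, off) pairs give diverse labelings
        pos = (y - self.y_min) / (self.y_max - self.y_min) * b + off
        return pos.floor().clamp(0, b - 1).long()

    def loss(self, feats, y):
        losses = [F.cross_entropy(head(feats), self._bin_index(y, b, off))
                  for head, b, off in zip(self.heads, self.n_bins, self.offsets)]
        return torch.stack(losses).mean()

    @torch.no_grad()
    def predict(self, feats):
        preds = []
        for head, b, off in zip(self.heads, self.n_bins, self.offsets):
            idx = head(feats).argmax(dim=-1).float()
            center = (idx - off + 0.5) / b            # map bin back to [0, 1]
            preds.append(self.y_min + center * (self.y_max - self.y_min))
        return torch.stack(preds).mean(dim=0)          # ensemble over representations

feats, y = torch.randn(16, 128), torch.rand(16)
head = LabelDiversityHead(128)
head.loss(feats, y).backward()
y_hat = head.predict(feats)  # shape (16,)
```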
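The main abstract and the "Investigating the Histogram Loss in Regression" entry both refer to the histogram loss, in which a classifier over a fixed set of buckets is trained against a (typically smoothed) target distribution. Below is a minimal sketch of one common Gaussian-smoothed variant; the bucket range, bucket count, and smoothing width are assumed values.

```python
# Minimal sketch of a histogram loss head (assumed Gaussian-smoothed targets):
# each scalar target becomes a soft distribution over fixed buckets and the
# head is trained with a cross-entropy objective against that distribution.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_BUCKETS, Y_MIN, Y_MAX, SIGMA = 100, 0.0, 1.0, 0.02   # illustrative settings
centers = torch.linspace(Y_MIN, Y_MAX, N_BUCKETS)       # bucket centers

def target_histogram(y, centers=centers, sigma=SIGMA):
    """Soft (Gaussian-smoothed) target distribution over the buckets."""
    logits = -0.5 * ((y[:, None] - centers[None, :]) / sigma) ** 2
    return F.softmax(logits, dim=-1)

head = nn.Linear(128, N_BUCKETS)                         # classifier over buckets

feats, y = torch.randn(32, 128), torch.rand(32)
log_probs = F.log_softmax(head(feats), dim=-1)
loss = -(target_histogram(y) * log_probs).sum(dim=-1).mean()
loss.backward()

# point prediction: expected value under the predicted histogram
y_hat = (log_probs.exp() * centers).sum(dim=-1)
```

Both the final linear layer and the target smoothing scale with the number of buckets, so increasing precision by narrowing the buckets directly increases the cost; this is the linear growth noted in the main abstract.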
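The "Deep Imbalanced Regression via Hierarchical Classification Adjustment" entry describes quantizing the target space and building hierarchical classifiers. The toy two-level coarse-to-fine head below only illustrates that general shape; it is not the paper's HCA method, and the bin counts and decoding rule are assumptions.

```python
# Toy two-level coarse-to-fine quantization for regression-as-classification
# (an illustration of the general idea only, not the paper's HCA method).
import torch
import torch.nn as nn
import torch.nn.functional as F

COARSE, FINE = 4, 8            # assumed: 4 coarse ranges, 8 fine bins each

class CoarseToFineHead(nn.Module):
    def __init__(self, feat_dim, y_min=0.0, y_max=1.0):
        super().__init__()
        self.y_min, self.y_max = y_min, y_max
        self.coarse = nn.Linear(feat_dim, COARSE)
        self.fine = nn.Linear(feat_dim, COARSE * FINE)   # fine bins per coarse range

    def _labels(self, y):
        pos = (y - self.y_min) / (self.y_max - self.y_min) * (COARSE * FINE)
        fine_global = pos.floor().clamp(0, COARSE * FINE - 1).long()
        return fine_global // FINE, fine_global           # (coarse id, global fine id)

    def loss(self, feats, y):
        coarse_t, fine_t = self._labels(y)
        return (F.cross_entropy(self.coarse(feats), coarse_t)
                + F.cross_entropy(self.fine(feats), fine_t))

    @torch.no_grad()
    def predict(self, feats):
        # decode the fine bin, restricted to the most probable coarse range
        coarse_id = self.coarse(feats).argmax(dim=-1)
        fine_logits = self.fine(feats).view(-1, COARSE, FINE)
        fine_id = fine_logits[torch.arange(feats.size(0)), coarse_id].argmax(dim=-1)
        center = (coarse_id * FINE + fine_id).float() + 0.5
        return self.y_min + center / (COARSE * FINE) * (self.y_max - self.y_min)

feats, y = torch.randn(16, 64), torch.rand(16)
head = CoarseToFineHead(64)
head.loss(feats, y).backward()
```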
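Finally, the UCVME entry mentions giving higher-quality pseudo-labels greater importance under heteroscedastic regression. The sketch below shows only that uncertainty-weighting idea with two heteroscedastic networks; it is not the UCVME objective, and the architecture and consistency term are illustrative assumptions.

```python
# Sketch of uncertainty-weighted pseudo-labels under heteroscedastic regression
# (an illustration of the weighting idea only, not the exact UCVME objective).
import torch
import torch.nn as nn

class HeteroscedasticNet(nn.Module):
    """Predicts a mean and a log-variance for each input."""
    def __init__(self, in_dim=32, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 2))

    def forward(self, x):
        out = self.body(x)
        return out[:, 0], out[:, 1]           # mean, log-variance

def weighted_pseudo_label_loss(student, teacher, x_unlabeled):
    """Heteroscedastic NLL against teacher pseudo-labels: samples with low
    predicted variance (high confidence) automatically get larger weight."""
    with torch.no_grad():
        pseudo_mu, pseudo_logvar = teacher(x_unlabeled)
    mu, logvar = student(x_unlabeled)
    nll = torch.exp(-pseudo_logvar) * (mu - pseudo_mu) ** 2 + pseudo_logvar
    consistency = (logvar - pseudo_logvar) ** 2   # align uncertainty estimates
    return nll.mean() + consistency.mean()

student, teacher = HeteroscedasticNet(), HeteroscedasticNet()
x_unlabeled = torch.randn(64, 32)
weighted_pseudo_label_loss(student, teacher, x_unlabeled).backward()
```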