Learning Randomly Perturbed Structured Predictors for Direct Loss
Minimization
- URL: http://arxiv.org/abs/2007.05724v2
- Date: Mon, 14 Jun 2021 08:55:45 GMT
- Title: Learning Randomly Perturbed Structured Predictors for Direct Loss
Minimization
- Authors: Hedda Cohen Indelman, Tamir Hazan
- Abstract summary: Direct loss minimization is a popular approach for learning predictors over structured label spaces.
We show that learning the variance of these randomized predictors strikes a better balance between the learned score function and the random noise in structured prediction.
- Score: 18.981576950505442
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Direct loss minimization is a popular approach for learning
predictors over structured label spaces. This approach is computationally
appealing as it replaces integration with optimization and allows gradients
to be propagated through a deep net using loss-perturbed prediction.
Recently, this technique was extended to generative models by introducing a
randomized predictor that samples a structure from a randomly perturbed
score function. In this work, we learn the variance of these randomized
structured predictors and show that doing so strikes a better balance
between the learned score function and the random noise in structured
prediction. We demonstrate empirically the effectiveness of learning this
balance between the signal and the random noise in structured discrete
spaces.
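The following is a minimal sketch (not the authors' implementation) of the mechanism the abstract describes: a randomized predictor that takes the argmax of a Gumbel-perturbed score, and a direct-loss-minimization update that compares the loss-perturbed prediction with the plain prediction. The toy task is unstructured multiclass labeling rather than a structured label space, the names (perturbed_argmax, direct_loss_grads, sigma) are illustrative assumptions, and the finite-difference update for the noise scale sigma is a crude stand-in for the paper's variance learning.

```python
import numpy as np

rng = np.random.default_rng(0)

def hamming_loss(y_pred, y_true):
    # Task loss for the toy problem: 0/1 error on a single label.
    return float(y_pred != y_true)

def perturbed_argmax(scores, sigma, rng):
    # Randomized predictor: argmax of the score perturbed by Gumbel noise,
    #   y_hat = argmax_y [ score(y) + sigma * gamma(y) ].
    gamma = rng.gumbel(size=scores.shape)
    return int(np.argmax(scores + sigma * gamma))

def direct_loss_grads(W, sigma, x, y_true, eps=1.0):
    # Direct-loss-minimization style update: compare the loss-perturbed
    # prediction with the plain (noise-perturbed) prediction,
    #   grad_W ~ (1/eps) * [ phi(x, y_eps) - phi(x, y_hat) ],
    # where y_eps maximizes score + sigma*gamma + eps*loss.
    scores = W @ x
    gamma = rng.gumbel(size=scores.shape)
    losses = np.array([hamming_loss(y, y_true) for y in range(len(scores))])
    y_hat = int(np.argmax(scores + sigma * gamma))
    y_eps = int(np.argmax(scores + sigma * gamma + eps * losses))
    grad_W = np.zeros_like(W)
    grad_W[y_eps] += x / eps          # phi(x, y) places x in row y of W
    grad_W[y_hat] -= x / eps
    # Crude finite-difference signal for the noise scale sigma (a stand-in
    # for the paper's learned variance, not its exact estimator).
    grad_sigma = (gamma[y_eps] - gamma[y_hat]) / eps
    return grad_W, grad_sigma

# Tiny synthetic run: 3 labels, 5 features, ground truth given by x[:3].
W, sigma = rng.normal(size=(3, 5)), 1.0
for step in range(200):
    x = rng.normal(size=5)
    y_true = int(np.argmax(x[:3]))
    gW, gs = direct_loss_grads(W, sigma, x, y_true)
    W -= 0.1 * gW                         # descend the direct loss gradient
    sigma = max(1e-3, sigma - 0.05 * gs)  # keep the learned noise scale positive

# After training, predictions are drawn with the learned noise scale.
x_test = rng.normal(size=5)
print("prediction:", perturbed_argmax(W @ x_test, sigma, rng), "sigma:", sigma)
```

In a genuinely structured setting, the two argmax calls would be replaced by a combinatorial solver (e.g., dynamic programming over label sequences), but the comparison between the loss-perturbed and plain maximizers stays the same.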
Related papers
- Uncovering mesa-optimization algorithms in Transformers [61.06055590704677]
Some autoregressive models can learn as an input sequence is processed, without undergoing any parameter changes, and without being explicitly trained to do so.
We show that standard next-token prediction error minimization gives rise to a subsidiary learning algorithm that adjusts the model as new inputs are revealed.
Our findings explain in-context learning as a product of autoregressive loss minimization and inform the design of new optimization-based Transformer layers.
arXiv Detail & Related papers (2023-09-11T22:42:50Z) - Structured Radial Basis Function Network: Modelling Diversity for
Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important when forecasting nonstationary processes or processes with a complex mixture of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems.
It is proved that this structured model can efficiently interpolate the resulting tessellation and approximate the multiple-hypotheses target distribution.
arXiv Detail & Related papers (2023-09-02T01:27:53Z) - Learning Curves for Noisy Heterogeneous Feature-Subsampled Ridge
Ensembles [34.32021888691789]
We develop a theory of feature-bagging in noisy least-squares ridge ensembles.
We demonstrate that subsampling shifts the double-descent peak of a linear predictor.
We compare the performance of a feature-subsampling ensemble to a single linear predictor.
arXiv Detail & Related papers (2023-07-06T17:56:06Z) - Learning Structured Gaussians to Approximate Deep Ensembles [10.055143995729415]
This paper proposes using a sparse-structured multivariate Gaussian to provide a closed-form approximation for dense image prediction tasks.
We capture the uncertainty and structured correlations in the predictions explicitly in a formal distribution, rather than implicitly through sampling alone.
We demonstrate the merits of our approach on monocular depth estimation and show that its advantages are obtained with comparable quantitative performance.
arXiv Detail & Related papers (2022-03-29T12:34:43Z) - Efficient and Differentiable Conformal Prediction with General Function
Classes [96.74055810115456]
We propose a generalization of conformal prediction to multiple learnable parameters.
We show that it achieves approximately valid population coverage and near-optimal efficiency within the class.
Experiments show that our algorithm is able to learn valid prediction sets and improve the efficiency significantly.
arXiv Detail & Related papers (2022-02-22T18:37:23Z) - Domain-Adjusted Regression or: ERM May Already Learn Features Sufficient
for Out-of-Distribution Generalization [52.7137956951533]
We argue that devising simpler methods for learning predictors on existing features is a promising direction for future research.
We introduce Domain-Adjusted Regression (DARE), a convex objective for learning a linear predictor that is provably robust under a new model of distribution shift.
Under a natural model, we prove that the DARE solution is the minimax-optimal predictor for a constrained set of test distributions.
arXiv Detail & Related papers (2022-02-14T16:42:16Z) - Probabilistic Forecasting with Generative Networks via Scoring Rule
Minimization [5.5643498845134545]
We use generative neural networks to parametrize distributions on high-dimensional spaces by transforming draws from a latent variable.
We train generative networks to minimize a predictive-sequential (or prequential) scoring rule on a recorded temporal sequence of the phenomenon of interest.
Our method outperforms state-of-the-art adversarial approaches, especially in probabilistic calibration (a minimal sketch of scoring-rule training appears after this list).
arXiv Detail & Related papers (2021-12-15T15:51:12Z) - Prediction intervals for Deep Neural Networks [0.0]
We adapt the randomized trees method originally developed for random forests to construct ensembles of neural networks.
The extra-randomness introduced in the ensemble reduces the variance of the predictions and yields gains in out-of-sample accuracy.
arXiv Detail & Related papers (2020-10-08T15:11:28Z) - CASTLE: Regularization via Auxiliary Causal Graph Discovery [89.74800176981842]
We introduce Causal Structure Learning (CASTLE) regularization and propose to regularize a neural network by jointly learning the causal relationships between variables.
CASTLE efficiently reconstructs only the features in the causal DAG that have a causal neighbor, whereas reconstruction-based regularizers suboptimally reconstruct all input features.
arXiv Detail & Related papers (2020-09-28T09:49:38Z) - Learning Output Embeddings in Structured Prediction [73.99064151691597]
A powerful and flexible approach to structured prediction consists in embedding the structured objects to be predicted into a feature space of possibly infinite dimension.
A prediction in the original space is computed by solving a pre-image problem.
In this work, we propose to jointly learn a finite approximation of the output embedding and the regression function mapping into the new feature space.
arXiv Detail & Related papers (2020-07-29T09:32:53Z)
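Relating to the "Probabilistic Forecasting with Generative Networks via Scoring Rule Minimization" entry above, here is a minimal sketch, under assumed details, of the general recipe it summarizes: a generator transforms latent draws into forecast samples and is trained by minimizing a Monte Carlo estimate of the energy score, a proper scoring rule. The small network, the AR(1)-style toy sequence, and the one-step-ahead prequential loop are illustrative assumptions rather than the paper's actual setup.

```python
import torch

torch.manual_seed(0)

def energy_score(samples, y, beta=1.0):
    # samples: (m, d) forecast draws, y: (d,) realized observation.
    # ES(P, y) = E||X - y||^beta - 0.5 * E||X - X'||^beta  (lower is better).
    m = samples.shape[0]
    term1 = (samples - y).norm(dim=1).pow(beta).mean()
    pdist = torch.cdist(samples, samples).pow(beta)
    term2 = pdist.sum() / (2 * m * (m - 1))
    return term1 - term2

d, m = 2, 20
generator = torch.nn.Sequential(
    torch.nn.Linear(2 * d, 32), torch.nn.ReLU(), torch.nn.Linear(32, d))
opt = torch.optim.Adam(generator.parameters(), lr=1e-2)

# Toy "recorded temporal sequence": a noisy AR(1)-style process.
ys = [torch.zeros(d)]
for _ in range(200):
    ys.append(0.8 * ys[-1] + 0.1 * torch.randn(d))

# Prequential training loop: at each step, forecast y_t from y_{t-1} by
# pushing latent noise through the generator, then score the forecast
# samples against the realized y_t and take a gradient step.
for t in range(1, len(ys)):
    z = torch.randn(m, d)                      # latent draws
    context = ys[t - 1].detach().expand(m, d)  # conditioning input
    samples = generator(torch.cat([context, z], dim=1))
    loss = energy_score(samples, ys[t])
    opt.zero_grad()
    loss.backward()
    opt.step()
```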
This list is automatically generated from the titles and abstracts of the papers on this site.