Learning Randomly Perturbed Structured Predictors for Direct Loss
Minimization
- URL: http://arxiv.org/abs/2007.05724v2
- Date: Mon, 14 Jun 2021 08:55:45 GMT
- Title: Learning Randomly Perturbed Structured Predictors for Direct Loss
Minimization
- Authors: Hedda Cohen Indelman, Tamir Hazan
- Abstract summary: Direct loss minimization is a popular approach for learning predictors over structured label spaces.
We show that learning the variance of these randomized predictors strikes a better balance between the learned score function and the random noise in structured prediction.
- Score: 18.981576950505442
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Direct loss minimization is a popular approach for learning
predictors over structured label spaces. This approach is computationally
appealing as it replaces integration with optimization and allows gradients
to be propagated through a deep net using loss-perturbed prediction.
Recently, this technique was extended to generative models by introducing a
randomized predictor that samples a structure from a randomly perturbed
score function. In this work, we learn the variance of these randomized
structured predictors and show that doing so strikes a better balance
between the learned score function and the random noise in structured
prediction. We demonstrate empirically the effectiveness of learning this
balance between the signal and the random noise in structured discrete
spaces.
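The following is a minimal sketch (not the authors' implementation) of the mechanism the abstract describes: a randomized predictor that takes the argmax of a Gumbel-perturbed score, and a direct-loss-minimization update that compares the loss-perturbed prediction with the plain prediction. The toy task is unstructured multiclass labeling rather than a structured label space, the names (perturbed_argmax, direct_loss_grads, sigma) are illustrative assumptions, and the finite-difference update for the noise scale sigma is a crude stand-in for the paper's variance learning.

```python
import numpy as np

rng = np.random.default_rng(0)

def hamming_loss(y_pred, y_true):
    # Task loss for the toy problem: 0/1 error on a single label.
    return float(y_pred != y_true)

def perturbed_argmax(scores, sigma, rng):
    # Randomized predictor: argmax of the score perturbed by Gumbel noise,
    #   y_hat = argmax_y [ score(y) + sigma * gamma(y) ].
    gamma = rng.gumbel(size=scores.shape)
    return int(np.argmax(scores + sigma * gamma))

def direct_loss_grads(W, sigma, x, y_true, eps=1.0):
    # Direct-loss-minimization style update: compare the loss-perturbed
    # prediction with the plain (noise-perturbed) prediction,
    #   grad_W ~ (1/eps) * [ phi(x, y_eps) - phi(x, y_hat) ],
    # where y_eps maximizes score + sigma*gamma + eps*loss.
    scores = W @ x
    gamma = rng.gumbel(size=scores.shape)
    losses = np.array([hamming_loss(y, y_true) for y in range(len(scores))])
    y_hat = int(np.argmax(scores + sigma * gamma))
    y_eps = int(np.argmax(scores + sigma * gamma + eps * losses))
    grad_W = np.zeros_like(W)
    grad_W[y_eps] += x / eps          # phi(x, y) places x in row y of W
    grad_W[y_hat] -= x / eps
    # Crude finite-difference signal for the noise scale sigma (a stand-in
    # for the paper's learned variance, not its exact estimator).
    grad_sigma = (gamma[y_eps] - gamma[y_hat]) / eps
    return grad_W, grad_sigma

# Tiny synthetic run: 3 labels, 5 features, ground truth given by x[:3].
W, sigma = rng.normal(size=(3, 5)), 1.0
for step in range(200):
    x = rng.normal(size=5)
    y_true = int(np.argmax(x[:3]))
    gW, gs = direct_loss_grads(W, sigma, x, y_true)
    W -= 0.1 * gW                         # descend the direct loss gradient
    sigma = max(1e-3, sigma - 0.05 * gs)  # keep the learned noise scale positive

# After training, predictions are drawn with the learned noise scale.
x_test = rng.normal(size=5)
print("prediction:", perturbed_argmax(W @ x_test, sigma, rng), "sigma:", sigma)
```

In a genuinely structured setting, the two argmax calls would be replaced by a combinatorial solver (e.g., dynamic programming over label sequences), but the comparison between the loss-perturbed and plain maximizers stays the same.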
Related papers
- Uncovering mesa-optimization algorithms in Transformers [61.06055590704677]
Some autoregressive models can learn as an input sequence is processed, without undergoing any parameter changes, and without being explicitly trained to do so.
We show that standard next-token prediction error minimization gives rise to a subsidiary learning algorithm that adjusts the model as new inputs are revealed.
Our findings explain in-context learning as a product of autoregressive loss minimization and inform the design of new optimization-based Transformer layers.
arXiv Detail & Related papers (2023-09-11T22:42:50Z) - Structured Radial Basis Function Network: Modelling Diversity for
Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important when forecasting nonstationary processes or processes with a complex mixture of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems.
It is proved that this structured model can efficiently interpolate the resulting tessellation and approximate the multiple-hypotheses target distribution.
arXiv Detail & Related papers (2023-09-02T01:27:53Z) - Learning Curves for Noisy Heterogeneous Feature-Subsampled Ridge
Ensembles [34.32021888691789]
We develop a theory of feature-bagging in noisy least-squares ridge ensembles.
We demonstrate that subsampling shifts the double-descent peak of a linear predictor.
We compare the performance of a feature-subsampling ensemble to a single linear predictor.
arXiv Detail & Related papers (2023-07-06T17:56:06Z) - Learning Structured Gaussians to Approximate Deep Ensembles [10.055143995729415]
This paper proposes using a sparse-structured multivariate Gaussian to provide a closed-form approximation for dense image prediction tasks.
We capture the uncertainty and structured correlations in the predictions explicitly in a formal distribution, rather than implicitly through sampling alone.
We demonstrate the merits of our approach on monocular depth estimation and show that its advantages are obtained with comparable quantitative performance.
arXiv Detail & Related papers (2022-03-29T12:34:43Z) - Efficient and Differentiable Conformal Prediction with General Function
Classes [96.74055810115456]
We propose a generalization of conformal prediction to multiple learnable parameters.
We show that it achieves approximately valid population coverage and near-optimal efficiency within the class.
Experiments show that our algorithm is able to learn valid prediction sets and improve the efficiency significantly.
arXiv Detail & Related papers (2022-02-22T18:37:23Z) - Domain-Adjusted Regression or: ERM May Already Learn Features Sufficient
for Out-of-Distribution Generalization [52.7137956951533]
We argue that devising simpler methods for learning predictors on existing features is a promising direction for future research.
We introduce Domain-Adjusted Regression (DARE), a convex objective for learning a linear predictor that is provably robust under a new model of distribution shift.
Under a natural model, we prove that the DARE solution is the minimax-optimal predictor for a constrained set of test distributions.
arXiv Detail & Related papers (2022-02-14T16:42:16Z) - Probabilistic Forecasting with Generative Networks via Scoring Rule
Minimization [5.5643498845134545]
We use generative neural networks to parametrize distributions on high-dimensional spaces by transforming draws from a latent variable.
We train generative networks to minimize a predictive-sequential (or prequential) scoring rule on a recorded temporal sequence of the phenomenon of interest.
Our method outperforms state-of-the-art adversarial approaches, especially in probabilistic calibration (a minimal sketch of scoring-rule training appears after this list).
arXiv Detail & Related papers (2021-12-15T15:51:12Z) - Prediction intervals for Deep Neural Networks [0.0]
We adapt the randomized trees method originally developed for random forests to construct ensembles of neural networks.
The extra-randomness introduced in the ensemble reduces the variance of the predictions and yields gains in out-of-sample accuracy.
arXiv Detail & Related papers (2020-10-08T15:11:28Z) - CASTLE: Regularization via Auxiliary Causal Graph Discovery [89.74800176981842]
We introduce Causal Structure Learning (CASTLE) regularization and propose to regularize a neural network by jointly learning the causal relationships between variables.
CASTLE efficiently reconstructs only the features in the causal DAG that have a causal neighbor, whereas reconstruction-based regularizers suboptimally reconstruct all input features.
arXiv Detail & Related papers (2020-09-28T09:49:38Z) - Learning Output Embeddings in Structured Prediction [73.99064151691597]
A powerful and flexible approach to structured prediction consists in embedding the structured objects to be predicted into a feature space of possibly infinite dimension.
A prediction in the original space is computed by solving a pre-image problem.
In this work, we propose to jointly learn a finite approximation of the output embedding and the regression function mapping into the new feature space.
arXiv Detail & Related papers (2020-07-29T09:32:53Z)
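Relating to the "Probabilistic Forecasting with Generative Networks via Scoring Rule Minimization" entry above, here is a minimal sketch, under assumed details, of the general recipe it summarizes: a generator transforms latent draws into forecast samples and is trained by minimizing a Monte Carlo estimate of the energy score, a proper scoring rule. The small network, the AR(1)-style toy sequence, and the one-step-ahead prequential loop are illustrative assumptions rather than the paper's actual setup.

```python
import torch

torch.manual_seed(0)

def energy_score(samples, y, beta=1.0):
    # samples: (m, d) forecast draws, y: (d,) realized observation.
    # ES(P, y) = E||X - y||^beta - 0.5 * E||X - X'||^beta  (lower is better).
    m = samples.shape[0]
    term1 = (samples - y).norm(dim=1).pow(beta).mean()
    pdist = torch.cdist(samples, samples).pow(beta)
    term2 = pdist.sum() / (2 * m * (m - 1))
    return term1 - term2

d, m = 2, 20
generator = torch.nn.Sequential(
    torch.nn.Linear(2 * d, 32), torch.nn.ReLU(), torch.nn.Linear(32, d))
opt = torch.optim.Adam(generator.parameters(), lr=1e-2)

# Toy "recorded temporal sequence": a noisy AR(1)-style process.
ys = [torch.zeros(d)]
for _ in range(200):
    ys.append(0.8 * ys[-1] + 0.1 * torch.randn(d))

# Prequential training loop: at each step, forecast y_t from y_{t-1} by
# pushing latent noise through the generator, then score the forecast
# samples against the realized y_t and take a gradient step.
for t in range(1, len(ys)):
    z = torch.randn(m, d)                      # latent draws
    context = ys[t - 1].detach().expand(m, d)  # conditioning input
    samples = generator(torch.cat([context, z], dim=1))
    loss = energy_score(samples, ys[t])
    opt.zero_grad()
    loss.backward()
    opt.step()
```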
This list is automatically generated from the titles and abstracts of the papers on this site.