First-order Optimization for Superquantile-based Supervised Learning
- URL: http://arxiv.org/abs/2009.14575v2
- Date: Thu, 1 Oct 2020 09:11:20 GMT
- Title: First-order Optimization for Superquantile-based Supervised Learning
- Authors: Yassine Laguel, J\'er\^ome Malick and Zaid Harchaoui
- Abstract summary: We propose a first-order optimization algorithm to minimize a superquantile-based learning objective.
The proposed algorithm is based on smoothing the superquantile function by infimal convolution.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Classical supervised learning via empirical risk (or negative log-likelihood)
minimization hinges upon the assumption that the testing distribution coincides
with the training distribution. This assumption can be challenged in modern
applications of machine learning in which learning machines may operate at
prediction time with testing data whose distribution departs from the one of
the training data. We revisit the superquantile regression method by proposing
a first-order optimization algorithm to minimize a superquantile-based learning
objective. The proposed algorithm is based on smoothing the superquantile
function by infimal convolution. Promising numerical results illustrate the
interest of the approach towards safer supervised learning.
Related papers
- Learning-Augmented Algorithms with Explicit Predictors [67.02156211760415]
Recent advances in algorithmic design show how to utilize predictions obtained by machine learning models from past and present data.
Prior research in this context was focused on a paradigm where the predictor is pre-trained on past data and then used as a black box.
In this work, we unpack the predictor and integrate the learning problem it gives rise for within the algorithmic challenge.
arXiv Detail & Related papers (2024-03-12T08:40:21Z) - Representation Learning with Multi-Step Inverse Kinematics: An Efficient
and Optimal Approach to Rich-Observation RL [106.82295532402335]
Existing reinforcement learning algorithms suffer from computational intractability, strong statistical assumptions, and suboptimal sample complexity.
We provide the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level.
Our algorithm, MusIK, combines systematic exploration with representation learning based on multi-step inverse kinematics.
arXiv Detail & Related papers (2023-04-12T14:51:47Z) - Less is More: Rethinking Few-Shot Learning and Recurrent Neural Nets [2.824895388993495]
We provide theoretical guarantees for reliable learning under the information-theoretic AEP.
We then focus on a highly efficient recurrent neural net (RNN) framework and propose a reduced-entropy algorithm for few-shot learning.
Our experimental results demonstrate significant potential for improving learning models' sample efficiency, generalization, and time complexity.
arXiv Detail & Related papers (2022-09-28T17:33:11Z) - Non-Clairvoyant Scheduling with Predictions Revisited [77.86290991564829]
In non-clairvoyant scheduling, the task is to find an online strategy for scheduling jobs with a priori unknown processing requirements.
We revisit this well-studied problem in a recently popular learning-augmented setting that integrates (untrusted) predictions in algorithm design.
We show that these predictions have desired properties, admit a natural error measure as well as algorithms with strong performance guarantees.
arXiv Detail & Related papers (2022-02-21T13:18:11Z) - Learning Predictions for Algorithms with Predictions [49.341241064279714]
We introduce a general design approach for algorithms that learn predictors.
We apply techniques from online learning to learn against adversarial instances, tune robustness-consistency trade-offs, and obtain new statistical guarantees.
We demonstrate the effectiveness of our approach at deriving learning algorithms by analyzing methods for bipartite matching, page migration, ski-rental, and job scheduling.
arXiv Detail & Related papers (2022-02-18T17:25:43Z) - Predictive machine learning for prescriptive applications: a coupled
training-validating approach [77.34726150561087]
We propose a new method for training predictive machine learning models for prescriptive applications.
This approach is based on tweaking the validation step in the standard training-validating-testing scheme.
Several experiments with synthetic data demonstrate promising results in reducing the prescription costs in both deterministic and real models.
arXiv Detail & Related papers (2021-10-22T15:03:20Z) - Last Layer Marginal Likelihood for Invariance Learning [12.00078928875924]
We introduce a new lower bound to the marginal likelihood, which allows us to perform inference for a larger class of likelihood functions.
We work towards bringing this approach to neural networks by using an architecture with a Gaussian process in the last layer.
arXiv Detail & Related papers (2021-06-14T15:40:51Z) - A Probabilistically Motivated Learning Rate Adaptation for Stochastic
Optimization [20.77923050735746]
We provide a probabilistic motivation, in terms of Gaussian inference, for popular first-order methods.
The inference allows us to relate the learning rate to a dimensionless quantity that can be automatically adapted during training.
The resulting meta-algorithm is shown to adapt learning rates in a robust manner across a large range of initial values.
arXiv Detail & Related papers (2021-02-22T10:26:31Z) - Cross-Validation and Uncertainty Determination for Randomized Neural
Networks with Applications to Mobile Sensors [0.0]
Extreme learning machines provide an attractive and efficient method for supervised learning under limited computing ressources and green machine learning.
Results are discussed about supervised learning with such networks and regression methods in terms of consistency and bounds for the generalization and prediction error.
arXiv Detail & Related papers (2021-01-06T12:28:06Z) - Meta-learning with Stochastic Linear Bandits [120.43000970418939]
We consider a class of bandit algorithms that implement a regularized version of the well-known OFUL algorithm, where the regularization is a square euclidean distance to a bias vector.
We show both theoretically and experimentally, that when the number of tasks grows and the variance of the task-distribution is small, our strategies have a significant advantage over learning the tasks in isolation.
arXiv Detail & Related papers (2020-05-18T08:41:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.