Practical considerations for variable screening in the Super Learner
- URL: http://arxiv.org/abs/2311.03313v1
- Date: Mon, 6 Nov 2023 18:04:39 GMT
- Authors: Brian D. Williamson, Drew King, Ying Huang
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Estimating a prediction function is a fundamental component of many data
analyses. The Super Learner ensemble, a particular implementation of stacking,
has desirable theoretical properties and has been used successfully in many
applications. Dimension reduction can be accomplished by using variable
screening algorithms, including the lasso, within the ensemble prior to fitting
other prediction algorithms. However, the performance of a Super Learner using
the lasso for dimension reduction has not been fully explored in cases where
the lasso is known to perform poorly. We provide empirical results that suggest
that a diverse set of candidate screening algorithms should be used to protect
against poor performance of any one screen, similar to the guidance for
choosing a library of prediction algorithms for the Super Learner.
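The abstract treats each Super Learner candidate as a screening step followed by a prediction algorithm, with the candidates combined by stacking their cross-validated predictions. The Python sketch below illustrates that structure under stated assumptions. It is not the authors' code or simulation design.

The sketch assumes the following:
- squared-error loss and NumPy only;
- three screens: a coordinate-descent lasso screen, a top-k marginal-correlation screen, and a pass-through "all" screen. Each screen maps `(X, y)` to the column indices handed to a learner;
- two least-squares learners;
- a convex-combination meta-learner fit by projected gradient descent onto the simplex.

All function names, `lam=0.1`, `k=2`, and the simulated data are hypothetical. The important detail is that each screen is refit inside every training fold. As a result, the cross-validated risk reflects the screen and the learner together.

```python
# Illustrative sketch only: function names, defaults, and data are hypothetical.
import numpy as np


def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent; assumes X is centered and scaled."""
    n, p = X.shape
    beta = np.zeros(p)
    resid = y - y.mean()  # centering y absorbs the intercept
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
            resid -= X[:, j] * (new - beta[j])
            beta[j] = new
    return beta


def corr_screen(X, y, k=2):
    """Keep the k columns with the largest absolute marginal correlation with y."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    corr = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(corr)[::-1][:k]


def lasso_screen(X, y, lam=0.1):
    """Keep columns with nonzero lasso coefficients; lam=0.1 is an arbitrary choice."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    keep = np.flatnonzero(lasso_cd(Xs, y, lam))
    return keep if keep.size else corr_screen(X, y, k=1)  # never return an empty set


def ols(Xtr, ytr, Xte):
    A = np.column_stack([np.ones(len(Xtr)), Xtr])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, ytr, rcond=None)
    return np.column_stack([np.ones(len(Xte)), Xte]) @ coef


def ols_quad(Xtr, ytr, Xte):  # OLS on the screened columns plus their squares
    return ols(np.hstack([Xtr, Xtr ** 2]), ytr, np.hstack([Xte, Xte ** 2]))


def project_to_simplex(v):
    """Euclidean projection onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    cond = u - (css - 1) / k > 0
    theta = (css[cond][-1] - 1) / k[cond][-1]
    return np.maximum(v - theta, 0.0)


def simplex_least_squares(Z, y, n_iter=2000):
    """Convex weights minimizing mean squared error, by projected gradient descent."""
    n, m = Z.shape
    w = np.full(m, 1.0 / m)
    step = 1.0 / (2 * np.linalg.eigvalsh(Z.T @ Z / n).max())  # 1 / Lipschitz constant
    for _ in range(n_iter):
        w = project_to_simplex(w - step * 2 * Z.T @ (Z @ w - y) / n)
    return w


def super_learner(X, y, screens, learners, V=5, seed=0):
    names = [(scr, lrn) for scr in screens for lrn in learners]  # screen-learner pairs
    folds = np.random.default_rng(seed).permutation(len(y)) % V
    Z = np.zeros((len(y), len(names)))  # out-of-fold predictions
    for v in range(V):
        tr, te = folds != v, folds == v
        for m, (scr, lrn) in enumerate(names):
            keep = screens[scr](X[tr], y[tr])  # screen is refit on the training fold only
            Z[te, m] = learners[lrn](X[tr][:, keep], y[tr], X[te][:, keep])
    cv_risk = ((Z - y[:, None]) ** 2).mean(axis=0)
    w = simplex_least_squares(Z, y)

    def predict(X_new):
        out = np.zeros(len(X_new))
        for wm, (scr, lrn) in zip(w, names):
            keep = screens[scr](X, y)  # refit each screen-learner pair on all the data
            out += wm * learners[lrn](X[:, keep], y, X_new[:, keep])
        return out

    return names, cv_risk, w, predict


# Hypothetical data: x0 enters only through its square, so linear screens may miss it.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 10))
y = X[:, 0] ** 2 + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=400)
screens = {
    "lasso": lasso_screen,
    "corr": corr_screen,
    "all": lambda X, y: np.arange(X.shape[1]),  # no screening
}
learners = {"ols": ols, "ols_quad": ols_quad}
names, cv_risk, w, predict = super_learner(X, y, screens, learners)
for (scr, lrn), r, wm in zip(names, cv_risk, w):
    print(f"{scr:>5} + {lrn:<8}  CV MSE {r:.3f}  weight {wm:.3f}")
```

In this simulated outcome, `X[:, 0]` affects `y` only through its square. Its marginal linear association with `y` is therefore close to zero, and both the lasso screen and the correlation screen can drop it. The learners paired with those screens then cannot recover that term. The pass-through screen keeps the column, so the meta-learner can shift weight toward that candidate. This is the kind of protection a diverse set of screens is meant to offer, though a toy example is no substitute for the paper's empirical results.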
Related papers
- A General Online Algorithm for Optimizing Complex Performance Metrics
We introduce and analyze a general online algorithm that can be used in a straightforward way with a variety of complex performance metrics in binary, multi-class, and multi-label classification problems.
The algorithm's update and prediction rules are appealingly simple and computationally efficient without the need to store any past data.
Date: 2024-06-20T21:24:47Z
- Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning
We propose a new framework based on large language models (LLMs) and decision tree reasoning (OCTree).
Our key idea is to leverage LLMs' reasoning capabilities to find good feature generation rules without manually specifying the search space.
Our empirical results demonstrate that this simple framework consistently enhances the performance of various prediction models.
Date: 2024-06-12T08:31:34Z
- Learning-Augmented Algorithms with Explicit Predictors
Recent advances in algorithmic design show how to utilize predictions obtained by machine learning models from past and present data.
Prior research in this context was focused on a paradigm where the predictor is pre-trained on past data and then used as a black box.
In this work, we unpack the predictor and integrate the learning problem it gives rise to within the algorithmic challenge.
Date: 2024-03-12T08:40:21Z
- Faster Discrete Convex Function Minimization with Predictions: The M-Convex Case
Our methods can use predictions to improve time complexity upon the best worst-case results and even have the potential to go beyond a lower-bound result.
Our framework is particularly effective for an important subclass called laminar convex minimization, which appears in many operations research applications.
Date: 2023-06-09T12:58:47Z
- Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL
Existing reinforcement learning algorithms suffer from computational intractability, strong statistical assumptions, and suboptimal sample complexity.
We provide the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level.
Our algorithm, MusIK, combines systematic exploration with representation learning based on multi-step inverse kinematics.
Date: 2023-04-12T14:51:47Z
- Algorithms with Prediction Portfolios
We study the use of multiple predictors for a number of fundamental problems, including matching, load balancing, and non-clairvoyant scheduling.
For each of these problems we introduce new algorithms that take advantage of multiple predictors, and prove bounds on the resulting performance.
Date: 2022-10-22T12:58:07Z
- Efficient and Differentiable Conformal Prediction with General Function Classes
We propose a generalization of conformal prediction to multiple learnable parameters.
We show that it achieves approximately valid population coverage and near-optimal efficiency within the class.
Experiments show that our algorithm is able to learn valid prediction sets and improve the efficiency significantly.
Date: 2022-02-22T18:37:23Z
- Non-Clairvoyant Scheduling with Predictions Revisited
In non-clairvoyant scheduling, the task is to find an online strategy for scheduling jobs with a priori unknown processing requirements.
We revisit this well-studied problem in a recently popular learning-augmented setting that integrates (untrusted) predictions in algorithm design.
We show that these predictions have desirable properties: they admit a natural error measure as well as algorithms with strong performance guarantees.
Date: 2022-02-21T13:18:11Z
- Estimating leverage scores via rank revealing methods and randomization
We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank.
Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized dimensionality reduction transforms.
Date: 2021-05-23T19:21:55Z
- Towards Feature-Based Performance Regression Using Trajectory Data
Black-box optimization is a very active area of research, with many new algorithms being developed every year.
The variety of algorithms poses a meta-problem: which algorithm to choose for a given problem at hand?
Past research has shown that per-instance algorithm selection based on exploratory landscape analysis can be an efficient means to tackle this meta-problem.
Date: 2021-02-10T10:19:13Z
This list is automatically generated from the titles and abstracts of the papers on this site.