Subset selection for linear mixed models
- URL: http://arxiv.org/abs/2107.12890v1
- Date: Tue, 27 Jul 2021 15:47:44 GMT
- Title: Subset selection for linear mixed models
- Authors: Daniel R. Kowal
- Abstract summary: Linear mixed models (LMMs) are instrumental for regression analysis with structured dependence.
We introduce a Bayesian decision analysis for subset selection with LMMs.
These tools are applied to simulated data and a longitudinal physical activity dataset.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Linear mixed models (LMMs) are instrumental for regression analysis with
structured dependence, such as grouped, clustered, or multilevel data. However,
selection among the covariates--while accounting for this structured
dependence--remains a challenge. We introduce a Bayesian decision analysis for
subset selection with LMMs. Using a Mahalanobis loss function that incorporates
the structured dependence, we derive optimal linear actions for any subset of
covariates and under any Bayesian LMM. Crucially, these actions inherit
shrinkage or regularization and uncertainty quantification from the underlying
Bayesian LMM. Rather than selecting a single "best" subset, which is often
unstable and limited in its information content, we collect the acceptable
family of subsets that nearly match the predictive ability of the "best"
subset. The acceptable family is summarized by its smallest member and key
variable importance metrics. Customized subset search and out-of-sample
approximation algorithms are provided for more scalable computing. These tools
are applied to simulated data and a longitudinal physical activity dataset, and
in both cases demonstrate excellent prediction, estimation, and selection
ability.
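To make the abstract's pipeline concrete, here is a minimal sketch, not the paper's implementation. It assumes a single fixed dependence matrix `Sigma` (the paper works with a full Bayesian LMM posterior) and takes a vector `y_hat` of posterior predictive means as given. Under those assumptions, minimizing a Mahalanobis loss over linear actions restricted to a covariate subset reduces to a generalized least squares fit of `y_hat` on the selected columns. The function names, the exhaustive search, and the `margin` rule for "nearly matching" the best subset are all hypothetical simplifications. The paper uses customized search and out-of-sample predictive comparisons instead.

```python
import itertools
import numpy as np


def subset_action(X, y_hat, Sigma_inv, subset):
    """Coefficients minimizing (y_hat - X_S d)' Sigma_inv (y_hat - X_S d).

    Assumption: Sigma_inv is a fixed precision matrix encoding the
    structured dependence; this is a simplification for illustration.
    """
    Xs = X[:, list(subset)]
    A = Xs.T @ Sigma_inv @ Xs
    b = Xs.T @ Sigma_inv @ y_hat
    delta = np.linalg.solve(A, b)          # GLS-type solution
    resid = y_hat - Xs @ delta
    return delta, float(resid @ Sigma_inv @ resid)


def acceptable_family(X, y_hat, Sigma_inv, margin=0.05):
    """Collect subsets whose loss is within a tolerance of the best subset.

    Hypothetical rule: tolerance = margin * (loss of the empty model).
    Exhaustive enumeration is only feasible for a handful of covariates.
    """
    p = X.shape[1]
    losses = {}
    for k in range(1, p + 1):
        for subset in itertools.combinations(range(p), k):
            _, loss = subset_action(X, y_hat, Sigma_inv, subset)
            losses[subset] = loss
    best = min(losses.values())
    null_loss = float(y_hat @ Sigma_inv @ y_hat)
    family = [s for s, l in losses.items() if l - best <= margin * null_loss]
    smallest = min(family, key=len)        # smallest acceptable subset
    return family, smallest


# Toy usage: 6 groups of 10 observations with a random-intercept covariance.
rng = np.random.default_rng(0)
n, p, n_groups = 60, 4, 6
X = rng.normal(size=(n, p))
Z = np.kron(np.eye(n_groups), np.ones((n // n_groups, 1)))  # group indicators
Sigma = np.eye(n) + 0.5 * Z @ Z.T
Sigma_inv = np.linalg.inv(Sigma)
y_hat = X[:, [0, 2]] @ np.array([2.0, -1.0])  # stand-in for posterior means

family, smallest = acceptable_family(X, y_hat, Sigma_inv)
print(len(family), smallest)
```

Summarizing the family by its smallest member mirrors the abstract's description. Variable importance could be sketched in the same spirit as the share of acceptable subsets that contain each covariate.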
Related papers
- An incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting [53.36437745983783]
We first construct a max-margin optimization-based model to model potentially non-monotonic preferences.
We devise information amount measurement methods and question selection strategies to pinpoint the most informative alternative in each iteration.
Two incremental preference elicitation-based algorithms are developed to learn potentially non-monotonic preferences.
arXiv Detail & Related papers (2024-09-04T14:36:20Z)
- Agnostic Learning of Mixed Linear Regressions with EM and AM Algorithms [22.79595679373698]
Mixed linear regression is a well-studied problem in statistics and machine learning.
In this paper, we consider the more general problem of learning of mixed linear regression from samples.
We show that the AM and EM algorithms lead to learning in mixed linear regression by converging to the population loss minimizers.
arXiv Detail & Related papers (2024-06-03T09:43:24Z)
- Querying Easily Flip-flopped Samples for Deep Active Learning [63.62397322172216]
Active learning is a machine learning paradigm that aims to improve the performance of a model by strategically selecting and querying unlabeled data.
One effective selection strategy is to base it on the model's predictive uncertainty, which can be interpreted as a measure of how informative a sample is.
This paper proposes the least disagree metric (LDM) as the smallest probability of disagreement of the predicted label.
arXiv Detail & Related papers (2024-01-18T08:12:23Z)
- Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z)
- A Consistent and Scalable Algorithm for Best Subset Selection in Single Index Models [1.3236116985407258]
Best subset selection in high-dimensional models is known to be computationally intractable.
We propose the first provably scalable algorithm for best subset selection in high-dimensional SIMs.
Our algorithm enjoys the subset selection consistency and has the oracle property with a high probability.
arXiv Detail & Related papers (2023-09-12T13:48:06Z)
- Feature Selection via the Intervened Interpolative Decomposition and its Application in Diversifying Quantitative Strategies [4.913248451323163]
We propose a probabilistic model for computing an interpolative decomposition (ID) in which each column of the observed matrix has its own priority or importance.
We evaluate the proposed models on real-world datasets, including ten Chinese A-share stocks.
arXiv Detail & Related papers (2022-09-29T03:36:56Z)
- Efficient and Near-Optimal Smoothed Online Learning for Generalized Linear Functions [28.30744223973527]
We give a computationally efficient algorithm that is the first to enjoy the statistically optimal log(T/sigma) regret for realizable K-wise linear classification.
We develop a novel characterization of the geometry of the disagreement region induced by generalized linear classifiers.
arXiv Detail & Related papers (2022-05-25T21:31:36Z)
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z)
- Joint Adaptive Graph and Structured Sparsity Regularization for Unsupervised Feature Selection [6.41804410246642]
We propose a joint adaptive graph and structured sparsity regularization unsupervised feature selection (JASFS) method.
A subset of optimal features will be selected in group, and the number of selected features will be determined automatically.
Experimental results on eight benchmarks demonstrate the effectiveness and efficiency of the proposed method.
arXiv Detail & Related papers (2020-10-09T08:17:04Z)
- Slice Sampling for General Completely Random Measures [74.24975039689893]
We present a novel Markov chain Monte Carlo algorithm for posterior inference that adaptively sets the truncation level using auxiliary slice variables.
The efficacy of the proposed algorithm is evaluated on several popular nonparametric models.
arXiv Detail & Related papers (2020-06-24T17:53:53Z)
- Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z)