Tradeoffs of Linear Mixed Models in Genome-wide Association Studies
- URL: http://arxiv.org/abs/2111.03739v1
- Date: Fri, 5 Nov 2021 22:05:59 GMT
- Title: Tradeoffs of Linear Mixed Models in Genome-wide Association Studies
- Authors: Haohan Wang, Bryon Aragam, Eric Xing
- Abstract summary: We study the statistical properties of linear mixed models (LMMs) applied to genome-wide association studies (GWAS).
First, we study the sensitivity of LMMs to the inclusion of a candidate SNP in the kinship matrix, which is often done in practice to speed up computations.
Second, we investigate how mixed models can correct confounders in GWAS, which is widely accepted as an advantage of LMMs over traditional methods.
- Score: 18.560273425572582
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Motivated by empirical arguments that are well-known from the genome-wide
association studies (GWAS) literature, we study the statistical properties of
linear mixed models (LMMs) applied to GWAS. First, we study the sensitivity of
LMMs to the inclusion of a candidate SNP in the kinship matrix, which is often
done in practice to speed up computations. Our results shed light on the size
of the error incurred by including a candidate SNP, providing a justification
for this technique as a way to trade off velocity against veracity. Second, we
investigate how mixed models can correct confounders in GWAS, which is widely
accepted as an advantage of LMMs over traditional methods. We consider two
sources of confounding factors, population stratification and environmental
confounding factors, and study how different methods that are commonly used in
practice trade off these two confounding factors differently.
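The first tradeoff can be made concrete with a small simulation. What follows is a minimal sketch under stated assumptions, not the authors' code. It assumes a standard single-kinship LMM, y = b0 + x_j*beta + g + e with g ~ N(0, sigma_g^2 K) and e ~ N(0, sigma_e^2 I); a kinship K = Z Z^T / p built from column-standardized genotypes Z; a variance ratio delta = sigma_e^2 / sigma_g^2 fitted once per kinship by a maximum-likelihood grid search under the null; and a Wald test for beta. Candidate SNP j is scored twice: once with K built from all SNPs (candidate included, so a single eigendecomposition of K can be reused for every SNP) and once with the leave-one-out kinship that drops SNP j (which needs a fresh decomposition per SNP). The helper names (standardize, fit_null, wald_chi2) and all simulation settings are hypothetical.

```python
import math

import numpy as np


def standardize(G):
    """Column-standardize an n x p matrix of 0/1/2 allele counts, dropping monomorphic SNPs."""
    mu, sd = G.mean(axis=0), G.std(axis=0)
    keep = sd > 0
    return (G[:, keep] - mu[keep]) / sd[keep]


def fit_null(y, K, log_deltas=np.linspace(-5.0, 5.0, 101)):
    """Eigendecompose K, then pick delta = sigma_e^2 / sigma_g^2 maximizing the
    ML log-likelihood of an intercept-only model (grid search over log delta)."""
    s, U = np.linalg.eigh(K)
    s = np.clip(s, 0.0, None)                 # guard against tiny negative eigenvalues
    n = len(y)
    yt, Xt = U.T @ y, U.T @ np.ones((n, 1))   # rotate so the covariance becomes diagonal
    best_ll, best_delta = -np.inf, None
    for ld in log_deltas:
        delta = math.exp(ld)
        w = 1.0 / (s + delta)                 # GLS weights in the rotated space
        XtW = Xt.T * w
        beta = np.linalg.solve(XtW @ Xt, XtW @ yt)
        r = yt - Xt @ beta
        sigma_g2 = np.sum(w * r**2) / n       # profiled ML estimate of sigma_g^2
        ll = -0.5 * (n * math.log(2 * math.pi * sigma_g2) + np.sum(np.log(s + delta)) + n)
        if ll > best_ll:
            best_ll, best_delta = ll, delta
    return s, U, best_delta


def wald_chi2(y, x, s, U, delta):
    """1-df Wald chi-square for SNP x (plus intercept) under the LMM at fixed delta."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    Xt, yt = U.T @ X, U.T @ y
    w = 1.0 / (s + delta)
    XtW = Xt.T * w                            # shape (2, n)
    A = XtW @ Xt
    beta = np.linalg.solve(A, XtW @ yt)
    r = yt - Xt @ beta
    sigma_g2 = np.sum(w * r**2) / n
    var_beta = sigma_g2 * np.linalg.inv(A)[1, 1]
    return float(beta[1] ** 2 / var_beta)


# Hypothetical simulation: n individuals, p SNPs, SNP j has a modest true effect
# on top of a polygenic background and independent noise.
rng = np.random.default_rng(0)
n, p, j = 200, 400, 0
maf = rng.uniform(0.05, 0.5, size=p)
Z = standardize(rng.binomial(2, maf, size=(n, p)).astype(float))
p = Z.shape[1]
y = 0.3 * Z[:, j] + Z @ rng.normal(0.0, math.sqrt(0.5 / p), size=p) + rng.normal(0.0, 1.0, size=n)

K_full = Z @ Z.T / p                                       # candidate SNP included
K_loo = (Z @ Z.T - np.outer(Z[:, j], Z[:, j])) / (p - 1)   # candidate SNP excluded

results = {}
for name, K in [("included", K_full), ("excluded", K_loo)]:
    s, U, delta = fit_null(y, K)
    chi2 = wald_chi2(y, Z[:, j], s, U, delta)
    pval = math.erfc(math.sqrt(chi2 / 2.0))                # chi-square(1) survival function
    results[name] = (delta, chi2, pval)
    print(f"{name:>8}: delta={delta:.3g}  chi2={chi2:.2f}  p={pval:.2e}")
```

Comparing the two statistics across seeds or SNPs gives a rough feel for how much the candidate's own contribution to K shifts the test. The eigendecomposition inside fit_null is the O(n^3) step that including the candidate avoids repeating for every SNP.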
Related papers
- Chain-of-Thought Prompting for Demographic Inference with Large Multimodal Models [58.58594658683919]
Large multimodal models (LMMs) have shown transformative potential across various research tasks.
Our findings indicate LMMs possess advantages in zero-shot learning, interpretability, and handling uncurated 'in-the-wild' inputs.
We propose a Chain-of-Thought augmented prompting approach, which effectively mitigates the off-target prediction issue.
arXiv Detail & Related papers (2024-05-24T16:26:56Z)
- Evaluating tree-based imputation methods as an alternative to MICE PMM for drawing inference in empirical studies [0.5892638927736115]
Dealing with missing data is an important problem in statistical analysis that is often addressed with imputation procedures.
The prevailing method of Multiple Imputation by Chained Equations with Predictive Mean Matching (PMM) is considered standard in the social science literature.
In particular, tree-based imputation methods have emerged as very competitive approaches.
arXiv Detail & Related papers (2024-01-17T21:28:00Z)
- Exploring the Jungle of Bias: Political Bias Attribution in Language Models via Dependency Analysis [86.49858739347412]
Large Language Models (LLMs) have sparked intense debate regarding the prevalence of bias in these models and its mitigation.
We propose a prompt-based method for the extraction of confounding and mediating attributes which contribute to the decision process.
We find that the observed disparate treatment can at least in part be attributed to confounding and mediating attributes and model misalignment.
arXiv Detail & Related papers (2023-11-15T00:02:25Z)
- Granger Causal Inference in Multivariate Hawkes Processes by Minimum Message Length [0.0]
We propose an optimization criterion and model selection algorithm based on the minimum message length (MML) principle.
While most state-of-the-art methods using lasso-type penalization tend to overfit in scenarios with short time horizons, the proposed MML-based method achieves high F1 scores in these settings.
arXiv Detail & Related papers (2023-09-05T08:13:34Z)
- Reweighted Mixup for Subpopulation Shift [63.1315456651771]
Subpopulation shift, which arises in many real-world applications, refers to training and test distributions that contain the same subpopulation groups but in different proportions.
Importance reweighting is a classical and effective way to handle the subpopulation shift.
We propose a simple yet practical framework, called reweighted mixup, to mitigate overfitting in over-parameterized models.
arXiv Detail & Related papers (2023-04-09T03:44:50Z)
- What Makes Forest-Based Heterogeneous Treatment Effect Estimators Work? [1.1050303097572156]
We show that both methods can be understood in terms of the same parameters and confounding assumptions under L2 loss.
In the randomized setting, both approaches performed akin to the new blended versions in a benchmark study.
arXiv Detail & Related papers (2022-06-21T12:45:07Z)
- Fast approximations of the Jeffreys divergence between univariate Gaussian mixture models via exponential polynomial densities [16.069404547401373]
The Jeffreys divergence is a renowned symmetrization of the Kullback-Leibler divergence and is often used in machine learning, signal processing, and information sciences.
We propose a simple yet fast heuristic to approximate the Jeffreys divergence between two Gaussian mixture models (GMMs) with an arbitrary number of components.
arXiv Detail & Related papers (2021-07-13T07:58:01Z)
- Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
Deep learning models are prone to learning spurious correlations that should not be learned as predictive clues.
We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders.
We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
arXiv Detail & Related papers (2021-06-07T17:47:16Z)
- Learning Gaussian Graphical Models with Latent Confounders [74.72998362041088]
We compare and contrast two strategies for inference in graphical models with latent confounders.
While these two approaches have similar goals, they are motivated by different assumptions about confounding.
We propose a new method, which combines the strengths of these two approaches.
arXiv Detail & Related papers (2021-05-14T00:53:03Z)
- Estimating Linear Mixed Effects Models with Truncated Normally Distributed Random Effects [5.4052819252055055]
Inference can be conducted by maximum likelihood if the random effects are assumed to be normally distributed.
In this paper we extend the classical (unconstrained) LME models to allow for sign constraints on their overall coefficients.
arXiv Detail & Related papers (2020-11-09T16:17:35Z)
- Learning Causal Semantic Representation for Out-of-Distribution Prediction [125.38836464226092]
We propose a Causal Semantic Generative model (CSG), based on causal reasoning, in which the semantic factor and the variation factor are modeled separately.
We show that CSG can identify the semantic factor by fitting training data, and this semantic-identification guarantees the boundedness of OOD generalization error.
arXiv Detail & Related papers (2020-11-03T13:16:05Z)