Related papers: Double Machine Learning for Partially Linear Mixed-Effects Models with Repeated Measurements

Double Machine Learning for Partially Linear Mixed-Effects Models with Repeated Measurements

URL: http://arxiv.org/abs/2108.13657v1
Date: Tue, 31 Aug 2021 07:41:36 GMT
Title: Double Machine Learning for Partially Linear Mixed-Effects Models with Repeated Measurements
Authors: Corinne Emmenegger and Peter B\"uhlmann
Abstract summary: We use machine learning algorithms to incorporate more complex interaction structures and high-dimensional variables. The adjusted variables satisfy a linear mixed-effects model, where the linear coefficient can be estimated with standard linear mixed-effects techniques.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Traditionally, spline or kernel approaches in combination with parametric estimation are used to infer the linear coefficient (fixed effects) in a partially linear mixed-effects model (PLMM) for repeated measurements. Using machine learning algorithms allows us to incorporate more complex interaction structures and high-dimensional variables. We employ double machine learning to cope with the nonparametric part of the PLMM: the nonlinear variables are regressed out nonparametrically from both the linear variables and the response. This adjustment can be performed with any machine learning algorithm, for instance random forests. The adjusted variables satisfy a linear mixed-effects model, where the linear coefficient can be estimated with standard linear mixed-effects techniques. We prove that the estimated fixed effects coefficient converges at the parametric rate and is asymptotically Gaussian distributed and semiparametrically efficient. Empirical examples demonstrate our proposed algorithm. We present two simulation studies and analyze a dataset with repeated CD4 cell counts from HIV patients. Software code for our method is available in the R-package dmlalg.

Related papers

Disordered Dynamics in High Dimensions: Connections to Random Matrices and Machine Learning [52.26396748560348]
We provide an overview of high dimensional dynamical systems driven by random matrices.<n>We focus on applications to simple models of learning and generalization in machine learning theory.
arXiv Detail & Related papers (2026-01-03T00:12:32Z)
xtdml: Double Machine Learning Estimation to Static Panel Data Models with Fixed Effects in R [0.0]
The paper presents the R package xtdml, which implements DML methods for partially linear panel regression models.<n>The package provides functionalities to: (a) learn nuisance functions with machine learning algorithms from the mlr3 ecosystem.<n>We showcase the use of xtdml with both simulated and real longitudinal data.
arXiv Detail & Related papers (2025-12-17T20:48:40Z)
Explicit Discovery of Nonlinear Symmetries from Dynamic Data [50.20526548924647]
LieNLSD is the first method capable of determining the number of infinitesimal generators with nonlinear terms and their explicit expressions.<n>LieNLSD shows qualitative advantages over existing methods and improves the long rollout accuracy of neural PDE solvers by over 20%.
arXiv Detail & Related papers (2025-10-02T09:54:08Z)
Semiparametric inference for impulse response functions using double/debiased machine learning [49.1574468325115]
We introduce a machine learning estimator for the impulse response function (IRF) in settings where a time series of interest is subjected to multiple discrete treatments. The proposed estimator can rely on fully nonparametric relations between treatment and outcome variables, opening up the possibility to use flexible machine learning approaches to estimate IRFs.
arXiv Detail & Related papers (2024-11-15T07:42:02Z)
Induced Covariance for Causal Discovery in Linear Sparse Structures [55.2480439325792]
Causal models seek to unravel the cause-effect relationships among variables from observed data. This paper introduces a novel causal discovery algorithm designed for settings in which variables exhibit linearly sparse relationships.
arXiv Detail & Related papers (2024-10-02T04:01:38Z)
Agnostic Learning of Mixed Linear Regressions with EM and AM Algorithms [22.79595679373698]
Mixed linear regression is a well-studied problem in statistics and machine learning. In this paper, we consider the more general problem of learning of mixed linear regression from samples. We show that the AM and EM algorithms lead to learning in mixed linear regression by converging to the population loss minimizers.
arXiv Detail & Related papers (2024-06-03T09:43:24Z)
Gradient-based bilevel optimization for multi-penalty Ridge regression through matrix differential calculus [0.46040036610482665]
We introduce a gradient-based approach to the problem of linear regression with l2-regularization. We show that our approach outperforms LASSO, Ridge, and Elastic Net regression. The analytical of the gradient proves to be more efficient in terms of computational time compared to automatic differentiation.
arXiv Detail & Related papers (2023-11-23T20:03:51Z)
Efficient Interpretable Nonlinear Modeling for Multiple Time Series [5.448070998907116]
This paper proposes an efficient nonlinear modeling approach for multiple time series. It incorporates nonlinear interactions among different time-series variables. Experimental results show that the proposed algorithm improves the identification of the support of the VAR coefficients in a parsimonious manner.
arXiv Detail & Related papers (2023-09-29T11:42:59Z)
Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression. Minimal prior assumptions on the parameters are used through the use of plug-in empirical Bayes estimates. The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z)
Test Set Sizing Via Random Matrix Theory [91.3755431537592]
This paper uses techniques from Random Matrix Theory to find the ideal training-testing data split for a simple linear regression. It defines "ideal" as satisfying the integrity metric, i.e. the empirical model error is the actual measurement noise. This paper is the first to solve for the training and test size for any model in a way that is truly optimal.
arXiv Detail & Related papers (2021-12-11T13:18:33Z)
Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized Structural equation models (SEMs) We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using a gradient descent. For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets. We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator. We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.