DoubleML -- An Object-Oriented Implementation of Double Machine Learning in R
- URL: http://arxiv.org/abs/2103.09603v6
- Date: Wed, 5 Jun 2024 12:09:23 GMT
- Title: DoubleML -- An Object-Oriented Implementation of Double Machine Learning in R
- Authors: Philipp Bach, Victor Chernozhukov, Malte S. Kurz, Martin Spindler, Sven Klaassen
- Abstract summary: The R package DoubleML implements the double/debiased machine learning framework.
It provides functionalities to estimate parameters in causal models based on machine learning methods.
- Score: 4.830430752756141
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The R package DoubleML implements the double/debiased machine learning framework of Chernozhukov et al. (2018). It provides functionalities to estimate parameters in causal models based on machine learning methods. The double machine learning framework consists of three key ingredients: Neyman orthogonality, high-quality machine learning estimation, and sample splitting. Estimation of nuisance components can be performed by various state-of-the-art machine learning methods available in the mlr3 ecosystem. DoubleML makes it possible to perform inference in a variety of causal models, including partially linear and interactive regression models and their extensions to instrumental variable estimation. The object-oriented implementation of DoubleML enables high flexibility in model specification and makes the package easily extensible. This paper serves as an introduction to the double machine learning framework and the R package DoubleML. In reproducible code examples with simulated and real data sets, we demonstrate how DoubleML users can perform valid inference based on machine learning methods.
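As a quick orientation, here is a minimal R sketch of the workflow the abstract describes: estimating the treatment effect in a partially linear regression (PLR) model with cross-fitting and mlr3 learners. The data generator make_plr_CCDDHNR2018() and the DoubleMLPLR class follow the package's documented interface, but the learner choices and settings are illustrative, and the argument names follow recent releases (earlier versions used ml_g for the outcome regression).

```r
# Minimal DoubleML workflow: inference on the treatment coefficient theta
# in a partially linear regression model, Y = theta*D + g(X) + eps.
library(DoubleML)
library(mlr3)
library(mlr3learners)

set.seed(1234)
# Simulated benchmark data shipped with the package (DGP from Chernozhukov et al., 2018)
dml_data <- make_plr_CCDDHNR2018(n_obs = 500, dim_x = 20)

# Nuisance learners from the mlr3 ecosystem; random forests are one illustrative choice
ml_l <- lrn("regr.ranger", num.trees = 100)  # learner for E[Y | X]
ml_m <- lrn("regr.ranger", num.trees = 100)  # learner for E[D | X]

# Cross-fitted, Neyman-orthogonal estimation of theta
dml_plr <- DoubleMLPLR$new(dml_data, ml_l = ml_l, ml_m = ml_m, n_folds = 5)
dml_plr$fit()
dml_plr$summary()    # point estimate, standard error, t- and p-value
dml_plr$confint()    # 95% confidence interval
```

The summary() call reports the cross-fitted point estimate together with a standard error that is valid despite the machine-learning-based nuisance estimation, which is the core promise of the framework.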
Related papers
- Verbalized Machine Learning: Revisiting Machine Learning with Language Models [63.10391314749408]
We introduce the framework of verbalized machine learning (VML).
VML constrains the parameter space to be human-interpretable natural language.
We empirically verify the effectiveness of VML, and hope that VML can serve as a stepping stone to stronger interpretability.
arXiv Detail & Related papers (2024-06-06T17:59:56Z)
- Hyperparameter Tuning for Causal Inference with Double Machine Learning: A Simulation Study [4.526082390949313]
We empirically assess the relationship between the predictive performance of machine learning methods and the resulting causal estimation.
We conduct an extensive simulation study using data from the 2019 Atlantic Causal Inference Conference Data Challenge.
arXiv Detail & Related papers (2024-02-07T09:01:51Z)
- Model Averaging and Double Machine Learning [2.6436521007616114]
We show that DDML with stacking is more robust to partially unknown functional forms than common alternative approaches.
We provide Stata and R software implementing our proposals.
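As a rough illustration of how stacking-based nuisance estimation can be set up in R (a sketch, not the paper's replication code), several mlr3 base learners can be combined into a single stacked learner with mlr3pipelines and handed to DoubleML; all learner choices below are assumptions for demonstration.

```r
# Illustrative sketch: a stacked nuisance learner for double machine learning,
# built with mlr3pipelines and passed to DoubleML as one mlr3 learner.
library(DoubleML)
library(mlr3)
library(mlr3learners)
library(mlr3pipelines)

base_learners <- list(
  lrn("regr.rpart"),                       # illustrative base learner 1
  lrn("regr.ranger", num.trees = 100)      # illustrative base learner 2
)
# Cross-validated stacking: out-of-fold base predictions feed a final learner
stack_graph <- pipeline_stacking(
  base_learners,
  super_learner = lrn("regr.lm"),
  method = "cv", folds = 3
)
ml_stack <- as_learner(stack_graph)

dml_data <- make_plr_CCDDHNR2018(n_obs = 500)
dml_plr <- DoubleMLPLR$new(dml_data,
                           ml_l = ml_stack$clone(deep = TRUE),
                           ml_m = ml_stack$clone(deep = TRUE))
dml_plr$fit()
dml_plr$summary()
```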
arXiv Detail & Related papers (2024-01-03T09:38:13Z)
- Language models are weak learners [71.33837923104808]
We show that prompt-based large language models can operate effectively as weak learners.
We incorporate these models into a boosting approach, which can leverage the knowledge within the model to outperform traditional tree-based boosting.
Results illustrate the potential for prompt-based LLMs to function not just as few-shot learners themselves, but as components of larger machine learning pipelines.
arXiv Detail & Related papers (2023-06-25T02:39:19Z)
- CodeGen2: Lessons for Training LLMs on Programming and Natural Languages [116.74407069443895]
We unify encoder and decoder-based models into a single prefix-LM.
For learning methods, we explore the claim of a "free lunch" hypothesis.
For data distributions, the effect of a mixture distribution and multi-epoch training of programming and natural languages on model performance is explored.
arXiv Detail & Related papers (2023-05-03T17:55:25Z)
- ddml: Double/debiased machine learning in Stata [2.8880000014100506]
We introduce the package ddml for Double/Debiased Machine Learning (DDML) in Stata.
ddml is compatible with many existing supervised machine learning programs in Stata.
arXiv Detail & Related papers (2023-01-23T12:37:34Z)
- Dual Learning for Large Vocabulary On-Device ASR [64.10124092250128]
Dual learning is a paradigm for semi-supervised machine learning that seeks to leverage unsupervised data by solving two opposite tasks at once.
We provide an analysis of an on-device-sized streaming conformer trained on the entirety of Librispeech, showing relative WER improvements of 10.7%/5.2% without an LM and 11.7%/16.4% with an LM.
arXiv Detail & Related papers (2023-01-11T06:32:28Z)
- Certifiable Machine Unlearning for Linear Models [1.484852576248587]
Machine unlearning is the task of updating machine learning (ML) models after a subset of their training data is deleted.
We present an experimental study of the three state-of-the-art approximate unlearning methods for linear models.
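For intuition about the task (the abstract does not name the three approximate methods studied, so the following is not one of them), exact unlearning is available for ordinary least squares: deleting an observation is a rank-one downdate of X'X, computable via the Sherman-Morrison formula. A minimal R sketch, with a hypothetical helper unlearn_ols():

```r
# Sketch: exact unlearning for OLS by deleting one observation (x_i, y_i)
# via a Sherman-Morrison rank-one downdate of (X'X)^{-1}.
unlearn_ols <- function(XtX_inv, Xty, x_i, y_i) {
  x_i <- matrix(x_i, ncol = 1)             # treat x_i as a column vector
  Ainv_x <- XtX_inv %*% x_i
  denom <- 1 - drop(t(x_i) %*% Ainv_x)     # leverage term; must be nonzero
  XtX_inv_new <- XtX_inv + (Ainv_x %*% t(Ainv_x)) / denom
  Xty_new <- Xty - drop(x_i) * y_i
  list(coef = drop(XtX_inv_new %*% Xty_new),
       XtX_inv = XtX_inv_new, Xty = Xty_new)
}

# Check against a full refit without observation 1
set.seed(1)
X <- cbind(1, matrix(rnorm(200), ncol = 2))
y <- rnorm(100)
upd <- unlearn_ols(solve(crossprod(X)), crossprod(X, y), X[1, ], y[1])
refit <- drop(solve(crossprod(X[-1, ]), crossprod(X[-1, ], y[-1])))
max(abs(upd$coef - refit))  # ~ 0 up to numerical precision
```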
arXiv Detail & Related papers (2021-06-29T05:05:58Z)
- DoubleML -- An Object-Oriented Implementation of Double Machine Learning in Python [1.4911092205861822]
DoubleML is an open-source Python library implementing the double machine learning framework of Chernozhukov et al.
It contains functionalities for valid statistical inference on causal parameters when the estimation of parameters is based on machine learning methods.
The package is distributed under the MIT license and relies on core libraries from the scientific Python ecosystem.
arXiv Detail & Related papers (2021-04-07T16:16:39Z)
- A Survey on Large-scale Machine Learning [67.6997613600942]
Machine learning can provide deep insights into data, allowing machines to make high-quality predictions.
Most sophisticated machine learning approaches suffer from huge time costs when operating on large-scale data.
Large-scale machine learning aims to learn patterns from big data efficiently while maintaining comparable performance.
arXiv Detail & Related papers (2020-08-10T06:07:52Z)
- Transfer Learning without Knowing: Reprogramming Black-box Machine Learning Models with Scarce Data and Limited Resources [78.72922528736011]
We propose a novel approach, black-box adversarial reprogramming (BAR), that repurposes a well-trained black-box machine learning model.
Using zeroth order optimization and multi-label mapping techniques, BAR can reprogram a black-box ML model solely based on its input-output responses.
BAR outperforms state-of-the-art methods and yields comparable performance to the vanilla adversarial reprogramming method.
arXiv Detail & Related papers (2020-07-17T01:52:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.