Task-Agnostic Machine Learning-Assisted Inference
- URL: http://arxiv.org/abs/2405.20039v1
- Date: Thu, 30 May 2024 13:19:49 GMT
- Title: Task-Agnostic Machine Learning-Assisted Inference
- Authors: Jiacheng Miao, Qiongshi Lu,
- Abstract summary: We propose a novel statistical framework for task-agnostic ML-assisted inference.
It delivers valid and efficient inference that is robust to arbitrary choices of ML models.
We showcase the validity, versatility, and superiority of our method compared to existing approaches.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning (ML) is playing an increasingly important role in scientific research. In conjunction with classical statistical approaches, ML-assisted analytical strategies have shown great promise in accelerating research findings. This has also opened up a whole new field of methodological research focusing on integrative approaches that leverage both ML and statistics to tackle data science challenges. One type of study that has quickly gained popularity employs ML to predict unobserved outcomes in massive samples and then uses the predicted outcomes in downstream statistical inference. However, existing methods designed to ensure the validity of this type of post-prediction inference are limited to very basic tasks such as linear regression analysis. This is because any extension of these approaches to new, more sophisticated statistical tasks requires task-specific algebraic derivations and software implementations, which ignores the massive library of existing software tools already developed for complex inference tasks and severely constrains the scope of post-prediction inference in real applications. To address this challenge, we propose a novel statistical framework for task-agnostic ML-assisted inference. It provides a post-prediction inference solution that can be easily plugged into almost any established data analysis routine. It delivers valid and efficient inference that is robust to arbitrary choices of ML models, while allowing nearly all existing analytical frameworks to be incorporated into the analysis of ML-predicted outcomes. Through extensive experiments, we showcase the validity, versatility, and superiority of our method compared to existing approaches.
Related papers
- Leveraging Machine Learning for Official Statistics: A Statistical Manifesto [0.0]
It is important for official statistics production to apply machine learning with statistical rigor.
The Total Machine Learning Error (TMLE) is presented as a framework analogous to the Total Survey Error Model used in survey methodology.
arXiv Detail & Related papers (2024-09-06T15:57:25Z) - Measuring Variable Importance in Individual Treatment Effect Estimation with High Dimensional Data [35.104681814241104]
Causal machine learning (ML) promises to provide powerful tools for estimating individual treatment effects.
ML methods still face the significant challenge of interpretability, which is crucial for medical applications.
We propose a new algorithm based on the Conditional Permutation Importance (CPI) method for statistically rigorous variable importance assessment.
arXiv Detail & Related papers (2024-08-23T11:44:07Z) - Statistical Agnostic Regression: a machine learning method to validate regression models [0.0]
We introduce a method, named Statistical Agnostic Regression (SAR), for evaluating the statistical significance of an ML-based linear regression.
We define a threshold to establish that there is sufficient evidence with a probability of at least 1-eta to conclude that there is a linear relationship in the population between the explanatory (feature) and the response (label) variables.
arXiv Detail & Related papers (2024-02-23T09:19:26Z) - Assumption-Lean and Data-Adaptive Post-Prediction Inference [1.5050365268347254]
We introduce PoSt-Prediction Adaptive inference (PSPA) that allows valid and powerful inference based on ML-predicted data.
We demonstrate the statistical superiority and broad applicability of our method through simulations and real-data applications.
arXiv Detail & Related papers (2023-11-23T22:41:30Z) - Statistical inference using machine learning and classical techniques
based on accumulated local effects (ALE) [0.0]
Accumulated Local Effects (ALE) is a model-agnostic approach for global explanations of machine learning algorithms.
There are at least three challenges with conducting statistical inference based on ALE.
We introduce innovative tools and techniques for statistical inference using ALE.
arXiv Detail & Related papers (2023-10-15T16:17:21Z) - Representation Learning with Multi-Step Inverse Kinematics: An Efficient
and Optimal Approach to Rich-Observation RL [106.82295532402335]
Existing reinforcement learning algorithms suffer from computational intractability, strong statistical assumptions, and suboptimal sample complexity.
We provide the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level.
Our algorithm, MusIK, combines systematic exploration with representation learning based on multi-step inverse kinematics.
arXiv Detail & Related papers (2023-04-12T14:51:47Z) - Hessian-based toolbox for reliable and interpretable machine learning in
physics [58.720142291102135]
We present a toolbox for interpretability and reliability, extrapolation of the model architecture.
It provides a notion of the influence of the input data on the prediction at a given test point, an estimation of the uncertainty of the model predictions, and an agnostic score for the model predictions.
Our work opens the road to the systematic use of interpretability and reliability methods in ML applied to physics and, more generally, science.
arXiv Detail & Related papers (2021-08-04T16:32:59Z) - MAML is a Noisy Contrastive Learner [72.04430033118426]
Model-agnostic meta-learning (MAML) is one of the most popular and widely-adopted meta-learning algorithms nowadays.
We provide a new perspective to the working mechanism of MAML and discover that: MAML is analogous to a meta-learner using a supervised contrastive objective function.
We propose a simple but effective technique, zeroing trick, to alleviate such interference.
arXiv Detail & Related papers (2021-06-29T12:52:26Z) - SLOE: A Faster Method for Statistical Inference in High-Dimensional
Logistic Regression [68.66245730450915]
We develop an improved method for debiasing predictions and estimating frequentist uncertainty for practical datasets.
Our main contribution is SLOE, an estimator of the signal strength with convergence guarantees that reduces the computation time of estimation and inference by orders of magnitude.
arXiv Detail & Related papers (2021-03-23T17:48:56Z) - Estimating Structural Target Functions using Machine Learning and
Influence Functions [103.47897241856603]
We propose a new framework for statistical machine learning of target functions arising as identifiable functionals from statistical models.
This framework is problem- and model-agnostic and can be used to estimate a broad variety of target parameters of interest in applied statistics.
We put particular focus on so-called coarsening at random/doubly robust problems with partially unobserved information.
arXiv Detail & Related papers (2020-08-14T16:48:29Z) - A Survey on Large-scale Machine Learning [67.6997613600942]
Machine learning can provide deep insights into data, allowing machines to make high-quality predictions.
Most sophisticated machine learning approaches suffer from huge time costs when operating on large-scale data.
Large-scale Machine Learning aims to learn patterns from big data with comparable performance efficiently.
arXiv Detail & Related papers (2020-08-10T06:07:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.