Leveraging Machine Learning for Official Statistics: A Statistical Manifesto
- URL: http://arxiv.org/abs/2409.04365v1
- Date: Fri, 6 Sep 2024 15:57:25 GMT
- Title: Leveraging Machine Learning for Official Statistics: A Statistical Manifesto
- Authors: Marco Puts, David Salgado, Piet Daas,
- Abstract summary: It is important for official statistics production to apply machine learning with statistical rigor.
The Total Machine Learning Error (TMLE) is presented as a framework analogous to the Total Survey Error Model used in survey methodology.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: It is important for official statistics production to apply ML with statistical rigor, as it presents both opportunities and challenges. Although machine learning has enjoyed rapid technological advances in recent years, its application does not possess the methodological robustness necessary to produce high quality statistical results. In order to account for all sources of error in machine learning models, the Total Machine Learning Error (TMLE) is presented as a framework analogous to the Total Survey Error Model used in survey methodology. As a means of ensuring that ML models are both internally valid as well as externally valid, the TMLE model addresses issues such as representativeness and measurement errors. There are several case studies presented, illustrating the importance of applying more rigor to the application of machine learning in official statistics.
Related papers
- Towards the Best Solution for Complex System Reliability: Can Statistics Outperform Machine Learning? [39.58317527488534]
This study compares the effectiveness of classical statistical techniques and machine learning methods for improving reliability assessments.
We aim to demonstrate that classical statistical algorithms often yield more precise and interpretable results than black-box machine learning approaches.
arXiv Detail & Related papers (2024-10-05T17:31:18Z) - Task-Agnostic Machine-Learning-Assisted Inference [0.0]
We introduce a novel statistical framework named PSPS for task-agnostic ML-assisted inference.
PSPS provides a post-prediction inference solution that can be easily plugged into almost any established data analysis routines.
arXiv Detail & Related papers (2024-05-30T13:19:49Z) - Statistical inference using machine learning and classical techniques
based on accumulated local effects (ALE) [0.0]
Accumulated Local Effects (ALE) is a model-agnostic approach for global explanations of machine learning algorithms.
There are at least three challenges with conducting statistical inference based on ALE.
We introduce innovative tools and techniques for statistical inference using ALE.
arXiv Detail & Related papers (2023-10-15T16:17:21Z) - Measuring Causal Effects of Data Statistics on Language Model's
`Factual' Predictions [59.284907093349425]
Large amounts of training data are one of the major reasons for the high performance of state-of-the-art NLP models.
We provide a language for describing how training data influences predictions, through a causal framework.
Our framework bypasses the need to retrain expensive models and allows us to estimate causal effects based on observational data alone.
arXiv Detail & Related papers (2022-07-28T17:36:24Z) - ALT-MAS: A Data-Efficient Framework for Active Testing of Machine
Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using Bayesian neural network (BNN)
arXiv Detail & Related papers (2021-04-11T12:14:04Z) - Statistical inference for individual fairness [24.622418924551315]
We focus on the problem of detecting violations of individual fairness in machine learning models.
We develop a suite of inference tools for the adversarial cost function.
We demonstrate the utility of our tools in a real-world case study.
arXiv Detail & Related papers (2021-03-30T22:49:25Z) - A Note on High-Probability versus In-Expectation Guarantees of
Generalization Bounds in Machine Learning [95.48744259567837]
Statistical machine learning theory often tries to give generalization guarantees of machine learning models.
Statements made about the performance of machine learning models have to take the sampling process into account.
We show how one may transform one statement to another.
arXiv Detail & Related papers (2020-10-06T09:41:35Z) - Estimating Structural Target Functions using Machine Learning and
Influence Functions [103.47897241856603]
We propose a new framework for statistical machine learning of target functions arising as identifiable functionals from statistical models.
This framework is problem- and model-agnostic and can be used to estimate a broad variety of target parameters of interest in applied statistics.
We put particular focus on so-called coarsening at random/doubly robust problems with partially unobserved information.
arXiv Detail & Related papers (2020-08-14T16:48:29Z) - Transfer Learning without Knowing: Reprogramming Black-box Machine
Learning Models with Scarce Data and Limited Resources [78.72922528736011]
We propose a novel approach, black-box adversarial reprogramming (BAR), that repurposes a well-trained black-box machine learning model.
Using zeroth order optimization and multi-label mapping techniques, BAR can reprogram a black-box ML model solely based on its input-output responses.
BAR outperforms state-of-the-art methods and yields comparable performance to the vanilla adversarial reprogramming method.
arXiv Detail & Related papers (2020-07-17T01:52:34Z) - Insights into Performance Fitness and Error Metrics for Machine Learning [1.827510863075184]
Machine learning (ML) is the field of training machines to achieve high level of cognition and perform human-like analysis.
This paper examines a number of the most commonly-used performance fitness and error metrics for regression and classification algorithms.
arXiv Detail & Related papers (2020-05-17T22:59:04Z) - Towards CRISP-ML(Q): A Machine Learning Process Model with Quality
Assurance Methodology [53.063411515511056]
We propose a process model for the development of machine learning applications.
The first phase combines business and data understanding as data availability oftentimes affects the feasibility of the project.
The sixth phase covers state-of-the-art approaches for monitoring and maintenance of a machine learning applications.
arXiv Detail & Related papers (2020-03-11T08:25:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.