Statistical inference using machine learning and classical techniques
based on accumulated local effects (ALE)
- URL: http://arxiv.org/abs/2310.09877v4
- Date: Tue, 13 Feb 2024 09:38:50 GMT
- Title: Statistical inference using machine learning and classical techniques
based on accumulated local effects (ALE)
- Authors: Chitu Okoli
- Abstract summary: Accumulated Local Effects (ALE) is a model-agnostic approach for global explanations of machine learning algorithms.
There are at least three challenges with conducting statistical inference based on ALE.
We introduce innovative tools and techniques for statistical inference using ALE.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accumulated Local Effects (ALE) is a model-agnostic approach for global
explanations of the results of black-box machine learning (ML) algorithms.
There are at least three challenges with conducting statistical inference based
on ALE: ensuring the reliability of ALE analyses, especially in the context of
small datasets; intuitively characterizing a variable's overall effect in ML;
and making robust inferences from ML data analysis. In response, we introduce
innovative tools and techniques for statistical inference using ALE,
establishing bootstrapped confidence intervals tailored to dataset size and
introducing ALE effect size measures that intuitively indicate effects on both
the outcome variable scale and a normalized scale. Furthermore, we demonstrate
how to use these tools to draw reliable statistical inferences, reflecting the
flexible patterns ALE adeptly highlights, with implementations available in the
'ale' package in R. This work propels the discourse on ALE and its
applicability in ML and statistical analysis forward, offering practical
solutions to prevailing challenges in the field.
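As a rough illustration of the idea in the abstract, the following is a minimal Python sketch of a first-order ALE curve for one numeric feature, with percentile bootstrap confidence intervals obtained by resampling rows and refitting the model. It is not the implementation in the 'ale' R package. The function names (`ale_1d`, `fit_linear`, `bootstrap_ale`), the quantile binning, the linear stand-in model, and the synthetic data are all assumptions made for this example.

```python
import numpy as np

def ale_1d(predict, X, j, edges):
    """First-order ALE of feature j, evaluated at the given bin edges."""
    K = len(edges) - 1
    # Assign each row to a bin 0..K-1; out-of-range values go to the end bins.
    idx = np.clip(np.searchsorted(edges, X[:, j], side="right") - 1, 0, K - 1)
    effects = np.zeros(K)
    counts = np.zeros(K)
    for k in range(K):
        rows = X[idx == k]
        if len(rows) == 0:
            continue  # an empty bin contributes no local effect
        lo, hi = rows.copy(), rows.copy()
        lo[:, j] = edges[k]
        hi[:, j] = edges[k + 1]
        # Local effect: average prediction change across the bin.
        effects[k] = np.mean(predict(hi) - predict(lo))
        counts[k] = len(rows)
    ale = np.concatenate([[0.0], np.cumsum(effects)])
    # Center so that the count-weighted average effect is zero.
    mid = (ale[:-1] + ale[1:]) / 2
    return ale - np.sum(mid * counts) / np.sum(counts)

def fit_linear(X, y):
    """Hypothetical stand-in model: ordinary least squares with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ beta

def bootstrap_ale(fit, X, y, j, n_bins=10, n_boot=200, alpha=0.05, seed=0):
    """Refit on resampled rows and return percentile bands for the ALE curve."""
    rng = np.random.default_rng(seed)
    # Bin edges are fixed from the full data so all curves share one grid.
    edges = np.unique(np.quantile(X[:, j], np.linspace(0, 1, n_bins + 1)))
    curves = np.empty((n_boot, len(edges)))
    for b in range(n_boot):
        rows = rng.integers(0, len(X), size=len(X))
        Xb, yb = X[rows], y[rows]
        curves[b] = ale_1d(fit(Xb, yb), Xb, j, edges)
    lower, upper = np.quantile(curves, [alpha / 2, 1 - alpha / 2], axis=0)
    return edges, curves.mean(axis=0), lower, upper

# Synthetic data (assumed for the demo): y depends linearly on both features.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=300)
edges, mean_ale, lower, upper = bootstrap_ale(fit_linear, X, y, j=0)
```

Because the stand-in model is linear, the mean ALE curve for feature 0 should be close to a straight line whose slope matches the fitted coefficient. A more flexible model could be substituted by passing a different `fit` function that returns a prediction callable.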
Related papers
- Context is Key: A Benchmark for Forecasting with Essential Textual Information [87.3175915185287]
"Context is Key" (CiK) is a time series forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context.
We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters.
Our experiments highlight the importance of incorporating contextual information, demonstrate surprising performance when using LLM-based forecasting models, and also reveal some of their critical shortcomings.
arXiv Detail & Related papers (2024-10-24T17:56:08Z)
- Impact of Missing Values in Machine Learning: A Comprehensive Analysis [0.0]
This paper aims to examine the nuanced impact of missing values on machine learning (ML) models.
Our analysis focuses on the challenges posed by missing values, including biased inferences, reduced predictive power, and increased computational burdens.
The study employs case studies and real-world examples to illustrate the practical implications of addressing missing values.
arXiv Detail & Related papers (2024-10-10T18:31:44Z)
- Measuring Variable Importance in Individual Treatment Effect Estimation with High Dimensional Data [35.104681814241104]
Causal machine learning (ML) promises to provide powerful tools for estimating individual treatment effects.
ML methods still face the significant challenge of interpretability, which is crucial for medical applications.
We propose a new algorithm based on the Conditional Permutation Importance (CPI) method for statistically rigorous variable importance assessment.
arXiv Detail & Related papers (2024-08-23T11:44:07Z)
- Advancing Anomaly Detection: Non-Semantic Financial Data Encoding with LLMs [49.57641083688934]
We introduce a novel approach to anomaly detection in financial data using Large Language Models (LLMs) embeddings.
Our experiments demonstrate that LLMs contribute valuable information to anomaly detection as our models outperform the baselines.
arXiv Detail & Related papers (2024-06-05T20:19:09Z)
- Task-Agnostic Machine-Learning-Assisted Inference [0.0]
We introduce a novel statistical framework named PSPS for task-agnostic ML-assisted inference.
PSPS provides a post-prediction inference solution that can be easily plugged into almost any established data analysis routines.
arXiv Detail & Related papers (2024-05-30T13:19:49Z)
- Embedding Trajectory for Out-of-Distribution Detection in Mathematical Reasoning [50.84938730450622]
We propose a trajectory-based method TV score, which uses trajectory volatility for OOD detection in mathematical reasoning.
Our method outperforms all traditional algorithms on GLMs under mathematical reasoning scenarios.
Our method can be extended to more applications with high-density features in output spaces, such as multiple-choice questions.
arXiv Detail & Related papers (2024-05-22T22:22:25Z)
- DETAIL: Task DEmonsTration Attribution for Interpretable In-context Learning [75.68193159293425]
In-context learning (ICL) allows transformer-based language models to learn a specific task with a few "task demonstrations" without updating their parameters.
We propose an influence function-based attribution technique, DETAIL, that addresses the specific characteristics of ICL.
We experimentally prove the wide applicability of DETAIL by showing our attribution scores obtained on white-box models are transferable to black-box models in improving model performance.
arXiv Detail & Related papers (2024-05-22T15:52:52Z)
- A hypothesis-driven method based on machine learning for neuroimaging data analysis [0.0]
Machine learning approaches for discrimination of spatial patterns of brain images have limited their operation to feature extraction and linear classification tasks.
We show that the estimation of the conventional General Linear Model (GLM) has been connected to a univariate classification task.
We derive a refined statistical test with the GLM based on the parameters obtained by a linear Support Vector Regression (SVR) in the inverse problem (SVR-iGLM).
Using real data from a multisite initiative, the proposed MLE-based inference demonstrates statistical power and the control of false positives, outperforming the regular G
arXiv Detail & Related papers (2022-02-09T11:13:02Z)
- Differential privacy and robust statistics in high dimensions [49.50869296871643]
High-dimensional Propose-Test-Release (HPTR) builds upon three crucial components: the exponential mechanism, robust statistics, and the Propose-Test-Release mechanism.
We show that HPTR nearly achieves the optimal sample complexity under several scenarios studied in the literature.
arXiv Detail & Related papers (2021-11-12T06:36:40Z)
- Estimating Structural Target Functions using Machine Learning and Influence Functions [103.47897241856603]
We propose a new framework for statistical machine learning of target functions arising as identifiable functionals from statistical models.
This framework is problem- and model-agnostic and can be used to estimate a broad variety of target parameters of interest in applied statistics.
We put particular focus on so-called coarsening at random/doubly robust problems with partially unobserved information.
arXiv Detail & Related papers (2020-08-14T16:48:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.