Related papers: An algorithm-based multiple detection influence measure for high dimensional regression using expectile

An algorithm-based multiple detection influence measure for high dimensional regression using expectile

URL: http://arxiv.org/abs/2105.12286v1
Date: Wed, 26 May 2021 01:16:24 GMT
Title: An algorithm-based multiple detection influence measure for high dimensional regression using expectile
Authors: Amadou Barry, Nikhil Bhagwat, Bratislav Misic, Jean-Baptiste Poline and Celia M. T. Greenwood
Abstract summary: We propose an algorithm-based, multi-step, multiple detection procedure to identify influential observations. Our three-step algorithm to identify and capture undesirable variability in the data, $asymMIP,$ is based on two complementary statistics. The application of our method to the Autism Brain Imaging Data Exchange dataset resulted in a more balanced and accurate prediction of brain maturity.
Score: 0.4999814847776096
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The identification of influential observations is an important part of data analysis that can prevent erroneous conclusions drawn from biased estimators. However, in high dimensional data, this identification is challenging. Classical and recently-developed methods often perform poorly when there are multiple influential observations in the same dataset. In particular, current methods can fail when there is masking several influential observations with similar characteristics, or swamping when the influential observations are near the boundary of the space spanned by well-behaved observations. Therefore, we propose an algorithm-based, multi-step, multiple detection procedure to identify influential observations that addresses current limitations. Our three-step algorithm to identify and capture undesirable variability in the data, $\asymMIP,$ is based on two complementary statistics, inspired by asymmetric correlations, and built on expectiles. Simulations demonstrate higher detection power than competing methods. Use of the resulting asymptotic distribution leads to detection of influential observations without the need for computationally demanding procedures such as the bootstrap. The application of our method to the Autism Brain Imaging Data Exchange neuroimaging dataset resulted in a more balanced and accurate prediction of brain maturity based on cortical thickness. See our GitHub for a free R package that implements our algorithm: \texttt{asymMIP} (\url{github.com/AmBarry/hidetify}).

Related papers

Understanding Data Influence with Differential Approximation [63.817689230826595]
We introduce a new formulation to approximate a sample's influence by accumulating the differences in influence between consecutive learning steps, which we term Diff-In.<n>By employing second-order approximations, we approximate these difference terms with high accuracy while eliminating the need for model convexity required by existing methods.<n>Our theoretical analysis demonstrates that Diff-In achieves significantly lower approximation error compared to existing influence estimators.
arXiv Detail & Related papers (2025-08-20T11:59:32Z)
Estimating Causal Effects from Learned Causal Networks [56.14597641617531]
We propose an alternative paradigm for answering causal-effect queries over discrete observable variables. We learn the causal Bayesian network and its confounding latent variables directly from the observational data. We show that this emphmodel completion learning approach can be more effective than estimand approaches.
arXiv Detail & Related papers (2024-08-26T08:39:09Z)
Approximating Counterfactual Bounds while Fusing Observational, Biased and Randomised Data Sources [64.96984404868411]
We address the problem of integrating data from multiple, possibly biased, observational and interventional studies. We show that the likelihood of the available data has no local maxima. We then show how the same approach can address the general case of multiple datasets.
arXiv Detail & Related papers (2023-07-31T11:28:24Z)
Robust Fitted-Q-Evaluation and Iteration under Sequentially Exogenous Unobserved Confounders [16.193776814471768]
We study robust policy evaluation and policy optimization in the presence of sequentially-exogenous unobserved confounders. We provide sample complexity bounds, insights, and show effectiveness both in simulations and on real-world longitudinal healthcare data of treating sepsis.
arXiv Detail & Related papers (2023-02-01T18:40:53Z)
Learning to Bound Counterfactual Inference in Structural Causal Models from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm. The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources. It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
arXiv Detail & Related papers (2022-12-06T12:42:11Z)
MissDAG: Causal Discovery in the Presence of Missing Data with Continuous Additive Noise Models [78.72682320019737]
We develop a general method, which we call MissDAG, to perform causal discovery from data with incomplete observations. MissDAG maximizes the expected likelihood of the visible part of observations under the expectation-maximization framework. We demonstrate the flexibility of MissDAG for incorporating various causal discovery algorithms and its efficacy through extensive simulations and real data experiments.
arXiv Detail & Related papers (2022-05-27T09:59:46Z)
A Two-Block RNN-based Trajectory Prediction from Incomplete Trajectory [14.725386295605666]
We introduce a two-block RNN model that approximates the inference steps of the Bayesian filtering framework. We show that the proposed model improves the prediction accuracy compared to the three baseline imputation methods. We also show that our proposed method can achieve better prediction compared to the baselines when there is no miss-detection.
arXiv Detail & Related papers (2022-03-14T13:39:44Z)
Combining Observational and Randomized Data for Estimating Heterogeneous Treatment Effects [82.20189909620899]
Estimating heterogeneous treatment effects is an important problem across many domains. Currently, most existing works rely exclusively on observational data. We propose to estimate heterogeneous treatment effects by combining large amounts of observational data and small amounts of randomized data.
arXiv Detail & Related papers (2022-02-25T18:59:54Z)
Coherent False Seizure Prediction in Epilepsy, Coincidence or Providence? [0.2770822269241973]
Seizure forecasting using machine learning is possible, but the performance is far from ideal. Here, we examine false and missing alarms of two algorithms on long-term datasets.
arXiv Detail & Related papers (2021-10-26T10:25:14Z)
Graph Neural Network-Based Anomaly Detection in Multivariate Time Series [17.414474298706416]
We develop a new way to detect anomalies in high-dimensional time series data. Our approach combines a structure learning approach with graph neural networks. We show that our method detects anomalies more accurately than baseline approaches.
arXiv Detail & Related papers (2021-06-13T09:07:30Z)
Interpretable Anomaly Detection with Mondrian P{\'o}lya Forests on Data Streams [6.177270420667713]
Anomaly detection at scale is an extremely challenging problem of great practicality. Recent work has coalesced on variations of (random) $k$emphd-trees to summarise data for anomaly detection. These methods rely on ad-hoc score functions that are not easy to interpret. We contextualise these methods in a probabilistic framework which we call the Mondrian Polya Forest.
arXiv Detail & Related papers (2020-08-04T13:19:07Z)
A Robust Functional EM Algorithm for Incomplete Panel Count Data [66.07942227228014]
We propose a functional EM algorithm to estimate the counting process mean function under a missing completely at random assumption (MCAR) The proposed algorithm wraps several popular panel count inference methods, seamlessly deals with incomplete counts and is robust to misspecification of the Poisson process assumption. We illustrate the utility of the proposed algorithm through numerical experiments and an analysis of smoking cessation data.
arXiv Detail & Related papers (2020-03-02T20:04:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.