An algorithm-based multiple detection influence measure for high dimensional regression using expectile
- URL: http://arxiv.org/abs/2105.12286v1
- Date: Wed, 26 May 2021 01:16:24 GMT
- Title: An algorithm-based multiple detection influence measure for high dimensional regression using expectile
- Authors: Amadou Barry, Nikhil Bhagwat, Bratislav Misic, Jean-Baptiste Poline and Celia M. T. Greenwood
- Abstract summary: We propose an algorithm-based, multi-step, multiple detection procedure to identify influential observations.
Our three-step algorithm to identify and capture undesirable variability in the data, asymMIP, is based on two complementary statistics.
The application of our method to the Autism Brain Imaging Data Exchange dataset resulted in a more balanced and accurate prediction of brain maturity.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The identification of influential observations is an important part of data
analysis that can prevent erroneous conclusions drawn from biased estimators.
However, in high dimensional data, this identification is challenging.
Classical and recently developed methods often perform poorly when there are
multiple influential observations in the same dataset. In particular, current
methods can fail because of masking, when several influential observations
share similar characteristics, or because of swamping, when the influential
observations lie near the boundary of the space spanned by well-behaved
observations. Therefore, we
propose an algorithm-based, multi-step, multiple detection procedure to
identify influential observations that addresses current limitations. Our
three-step algorithm to identify and capture undesirable variability in the
data, asymMIP, is based on two complementary statistics, inspired by
asymmetric correlations, and built on expectiles. Simulations demonstrate
higher detection power than competing methods. Using the asymptotic
distribution of these statistics, influential observations can be detected
without computationally demanding procedures such as the bootstrap. The application
of our method to the Autism Brain Imaging Data Exchange neuroimaging dataset
resulted in a more balanced and accurate prediction of brain maturity based on
cortical thickness. A free R package implementing our algorithm, asymMIP, is
available on GitHub (github.com/AmBarry/hidetify).
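The abstract states that asymMIP is built on expectiles without defining them. As a reminder of that building block, here is a minimal sketch, assuming NumPy, of the tau-expectile of a sample: the value m that minimizes sum_i |tau - 1{y_i < m}| (y_i - m)^2, computed by iteratively reweighted averaging (asymmetric least squares). The function name `expectile` and its defaults are illustrative assumptions. This code is not taken from the asymMIP/hidetify package and does not compute the asymMIP statistics themselves.

```python
import numpy as np


def expectile(y, tau=0.5, tol=1e-10, max_iter=100):
    """Return the tau-expectile of a 1-D sample (illustrative sketch).

    The tau-expectile m minimizes sum_i |tau - 1{y_i < m}| * (y_i - m)**2.
    With tau = 0.5 all weights are equal, so the result is the ordinary mean.
    """
    y = np.asarray(y, dtype=float)
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    m = y.mean()  # start from the mean, i.e. the 0.5-expectile
    for _ in range(max_iter):
        # Asymmetric weights: tau for points above the current estimate,
        # 1 - tau for points at or below it (points equal to m add nothing).
        w = np.where(y > m, tau, 1.0 - tau)
        # Fixed point of the first-order condition sum_i w_i (y_i - m) = 0.
        m_new = np.sum(w * y) / np.sum(w)
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.normal(size=1000)
    print(expectile(y, 0.5) - y.mean())  # ~0: the 0.5-expectile is the mean
    print(expectile(y, 0.9) > y.mean())  # True: upper expectiles exceed the mean
```

Because the loss is quadratic rather than absolute, an expectile depends on how far tail observations lie from the centre, not only on how many there are; this sensitivity to extreme values is what distinguishes expectiles from quantiles.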
Related papers
- Estimating Causal Effects from Learned Causal Networks [56.14597641617531]
We propose an alternative paradigm for answering causal-effect queries over discrete observable variables.
We learn the causal Bayesian network and its confounding latent variables directly from the observational data.
We show that this model completion learning approach can be more effective than estimand approaches.
arXiv (2024-08-26)
- Approximating Counterfactual Bounds while Fusing Observational, Biased and Randomised Data Sources [64.96984404868411]
We address the problem of integrating data from multiple, possibly biased, observational and interventional studies.
We show that the likelihood of the available data has no local maxima.
We then show how the same approach can address the general case of multiple datasets.
arXiv (2023-07-31)
- Robust Fitted-Q-Evaluation and Iteration under Sequentially Exogenous Unobserved Confounders [16.193776814471768]
We study robust policy evaluation and policy optimization in the presence of sequentially-exogenous unobserved confounders.
We provide sample complexity bounds, insights, and show effectiveness both in simulations and on real-world longitudinal healthcare data of treating sepsis.
arXiv (2023-02-01)
- Learning to Bound Counterfactual Inference in Structural Causal Models from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm.
The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources.
It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
arXiv (2022-12-06)
- MissDAG: Causal Discovery in the Presence of Missing Data with Continuous Additive Noise Models [78.72682320019737]
We develop a general method, which we call MissDAG, to perform causal discovery from data with incomplete observations.
MissDAG maximizes the expected likelihood of the visible part of observations under the expectation-maximization framework.
We demonstrate the flexibility of MissDAG for incorporating various causal discovery algorithms and its efficacy through extensive simulations and real data experiments.
arXiv (2022-05-27)
- A Two-Block RNN-based Trajectory Prediction from Incomplete Trajectory [14.725386295605666]
We introduce a two-block RNN model that approximates the inference steps of the Bayesian filtering framework.
We show that the proposed model improves the prediction accuracy compared to the three baseline imputation methods.
We also show that our proposed method can achieve better prediction compared to the baselines when there is no miss-detection.
arXiv (2022-03-14)
- Combining Observational and Randomized Data for Estimating Heterogeneous Treatment Effects [82.20189909620899]
Estimating heterogeneous treatment effects is an important problem across many domains.
Currently, most existing works rely exclusively on observational data.
We propose to estimate heterogeneous treatment effects by combining large amounts of observational data and small amounts of randomized data.
arXiv (2022-02-25)
- Coherent False Seizure Prediction in Epilepsy, Coincidence or Providence? [0.2770822269241973]
Seizure forecasting using machine learning is possible, but the performance is far from ideal.
Here, we examine false and missing alarms of two algorithms on long-term datasets.
arXiv (2021-10-26)
- Graph Neural Network-Based Anomaly Detection in Multivariate Time Series [17.414474298706416]
We develop a new way to detect anomalies in high-dimensional time series data.
Our approach combines a structure learning approach with graph neural networks.
We show that our method detects anomalies more accurately than baseline approaches.
arXiv (2021-06-13)
- Interpretable Anomaly Detection with Mondrian Pólya Forests on Data Streams [6.177270420667713]
Anomaly detection at scale is an extremely challenging problem of great practicality.
Recent work has coalesced on variations of (random) $k$d-trees to summarise data for anomaly detection.
These methods rely on ad-hoc score functions that are not easy to interpret.
We contextualise these methods in a probabilistic framework which we call the Mondrian Polya Forest.
arXiv (2020-08-04)
- A Robust Functional EM Algorithm for Incomplete Panel Count Data [66.07942227228014]
We propose a functional EM algorithm to estimate the counting process mean function under a missing completely at random (MCAR) assumption.
The proposed algorithm wraps several popular panel count inference methods, seamlessly deals with incomplete counts and is robust to misspecification of the Poisson process assumption.
We illustrate the utility of the proposed algorithm through numerical experiments and an analysis of smoking cessation data.
arXiv (2020-03-02)