Weighted Missing Linear Discriminant Analysis: An Explainable Approach for Classification with Missing Data
- URL: http://arxiv.org/abs/2407.00710v1
- Date: Sun, 30 Jun 2024 14:21:32 GMT
- Title: Weighted Missing Linear Discriminant Analysis: An Explainable Approach for Classification with Missing Data
- Authors: Tuan L. Vo, Uyen Dang, Thu Nguyen,
- Abstract summary: We propose a novel approach to Linear Discriminant Analysis (LDA) under missing data.
We estimate parameters directly on missing data and use a weight matrix for missing values to penalize missing entries during classification.
Experimental results demonstrate that WLDA outperforms conventional methods by a significant margin.
- Score: 1.4840867281815378
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As Artificial Intelligence (AI) models are gradually being adopted in real-life applications, the explainability of the model used is critical, especially in high-stakes areas such as medicine and finance. Among the commonly used models, Linear Discriminant Analysis (LDA) is a widely used classification tool that is also explainable thanks to its ability to model class distributions and maximize class separation through linear feature combinations. Nevertheless, real-world data is frequently incomplete, presenting significant challenges for classification tasks and model explanations. In this paper, we propose a novel approach to LDA under missing data, termed Weighted missing Linear Discriminant Analysis (WLDA), which effectively classifies observations in data containing missing values without imputation, by estimating the parameters directly on the missing data and using a weight matrix for missing values to penalize missing entries during classification. Furthermore, we analyze the theoretical properties and examine the explainability of the proposed technique in a comprehensive manner. Experimental results demonstrate that WLDA outperforms conventional methods by a significant margin, particularly in scenarios where missing values are present in both the training and test sets.
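The abstract describes estimating LDA parameters directly on incomplete data instead of imputing it. The sketch below illustrates that general idea under stated assumptions; it is not WLDA, because the paper's weight matrix is not specified here. Instead it substitutes a simpler, well-known alternative: marginalizing the shared-covariance Gaussian model onto each observation's observed coordinates. The function names `fit_lda_missing` and `predict_lda_missing` are hypothetical, and the parameter estimates (NaN-aware class means plus a pairwise-complete pooled covariance) are an assumption that is not guaranteed to be positive definite on small or heavily masked samples.

```python
import numpy as np


def fit_lda_missing(X, y):
    """Estimate shared-covariance LDA parameters from data containing NaNs.

    Assumption (not the WLDA estimator): class means use only the observed
    entries of each feature, and the pooled within-class covariance is built
    pairwise from rows where both features are observed. Every feature must be
    observed at least once per class.
    """
    classes = np.unique(y)
    means = np.array([np.nanmean(X[y == c], axis=0) for c in classes])
    priors = np.array([np.mean(y == c) for c in classes])

    # Center each row on its own class mean; missing entries stay NaN.
    centered = X - means[np.searchsorted(classes, y)]
    observed = ~np.isnan(centered)
    filled = np.where(observed, centered, 0.0)

    # Sum of cross-products and number of co-observed rows, per feature pair.
    cross = filled.T @ filled
    counts = observed.T.astype(float) @ observed.astype(float)
    # Pairwise estimate; may fail to be positive definite with heavy masking.
    cov = cross / np.maximum(counts - len(classes), 1.0)
    return classes, means, cov, priors


def predict_lda_missing(x, classes, means, cov, priors):
    """Classify one observation using only its observed coordinates.

    Under the LDA model, the observed sub-vector x_o is Gaussian with mean
    mu_o and shared covariance Sigma_oo, so missing features are marginalized
    out instead of imputed. Assumes at least one feature of x is observed.
    """
    obs = ~np.isnan(x)
    sigma_oo = cov[np.ix_(obs, obs)]
    x_o = x[obs]
    scores = []
    for mu, prior in zip(means, priors):
        mu_o = mu[obs]
        w = np.linalg.solve(sigma_oo, mu_o)  # Sigma_oo^{-1} mu_o
        # Linear discriminant: x_o' w - 0.5 mu_o' w + log(prior)
        scores.append(x_o @ w - 0.5 * mu_o @ w + np.log(prior))
    return classes[int(np.argmax(scores))]
```

For example, after `classes, means, cov, priors = fit_lda_missing(X_train, y_train)`, calling `predict_lda_missing(np.array([3.0, np.nan, 3.0]), classes, means, cov, priors)` scores the point using features 0 and 2 only. Unlike WLDA as described above, no weight matrix penalizes the missing entries; they simply drop out of the discriminant.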
Related papers
- Generalized Criterion for Identifiability of Additive Noise Models Using Majorization [7.448620208767376]
We introduce a novel identifiability criterion for directed acyclic graph (DAG) models.
We demonstrate that this criterion extends and generalizes existing identifiability criteria.
We present a new algorithm for learning a topological ordering of variables.
arXiv Detail & Related papers (2024-04-08T02:18:57Z)
- Minimally Informed Linear Discriminant Analysis: training an LDA model with unlabelled data [51.673443581397954]
We show that it is possible to compute the exact projection vector from LDA models based on unlabelled data.
We show that the MILDA projection vector can be computed in a closed form with a computational cost comparable to LDA.
arXiv Detail & Related papers (2023-10-17T09:50:31Z)
- Minimal Assumptions for Optimal Serology Classification: Theory and Implications for Multidimensional Settings and Impure Training Data [0.0]
Minimizing error in prevalence estimates and diagnostic classifiers remains a challenging task in serology.
We propose a technique that uses empirical training data to classify samples and estimate prevalence in arbitrary dimension without direct access to the conditional PDFs.
We validate our methods in the context of synthetic data and a research-use SARS-CoV-2 enzyme-linked immunosorbent assay (ELISA).
arXiv Detail & Related papers (2023-08-30T13:26:49Z)
- Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
- MACE: An Efficient Model-Agnostic Framework for Counterfactual Explanation [132.77005365032468]
We propose a novel framework of Model-Agnostic Counterfactual Explanation (MACE).
In our MACE approach, we propose a novel RL-based method for finding good counterfactual examples and a gradient-less descent method for improving proximity.
Experiments on public datasets validate its effectiveness, with better validity, sparsity, and proximity.
arXiv Detail & Related papers (2022-05-31T04:57:06Z)
- Nonparametric Functional Analysis of Generalized Linear Models Under Nonlinear Constraints [0.0]
This article introduces a novel nonparametric methodology for Generalized Linear Models.
It combines the strengths of the binary regression and latent variable formulations for categorical data.
It extends recently published parametric versions of the methodology and generalizes it.
arXiv Detail & Related papers (2021-10-11T04:49:59Z)
- Imputation of Missing Data with Class Imbalance using Conditional Generative Adversarial Networks [24.075691766743702]
We propose a new method for imputing missing data based on its class-specific characteristics.
Our Conditional Generative Adversarial Imputation Network (CGAIN) imputes the missing data using class-specific distributions.
We tested our approach on benchmark datasets and achieved superior performance compared with the state-of-the-art and popular imputation approaches.
arXiv Detail & Related papers (2020-12-01T02:26:54Z)
- Graph Embedding with Data Uncertainty [113.39838145450007]
Spectral-based subspace learning is a common data preprocessing step in many machine learning pipelines.
Most subspace learning methods do not take into consideration possible measurement inaccuracies or artifacts that can lead to data with high uncertainty.
arXiv Detail & Related papers (2020-09-01T15:08:23Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
- Saliency-based Weighted Multi-label Linear Discriminant Analysis [101.12909759844946]
We propose a new variant of Linear Discriminant Analysis (LDA) to solve multi-label classification tasks.
The proposed method is based on a probabilistic model for defining the weights of individual samples.
The Saliency-based weighted Multi-label LDA approach is shown to lead to performance improvements in various multi-label classification problems.
arXiv Detail & Related papers (2020-04-08T19:40:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.