Related papers: Directly Handling Missing Data in Linear Discriminant Analysis for Enhancing Classification Accuracy and Interpretability

Directly Handling Missing Data in Linear Discriminant Analysis for Enhancing Classification Accuracy and Interpretability

URL: http://arxiv.org/abs/2407.00710v3
Date: Wed, 09 Oct 2024 14:51:23 GMT
Title: Directly Handling Missing Data in Linear Discriminant Analysis for Enhancing Classification Accuracy and Interpretability
Authors: Tuan L. Vo, Uyen Dang, Thu Nguyen,
Abstract summary: We introduce a novel and robust classification method, termed weighted missing Linear Discriminant Analysis (WLDA) WLDA extends Linear Discriminant Analysis (LDA) to handle datasets with missing values without the need for imputation. We conduct an in-depth theoretical analysis to establish the properties of WLDA and thoroughly evaluate its explainability.
Score: 1.4840867281815378
License: http://creativecommons.org/licenses/by/4.0/
Abstract: As the adoption of Artificial Intelligence (AI) models expands into critical real-world applications, ensuring the explainability of these models becomes paramount, particularly in sensitive fields such as medicine and finance. Linear Discriminant Analysis (LDA) remains a popular choice for classification due to its interpretable nature, derived from its capacity to model class distributions and enhance class separation through linear combinations of features. However, real-world datasets often suffer from incomplete data, posing substantial challenges for both classification accuracy and model interpretability. In this paper, we introduce a novel and robust classification method, termed Weighted missing Linear Discriminant Analysis (WLDA), which extends LDA to handle datasets with missing values without the need for imputation. Our approach innovatively incorporates a weight matrix that penalizes missing entries, thereby refining parameter estimation directly on incomplete data. This methodology not only preserves the interpretability of LDA but also significantly enhances classification performance in scenarios plagued by missing data. We conduct an in-depth theoretical analysis to establish the properties of WLDA and thoroughly evaluate its explainability. Experimental results across various datasets demonstrate that WLDA consistently outperforms traditional methods, especially in challenging environments where missing values are prevalent in both training and test datasets. This advancement provides a critical tool for improving classification accuracy and maintaining model transparency in the face of incomplete data.

Related papers

Deep Linear Discriminant Analysis Revisited [3.569867801312133]
We show that for unconstrained Deep Linear Discriminant Analysis (LDA) classifiers, maximum-likelihood training admits pathological solutions.<n>We introduce the emphDiscriminative Negative Log-Likelihood (DNLL) loss, which augments the LDA log-likelihood with a simple penalty on the mixture density.
arXiv Detail & Related papers (2026-01-04T17:59:11Z)
Reliable and Reproducible Demographic Inference for Fairness in Face Analysis [63.46525489354455]
We propose a fully reproducible DAI pipeline that replaces conventional end-to-end training with a modular transfer learning approach.<n>We audit this pipeline across three dimensions: accuracy, fairness, and a newly introduced notion of robustness, defined via intra-identity consistency.<n>Our results show that the proposed method outperforms strong baselines, particularly on ethnicity, which is the more challenging attribute.
arXiv Detail & Related papers (2025-10-23T12:22:02Z)
Robust Molecular Property Prediction via Densifying Scarce Labeled Data [51.55434084913129]
In drug discovery, compounds most critical for advancing research often lie beyond the training set.<n>We propose a novel meta-learning-based approach that leverages unlabeled data to interpolate between in-distribution (ID) and out-of-distribution (OOD) data.<n>We demonstrate significant performance gains on challenging real-world datasets.
arXiv Detail & Related papers (2025-06-13T15:27:40Z)
CALF: A Conditionally Adaptive Loss Function to Mitigate Class-Imbalanced Segmentation [0.2902243522110345]
Imbalanced datasets pose a challenge in training deep learning (DL) models for medical diagnostics. We propose a novel, statistically driven, conditionally adaptive loss function (CALF) tailored to accommodate the conditions of imbalanced datasets in DL training.
arXiv Detail & Related papers (2025-04-06T12:03:33Z)
Partially Blinded Unlearning: Class Unlearning for Deep Networks a Bayesian Perspective [4.31734012105466]
Machine Unlearning is the process of selectively discarding information designated to specific sets or classes of data from a pre-trained model. We propose a methodology tailored for the purposeful elimination of information linked to a specific class of data from a pre-trained classification network. Our novel approach, termed textbfPartially-Blinded Unlearning (PBU), surpasses existing state-of-the-art class unlearning methods, demonstrating superior effectiveness.
arXiv Detail & Related papers (2024-03-24T17:33:22Z)
Enhancing Interpretability and Generalizability in Extended Isolation Forests [5.139809663513828]
Extended Isolation Forest Feature Importance (ExIFFI) is a method that explains predictions made by Extended Isolation Forest (EIF) models. EIF+ is designed to enhance the model's ability to detect unseen anomalies through a revised splitting strategy. ExIFFI outperforms other unsupervised interpretability methods on 8 of 11 real-world datasets.
arXiv Detail & Related papers (2023-10-09T07:24:04Z)
Towards Better Modeling with Missing Data: A Contrastive Learning-based Visual Analytics Perspective [7.577040836988683]
Missing data can pose a challenge for machine learning (ML) modeling. Current approaches are categorized into feature imputation and label prediction. This study proposes a Contrastive Learning framework to model observed data with missing values.
arXiv Detail & Related papers (2023-09-18T13:16:24Z)
Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data. We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
Uncertainty Estimation by Fisher Information-based Evidential Deep Learning [61.94125052118442]
Uncertainty estimation is a key factor that makes deep learning reliable in practical applications. We propose a novel method, Fisher Information-based Evidential Deep Learning ($mathcalI$-EDL) In particular, we introduce Fisher Information Matrix (FIM) to measure the informativeness of evidence carried by each sample, according to which we can dynamically reweight the objective loss terms to make the network more focused on the representation learning of uncertain classes.
arXiv Detail & Related papers (2023-03-03T16:12:59Z)
Variation-Incentive Loss Re-weighting for Regression Analysis on Biased Data [8.115323786541078]
We aim to improve the accuracy of the regression analysis by addressing the data skewness/bias during model training. We propose a Variation-Incentive Loss re-weighting method (VILoss) to optimize the gradient descent-based model training for regression analysis.
arXiv Detail & Related papers (2021-09-14T10:22:21Z)
Graph Embedding with Data Uncertainty [113.39838145450007]
spectral-based subspace learning is a common data preprocessing step in many machine learning pipelines. Most subspace learning methods do not take into consideration possible measurement inaccuracies or artifacts that can lead to data with high uncertainty.
arXiv Detail & Related papers (2020-09-01T15:08:23Z)
Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management. We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)
On the Benefits of Invariance in Neural Networks [56.362579457990094]
We show that training with data augmentation leads to better estimates of risk and thereof gradients, and we provide a PAC-Bayes generalization bound for models trained with data augmentation. We also show that compared to data augmentation, feature averaging reduces generalization error when used with convex losses, and tightens PAC-Bayes bounds.
arXiv Detail & Related papers (2020-05-01T02:08:58Z)
Saliency-based Weighted Multi-label Linear Discriminant Analysis [101.12909759844946]
We propose a new variant of Linear Discriminant Analysis (LDA) to solve multi-label classification tasks. The proposed method is based on a probabilistic model for defining the weights of individual samples. The Saliency-based weighted Multi-label LDA approach is shown to lead to performance improvements in various multi-label classification problems.
arXiv Detail & Related papers (2020-04-08T19:40:53Z)
Improving Covariance-Regularized Discriminant Analysis for EHR-based Predictive Analytics of Diseases [20.697847129363463]
We study an analytical model that understands the accuracy of LDA for classifying data with arbitrary distribution. We also propose a novel LDA classifier De-Sparse that outperforms state-of-the-art LDA approaches developed for HDLSS data.
arXiv Detail & Related papers (2016-10-18T06:11:23Z)

This list is automatically generated from the titles and abstracts of the papers in this site.