Interpretable Models Capable of Handling Systematic Missingness in
Imbalanced Classes and Heterogeneous Datasets
- URL: http://arxiv.org/abs/2206.02056v1
- Date: Sat, 4 Jun 2022 20:20:39 GMT
- Title: Interpretable Models Capable of Handling Systematic Missingness in
Imbalanced Classes and Heterogeneous Datasets
- Authors: Sreejita Ghosh (1,5,6), Elizabeth S. Baranowski (2), Michael Biehl
(1,2,3), Wiebke Arlt (2), Peter Tino (4), and Kerstin Bunte (1) ((1)
Bernoulli Institute of Mathematics, Computer Science and Artificial
Intelligence, University of Groningen, The Netherlands (2) Institute of
Metabolism and Systems Research, University of Birmingham, the United Kingdom
(3) Systems Modelling and Quantitative Biomedicine, IMSR, University of
Birmingham, the United Kingdom (4) School of Computer Science, University of
Birmingham, the United Kingdom (5) Utrecht University, The Netherlands (6)
University Medical Centrum Utrecht, The Netherlands)
- Abstract summary: Application of interpretable machine learning techniques on medical datasets facilitates early and fast diagnoses, along with deeper insight into the data.
Medical datasets face common issues such as heterogeneous measurements, imbalanced classes with limited sample size, and missing data.
We present a family of prototype-based (PB) interpretable models which are capable of handling these issues.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Application of interpretable machine learning techniques on medical datasets
facilitates early and fast diagnoses, along with deeper insight into the
data. Furthermore, the transparency of these models increases trust among
application domain experts. Medical datasets face common issues such as
heterogeneous measurements, imbalanced classes with limited sample size, and
missing data, which hinder the straightforward application of machine learning
techniques. In this paper we present a family of prototype-based (PB)
interpretable models which are capable of handling these issues. The models
introduced in this contribution show comparable or superior performance to
alternative techniques applicable in such situations. However, unlike ensemble-based
models, which have to compromise on easy interpretation, the PB models
here do not. Moreover, we propose a strategy of harnessing the power of
ensembles while maintaining the intrinsic interpretability of the PB models, by
averaging the model parameter manifolds. All the models were evaluated on a
synthetic (publicly available) dataset in addition to detailed analyses of two
real-world medical datasets (one publicly available). Results indicated that
the models and strategies we introduced addressed the challenges of real-world
medical data, while remaining computationally inexpensive and transparent, as
well as similar or superior in performance compared to their alternatives.
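
To make the idea of a prototype-based classifier that tolerates missing values concrete, here is a minimal sketch. It is not the authors' method: it assumes a plain nearest-prototype rule with a mean squared difference computed only over the features observed in each sample, and the names `partial_sq_distance`, `predict`, and the toy prototypes are hypothetical, chosen for illustration only.

```python
import numpy as np

def partial_sq_distance(x, w):
    """Mean squared difference over the features observed (non-NaN) in x.

    Assumption for illustration: missing features are simply skipped,
    and the distance is averaged over the observed ones.
    """
    observed = ~np.isnan(x)
    if not observed.any():
        return np.inf  # no observed feature to compare on
    diff = x[observed] - w[observed]
    return float(np.mean(diff ** 2))

def predict(X, prototypes, prototype_labels):
    """Assign each row of X the label of its closest prototype."""
    preds = []
    for x in X:
        distances = [partial_sq_distance(x, w) for w in prototypes]
        preds.append(prototype_labels[int(np.argmin(distances))])
    return np.array(preds)

# Hypothetical toy setup: one prototype per class, three features.
prototypes = np.array([[0.0, 0.0, 0.0],
                       [1.0, 1.0, 1.0]])
prototype_labels = np.array([0, 1])

# Each sample has one missing measurement (NaN).
X = np.array([[0.1, np.nan, 0.2],
              [np.nan, 0.9, 1.1],
              [0.8, 0.7, np.nan]])

predictions = predict(X, prototypes, prototype_labels)
print(predictions)
```

Because each prediction is explained by a single prototype and the features actually compared, a sketch like this stays inspectable; the models in the paper learn their prototypes and dissimilarity from data, which this example does not attempt.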
Related papers
- Challenging the Performance-Interpretability Trade-off: An Evaluation of Interpretable Machine Learning Models [3.3595341706248876]
Generalized additive models (GAMs) offer promising properties for capturing complex, non-linear patterns while remaining fully interpretable.
This study examines the predictive performance of seven different GAMs in comparison to seven commonly used machine learning models based on a collection of twenty benchmark datasets.
arXiv Detail & Related papers (2024-09-22T12:58:52Z)
- Towards Better Modeling with Missing Data: A Contrastive Learning-based Visual Analytics Perspective [7.577040836988683]
Missing data can pose a challenge for machine learning (ML) modeling.
Current approaches are categorized into feature imputation and label prediction.
This study proposes a Contrastive Learning framework to model observed data with missing values.
arXiv Detail & Related papers (2023-09-18T13:16:24Z)
- Exploration of the Rashomon Set Assists Trustworthy Explanations for Medical Data [4.499833362998488]
This paper introduces a novel process to explore models in the Rashomon set, extending the conventional modeling approach.
We propose the $\texttt{Rashomon\_DETECT}$ algorithm to detect models with different behavior.
To quantify differences in variable effects among models, we introduce the Profile Disparity Index (PDI) based on measures from functional data analysis.
arXiv Detail & Related papers (2023-08-22T13:53:43Z)
- A prediction and behavioural analysis of machine learning methods for modelling travel mode choice [0.26249027950824505]
We conduct a systematic comparison of different modelling approaches, across multiple modelling problems, in terms of the key factors likely to affect model choice.
Results indicate that the models with the highest disaggregate predictive performance provide poorer estimates of behavioural indicators and aggregate mode shares.
It is also observed that the MNL model performs robustly in a variety of situations, though ML techniques can improve the estimates of behavioural indices such as Willingness to Pay.
arXiv Detail & Related papers (2023-01-11T11:10:32Z)
- Synthetic Model Combination: An Instance-wise Approach to Unsupervised Ensemble Learning [92.89846887298852]
Consider making a prediction over new test data without any opportunity to learn from a training set of labelled data.
Instead, we are given access to a set of expert models and their predictions, alongside some limited information about the dataset used to train them.
arXiv Detail & Related papers (2022-10-11T10:20:31Z)
- Differentiable Agent-based Epidemiology [71.81552021144589]
We introduce GradABM: a scalable, differentiable design for agent-based modeling that is amenable to gradient-based learning with automatic differentiation.
GradABM can quickly simulate million-size populations in a few seconds on commodity hardware, integrate with deep neural networks, and ingest heterogeneous data sources.
arXiv Detail & Related papers (2022-07-20T07:32:02Z)
- Using Explainable Boosting Machine to Compare Idiographic and Nomothetic Approaches for Ecological Momentary Assessment Data [2.0824228840987447]
This paper explores the use of non-linear interpretable machine learning (ML) models in classification problems.
Various ensembles of trees are compared to linear models using imbalanced synthetic and real-world datasets.
In one of the two real-world datasets, the knowledge distillation method achieves improved AUC scores.
arXiv Detail & Related papers (2022-04-04T17:56:37Z)
- Beyond Explaining: Opportunities and Challenges of XAI-Based Model Improvement [75.00655434905417]
Explainable Artificial Intelligence (XAI) is an emerging research field bringing transparency to highly complex machine learning (ML) models.
This paper offers a comprehensive overview of techniques that apply XAI practically to improve various properties of ML models.
We show empirically through experiments on toy and realistic settings how explanations can help improve properties such as model generalization ability or reasoning.
arXiv Detail & Related papers (2022-03-15T15:44:28Z)
- How Faithful is your Synthetic Data? Sample-level Metrics for Evaluating and Auditing Generative Models [95.8037674226622]
We introduce a 3-dimensional evaluation metric that characterizes the fidelity, diversity and generalization performance of any generative model in a domain-agnostic fashion.
Our metric unifies statistical divergence measures with precision-recall analysis, enabling sample- and distribution-level diagnoses of model fidelity and diversity.
arXiv Detail & Related papers (2021-02-17T18:25:30Z)
- Adversarial Sample Enhanced Domain Adaptation: A Case Study on Predictive Modeling with Electronic Health Records [57.75125067744978]
We propose a data augmentation method to facilitate domain adaptation.
Adversarially generated samples are used during domain adaptation.
Results confirm the effectiveness of our method and the generality on different tasks.
arXiv Detail & Related papers (2021-01-13T03:20:20Z)
- CHEER: Rich Model Helps Poor Model via Knowledge Infusion [69.23072792708263]
We develop a knowledge infusion framework named CHEER that can succinctly summarize such a rich model into transferable representations.
Our empirical results showed that CHEER outperformed baselines by 5.60% to 46.80% in terms of the macro-F1 score on multiple physiological datasets.
arXiv Detail & Related papers (2020-05-21T21:44:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.