Visualisation and knowledge discovery from interpretable models
- URL: http://arxiv.org/abs/2005.03632v2
- Date: Fri, 8 May 2020 08:22:02 GMT
- Title: Visualisation and knowledge discovery from interpretable models
- Authors: Sreejita Ghosh, Peter Tino, Kerstin Bunte
- Abstract summary: We introduce a few intrinsically interpretable models which are also capable of dealing with missing values.
We have demonstrated the algorithms on a synthetic dataset and a real-world one.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An increasing number of sectors that affect human lives are using
Machine Learning (ML) tools. Hence the need to understand their working
mechanisms and to evaluate their fairness in decision-making is becoming
paramount, ushering in the era of Explainable AI (XAI). In this contribution
we introduce a few intrinsically interpretable models that can also deal with
missing values, in addition to extracting knowledge from the dataset and about
the problem. These models, angle-based variants of Learning Vector
Quantization (LVQ), also allow visualisation of the classifier and its
decision boundaries. We demonstrate the algorithms on a synthetic dataset and
a real-world one (the heart disease dataset from the UCI repository). The
newly developed classifiers helped in investigating the complexities of the
UCI dataset as a multiclass problem. When the dataset was treated as a binary
classification problem, their performance was comparable to that reported in
the literature, with the added value of interpretability.
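Since the abstract names angle-based LVQ as the model family, a minimal Python sketch of such a classifier is given below. The class name AngleLVQ, the cosine-based dissimilarity and the simple LVQ1-style update rule are illustrative assumptions; the paper's actual cost function, prototype initialisation and missing-value handling are not reproduced here.

```python
import numpy as np

def angle_dissimilarity(x, w):
    """Angle-based dissimilarity d(x, w) = 1 - cos(x, w)."""
    return 1.0 - np.dot(x, w) / (np.linalg.norm(x) * np.linalg.norm(w) + 1e-12)

class AngleLVQ:
    """Sketch of an LVQ classifier with an angle-based dissimilarity.

    Each class is represented by one or more prototypes; a sample is
    assigned the label of the prototype with the smallest dissimilarity.
    """

    def __init__(self, prototypes, labels, lr=0.01):
        self.w = np.asarray(prototypes, dtype=float)  # one prototype per row
        self.c = np.asarray(labels)                   # prototype class labels
        self.lr = lr

    def predict(self, X):
        d = np.array([[angle_dissimilarity(x, w) for w in self.w] for x in X])
        return self.c[d.argmin(axis=1)]

    def fit(self, X, y, epochs=30):
        # Simple LVQ1-style training: attract the winning prototype when
        # its label matches the sample, repel it otherwise.
        for _ in range(epochs):
            for x, t in zip(np.asarray(X, dtype=float), np.asarray(y)):
                d = np.array([angle_dissimilarity(x, w) for w in self.w])
                k = d.argmin()
                sign = 1.0 if self.c[k] == t else -1.0
                self.w[k] += sign * self.lr * (x - self.w[k])
        return self
```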
Related papers
- iGAiVA: Integrated Generative AI and Visual Analytics in a Machine Learning Workflow for Text Classification [2.0094862015890245]
We present a solution for using visual analytics (VA) to guide the generation of synthetic data using large language models.
We discuss different types of data deficiency, describe different VA techniques for supporting their identification, and demonstrate the effectiveness of targeted data synthesis.
arXiv Detail & Related papers (2024-09-24T08:19:45Z)
- Towards Better Modeling with Missing Data: A Contrastive Learning-based Visual Analytics Perspective [7.577040836988683]
Missing data can pose a challenge for machine learning (ML) modeling.
Current approaches are categorized into feature imputation and label prediction.
This study proposes a Contrastive Learning framework to model observed data with missing values.
arXiv Detail & Related papers (2023-09-18T13:16:24Z)
- Machine Learning Based Missing Values Imputation in Categorical Datasets [2.5611256859404983]
This research looked into the use of machine learning algorithms to fill in the gaps in categorical datasets.
The emphasis was on ensemble models constructed using the Error Correction Output Codes framework.
Despite these encouraging results, deep learning for missing data imputation still faces obstacles, including the requirement for large amounts of labeled data.
arXiv Detail & Related papers (2023-06-10T03:29:48Z)
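As a hedged illustration of the ECOC idea applied to imputation (not this paper's exact pipeline), the Python sketch below uses scikit-learn's OutputCodeClassifier to predict missing entries of one categorical column from the observed rows; the helper name impute_categorical and the random forest base estimator are assumptions, and the feature columns are assumed to be numeric or already encoded.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.multiclass import OutputCodeClassifier

def impute_categorical(df, target_col, feature_cols):
    """Fill missing values of one categorical column by treating
    imputation as classification with an ECOC ensemble."""
    observed = df[df[target_col].notna()]
    missing = df[df[target_col].isna()]
    if missing.empty:
        return df
    model = OutputCodeClassifier(
        estimator=RandomForestClassifier(n_estimators=100, random_state=0),
        code_size=2.0,
        random_state=0,
    )
    model.fit(observed[feature_cols], observed[target_col])
    out = df.copy()
    out.loc[missing.index, target_col] = model.predict(missing[feature_cols])
    return out
```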
- Interpretable ML for Imbalanced Data [22.355966235617014]
Imbalanced data compounds the black-box nature of deep networks because the relationships between classes may be skewed and unclear.
Existing methods that investigate imbalanced data complexity are geared toward binary classification, shallow learning models and low dimensional data.
We propose a set of techniques that deep learning model users can apply to identify, visualize and understand class prototypes, sub-concepts and outlier instances.
arXiv Detail & Related papers (2022-12-15T11:50:31Z)
- RandomSCM: interpretable ensembles of sparse classifiers tailored for omics data [59.4141628321618]
We propose an ensemble learning algorithm based on conjunctions or disjunctions of decision rules.
The interpretability of the models makes them useful for biomarker discovery and pattern discovery in high dimensional data.
arXiv Detail & Related papers (2022-08-11T13:55:04Z)
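To make the model family concrete, here is a minimal Python sketch of a conjunction-of-rules classifier in the spirit of Set Covering Machines; the rule encoding and the hand-picked rules are illustrative assumptions, and RandomSCM's actual rule learning and ensembling are omitted.

```python
import numpy as np

# A decision rule is a threshold test on a single feature:
# (feature_index, threshold, direction) with direction in {"<=", ">"}.

def rule_fires(rule, x):
    j, t, d = rule
    return x[j] <= t if d == "<=" else x[j] > t

def predict_conjunction(rules, X):
    """Predict 1 only when every rule in the conjunction fires."""
    return np.array([int(all(rule_fires(r, x) for r in rules)) for x in X])

# Example: two threshold rules over two features.
rules = [(0, 5.0, "<="), (1, 2.0, ">")]
X = np.array([[4.0, 3.0], [6.0, 3.0]])
print(predict_conjunction(rules, X))  # -> [1 0]
```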
- An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models.
We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
arXiv Detail & Related papers (2022-05-21T19:49:04Z)
- Improving Classifier Training Efficiency for Automatic Cyberbullying Detection with Feature Density [58.64907136562178]
We study the effectiveness of Feature Density (FD) using different linguistically-backed feature preprocessing methods.
We hypothesise that estimating dataset complexity allows for the reduction of the number of required experiments.
The difference in linguistic complexity of datasets allows us to additionally discuss the efficacy of linguistically-backed word preprocessing.
arXiv Detail & Related papers (2021-11-02T15:48:28Z)
- Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv Detail & Related papers (2020-11-13T10:53:27Z)
- Category-Learning with Context-Augmented Autoencoder [63.05016513788047]
Finding an interpretable non-redundant representation of real-world data is one of the key problems in Machine Learning.
We propose a novel method of using data augmentations when training autoencoders.
We train a Variational Autoencoder in such a way that the transformation outcome can be predicted by an auxiliary network.
arXiv Detail & Related papers (2020-10-10T14:04:44Z)
- Deducing neighborhoods of classes from a fitted model [68.8204255655161]
This article presents a new kind of interpretable machine learning method.
It can help to understand the partitioning of the feature space into predicted classes in a classification model using quantile shifts.
Real data points (or specific points of interest) are used, and the changes in the prediction after slightly raising or lowering specific features are observed.
arXiv Detail & Related papers (2020-09-11T16:35:53Z)
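A minimal Python sketch of this perturbation idea follows; the paper's quantile shifts are simplified here to a fixed relative step, and the function name class_neighbours and the step size are assumptions.

```python
import numpy as np

def class_neighbours(model, x, step=0.05, feature_names=None):
    """Probe which classes border a point: nudge each feature of x up and
    down by a small relative step and record when the prediction flips."""
    x = np.asarray(x, dtype=float)
    base = model.predict(x.reshape(1, -1))[0]
    flips = {}
    for j in range(x.size):
        for delta in (-step, step):
            x_shift = x.copy()
            # Shift by a fraction of the feature magnitude; the floor of 1.0
            # keeps zero-valued features from being left unperturbed.
            x_shift[j] += delta * max(abs(x[j]), 1.0)
            new = model.predict(x_shift.reshape(1, -1))[0]
            if new != base:
                name = feature_names[j] if feature_names is not None else j
                flips[(name, delta)] = new
    return base, flips
```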
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences.