Explainable Mixed Data Representation and Lossless Visualization Toolkit
for Knowledge Discovery
- URL: http://arxiv.org/abs/2206.06476v1
- Date: Mon, 13 Jun 2022 21:14:58 GMT
- Authors: Boris Kovalerchuk, Elijah McCoy
- Abstract summary: Developing Machine Learning algorithms for heterogeneous/mixed data is a longstanding problem.
Many ML algorithms are not applicable to mixed data, which include numeric and non-numeric data, text, graphs and so on.
This paper presents a classification of mixed data types, analyzes their importance for ML, and presents the experimental toolkit developed to deal with mixed data.
- Score: 7.005458308454871
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Developing Machine Learning (ML) algorithms for heterogeneous/mixed data is a
longstanding problem. Many ML algorithms cannot be applied to mixed data,
which include numeric and non-numeric data, text, graphs and so on, to generate
interpretable models. Another longstanding problem is developing algorithms for
lossless visualization of multidimensional mixed data. Further progress in
ML depends heavily on the success of interpretable ML algorithms for mixed data and
lossless interpretable visualization of multidimensional data. The latter allows
developing interpretable ML models using visual knowledge discovery by
end-users, who can bring valuable domain knowledge which is absent in the
training data. The challenges for mixed data include (1) generating numeric
coding schemes for non-numeric attributes so that numeric ML algorithms can provide
accurate and interpretable ML models, and (2) generating methods for lossless
visualization of n-D non-numeric data and visual rule discovery in these
visualizations. This paper presents a classification of mixed data types,
analyzes their importance for ML, and presents the experimental toolkit developed
to deal with mixed data. It combines the Data Types Editor with the VisCanvas data
visualization and rule discovery system, which is available on GitHub.
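The first challenge named in the abstract, numeric coding of non-numeric attributes, can be illustrated with a minimal Python sketch. It is not taken from the paper or from the toolkit: the function `encode_record`, the attribute names, and the choice of rank coding for ordinal attributes and one-hot coding for nominal attributes are assumptions made only for illustration.

```python
# Hypothetical sketch: all names and the schema below are illustrative
# assumptions, not part of the Data Types Editor or VisCanvas.
from typing import Dict, List, Sequence, Union

Value = Union[int, float, str]


def encode_record(
    record: Dict[str, Value],
    ordinal: Dict[str, Sequence[str]],
    nominal: Dict[str, Sequence[str]],
) -> List[float]:
    """Encode one mixed-type record as a flat list of floats.

    ordinal maps an attribute name to its values in increasing order;
    nominal maps an attribute name to its unordered values.
    Attributes found in neither mapping are assumed to be numeric already.
    """
    encoded: List[float] = []
    for name, value in record.items():
        if name in ordinal:
            # The rank keeps the order relation, which is meaningful here.
            encoded.append(float(ordinal[name].index(value)))
        elif name in nominal:
            # One-hot coding avoids implying an order the data does not have.
            encoded.extend(1.0 if value == v else 0.0 for v in nominal[name])
        else:
            encoded.append(float(value))
    return encoded


ORDINAL = {"severity": ["low", "medium", "high"]}
NOMINAL = {"color": ["red", "green", "blue"]}

row = {"age": 42, "severity": "high", "color": "green"}
vector = encode_record(row, ORDINAL, NOMINAL)
print(vector)  # [42.0, 2.0, 0.0, 1.0, 0.0]
```

Each coded value can be traced back to a named attribute and category, which is one simple way to keep a numeric coding scheme interpretable. Other coding schemes may suit a given dataset better.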
Related papers
- Explainable Machine Learning for Categorical and Mixed Data with
Lossless Visualization [3.4809730725241597]
This study proposes a classification of mixed data types and analyzes their important role in Machine Learning.
It presents a toolkit for enforcing interpretability of all internal operations of ML algorithms on mixed data, with visual exploration of mixed data.
A new Sequential Rule Generation (SRG) algorithm for explainable rule generation with categorical data is proposed and successfully evaluated in multiple computational experiments.
arXiv Detail & Related papers (2023-05-29T00:41:32Z) - AI Model Disgorgement: Methods and Choices [127.54319351058167]
We introduce a taxonomy of possible disgorgement methods that are applicable to modern machine learning systems.
We investigate the meaning of "removing the effects" of data in the trained model in a way that does not require retraining from scratch.
arXiv Detail & Related papers (2023-04-07T08:50:18Z) - Integrating Transformer and Autoencoder Techniques with Spectral Graph
Algorithms for the Prediction of Scarcely Labeled Molecular Data [2.8360662552057323]
This work introduces three graph-based models incorporating Merriman-Bence-Osher (MBO) techniques to tackle this challenge.
Specifically, graph-based modifications of the MBO scheme are integrated with state-of-the-art techniques, including a home-made transformer and an autoencoder.
The proposed models are validated using five benchmark data sets.
arXiv Detail & Related papers (2022-11-12T22:45:32Z) - Learning Mixtures of Linear Dynamical Systems [94.49754087817931]
We develop a two-stage meta-algorithm to efficiently recover each ground-truth LDS model up to error $\tilde{O}(\sqrt{d/T})$.
We validate our theoretical studies with numerical experiments, confirming the efficacy of the proposed algorithm.
arXiv Detail & Related papers (2022-01-26T22:26:01Z) - Distributionally Robust Semi-Supervised Learning Over Graphs [68.29280230284712]
Semi-supervised learning (SSL) over graph-structured data emerges in many network science applications.
To efficiently manage learning over graphs, variants of graph neural networks (GNNs) have been developed recently.
Despite their success in practice, most existing methods are unable to handle graphs with uncertain nodal attributes.
Challenges also arise due to distributional uncertainties associated with data acquired by noisy measurements.
A distributionally robust learning framework is developed, where the objective is to train models that exhibit quantifiable robustness against perturbations.
arXiv Detail & Related papers (2021-10-20T14:23:54Z) - PyHard: a novel tool for generating hardness embeddings to support
data-centric analysis [0.38233569758620045]
PyHard produces a hardness embedding of a dataset relating the predictive performance of multiple ML models.
The user can interact with this embedding in multiple ways to obtain useful insights about data and algorithmic performance.
We show in a COVID prognosis dataset how this analysis supported the identification of pockets of hard observations that challenge ML models.
arXiv Detail & Related papers (2021-09-29T14:08:26Z) - An Introduction to Robust Graph Convolutional Networks [71.68610791161355]
We propose novel Robust Graph Convolutional Neural Networks for possibly erroneous single-view or multi-view data.
By incorporating extra layers via Autoencoders into traditional graph convolutional networks, we characterize and handle typical error models explicitly.
arXiv Detail & Related papers (2021-03-27T04:47:59Z) - Graph Embedding with Data Uncertainty [113.39838145450007]
Spectral-based subspace learning is a common data preprocessing step in many machine learning pipelines.
Most subspace learning methods do not take into consideration possible measurement inaccuracies or artifacts that can lead to data with high uncertainty.
arXiv Detail & Related papers (2020-09-01T15:08:23Z) - Visualisation and knowledge discovery from interpretable models [0.0]
We introduce a few intrinsically interpretable models which are also capable of dealing with missing values.
We have demonstrated the algorithms on a synthetic dataset and a real-world one.
arXiv Detail & Related papers (2020-05-07T17:37:06Z) - Injective Domain Knowledge in Neural Networks for Transprecision
Computing [17.300144121921882]
This paper studies the improvements that can be obtained by integrating prior knowledge when dealing with a non-trivial learning task.
The results clearly show that ML models exploiting problem-specific information outperform the purely data-driven ones, with an average accuracy improvement around 38%.
arXiv Detail & Related papers (2020-02-24T12:58:56Z) - Data Augmentation for Histopathological Images Based on
Gaussian-Laplacian Pyramid Blending [59.91656519028334]
Data imbalance is a major problem that affects several machine learning (ML) algorithms.
In this paper, we propose a novel approach capable of not only augmenting an HI dataset but also distributing the inter-patient variability.
Experimental results on the BreakHis dataset have shown promising gains vis-a-vis the majority of DA techniques presented in the literature.
arXiv Detail & Related papers (2020-01-31T22:02:57Z)