Topological Machine Learning for Mixed Numeric and Categorical Data
- URL: http://arxiv.org/abs/2003.04584v2
- Date: Fri, 12 Jun 2020 15:28:17 GMT
- Title: Topological Machine Learning for Mixed Numeric and Categorical Data
- Authors: Chengyuan Wu, Carol Anne Hargreaves
- Abstract summary: Topological data analysis is a new branch of machine learning that excels in studying high dimensional data.
Data objects with mixed numeric and categorical attributes are ubiquitous in real-world applications.
We propose a novel topological machine learning method for mixed data classification.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Topological data analysis is a relatively new branch of machine learning that
excels in studying high dimensional data, and is theoretically known to be
robust against noise. Meanwhile, data objects with mixed numeric and
categorical attributes are ubiquitous in real-world applications. However,
topological methods are usually applied to point cloud data, and to the best of
our knowledge there is no available framework for the classification of mixed
data using topological methods. In this paper, we propose a novel topological
machine learning method for mixed data classification. In the proposed method,
we use theory from topological data analysis such as persistent homology,
persistence diagrams and Wasserstein distance to study mixed data. The
performance of the proposed method is demonstrated by experiments on a
real-world heart disease dataset. Experimental results show that our
topological method outperforms several state-of-the-art algorithms in the
prediction of heart disease.
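The abstract names three ingredients (persistent homology, persistence diagrams, Wasserstein distance) without giving their construction for mixed data. The following is a minimal sketch of how they could fit together, not the authors' method. It rests on several assumptions that do not come from the abstract: a Gower-style dissimilarity for mixed records, 0-dimensional Vietoris-Rips persistence only (computed from minimum-spanning-tree edge weights), and a brute-force 1-Wasserstein distance suitable only for very small diagrams. All function names, field choices, and sample records are hypothetical.

```python
from itertools import permutations


def mixed_distance(a, b, numeric_ranges):
    """Gower-style dissimilarity in [0, 1] between two mixed records.

    Assumption of this sketch: string fields are categorical (0 on a match,
    1 otherwise); other fields are numeric and contribute |x - y| / range.
    """
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if isinstance(x, str):
            total += 0.0 if x == y else 1.0
        else:
            total += abs(x - y) / numeric_ranges[i]
    return total / len(a)


def h0_deaths(points, dist):
    """Finite death times of the 0-dimensional Vietoris-Rips classes.

    Every connected component is born at scale 0 and dies when an edge merges
    it into another component, so the finite deaths are the edge weights of a
    minimum spanning tree (Kruskal's algorithm with union-find).
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    deaths = []
    for weight, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(weight)
    return deaths  # n - 1 values; the one essential class never dies


def wasserstein_1(deaths_a, deaths_b):
    """1-Wasserstein distance between two H0 diagrams whose births are all 0.

    Uses the L-infinity ground metric: matching (0, a) with (0, b) costs
    |a - b|, and matching (0, a) with the diagonal costs a / 2.
    Brute force over all matchings, so only usable for tiny diagrams.
    """
    pad_a = list(deaths_a) + [None] * len(deaths_b)  # None = diagonal
    pad_b = list(deaths_b) + [None] * len(deaths_a)

    def cost(x, y):
        if x is None and y is None:
            return 0.0
        if x is None:
            return y / 2
        if y is None:
            return x / 2
        return abs(x - y)

    return min(
        sum(cost(x, y) for x, y in zip(pad_a, perm))
        for perm in permutations(pad_b)
    )


# Hypothetical records: (age, cholesterol, chest pain type).
ranges = {0: 50.0, 1: 200.0}  # assumed value ranges of the numeric fields
group_a = [(63, 233, "typical"), (67, 286, "asymptomatic"), (41, 204, "atypical")]
group_b = [(56, 236, "atypical"), (62, 268, "asymptomatic"), (57, 354, "asymptomatic")]


def dist(p, q):
    return mixed_distance(p, q, ranges)


print(wasserstein_1(h0_deaths(group_a, dist), h0_deaths(group_b, dist)))
```

In practice a dedicated persistent homology library and an optimal-assignment solver would replace the brute-force parts; the sketch only shows how a mixed-data dissimilarity can feed a persistence computation whose diagrams are then compared.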
Related papers
- Topograph: An efficient Graph-Based Framework for Strictly Topology Preserving Image Segmentation [78.54656076915565]
Topological correctness plays a critical role in many image segmentation tasks.
Most networks are trained using pixel-wise loss functions, such as Dice, neglecting topological accuracy.
We propose a novel, graph-based framework for topologically accurate image segmentation.
arXiv Detail & Related papers (2024-11-05T16:20:14Z)
- Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z)
- Synthetic Data Generation and Deep Learning for the Topological Analysis of 3D Data [0.0]
This research uses deep learning to estimate the topology of sparse, unordered point cloud scenes in 3D.
The experimental results of this pilot study support the hypothesis that, with the aid of sophisticated synthetic data generation, neural networks can perform segmentation-based topological data analysis.
arXiv Detail & Related papers (2023-09-29T04:37:35Z)
- Topological Learning in Multi-Class Data Sets [0.3050152425444477]
We study the impact of topological complexity on learning in feedforward deep neural networks (DNNs).
We evaluate our topological classification algorithm on multiple constructed and open source data sets.
arXiv Detail & Related papers (2023-01-23T21:54:25Z)
- Detection and Evaluation of Clusters within Sequential Data [58.720142291102135]
Clustering algorithms for Block Markov Chains possess theoretical optimality guarantees.
In particular, our sequential data is derived from human DNA, written text, animal movement data and financial markets.
It is found that the Block Markov Chain model assumption can indeed produce meaningful insights in exploratory data analyses.
arXiv Detail & Related papers (2022-10-04T15:22:39Z)
- On topological data analysis for structural dynamics: an introduction to persistent homology [0.0]
Topological data analysis is a method of quantifying the shape of data.
Persistent homology is a method of quantifying the shape of data over a range of length scales.
arXiv Detail & Related papers (2022-09-12T10:39:38Z)
- RandomSCM: interpretable ensembles of sparse classifiers tailored for omics data [59.4141628321618]
We propose an ensemble learning algorithm based on conjunctions or disjunctions of decision rules.
The interpretability of the models makes them useful for biomarker discovery and patterns discovery in high dimensional data.
arXiv Detail & Related papers (2022-08-11T13:55:04Z)
- A Topological Approach for Semi-Supervised Learning [0.0]
We present new semi-supervised learning methods based on techniques from Topological Data Analysis (TDA).
In particular, we have created two semi-supervised learning methods following two different topological approaches.
The results show that the methods developed in this work outperform both the results obtained with models trained with only manually labelled data, and those obtained with classical semi-supervised learning methods.
arXiv Detail & Related papers (2022-05-19T15:23:39Z)
- Predictive Geological Mapping with Convolution Neural Network Using Statistical Data Augmentation on a 3D Model [0.0]
We develop a data augmentation workflow that uses a 3D geological and magnetic susceptibility model as input.
A Gated Shape Convolutional Neural Network algorithm was trained on a generated synthetic dataset to perform geological mapping.
The validation conducted on a portion of the synthetic dataset and data from adjacent areas shows that the methodology is suitable to segment the surficial geology.
arXiv Detail & Related papers (2021-10-27T13:56:40Z)
- Learning Neural Causal Models with Active Interventions [83.44636110899742]
We introduce an active intervention-targeting mechanism which enables a quick identification of the underlying causal structure of the data-generating process.
Our method significantly reduces the required number of interactions compared with random intervention targeting.
We demonstrate superior performance on multiple benchmarks from simulated to real-world data.
arXiv Detail & Related papers (2021-09-06T13:10:37Z)
- Trajectories, bifurcations and pseudotime in large clinical datasets: applications to myocardial infarction and diabetes data [94.37521840642141]
We suggest a semi-supervised methodology for the analysis of large clinical datasets, characterized by mixed data types and missing values.
The methodology is based on application of elastic principal graphs which can address simultaneously the tasks of dimensionality reduction, data visualization, clustering, feature selection and quantifying the geodesic distances (pseudotime) in partially ordered sequences of observations.
arXiv Detail & Related papers (2020-07-07T21:04:55Z)