Classification based on Topological Data Analysis
- URL: http://arxiv.org/abs/2102.03709v1
- Date: Sun, 7 Feb 2021 03:47:28 GMT
- Title: Classification based on Topological Data Analysis
- Authors: Rolando Kindelan, José Frías, Mauricio Cerda, and Nancy Hitschfeld
- Abstract summary: Topological Data Analysis (TDA) is an emergent field that aims to discover topological information hidden in a dataset.
This paper proposes an algorithm that applies TDA directly to multi-class classification problems, even imbalanced datasets.
- Score: 1.6668132748773563
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Topological Data Analysis (TDA) is an emergent field that aims to discover
topological information hidden in a dataset. TDA tools have been commonly used
to create filters and topological descriptors to improve Machine Learning (ML)
methods. This paper proposes an algorithm that applies TDA directly to
multi-class classification problems, even imbalanced datasets, without any
further ML stage. The proposed algorithm builds a filtered simplicial complex on
the dataset. Persistent homology is then used to guide the choice of a
sub-complex in which unlabeled points receive the label with the most votes from
neighboring labeled points. To assess the proposed method, 8 datasets were
selected with varying degrees of class entanglement, variability in the samples
per class, and dimensionality. On average, the proposed TDABC method
outperformed the baseline classifiers (wk-NN and k-NN) on each of the
computed metrics, especially when classifying entangled and minority classes.
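The voting step described in the abstract can be illustrated with a minimal sketch. This is not the authors' TDABC implementation: it assumes a fixed scale `epsilon` in place of the persistent-homology-guided choice of sub-complex, uses only the 1-skeleton (edges) of a Vietoris-Rips complex, and counts unweighted votes. The function names `rips_edges` and `vote_labels`, the toy points, and the value of `epsilon` are all hypothetical.

```python
from collections import Counter
from math import dist


def rips_edges(points, epsilon):
    """Edges of the Vietoris-Rips complex at scale epsilon (1-skeleton only)."""
    n = len(points)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if dist(points[i], points[j]) <= epsilon
    ]


def vote_labels(points, labels, epsilon):
    """Give each unlabeled point (label None) the majority label of its
    labeled neighbors in the epsilon-graph. Points with no labeled
    neighbor stay None. Assumption: epsilon is fixed by hand here,
    whereas the paper selects the sub-complex via persistent homology."""
    neighbors = {i: [] for i in range(len(points))}
    for i, j in rips_edges(points, epsilon):
        neighbors[i].append(j)
        neighbors[j].append(i)

    predicted = list(labels)
    for i, label in enumerate(labels):
        if label is not None:
            continue
        # Votes come only from originally labeled points, so the result
        # does not depend on the order in which points are processed.
        votes = Counter(labels[j] for j in neighbors[i] if labels[j] is not None)
        if votes:
            predicted[i] = votes.most_common(1)[0][0]
    return predicted


# Toy data: two well-separated clusters and one unlabeled point in each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (1.1, 1.0),
          (0.05, 0.05), (1.05, 1.05)]
labels = ["a", "a", "a", "b", "b", None, None]
predicted = vote_labels(points, labels, epsilon=0.3)
print(predicted)
```

With a fixed `epsilon` this reduces to a radius-based neighbor vote; the sketch only shows where a topologically chosen scale would enter the procedure.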
Related papers
- A Closer Look at Deep Learning on Tabular Data [52.50778536274327]
Tabular data is prevalent across various domains in machine learning.
Deep Neural Network (DNN)-based methods have shown promising performance comparable to tree-based ones.
arXiv Detail & Related papers (2024-07-01T04:24:07Z) - Minimally Supervised Learning using Topological Projections in
Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs).
Our proposed method first trains SOMs on unlabeled data, and then a minimal number of available labeled data points are assigned to key best matching units (BMUs).
Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
arXiv Detail & Related papers (2024-01-12T22:51:48Z) - Generalized Category Discovery with Clustering Assignment Consistency [56.92546133591019]
Generalized category discovery (GCD) is a recently proposed open-world task.
We propose a co-training-based framework that encourages clustering consistency.
Our method achieves state-of-the-art performance on three generic benchmarks and three fine-grained visual recognition datasets.
arXiv Detail & Related papers (2023-10-30T00:32:47Z) - A Topological Data Analysis Based Classifier [1.6668132748773563]
This paper proposes an algorithm that applies Topological Data Analysis directly to multi-class classification problems.
The proposed algorithm builds a filtered simplicial complex on the dataset.
On average, the proposed TDABC method was better than KNN and weighted-KNN.
arXiv Detail & Related papers (2021-11-09T15:54:16Z) - Class Introspection: A Novel Technique for Detecting Unlabeled
Subclasses by Leveraging Classifier Explainability Methods [0.0]
Detecting latent structure is a crucial step in performing analysis of a dataset.
By leveraging instance explanation methods, an existing classifier can be extended to detect latent classes.
This paper also contains a pipeline for analyzing classifiers automatically, and a web application for interactively exploring the results from this technique.
arXiv Detail & Related papers (2021-07-04T14:58:29Z) - Binary Classification from Multiple Unlabeled Datasets via Surrogate Set
Classification [94.55805516167369]
We propose a new approach for binary classification from $m$ U-sets for $m \ge 2$.
Our key idea is to consider an auxiliary classification task called surrogate set classification (SSC).
arXiv Detail & Related papers (2021-02-01T07:36:38Z) - A Method for Handling Multi-class Imbalanced Data by Geometry based
Information Sampling and Class Prioritized Synthetic Data Generation (GICaPS) [15.433936272310952]
This paper looks into the problem of handling imbalanced data in a multi-label classification problem.
Two novel methods are proposed that exploit the geometric relationship between the feature vectors.
The efficacy of the proposed methods is analyzed by solving a generic multi-class recognition problem.
arXiv Detail & Related papers (2020-10-11T04:04:26Z) - Domain Adaptation with Auxiliary Target Domain-Oriented Classifier [115.39091109079622]
Domain adaptation aims to transfer knowledge from a label-rich but heterogeneous domain to a label-scarce domain.
One of the most popular SSL techniques is pseudo-labeling that assigns pseudo labels for each unlabeled data.
We propose a new pseudo-labeling framework called Auxiliary Target Domain-Oriented Classifier (ATDOC).
ATDOC alleviates the bias by introducing an auxiliary classifier for target data only, to improve the quality of pseudo labels.
arXiv Detail & Related papers (2020-07-08T15:01:35Z) - Global Multiclass Classification and Dataset Construction via
Heterogeneous Local Experts [37.27708297562079]
We show how to minimize the number of labelers while ensuring the reliability of the resulting dataset.
Experiments with the MNIST and CIFAR-10 datasets demonstrate the favorable accuracy of our aggregation scheme.
arXiv Detail & Related papers (2020-05-21T18:07:42Z) - Saliency-based Weighted Multi-label Linear Discriminant Analysis [101.12909759844946]
We propose a new variant of Linear Discriminant Analysis (LDA) to solve multi-label classification tasks.
The proposed method is based on a probabilistic model for defining the weights of individual samples.
The Saliency-based weighted Multi-label LDA approach is shown to lead to performance improvements in various multi-label classification problems.
arXiv Detail & Related papers (2020-04-08T19:40:53Z)