A Topological Data Analysis Based Classifier
- URL: http://arxiv.org/abs/2111.05214v2
- Date: Wed, 10 Nov 2021 14:33:02 GMT
- Title: A Topological Data Analysis Based Classifier
- Authors: Rolando Kindelan, José Frías, Mauricio Cerda and Nancy Hitschfeld
- Abstract summary: This paper proposes an algorithm that applies Topological Data Analysis directly to multi-class classification problems.
The proposed algorithm builds a filtered simplicial complex on the dataset.
On average, the proposed TDABC method was better than KNN and weighted-KNN.
- Score: 1.6668132748773563
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Topological Data Analysis (TDA) is an emergent field that aims to discover
topological information hidden in a dataset. TDA tools have been commonly used
to create filters and topological descriptors to improve Machine Learning (ML)
methods. This paper proposes an algorithm that applies TDA directly to
multi-class classification problems, without any further ML stage, showing
advantages for imbalanced datasets. The proposed algorithm builds a filtered
simplicial complex on the dataset. Persistent Homology (PH) is applied to guide
the selection of a sub-complex where unlabeled points obtain the label with the
majority of votes from labeled neighboring points. We select 8 datasets with
different dimensions, degrees of class overlap and imbalanced samples per
class. On average, the proposed TDABC method was better than KNN and
weighted-KNN. It behaves competitively with Local SVM and Random Forest
baseline classifiers in balanced datasets, and it outperforms all baseline
methods classifying entangled and minority classes.
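The classification step described in the abstract (a filtered simplicial complex, a persistence-guided choice of sub-complex, then a majority vote among labeled neighbors) can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' TDABC implementation. It keeps only the 1-skeleton of a Vietoris-Rips filtration, which is enough to define neighbors. It picks the filtration value from 0-dimensional persistence with a hypothetical "largest gap" rule. When no labeled neighbor is present, it falls back to the nearest labeled point. All function names are illustrative.

```python
from __future__ import annotations

import math
from collections import Counter
from typing import Hashable, Sequence

Point = Sequence[float]


def h0_death_times(points: Sequence[Point]) -> list[float]:
    """Finite H0 death times of a Vietoris-Rips filtration on `points`.

    Every point is a component born at scale 0; a component dies when an
    edge merges it into another one. Processing edges by length with a
    union-find (Kruskal's algorithm) yields exactly these merge scales.
    """
    n = len(points)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    deaths = []
    for d, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(d)
    return deaths  # n - 1 values, already in ascending order


def choose_radius(deaths: Sequence[float]) -> float:
    """Hypothetical rule: the death time just before the largest gap."""
    if not deaths:
        return 0.0
    if len(deaths) == 1:
        return deaths[0]
    gaps = [deaths[k + 1] - deaths[k] for k in range(len(deaths) - 1)]
    k = max(range(len(gaps)), key=gaps.__getitem__)
    return deaths[k]


def tda_vote_classify(
    labeled: Sequence[Point],
    labels: Sequence[Hashable],
    unlabeled: Sequence[Point],
) -> list[Hashable]:
    """Label each unlabeled point by majority vote of its labeled neighbors."""
    # The filtration is built on the whole dataset, labeled and unlabeled.
    radius = choose_radius(h0_death_times(list(labeled) + list(unlabeled)))
    predictions = []
    for u in unlabeled:
        dists = [math.dist(u, p) for p in labeled]
        # Labeled points joined to u by an edge of the sub-complex at `radius`.
        votes = Counter(lab for d, lab in zip(dists, labels) if d <= radius)
        if votes:
            # Ties go to the label counted first (Counter.most_common order).
            predictions.append(votes.most_common(1)[0][0])
        else:
            # Fallback (assumption): label of the nearest labeled point.
            nearest = min(range(len(labeled)), key=dists.__getitem__)
            predictions.append(labels[nearest])
    return predictions
```

For two well-separated clusters, such as labeled points around (0, 0) and around (10, 10), the largest gap in the H0 death times separates the within-cluster merges from the final cross-cluster merge. The chosen radius therefore keeps the two classes apart when votes are counted.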
Related papers
- A Closer Look at Deep Learning on Tabular Data [52.50778536274327]
Tabular data is prevalent across various domains in machine learning.
Deep Neural Network (DNN)-based methods have shown promising performance comparable to tree-based ones.
(2024-07-01T04:24:07Z)
- Minimally Supervised Learning using Topological Projections in Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs).
Our proposed method first trains SOMs on unlabeled data, and then a minimal number of available labeled data points are assigned to key best matching units (BMUs).
Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
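The BMU assignment step of the SOM-based method can be sketched as follows. The SOM is assumed to be already trained, with its weights given as a grid of prototype vectors. The `predict` rule is a hypothetical stand-in for the paper's topological projection, not its published method: it averages the targets of the labeled unit nearest on the map grid. All names are illustrative.

```python
from __future__ import annotations

import math
from typing import Sequence

Vector = Sequence[float]
Grid = Sequence[Sequence[Vector]]  # weights[row][col] -> prototype vector
Cell = tuple[int, int]


def best_matching_unit(weights: Grid, x: Vector) -> Cell:
    """Grid position of the unit whose prototype is closest to x."""
    cells = ((r, c) for r in range(len(weights)) for c in range(len(weights[r])))
    return min(cells, key=lambda rc: math.dist(weights[rc[0]][rc[1]], x))


def assign_labels(
    weights: Grid, labeled: Sequence[Vector], targets: Sequence[float]
) -> dict[Cell, list[float]]:
    """Attach each labeled sample's target value to its BMU."""
    unit_targets: dict[Cell, list[float]] = {}
    for x, y in zip(labeled, targets):
        unit_targets.setdefault(best_matching_unit(weights, x), []).append(y)
    return unit_targets


def predict(weights: Grid, unit_targets: dict[Cell, list[float]], x: Vector) -> float:
    """Hypothetical rule: mean target of the labeled unit nearest on the grid."""
    bmu = best_matching_unit(weights, x)
    unit = min(unit_targets, key=lambda rc: math.dist(bmu, rc))
    values = unit_targets[unit]
    return sum(values) / len(values)
```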
(2024-01-12T22:51:48Z)
- Image Classification using Combination of Topological Features and Neural Networks [1.0323063834827417]
We use the persistent homology method, a technique in topological data analysis (TDA), to extract essential topological features from the data space.
This was carried out with the aim of classifying images from multiple classes in the MNIST dataset.
Our approach inserts topological features into deep learning approaches composed of single-stream and two-stream neural networks.
(2023-11-10T20:05:40Z)
- Generalized Category Discovery with Clustering Assignment Consistency [56.92546133591019]
Generalized category discovery (GCD) is a recently proposed open-world task.
We propose a co-training-based framework that encourages clustering consistency.
Our method achieves state-of-the-art performance on three generic benchmarks and three fine-grained visual recognition datasets.
(2023-10-30T00:32:47Z)
- CvS: Classification via Segmentation For Small Datasets [52.821178654631254]
This paper presents CvS, a cost-effective classifier for small datasets that derives the classification labels from predicting the segmentation maps.
We evaluate the effectiveness of our framework on diverse problems showing that CvS is able to achieve much higher classification results compared to previous methods when given only a handful of examples.
(2021-10-29T18:41:15Z)
- Hybrid Ensemble optimized algorithm based on Genetic Programming for imbalanced data classification [0.0]
We propose a hybrid ensemble algorithm based on Genetic Programming (GP) for two-class imbalanced data classification.
Experimental results on the specified datasets show that, depending on the size of the training set, the proposed method achieves 40% and 50% better accuracy than other dimensions of minority-class prediction.
(2021-06-02T14:14:38Z)
- Classification based on Topological Data Analysis [1.6668132748773563]
Topological Data Analysis (TDA) is an emergent field that aims to discover topological information hidden in a dataset.
This paper proposes an algorithm that applies TDA directly to multi-class classification problems, including imbalanced datasets.
(2021-02-07T03:47:28Z)
- A Method for Handling Multi-class Imbalanced Data by Geometry based Information Sampling and Class Prioritized Synthetic Data Generation (GICaPS) [15.433936272310952]
This paper looks into the problem of handling imbalanced data in a multi-label classification problem.
Two novel methods are proposed that exploit the geometric relationship between the feature vectors.
The efficacy of the proposed methods is analyzed by solving a generic multi-class recognition problem.
(2020-10-11T04:04:26Z)
- Forest R-CNN: Large-Vocabulary Long-Tailed Object Detection and Instance Segmentation [75.93960390191262]
We exploit prior knowledge of the relations among object categories to cluster fine-grained classes into coarser parent classes.
We propose a simple yet effective resampling method, NMS Resampling, to re-balance the data distribution.
Our method, termed Forest R-CNN, can serve as a plug-and-play module applicable to most object recognition models.
(2020-08-13T03:52:37Z)
- Domain Adaptation with Auxiliary Target Domain-Oriented Classifier [115.39091109079622]
Domain adaptation aims to transfer knowledge from a label-rich but heterogeneous domain to a label-scarce domain.
One of the most popular semi-supervised learning (SSL) techniques is pseudo-labeling, which assigns a pseudo label to each unlabeled data point.
We propose a new pseudo-labeling framework called Auxiliary Target Domain-Oriented Classifier (ATDOC).
ATDOC alleviates the bias by introducing an auxiliary classifier for target data only, to improve the quality of pseudo labels.
(2020-07-08T15:01:35Z)
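Plain pseudo-labeling, the generic technique that ATDOC builds on, can be sketched in a few lines. This is not ATDOC's auxiliary classifier. It only shows the basic step: each unlabeled sample receives the class with the highest predicted probability. The optional `threshold` parameter is an illustrative addition that keeps only confident predictions, a common variant. With the default of 0.0, every sample gets a pseudo label.

```python
from __future__ import annotations

from typing import Sequence


def pseudo_label(
    probs: Sequence[Sequence[float]], threshold: float = 0.0
) -> list[tuple[int, int]]:
    """Return (sample index, pseudo label) pairs from class-probability rows."""
    result = []
    for i, p in enumerate(probs):
        # argmax over classes; the first class wins on exact ties
        label = max(range(len(p)), key=p.__getitem__)
        if p[label] >= threshold:
            result.append((i, label))
    return result
```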