Related papers: A Topological Data Analysis Based Classifier

A Topological Data Analysis Based Classifier

URL: http://arxiv.org/abs/2111.05214v2
Date: Wed, 10 Nov 2021 14:33:02 GMT
Title: A Topological Data Analysis Based Classifier
Authors: Rolando Kindelan and Jos\'e Fr\'ias and Mauricio Cerda and Nancy Hitschfeld
Abstract summary: This paper proposes an algorithm that applies Topological Data Analysis directly to multi-class classification problems. The proposed algorithm builds a filtered simplicial complex on the dataset. On average, the proposed TDABC method was better than KNN and weighted-KNN.
Score: 1.6668132748773563
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Topological Data Analysis (TDA) is an emergent field that aims to discover topological information hidden in a dataset. TDA tools have been commonly used to create filters and topological descriptors to improve Machine Learning (ML) methods. This paper proposes an algorithm that applies TDA directly to multi-class classification problems, without any further ML stage, showing advantages for imbalanced datasets. The proposed algorithm builds a filtered simplicial complex on the dataset. Persistent Homology (PH) is applied to guide the selection of a sub-complex where unlabeled points obtain the label with the majority of votes from labeled neighboring points. We select 8 datasets with different dimensions, degrees of class overlap and imbalanced samples per class. On average, the proposed TDABC method was better than KNN and weighted-KNN. It behaves competitively with Local SVM and Random Forest baseline classifiers in balanced datasets, and it outperforms all baseline methods classifying entangled and minority classes.

Related papers

Categorical Data Clustering via Value Order Estimated Distance Metric Learning [53.28598689867732]
This paper introduces a novel order distance metric learning approach to intuitively represent categorical attribute values.<n>A new joint learning paradigm is developed to alternatively perform clustering and order distance metric learning.<n>The proposed method achieves superior clustering accuracy on categorical and mixed datasets.
arXiv Detail & Related papers (2024-11-19T08:23:25Z)
A Closer Look at Deep Learning on Tabular Data [52.50778536274327]
Tabular data is prevalent across various domains in machine learning. Deep Neural Network (DNN)-based methods have shown promising performance comparable to tree-based ones.
arXiv Detail & Related papers (2024-07-01T04:24:07Z)
Minimally Supervised Learning using Topological Projections in Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs) Our proposed method first trains SOMs on unlabeled data and then a minimal number of available labeled data points are assigned to key best matching units (BMU) Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
arXiv Detail & Related papers (2024-01-12T22:51:48Z)
Image Classification using Combination of Topological Features and Neural Networks [1.0323063834827417]
We use the persistent homology method, a technique in topological data analysis (TDA), to extract essential topological features from the data space. This was carried out with the aim of classifying images from multiple classes in the MNIST dataset. Our approach inserts topological features into deep learning approaches composed by single and two-streams neural networks.
arXiv Detail & Related papers (2023-11-10T20:05:40Z)
Generalized Category Discovery with Clustering Assignment Consistency [56.92546133591019]
Generalized category discovery (GCD) is a recently proposed open-world task. We propose a co-training-based framework that encourages clustering consistency. Our method achieves state-of-the-art performance on three generic benchmarks and three fine-grained visual recognition datasets.
arXiv Detail & Related papers (2023-10-30T00:32:47Z)
CvS: Classification via Segmentation For Small Datasets [52.821178654631254]
This paper presents CvS, a cost-effective classifier for small datasets that derives the classification labels from predicting the segmentation maps. We evaluate the effectiveness of our framework on diverse problems showing that CvS is able to achieve much higher classification results compared to previous methods when given only a handful of examples.
arXiv Detail & Related papers (2021-10-29T18:41:15Z)
Hybrid Ensemble optimized algorithm based on Genetic Programming for imbalanced data classification [0.0]
We propose a hybrid ensemble algorithm based on Genetic Programming (GP) for two classes of imbalanced data classification. Experimental results show the performance of the proposed method on the specified data sets in the size of the training set shows 40% and 50% better accuracy than other dimensions of the minority class prediction.
arXiv Detail & Related papers (2021-06-02T14:14:38Z)
Classification based on Topological Data Analysis [1.6668132748773563]
Topological Data Analysis (TDA) is an emergent field that aims to discover topological information hidden in a dataset. This paper proposes an algorithm that applies TDA directly to multi-class classification problems, even imbalanced datasets.
arXiv Detail & Related papers (2021-02-07T03:47:28Z)
A Method for Handling Multi-class Imbalanced Data by Geometry based Information Sampling and Class Prioritized Synthetic Data Generation (GICaPS) [15.433936272310952]
This paper looks into the problem of handling imbalanced data in a multi-label classification problem. Two novel methods are proposed that exploit the geometric relationship between the feature vectors. The efficacy of the proposed methods is analyzed by solving a generic multi-class recognition problem.
arXiv Detail & Related papers (2020-10-11T04:04:26Z)
Forest R-CNN: Large-Vocabulary Long-Tailed Object Detection and Instance Segmentation [75.93960390191262]
We exploit prior knowledge of the relations among object categories to cluster fine-grained classes into coarser parent classes. We propose a simple yet effective resampling method, NMS Resampling, to re-balance the data distribution. Our method, termed as Forest R-CNN, can serve as a plug-and-play module being applied to most object recognition models.
arXiv Detail & Related papers (2020-08-13T03:52:37Z)
Domain Adaptation with Auxiliary Target Domain-Oriented Classifier [115.39091109079622]
Domain adaptation aims to transfer knowledge from a label-rich but heterogeneous domain to a label-scare domain. One of the most popular SSL techniques is pseudo-labeling that assigns pseudo labels for each unlabeled data. We propose a new pseudo-labeling framework called Auxiliary Target Domain-Oriented (ATDOC) ATDOC alleviates the bias by introducing an auxiliary classifier for target data only, to improve the quality of pseudo labels.
arXiv Detail & Related papers (2020-07-08T15:01:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.