Clustering Indices based Automatic Classification Model Selection
- URL: http://arxiv.org/abs/2305.13926v1
- Date: Tue, 23 May 2023 10:52:37 GMT
- Title: Clustering Indices based Automatic Classification Model Selection
- Authors: Sudarsun Santhiappan, Nitin Shravan, Balaraman Ravindran
- Abstract summary: We propose a novel method for automatic classification model selection from a set of candidate model classes.
We compute the dataset clustering indices and directly predict the expected classification performance using the learned regressor.
We also propose an end-to-end Automated ML system for data classification based on our model selection method.
- Score: 16.096824533334352
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Classification model selection is a process of identifying a suitable model
class for a given classification task on a dataset. Traditionally, model
selection is based on cross-validation, meta-learning, and user preferences,
which are often time-consuming and resource-intensive. The performance of any
machine learning classification task depends on the choice of the model class,
the learning algorithm, and the dataset's characteristics. Our work proposes a
novel method for automatic classification model selection from a set of
candidate model classes by determining the empirical model-fitness for a
dataset based only on its clustering indices. Clustering Indices measure the
ability of a clustering algorithm to induce good quality neighborhoods with
similar data characteristics. We propose a regression task for a given model
class, where the clustering indices of a given dataset form the features and
the dependent variable represents the expected classification performance. We
compute the dataset clustering indices and directly predict the expected
classification performance using the learned regressor for each candidate model
class to recommend a suitable model class for dataset classification. We
evaluate our model selection method through cross-validation with 60 publicly
available binary class datasets and show that our top-3 model recommendation is
accurate for over 45 of the 60 datasets. We also propose an end-to-end Automated ML
system for data classification based on our model selection method. We evaluate
our end-to-end system against popular commercial and non-commercial Automated ML
systems using a different collection of 25 public domain binary class datasets.
We show that the proposed system outperforms other methods with an excellent
average rank of 1.68.
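The pipeline the abstract describes has two phases: an offline phase that learns, per candidate model class, a regressor mapping a dataset's clustering indices to the classification performance observed for that model class, and an online phase that computes the indices of a new dataset and ranks model classes by predicted performance. A minimal sketch of that idea follows; it is hypothetical, since the paper's exact set of clustering indices, regressor, and training corpus are not given here. Silhouette, Calinski-Harabasz, and Davies-Bouldin scores stand in for the indices, a random forest stands in for the regressor, and the offline targets are placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import (calinski_harabasz_score,
                             davies_bouldin_score, silhouette_score)

def clustering_indices(X, n_clusters=2, seed=0):
    """Cluster X and return a feature vector of clustering indices."""
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(X)
    return np.array([silhouette_score(X, labels),
                     calinski_harabasz_score(X, labels),
                     davies_bouldin_score(X, labels)])

# Offline phase: for each candidate model class, fit a regressor from
# (clustering indices of a dataset) -> (measured classification score).
# Here the corpus is simulated; real targets would be cross-validated
# F1/accuracy of that model class on each training dataset.
rng = np.random.RandomState(0)
model_classes = ["svm", "random_forest", "naive_bayes"]
regressors = {}
for mc in model_classes:
    feats, targets = [], []
    for i in range(20):
        X, _ = make_classification(n_samples=200, n_features=8,
                                   random_state=i)
        feats.append(clustering_indices(X, seed=i))
        targets.append(rng.rand())  # placeholder performance score
    regressors[mc] = RandomForestRegressor(random_state=0).fit(feats, targets)

# Online phase: compute the new dataset's indices once, predict the
# expected performance for every model class, and recommend the top-k.
X_new, _ = make_classification(n_samples=300, n_features=8, random_state=99)
idx = clustering_indices(X_new).reshape(1, -1)
ranking = sorted(model_classes,
                 key=lambda mc: regressors[mc].predict(idx)[0], reverse=True)
print("Top-3 recommendation:", ranking[:3])
```

Note the design point the abstract emphasizes: the online cost is one clustering pass plus a few regressor predictions, avoiding the per-candidate cross-validation that makes traditional model selection expensive.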
Related papers
- Exploring Beyond Logits: Hierarchical Dynamic Labeling Based on Embeddings for Semi-Supervised Classification [49.09505771145326]
We propose a Hierarchical Dynamic Labeling (HDL) algorithm that does not depend on model predictions and utilizes image embeddings to generate sample labels.
Our approach has the potential to change the paradigm of pseudo-label generation in semi-supervised learning.
arXiv Detail & Related papers (2024-04-26T06:00:27Z) - DsDm: Model-Aware Dataset Selection with Datamodels [81.01744199870043]
Standard practice is to filter for examples that match human notions of data quality.
We find that selecting according to similarity with "high quality" data sources may not increase (and can even hurt) performance compared to randomly selecting data.
Our framework avoids handpicked notions of data quality, and instead models explicitly how the learning process uses train datapoints to predict on the target tasks.
arXiv Detail &amp; Related papers (2024-01-23T17:22:00Z) - Automatic learning algorithm selection for classification via convolutional neural networks [0.0]
The goal of this study is to learn the inherent structure of the data without identifying meta-features.
Experiments with simulated datasets show that the proposed approach achieves nearly perfect performance in identifying linear and nonlinear patterns.
arXiv Detail & Related papers (2023-05-16T01:57:01Z) - Anomaly Detection using Ensemble Classification and Evidence Theory [62.997667081978825]
We present a novel approach for anomaly detection using ensemble classification and evidence theory.
A pool selection strategy is presented to build a solid ensemble classifier.
We use uncertainty for the anomaly detection approach.
arXiv Detail & Related papers (2022-12-23T00:50:41Z) - Which is the best model for my data? [0.0]
The proposed meta-learning approach relies on machine learning and involves four major steps.
We present a collection of 62 meta-features that address the problem of information cancellation when aggregating measure values involving positive and negative measurements.
We show that our meta-learning approach can correctly predict an optimal model for 91% of the synthetic datasets and for 87% of the real-world datasets.
arXiv Detail &amp; Related papers (2022-10-26T13:15:43Z) - A hybrid model-based and learning-based approach for classification using limited number of training samples [13.60714541247498]
In this paper, a hybrid classification method -- HyPhyLearn -- is proposed that exploits both the physics-based statistical models and the learning-based classifiers.
The proposed solution is based on the conjecture that HyPhyLearn would alleviate the challenges associated with the individual approaches of learning-based and statistical model-based classifiers.
arXiv Detail & Related papers (2021-06-25T05:19:50Z) - Meta Learning for Few-Shot One-class Classification [0.0]
We formulate the learning of meaningful features for one-class classification as a meta-learning problem.
To learn these representations, we require only multiclass data from similar tasks.
We validate our approach by adapting few-shot classification datasets to the few-shot one-class classification scenario.
arXiv Detail & Related papers (2020-09-11T11:35:28Z) - Few-shot Classification via Adaptive Attention [93.06105498633492]
We propose a novel few-shot learning method via optimizing and fast adapting the query sample representation based on very few reference samples.
As demonstrated experimentally, the proposed model achieves state-of-the-art classification results on various benchmark few-shot classification and fine-grained recognition datasets.
arXiv Detail & Related papers (2020-08-06T05:52:59Z) - Multi-label learning for dynamic model type recommendation [13.304462985219237]
We propose a problem-independent dynamic base-classifier model recommendation for the online local pool (OLP) technique.
Our proposed framework builds a multi-label meta-classifier responsible for recommending a set of relevant model types.
Experimental results show that different data distributions favored different model types on a local scope.
arXiv Detail &amp; Related papers (2020-04-01T16:42:12Z) - Rethinking Class-Balanced Methods for Long-Tailed Visual Recognition from a Domain Adaptation Perspective [98.70226503904402]
Object frequency in the real world often follows a power law, leading to datasets with long-tailed class distributions.
We propose to augment the classic class-balanced learning by explicitly estimating the differences between the class-conditioned distributions with a meta-learning approach.
arXiv Detail &amp; Related papers (2020-03-24T11:28:42Z) - Selecting Relevant Features from a Multi-domain Representation for Few-shot Classification [91.67977602992657]
We propose a new strategy based on feature selection, which is both simpler and more effective than previous feature adaptation approaches.
We show that a simple non-parametric classifier built on top of such features produces high accuracy and generalizes to domains never seen during training.
arXiv Detail & Related papers (2020-03-20T15:44:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.