Clustering Indices based Automatic Classification Model Selection
- URL: http://arxiv.org/abs/2305.13926v1
- Date: Tue, 23 May 2023 10:52:37 GMT
- Title: Clustering Indices based Automatic Classification Model Selection
- Authors: Sudarsun Santhiappan, Nitin Shravan, Balaraman Ravindran
- Abstract summary: We propose a novel method for automatic classification model selection from a set of candidate model classes.
We compute the dataset clustering indices and directly predict the expected classification performance using the learned regressor.
We also propose an end-to-end Automated ML system for data classification based on our model selection method.
- Score: 16.096824533334352
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Classification model selection is a process of identifying a suitable model
class for a given classification task on a dataset. Traditionally, model
selection is based on cross-validation, meta-learning, and user preferences,
which are often time-consuming and resource-intensive. The performance of any
machine learning classification task depends on the choice of the model class,
the learning algorithm, and the dataset's characteristics. Our work proposes a
novel method for automatic classification model selection from a set of
candidate model classes by determining the empirical model-fitness for a
dataset based only on its clustering indices. Clustering Indices measure the
ability of a clustering algorithm to induce good quality neighborhoods with
similar data characteristics. We propose a regression task for a given model
class, where the clustering indices of a given dataset form the features and
the dependent variable represents the expected classification performance. We
compute the dataset clustering indices and directly predict the expected
classification performance using the learned regressor for each candidate model
class to recommend a suitable model class for dataset classification. We
evaluate our model selection method through cross-validation with 60 publicly
available binary class datasets and show that our top-3 model recommendation is
accurate for over 45 of the 60 datasets. We also propose an end-to-end Automated ML
system for data classification based on our model selection method. We evaluate
our end-to-end system against popular commercial and non-commercial Automated ML
systems using a different collection of 25 public domain binary class datasets.
We show that the proposed system outperforms other methods with an excellent
average rank of 1.68.
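The pipeline the abstract describes has two phases: an offline phase that learns, per candidate model class, a regressor mapping a dataset's clustering indices to the classification performance observed for that model class, and an online phase that computes the indices of a new dataset and ranks model classes by predicted performance. A minimal sketch of that idea follows; it is hypothetical, since the paper's exact set of clustering indices, regressor, and training corpus are not given here. Silhouette, Calinski-Harabasz, and Davies-Bouldin scores stand in for the indices, a random forest stands in for the regressor, and the offline targets are placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import (calinski_harabasz_score,
                             davies_bouldin_score, silhouette_score)

def clustering_indices(X, n_clusters=2, seed=0):
    """Cluster X and return a feature vector of clustering indices."""
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(X)
    return np.array([silhouette_score(X, labels),
                     calinski_harabasz_score(X, labels),
                     davies_bouldin_score(X, labels)])

# Offline phase: for each candidate model class, fit a regressor from
# (clustering indices of a dataset) -> (measured classification score).
# Here the corpus is simulated; real targets would be cross-validated
# F1/accuracy of that model class on each training dataset.
rng = np.random.RandomState(0)
model_classes = ["svm", "random_forest", "naive_bayes"]
regressors = {}
for mc in model_classes:
    feats, targets = [], []
    for i in range(20):
        X, _ = make_classification(n_samples=200, n_features=8,
                                   random_state=i)
        feats.append(clustering_indices(X, seed=i))
        targets.append(rng.rand())  # placeholder performance score
    regressors[mc] = RandomForestRegressor(random_state=0).fit(feats, targets)

# Online phase: compute the new dataset's indices once, predict the
# expected performance for every model class, and recommend the top-k.
X_new, _ = make_classification(n_samples=300, n_features=8, random_state=99)
idx = clustering_indices(X_new).reshape(1, -1)
ranking = sorted(model_classes,
                 key=lambda mc: regressors[mc].predict(idx)[0], reverse=True)
print("Top-3 recommendation:", ranking[:3])
```

Note the design point the abstract emphasizes: the online cost is one clustering pass plus a few regressor predictions, avoiding the per-candidate cross-validation that makes traditional model selection expensive.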
Related papers
- Exploring Beyond Logits: Hierarchical Dynamic Labeling Based on Embeddings for Semi-Supervised Classification [49.09505771145326]
We propose a Hierarchical Dynamic Labeling (HDL) algorithm that does not depend on model predictions and utilizes image embeddings to generate sample labels.
Our approach has the potential to change the paradigm of pseudo-label generation in semi-supervised learning.
arXiv Detail & Related papers (2024-04-26T06:00:27Z) - DsDm: Model-Aware Dataset Selection with Datamodels [81.01744199870043]
Standard practice is to filter for examples that match human notions of data quality.
We find that selecting according to similarity with "high quality" data sources may not increase (and can even hurt) performance compared to randomly selecting data.
Our framework avoids handpicked notions of data quality, and instead models explicitly how the learning process uses train datapoints to predict on the target tasks.
arXiv Detail &amp; Related papers (2024-01-23T17:22:00Z) - Automatic learning algorithm selection for classification via convolutional neural networks [0.0]
The goal of this study is to learn the inherent structure of the data without identifying meta-features.
Experiments with simulated datasets show that the proposed approach achieves nearly perfect performance in identifying linear and nonlinear patterns.
arXiv Detail & Related papers (2023-05-16T01:57:01Z) - Anomaly Detection using Ensemble Classification and Evidence Theory [62.997667081978825]
We present a novel approach for anomaly detection using ensemble classification and evidence theory.
A pool selection strategy is presented to build a solid ensemble classifier.
We use uncertainty for the anomaly detection approach.
arXiv Detail & Related papers (2022-12-23T00:50:41Z) - Which is the best model for my data? [0.0]
The proposed meta-learning approach relies on machine learning and involves four major steps.
We present a collection of 62 meta-features that address the problem of information cancellation when aggregating measure values involving positive and negative measurements.
We show that our meta-learning approach can correctly predict an optimal model for 91% of the synthetic datasets and for 87% of the real-world datasets.
arXiv Detail &amp; Related papers (2022-10-26T13:15:43Z) - A hybrid model-based and learning-based approach for classification using limited number of training samples [13.60714541247498]
In this paper, a hybrid classification method -- HyPhyLearn -- is proposed that exploits both the physics-based statistical models and the learning-based classifiers.
The proposed solution is based on the conjecture that HyPhyLearn would alleviate the challenges associated with the individual approaches of learning-based and statistical model-based classifiers.
arXiv Detail & Related papers (2021-06-25T05:19:50Z) - Meta Learning for Few-Shot One-class Classification [0.0]
We formulate the learning of meaningful features for one-class classification as a meta-learning problem.
To learn these representations, we require only multiclass data from similar tasks.
We validate our approach by adapting few-shot classification datasets to the few-shot one-class classification scenario.
arXiv Detail & Related papers (2020-09-11T11:35:28Z) - Few-shot Classification via Adaptive Attention [93.06105498633492]
We propose a novel few-shot learning method via optimizing and fast adapting the query sample representation based on very few reference samples.
As demonstrated experimentally, the proposed model achieves state-of-the-art classification results on various benchmark few-shot classification and fine-grained recognition datasets.
arXiv Detail & Related papers (2020-08-06T05:52:59Z) - Multi-label learning for dynamic model type recommendation [13.304462985219237]
We propose a problem-independent dynamic base-classifier model recommendation for the online local pool (OLP) technique.
Our proposed framework builds a multi-label meta-classifier responsible for recommending a set of relevant model types.
Experimental results show that different data distributions favored different model types on a local scope.
arXiv Detail &amp; Related papers (2020-04-01T16:42:12Z) - Rethinking Class-Balanced Methods for Long-Tailed Visual Recognition from a Domain Adaptation Perspective [98.70226503904402]
Object frequency in the real world often follows a power law, leading to datasets with long-tailed class distributions.
We propose to augment the classic class-balanced learning by explicitly estimating the differences between the class-conditioned distributions with a meta-learning approach.
arXiv Detail &amp; Related papers (2020-03-24T11:28:42Z) - Selecting Relevant Features from a Multi-domain Representation for Few-shot Classification [91.67977602992657]
We propose a new strategy based on feature selection, which is both simpler and more effective than previous feature adaptation approaches.
We show that a simple non-parametric classifier built on top of such features produces high accuracy and generalizes to domains never seen during training.
arXiv Detail & Related papers (2020-03-20T15:44:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.