CLAMS: A System for Zero-Shot Model Selection for Clustering
- URL: http://arxiv.org/abs/2407.11286v1
- Date: Mon, 15 Jul 2024 23:50:07 GMT
- Title: CLAMS: A System for Zero-Shot Model Selection for Clustering
- Authors: Prabhant Singh, Pieter Gijsbers, Murat Onur Yildirim, Elif Ceren Gok, Joaquin Vanschoren,
- Abstract summary: We propose an AutoML system that enables model selection on clustering problems by leveraging optimal transport-based dataset similarity.
We compare our results against multiple clustering baselines and find that it outperforms all of them, hence demonstrating the utility of similarity-based automated model selection for solving clustering applications.
- Score: 3.7127285734321194
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose an AutoML system that enables model selection on clustering problems by leveraging optimal transport-based dataset similarity. Our objective is to establish a comprehensive AutoML pipeline for clustering problems and provide recommendations for selecting the most suitable algorithms, thus opening up a new area of AutoML beyond the traditional supervised learning settings. We compare our results against multiple clustering baselines and find that it outperforms all of them, hence demonstrating the utility of similarity-based automated model selection for solving clustering applications.
Related papers
- Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild [84.57103623507082]
This paper introduces Model-GLUE, a holistic Large Language Models scaling guideline.
Our work starts with a benchmarking of existing LLM scaling techniques, especially selective merging, and variants of mixture.
Our methodology involves the clustering of mergeable models and optimal merging strategy selection, and the integration of clusters through a model mixture.
arXiv Detail & Related papers (2024-10-07T15:55:55Z) - Problem-oriented AutoML in Clustering [2.541080349729282]
The Problem-oriented AutoML in Clustering (PoAC) framework introduces a novel, flexible approach to automating clustering tasks.
PoAC establishes a dynamic connection between the clustering problem, CVIs, and meta-features, allowing users to customize these components.
PoAC is algorithm-agnostic, adapting seamlessly to different clustering problems without requiring additional data or retraining.
arXiv Detail & Related papers (2024-09-24T16:25:53Z) - An incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting [53.36437745983783]
We first construct a max-margin optimization-based model to model potentially non-monotonic preferences.
We devise information amount measurement methods and question selection strategies to pinpoint the most informative alternative in each iteration.
Two incremental preference elicitation-based algorithms are developed to learn potentially non-monotonic preferences.
arXiv Detail & Related papers (2024-09-04T14:36:20Z) - Large Language Models Enable Few-Shot Clustering [88.06276828752553]
We show that large language models can amplify an expert's guidance to enable query-efficient, few-shot semi-supervised text clustering.
We find incorporating LLMs in the first two stages can routinely provide significant improvements in cluster quality.
arXiv Detail & Related papers (2023-07-02T09:17:11Z) - Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z) - Sparse and geometry-aware generalisation of the mutual information for joint discriminative clustering and feature selection [19.066989850964756]
We introduce a discriminative clustering model trying to maximise a geometry-aware generalisation of the mutual information called GEMINI.
This algorithm avoids the burden of feature exploration and is easily scalable to high-dimensional data and large amounts of samples while only designing a discriminative clustering model.
Our results show that Sparse GEMINI is a competitive algorithm and has the ability to select relevant subsets of variables with respect to the clustering without using relevance criteria or prior hypotheses.
arXiv Detail & Related papers (2023-02-07T10:52:04Z) - A Deep Neural Networks ensemble workflow from hyperparameter search to
inference leveraging GPU clusters [0.0]
AutoML seeks to automatically build ensembles of Deep Neural Networks (DNNs) to achieve qualitative predictions.
We propose a new AutoML to build a larger library of accurate and diverse individual models to then construct ensembles.
New ensemble selection method based on a multi-objective greedy algorithm is proposed to generate accurate ensembles.
arXiv Detail & Related papers (2022-08-30T08:04:19Z) - Personalized Federated Learning via Convex Clustering [72.15857783681658]
We propose a family of algorithms for personalized federated learning with locally convex user costs.
The proposed framework is based on a generalization of convex clustering in which the differences between different users' models are penalized.
arXiv Detail & Related papers (2022-02-01T19:25:31Z) - ClusterVO: Clustering Moving Instances and Estimating Visual Odometry
for Self and Surroundings [54.33327082243022]
ClusterVO is a stereo Visual Odometry which simultaneously clusters and estimates the motion of both ego and surrounding rigid clusters/objects.
Unlike previous solutions relying on batch input or imposing priors on scene structure or dynamic object models, ClusterVO is online, general and thus can be used in various scenarios including indoor scene understanding and autonomous driving.
arXiv Detail & Related papers (2020-03-29T09:06:28Z) - Multi-objective Consensus Clustering Framework for Flight Search
Recommendation [4.5782961896413035]
Clustering ensemble approaches were developed to overcome well-known problems of classical clustering approaches.
We present a new clustering ensemble multi-objective optimization-based framework developed for analyzing Amadeus customer search data.
arXiv Detail & Related papers (2020-02-20T03:56:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.