Related papers: CLAMS: A System for Zero-Shot Model Selection for Clustering

URL: http://arxiv.org/abs/2407.11286v1
Date: Mon, 15 Jul 2024 23:50:07 GMT
Title: CLAMS: A System for Zero-Shot Model Selection for Clustering
Authors: Prabhant Singh, Pieter Gijsbers, Murat Onur Yildirim, Elif Ceren Gok, Joaquin Vanschoren
Abstract summary: We propose an AutoML system that enables model selection on clustering problems by leveraging optimal transport-based dataset similarity. We compare our results against multiple clustering baselines and find that it outperforms all of them, hence demonstrating the utility of similarity-based automated model selection for solving clustering applications.
Score: 3.7127285734321194
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We propose an AutoML system that enables model selection on clustering problems by leveraging optimal transport-based dataset similarity. Our objective is to establish a comprehensive AutoML pipeline for clustering problems and provide recommendations for selecting the most suitable algorithms, thus opening up a new area of AutoML beyond the traditional supervised learning settings. We compare our results against multiple clustering baselines and find that it outperforms all of them, hence demonstrating the utility of similarity-based automated model selection for solving clustering applications.
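Illustrative sketch: the abstract describes selecting a clustering algorithm for a new dataset by comparing it to previously seen datasets with an optimal transport-based similarity. The Python below is a minimal sketch of that general idea, not the CLAMS implementation. It assumes a sliced Wasserstein distance as the optimal transport measure, equal-size datasets, and a hypothetical list of meta-datasets that each carry a `best_algorithm` label; none of these names or choices come from the paper.

```python
import math
import random


def wasserstein_1d(xs, ys):
    """Exact 1-D Wasserstein-1 distance between two equal-size samples."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    # For equal-size samples, the optimal 1-D coupling matches sorted values.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def sliced_wasserstein(X, Y, n_projections=50, seed=0):
    """Average 1-D Wasserstein distance over random unit directions.

    Assumption: a stand-in for the dataset similarity; the paper's exact
    optimal transport formulation is not given in the abstract.
    """
    rng = random.Random(seed)
    dim = len(X[0])
    total = 0.0
    for _ in range(n_projections):
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in direction)) or 1.0
        direction = [v / norm for v in direction]
        # Project both datasets onto the same random direction.
        px = [sum(a * b for a, b in zip(row, direction)) for row in X]
        py = [sum(a * b for a, b in zip(row, direction)) for row in Y]
        total += wasserstein_1d(px, py)
    return total / n_projections


def recommend(new_data, meta_datasets):
    """Return the algorithm stored with the most similar known dataset.

    `meta_datasets` is a hypothetical list of dicts with keys
    "data" (list of feature rows) and "best_algorithm" (str).
    """
    nearest = min(
        meta_datasets,
        key=lambda m: sliced_wasserstein(new_data, m["data"]),
    )
    return nearest["best_algorithm"]


def blob(center, n, seed):
    """Toy data generator: Gaussian points around `center`."""
    rng = random.Random(seed)
    return [[c + rng.gauss(0.0, 0.1) for c in center] for _ in range(n)]


# Hypothetical meta-datasets; the labels are illustrative only.
meta = [
    {"data": blob([0.0, 0.0], 40, seed=1), "best_algorithm": "kmeans"},
    {"data": blob([5.0, 5.0], 40, seed=2), "best_algorithm": "dbscan"},
]
new = blob([0.2, 0.1], 40, seed=3)
print(recommend(new, meta))
```

The selection is "zero-shot" in the sense that no clustering algorithm is run on the new dataset before a recommendation is made; only distances to known datasets are computed.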

Related papers

SPILL: Domain-Adaptive Intent Clustering based on Selection and Pooling with Large Language Models [5.257115841810258]
Selection and Pooling with Large Language Models (SPILL) is an intuitive and domain-adaptive method for intent clustering without fine-tuning. Our goal is to make existing embedders more generalizable to new domain datasets without further fine-tuning. Our method achieves comparable results to other state-of-the-art studies, even those that use much larger models and require fine-tuning.
arXiv Detail & Related papers (2025-03-19T15:48:57Z)
Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild [84.57103623507082]
This paper introduces Model-GLUE, a holistic Large Language Models scaling guideline. Our work starts with a benchmarking of existing LLM scaling techniques, especially selective merging, and variants of mixture. Our methodology involves the clustering of mergeable models and optimal merging strategy selection, and the integration of clusters through a model mixture.
arXiv Detail & Related papers (2024-10-07T15:55:55Z)
Problem-oriented AutoML in Clustering [2.541080349729282]
The Problem-oriented AutoML in Clustering (PoAC) framework introduces a novel, flexible approach to automating clustering tasks. PoAC establishes a dynamic connection between the clustering problem, CVIs, and meta-features, allowing users to customize these components. PoAC is algorithm-agnostic, adapting seamlessly to different clustering problems without requiring additional data or retraining.
arXiv Detail & Related papers (2024-09-24T16:25:53Z)
An incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting [53.36437745983783]
We first construct a max-margin optimization-based model to model potentially non-monotonic preferences. We devise information amount measurement methods and question selection strategies to pinpoint the most informative alternative in each iteration. Two incremental preference elicitation-based algorithms are developed to learn potentially non-monotonic preferences.
arXiv Detail & Related papers (2024-09-04T14:36:20Z)
Large Language Models Enable Few-Shot Clustering [88.06276828752553]
We show that large language models can amplify an expert's guidance to enable query-efficient, few-shot semi-supervised text clustering. We find incorporating LLMs in the first two stages can routinely provide significant improvements in cluster quality.
arXiv Detail & Related papers (2023-07-02T09:17:11Z)
Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels. We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
Sparse and geometry-aware generalisation of the mutual information for joint discriminative clustering and feature selection [19.066989850964756]
We introduce a discriminative clustering model trying to maximise a geometry-aware generalisation of the mutual information called GEMINI. This algorithm avoids the burden of feature exploration and is easily scalable to high-dimensional data and large amounts of samples while only designing a discriminative clustering model. Our results show that Sparse GEMINI is a competitive algorithm and has the ability to select relevant subsets of variables with respect to the clustering without using relevance criteria or prior hypotheses.
arXiv Detail & Related papers (2023-02-07T10:52:04Z)
A Deep Neural Networks ensemble workflow from hyperparameter search to inference leveraging GPU clusters [0.0]
AutoML seeks to automatically build ensembles of Deep Neural Networks (DNNs) to achieve qualitative predictions. We propose a new AutoML to build a larger library of accurate and diverse individual models to then construct ensembles. A new ensemble selection method based on a multi-objective greedy algorithm is proposed to generate accurate ensembles.
arXiv Detail & Related papers (2022-08-30T08:04:19Z)
Late Fusion Multi-view Clustering via Global and Local Alignment Maximization [61.89218392703043]
Multi-view clustering (MVC) optimally integrates complementary information from different views to improve clustering performance. Most existing approaches directly fuse multiple pre-specified similarities to learn an optimal similarity matrix for clustering. We propose late fusion MVC via alignment to address these issues.
arXiv Detail & Related papers (2022-08-02T01:49:31Z)
Personalized Federated Learning via Convex Clustering [72.15857783681658]
We propose a family of algorithms for personalized federated learning with locally convex user costs. The proposed framework is based on a generalization of convex clustering in which the differences between different users' models are penalized.
arXiv Detail & Related papers (2022-02-01T19:25:31Z)
ClusterVO: Clustering Moving Instances and Estimating Visual Odometry for Self and Surroundings [54.33327082243022]
ClusterVO is a stereo Visual Odometry which simultaneously clusters and estimates the motion of both ego and surrounding rigid clusters/objects. Unlike previous solutions relying on batch input or imposing priors on scene structure or dynamic object models, ClusterVO is online, general and thus can be used in various scenarios including indoor scene understanding and autonomous driving.
arXiv Detail & Related papers (2020-03-29T09:06:28Z)
Multi-objective Consensus Clustering Framework for Flight Search Recommendation [4.5782961896413035]
Clustering ensemble approaches were developed to overcome well-known problems of classical clustering approaches. We present a new clustering ensemble multi-objective optimization-based framework developed for analyzing Amadeus customer search data.
arXiv Detail & Related papers (2020-02-20T03:56:02Z)

This list is automatically generated from the titles and abstracts of the papers on this site.