Explaining AutoClustering: Uncovering Meta-Feature Contribution in AutoML for Clustering
- URL: http://arxiv.org/abs/2602.18348v1
- Date: Fri, 20 Feb 2026 17:01:25 GMT
- Title: Explaining AutoClustering: Uncovering Meta-Feature Contribution in AutoML for Clustering
- Authors: Matheus Camilo da Silva, Leonardo Arrighi, Ana Carolina Lorena, Sylvio Barbon Junior,
- Abstract summary: AutoClustering methods often leverage meta-learning over dataset meta-features.<n>This limits reliability, bias diagnostics, and efficient meta-feature engineering.<n>This study offers a practical foundation for increasing decision transparency in unsupervised learning automation.
- Score: 0.6487259764989486
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: AutoClustering methods aim to automate unsupervised learning tasks, including algorithm selection (AS), hyperparameter optimization (HPO), and pipeline synthesis (PS), by often leveraging meta-learning over dataset meta-features. While these systems often achieve strong performance, their recommendations are often difficult to justify: the influence of dataset meta-features on algorithm and hyperparameter choices is typically not exposed, limiting reliability, bias diagnostics, and efficient meta-feature engineering. This limits reliability and diagnostic insight for further improvements. In this work, we investigate the explainability of the meta-models in AutoClustering. We first review 22 existing methods and organize their meta-features into a structured taxonomy. We then apply a global explainability technique (i.e., Decision Predicate Graphs) to assess feature importance within meta-models from selected frameworks. Finally, we use local explainability tools such as SHAP (SHapley Additive exPlanations) to analyse specific clustering decisions. Our findings highlight consistent patterns in meta-feature relevance, identify structural weaknesses in current meta-learning strategies that can distort recommendations, and provide actionable guidance for more interpretable Automated Machine Learning (AutoML) design. This study therefore offers a practical foundation for increasing decision transparency in unsupervised learning automation.
Related papers
- MacNet: An End-to-End Manifold-Constrained Adaptive Clustering Network for Interpretable Whole Slide Image Classification [9.952997875404634]
Clustering-based approaches can provide explainable decision-making process but suffer from high dimension features and semantically ambiguous centroids.<n>We propose an end-to-end MIL framework that integrates Grassmann re-embedding and manifold adaptive clustering.<n> Experiments on multicentre WSI datasets demonstrate that: 1) our cluster-incorporated model achieves superior performance in both grading accuracy and interpretability; 2) end-to-end learning refines better feature representations and it requires acceptable resources.
arXiv Detail & Related papers (2026-02-16T06:43:36Z) - Control Reinforcement Learning: Interpretable Token-Level Steering of LLMs via Sparse Autoencoder Features [1.5874067490843806]
Control Reinforcement Learning trains a policy to select SAE features for steering at each token, producing interpretable intervention logs.<n> Adaptive Feature Masking encourages diverse feature discovery while preserving singlefeature interpretability.<n>On Gemma 2 2B across MMLU, BBQ, GSM8K, HarmBench, and XSTest, CRL achieves improvements while providing per-token intervention logs.
arXiv Detail & Related papers (2026-02-11T02:28:49Z) - DeepCAVE: A Visualization and Analysis Tool for Automated Machine Learning [71.68388452009194]
We present DeepCAVE, a tool for interactive visualization and analysis, providing insights into HPO.<n>Through an interactive dashboard, researchers, data scientists, and ML engineers can explore various aspects of the HPO process.<n>DeepCAVE contributes to the interpretability of HPO and ML on a design level and aims to foster the development of more robust and efficient methodologies in the future.
arXiv Detail & Related papers (2025-12-01T15:45:30Z) - SCAN: Structured Capability Assessment and Navigation for LLMs [54.54085382131134]
textbfSCAN (Structured Capability Assessment and Navigation) is a practical framework that enables detailed characterization of Large Language Models.<n>SCAN incorporates four key components:.<n>TaxBuilder, which extracts capability-indicating tags from queries to construct a hierarchical taxonomy;.<n>RealMix, a query synthesis and filtering mechanism that ensures sufficient evaluation data for each capability tag;.<n>A PC$2$-based (Pre-Comparison-derived Criteria) LLM-as-a-Judge approach achieves significantly higher accuracy compared to classic LLM-as-a-Judge method
arXiv Detail & Related papers (2025-05-10T16:52:40Z) - Adaptive Tool Use in Large Language Models with Meta-Cognition Trigger [49.81945268343162]
We propose MeCo, an adaptive decision-making strategy for external tool use.<n>MeCo quantifies metacognitive scores by capturing high-level cognitive signals in the representation space.<n>MeCo is fine-tuning-free and incurs minimal cost.
arXiv Detail & Related papers (2025-02-18T15:45:01Z) - Deciphering AutoML Ensembles: cattleia's Assistance in Decision-Making [0.0]
Cattleia is an application that deciphers the ensembles for regression, multiclass, and binary classification tasks.
It works with models built by three AutoML packages: auto-sklearn, AutoGluon, and FLAML.
arXiv Detail & Related papers (2024-03-19T11:56:21Z) - AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning [54.47116888545878]
AutoAct is an automatic agent learning framework for QA.
It does not rely on large-scale annotated data and synthetic planning trajectories from closed-source models.
arXiv Detail & Related papers (2024-01-10T16:57:24Z) - AutoML-GPT: Large Language Model for AutoML [5.9145212342776805]
We have established a framework called AutoML-GPT that integrates a comprehensive set of tools and libraries.
Through a conversational interface, users can specify their requirements, constraints, and evaluation metrics.
We have demonstrated that AutoML-GPT significantly reduces the time and effort required for machine learning tasks.
arXiv Detail & Related papers (2023-09-03T09:39:49Z) - On Taking Advantage of Opportunistic Meta-knowledge to Reduce
Configuration Spaces for Automated Machine Learning [11.670797168818773]
Key research question is whether it is possible and practical to preemptively avoid costly evaluations of poorly performing ML pipelines.
Numerous experiments with the AutoWeka4MCPS package suggest that opportunistic/systematic meta-knowledge can improve ML outcomes.
We observe strong sensitivity to the challenge' of a dataset, i.e. whether specificity in choosing a predictor leads to significantly better performance.
arXiv Detail & Related papers (2022-08-08T19:22:24Z) - Automated Machine Learning Techniques for Data Streams [91.3755431537592]
This paper surveys the state-of-the-art open-source AutoML tools, applies them to data collected from streams, and measures how their performance changes over time.
The results show that off-the-shelf AutoML tools can provide satisfactory results but in the presence of concept drift, detection or adaptation techniques have to be applied to maintain the predictive accuracy over time.
arXiv Detail & Related papers (2021-06-14T11:42:46Z) - Fast Few-Shot Classification by Few-Iteration Meta-Learning [173.32497326674775]
We introduce a fast optimization-based meta-learning method for few-shot classification.
Our strategy enables important aspects of the base learner objective to be learned during meta-training.
We perform a comprehensive experimental analysis, demonstrating the speed and effectiveness of our approach.
arXiv Detail & Related papers (2020-10-01T15:59:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.