Toward Unsupervised Outlier Model Selection
- URL: http://arxiv.org/abs/2211.01834v1
- Date: Thu, 3 Nov 2022 14:14:46 GMT
- Title: Toward Unsupervised Outlier Model Selection
- Authors: Yue Zhao, Sean Zhang, Leman Akoglu
- Abstract summary: ELECT is a new approach for selecting an effective model for a new dataset without any labels.
It is based on meta-learning, transferring prior knowledge (e.g. model performance) from historical datasets that are similar to the new one.
It serves an output on demand and can accommodate varying time budgets.
- Score: 20.12322454417006
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Today there exists no shortage of outlier detection algorithms in the
literature, yet the complementary and critical problem of unsupervised outlier
model selection (UOMS) is vastly understudied. In this work we propose ELECT, a
new approach to select an effective candidate model, i.e. an outlier detection
algorithm and its hyperparameter(s), to employ on a new dataset without any
labels. At its core, ELECT is based on meta-learning; transferring prior
knowledge (e.g. model performance) on historical datasets that are similar to
the new one to facilitate UOMS. Uniquely, it employs a dataset similarity
measure that is performance-based, which is more direct and goal-driven than
other measures used in the past. ELECT adaptively searches for similar
historical datasets; as such, it can serve an output on demand and accommodate
varying time budgets. Extensive experiments show that ELECT
significantly outperforms a wide range of basic UOMS baselines, including no
model selection (always using the same popular model such as iForest) as well
as more recent selection strategies based on meta-features.
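The abstract describes ELECT's ingredients (a candidate pool of detector/hyperparameter pairs, performance-based dataset similarity, adaptive search under a time budget) only at a high level. Below is a minimal, hypothetical sketch of meta-learning-based model selection in that spirit: the new dataset is matched to labeled historical datasets by how similarly the candidate models behave on them, and the candidate with the best historical performance on the nearest matches is returned. The PyOD detectors, helper names, and the rank-agreement similarity are illustrative assumptions, not ELECT's actual procedure.

```python
# Minimal sketch of meta-learning-based unsupervised outlier model selection,
# loosely in the spirit of ELECT. The similarity proxy and helper names below
# are illustrative assumptions, not the paper's actual algorithm.
import numpy as np
from scipy.stats import kendalltau
from pyod.models.iforest import IForest
from pyod.models.knn import KNN
from pyod.models.lof import LOF

# Candidate pool: (detector class, hyperparameters) pairs.
CANDIDATES = [
    (IForest, {"n_estimators": 100}),
    (IForest, {"n_estimators": 200}),
    (LOF, {"n_neighbors": 20}),
    (KNN, {"n_neighbors": 10}),
]

def candidate_scores(X):
    """Outlier scores of every candidate on X (no labels required)."""
    return np.array([cls(**hp).fit(X).decision_scores_ for cls, hp in CANDIDATES])

def behavior_profile(score_matrix):
    """Per-candidate agreement with the ensemble consensus; a label-free,
    fixed-length signature of how the candidates behave on a dataset."""
    consensus = score_matrix.mean(axis=0)
    return np.array([kendalltau(s, consensus)[0] for s in score_matrix])

def select_model(X_new, historical, top_k=3):
    """historical: list of (profile, perf) pairs computed offline, where
    `profile` is behavior_profile() of a past dataset and `perf` holds its
    labeled per-candidate performance (e.g. ROC-AUC)."""
    new_profile = behavior_profile(candidate_scores(X_new))
    # Similarity = how similarly the candidate pool behaves on the two datasets.
    sims = [kendalltau(new_profile, prof)[0] for prof, _ in historical]
    nearest = np.argsort(sims)[-top_k:]
    avg_perf = np.mean([historical[i][1] for i in nearest], axis=0)
    return CANDIDATES[int(np.argmax(avg_perf))]
```

In this reading, a larger time budget simply allows more candidates to be fitted and more historical datasets to be compared, which roughly corresponds to what the abstract calls serving an output on demand.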
Related papers
- Attribute-to-Delete: Machine Unlearning via Datamodel Matching [65.13151619119782]
Machine unlearning -- efficiently removing the influence of a small "forget set" of training data from a pre-trained machine learning model -- has recently attracted interest.
Recent research shows that machine unlearning techniques do not hold up in such a challenging setting.
arXiv Detail & Related papers (2024-10-30T17:20:10Z) - Adapt-$\infty$: Scalable Lifelong Multimodal Instruction Tuning via Dynamic Data Selection [89.42023974249122]
Adapt-$\infty$ is a new multi-way and adaptive data selection approach for Lifelong Instruction Tuning.
We construct pseudo-skill clusters by grouping gradient-based sample vectors.
We select the best-performing data selector for each skill cluster from a pool of selector experts.
arXiv Detail & Related papers (2024-10-14T15:48:09Z) - Zero-shot Outlier Detection via Prior-data Fitted Networks: Model Selection Bygone! [28.823740273813296]
Outlier detection (OD) has numerous applications in environmental monitoring, cybersecurity, finance, and medicine.
Being an inherently unsupervised task, model selection is a key bottleneck for OD without label supervision.
We present FoMo-0D for zero/0-shot OD, exploring a transformative new direction that bypasses the hurdle of model selection altogether.
arXiv Detail & Related papers (2024-09-09T14:41:24Z) - LLM-Select: Feature Selection with Large Language Models [64.5099482021597]
Large language models (LLMs) are capable of selecting the most predictive features, with performance rivaling the standard tools of data science.
Our findings suggest that LLMs may be useful not only for selecting the best features for training but also for deciding which features to collect in the first place.
arXiv Detail & Related papers (2024-07-02T22:23:40Z) - Take the essence and discard the dross: A Rethinking on Data Selection for Fine-Tuning Large Language Models [38.39395973523944]
We propose a three-stage scheme for data selection and review existing works according to this scheme.
We find that more targeted methods, which use data-specific and model-specific quality labels, are more efficient.
arXiv Detail & Related papers (2024-06-20T08:58:58Z) - LESS: Selecting Influential Data for Targeted Instruction Tuning [64.78894228923619]
We propose LESS, an efficient algorithm to estimate data influences and perform Low-rank gradiEnt Similarity Search for instruction data selection.
We show that training on a LESS-selected 5% of the data can often outperform training on the full dataset across diverse downstream tasks.
Our method goes beyond surface form cues to identify data that exemplifies the necessary reasoning skills for the intended downstream application.
arXiv Detail & Related papers (2024-02-06T19:18:04Z) - DsDm: Model-Aware Dataset Selection with Datamodels [81.01744199870043]
Standard practice is to filter for examples that match human notions of data quality.
We find that selecting according to similarity with "high quality" data sources may not increase (and can even hurt) performance compared to randomly selecting data.
Our framework avoids handpicked notions of data quality, and instead models explicitly how the learning process uses train datapoints to predict on the target tasks.
arXiv Detail & Related papers (2024-01-23T17:22:00Z) - Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning [47.02160072880698]
We introduce a self-evolving mechanism that allows the model itself to actively sample subsets that are equally or even more effective.
The key to our data sampling technique lies in the enhancement of diversity in the chosen subsets.
Extensive experiments across three datasets and benchmarks demonstrate the effectiveness of DiverseEvol.
arXiv Detail & Related papers (2023-11-14T14:10:40Z) - Universal Domain Adaptation from Foundation Models: A Baseline Study [58.51162198585434]
We conduct empirical studies of state-of-the-art UniDA methods using foundation models.
We introduce CLIP distillation, a parameter-free method specifically designed to distill target knowledge from CLIP models.
Although simple, our method outperforms previous approaches in most benchmark tasks.
arXiv Detail & Related papers (2023-05-18T16:28:29Z) - Automating Outlier Detection via Meta-Learning [37.736124230543865]
We develop the first principled data-driven approach to model selection for outlier detection, called MetaOD, based on meta-learning.
We show the effectiveness of MetaOD in selecting a detection model that significantly outperforms the most popular outlier detectors.
To foster and further research on this new problem, we open-source our entire meta-learning system, benchmark environment, and testbed datasets.
arXiv Detail & Related papers (2020-09-22T15:14:45Z)
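The MetaOD entry above and ELECT both cast unsupervised outlier model selection as meta-learning; a common baseline in this family (and the kind of meta-feature-driven selection the ELECT abstract contrasts itself with) matches the new dataset to historical ones via hand-crafted meta-features. The sketch below illustrates that selection rule; the meta-features and the nearest-neighbor rule are simplified assumptions, not MetaOD's actual model, which learns from a dataset-by-model performance matrix.

```python
# Hypothetical sketch of meta-feature-based model selection for outlier
# detection (the family MetaOD belongs to). MetaOD itself learns a
# factorized performance model; the nearest-neighbor rule here is a stand-in.
import numpy as np

def meta_features(X):
    """A few simple, illustrative dataset-level meta-features."""
    avg_corr = (np.mean(np.abs(np.corrcoef(X, rowvar=False)))
                if X.shape[1] > 1 else 0.0)
    return np.array([
        float(X.shape[0]),              # number of samples
        float(X.shape[1]),              # number of features
        float(np.mean(X.std(axis=0))),  # average feature spread
        float(avg_corr),                # average absolute feature correlation
    ])

def select_by_meta_features(X_new, hist_meta, hist_perf, model_names, top_k=3):
    """hist_meta: (n_datasets, n_meta) meta-features of past datasets;
    hist_perf: (n_datasets, n_models) labeled performance (e.g. ROC-AUC)."""
    mu, sd = hist_meta.mean(axis=0), hist_meta.std(axis=0) + 1e-12
    z_hist = (hist_meta - mu) / sd
    z_new = (meta_features(X_new) - mu) / sd
    dists = np.linalg.norm(z_hist - z_new, axis=1)
    nearest = np.argsort(dists)[:top_k]      # most similar past datasets
    avg_perf = hist_perf[nearest].mean(axis=0)
    return model_names[int(np.argmax(avg_perf))]
```

Note the contrast with the performance-based sketch given after the abstract: here similarity is judged from dataset statistics alone, before any detector is run on the new dataset.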