Related papers: OOD-Chameleon: Is Algorithm Selection for OOD Generalization Learnable?

OOD-Chameleon: Is Algorithm Selection for OOD Generalization Learnable?

URL: http://arxiv.org/abs/2410.02735v2
Date: Sat, 31 May 2025 07:31:51 GMT
Title: OOD-Chameleon: Is Algorithm Selection for OOD Generalization Learnable?
Authors: Liangze Jiang, Damien Teney,
Abstract summary: This work explores the possibility of learning the selection of a training algorithm for OOD generalization.<n>We propose a proof of concept that formulates the selection as a multi-label classification over candidate algorithms.<n>We evaluate the ability of OOD-Chameleon to rank algorithms on unseen shifts and datasets.
Score: 18.801143204410913
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Out-of-distribution (OOD) generalization is challenging because distribution shifts come in many forms. Numerous algorithms exist to address specific settings, but choosing the right training algorithm for the right dataset without trial and error is difficult. Indeed, real-world applications often involve multiple types and combinations of shifts that are hard to analyze theoretically. Method. This work explores the possibility of learning the selection of a training algorithm for OOD generalization. We propose a proof of concept (OOD-Chameleon) that formulates the selection as a multi-label classification over candidate algorithms, trained on a dataset of datasets representing a variety of shifts. We evaluate the ability of OOD-Chameleon to rank algorithms on unseen shifts and datasets based only on dataset characteristics, i.e., without training models first, unlike traditional model selection. Findings. Extensive experiments show that the learned selector identifies high-performing algorithms across synthetic, vision, and language tasks. Further inspection shows that it learns non-trivial decision rules, which provide new insights into the applicability of existing algorithms. Overall, this new approach opens the possibility of better exploiting and understanding the plethora of existing algorithms for OOD generalization.

Related papers

Data Selection for ERMs [67.57726352698933]
We study how well can $mathcalA$ perform when trained on at most $nll N$ data points selected from a population of $N$ points. Our results include optimal data-selection bounds for mean estimation, linear classification, and linear regression.
arXiv Detail & Related papers (2025-04-20T11:26:01Z)
TAGCOS: Task-agnostic Gradient Clustered Coreset Selection for Instruction Tuning Data [29.45013725650798]
It is essential to extract a subset of instruction datasets that achieves comparable performance to the full dataset. We propose Task-Agnostic Gradient Clustered COreset Selection (TAGCOS) Specifically, we leverage sample gradients as the data representations, perform clustering to group similar data, and apply an efficient greedy algorithm for coreset selection.
arXiv Detail & Related papers (2024-07-21T17:59:20Z)
DsDm: Model-Aware Dataset Selection with Datamodels [81.01744199870043]
Standard practice is to filter for examples that match human notions of data quality. We find that selecting according to similarity with "high quality" data sources may not increase (and can even hurt) performance compared to randomly selecting data. Our framework avoids handpicked notions of data quality, and instead models explicitly how the learning process uses train datapoints to predict on the target tasks.
arXiv Detail & Related papers (2024-01-23T17:22:00Z)
Towards Better Out-of-Distribution Generalization of Neural Algorithmic Reasoning Tasks [51.8723187709964]
We study the OOD generalization of neural algorithmic reasoning tasks. The goal is to learn an algorithm from input-output pairs using deep neural networks.
arXiv Detail & Related papers (2022-11-01T18:33:20Z)
A Generalist Neural Algorithmic Learner [18.425083543441776]
We build a single graph neural network processor capable of learning to execute a wide range of algorithms. We show that it is possible to effectively learn algorithms in a multi-task manner, so long as we can learn to execute them well in a single-task regime.
arXiv Detail & Related papers (2022-09-22T16:41:33Z)
Towards Diverse Evaluation of Class Incremental Learning: A Representation Learning Perspective [67.45111837188685]
Class incremental learning (CIL) algorithms aim to continually learn new object classes from incrementally arriving data. We experimentally analyze neural network models trained by CIL algorithms using various evaluation protocols in representation learning.
arXiv Detail & Related papers (2022-06-16T11:44:11Z)
Sub-Setting Algorithm for Training Data Selection in Pattern Recognition [0.0]
This paper proposes a training data selection algorithm that identifies multiple subsets with simple structures. A sub-setting algorithm identifies multiple subsets with simple local patterns by identifying similar instances in the neighborhood of an instance. Our bottom-up sub-setting algorithm performed on an average 15% better than the top-down decision tree learned on the entire dataset.
arXiv Detail & Related papers (2021-10-13T06:42:41Z)
Algorithm Selection on a Meta Level [58.720142291102135]
We introduce the problem of meta algorithm selection, which essentially asks for the best way to combine a given set of algorithm selectors. We present a general methodological framework for meta algorithm selection as well as several concrete learning methods as instantiations of this framework.
arXiv Detail & Related papers (2021-07-20T11:23:21Z)
Towards Optimally Efficient Tree Search with Deep Learning [76.64632985696237]
This paper investigates the classical integer least-squares problem which estimates signals integer from linear models. The problem is NP-hard and often arises in diverse applications such as signal processing, bioinformatics, communications and machine learning. We propose a general hyper-accelerated tree search (HATS) algorithm by employing a deep neural network to estimate the optimal estimation for the underlying simplified memory-bounded A* algorithm.
arXiv Detail & Related papers (2021-01-07T08:00:02Z)
Towards Understanding the Behaviors of Optimal Deep Active Learning Algorithms [19.65665942630067]
Active learning (AL) algorithms may achieve better performance with fewer data because the model guides the data selection process. There is little study on what the optimal AL looks like, which would help researchers understand where their models fall short. We present a simulated annealing algorithm to search for this optimal oracle and analyze it for several tasks.
arXiv Detail & Related papers (2020-12-29T22:56:42Z)
Finding the Homology of Decision Boundaries with Active Learning [26.31885403636642]
We propose an active learning algorithm to recover the homology of decision boundaries. Our algorithm sequentially and adaptively selects which samples it requires the labels of. Experiments on several datasets show the sample complexity improvement in recovering the homology.
arXiv Detail & Related papers (2020-11-19T04:22:06Z)
Strong Generalization and Efficiency in Neural Programs [69.18742158883869]
We study the problem of learning efficient algorithms that strongly generalize in the framework of neural program induction. By carefully designing the input / output interfaces of the neural model and through imitation, we are able to learn models that produce correct results for arbitrary input sizes.
arXiv Detail & Related papers (2020-07-07T17:03:02Z)
Run2Survive: A Decision-theoretic Approach to Algorithm Selection based on Survival Analysis [75.64261155172856]
survival analysis (SA) naturally supports censored data and offers appropriate ways to use such data for learning distributional models of algorithm runtime. We leverage such models as a basis of a sophisticated decision-theoretic approach to algorithm selection, which we dub Run2Survive. In an extensive experimental study with the standard benchmark ASlib, our approach is shown to be highly competitive and in many cases even superior to state-of-the-art AS approaches.
arXiv Detail & Related papers (2020-07-06T15:20:17Z)
On the Robustness of Active Learning [0.7340017786387767]
Active Learning is concerned with how to identify the most useful samples for a Machine Learning algorithm to be trained with. We find that it is often applied with not enough care and domain knowledge. We propose the new "Sum of Squared Logits" method based on the Simpson diversity index and investigate the effect of using the confusion matrix for balancing in sample selection.
arXiv Detail & Related papers (2020-06-18T09:07:23Z)
Fase-AL -- Adaptation of Fast Adaptive Stacking of Ensembles for Supporting Active Learning [0.0]
This work presents the FASE-AL algorithm which induces classification models with non-labeled instances using Active Learning. The algorithm achieves promising results in terms of the percentage of correctly classified instances.
arXiv Detail & Related papers (2020-01-30T17:25:47Z)
Stepwise Model Selection for Sequence Prediction via Deep Kernel Learning [100.83444258562263]
We propose a novel Bayesian optimization (BO) algorithm to tackle the challenge of model selection in this setting. In order to solve the resulting multiple black-box function optimization problem jointly and efficiently, we exploit potential correlations among black-box functions. We are the first to formulate the problem of stepwise model selection (SMS) for sequence prediction, and to design and demonstrate an efficient joint-learning algorithm for this purpose.
arXiv Detail & Related papers (2020-01-12T09:42:19Z)
Optimal Clustering from Noisy Binary Feedback [75.17453757892152]
We study the problem of clustering a set of items from binary user feedback. We devise an algorithm with a minimal cluster recovery error rate. For adaptive selection, we develop an algorithm inspired by the derivation of the information-theoretical error lower bounds.
arXiv Detail & Related papers (2019-10-14T09:18:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.