A Markov Decision Process Approach to Active Meta Learning
- URL: http://arxiv.org/abs/2009.04950v1
- Date: Thu, 10 Sep 2020 15:45:34 GMT
- Title: A Markov Decision Process Approach to Active Meta Learning
- Authors: Bingjia Wang, Alec Koppel and Vikram Krishnamurthy
- Abstract summary: In supervised learning, we fit a single statistical model to a given data set, assuming that the data is associated with a singular task.
In meta-learning, the data is associated with numerous tasks, and we seek a model that may perform well on all tasks simultaneously.
- Score: 24.50189361694407
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In supervised learning, we fit a single statistical model to a given data
set, assuming that the data is associated with a singular task, which yields
well-tuned models for specific use, but does not adapt well to new contexts. By
contrast, in meta-learning, the data is associated with numerous tasks, and we
seek a model that may perform well on all tasks simultaneously, in pursuit of
greater generalization. One challenge in meta-learning is how to exploit
relationships between tasks and classes, which is overlooked by commonly used
random or cyclic passes through data. In this work, we propose actively
selecting samples on which to train by discerning covariates inside and between
meta-training sets. Specifically, we cast the problem of selecting a sample
from a number of meta-training sets as either a multi-armed bandit or a Markov
Decision Process (MDP), depending on how one encapsulates correlation across
tasks. We develop scheduling schemes based on Upper Confidence Bound (UCB),
Gittins Index and tabular Markov Decision Problems (MDPs) solved with linear
programming, where the reward is the scaled statistical accuracy to ensure it
is a time-invariant function of state and action. Across a variety of
experimental contexts, we observe significant reductions in sample complexity
of the active selection scheme relative to cyclic or i.i.d. sampling, demonstrating
the merit of exploiting covariates in practice.
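The bandit formulation described in the abstract can be sketched with a minimal UCB1 scheduler that treats each meta-training set as an arm and rewards the scheduler with a noisy per-task accuracy gain. This is an illustrative assumption of how such a scheduler might look, not the authors' implementation; the reward model, constants, and task count are hypothetical.

```python
import numpy as np

# Hypothetical UCB1 scheduler over meta-training sets ("arms").
# The reward stands in for the paper's scaled statistical accuracy;
# here it is simulated as a hidden mean gain plus Gaussian noise.
rng = np.random.default_rng(0)

n_tasks = 3
true_gain = np.array([0.2, 0.5, 0.35])  # hidden mean reward per task (illustrative)

counts = np.zeros(n_tasks)  # times each meta-training set was selected
means = np.zeros(n_tasks)   # running mean observed reward per task

n_rounds = 500
for t in range(1, n_rounds + 1):
    if t <= n_tasks:
        arm = t - 1  # initialize: sample each task once
    else:
        # optimism in the face of uncertainty: mean + exploration bonus
        ucb = means + np.sqrt(2 * np.log(t) / counts)
        arm = int(np.argmax(ucb))
    reward = true_gain[arm] + rng.normal(0, 0.1)  # noisy accuracy gain
    counts[arm] += 1
    means[arm] += (reward - means[arm]) / counts[arm]

# The scheduler concentrates its sampling budget on the most informative task.
print(int(np.argmax(counts)))
```

Compared with cyclic or i.i.d. sampling, which would spread the 500 draws evenly, the UCB scheduler allocates most of its budget to the highest-gain task while still occasionally probing the others.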
Related papers
- Adapt-$\infty$: Scalable Lifelong Multimodal Instruction Tuning via Dynamic Data Selection [89.42023974249122]
Adapt-$\infty$ is a new multi-way and adaptive data selection approach for Lifelong Instruction Tuning.
We construct pseudo-skill clusters by grouping gradient-based sample vectors.
We select the best-performing data selector for each skill cluster from a pool of selector experts.
arXiv Detail & Related papers (2024-10-14T15:48:09Z) - Diversified Batch Selection for Training Acceleration [68.67164304377732]
A prevalent research line, known as online batch selection, explores selecting informative subsets during the training process.
Vanilla reference-model-free methods involve independently scoring and selecting data in a sample-wise manner.
We propose Diversified Batch Selection (DivBS), which is reference-model-free and can efficiently select diverse and representative samples.
arXiv Detail & Related papers (2024-06-07T12:12:20Z) - Task-Distributionally Robust Data-Free Meta-Learning [99.56612787882334]
Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data.
For the first time, we reveal two major challenges hindering their practical deployment: Task-Distribution Shift (TDS) and Task-Distribution Corruption (TDC).
arXiv Detail & Related papers (2023-11-23T15:46:54Z) - RetICL: Sequential Retrieval of In-Context Examples with Reinforcement Learning [53.52699766206808]
We propose Retrieval for In-Context Learning (RetICL), a learnable method for modeling and optimally selecting examples sequentially for in-context learning.
We evaluate RetICL on math word problem solving and scientific question answering tasks and show that it consistently outperforms or matches learnable baselines.
arXiv Detail & Related papers (2023-05-23T20:15:56Z) - Mixing Deep Learning and Multiple Criteria Optimization: An Application to Distributed Learning with Multiple Datasets [0.0]
The training phase is the most important stage of the machine learning process.
We develop a multiple criteria optimization model in which each criterion measures the distance between the output associated with a specific input and its label.
We propose a scalarization approach to implement this model and numerical experiments in digit classification using MNIST data.
arXiv Detail & Related papers (2021-12-02T16:00:44Z) - BAMLD: Bayesian Active Meta-Learning by Disagreement [39.59987601426039]
This paper introduces an information-theoretic active task selection mechanism to decrease the number of labeling requests for meta-training tasks.
We report empirical performance results that compare favourably against existing acquisition mechanisms.
arXiv Detail & Related papers (2021-10-19T13:06:51Z) - Meta-Regularization by Enforcing Mutual-Exclusiveness [0.8057006406834467]
We propose a regularization technique for meta-learning models that gives the model designer more control over the information flow during meta-training.
Our proposed regularization function shows an accuracy boost of $\sim 36\%$ on the Omniglot dataset.
arXiv Detail & Related papers (2021-01-24T22:57:19Z) - Probabilistic Active Meta-Learning [15.432006404678981]
We introduce task selection based on prior experience into a meta-learning algorithm.
We provide empirical evidence that our approach improves data-efficiency when compared to strong baselines on simulated robotic experiments.
arXiv Detail & Related papers (2020-07-17T12:51:42Z) - Adaptive Task Sampling for Meta-Learning [79.61146834134459]
The key idea of meta-learning for few-shot classification is to mimic the few-shot situations faced at test time.
We propose an adaptive task sampling method to improve the generalization performance.
arXiv Detail & Related papers (2020-07-17T03:15:53Z) - Improving Multi-Turn Response Selection Models with Complementary Last-Utterance Selection by Instance Weighting [84.9716460244444]
We consider utilizing the underlying correlation in the data resource itself to derive different kinds of supervision signals.
We conduct extensive experiments in two public datasets and obtain significant improvement in both datasets.
arXiv Detail & Related papers (2020-02-18T06:29:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.