Making Scalable Meta Learning Practical
- URL: http://arxiv.org/abs/2310.05674v2
- Date: Mon, 23 Oct 2023 14:16:49 GMT
- Title: Making Scalable Meta Learning Practical
- Authors: Sang Keun Choe, Sanket Vaibhav Mehta, Hwijeen Ahn, Willie Neiswanger,
Pengtao Xie, Emma Strubell, Eric Xing
- Abstract summary: Meta learning has long been recognized to suffer from poor scalability due to its tremendous compute/memory costs, training instability, and a lack of efficient distributed training support.
In this work, we focus on making scalable meta learning practical by introducing SAMA, which combines advances in both implicit differentiation algorithms and systems.
We show that SAMA-based data optimization leads to consistent improvements in text classification accuracy with BERT and RoBERTa large language models, and achieves state-of-the-art results in both small- and large-scale data pruning on image classification tasks.
- Score: 40.24886572503001
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite its flexibility to learn diverse inductive biases in machine learning
programs, meta learning (i.e., learning to learn) has long been recognized to
suffer from poor scalability due to its tremendous compute/memory costs,
training instability, and a lack of efficient distributed training support. In
this work, we focus on making scalable meta learning practical by introducing
SAMA, which combines advances in both implicit differentiation algorithms and
systems. Specifically, SAMA is designed to flexibly support a broad range of
adaptive optimizers in the base level of meta learning programs, while reducing
computational burden by avoiding explicit computation of second-order gradient
information, and exploiting efficient distributed training techniques
implemented for first-order gradients. Evaluated on multiple large-scale meta
learning benchmarks, SAMA showcases up to 1.7/4.8x increase in throughput and
2.0/3.8x decrease in memory consumption respectively on single-/multi-GPU
setups compared to other baseline meta learning algorithms. Furthermore, we
show that SAMA-based data optimization leads to consistent improvements in text
classification accuracy with BERT and RoBERTa large language models, and
achieves state-of-the-art results in both small- and large-scale data pruning
on image classification tasks, demonstrating the practical applicability of
scalable meta learning across language and vision domains.
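The abstract sketches the recipe at a high level: implicit differentiation for the meta-level update, adaptive optimizers at the base level, and no explicit second-order computation. As a concrete illustration, below is a minimal, self-contained PyTorch sketch of that style of bilevel training, with per-example data weights as the meta-parameters and the mixed second-order term replaced by a finite-difference approximation so that only first-order gradients are ever computed. The data-weighting setup, the constant `eps`, and the finite-difference stand-in are illustrative assumptions; this is not the exact SAMA update.

```python
# Minimal sketch of implicit-differentiation-style meta learning using only
# first-order gradients. Illustrative only -- NOT the exact SAMA algorithm.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy data: the meta-parameters are per-example weights on the training loss.
x_train, y_train = torch.randn(64, 10), torch.randn(64, 1)
x_val, y_val = torch.randn(32, 10), torch.randn(32, 1)

model = nn.Linear(10, 1)                               # base-level learner
meta_weights = torch.zeros(64, requires_grad=True)     # meta-level parameters

def train_loss(weights):
    per_example = (model(x_train) - y_train).pow(2).mean(dim=1)   # (64,)
    return (torch.sigmoid(weights) * per_example).mean()

inner_opt = torch.optim.Adam(model.parameters(), lr=1e-2)         # adaptive base optimizer
meta_opt = torch.optim.Adam([meta_weights], lr=1e-2)

for meta_step in range(10):
    # Base-level optimization: a few adaptive-optimizer steps on the weighted loss.
    for _ in range(20):
        inner_opt.zero_grad()
        train_loss(meta_weights).backward()
        inner_opt.step()

    # v = dL_val/dw at the (approximate) base-level optimum.
    val_loss = (model(x_val) - y_val).pow(2).mean()
    v = torch.autograd.grad(val_loss, list(model.parameters()))

    # Finite-difference stand-in for the mixed second-order term
    # d/d(meta)[ dL_train/dw . v ], using only first-order gradients.
    eps = 1e-3 / torch.cat([g.flatten() for g in v]).norm().clamp_min(1e-12)

    def grad_wrt_meta(sign):
        with torch.no_grad():                          # perturb base weights by +/- eps * v
            for p, g in zip(model.parameters(), v):
                p.add_(sign * eps * g)
        grad = torch.autograd.grad(train_loss(meta_weights), meta_weights)[0]
        with torch.no_grad():                          # undo the perturbation
            for p, g in zip(model.parameters(), v):
                p.sub_(sign * eps * g)
        return grad

    hypergrad = -(grad_wrt_meta(+1.0) - grad_wrt_meta(-1.0)) / (2 * eps)

    meta_opt.zero_grad()
    meta_weights.grad = hypergrad                      # first-order-only hypergradient
    meta_opt.step()
    print(f"meta step {meta_step}: val loss {val_loss.item():.4f}")
```

Replacing exact second-order terms with first-order-only estimates is what keeps memory and compute close to ordinary training, which is the property the abstract emphasizes; SAMA additionally exploits distributed-training techniques built for first-order gradients on top of this kind of machinery.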
Related papers
- Data Augmentation for Sparse Multidimensional Learning Performance Data Using Generative AI [17.242331892899543]
Learning performance data describe correct and incorrect answers or problem-solving attempts in adaptive learning.
Learning performance data tend to be highly sparse (80%-90% missing observations) in most real-world applications due to adaptive item selection.
This article proposes a systematic framework for augmenting learner data to address data sparsity in learning performance data.
arXiv Detail & Related papers (2024-09-24T00:25:07Z) - Robust Learning with Progressive Data Expansion Against Spurious
Correlation [65.83104529677234]
We study the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features.
Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process.
We propose a new training algorithm called PDE that efficiently enhances the model's robustness and improves worst-group performance.
arXiv Detail & Related papers (2023-06-08T05:44:06Z) - Training Efficiency and Robustness in Deep Learning [2.6451769337566406]
We study approaches to improve the training efficiency and robustness of deep learning models.
We find that prioritizing learning on more informative training data increases convergence speed and improves generalization performance on test data.
We show that a redundancy-aware modification to the sampling of training data improves training speed, and we develop an efficient method for detecting the diversity of the training signal.
arXiv Detail & Related papers (2021-12-02T17:11:33Z) - Simple Stochastic and Online Gradient Descent Algorithms for Pairwise
Learning [65.54757265434465]
Pairwise learning refers to learning tasks where the loss function depends on a pair of instances.
Online gradient descent (OGD) is a popular approach to handle streaming data in pairwise learning.
In this paper, we propose simple stochastic and online gradient descent methods for pairwise learning (see the sketch after this list).
arXiv Detail & Related papers (2021-11-23T18:10:48Z) - Memory Efficient Meta-Learning with Large Images [62.70515410249566]
Meta learning approaches to few-shot classification are computationally efficient at test time, requiring just a few optimization steps or a single forward pass to learn a new task, but they remain memory-intensive to train.
This limitation arises because a task's entire support set, which can contain up to 1000 images, must be processed before an optimization step can be taken.
We propose LITE, a general and memory-efficient episodic training scheme that enables meta-training on large tasks composed of large images on a single GPU.
arXiv Detail & Related papers (2021-07-02T14:37:13Z) - Fast Few-Shot Classification by Few-Iteration Meta-Learning [173.32497326674775]
We introduce a fast optimization-based meta-learning method for few-shot classification.
Our strategy enables important aspects of the base learner objective to be learned during meta-training.
We perform a comprehensive experimental analysis, demonstrating the speed and effectiveness of our approach.
arXiv Detail & Related papers (2020-10-01T15:59:31Z) - A Survey on Large-scale Machine Learning [67.6997613600942]
Machine learning can provide deep insights into data, allowing machines to make high-quality predictions.
Most sophisticated machine learning approaches suffer from huge time costs when operating on large-scale data.
Large-scale machine learning aims to learn patterns from big data efficiently, with performance comparable to that of sophisticated approaches.
arXiv Detail & Related papers (2020-08-10T06:07:52Z)
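As a concrete illustration of the pairwise-learning setting referenced above (losses defined on pairs of instances, optimized online over streaming data), here is a minimal sketch of online gradient descent with a pairwise hinge loss on a linear scoring function. The buffer-based pairing of each new example with recent examples of the opposite label, and all constants, are illustrative assumptions; this does not reproduce the specific algorithms of the cited paper.

```python
# Minimal sketch of online gradient descent for pairwise learning
# with a hinge loss on pairs (illustrative assumptions throughout).
import numpy as np

rng = np.random.default_rng(0)
d, T, lr = 10, 500, 0.05
w_true = rng.normal(size=d)          # ground-truth direction used to label synthetic data
w = np.zeros(d)                      # linear scoring function s(x) = w . x
buffer = []                          # recent (x, y) examples to pair with new arrivals

for t in range(T):
    x = rng.normal(size=d)
    y = 1 if x @ w_true > 0 else -1  # synthetic binary label

    # Pair the new example with buffered examples of the opposite label and
    # accumulate the gradient of the pairwise hinge loss max(0, 1 - w.(x_pos - x_neg)).
    grad, n_pairs = np.zeros(d), 0
    for x_old, y_old in buffer:
        if y_old == y:
            continue
        x_pos, x_neg = (x, x_old) if y == 1 else (x_old, x)
        if w @ (x_pos - x_neg) < 1.0:
            grad -= x_pos - x_neg
            n_pairs += 1
    if n_pairs:
        w -= lr * grad / n_pairs     # one online gradient step on the averaged pairwise loss

    buffer.append((x, y))
    if len(buffer) > 50:             # bounded memory for streaming data
        buffer.pop(0)

# Evaluate: fraction of positive/negative test pairs ranked correctly (an AUC proxy).
X_test = rng.normal(size=(200, d))
y_test = np.where(X_test @ w_true > 0, 1, -1)
scores_pos, scores_neg = X_test[y_test == 1] @ w, X_test[y_test == -1] @ w
print("pairwise ranking accuracy:", float(np.mean(scores_pos[:, None] > scores_neg[None, :])))
```

Bounding the buffer keeps per-step cost and memory constant, which matters because the number of candidate pairs otherwise grows quadratically with the length of the stream.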