Wormhole MAML: Meta-Learning in Glued Parameter Space
- URL: http://arxiv.org/abs/2212.14094v1
- Date: Wed, 28 Dec 2022 20:46:05 GMT
- Title: Wormhole MAML: Meta-Learning in Glued Parameter Space
- Authors: Chih-Jung Tracy Chang, Yuan Gao, Beicheng Lou
- Abstract summary: We introduce a novel variation of model-agnostic meta-learning, where an extra multiplicative parameter is introduced in the inner-loop adaptation.
Our variation creates a shortcut in the parameter space for the inner-loop adaptation and increases model expressivity in a highly controllable manner.
- Score: 4.785489100601398
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we introduce a novel variation of model-agnostic
meta-learning, where an extra multiplicative parameter is introduced in the
inner-loop adaptation. Our variation creates a shortcut in the parameter space
for the inner-loop adaptation and increases model expressivity in a highly
controllable manner. We show both theoretically and numerically that our
variation alleviates the problem of conflicting gradients and improves training
dynamics. We conduct experiments on three distinct problems, including a toy
classification problem for threshold comparison, a regression problem for
wavelet transform, and a classification problem on MNIST. We also discuss ways
to generalize our method to a broader class of problems.
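The abstract describes the mechanism only at a high level, so below is a minimal sketch, assuming a PyTorch-style functional MAML inner loop in which a learnable multiplicative parameter g rescales the meta-parameters at the start of task adaptation. The variable names, the placement of g, and the functional forward_fn are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumption, not the authors' code): a MAML-style inner loop
# with an extra multiplicative parameter g scaling the meta-parameters,
# loosely illustrating the "shortcut in parameter space" from the abstract.
import torch


def inner_adapt(meta_params, g, support_x, support_y,
                forward_fn, loss_fn, inner_lr=0.01, inner_steps=1):
    """Adapt a copy of the meta-parameters on one task's support set."""
    # The multiplicative parameter rescales the starting point of adaptation.
    adapted = [g * p for p in meta_params]
    for _ in range(inner_steps):
        preds = forward_fn(adapted, support_x)   # functional forward pass
        loss = loss_fn(preds, support_y)
        # create_graph=True keeps the graph so meta-gradients can flow back
        # to both meta_params and g in the outer loop.
        grads = torch.autograd.grad(loss, adapted, create_graph=True)
        adapted = [p - inner_lr * dp for p, dp in zip(adapted, grads)]
    return adapted


# Example usage with a tiny linear model (all shapes are illustrative):
if __name__ == "__main__":
    w = torch.randn(5, 1, requires_grad=True)        # meta-parameter
    g = torch.ones(1, requires_grad=True)            # multiplicative parameter
    xs, ys = torch.randn(8, 5), torch.randn(8, 1)    # one task's support set

    def forward_fn(params, x):
        return x @ params[0]

    adapted = inner_adapt([w], g, xs, ys, forward_fn,
                          torch.nn.functional.mse_loss)
    query_loss = torch.nn.functional.mse_loss(forward_fn(adapted, xs), ys)
    query_loss.backward()                            # meta-gradients for w and g
```

In a full meta-training loop the outer loss would be computed on a held-out query set and an outer optimizer would update both the base parameters and g; here the support set doubles as a stand-in for brevity.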
Related papers
- A Unified View of Delta Parameter Editing in Post-Trained Large-Scale Models [45.82689769685688]
Post-training has emerged as a crucial paradigm for adapting large-scale pre-trained models to various tasks.
We introduce extensions to existing techniques like DARE and BitDelta to enhance the applicability and effectiveness of delta parameter editing in post-trained models.
arXiv Detail & Related papers (2024-10-17T17:56:53Z) - A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level Optimizations (BLO).
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization, where the inner loss function becomes a smooth distribution, and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z) - SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z) - Few-Shot Class Incremental Learning via Robust Transformer Approach [16.590193619691416]
Few-Shot Class-Incremental Learning extends the Class-Incremental Learning problem to the setting where the model also faces data scarcity.
The problem remains open because recent works built upon convolutional neural networks perform sub-optimally.
Our paper presents Robust Transformer Approach built upon the Compact Convolution Transformer.
arXiv Detail & Related papers (2024-05-08T03:35:52Z) - Neural network analysis of neutron and X-ray reflectivity data:
Incorporating prior knowledge for tackling the phase problem [141.5628276096321]
We present an approach that utilizes prior knowledge to regularize the training process over larger parameter spaces.
We demonstrate the effectiveness of our method in various scenarios, including multilayer structures with box model parameterization.
In contrast to previous methods, our approach scales favorably when increasing the complexity of the inverse problem.
arXiv Detail & Related papers (2023-06-28T11:15:53Z) - Theoretical Characterization of the Generalization Performance of
Overfitted Meta-Learning [70.52689048213398]
This paper studies the performance of overfitted meta-learning under a linear regression model with Gaussian features.
We find new and interesting properties that do not exist in single-task linear regression.
Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large.
arXiv Detail & Related papers (2023-04-09T20:36:13Z) - Attention, Filling in The Gaps for Generalization in Routing Problems [5.210197476419621]
This paper aims to encourage consolidation of the field by understanding and improving existing models.
We first target model discrepancies by adapting the Kool et al. method and its loss function for Sparse Dynamic Attention.
We then target inherent differences through the use of a mixed instance training method that has been shown to outperform single instance training in certain scenarios.
arXiv Detail & Related papers (2022-07-14T21:36:51Z) - Total Deep Variation for Linear Inverse Problems [71.90933869570914]
We propose a novel learnable general-purpose regularizer exploiting recent architectural design patterns from deep learning.
We show state-of-the-art performance for classical image restoration and medical image reconstruction problems.
arXiv Detail & Related papers (2020-01-14T19:01:50Z) - FLAT: Few-Shot Learning via Autoencoding Transformation Regularizers [67.46036826589467]
We present a novel regularization mechanism by learning the change of feature representations induced by a distribution of transformations without using the labels of data examples.
It could minimize the risk of overfitting to base categories by inspecting the transformation-augmented variations at the encoded feature level.
Experimental results show performance superior to current state-of-the-art methods in the literature.
arXiv Detail & Related papers (2019-12-29T15:26:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.