Dynamic Kernel Selection for Improved Generalization and Memory
Efficiency in Meta-learning
- URL: http://arxiv.org/abs/2206.01690v1
- Date: Fri, 3 Jun 2022 17:09:26 GMT
- Title: Dynamic Kernel Selection for Improved Generalization and Memory
Efficiency in Meta-learning
- Authors: Arnav Chavan, Rishabh Tiwari, Udbhav Bamba, Deepak K. Gupta
- Abstract summary: We present MetaDOCK, a task-specific dynamic kernel selection strategy for designing compressed CNN models.
Our method is based on the hypothesis that for a given set of similar tasks, not all kernels of the network are needed by each individual task.
We show that for the same inference budget, pruned versions of large CNN models obtained using our approach consistently outperform the conventional choices of CNN models.
- Score: 9.176056742068813
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Gradient based meta-learning methods are prone to overfit on the
meta-training set, and this behaviour is more prominent with large and complex
networks. Moreover, large networks restrict the application of meta-learning
models on low-power edge devices. While choosing smaller networks avoids these
issues to a certain extent, it affects the overall generalization, leading to
reduced performance. Clearly, there is an approximately optimal choice of
network architecture best suited to every meta-learning problem; however,
identifying it beforehand is not straightforward. In this paper, we
present MetaDOCK, a task-specific dynamic kernel selection strategy for
designing compressed CNN models that generalize well on unseen tasks in
meta-learning. Our method is based on the hypothesis that for a given set of
similar tasks, not all kernels of the network are needed by each individual
task. Rather, each task uses only a fraction of the kernels, and the selection
of the kernels per task can be learnt dynamically as a part of the inner update
steps. MetaDOCK compresses the meta-model as well as the task-specific inner
models, thus providing significant reduction in model size for each task, and
through constraining the number of active kernels for every task, it implicitly
mitigates the issue of meta-overfitting. We show that for the same inference
budget, pruned versions of large CNN models obtained using our approach
consistently outperform the conventional choices of CNN models. MetaDOCK
couples well with popular meta-learning approaches such as iMAML. The efficacy
of our method is validated on CIFAR-fs and mini-ImageNet datasets, and we have
observed that our approach can provide improvements in model accuracy of up to
2% on standard meta-learning benchmarks, while reducing the model size by more
than 75%.
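The abstract describes kernel selection as per-task gates learnt within the inner update steps under a budget constraint. Below is a minimal PyTorch sketch of that idea, not the authors' implementation: the layer name TaskGatedConv, the sigmoid gating, the L1-style budget penalty, and all hyperparameters are illustrative assumptions.

```python
# Minimal sketch of per-task dynamic kernel selection (assumptions, not the
# authors' code): each output kernel of a conv layer has a gate logit that is
# adapted in the inner loop, with a budget term penalizing active kernels.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TaskGatedConv(nn.Module):
    """Conv layer whose output kernels are masked by task-specific gates."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
        self.gate_logits = nn.Parameter(torch.zeros(out_ch))  # one gate per kernel

    def forward(self, x):
        gates = torch.sigmoid(self.gate_logits)          # soft kernel selection
        return self.conv(x) * gates.view(1, -1, 1, 1)    # suppress unused kernels

def inner_adapt(model, support_x, support_y, steps=5, lr=0.1, budget=0.01):
    """Adapt the gate logits on a task's support set (inner-loop sketch)."""
    gate_params = [p for n, p in model.named_parameters() if "gate_logits" in n]
    opt = torch.optim.SGD(gate_params, lr=lr)
    for _ in range(steps):
        loss = F.cross_entropy(model(support_x), support_y)
        for p in gate_params:                  # budget: prefer few active kernels
            loss = loss + budget * torch.sigmoid(p).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model

# Example backbone: nn.Sequential(TaskGatedConv(3, 32), nn.ReLU(),
#                                 nn.AdaptiveAvgPool2d(1), nn.Flatten(),
#                                 nn.Linear(32, 5))
```

After inner adaptation, kernels whose gates fall below a threshold could be pruned outright, which is where a model-size reduction would come from; the actual selection rule and the coupling with iMAML's outer loop follow the paper, not this sketch.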
Related papers
- Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion [53.33473557562837]
Solving multi-objective optimization problems for large deep neural networks is a challenging task due to the complexity of the loss landscape and the expensive computational cost.
We propose a practical and scalable approach to solve this problem via mixture of experts (MoE) based model fusion.
By ensembling the weights of specialized single-task models, the MoE module can effectively capture the trade-offs between multiple objectives (a toy weight-fusion sketch appears after this list).
arXiv Detail & Related papers (2024-06-14T07:16:18Z)
- Task-Distributionally Robust Data-Free Meta-Learning [99.56612787882334]
Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data.
For the first time, we reveal two major challenges hindering their practical deployment: Task-Distribution Shift (TDS) and Task-Distribution Corruption (TDC).
arXiv Detail & Related papers (2023-11-23T15:46:54Z)
- Few-Shot Learning of Compact Models via Task-Specific Meta Distillation [16.683801607142257]
We consider a new problem of few-shot learning of compact models.
We propose task-specific meta distillation that simultaneously learns two models in meta-learning.
arXiv Detail & Related papers (2022-10-18T15:06:47Z)
- Meta-Learning with Self-Improving Momentum Target [72.98879709228981]
We propose Self-improving Momentum Target (SiMT) to improve the performance of a meta-learner.
SiMT generates the target model by adapting from the temporal ensemble of the meta-learner.
We show that SiMT brings a significant performance gain when combined with a wide range of meta-learning methods (see the EMA-target sketch after this list).
arXiv Detail & Related papers (2022-10-11T06:45:15Z)
- Learning Instance and Task-Aware Dynamic Kernels for Few Shot Learning [32.3217883750605]
We learn the dynamic kernels of a convolutional network as a function of the task at hand, enabling faster generalization.
We empirically show that our model improves performance on few-shot classification and detection tasks.
arXiv Detail & Related papers (2021-12-07T04:52:36Z)
- Energy-Efficient and Federated Meta-Learning via Projected Stochastic Gradient Ascent [79.58680275615752]
We propose an energy-efficient federated meta-learning framework.
We assume each task is owned by a separate agent, so a limited number of tasks is used to train a meta-model.
arXiv Detail & Related papers (2021-05-31T08:15:44Z)
- Meta-Learning with Neural Tangent Kernels [58.06951624702086]
We propose the first meta-learning paradigm in the Reproducing Kernel Hilbert Space (RKHS) induced by the meta-model's Neural Tangent Kernel (NTK).
Within this paradigm, we introduce two meta-learning algorithms, which no longer need a sub-optimal iterative inner-loop adaptation as in the MAML framework.
We achieve this goal by 1) replacing the adaptation with a fast-adaptive regularizer in the RKHS; and 2) solving the adaptation analytically based on NTK theory (a generic kernel-regression sketch appears after this list).
arXiv Detail & Related papers (2021-02-07T20:53:23Z)
- Meta-Regularization by Enforcing Mutual-Exclusiveness [0.8057006406834467]
We propose a regularization technique for meta-learning models that gives the model designer more control over the information flow during meta-training.
Our proposed regularization function shows an accuracy boost of ~36% on the Omniglot dataset.
arXiv Detail & Related papers (2021-01-24T22:57:19Z)
- MetaGater: Fast Learning of Conditional Channel Gated Networks via Federated Meta-Learning [46.79356071007187]
We propose a holistic approach to jointly train the backbone network and the channel gating.
We develop a federated meta-learning approach to jointly learn good meta-initializations for both backbone networks and gating modules.
arXiv Detail & Related papers (2020-11-25T04:26:23Z)
- Improving Generalization in Meta-learning via Task Augmentation [69.83677015207527]
We propose two task augmentation methods, MetaMix and Channel Shuffle.
Both MetaMix and Channel Shuffle outperform state-of-the-art results by a large margin across many datasets (a minimal mixup sketch appears after this list).
arXiv Detail & Related papers (2020-07-26T01:50:42Z)
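For the Pareto-set approximation entry above, the quoted mechanism is ensembling the weights of specialized single-task models. A toy sketch of weight-space fusion follows; the fixed preference vector stands in for that paper's learned MoE routing, and the function name is our own.

```python
# Toy weight-space fusion: convex combination of expert models' parameters.
# A fixed preference vector stands in for a learned MoE router here.
import torch

def fuse_state_dicts(expert_state_dicts, preference):
    """Blend the experts' parameters per the preference weights.

    Assumes all experts share one architecture and all entries are
    floating-point tensors (a sketch, not a general-purpose utility).
    """
    assert abs(sum(preference) - 1.0) < 1e-6, "preference must sum to 1"
    fused = {}
    for key in expert_state_dicts[0]:
        fused[key] = sum(w * sd[key]
                         for w, sd in zip(preference, expert_state_dicts))
    return fused

# fused = fuse_state_dicts([m1.state_dict(), m2.state_dict()], [0.3, 0.7])
# model.load_state_dict(fused)
```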
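For the SiMT entry, the "temporal ensemble of the meta-learner" can be read as an exponential moving average of meta-parameters serving as a target model. The sketch below shows only that EMA bookkeeping; the decay value and names are assumptions, and the actual target-generation procedure follows that paper.

```python
# EMA ("momentum") copy of the meta-learner used as a slowly-moving target,
# in the spirit of the SiMT summary above. decay=0.995 is illustrative.
import copy
import torch

@torch.no_grad()
def update_momentum_target(meta_model, target_model, decay=0.995):
    for p_t, p_m in zip(target_model.parameters(), meta_model.parameters()):
        p_t.mul_(decay).add_(p_m, alpha=1.0 - decay)

# target_model = copy.deepcopy(meta_model)          # initialize once, then
# update_momentum_target(meta_model, target_model)  # call after each meta-update;
# the target can then supply distillation signals for the meta-learner.
```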
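For the NTK entry, "solving the adaptation analytically" amounts, in kernel terms, to a closed-form ridge-regression solve in place of iterative inner-loop updates. The sketch below uses a generic RBF kernel as a stand-in for the meta-model's NTK; all names and the regularization constant are assumptions.

```python
# Closed-form "inner loop" via kernel ridge regression. An RBF kernel stands
# in for the meta-model's Neural Tangent Kernel in this sketch.
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def analytic_adapt(support_x, support_y, query_x, reg=1e-3):
    """One linear solve replaces iterative task adaptation.

    support_y is assumed to be an (n, classes) one-hot label matrix.
    """
    k_ss = rbf_kernel(support_x, support_x)          # support-support kernel
    k_qs = rbf_kernel(query_x, support_x)            # query-support kernel
    alpha = np.linalg.solve(k_ss + reg * np.eye(len(support_x)), support_y)
    return k_qs @ alpha                              # query predictions
```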
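Finally, for the task-augmentation entry, MetaMix mixes a task's support and query examples; an input-level mixup sketch is below (the method also mixes hidden representations, which this omits). The Beta parameter and the one-hot label assumption are ours.

```python
# Input-level mixup between a task's support and query examples, in the
# spirit of MetaMix. Labels are assumed one-hot so they can be mixed too.
import numpy as np

def metamix(support_x, support_y, query_x, query_y, alpha=2.0):
    lam = np.random.beta(alpha, alpha)
    n = min(len(support_x), len(query_x))
    i = np.random.permutation(len(support_x))[:n]
    j = np.random.permutation(len(query_x))[:n]
    mixed_x = lam * support_x[i] + (1 - lam) * query_x[j]
    mixed_y = lam * support_y[i] + (1 - lam) * query_y[j]
    return mixed_x, mixed_y  # extra "virtual" query data for the outer loss
```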