What the Weight?! A Unified Framework for Zero-Shot Knowledge
Composition
- URL: http://arxiv.org/abs/2401.12756v2
- Date: Thu, 25 Jan 2024 14:32:36 GMT
- Title: What the Weight?! A Unified Framework for Zero-Shot Knowledge
Composition
- Authors: Carolin Holtermann, Markus Frohmann, Navid Rekabsaz, Anne Lauscher
- Abstract summary: We propose a novel framework for zero-shot module composition, which encompasses existing and some novel variations for selecting, weighting, and combining parameter modules.
We conduct the first comprehensive benchmarking study of various zero-shot knowledge composition strategies.
Our results highlight the efficacy of ensembling but also hint at the power of simple though often-ignored weighting methods.
- Score: 20.742004197901576
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: The knowledge encapsulated in a model is the core factor determining its
final performance on downstream tasks. Much research in NLP has focused on
efficient methods for storing and adapting different types of knowledge, e.g.,
in dedicated modularized structures, and on how to effectively combine these,
e.g., by learning additional parameters. However, given the many possible
options, a thorough understanding of the mechanisms involved in these
compositions is missing, and hence it remains unclear which strategies to
utilize. To address this research gap, we propose a novel framework for
zero-shot module composition, which encompasses existing and some novel
variations for selecting, weighting, and combining parameter modules under a
single unified notion. Focusing on the scenario of domain knowledge and adapter
layers, our framework provides a systematic unification of concepts, allowing
us to conduct the first comprehensive benchmarking study of various zero-shot
knowledge composition strategies. In particular, we test two module combination
methods and five selection and weighting strategies for their effectiveness and
efficiency in an extensive experimental setup. Our results highlight the
efficacy of ensembling but also hint at the power of simple though
often-ignored weighting methods. Further in-depth analyses allow us to
understand the role of weighting vs. top-k selection, and show that, to a
certain extent, the performance of adapter composition can even be predicted.
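As a rough illustration of the two combination methods the abstract contrasts, the following minimal PyTorch sketch places parameter-level weight averaging next to output-level ensembling with optional top-k selection. All names here (`weight_average`, `output_ensemble`, `scores`, `top_k`) are illustrative assumptions, not the paper's actual API.

```python
import torch

def weight_average(adapters, scores):
    """Parameter-level composition: merge adapter state dicts into one
    module by a score-weighted average of their parameters."""
    w = torch.softmax(scores, dim=0)
    return {name: sum(w[i] * a[name] for i, a in enumerate(adapters))
            for name in adapters[0]}

def output_ensemble(outputs, scores, top_k=None):
    """Output-level composition: run each adapter separately and take a
    score-weighted average of the predictions, optionally after keeping
    only the top-k scored modules."""
    if top_k is not None:
        keep = scores.topk(top_k).indices              # top-k selection
        masked = torch.full_like(scores, float("-inf"))
        masked[keep] = scores[keep]
        scores = masked
    w = torch.softmax(scores, dim=0)                   # weighting
    return sum(w[i] * o for i, o in enumerate(outputs))
```

Both paths consume the same relevance scores, so a selection or weighting strategy can be swapped in independently of the combination method; the five strategies the paper benchmarks would differ only in how such scores are produced.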
Related papers
- Towards a Unified View of Preference Learning for Large Language Models: A Survey [88.66719962576005]
Large Language Models (LLMs) exhibit remarkably powerful capabilities.
One of the crucial factors in achieving this success is aligning the LLM's output with human preferences.
We decompose all the strategies in preference learning into four components: model, data, feedback, and algorithm.
arXiv Detail & Related papers (2024-09-04T15:11:55Z)
- Pushing the Boundary: Specialising Deep Configuration Performance Learning [0.0]
This thesis begins with a systematic literature review of deep learning techniques in configuration performance modeling.
The first knowledge gap is a lack of understanding of which encoding scheme is better.
The second knowledge gap is the sparsity inherited from the configuration landscape.
arXiv Detail & Related papers (2024-07-02T22:59:19Z)
- A Unified and General Framework for Continual Learning [58.72671755989431]
Continual Learning (CL) focuses on learning from dynamic and changing data distributions while retaining previously acquired knowledge.
Various methods have been developed to address the challenge of catastrophic forgetting, including regularization-based, Bayesian-based, and memory-replay-based techniques.
This research bridges that gap by introducing a comprehensive, overarching framework that encompasses and reconciles these existing methodologies.
arXiv Detail & Related papers (2024-03-20T02:21:44Z)
- Self-Supervised Representation Learning with Meta Comprehensive Regularization [11.387994024747842]
We introduce a module called CompMod with Meta Comprehensive Regularization (MCR), embedded into existing self-supervised frameworks.
We update our proposed model through a bi-level optimization mechanism, enabling it to capture comprehensive features.
We provide theoretical support for our proposed method from information-theoretic and causal counterfactual perspectives.
arXiv Detail & Related papers (2024-03-03T15:53:48Z)
- Contextualization Distillation from Large Language Model for Knowledge Graph Completion [51.126166442122546]
We introduce the Contextualization Distillation strategy, a plug-and-play approach compatible with both discriminative and generative KGC frameworks.
Our method begins by instructing large language models to transform compact, structural triplets into context-rich segments.
Comprehensive evaluations across diverse datasets and KGC techniques highlight the efficacy and adaptability of our approach.
arXiv Detail & Related papers (2024-01-28T08:56:49Z)
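To make the first step of the entry above concrete, here is a hypothetical sketch of turning a compact (head, relation, tail) triplet into an LLM instruction that yields a context-rich passage; the prompt wording and the commented-out `llm.generate` call are assumptions for illustration, not the authors' recipe.

```python
def triplet_to_prompt(head: str, relation: str, tail: str) -> str:
    """Expand a compact KG triplet into an LLM instruction that should
    yield a context-rich descriptive passage."""
    return (f"Write a short, factual paragraph describing how '{head}' "
            f"relates to '{tail}' through the relation '{relation}'.")

prompt = triplet_to_prompt("Marie Curie", "educated_at", "University of Paris")
# context = llm.generate(prompt)  # hypothetical call; the generated context
# would then serve as an auxiliary signal when training the KGC model
```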
- Deep Active Ensemble Sampling For Image Classification [8.31483061185317]
Active learning frameworks aim to reduce the cost of data annotation by actively requesting labels for the most informative data points.
Proposed approaches include uncertainty-based techniques, geometric methods, and implicit combinations of the two.
We present an innovative integration of recent progress in both uncertainty-based and geometric frameworks to enable an efficient exploration/exploitation trade-off in sample selection strategy.
Our framework provides two advantages: (1) accurate posterior estimation, and (2) tune-able trade-off between computational overhead and higher accuracy.
arXiv Detail & Related papers (2022-10-11T20:20:20Z)
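The entry above hinges on trading off uncertainty against geometry when picking samples. Below is a minimal NumPy sketch of that idea, assuming softmax probabilities and feature embeddings for the unlabeled pool; the mixing knob `alpha` is an invented simplification, not the paper's posterior-estimation machinery.

```python
import numpy as np

def acquire(probs, feats, labeled_feats, budget, alpha=0.5):
    """Mix predictive entropy (uncertainty) with distance to the labeled
    set (geometry) and return the `budget` highest-scoring pool indices."""
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    dists = np.linalg.norm(
        feats[:, None, :] - labeled_feats[None, :, :], axis=-1).min(axis=1)
    score = alpha * entropy / entropy.max() + (1 - alpha) * dists / dists.max()
    return np.argsort(-score)[:budget]
```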
- Weakly Supervised Semantic Segmentation via Alternative Self-Dual Teaching [82.71578668091914]
This paper establishes a compact learning framework that embeds the classification and mask-refinement components into a unified deep model.
We propose a novel alternative self-dual teaching (ASDT) mechanism to encourage high-quality knowledge interaction.
arXiv Detail & Related papers (2021-12-17T11:56:56Z)
- Partner-Assisted Learning for Few-Shot Image Classification [54.66864961784989]
Few-shot learning has been studied to mimic human visual capabilities and learn effective models without the need for exhaustive human annotation.
In this paper, we focus on the design of training strategy to obtain an elemental representation such that the prototype of each novel class can be estimated from a few labeled samples.
We propose a two-stage training scheme, which first trains a partner encoder to model pair-wise similarities and extract features serving as soft-anchors, and then trains a main encoder by aligning its outputs with soft-anchors while attempting to maximize classification performance.
arXiv Detail & Related papers (2021-09-15T22:46:19Z)
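A loose sketch of the second-stage objective described in the Partner-Assisted Learning entry above, assuming a frozen partner encoder whose features act as soft anchors; the MSE alignment term and the weight `lam` are illustrative choices, not the authors' exact formulation.

```python
import torch.nn.functional as F

def stage2_loss(main_feats, partner_feats, logits, labels, lam=1.0):
    """Pull the main encoder's features toward the frozen partner's soft
    anchors while still optimizing classification accuracy."""
    align = F.mse_loss(main_feats, partner_feats.detach())  # match anchors
    cls = F.cross_entropy(logits, labels)                   # stay accurate
    return cls + lam * align
```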
- A generalized framework for active learning reliability: survey and benchmark [0.0]
We propose a modular framework for building efficient active learning strategies on the fly.
We devise 39 strategies for solving 20 reliability benchmark problems.
arXiv Detail & Related papers (2021-06-03T09:33:59Z)
- Dynamic Feature Integration for Simultaneous Detection of Salient Object, Edge and Skeleton [108.01007935498104]
In this paper, we solve three low-level pixel-wise vision problems: salient object segmentation, edge detection, and skeleton extraction.
We first show some similarities shared by these tasks and then demonstrate how they can be leveraged for developing a unified framework.
arXiv Detail & Related papers (2020-04-18T11:10:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.