Related papers: Investigating Task Arithmetic for Zero-Shot Information Retrieval

Investigating Task Arithmetic for Zero-Shot Information Retrieval

URL: http://arxiv.org/abs/2505.00649v1
Date: Thu, 01 May 2025 16:48:37 GMT
Title: Investigating Task Arithmetic for Zero-Shot Information Retrieval
Authors: Marco Braga, Pranav Kasela, Alessandro Raganato, Gabriella Pasi,
Abstract summary: Task Arithmetic is a technique that combines the weights of Large Language Models pre-trained on different tasks or domains via simple mathematical operations.<n>Our method is able to synthesize diverse tasks and domain knowledge into a single model, enabling effective zero-shot adaptation in different retrieval contexts.
Score: 47.300506002171275
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) have shown impressive zero-shot performance across a variety of Natural Language Processing tasks, including document re-ranking. However, their effectiveness degrades on unseen tasks and domains, largely due to shifts in vocabulary and word distributions. In this paper, we investigate Task Arithmetic, a technique that combines the weights of LLMs pre-trained on different tasks or domains via simple mathematical operations, such as addition or subtraction, to adapt retrieval models without requiring additional fine-tuning. Our method is able to synthesize diverse tasks and domain knowledge into a single model, enabling effective zero-shot adaptation in different retrieval contexts. Extensive experiments on publicly available scientific, biomedical, and multilingual datasets show that our method improves state-of-the-art re-ranking performance by up to 18% in NDCG@10 and 15% in P@10. In addition to these empirical gains, our analysis provides insights into the strengths and limitations of Task Arithmetic as a practical strategy for zero-shot learning and model adaptation. We make our code publicly available at https://github.com/DetectiveMB/Task-Arithmetic-for-ZS-IR.

Related papers

Learning Task Representations from In-Context Learning [73.72066284711462]
Large language models (LLMs) have demonstrated remarkable proficiency in in-context learning.<n>We introduce an automated formulation for encoding task information in ICL prompts as a function of attention heads.<n>We show that our method's effectiveness stems from aligning the distribution of the last hidden state with that of an optimally performing in-context-learned model.
arXiv Detail & Related papers (2025-02-08T00:16:44Z)
Task Arithmetic Through The Lens Of One-Shot Federated Learning [3.8230727103887943]
Task Arithmetic is a model merging technique that enables the combination of multiple models' capabilities into a single model. We show that Task Arithmetic is mathematically equivalent to the commonly used algorithm in Federated Learning. We adapt several algorithms from Federated Learning to improve the effectiveness of Task Arithmetic.
arXiv Detail & Related papers (2024-11-27T18:53:41Z)
MetaGPT: Merging Large Language Models Using Model Exclusive Task Arithmetic [6.46176287368784]
We propose textbfModel textbfExclusive textbfTask textbfArithmetic for merging textbfGPT-scale models. Our proposed MetaGPT is data-agnostic and bypasses the heavy search process, making it cost-effective and easy to implement for LLMs.
arXiv Detail & Related papers (2024-06-17T10:12:45Z)
OverPrompt: Enhancing ChatGPT through Efficient In-Context Learning [49.38867353135258]
We propose OverPrompt, leveraging the in-context learning capability of LLMs to handle multiple task inputs. Our experiments show that OverPrompt can achieve cost-efficient zero-shot classification without causing significant detriment to task performance.
arXiv Detail & Related papers (2023-05-24T10:08:04Z)
SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities [76.97949110580703]
We introduce SUPERB-SG, a new benchmark to evaluate pre-trained models across various speech tasks. We use a lightweight methodology to test the robustness of representations learned by pre-trained models under shifts in data domain. We also show that the task diversity of SUPERB-SG coupled with limited task supervision is an effective recipe for evaluating the generalizability of model representation.
arXiv Detail & Related papers (2022-03-14T04:26:40Z)
Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation [24.488427641442694]
We propose a novel conditional neural process-based approach for few-shot text classification. Our key idea is to represent each task using gradient information from a base model. Our approach outperforms traditional fine-tuning, sequential transfer learning, and state-of-the-art meta learning approaches.
arXiv Detail & Related papers (2022-01-27T15:29:30Z)
Rectification-based Knowledge Retention for Continual Learning [49.1447478254131]
Deep learning models suffer from catastrophic forgetting when trained in an incremental learning setting. We propose a novel approach to address the task incremental learning problem, which involves training a model on new tasks that arrive in an incremental manner. Our approach can be used in both the zero-shot and non zero-shot task incremental learning settings.
arXiv Detail & Related papers (2021-03-30T18:11:30Z)
Making Pre-trained Language Models Better Few-shot Learners [11.90626040104822]
Recent GPT-3 model achieves remarkable few-shot performance solely by leveraging a natural-language prompt and a few task demonstrations as input context. Inspired by their findings, we study few-shot learning in a more practical scenario, where we use smaller language models for which fine-tuning is computationally efficient. We present LM-BFF--better few-shot fine-tuning of language models--a suite of simple and complementary techniques for fine-tuning language models on a small number of annotated examples.
arXiv Detail & Related papers (2020-12-31T17:21:26Z)
Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm which directly optimize model's ability to learn text representations for effective learning of downstream tasks. We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.