CombLM: Adapting Black-Box Language Models through Small Fine-Tuned Models
- URL: http://arxiv.org/abs/2305.16876v1
- Date: Tue, 23 May 2023 06:32:55 GMT
- Title: CombLM: Adapting Black-Box Language Models through Small Fine-Tuned Models
- Authors: Aitor Ormazabal, Mikel Artetxe and Eneko Agirre
- Abstract summary: Methods for adapting language models (LMs) to new tasks and domains have traditionally assumed white-box access to the model.
We present a lightweight method for adapting large LMs to new domains and tasks, assuming no access to their weights or intermediate activations.
Our approach fine-tunes a small white-box LM and combines it with the large black-box LM at the probability level through a small network.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Methods for adapting language models (LMs) to new tasks and domains have
traditionally assumed white-box access to the model, and work by modifying its
parameters. However, this is incompatible with a recent trend in the field,
where the highest quality models are only available as black-boxes through
inference APIs. Even when the model weights are available, the computational
cost of fine-tuning large LMs can be prohibitive for most practitioners. In
this work, we present a lightweight method for adapting large LMs to new
domains and tasks, assuming no access to their weights or intermediate
activations. Our approach fine-tunes a small white-box LM and combines it with
the large black-box LM at the probability level through a small network,
learned on a small validation set. We validate our approach by adapting a large
LM (OPT-30B) to several domains and a downstream task (machine translation),
observing improved performance in all cases, of up to 9%, while using a domain
expert 23x smaller.
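To make the combination step concrete, below is a minimal sketch in plain PyTorch of mixing two next-token distributions through a small gating network fitted on a held-out set by minimizing negative log-likelihood. The gating features (distribution entropies), network size, and tensor shapes are illustrative assumptions, not the paper's exact combination function.

```python
import torch
import torch.nn as nn

class ProbCombiner(nn.Module):
    """Mixes two next-token distributions: p = w * p_small + (1 - w) * p_large.

    The mixing weight w is produced by a tiny network from context-level
    features (here, the entropies of the two distributions), so the mixture
    remains a valid distribution over the vocabulary. Sketch only; the paper's
    combination functions may differ.
    """

    def __init__(self):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))

    def forward(self, p_small, p_large):
        # p_small, p_large: full next-token distributions, shape (batch, vocab)
        ent_s = -(p_small * p_small.clamp_min(1e-12).log()).sum(-1)
        ent_l = -(p_large * p_large.clamp_min(1e-12).log()).sum(-1)
        w = torch.sigmoid(self.gate(torch.stack([ent_s, ent_l], dim=-1)))  # (batch, 1)
        return w * p_small + (1.0 - w) * p_large


def fit_combiner(p_small, p_large, targets, steps=300, lr=1e-2):
    """Fit the combiner on a small validation set by minimizing NLL."""
    model = ProbCombiner()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        p = model(p_small, p_large)
        nll = -p.gather(-1, targets.unsqueeze(-1)).clamp_min(1e-12).log().mean()
        nll.backward()
        opt.step()
    return model


if __name__ == "__main__":
    # Toy stand-ins: next-token distributions from the fine-tuned small LM and
    # the black-box large LM on a held-out validation set, plus observed tokens.
    torch.manual_seed(0)
    vocab, n = 100, 256
    p_small = torch.softmax(torch.randn(n, vocab), dim=-1)
    p_large = torch.softmax(torch.randn(n, vocab), dim=-1)
    targets = torch.randint(0, vocab, (n,))
    combiner = fit_combiner(p_small, p_large, targets)
```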
Related papers
- BLADE: Enhancing Black-box Large Language Models with Small Domain-Specific Models [56.89958793648104] (arXiv, 2024-03-27)
Large Language Models (LLMs) are versatile and capable of addressing a diverse range of tasks.
Previous approaches either conduct continuous pre-training with domain-specific data or employ retrieval augmentation to support general LLMs.
We present a novel framework named BLADE, which enhances Black-box LArge language models with small Domain-spEcific models.
- Tuning Language Models by Proxy [110.49482736590907] (arXiv, 2024-01-16)
We introduce proxy-tuning, a lightweight decoding-time algorithm that operates on top of black-box LMs to achieve the same end as direct tuning.
Our method tunes a smaller LM, then applies the difference between the predictions of the small tuned and untuned LMs to shift the original predictions of the larger untuned model in the direction of tuning.
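A minimal sketch of that logit arithmetic, assuming all three models share the same tokenizer and vocabulary; names and shapes are illustrative:

```python
import torch

def proxy_tuned_next_token_probs(
    base_logits: torch.Tensor,        # large untuned model, shape (vocab,)
    expert_logits: torch.Tensor,      # small tuned model, shape (vocab,)
    antiexpert_logits: torch.Tensor,  # small untuned model, shape (vocab,)
) -> torch.Tensor:
    """Shift the large model's next-token distribution by the (tuned - untuned)
    logit offset of the small proxy pair, then renormalize.
    """
    shifted = base_logits + (expert_logits - antiexpert_logits)
    return torch.softmax(shifted, dim=-1)


if __name__ == "__main__":
    torch.manual_seed(0)
    vocab = 8
    base, expert, antiexpert = torch.randn(vocab), torch.randn(vocab), torch.randn(vocab)
    probs = proxy_tuned_next_token_probs(base, expert, antiexpert)
    print(probs, probs.sum())  # decoding would sample from this shifted distribution at each step
```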
- LLM Augmented LLMs: Expanding Capabilities through Composition [56.40953749310957] (arXiv, 2024-01-04)
CALM -- Composition to Augment Language Models -- introduces cross-attention between models to compose their representations and enable new capabilities.
We illustrate that augmenting PaLM2-S with a smaller model trained on low-resource languages results in an absolute improvement of up to 13% on tasks like translation into English.
When PaLM2-S is augmented with a code-specific model, we see a relative improvement of 40% over the base model for code generation and explanation tasks.
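A rough sketch of the kind of trainable cross-attention bridge this describes, with both underlying models frozen; the single-block design, placement, and dimensions are simplifying assumptions for illustration:

```python
import torch
import torch.nn as nn

class CompositionBlock(nn.Module):
    """Cross-attention from anchor-model hidden states (queries) to
    augmenting-model hidden states (keys/values). Only this block is trained;
    both underlying models stay frozen. Dimensions are illustrative.
    """

    def __init__(self, d_anchor: int, d_aug: int, n_heads: int = 4):
        super().__init__()
        self.proj = nn.Linear(d_aug, d_anchor)  # map augmenting states into the anchor width
        self.xattn = nn.MultiheadAttention(d_anchor, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_anchor)

    def forward(self, h_anchor: torch.Tensor, h_aug: torch.Tensor) -> torch.Tensor:
        kv = self.proj(h_aug)
        attended, _ = self.xattn(query=h_anchor, key=kv, value=kv)
        return self.norm(h_anchor + attended)  # residual composition


if __name__ == "__main__":
    # Toy hidden states standing in for one layer of each frozen model.
    h_anchor = torch.randn(2, 10, 512)   # (batch, seq, d_anchor)
    h_aug = torch.randn(2, 10, 256)      # (batch, seq, d_aug)
    block = CompositionBlock(d_anchor=512, d_aug=256)
    print(block(h_anchor, h_aug).shape)  # torch.Size([2, 10, 512])
```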
- CELDA: Leveraging Black-box Language Model as Enhanced Classifier without Labels [14.285609493077965] (arXiv, 2023-06-05)
CELDA (Clustering-enhanced Linear Discriminative Analysis) is a novel approach that improves text classification accuracy with a very weak supervision signal.
Our framework draws a precise decision boundary without accessing the weights or gradients of the LM, and without data labels.
- Small Models are Valuable Plug-ins for Large Language Models [65.29370906766997] (arXiv, 2023-05-15)
Large language models (LLMs) such as GPT-3 and GPT-4 are powerful but their weights are often publicly unavailable.
We propose Super In-Context Learning (SuperICL) which allows black-box LLMs to work with locally fine-tuned smaller models.
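A sketch of how such a locally fine-tuned plug-in can be surfaced to the black-box LLM through its prompt; the template wording and helper names are illustrative assumptions, not the paper's exact format:

```python
def build_supericl_prompt(demos, plugin_predict, test_text):
    """Build a SuperICL-style prompt: each in-context example and the test
    input carry the small plug-in model's prediction and confidence, and the
    black-box LLM produces the final label.

    demos: list of (text, gold_label) pairs
    plugin_predict: callable text -> (label, confidence) from the fine-tuned small model
    """
    lines = []
    for text, gold in demos:
        label, conf = plugin_predict(text)
        lines.append(
            f"Input: {text}\n"
            f"Plug-in prediction: {label} (confidence {conf:.2f})\n"
            f"Label: {gold}\n"
        )
    label, conf = plugin_predict(test_text)
    lines.append(
        f"Input: {test_text}\n"
        f"Plug-in prediction: {label} (confidence {conf:.2f})\n"
        f"Label:"
    )
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical stand-in for a locally fine-tuned classifier.
    fake_plugin = lambda text: ("positive" if "great" in text else "negative", 0.90)
    demos = [("A great movie.", "positive"), ("Dull and slow.", "negative")]
    prompt = build_supericl_prompt(demos, fake_plugin, "Surprisingly great acting.")
    print(prompt)  # send this to the black-box LLM's completion API
```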
- eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389] (arXiv, 2023-03-20)
Existing approaches for adapting pretrained models to vision-language tasks still rely on several key components that hinder their efficiency.
We instead direct effort toward efficient adaptation of existing models, augmenting Language Models with perception.
We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning.
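A minimal sketch of such a parameter-efficient setup: a trainable linear projection maps frozen perceptual features into the LM's embedding space, and one trainable token is prepended to the frozen LM's input embeddings. The single injection point and the dimensions are simplifying assumptions.

```python
import torch
import torch.nn as nn

class PerceptualPrefix(nn.Module):
    """The only trainable pieces in this sketch: one linear projection from the
    perceptual encoder's feature space into the LM's embedding space, plus one
    trainable prepended token. The LM and the perceptual encoder stay frozen.
    """

    def __init__(self, d_visual: int, d_model: int):
        super().__init__()
        self.proj = nn.Linear(d_visual, d_model)                    # trainable
        self.soft_token = nn.Parameter(torch.zeros(1, 1, d_model))  # trainable

    def forward(self, visual_feats: torch.Tensor, token_embeds: torch.Tensor) -> torch.Tensor:
        # visual_feats: (batch, d_visual) pooled features from a frozen encoder
        # token_embeds: (batch, seq, d_model) embeddings from the frozen LM
        prefix = self.proj(visual_feats).unsqueeze(1)               # (batch, 1, d_model)
        soft = self.soft_token.expand(token_embeds.size(0), -1, -1)
        return torch.cat([soft, prefix, token_embeds], dim=1)       # fed to the frozen LM


if __name__ == "__main__":
    batch, seq, d_visual, d_model = 2, 6, 768, 1024
    adapter = PerceptualPrefix(d_visual, d_model)
    visual = torch.randn(batch, d_visual)      # e.g., from a frozen vision encoder
    embeds = torch.randn(batch, seq, d_model)  # from the frozen LM's embedding layer
    print(adapter(visual, embeds).shape)       # torch.Size([2, 8, 1024])
```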
- Adaptable Multi-Domain Language Model for Transformer ASR [16.8397357399749] (arXiv, 2020-08-14)
The proposed model can reuse a fully fine-tuned LM, i.e., one fine-tuned using all layers of the original model.
It also reduces model maintenance cost, since the costly and time-consuming common-LM pre-training process can be omitted.