CombLM: Adapting Black-Box Language Models through Small Fine-Tuned Models
- URL: http://arxiv.org/abs/2305.16876v1
- Date: Tue, 23 May 2023 06:32:55 GMT
- Title: CombLM: Adapting Black-Box Language Models through Small Fine-Tuned Models
- Authors: Aitor Ormazabal, Mikel Artetxe and Eneko Agirre
- Abstract summary: Methods for adapting language models (LMs) to new tasks and domains have traditionally assumed white-box access to the model.
We present a lightweight method for adapting large LMs to new domains and tasks, assuming no access to their weights or intermediate activations.
Our approach fine-tunes a small white-box LM and combines it with the large black-box LM at the probability level through a small network.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Methods for adapting language models (LMs) to new tasks and domains have
traditionally assumed white-box access to the model, and work by modifying its
parameters. However, this is incompatible with a recent trend in the field,
where the highest quality models are only available as black-boxes through
inference APIs. Even when the model weights are available, the computational
cost of fine-tuning large LMs can be prohibitive for most practitioners. In
this work, we present a lightweight method for adapting large LMs to new
domains and tasks, assuming no access to their weights or intermediate
activations. Our approach fine-tunes a small white-box LM and combines it with
the large black-box LM at the probability level through a small network,
learned on a small validation set. We validate our approach by adapting a large
LM (OPT-30B) to several domains and a downstream task (machine translation),
observing improved performance in all cases, of up to 9%, while using a domain
expert 23x smaller.
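To make the combination step concrete, below is a minimal sketch in plain PyTorch of mixing two next-token distributions through a small gating network fitted on a held-out set by minimizing negative log-likelihood. The gating features (distribution entropies), network size, and tensor shapes are illustrative assumptions, not the paper's exact combination function.

```python
import torch
import torch.nn as nn

class ProbCombiner(nn.Module):
    """Mixes two next-token distributions: p = w * p_small + (1 - w) * p_large.

    The mixing weight w is produced by a tiny network from context-level
    features (here, the entropies of the two distributions), so the mixture
    remains a valid distribution over the vocabulary. Sketch only; the paper's
    combination functions may differ.
    """

    def __init__(self):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))

    def forward(self, p_small, p_large):
        # p_small, p_large: full next-token distributions, shape (batch, vocab)
        ent_s = -(p_small * p_small.clamp_min(1e-12).log()).sum(-1)
        ent_l = -(p_large * p_large.clamp_min(1e-12).log()).sum(-1)
        w = torch.sigmoid(self.gate(torch.stack([ent_s, ent_l], dim=-1)))  # (batch, 1)
        return w * p_small + (1.0 - w) * p_large


def fit_combiner(p_small, p_large, targets, steps=300, lr=1e-2):
    """Fit the combiner on a small validation set by minimizing NLL."""
    model = ProbCombiner()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        p = model(p_small, p_large)
        nll = -p.gather(-1, targets.unsqueeze(-1)).clamp_min(1e-12).log().mean()
        nll.backward()
        opt.step()
    return model


if __name__ == "__main__":
    # Toy stand-ins: next-token distributions from the fine-tuned small LM and
    # the black-box large LM on a held-out validation set, plus observed tokens.
    torch.manual_seed(0)
    vocab, n = 100, 256
    p_small = torch.softmax(torch.randn(n, vocab), dim=-1)
    p_large = torch.softmax(torch.randn(n, vocab), dim=-1)
    targets = torch.randint(0, vocab, (n,))
    combiner = fit_combiner(p_small, p_large, targets)
```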
Related papers
- BLADE: Enhancing Black-box Large Language Models with Small Domain-Specific Models [56.89958793648104] (arXiv, 2024-03-27)
Large Language Models (LLMs) are versatile and capable of addressing a diverse range of tasks.
Previous approaches either conduct continuous pre-training with domain-specific data or employ retrieval augmentation to support general LLMs.
We present a novel framework named BLADE, which enhances Black-box LArge language models with small Domain-spEcific models.
- Tuning Language Models by Proxy [110.49482736590907] (arXiv, 2024-01-16)
We introduce proxy-tuning, a lightweight decoding-time algorithm that operates on top of black-box LMs to achieve the same end as direct tuning.
Our method tunes a smaller LM, then applies the difference between the predictions of the small tuned and untuned LMs to shift the original predictions of the larger untuned model in the direction of tuning.
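A minimal sketch of that logit arithmetic, assuming all three models share the same tokenizer and vocabulary; names and shapes are illustrative:

```python
import torch

def proxy_tuned_next_token_probs(
    base_logits: torch.Tensor,        # large untuned model, shape (vocab,)
    expert_logits: torch.Tensor,      # small tuned model, shape (vocab,)
    antiexpert_logits: torch.Tensor,  # small untuned model, shape (vocab,)
) -> torch.Tensor:
    """Shift the large model's next-token distribution by the (tuned - untuned)
    logit offset of the small proxy pair, then renormalize.
    """
    shifted = base_logits + (expert_logits - antiexpert_logits)
    return torch.softmax(shifted, dim=-1)


if __name__ == "__main__":
    torch.manual_seed(0)
    vocab = 8
    base, expert, antiexpert = torch.randn(vocab), torch.randn(vocab), torch.randn(vocab)
    probs = proxy_tuned_next_token_probs(base, expert, antiexpert)
    print(probs, probs.sum())  # decoding would sample from this shifted distribution at each step
```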
- LLM Augmented LLMs: Expanding Capabilities through Composition [56.40953749310957] (arXiv, 2024-01-04)
CALM -- Composition to Augment Language Models -- introduces cross-attention between models to compose their representations and enable new capabilities.
We illustrate that augmenting PaLM2-S with a smaller model trained on low-resource languages results in an absolute improvement of up to 13% on tasks like translation into English.
When PaLM2-S is augmented with a code-specific model, we see a relative improvement of 40% over the base model for code generation and explanation tasks.
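A rough sketch of the kind of trainable cross-attention bridge this describes, with both underlying models frozen; the single-block design, placement, and dimensions are simplifying assumptions for illustration:

```python
import torch
import torch.nn as nn

class CompositionBlock(nn.Module):
    """Cross-attention from anchor-model hidden states (queries) to
    augmenting-model hidden states (keys/values). Only this block is trained;
    both underlying models stay frozen. Dimensions are illustrative.
    """

    def __init__(self, d_anchor: int, d_aug: int, n_heads: int = 4):
        super().__init__()
        self.proj = nn.Linear(d_aug, d_anchor)  # map augmenting states into the anchor width
        self.xattn = nn.MultiheadAttention(d_anchor, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_anchor)

    def forward(self, h_anchor: torch.Tensor, h_aug: torch.Tensor) -> torch.Tensor:
        kv = self.proj(h_aug)
        attended, _ = self.xattn(query=h_anchor, key=kv, value=kv)
        return self.norm(h_anchor + attended)  # residual composition


if __name__ == "__main__":
    # Toy hidden states standing in for one layer of each frozen model.
    h_anchor = torch.randn(2, 10, 512)   # (batch, seq, d_anchor)
    h_aug = torch.randn(2, 10, 256)      # (batch, seq, d_aug)
    block = CompositionBlock(d_anchor=512, d_aug=256)
    print(block(h_anchor, h_aug).shape)  # torch.Size([2, 10, 512])
```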
- CELDA: Leveraging Black-box Language Model as Enhanced Classifier without Labels [14.285609493077965] (arXiv, 2023-06-05)
CELDA (Clustering-enhanced Linear Discriminative Analysis) is a novel approach that improves text classification accuracy with a very weak supervision signal.
Our framework draws a precise decision boundary without accessing the weights or gradients of the LM, and without data labels.
- Small Models are Valuable Plug-ins for Large Language Models [65.29370906766997] (arXiv, 2023-05-15)
Large language models (LLMs) such as GPT-3 and GPT-4 are powerful but their weights are often publicly unavailable.
We propose Super In-Context Learning (SuperICL) which allows black-box LLMs to work with locally fine-tuned smaller models.
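A sketch of how such a locally fine-tuned plug-in can be surfaced to the black-box LLM through its prompt; the template wording and helper names are illustrative assumptions, not the paper's exact format:

```python
def build_supericl_prompt(demos, plugin_predict, test_text):
    """Build a SuperICL-style prompt: each in-context example and the test
    input carry the small plug-in model's prediction and confidence, and the
    black-box LLM produces the final label.

    demos: list of (text, gold_label) pairs
    plugin_predict: callable text -> (label, confidence) from the fine-tuned small model
    """
    lines = []
    for text, gold in demos:
        label, conf = plugin_predict(text)
        lines.append(
            f"Input: {text}\n"
            f"Plug-in prediction: {label} (confidence {conf:.2f})\n"
            f"Label: {gold}\n"
        )
    label, conf = plugin_predict(test_text)
    lines.append(
        f"Input: {test_text}\n"
        f"Plug-in prediction: {label} (confidence {conf:.2f})\n"
        f"Label:"
    )
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical stand-in for a locally fine-tuned classifier.
    fake_plugin = lambda text: ("positive" if "great" in text else "negative", 0.90)
    demos = [("A great movie.", "positive"), ("Dull and slow.", "negative")]
    prompt = build_supericl_prompt(demos, fake_plugin, "Surprisingly great acting.")
    print(prompt)  # send this to the black-box LLM's completion API
```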
- eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389] (arXiv, 2023-03-20)
Existing approaches for adapting pretrained models to vision-language tasks still rely on several key components that hinder their efficiency.
We instead direct effort toward efficient adaptation of existing models, augmenting Language Models with perception.
We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning.
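A minimal sketch of such a parameter-efficient setup: a trainable linear projection maps frozen perceptual features into the LM's embedding space, and one trainable token is prepended to the frozen LM's input embeddings. The single injection point and the dimensions are simplifying assumptions.

```python
import torch
import torch.nn as nn

class PerceptualPrefix(nn.Module):
    """The only trainable pieces in this sketch: one linear projection from the
    perceptual encoder's feature space into the LM's embedding space, plus one
    trainable prepended token. The LM and the perceptual encoder stay frozen.
    """

    def __init__(self, d_visual: int, d_model: int):
        super().__init__()
        self.proj = nn.Linear(d_visual, d_model)                    # trainable
        self.soft_token = nn.Parameter(torch.zeros(1, 1, d_model))  # trainable

    def forward(self, visual_feats: torch.Tensor, token_embeds: torch.Tensor) -> torch.Tensor:
        # visual_feats: (batch, d_visual) pooled features from a frozen encoder
        # token_embeds: (batch, seq, d_model) embeddings from the frozen LM
        prefix = self.proj(visual_feats).unsqueeze(1)               # (batch, 1, d_model)
        soft = self.soft_token.expand(token_embeds.size(0), -1, -1)
        return torch.cat([soft, prefix, token_embeds], dim=1)       # fed to the frozen LM


if __name__ == "__main__":
    batch, seq, d_visual, d_model = 2, 6, 768, 1024
    adapter = PerceptualPrefix(d_visual, d_model)
    visual = torch.randn(batch, d_visual)      # e.g., from a frozen vision encoder
    embeds = torch.randn(batch, seq, d_model)  # from the frozen LM's embedding layer
    print(adapter(visual, embeds).shape)       # torch.Size([2, 8, 1024])
```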
- Adaptable Multi-Domain Language Model for Transformer ASR [16.8397357399749] (arXiv, 2020-08-14)
The proposed model can reuse a fully fine-tuned LM, i.e., one fine-tuned using all layers of the original model.
It also reduces model maintenance cost, since the costly and time-consuming common-LM pre-training process can be omitted.