Augmenting Interpretable Models with LLMs during Training
- URL: http://arxiv.org/abs/2209.11799v3
- Date: Tue, 25 Apr 2023 01:39:59 GMT
- Title: Augmenting Interpretable Models with LLMs during Training
- Authors: Chandan Singh, Armin Askari, Rich Caruana, Jianfeng Gao
- Abstract summary: We propose Augmented Interpretable Models (Aug-imodels) to build efficient and interpretable models.
Aug-imodels use LLMs during fitting but not during inference, allowing complete transparency.
We explore two instantiations of Aug-imodels in natural-language processing: (i) Aug-GAM, which augments a generalized additive model with decoupled embeddings from an LLM and (ii) Aug-Tree, which augments a decision tree with LLM feature expansions.
- Score: 73.40079895413861
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent large language models (LLMs) have demonstrated remarkable prediction
performance for a growing array of tasks. However, their proliferation into
high-stakes domains (e.g. medicine) and compute-limited settings has created a
burgeoning need for interpretability and efficiency. We address this need by
proposing Augmented Interpretable Models (Aug-imodels), a framework for
leveraging the knowledge learned by LLMs to build extremely efficient and
interpretable models. Aug-imodels use LLMs during fitting but not during
inference, allowing complete transparency and often a speed/memory improvement
of greater than 1,000x for inference compared to LLMs. We explore two
instantiations of Aug-imodels in natural-language processing: (i) Aug-GAM,
which augments a generalized additive model with decoupled embeddings from an
LLM and (ii) Aug-Tree, which augments a decision tree with LLM feature
expansions. Across a variety of text-classification datasets, both outperform
their non-augmented counterparts. Aug-GAM can even outperform much larger
models (e.g. a 6-billion parameter GPT-J model), despite having 10,000x fewer
parameters and being fully transparent. We further explore Aug-imodels in a
natural-language fMRI study, where they generate interesting interpretations
from scientific data. All code for using Aug-imodels and reproducing results is
made available on Github.
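The Aug-GAM idea described above (fit with LLM embeddings, infer without the LLM) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `embed` is a hypothetical deterministic stand-in for a real LLM embedding call, the model is plain logistic regression over summed per-n-gram embeddings, and only unigrams are used.

```python
import hashlib
import numpy as np

def embed(ngram: str, dim: int = 16) -> np.ndarray:
    # Hypothetical stand-in for an LLM embedding of one n-gram.
    # Aug-GAM would query a real LLM here, and only during fitting.
    seed = int.from_bytes(hashlib.md5(ngram.encode()).digest()[:4], "little")
    return np.random.default_rng(seed).normal(size=dim)

def fit_aug_gam(texts, labels, dim=16, lr=0.1, steps=2000):
    # Decoupled embeddings: each text's feature vector is the SUM of its
    # unigram embeddings, so the fitted model stays additive over n-grams.
    X = np.stack([sum(embed(tok, dim) for tok in t.split()) for t in texts])
    y = np.asarray(labels, dtype=float)
    w, b = np.zeros(dim), 0.0
    for _ in range(steps):  # logistic regression via gradient descent
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * (p - y).mean()
    return w, b

def ngram_score(w, ngram, dim=16):
    # Transparent, LLM-free inference: each n-gram contributes a single
    # cached scalar, and contributions simply add up to the logit.
    return float(embed(ngram, dim) @ w)
```

Because the prediction is a sum of per-n-gram scalars plus a bias, every prediction decomposes exactly into per-token contributions, which is the source of the transparency (and of the inference speedup) claimed in the abstract.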
Related papers
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z) - LLMAEL: Large Language Models are Good Context Augmenters for Entity Linking [35.393279375085854]
Large language models (LLMs) are more robust at interpreting uncommon mentions.
We introduce LLM-Augmented Entity Linking (LLMAEL), a plug-and-play approach to enhance entity linking.
Experiments on 6 standard datasets show that the vanilla LLMAEL outperforms baseline EL models in most cases.
arXiv Detail & Related papers (2024-07-04T15:55:13Z) - Data Science with LLMs and Interpretable Models [19.4969442162327]
Large language models (LLMs) are remarkably good at working with interpretable models.
We show that LLMs can describe, interpret, and debug Generalized Additive Models (GAMs).
arXiv Detail & Related papers (2024-02-22T12:04:15Z) - LLM Augmented LLMs: Expanding Capabilities through Composition [56.40953749310957]
CALM -- Composition to Augment Language Models -- introduces cross-attention between models to compose their representations and enable new capabilities.
We illustrate that augmenting PaLM2-S with a smaller model trained on low-resource languages results in an absolute improvement of up to 13% on tasks like translation into English.
When PaLM2-S is augmented with a code-specific model, we see a relative improvement of 40% over the base model for code generation and explanation tasks.
arXiv Detail & Related papers (2024-01-04T18:53:01Z) - Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering.
The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored.
We propose a framework that enhances the reliability of LLMs: it 1) generalizes to out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z) - The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction [22.659005954676598]
We show that it is possible to significantly improve the performance of Large Language Models by selectively removing higher-order components of their weight matrices.
This simple intervention, which we call LAyer-SElective Rank reduction (LASER), can be done on a model after training has completed.
We show extensive experiments demonstrating the generality of this finding across language models and datasets.
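The LASER intervention described above can be sketched as a truncated SVD applied to a single weight matrix. This is a minimal NumPy illustration under the assumption that "removing higher-order components" means keeping only the top singular directions; choosing which layer and which rank is the method's contribution and is not shown here.

```python
import numpy as np

def laser_reduce(W: np.ndarray, rank: int) -> np.ndarray:
    # Replace a weight matrix by its best rank-`rank` approximation,
    # discarding the components with the smallest singular values.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]
```

Because this operates on an already-trained matrix, it can be applied post hoc with no gradient updates, matching the paper's claim that the intervention is done after training has completed.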
arXiv Detail & Related papers (2023-12-21T03:51:08Z) - Empower Your Model with Longer and Better Context Comprehension [15.377707808279908]
We investigate the nature of information transfer within Large Language Models (LLMs)
We propose a novel technique called Attention Transition to empower models to achieve longer and better context comprehension.
Our experiments are conducted on the challenging XSum dataset using the LLaMA-7B model with context token lengths ranging from 800 to 1900.
arXiv Detail & Related papers (2023-07-25T09:34:42Z) - LLM-Pruner: On the Structural Pruning of Large Language Models [65.02607075556742]
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation.
We tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset.
Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
arXiv Detail & Related papers (2023-05-19T12:10:53Z) - Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning [104.58874584354787]
In recent years, pre-trained large language models (LLMs) have demonstrated remarkable efficiency in achieving an inference-time few-shot learning capability known as in-context learning.
This study aims to examine the in-context learning phenomenon through a Bayesian lens, viewing real-world LLMs as latent variable models.
arXiv Detail & Related papers (2023-01-27T18:59:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.