Zoology: Measuring and Improving Recall in Efficient Language Models
- URL: http://arxiv.org/abs/2312.04927v1
- Date: Fri, 8 Dec 2023 09:44:25 GMT
- Title: Zoology: Measuring and Improving Recall in Efficient Language Models
- Authors: Simran Arora, Sabri Eyuboglu, Aman Timalsina, Isys Johnson, Michael
Poli, James Zou, Atri Rudra, and Christopher Ré
- Abstract summary: We train a suite of 17 attention and "gated-convolution" language models.
We find that gated-convolution architectures still underperform attention by up to 2.1 perplexity points on the Pile.
We develop a new formalization of the task called multi-query associative recall (MQAR) that better reflects actual language.
- Score: 42.159338928861864
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Attention-free language models that combine gating and convolutions are
growing in popularity due to their efficiency and increasingly competitive
performance. To better understand these architectures, we pretrain a suite of
17 attention and "gated-convolution" language models, finding that SoTA
gated-convolution architectures still underperform attention by up to 2.1
perplexity points on the Pile. In fine-grained analysis, we find 82% of the gap
is explained by each model's ability to recall information that is previously
mentioned in-context, e.g. "Hakuna Matata means no worries Hakuna Matata it
means no" $\rightarrow$ "??". On this task, termed "associative recall", we
find that attention outperforms gated-convolutions by a large margin: a 70M
parameter attention model outperforms a 1.4 billion parameter gated-convolution
model on associative recall. This is surprising because prior work shows gated
convolutions can perfectly solve synthetic tests for AR capability. To close
the gap between synthetics and real language, we develop a new formalization of
the task called multi-query associative recall (MQAR) that better reflects
actual language. We perform an empirical and theoretical study of MQAR that
elucidates differences in the parameter-efficiency of attention and
gated-convolution recall. Informed by our analysis, we evaluate simple
convolution-attention hybrids and show that hybrids with input-dependent sparse
attention patterns can close 97.4% of the gap to attention, while maintaining
sub-quadratic scaling. Our code is accessible at:
https://github.com/HazyResearch/zoology.
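As a rough illustration of the MQAR setup described in the abstract (a sketch reconstructed from the abstract alone; the function name, token layout, and parameters are invented, and the repository's actual generator may differ):

```python
import random

def make_mqar_example(num_pairs=8, num_queries=4, vocab_size=256, seed=0):
    """Toy multi-query associative recall (MQAR) instance.

    Sketch only: the context holds key-value pairs, and the model must
    later produce the value for each queried key. Token ids and layout
    are illustrative; see the Zoology repo for the real generator.
    """
    rng = random.Random(seed)
    keys = rng.sample(range(vocab_size), num_pairs)
    values = rng.sample(range(vocab_size, 2 * vocab_size), num_pairs)
    kv = dict(zip(keys, values))

    # Context: interleaved key/value tokens.
    context = [tok for k, v in kv.items() for tok in (k, v)]

    # Queries: previously seen keys; targets are the paired values.
    queried = rng.sample(keys, num_queries)
    inputs = context + queried
    targets = [kv[k] for k in queried]
    return inputs, targets

inputs, targets = make_mqar_example()
print(inputs, targets)
```

Each example embeds several key-value bindings in the context and then queries a subset of those keys, so a model must recall multiple bindings rather than the single end-of-sequence lookup tested by classic associative-recall synthetics.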
Related papers
- From RAG to Memory: Non-Parametric Continual Learning for Large Language Models [6.380729797938521]
Retrieval-augmented generation (RAG) has become the dominant way to introduce new information into large language models.
Recent RAG approaches augment vector embeddings with various structures like knowledge graphs to address some gaps, namely sense-making and associativity.
We propose HippoRAG 2, a framework that outperforms standard RAG comprehensively on factual, sense-making, and associative memory tasks.
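For context, a minimal sketch of the dense-retrieval step that standard RAG systems build on (toy code with random vectors standing in for a real encoder; HippoRAG 2's graph-augmented memory itself is not reproduced here):

```python
import numpy as np

def retrieve(query_vec, doc_vecs, k=3):
    """Top-k dense retrieval by cosine similarity.

    Sketch of the standard RAG retrieval step only; the
    knowledge-graph structures mentioned above are not shown.
    """
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    return np.argsort(-scores)[:k]

# Toy usage: random "embeddings" in place of a real encoder.
rng = np.random.default_rng(0)
docs = rng.normal(size=(100, 64))
query = rng.normal(size=64)
print(retrieve(query, docs))
```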
arXiv Detail & Related papers (2025-02-20T18:26:02Z)
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling [52.34735382627312]
Large language models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks.
Existing approaches mainly rely on imitation learning and struggle to achieve effective test-time scaling.
We present T1, which scales reinforcement learning by encouraging exploration, and use it to understand inference scaling.
arXiv Detail & Related papers (2025-01-20T18:33:33Z)
- Align Attention Heads Before Merging Them: An Effective Way for Converting MHA to GQA [8.305827430948654]
We propose a low-cost method for pruning MHA models into GQA models with any compression ratio of key-value heads.
Our strategy can compress up to 87.5% of the key-value heads of the LLaMA2-7B model without significant performance degradation.
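A minimal sketch of the common mean-pooling baseline for converting multi-head attention to grouped-query attention (the paper's contribution, aligning heads before merging, is not reproduced here; shapes and names are illustrative):

```python
import torch

def mean_pool_kv_heads(w_kv, num_heads, group_size):
    """Merge key/value projection heads into grouped heads (GQA).

    w_kv: (num_heads * head_dim, hidden) projection weight.
    Sketch of the standard mean-pooling baseline only.
    """
    head_dim = w_kv.shape[0] // num_heads
    heads = w_kv.view(num_heads, head_dim, -1)
    groups = heads.view(num_heads // group_size, group_size, head_dim, -1)
    return groups.mean(dim=1).reshape(-1, w_kv.shape[1])

# Pooling 32 KV heads into 4 groups of 8 removes 28 of 32 heads,
# i.e. the 87.5% compression ratio quoted above.
w = torch.randn(32 * 128, 4096)
print(mean_pool_kv_heads(w, num_heads=32, group_size=8).shape)  # (512, 4096)
```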
arXiv Detail & Related papers (2024-12-30T03:05:45Z)
- Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning [89.96284387376119]
We show how diffusion models learn difficult subgoals that elude autoregressive approaches.
We propose Multi-Granularity Diffusion Modeling (MGDM), which prioritizes subgoals based on difficulty during learning.
MGDM significantly outperforms autoregressive models without using search techniques.
arXiv Detail & Related papers (2024-10-18T03:48:53Z)
- Honest AI: Fine-Tuning "Small" Language Models to Say "I Don't Know", and Reducing Hallucination in RAG [6.326488286636623]
Hallucination is a key roadblock for applications of Large Language Models (LLMs).
We propose Honest AI: a novel strategy to fine-tune "small" language models to say "I don't know" to reduce hallucination.
arXiv Detail & Related papers (2024-10-13T02:34:47Z)
- PORT: Preference Optimization on Reasoning Traces [1.7292887546437081]
This paper proposes using preference optimization methods on Chain-of-Thought steps in order to improve the mathematical reasoning performance of language models.
Our approach leads to increased accuracy on the GSM8K and AQuA-RAT mathematical reasoning benchmarks for Falcon2-11B and Mistral-7B.
The improved abilities transfer to non-mathematical tasks, including the ARC benchmark and symbolic reasoning challenges.
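The summary does not say which preference-optimization objective PORT uses, so as one representative instance, here is a sketch of a standard DPO-style loss over paired reasoning traces (all names and numbers are illustrative):

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Direct Preference Optimization loss over paired traces.

    Illustrative only; not necessarily PORT's objective. Inputs are
    summed token log-probs of the preferred / dispreferred
    chain-of-thought under the policy and a frozen reference model.
    """
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    return -F.logsigmoid(beta * margin).mean()

# Toy usage with made-up log-probabilities.
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.0]))
print(loss.item())
```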
arXiv Detail & Related papers (2024-06-23T09:51:06Z)
- DARA: Decomposition-Alignment-Reasoning Autonomous Language Agent for Question Answering over Knowledge Graphs [70.54226917774933]
We propose the Decomposition-Alignment-Reasoning Agent (DARA) framework.
DARA effectively parses questions into formal queries through a dual mechanism.
We show that DARA attains performance comparable to state-of-the-art enumerating-and-ranking-based methods for KGQA.
arXiv Detail & Related papers (2024-06-11T09:09:37Z)
- Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models [102.72940700598055]
In reasoning tasks, even a minor error can cascade into inaccurate results.
We develop a method that avoids introducing external resources, relying instead on perturbations to the input.
Our training approach randomly masks certain tokens within the chain of thought, a technique we found to be particularly effective for reasoning tasks.
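A minimal sketch of that masking idea (the masking probability, mask token, and span boundaries here are illustrative, not the paper's exact recipe):

```python
import torch

def mask_reasoning_tokens(input_ids, cot_start, cot_end, mask_id, p=0.1):
    """Randomly replace a fraction of chain-of-thought tokens with a mask token.

    Sketch only: masks tokens inside the [cot_start, cot_end) span
    independently with probability p.
    """
    masked = input_ids.clone()
    span = torch.zeros_like(input_ids, dtype=torch.bool)
    span[cot_start:cot_end] = True
    drop = torch.rand(input_ids.shape) < p
    masked[span & drop] = mask_id
    return masked

# Toy usage: mask ~30% of tokens in positions 5..14.
ids = torch.arange(20)
print(mask_reasoning_tokens(ids, cot_start=5, cot_end=15, mask_id=-1, p=0.3))
```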
arXiv Detail & Related papers (2024-03-04T16:21:54Z)
- Simple linear attention language models balance the recall-throughput tradeoff [40.08746299497935]
We propose BASED, a simple architecture combining linear and sliding window attention.
We train language models up to 1.3b parameters and show that BASED matches the strongest sub-quadratic models in perplexity and outperforms them on real-world recall-intensive tasks by 6.22 accuracy points.
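A sketch of the two attention primitives the summary names; how BASED actually composes them (per-layer, summed, or interleaved) is not specified here, so summing the two outputs below is just one plausible combination, not the paper's architecture:

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v):
    """Causal linear attention with a positive feature map (sketch)."""
    q, k = F.elu(q) + 1, F.elu(k) + 1
    kv = torch.einsum("tnd,tne->tnde", k, v).cumsum(dim=0)  # running sums of k v^T
    z = k.cumsum(dim=0)                                     # running sums of k
    num = torch.einsum("tnd,tnde->tne", q, kv)
    den = torch.einsum("tnd,tnd->tn", q, z).unsqueeze(-1)
    return num / (den + 1e-6)

def sliding_window_attention(q, k, v, window=64):
    """Exact softmax attention restricted to a local causal window."""
    T, _, d = q.shape
    i = torch.arange(T)[:, None]
    j = torch.arange(T)[None, :]
    mask = (j <= i) & (j > i - window)
    scores = torch.einsum("tnd,snd->nts", q, k) / d ** 0.5
    attn = scores.masked_fill(~mask, float("-inf")).softmax(dim=-1)
    return torch.einsum("nts,snd->tnd", attn, v)

# Toy usage: (time, heads, head_dim) tensors.
T, n, d = 128, 4, 16
q, k, v = (torch.randn(T, n, d) for _ in range(3))
out = linear_attention(q, k, v) + sliding_window_attention(q, k, v)
print(out.shape)  # torch.Size([128, 4, 16])
```

The linear branch gives O(T) global mixing via running sums, while the windowed branch provides exact local softmax attention; intuitively, the feature dimension and window size then trade recall against throughput.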
arXiv Detail & Related papers (2024-02-28T19:28:27Z)
- Rephrase, Augment, Reason: Visual Grounding of Questions for Vision-Language Models [59.05769810380928]
Rephrase, Augment and Reason (RepARe) is a gradient-free framework that extracts salient details about the image using the underlying vision-language model.
We show that RepARe yields absolute zero-shot accuracy increases of 3.85% on VQAv2, 6.41% on A-OKVQA, and 7.94% on VizWiz.
arXiv Detail & Related papers (2023-10-09T16:57:57Z)