Look Before You Leap: A Universal Emergent Decomposition of Retrieval
Tasks in Language Models
- URL: http://arxiv.org/abs/2312.10091v1
- Date: Wed, 13 Dec 2023 18:36:43 GMT
- Title: Look Before You Leap: A Universal Emergent Decomposition of Retrieval
Tasks in Language Models
- Authors: Alexandre Variengien and Eric Winsor
- Abstract summary: We study how language models (LMs) solve retrieval tasks in diverse situations.
We introduce ORION, a collection of structured retrieval tasks spanning six domains.
We find that LMs internally decompose retrieval tasks in a modular way.
- Score: 58.57279229066477
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When solving challenging problems, language models (LMs) are able to identify
relevant information from long and complicated contexts. To study how LMs solve
retrieval tasks in diverse situations, we introduce ORION, a collection of
structured retrieval tasks spanning six domains, from text understanding to
coding. Each task in ORION can be represented abstractly by a request (e.g. a
question) that retrieves an attribute (e.g. the character name) from a context
(e.g. a story). We apply causal analysis on 18 open-source language models with
sizes ranging from 125 million to 70 billion parameters. We find that LMs
internally decompose retrieval tasks in a modular way: middle layers at the
last token position process the request, while late layers retrieve the correct
entity from the context. After causally enforcing this decomposition, models
are still able to solve the original task, preserving 70% of the original
correct token probability in 98 of the 106 studied model-task pairs. We connect
our macroscopic decomposition with a microscopic description by performing a
fine-grained case study of a question-answering task on Pythia-2.8b. Building
on our high-level understanding, we demonstrate a proof of concept application
for scalable internal oversight of LMs to mitigate prompt-injection while
requiring human supervision on only a single input. Our solution improves
accuracy drastically (from 15.5% to 97.5% on Pythia-12b). This work presents
evidence of a universal emergent modular processing of tasks across varied
domains and models and is a pioneering effort in applying interpretability for
scalable internal oversight of LMs.
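The decomposition described in the abstract can be illustrated with a toy sketch. The Python below is not the authors' method and involves no neural network. It only mirrors the abstract shape of an ORION-style input (a request that retrieves an attribute from a context) and the two-stage processing the paper reports, with one function standing in for request processing and another for retrieval from the context. All names (RetrievalTask, process_request, retrieve_from_context, run_patched) and the example stories are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalTask:
    """Hypothetical ORION-style input: a context plus a request over it."""
    context: tuple    # tuple of dicts, one dict of attributes per entity
    filter_by: tuple  # (attribute name, value) identifying the entity
    retrieve: str     # name of the attribute the request asks for


def process_request(task):
    """Stage 1 (toy stand-in for middle layers): encode the request only."""
    return {"filter_by": task.filter_by, "retrieve": task.retrieve}


def retrieve_from_context(context, request_repr):
    """Stage 2 (toy stand-in for late layers): look the answer up in the context."""
    key, value = request_repr["filter_by"]
    for entity in context:
        if entity.get(key) == value:
            return entity[request_repr["retrieve"]]
    return None


def run(task):
    """Normal run: request and context come from the same input."""
    return retrieve_from_context(task.context, process_request(task))


def run_patched(source, target):
    """Toy analogue of patching: request from `source`, context from `target`."""
    return retrieve_from_context(target.context, process_request(source))


story_a = (
    {"name": "Alice", "role": "main character", "city": "Paris"},
    {"name": "Bob", "role": "narrator", "city": "Rome"},
)
story_b = (
    {"name": "Carol", "role": "main character", "city": "Oslo"},
    {"name": "Dan", "role": "narrator", "city": "Lima"},
)

task_a = RetrievalTask(story_a, ("role", "main character"), "city")
task_b = RetrievalTask(story_b, ("role", "narrator"), "name")

print(run(task_a))                  # Paris
print(run(task_b))                  # Dan
print(run_patched(task_a, task_b))  # Oslo
```

In this toy, swapping the request representation between two inputs yields the answer to the source request in the target context, which is the behaviour one would expect if request processing and retrieval were cleanly separable.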
Related papers
- MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs [78.5013630951288]
This paper introduces techniques for advancing information retrieval with multimodal large language models (MLLMs).
We first study fine-tuning an MLLM as a bi-encoder retriever on 10 datasets with 16 retrieval tasks.
We propose modality-aware hard negative mining to mitigate the modality bias exhibited by MLLM retrievers.
arXiv Detail & Related papers (2024-11-04T20:06:34Z)
- Probing the Robustness of Theory of Mind in Large Language Models [6.7932860553262415]
We introduce a novel dataset of 68 tasks for probing ToM in LLMs.
We evaluate the ToM performance of four SotA open-source LLMs on our dataset and on the dataset introduced by Kosinski (2023).
We find a consistent tendency in all tested LLMs to perform poorly on tasks that require the realization that an agent has knowledge of automatic state changes in its environment.
arXiv Detail & Related papers (2024-10-08T18:13:27Z)
- Analyzing the Role of Semantic Representations in the Era of Large Language Models [104.18157036880287]
We investigate the role of semantic representations in the era of large language models (LLMs).
We propose an AMR-driven chain-of-thought prompting method, which we call AMRCoT.
We find that it is difficult to predict which input examples AMR may help or hurt on, but errors tend to arise with multi-word expressions.
arXiv Detail & Related papers (2024-05-02T17:32:59Z)
- Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems [76.69936664916061]
We study how the number of LM calls affects the performance of Vote and Filter-Vote.
We find, surprisingly, that across multiple language tasks, the performance of both Vote and Filter-Vote can first increase but then decrease as a function of the number of LM calls.
arXiv Detail & Related papers (2024-03-04T19:12:48Z)
- Language Models Implement Simple Word2Vec-style Vector Arithmetic [32.2976613483151]
A primary criticism towards language models (LMs) is their inscrutability.
This paper presents evidence that, despite their size and complexity, LMs sometimes exploit a simple vector arithmetic style mechanism to solve some relational tasks.
arXiv Detail & Related papers (2023-05-25T15:04:01Z)
- ZEROTOP: Zero-Shot Task-Oriented Semantic Parsing using Large Language Models [6.13621607944513]
We propose ZEROTOP, a zero-shot task-oriented parsing method that decomposes a semantic parsing problem into a set of abstractive and extractive question-answering problems.
We show that our QA-based decomposition paired with the fine-tuned LLM can correctly parse 16% of utterances in the MTOP dataset without requiring any annotated data.
arXiv Detail & Related papers (2022-12-21T07:06:55Z)
- Successive Prompting for Decomposing Complex Questions [50.00659445976735]
Recent works leverage the capabilities of large language models (LMs) to perform complex question answering in a few-shot setting.
We introduce "Successive Prompting", where we iteratively break down a complex task into a simple task, solve it, and then repeat the process until we get the final solution.
Our best model (with successive prompting) achieves an improvement of 5% absolute F1 on a few-shot version of the DROP dataset.
arXiv Detail & Related papers (2022-12-08T06:03:38Z)
- Is a Question Decomposition Unit All We Need? [20.66688303609522]
We investigate if humans can decompose a hard question into a set of simpler questions that are relatively easier for models to solve.
We analyze a range of datasets involving various forms of reasoning and find that it is indeed possible to significantly improve model performance.
Our findings indicate that Human-in-the-loop Question Decomposition (HQD) can potentially provide an alternate path to building large LMs.
arXiv Detail & Related papers (2022-05-25T07:24:09Z)
- Text Modular Networks: Learning to Decompose Tasks in the Language of Existing Models [61.480085460269514]
We propose a framework for building interpretable systems that learn to solve complex tasks by decomposing them into simpler ones solvable by existing models.
We use this framework to build ModularQA, a system that can answer multi-hop reasoning questions by decomposing them into sub-questions answerable by a neural factoid single-span QA model and a symbolic calculator.
arXiv Detail & Related papers (2020-09-01T23:45:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.