Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models
- URL: http://arxiv.org/abs/2308.00304v3
- Date: Tue, 16 Jul 2024 20:09:47 GMT
- Title: Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models
- Authors: Jiaao Chen, Xiaoman Pan, Dian Yu, Kaiqiang Song, Xiaoyang Wang, Dong Yu, Jianshu Chen
- Abstract summary: We investigate how to elicit compositional generalization capabilities in large language models (LLMs).
We find that demonstrating both foundational skills and compositional examples grounded in these skills within the same prompt context is crucial.
We show that fine-tuning LLMs with SKiC-style data can elicit zero-shot weak-to-strong generalization.
- Score: 68.18370230899102
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We investigate how to elicit compositional generalization capabilities in large language models (LLMs). Compositional generalization empowers LLMs to solve complex problems by combining foundational skills, a critical reasoning ability akin to human intelligence. However, even the most advanced LLMs currently struggle with this form of reasoning. We examine this problem within the framework of in-context learning and find that demonstrating both foundational skills and compositional examples grounded in these skills within the same prompt context is crucial. We refer to this prompt structure as skills-in-context (SKiC). With as few as two exemplars, this in-context learning structure enables LLMs to tackle more challenging problems requiring innovative skill combinations, achieving near-perfect systematic generalization across a broad range of tasks. Intriguingly, SKiC also unlocks the latent potential of LLMs, allowing them to more actively utilize pre-existing internal skills acquired during earlier pretraining stages to solve complex reasoning problems. The SKiC structure is robust across different skill constructions and exemplar choices and demonstrates strong transferability to new tasks. Finally, inspired by our in-context learning study, we show that fine-tuning LLMs with SKiC-style data can elicit zero-shot weak-to-strong generalization, enabling the models to solve much harder problems directly with standard prompting.
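The prompt structure described in the abstract, with foundational skills and compositional exemplars grounded in those skills placed in the same context, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Skill`, `Exemplar`, and `build_skic_prompt`, the prompt layout, and the last-letter toy task are all hypothetical choices made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Skill:
    """A foundational skill shown to the model (hypothetical structure)."""
    name: str
    description: str
    example: str


@dataclass
class Exemplar:
    """A compositional example whose steps each name the skill they use."""
    question: str
    steps: list[tuple[str, str]]  # (skill name, reasoning step)
    answer: str


def build_skic_prompt(skills: list[Skill], exemplars: list[Exemplar], question: str) -> str:
    """Assemble one prompt: skills first, then grounded exemplars, then the query."""
    known = {s.name for s in skills}
    parts = ["Skills:"]
    for s in skills:
        parts.append(f"- {s.name}: {s.description} Example: {s.example}")
    parts += ["", "Examples:"]
    for ex in exemplars:
        parts.append(f"Question: {ex.question}")
        for skill_name, step in ex.steps:
            # Every step must be grounded in a skill listed above.
            if skill_name not in known:
                raise ValueError(f"step uses undeclared skill: {skill_name}")
            parts.append(f"  [{skill_name}] {step}")
        parts.append(f"Answer: {ex.answer}")
        parts.append("")
    parts += [f"Question: {question}", "Answer:"]
    return "\n".join(parts)


# Illustrative toy data (an assumption for this sketch, not taken from the paper's prompts).
SKILLS = [
    Skill("last_letter", "Return the last letter of a word.", "'apple' -> 'e'"),
    Skill("concat", "Join letters in order.", "'e', 'a' -> 'ea'"),
]
EXEMPLARS = [
    Exemplar(
        question="Concatenate the last letters of: apple, banana",
        steps=[
            ("last_letter", "'apple' -> 'e'"),
            ("last_letter", "'banana' -> 'a'"),
            ("concat", "'e', 'a' -> 'ea'"),
        ],
        answer="ea",
    ),
]

prompt = build_skic_prompt(
    SKILLS, EXEMPLARS, "Concatenate the last letters of: apple, banana, cherry"
)
print(prompt)
```

The query here uses three words while the exemplar uses two, mirroring the idea of asking for a harder composition than the ones demonstrated. The resulting string would be sent to whichever model is being evaluated.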
Related papers
- BloomWise: Enhancing Problem-Solving capabilities of Large Language Models using Bloom's-Taxonomy-Inspired Prompts [59.83547898874152]
We introduce BloomWise, a new prompting technique, inspired by Bloom's taxonomy, to improve the performance of Large Language Models (LLMs).
The decision regarding the need to employ more sophisticated cognitive skills is based on self-evaluation performed by the LLM.
In extensive experiments across 4 popular math reasoning datasets, we have demonstrated the effectiveness of our proposed approach.
arXiv Detail & Related papers (2024-10-05T09:27:52Z)
- CLR-Fact: Evaluating the Complex Logical Reasoning Capability of Large Language Models over Factual Knowledge [44.59258397967782]
Large language models (LLMs) have demonstrated impressive capabilities across various natural language processing tasks.
We present a systematic evaluation of state-of-the-art LLMs' complex logical reasoning abilities.
We find that LLMs excel at reasoning over general world knowledge but face significant challenges with specialized domain-specific knowledge.
arXiv Detail & Related papers (2024-07-30T05:40:32Z)
- Do Large Language Models Have Compositional Ability? An Investigation into Limitations and Scalability [12.349247962800813]
Large language models (LLMs) have emerged as powerful tools for many AI problems.
They exhibit remarkable in-context learning (ICL) capabilities.
How they approach composite tasks remains an open and largely underexplored question.
arXiv Detail & Related papers (2024-07-22T15:22:34Z)
- Knowledge Tagging System on Math Questions via LLMs with Flexible Demonstration Retriever [48.5585921817745]
Large Language Models (LLMs) are used to automate the knowledge tagging task.
We show the strong performance of zero- and few-shot results over math questions knowledge tagging tasks.
By proposing a reinforcement learning-based demonstration retriever, we successfully exploit the great potential of different-sized LLMs.
arXiv Detail & Related papers (2024-06-19T23:30:01Z)
- Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? [54.667202878390526]
Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases.
We introduce LOFT, a benchmark of real-world tasks requiring context up to millions of tokens designed to evaluate LCLMs' performance on in-context retrieval and reasoning.
Our findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks.
arXiv Detail & Related papers (2024-06-19T00:28:58Z)
- Puzzle Solving using Reasoning of Large Language Models: A Survey [1.9939549451457024]
This survey examines the capabilities of Large Language Models (LLMs) in puzzle solving.
Our findings highlight the disparity between LLM capabilities and human-like reasoning.
The survey underscores the necessity for novel strategies and richer datasets to advance LLMs' puzzle-solving proficiency.
arXiv Detail & Related papers (2024-02-17T14:19:38Z)
- Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering.
The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored.
We propose a framework that enhances the reliability of LLMs as it: 1) generalizes out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z)
- When does In-context Learning Fall Short and Why? A Study on Specification-Heavy Tasks [54.71034943526973]
In-context learning (ICL) has become the default method for using large language models (LLMs).
We find that ICL falls short of handling specification-heavy tasks, which are tasks with complicated and extensive task specifications.
We identify three primary reasons: inability to specifically understand context, misalignment in task schema comprehension with humans, and inadequate long-text understanding ability.
arXiv Detail & Related papers (2023-11-15T14:26:30Z)
- Collaborating with language models for embodied reasoning [30.82976922056617]
Reasoning in a complex and ambiguous environment is a key goal for Reinforcement Learning (RL) agents.
We present a set of tasks that require reasoning, test this system's ability to generalize zero-shot and investigate failure cases.
arXiv Detail & Related papers (2023-02-01T21:26:32Z)