Can LLM find the green circle? Investigation and Human-guided tool manipulation for compositional generalization
- URL: http://arxiv.org/abs/2312.07763v1
- Date: Tue, 12 Dec 2023 22:11:17 GMT
- Title: Can LLM find the green circle? Investigation and Human-guided tool manipulation for compositional generalization
- Authors: Min Zhang, Jianfeng He, Shuo Lei, Murong Yue, Linhang Wang, Chang-Tien Lu
- Abstract summary: Large language models (LLMs) exhibit impressive generalization abilities on many tasks through in-context learning (ICL).
We propose a human-guided tool manipulation framework (HTM) that generates tools for sub-questions and integrates multiple tools.
Experiments show that our method achieves state-of-the-art performance on two compositional generalization benchmarks and outperforms existing methods on the most challenging test split by 70%.
- Score: 28.069928613367978
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The meaning of complex phrases in natural language is composed of their
individual components. The task of compositional generalization evaluates a
model's ability to understand new combinations of components. Previous studies
trained smaller, task-specific models, which exhibited poor generalization.
While large language models (LLMs) exhibit impressive generalization abilities
on many tasks through in-context learning (ICL), their potential for
compositional generalization remains unexplored. In this paper, we first
empirically investigate prevailing ICL methods in compositional generalization.
We find that they struggle with complex compositional questions due to
cumulative errors in long reasoning steps and intricate logic required for
tool-making. Consequently, we propose a human-guided tool manipulation
framework (HTM) that generates tools for sub-questions and integrates multiple
tools. Our method enhances the effectiveness of tool creation and usage with
minimal human effort. Experiments show that our method achieves
state-of-the-art performance on two compositional generalization benchmarks and
outperforms existing methods on the most challenging test split by 70%.
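The abstract describes HTM only at this level of detail. Purely as an illustrative sketch of the stated idea (generate a tool for each sub-question, then integrate the tools), assuming a generic `llm(prompt)` completion call and invented helper names that are not from the paper:

```python
# Hypothetical sketch of the tool-manipulation idea in the abstract.
# `llm`, `decompose`, and `make_tool` are invented for illustration; the
# paper's actual prompts, tool format, and integration step are not shown.

def llm(prompt: str) -> str:
    """Placeholder for any chat/completion API call."""
    raise NotImplementedError

def decompose(question: str) -> list[str]:
    """Ask the LLM to split a compositional question into sub-questions."""
    reply = llm(f"Split into sub-questions, one per line:\n{question}")
    return [line.strip() for line in reply.splitlines() if line.strip()]

def make_tool(sub_question: str, hints: str) -> str:
    """Generate a small Python function (a 'tool') for one sub-question,
    steered by human-provided hints."""
    return llm(f"Write a Python function that answers: {sub_question}\n"
               f"Follow these hints: {hints}")

def answer(question: str, hints: str) -> str:
    """Build one tool per sub-question, then ask the LLM to integrate them."""
    tools = [make_tool(sq, hints) for sq in decompose(question)]
    return llm("Using these tools, answer the question and print the result:\n"
               + "\n\n".join(tools) + f"\n\nQuestion: {question}")
```

Here the human guidance enters only as short free-form hints to tool generation, which is one reading of the abstract's claim of "minimal human effort".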
Related papers
- Do Large Language Models Have Compositional Ability? An Investigation into Limitations and Scalability [12.349247962800813]
Large language models (LLMs) have emerged as powerful tools for many AI problems.
They exhibit remarkable in-context learning (ICL) capabilities.
How they approach composite tasks remains an open and largely underexplored question.
arXiv Detail & Related papers (2024-07-22T15:22:34Z)
- MetaTool: Facilitating Large Language Models to Master Tools with Meta-task Augmentation [25.360660222418183]
We present MetaTool, a novel tool learning methodology designed to generalize across any reusable toolset.
By incorporating meta-task data into task-oriented training, our method significantly enhances the performance of open-source Large Language Models.
arXiv Detail & Related papers (2024-07-15T10:15:41Z)
- Towards Completeness-Oriented Tool Retrieval for Large Language Models [60.733557487886635]
Real-world systems often incorporate a wide array of tools, making it impractical to input all tools into Large Language Models.
Existing tool retrieval methods primarily focus on semantic matching between user queries and tool descriptions.
We propose COLT, a novel model-agnostic COllaborative Learning-based Tool retrieval approach that captures not only the semantic similarities between user queries and tool descriptions but also the collaborative information of tools.
arXiv Detail & Related papers (2024-05-25T06:41:23Z)
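As a toy illustration of the COLT summary above, the scorer below mixes query-tool semantic similarity with a co-usage ("collaborative") signal; the inputs, weighting, and mixing formula are assumptions for illustration, not COLT's actual model:

```python
# Toy tool-retrieval scorer: semantic matching plus a collaborative signal.
# Everything here (embeddings, co-occurrence matrix, alpha mix) is illustrative.
import numpy as np

def score_tools(query_vec: np.ndarray,   # (d,) query embedding
                tool_vecs: np.ndarray,   # (n_tools, d) tool-description embeddings
                cooccur: np.ndarray,     # (n_tools, n_tools) co-usage counts
                alpha: float = 0.5) -> np.ndarray:
    semantic = tool_vecs @ query_vec     # similarity of each tool to the query
    # Boost tools that historically co-occur with tools matching the query.
    collaborative = cooccur @ semantic
    # Normalize both signals before mixing so alpha stays interpretable.
    semantic = semantic / (np.abs(semantic).max() + 1e-9)
    collaborative = collaborative / (np.abs(collaborative).max() + 1e-9)
    return alpha * semantic + (1.0 - alpha) * collaborative
```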
- Towards Compositionally Generalizable Semantic Parsing in Large Language Models: A Survey [0.0]
We present a literature survey of recent advances in analysis, methods, and evaluation schemes for compositional generalization.
This type of generalization is particularly relevant to the semantic parsing community for applications such as task-oriented dialogue.
arXiv Detail & Related papers (2024-04-15T10:44:58Z)
- From Summary to Action: Enhancing Large Language Models for Complex Tasks with Open World APIs [62.496139001509114]
We introduce a novel tool invocation pipeline designed to control massive real-world APIs.
This pipeline mirrors the human task-solving process, addressing complicated real-life user queries.
Empirical evaluations of our Sum2Act pipeline on the ToolBench benchmark show significant performance improvements.
arXiv Detail & Related papers (2024-02-28T08:42:23Z)
- MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback [78.60644407028022]
We introduce MINT, a benchmark that evaluates large language models' ability to solve tasks with multi-turn interactions.
LLMs generally benefit from tools and language feedback, with performance gains of 1-8% for each turn of tool use.
Among the LLMs evaluated, supervised instruction-finetuning (SIFT) and reinforcement learning from human feedback (RLHF) generally hurt multi-turn capabilities.
arXiv Detail & Related papers (2023-09-19T15:25:42Z)
- Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models [68.18370230899102]
We investigate how to elicit compositional generalization capabilities in large language models (LLMs).
We find that demonstrating both foundational skills and compositional examples grounded in these skills within the same prompt context is crucial.
We show that fine-tuning LLMs with SKiC-style data can elicit zero-shot weak-to-strong generalization.
arXiv Detail & Related papers (2023-08-01T05:54:12Z)
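A hypothetical prompt builder for the Skills-in-Context recipe summarized above: demonstrations of basic skills and examples composing those skills are placed in the same prompt context. The layout and example tasks are guesses from the summary, not the paper's prompt format:

```python
# Build one prompt containing both basic-skill demos and composed examples.
# The format and the toy tasks are invented to illustrate the recipe.

def skic_prompt(skills: dict[str, str],
                compositions: list[str],
                question: str) -> str:
    parts = ["Basic skills:"]
    parts += [f"[{name}] {demo}" for name, demo in skills.items()]
    parts.append("Examples combining the skills above:")
    parts += compositions
    parts.append(f"Now solve: {question}")
    return "\n".join(parts)

prompt = skic_prompt(
    skills={
        "count": "Q: How many letters in 'cat'? A: 3",
        "compare": "Q: Is 5 greater than 3? A: yes",
    },
    compositions=[
        "Q: Does 'zebra' have more letters than 'cat'? "
        "A: count('zebra')=5, count('cat')=3, compare(5,3)=yes",
    ],
    question="Does 'horse' have more letters than 'ox'?",
)
print(prompt)
```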
- ExeDec: Execution Decomposition for Compositional Generalization in Neural Program Synthesis [54.18659323181771]
We characterize several different forms of compositional generalization that are desirable in program synthesis.
We propose ExeDec, a novel decomposition-based strategy that predicts execution subgoals to solve problems step-by-step informed by program execution at each step.
arXiv Detail & Related papers (2023-07-26T01:07:52Z)
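A schematic of the decomposition loop the ExeDec summary describes: predict an execution subgoal, synthesize a program step that reaches it, execute it to obtain the new state, and repeat. The two model calls are placeholders supplied by the caller, not ExeDec's actual networks or DSL:

```python
# Schematic ExeDec-style loop: subgoal prediction plus per-step synthesis,
# grounded by executing each step. All callables are caller-supplied stubs.

def exedec_solve(inputs, target_outputs,
                 predict_subgoal,   # (state, target) -> subgoal
                 synthesize_step,   # (state, subgoal) -> program step
                 execute,           # (step, state) -> new state
                 max_steps: int = 10):
    state, program = inputs, []
    for _ in range(max_steps):
        subgoal = predict_subgoal(state, target_outputs)
        step = synthesize_step(state, subgoal)
        state = execute(step, state)   # execution feedback guides the next step
        program.append(step)
        if state == target_outputs:    # domain-specific equality check
            return program
    return None  # no program found within the step budget
```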
- Compositional Generalization and Decomposition in Neural Program Synthesis [59.356261137313275]
In this paper, we focus on measuring the ability of learned program synthesizers to compositionally generalize.
We first characterize several different axes along which program synthesis methods would be desired to generalize.
We introduce a benchmark suite of tasks to assess these abilities based on two popular existing datasets.
arXiv Detail & Related papers (2022-04-07T22:16:05Z)