Can LLM find the green circle? Investigation and Human-guided tool manipulation for compositional generalization
- URL: http://arxiv.org/abs/2312.07763v1
- Date: Tue, 12 Dec 2023 22:11:17 GMT
- Title: Can LLM find the green circle? Investigation and Human-guided tool manipulation for compositional generalization
- Authors: Min Zhang, Jianfeng He, Shuo Lei, Murong Yue, Linhang Wang, Chang-Tien Lu
- Abstract summary: Large language models (LLMs) exhibit impressive generalization abilities on many tasks through in-context learning (ICL).
We propose a human-guided tool manipulation framework (HTM) that generates tools for sub-questions and integrates multiple tools.
Experiments show that our method achieves state-of-the-art performance on two compositional generalization benchmarks and outperforms existing methods on the most challenging test split by 70%.
- Score: 28.069928613367978
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The meaning of complex phrases in natural language is composed of their
individual components. The task of compositional generalization evaluates a
model's ability to understand new combinations of components. Previous studies
trained smaller, task-specific models, which exhibited poor generalization.
While large language models (LLMs) exhibit impressive generalization abilities
on many tasks through in-context learning (ICL), their potential for
compositional generalization remains unexplored. In this paper, we first
empirically investigate prevailing ICL methods in compositional generalization.
We find that they struggle with complex compositional questions due to
cumulative errors in long reasoning steps and intricate logic required for
tool-making. Consequently, we propose a human-guided tool manipulation
framework (HTM) that generates tools for sub-questions and integrates multiple
tools. Our method enhances the effectiveness of tool creation and usage with
minimal human effort. Experiments show that our method achieves
state-of-the-art performance on two compositional generalization benchmarks and
outperforms existing methods on the most challenging test split by 70%.
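The abstract describes HTM only at this level of detail. Purely as an illustrative sketch of the stated idea (generate a tool for each sub-question, then integrate the tools), assuming a generic `llm(prompt)` completion call and invented helper names that are not from the paper:

```python
# Hypothetical sketch of the tool-manipulation idea in the abstract.
# `llm`, `decompose`, and `make_tool` are invented for illustration; the
# paper's actual prompts, tool format, and integration step are not shown.

def llm(prompt: str) -> str:
    """Placeholder for any chat/completion API call."""
    raise NotImplementedError

def decompose(question: str) -> list[str]:
    """Ask the LLM to split a compositional question into sub-questions."""
    reply = llm(f"Split into sub-questions, one per line:\n{question}")
    return [line.strip() for line in reply.splitlines() if line.strip()]

def make_tool(sub_question: str, hints: str) -> str:
    """Generate a small Python function (a 'tool') for one sub-question,
    steered by human-provided hints."""
    return llm(f"Write a Python function that answers: {sub_question}\n"
               f"Follow these hints: {hints}")

def answer(question: str, hints: str) -> str:
    """Build one tool per sub-question, then ask the LLM to integrate them."""
    tools = [make_tool(sq, hints) for sq in decompose(question)]
    return llm("Using these tools, answer the question and print the result:\n"
               + "\n\n".join(tools) + f"\n\nQuestion: {question}")
```

Here the human guidance enters only as short free-form hints to tool generation, which is one reading of the abstract's claim of "minimal human effort".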
Related papers
- Do Large Language Models Have Compositional Ability? An Investigation into Limitations and Scalability [12.349247962800813]
Large language models (LLMs) have emerged as powerful tools for many AI problems.
They exhibit remarkable in-context learning (ICL) capabilities.
How they approach composite tasks remains an open and largely underexplored question.
arXiv Detail & Related papers (2024-07-22T15:22:34Z)
- MetaTool: Facilitating Large Language Models to Master Tools with Meta-task Augmentation [25.360660222418183]
We present MetaTool, a novel tool learning methodology designed to generalize across any reusable toolset.
By incorporating meta-task data into task-oriented training, our method significantly enhances the performance of open-source Large Language Models.
arXiv Detail & Related papers (2024-07-15T10:15:41Z)
- Towards Completeness-Oriented Tool Retrieval for Large Language Models [60.733557487886635]
Real-world systems often incorporate a wide array of tools, making it impractical to input all tools into Large Language Models.
Existing tool retrieval methods primarily focus on semantic matching between user queries and tool descriptions.
We propose COLT, a novel model-agnostic COllaborative Learning-based Tool retrieval approach that captures not only the semantic similarities between user queries and tool descriptions but also the collaborative information of tools.
arXiv Detail & Related papers (2024-05-25T06:41:23Z)
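As a toy illustration of the COLT summary above, the scorer below mixes query-tool semantic similarity with a co-usage ("collaborative") signal; the inputs, weighting, and mixing formula are assumptions for illustration, not COLT's actual model:

```python
# Toy tool-retrieval scorer: semantic matching plus a collaborative signal.
# Everything here (embeddings, co-occurrence matrix, alpha mix) is illustrative.
import numpy as np

def score_tools(query_vec: np.ndarray,   # (d,) query embedding
                tool_vecs: np.ndarray,   # (n_tools, d) tool-description embeddings
                cooccur: np.ndarray,     # (n_tools, n_tools) co-usage counts
                alpha: float = 0.5) -> np.ndarray:
    semantic = tool_vecs @ query_vec     # similarity of each tool to the query
    # Boost tools that historically co-occur with tools matching the query.
    collaborative = cooccur @ semantic
    # Normalize both signals before mixing so alpha stays interpretable.
    semantic = semantic / (np.abs(semantic).max() + 1e-9)
    collaborative = collaborative / (np.abs(collaborative).max() + 1e-9)
    return alpha * semantic + (1.0 - alpha) * collaborative
```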
- Towards Compositionally Generalizable Semantic Parsing in Large Language Models: A Survey [0.0]
We present a literature survey of recent advances in analysis, methods, and evaluation schemes for compositional generalization.
This type of generalization is particularly relevant to the semantic parsing community for applications such as task-oriented dialogue.
arXiv Detail & Related papers (2024-04-15T10:44:58Z)
- From Summary to Action: Enhancing Large Language Models for Complex Tasks with Open World APIs [62.496139001509114]
We introduce a novel tool invocation pipeline designed to control massive real-world APIs.
This pipeline mirrors the human task-solving process, addressing complicated real-life user queries.
Empirical evaluations of our Sum2Act pipeline on the ToolBench benchmark show significant performance improvements.
arXiv Detail & Related papers (2024-02-28T08:42:23Z)
- MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback [78.60644407028022]
We introduce MINT, a benchmark that evaluates large language models' ability to solve tasks with multi-turn interactions.
LLMs generally benefit from tools and language feedback, with performance gains of 1-8% for each turn of tool use.
Among the LLMs evaluated, supervised instruction-finetuning (SIFT) and reinforcement learning from human feedback (RLHF) generally hurt multi-turn capabilities.
arXiv Detail & Related papers (2023-09-19T15:25:42Z)
- Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models [68.18370230899102]
We investigate how to elicit compositional generalization capabilities in large language models (LLMs).
We find that demonstrating both foundational skills and compositional examples grounded in these skills within the same prompt context is crucial.
We show that fine-tuning LLMs with SKiC-style data can elicit zero-shot weak-to-strong generalization.
arXiv Detail & Related papers (2023-08-01T05:54:12Z)
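A hypothetical prompt builder for the Skills-in-Context recipe summarized above: demonstrations of basic skills and examples composing those skills are placed in the same prompt context. The layout and example tasks are guesses from the summary, not the paper's prompt format:

```python
# Build one prompt containing both basic-skill demos and composed examples.
# The format and the toy tasks are invented to illustrate the recipe.

def skic_prompt(skills: dict[str, str],
                compositions: list[str],
                question: str) -> str:
    parts = ["Basic skills:"]
    parts += [f"[{name}] {demo}" for name, demo in skills.items()]
    parts.append("Examples combining the skills above:")
    parts += compositions
    parts.append(f"Now solve: {question}")
    return "\n".join(parts)

prompt = skic_prompt(
    skills={
        "count": "Q: How many letters in 'cat'? A: 3",
        "compare": "Q: Is 5 greater than 3? A: yes",
    },
    compositions=[
        "Q: Does 'zebra' have more letters than 'cat'? "
        "A: count('zebra')=5, count('cat')=3, compare(5,3)=yes",
    ],
    question="Does 'horse' have more letters than 'ox'?",
)
print(prompt)
```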
- ExeDec: Execution Decomposition for Compositional Generalization in Neural Program Synthesis [54.18659323181771]
We characterize several different forms of compositional generalization that are desirable in program synthesis.
We propose ExeDec, a novel decomposition-based strategy that predicts execution subgoals to solve problems step-by-step informed by program execution at each step.
arXiv Detail & Related papers (2023-07-26T01:07:52Z)
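A schematic of the decomposition loop the ExeDec summary describes: predict an execution subgoal, synthesize a program step that reaches it, execute it to obtain the new state, and repeat. The two model calls are placeholders supplied by the caller, not ExeDec's actual networks or DSL:

```python
# Schematic ExeDec-style loop: subgoal prediction plus per-step synthesis,
# grounded by executing each step. All callables are caller-supplied stubs.

def exedec_solve(inputs, target_outputs,
                 predict_subgoal,   # (state, target) -> subgoal
                 synthesize_step,   # (state, subgoal) -> program step
                 execute,           # (step, state) -> new state
                 max_steps: int = 10):
    state, program = inputs, []
    for _ in range(max_steps):
        subgoal = predict_subgoal(state, target_outputs)
        step = synthesize_step(state, subgoal)
        state = execute(step, state)   # execution feedback guides the next step
        program.append(step)
        if state == target_outputs:    # domain-specific equality check
            return program
    return None  # no program found within the step budget
```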
- Compositional Generalization and Decomposition in Neural Program Synthesis [59.356261137313275]
In this paper, we focus on measuring the ability of learned program synthesizers to compositionally generalize.
We first characterize several different axes along which program synthesis methods would be desired to generalize.
We introduce a benchmark suite of tasks to assess these abilities based on two popular existing datasets.
arXiv Detail & Related papers (2022-04-07T22:16:05Z)