SciAgent: Tool-augmented Language Models for Scientific Reasoning
- URL: http://arxiv.org/abs/2402.11451v2
- Date: Wed, 21 Feb 2024 03:04:49 GMT
- Title: SciAgent: Tool-augmented Language Models for Scientific Reasoning
- Authors: Yubo Ma, Zhibin Gou, Junheng Hao, Ruochen Xu, Shuohang Wang, Liangming
Pan, Yujiu Yang, Yixin Cao, Aixin Sun, Hany Awadalla and Weizhu Chen
- Abstract summary: We introduce a new task setting named tool-augmented scientific reasoning.
This setting supplements Large Language Models with scalable toolsets.
We construct a tool-augmented training corpus named MathFunc which encompasses over 30,000 samples and roughly 6,000 tools.
Building on MathFunc, we develop SciAgent to retrieve, understand and, if necessary, use tools for scientific problem solving.
- Score: 129.51442677710452
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scientific reasoning poses a formidable challenge for even the most advanced
Large Language Models (LLMs). To make this task more practical and solvable for
LLMs, we introduce a new task setting named tool-augmented scientific
reasoning. This setting supplements LLMs with scalable toolsets, and shifts the
focus from pursuing an omniscient problem solver to a proficient tool-user. To
facilitate the research of such setting, we construct a tool-augmented training
corpus named MathFunc which encompasses over 30,000 samples and roughly 6,000
tools. Building on MathFunc, we develop SciAgent to retrieve, understand and,
if necessary, use tools for scientific problem solving. Additionally, we craft
a benchmark, SciToolBench, spanning five scientific domains to evaluate LLMs'
abilities with tool assistance. Extensive experiments on SciToolBench confirm
the effectiveness of SciAgent. Notably, SciAgent-Mistral-7B surpasses other
LLMs of the same size by more than 13% in absolute accuracy. Furthermore,
SciAgent-DeepMath-7B performs substantially better than ChatGPT.
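The abstract describes a retrieve-understand-use pipeline: given a problem, the agent first retrieves candidate tools from a toolset, then decides whether to apply one. A minimal sketch of that control flow is below; the toy toolset, the keyword-overlap retriever, and all names are illustrative assumptions, not SciAgent's actual implementation (which uses a trained retriever over roughly 6,000 tools).

```python
# Hypothetical sketch of a retrieve -> understand -> use loop.
# The toolset and keyword-overlap retriever are illustrative stand-ins.
import math

# A toy "toolset": each tool carries a docstring used for retrieval.
TOOLS = {
    "quadratic_roots": {
        "doc": "solve quadratic equation roots a b c",
        "fn": lambda a, b, c: (
            (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a),
            (-b - math.sqrt(b * b - 4 * a * c)) / (2 * a),
        ),
    },
    "circle_area": {
        "doc": "area of circle radius",
        "fn": lambda r: math.pi * r * r,
    },
}

def retrieve_tool(question: str):
    """Pick the tool whose documentation shares the most words with the question."""
    q_words = set(question.lower().split())
    scored = [
        (len(q_words & set(spec["doc"].split())), name)
        for name, spec in TOOLS.items()
    ]
    score, name = max(scored)
    return name if score > 0 else None  # "if necessary": no tool when nothing matches

def solve(question: str, *args):
    """Retrieve a tool and, if one matches, use it; otherwise fall back."""
    name = retrieve_tool(question)
    if name is None:
        return None  # a real agent would answer directly without tools here
    return TOOLS[name]["fn"](*args)
```

The key design point the abstract makes is the optional third step: the agent is a tool-user, not an omniscient solver, so retrieval returning nothing is a valid outcome rather than a failure.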
Related papers
- StepTool: A Step-grained Reinforcement Learning Framework for Tool Learning in LLMs [44.906714156993694]
We introduce StepTool, a novel step-grained reinforcement learning framework to improve tool learning in Large Language Models.
StepTool significantly outperforms existing methods in multi-step, tool-based tasks.
arXiv Detail & Related papers (2024-10-10T09:23:26Z)
- Efficient and Scalable Estimation of Tool Representations in Vector Space [34.767193045989515]
We present a framework for generating synthetic data for tool retrieval applications and an efficient data-driven tool retrieval strategy using small encoder models.
We create ToolBank, a new tool retrieval dataset that reflects real human usage.
With these new methods, we achieve improvements of up to 27.28 in Recall@K on the ToolBench dataset and 30.5 in Recall@K on ToolBank.
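The entry above describes embedding tool descriptions with a small encoder and ranking them against a query, evaluated by Recall@K. A toy illustration of that scheme follows; the hashed bag-of-words "embedding" is a stand-in assumption for a real encoder model, and the tool names are invented for the example.

```python
# Toy illustration of vector-space tool retrieval as described above.
# A real system embeds descriptions with a small encoder model; here a
# hashed bag-of-words vector stands in for those learned embeddings.
import math
from collections import Counter

DIM = 64  # fixed embedding dimension for the stand-in encoder

def embed(text: str) -> list:
    """Map text to a fixed-size vector via feature hashing (stand-in encoder)."""
    vec = [0.0] * DIM
    for word, count in Counter(text.lower().split()).items():
        vec[hash(word) % DIM] += count
    return vec

def cosine(u, v) -> float:
    """Cosine similarity between two vectors; 0.0 for zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve(query: str, tool_docs: dict, k: int = 3) -> list:
    """Return the k tool names whose descriptions are closest to the query."""
    q = embed(query)
    ranked = sorted(
        tool_docs,
        key=lambda name: cosine(q, embed(tool_docs[name])),
        reverse=True,
    )
    return ranked[:k]

def recall_at_k(retrieved: list, relevant: set) -> float:
    """Fraction of relevant tools that appear in the top-k retrieved list."""
    return len(relevant & set(retrieved)) / len(relevant)
```

Precomputing the tool-description embeddings once and comparing only vectors at query time is what makes this approach scalable to large toolsets.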
arXiv Detail & Related papers (2024-09-02T19:39:24Z)
- What Are Tools Anyway? A Survey from the Language Model Perspective [67.18843218893416]
Language models (LMs) are powerful, yet largely limited to text generation tasks.
We provide a unified definition of tools as external programs used by LMs.
We empirically study the efficiency of various tooling methods.
arXiv Detail & Related papers (2024-03-18T17:20:07Z)
- LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error [54.954211216847135]
Existing large language models (LLMs) reach a tool-use correctness rate of only 30% to 60%.
We propose a biologically inspired method for tool-augmented LLMs, simulated trial and error (STE).
STE orchestrates three key mechanisms behind successful tool-use behaviors in biological systems: trial and error, imagination, and memory.
arXiv Detail & Related papers (2024-03-07T18:50:51Z)
- EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction [56.02100384015907]
EasyTool is a framework that transforms diverse and lengthy tool documentation into unified, concise tool instructions.
It can significantly reduce token consumption and improve the performance of tool utilization in real-world scenarios.
arXiv Detail & Related papers (2024-01-11T15:45:11Z)
- MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use [82.24774504584066]
Large language models (LLMs) have garnered significant attention due to their impressive natural language processing (NLP) capabilities.
We introduce MetaTool, a benchmark designed to evaluate whether LLMs have tool usage awareness and can correctly choose tools.
We conduct experiments involving eight popular LLMs and find that the majority of them still struggle to effectively select tools.
arXiv Detail & Related papers (2023-10-04T19:39:26Z)
- CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models [74.22729793816451]
Large Language Models (LLMs) have made significant progress in utilizing tools, but their ability is limited by API availability.
We propose CREATOR, a novel framework that enables LLMs to create their own tools using documentation and code realization.
We evaluate CREATOR on the MATH and TabMWP benchmarks, which consist of challenging math competition problems and tabular math word problems, respectively.
arXiv Detail & Related papers (2023-05-23T17:51:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.