Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation
- URL: http://arxiv.org/abs/2411.00412v1
- Date: Fri, 01 Nov 2024 07:18:31 GMT
- Title: Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation
- Authors: Bohan Lyu, Yadi Cao, Duncan Watson-Parris, Leon Bergen, Taylor Berg-Kirkpatrick, Rose Yu
- Abstract summary: Large Language Models (LLMs) demonstrate promising capabilities in solving simple scientific problems.
Human experts first assess problem complexity using domain knowledge before choosing an appropriate solution approach.
We propose a novel two-component fine-tuning method.
Our models demonstrate a 28.18% improvement in answer accuracy and a 13.89% increase in tool usage precision across all datasets.
- Abstract: Large Language Models (LLMs) demonstrate promising capabilities in solving simple scientific problems but often produce hallucinations for complex ones. While integrating LLMs with tools can increase reliability, this approach typically results in over-reliance on tools, diminishing the model's ability to solve simple problems through basic reasoning. In contrast, human experts first assess problem complexity using domain knowledge before choosing an appropriate solution approach. Inspired by this human problem-solving process, we propose a novel two-component fine-tuning method. In the first component, World Knowledge Distillation (WKD), LLMs learn directly from solutions generated using tools' information to internalize domain knowledge. In the second component, Tool Usage Adaptation (TUA), we partition problems into easy and hard categories based on the model's direct answering accuracy. While maintaining the same alignment target for easy problems as in WKD, we train the model to intelligently switch to tool usage for more challenging problems. We validate our method on six scientific benchmark datasets spanning mathematics, climate science, and epidemiology. On average, our models demonstrate a 28.18% improvement in answer accuracy and a 13.89% increase in tool usage precision across all datasets, surpassing state-of-the-art models including GPT-4o and Claude-3.5.
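The TUA partition described in the abstract can be read as a data-construction step: problems the model already answers well keep the WKD-style direct-answer target, while the rest are re-targeted to a tool-use trace. Below is a minimal Python sketch of that split, assuming per-problem direct-answer accuracy estimates are available and using an illustrative 0.5 threshold; the field names, tool-trace format, and demo examples are hypothetical and not taken from the paper.
```python
# Hypothetical sketch of the easy/hard split used for Tool Usage Adaptation (TUA).
# Assumptions (not from the paper): a 0.5 accuracy threshold, string-valued tool
# traces, and a simple prompt/target pair format for supervised fine-tuning.
from dataclasses import dataclass
from typing import List


@dataclass
class Problem:
    question: str
    direct_accuracy: float   # measured accuracy when the model answers without tools
    direct_solution: str     # tool-informed solution text used as the WKD target
    tool_trace: str          # solution that explicitly invokes an external tool


def build_tua_targets(problems: List[Problem], threshold: float = 0.5) -> List[dict]:
    """Assign a fine-tuning target per problem based on the easy/hard partition."""
    examples = []
    for p in problems:
        if p.direct_accuracy >= threshold:
            # "Easy": keep the same alignment target as in WKD (direct answer).
            target = p.direct_solution
        else:
            # "Hard": train the model to switch to tool usage instead.
            target = p.tool_trace
        examples.append({"prompt": p.question, "target": target})
    return examples


if __name__ == "__main__":
    demo = [
        Problem("2 + 2 = ?", 0.95,
                "The answer is 4.",
                "<tool>calculator(2+2)</tool> -> 4"),
        Problem("Project the regional temperature anomaly for 2100.", 0.10,
                "Roughly +2.7 C under a moderate-emissions scenario.",
                "<tool>climate_emulator(scenario='SSP2-4.5')</tool>"),
    ]
    for ex in build_tua_targets(demo):
        print(ex["prompt"], "->", ex["target"])
```
The resulting prompt/target pairs would then feed an ordinary supervised fine-tuning loop; the paper's actual training objectives and threshold choice are not reproduced here.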
Related papers
- OptiMUS-0.3: Using Large Language Models to Model and Solve Optimization Problems at Scale [16.33736498565436]
We introduce a Large Language Model (LLM)-based system designed to formulate and solve linear programming problems from their natural language descriptions.
Our system is capable of developing mathematical models, writing and debugging solver code, evaluating the generated solutions, and improving the efficiency and correctness of its model and code.
Experiments demonstrate that OptiMUS-0.3 outperforms existing state-of-the-art methods on easy datasets by more than 12% and on hard datasets by more than 8%.
arXiv Detail & Related papers (2024-07-29T01:31:45Z) - Deriva-ML: A Continuous FAIRness Approach to Reproducible Machine Learning Models [1.204452887718077]
We show how data management tools can significantly improve the quality of data that is used for machine learning (ML) applications.
We propose an architecture and implementation of such tools and demonstrate through two use cases how they can be used to improve ML-based eScience investigations.
arXiv Detail & Related papers (2024-06-27T04:42:29Z) - Learning Task Decomposition to Assist Humans in Competitive Programming [90.4846613669734]
We introduce a novel objective for learning task decomposition, termed assistive value (AssistV).
We collect a dataset of human repair experiences on different decomposed solutions.
Under 177 hours of human study, our method enables non-experts to solve 33.3% more problems, speeds them up by 3.3x, and empowers them to match unassisted experts.
arXiv Detail & Related papers (2024-06-07T03:27:51Z) - MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time [51.5039731721706]
MindStar is a purely inference-based search method for large language models.
It formulates reasoning tasks as searching problems and proposes two search ideas to identify the optimal reasoning paths.
It significantly enhances the reasoning abilities of open-source models, such as Llama-2-13B and Mistral-7B, and achieves comparable performance to GPT-3.5 and Grok-1.
arXiv Detail & Related papers (2024-05-25T15:07:33Z) - PETScML: Second-order solvers for training regression problems in Scientific Machine Learning [0.22499166814992438]
In recent years, we have witnessed the emergence of scientific machine learning as a data-driven analysis tool.
We introduce a software framework built on top of the Portable, Extensible Toolkit for Scientific Computation (PETSc) to bridge the gap between deep-learning software and conventional machine-learning techniques.
arXiv Detail & Related papers (2024-03-18T18:59:42Z) - Improving Large Language Model Fine-tuning for Solving Math Problems [20.417053742869403]
A large gap exists between large language models' pass-at-one and pass-at-N performance in solving math problems.
Using the challenging MATH dataset, we investigate three fine-tuning strategies.
We design a fine-tuning recipe that yields approximately 58.8% accuracy on the MATH dataset with fine-tuned PaLM 2-L models.
arXiv Detail & Related papers (2023-10-16T04:11:19Z) - ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving [170.7899683843177]
ToRA is a series of Tool-integrated Reasoning Agents designed to solve challenging mathematical problems.
ToRA models significantly outperform open-source models on 10 mathematical reasoning datasets across all scales.
ToRA-Code-34B is the first open-source model that achieves an accuracy exceeding 50% on MATH.
arXiv Detail & Related papers (2023-09-29T17:59:38Z) - Learning to Optimize Permutation Flow Shop Scheduling via Graph-based Imitation Learning [70.65666982566655]
Permutation flow shop scheduling (PFSS) is widely used in manufacturing systems.
We propose to train the model via expert-driven imitation learning, which makes convergence faster, more stable, and more accurate.
Our model's network parameters are reduced to only 37% of the previous state-of-the-art model's, and the average solution gap to the expert solutions decreases from 6.8% to 1.3%.
arXiv Detail & Related papers (2022-10-31T09:46:26Z) - Minimizing Entropy to Discover Good Solutions to Recurrent Mixed Integer Programs [0.0]
Current solvers for mixed-integer programming (MIP) problems are designed to perform well on a wide range of problems.
Recent works have shown that machine learning (ML) can be integrated with an MIP solver to inject domain knowledge and efficiently close the optimality gap.
This paper proposes an online solver that uses the notion of entropy to efficiently build a model with minimal training data and tuning.
arXiv Detail & Related papers (2022-02-07T18:52:56Z) - Measuring Mathematical Problem Solving With the MATH Dataset [55.4376028963537]
We introduce MATH, a dataset of 12,500 challenging competition mathematics problems.
Each problem has a full step-by-step solution which can be used to teach models to generate answer derivations and explanations.
We also contribute a large auxiliary pretraining dataset which helps teach models the fundamentals of mathematics.
arXiv Detail & Related papers (2021-03-05T18:59:39Z)