MathViz-E: A Case-study in Domain-Specialized Tool-Using Agents
- URL: http://arxiv.org/abs/2407.17544v1
- Date: Wed, 24 Jul 2024 15:45:07 GMT
- Title: MathViz-E: A Case-study in Domain-Specialized Tool-Using Agents
- Authors: Arya Bulusu, Brandon Man, Ashish Jagmohan, Aditya Vempaty, Jennifer Mari-Wyka, Deepak Akkil,
- Abstract summary: We present an automated math visualizer and solver system for mathematical pedagogy.
The system orchestrates mathematical solvers and math graphing tools to produce accurate visualizations from simple natural language commands.
We describe the creation of specialized data-sets, and also develop an auto-evaluator to easily evaluate the outputs of our system.
- Score: 1.1962302221087486
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There has been significant recent interest in harnessing LLMs to control software systems through multi-step reasoning, planning and tool-usage. While some promising results have been obtained, application to specific domains raises several general issues including the control of specialized domain tools, the lack of existing datasets for training and evaluation, and the non-triviality of automated system evaluation and improvement. In this paper, we present a case-study where we examine these issues in the context of a specific domain. Specifically, we present an automated math visualizer and solver system for mathematical pedagogy. The system orchestrates mathematical solvers and math graphing tools to produce accurate visualizations from simple natural language commands. We describe the creation of specialized data-sets, and also develop an auto-evaluator to easily evaluate the outputs of our system by comparing them to ground-truth expressions. We have open sourced the data-sets and code for the proposed system.
Related papers
- Data for Mathematical Copilots: Better Ways of Presenting Proofs for Machine Learning [85.635988711588]
We argue that enhancing the capabilities of large language models requires a paradigm shift in the design of mathematical datasets.
We advocate for mathematical dataset developers to consider the concept of "motivated proof", introduced by G. P'olya in 1949, which can serve as a blueprint for datasets that offer a better proof learning signal.
We provide a questionnaire designed specifically for math datasets that we urge creators to include with their datasets.
arXiv Detail & Related papers (2024-12-19T18:55:17Z) - Learning to Solve Domain-Specific Calculation Problems with Knowledge-Intensive Programs Generator [33.680619900836376]
We propose a pipeline to solve the domain-specific calculation problems with Knowledge-Intensive Programs Generator.
It generates knowledge-intensive programs according to the domain-specific documents.
We also find that the code generator is also adaptable to other domains, without training on the new knowledge.
arXiv Detail & Related papers (2024-12-12T13:42:58Z) - SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation.
Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
arXiv Detail & Related papers (2024-08-28T06:33:03Z) - MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains [54.117238759317004]
Massive Multitask Agent Understanding (MMAU) benchmark features comprehensive offline tasks that eliminate the need for complex environment setups.
It evaluates models across five domains, including Tool-use, Directed Acyclic Graph (DAG) QA, Data Science and Machine Learning coding, Contest-level programming and Mathematics.
With a total of 20 meticulously designed tasks encompassing over 3K distinct prompts, MMAU provides a comprehensive framework for evaluating the strengths and limitations of LLM agents.
arXiv Detail & Related papers (2024-07-18T00:58:41Z) - MARIO Eval: Evaluate Your Math LLM with your Math LLM--A mathematical dataset evaluation toolkit [4.957099360745168]
Large language models (LLMs) have been explored in a variety of reasoning tasks including solving of mathematical problems.
We introduce a comprehensive mathematical evaluation toolkit that not only utilizes a python computer algebra system (CAS) for its numerical accuracy, but also integrates an optional LLM.
arXiv Detail & Related papers (2024-04-22T07:03:44Z) - Machine Learning Augmented Branch and Bound for Mixed Integer Linear
Programming [11.293025183996832]
Mixed Linear Programming (MILP) offers a powerful modeling language for a wide range of applications.
In recent years, there has been an explosive development in the use of machine learning algorithms for enhancing all main tasks involved in the branch-and-bound algorithm.
In particular, we give detailed attention to machine learning algorithms that automatically optimize some metric of branch-and-bound efficiency.
arXiv Detail & Related papers (2024-02-08T09:19:26Z) - Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios [93.68764280953624]
UltraTool is a novel benchmark designed to improve and evaluate Large Language Models' ability in tool utilization.
It emphasizes real-world complexities, demanding accurate, multi-step planning for effective problem-solving.
A key feature of UltraTool is its independent evaluation of planning with natural language, which happens before tool usage.
arXiv Detail & Related papers (2024-01-30T16:52:56Z) - Advancing Reacting Flow Simulations with Data-Driven Models [50.9598607067535]
Key to effective use of machine learning tools in multi-physics problems is to couple them to physical and computer models.
The present chapter reviews some of the open opportunities for the application of data-driven reduced-order modeling of combustion systems.
arXiv Detail & Related papers (2022-09-05T16:48:34Z) - Improving Search by Utilizing State Information in OPTIC Planners
Compilation to LP [1.9686770963118378]
Many planners are domain-independent, allowing their deployment in a variety of domains.
These planners perform Forward Search and call a Linear Programming (LP) solver multiple times at every state to check for consistency and to set bounds on the numeric variables.
This paper suggests a method for identifying information about the specific state being evaluated, allowing the formulation of the equations to facilitate better solver selection and faster LP solving.
arXiv Detail & Related papers (2021-06-15T07:23:31Z) - Automated Machine Learning Techniques for Data Streams [91.3755431537592]
This paper surveys the state-of-the-art open-source AutoML tools, applies them to data collected from streams, and measures how their performance changes over time.
The results show that off-the-shelf AutoML tools can provide satisfactory results but in the presence of concept drift, detection or adaptation techniques have to be applied to maintain the predictive accuracy over time.
arXiv Detail & Related papers (2021-06-14T11:42:46Z) - AHMoSe: A Knowledge-Based Visual Support System for Selecting Regression
Machine Learning Models [2.9998889086656577]
AHMoSe is a visual support system that allows domain experts to better understand, diagnose and compare different regression models.
We describe a use case scenario in the viticulture domain, grape quality prediction, where the system enables users to diagnose and select prediction models that perform better.
arXiv Detail & Related papers (2021-01-28T12:55:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.