MathViz-E: A Case-study in Domain-Specialized Tool-Using Agents
- URL: http://arxiv.org/abs/2407.17544v1
- Date: Wed, 24 Jul 2024 15:45:07 GMT
- Title: MathViz-E: A Case-study in Domain-Specialized Tool-Using Agents
- Authors: Arya Bulusu, Brandon Man, Ashish Jagmohan, Aditya Vempaty, Jennifer Mari-Wyka, Deepak Akkil,
- Abstract summary: We present an automated math visualizer and solver system for mathematical pedagogy.
The system orchestrates mathematical solvers and math graphing tools to produce accurate visualizations from simple natural language commands.
We describe the creation of specialized data-sets, and also develop an auto-evaluator to easily evaluate the outputs of our system.
- Score: 1.1962302221087486
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There has been significant recent interest in harnessing LLMs to control software systems through multi-step reasoning, planning and tool-usage. While some promising results have been obtained, application to specific domains raises several general issues including the control of specialized domain tools, the lack of existing datasets for training and evaluation, and the non-triviality of automated system evaluation and improvement. In this paper, we present a case-study where we examine these issues in the context of a specific domain. Specifically, we present an automated math visualizer and solver system for mathematical pedagogy. The system orchestrates mathematical solvers and math graphing tools to produce accurate visualizations from simple natural language commands. We describe the creation of specialized data-sets, and also develop an auto-evaluator to easily evaluate the outputs of our system by comparing them to ground-truth expressions. We have open sourced the data-sets and code for the proposed system.
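The abstract mentions an auto-evaluator that compares system outputs to ground-truth expressions but gives no detail on how the comparison is done. As an illustration only, and not the authors' method, the sketch below checks whether two single-variable expressions agree numerically at random sample points. The function names, the whitelist of allowed names, the sampling range, and the tolerance are all assumptions made for this example.

```python
import ast
import math
import random

# Hypothetical whitelist of names an expression may use (an assumption of this sketch).
ALLOWED_NAMES = {"x": None, "sin": math.sin, "cos": math.cos,
                 "sqrt": math.sqrt, "pi": math.pi}
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Call, ast.Name, ast.Load,
    ast.Constant, ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Pow,
    ast.USub, ast.UAdd,
)


def compile_expression(text):
    """Parse an expression in x, rejecting syntax and names outside the whitelist."""
    tree = ast.parse(text.replace("^", "**"), mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"unsupported syntax: {type(node).__name__}")
        if isinstance(node, ast.Name) and node.id not in ALLOWED_NAMES:
            raise ValueError(f"unknown name: {node.id}")
    return compile(tree, "<expr>", "eval")


def evaluate(code, x):
    """Evaluate a compiled expression at x with builtins disabled."""
    env = dict(ALLOWED_NAMES, x=x)
    return eval(code, {"__builtins__": {}}, env)


def expressions_match(predicted, ground_truth, samples=50, tol=1e-9, seed=0):
    """Return True if both expressions agree at every usable sample point."""
    rng = random.Random(seed)
    pred_code = compile_expression(predicted)
    true_code = compile_expression(ground_truth)
    compared = 0
    for _ in range(samples):
        x = rng.uniform(-10.0, 10.0)
        try:
            a = evaluate(pred_code, x)
            b = evaluate(true_code, x)
        except (ValueError, ZeroDivisionError, OverflowError):
            continue  # point lies outside the domain of one of the expressions
        if isinstance(a, complex) or isinstance(b, complex):
            continue  # skip points where a power produced a complex value
        if not math.isclose(a, b, rel_tol=tol, abs_tol=tol):
            return False
        compared += 1
    # Require at least one successful comparison before declaring a match.
    return compared > 0


print(expressions_match("(x+1)^2", "x^2 + 2*x + 1"))
print(expressions_match("x^2", "x^2 + 1"))
```

Numerical sampling can only suggest equivalence, not prove it, and this sketch skips points where either expression is undefined, so two expressions with different domains could still be reported as matching. A symbolic comparison with a computer algebra system would be a stricter alternative.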
Related papers
- SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation.
Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
arXiv Detail & Related papers (2024-08-28T06:33:03Z)
- MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains [54.117238759317004]
Massive Multitask Agent Understanding (MMAU) benchmark features comprehensive offline tasks that eliminate the need for complex environment setups.
It evaluates models across five domains, including Tool-use, Directed Acyclic Graph (DAG) QA, Data Science and Machine Learning coding, Contest-level programming and Mathematics.
With a total of 20 meticulously designed tasks encompassing over 3K distinct prompts, MMAU provides a comprehensive framework for evaluating the strengths and limitations of LLM agents.
arXiv Detail & Related papers (2024-07-18T00:58:41Z)
- MARIO Eval: Evaluate Your Math LLM with your Math LLM--A mathematical dataset evaluation toolkit [4.957099360745168]
Large language models (LLMs) have been explored in a variety of reasoning tasks including solving of mathematical problems.
We introduce a comprehensive mathematical evaluation toolkit that not only utilizes a python computer algebra system (CAS) for its numerical accuracy, but also integrates an optional LLM.
arXiv Detail & Related papers (2024-04-22T07:03:44Z)
- Towards MLOps: A DevOps Tools Recommender System for Machine Learning System [1.065497990128313]
Unlike traditional systems, which evolve with changing requirements, MLOps and machine learning systems evolve on new data.
In this paper, we present a framework for a recommendation system that processes contextual information.
Four different approaches, namely rule-based, random forest, decision trees and k-nearest neighbors, were investigated.
arXiv Detail & Related papers (2024-02-20T09:57:49Z)
- Machine Learning Augmented Branch and Bound for Mixed Integer Linear Programming [11.293025183996832]
Mixed Integer Linear Programming (MILP) offers a powerful modeling language for a wide range of applications.
In recent years, there has been an explosive development in the use of machine learning algorithms for enhancing all main tasks involved in the branch-and-bound algorithm.
In particular, we give detailed attention to machine learning algorithms that automatically optimize some metric of branch-and-bound efficiency.
arXiv Detail & Related papers (2024-02-08T09:19:26Z)
- Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios [93.68764280953624]
UltraTool is a novel benchmark designed to improve and evaluate Large Language Models' ability in tool utilization.
It emphasizes real-world complexities, demanding accurate, multi-step planning for effective problem-solving.
A key feature of UltraTool is its independent evaluation of planning with natural language, which happens before tool usage.
arXiv Detail & Related papers (2024-01-30T16:52:56Z)
- A Graphical Modeling Language for Artificial Intelligence Applications in Automation Systems [69.50862982117127]
An interdisciplinary graphical modeling language that enables the modeling of an AI application as an overall system comprehensible to all disciplines does not yet exist.
This paper presents a graphical modeling language that enables consistent and understandable modeling of AI applications in automation systems at system level.
arXiv Detail & Related papers (2023-06-20T12:06:41Z)
- Advancing Reacting Flow Simulations with Data-Driven Models [50.9598607067535]
Key to effective use of machine learning tools in multi-physics problems is to couple them to physical and computer models.
The present chapter reviews some of the open opportunities for the application of data-driven reduced-order modeling of combustion systems.
arXiv Detail & Related papers (2022-09-05T16:48:34Z)
- Improving Search by Utilizing State Information in OPTIC Planners Compilation to LP [1.9686770963118378]
Many planners are domain-independent, allowing their deployment in a variety of domains.
These planners perform Forward Search and call a Linear Programming (LP) solver multiple times at every state to check for consistency and to set bounds on the numeric variables.
This paper suggests a method for identifying information about the specific state being evaluated, allowing the formulation of the equations to facilitate better solver selection and faster LP solving.
arXiv Detail & Related papers (2021-06-15T07:23:31Z)
- Automated Machine Learning Techniques for Data Streams [91.3755431537592]
This paper surveys the state-of-the-art open-source AutoML tools, applies them to data collected from streams, and measures how their performance changes over time.
The results show that off-the-shelf AutoML tools can provide satisfactory results but in the presence of concept drift, detection or adaptation techniques have to be applied to maintain the predictive accuracy over time.
arXiv Detail & Related papers (2021-06-14T11:42:46Z)
- AHMoSe: A Knowledge-Based Visual Support System for Selecting Regression Machine Learning Models [2.9998889086656577]
AHMoSe is a visual support system that allows domain experts to better understand, diagnose and compare different regression models.
We describe a use case scenario in the viticulture domain, grape quality prediction, where the system enables users to diagnose and select prediction models that perform better.
arXiv Detail & Related papers (2021-01-28T12:55:06Z)