MathViz-E: A Case-study in Domain-Specialized Tool-Using Agents
- URL: http://arxiv.org/abs/2407.17544v1
- Date: Wed, 24 Jul 2024 15:45:07 GMT
- Title: MathViz-E: A Case-study in Domain-Specialized Tool-Using Agents
- Authors: Arya Bulusu, Brandon Man, Ashish Jagmohan, Aditya Vempaty, Jennifer Mari-Wyka, Deepak Akkil,
- Abstract summary: We present an automated math visualizer and solver system for mathematical pedagogy.
The system orchestrates mathematical solvers and math graphing tools to produce accurate visualizations from simple natural language commands.
We describe the creation of specialized data-sets, and also develop an auto-evaluator to easily evaluate the outputs of our system.
- Score: 1.1962302221087486
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There has been significant recent interest in harnessing LLMs to control software systems through multi-step reasoning, planning and tool-usage. While some promising results have been obtained, application to specific domains raises several general issues including the control of specialized domain tools, the lack of existing datasets for training and evaluation, and the non-triviality of automated system evaluation and improvement. In this paper, we present a case-study where we examine these issues in the context of a specific domain. Specifically, we present an automated math visualizer and solver system for mathematical pedagogy. The system orchestrates mathematical solvers and math graphing tools to produce accurate visualizations from simple natural language commands. We describe the creation of specialized data-sets, and also develop an auto-evaluator to easily evaluate the outputs of our system by comparing them to ground-truth expressions. We have open sourced the data-sets and code for the proposed system.
Related papers
- Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? [73.81908518992161]
We introduce Spider2-V, the first multimodal agent benchmark focusing on professional data science and engineering.
Spider2-V features real-world tasks in authentic computer environments and incorporating 20 enterprise-level professional applications.
These tasks evaluate the ability of a multimodal agent to perform data-related tasks by writing code and managing the GUI in enterprise data software systems.
arXiv Detail & Related papers (2024-07-15T17:54:37Z) - MARIO Eval: Evaluate Your Math LLM with your Math LLM--A mathematical dataset evaluation toolkit [4.957099360745168]
Large language models (LLMs) have been explored in a variety of reasoning tasks including solving of mathematical problems.
We introduce a comprehensive mathematical evaluation toolkit that not only utilizes a python computer algebra system (CAS) for its numerical accuracy, but also integrates an optional LLM.
arXiv Detail & Related papers (2024-04-22T07:03:44Z) - Towards MLOps: A DevOps Tools Recommender System for Machine Learning
System [1.065497990128313]
MLOps and machine learning systems evolve on new data unlike traditional systems on requirements.
In this paper, we present a framework for recommendation system that processes the contextual information.
Four different approaches i.e., rule-based, random forest, decision trees and k-nearest neighbors were investigated.
arXiv Detail & Related papers (2024-02-20T09:57:49Z) - Machine Learning Augmented Branch and Bound for Mixed Integer Linear
Programming [11.293025183996832]
Mixed Linear Programming (MILP) offers a powerful modeling language for a wide range of applications.
In recent years, there has been an explosive development in the use of machine learning algorithms for enhancing all main tasks involved in the branch-and-bound algorithm.
In particular, we give detailed attention to machine learning algorithms that automatically optimize some metric of branch-and-bound efficiency.
arXiv Detail & Related papers (2024-02-08T09:19:26Z) - Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios [93.68764280953624]
UltraTool is a novel benchmark designed to improve and evaluate Large Language Models' ability in tool utilization.
It emphasizes real-world complexities, demanding accurate, multi-step planning for effective problem-solving.
A key feature of UltraTool is its independent evaluation of planning with natural language, which happens before tool usage.
arXiv Detail & Related papers (2024-01-30T16:52:56Z) - ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving [170.7899683843177]
ToRA is a series of Tool-integrated Reasoning Agents designed to solve challenging mathematical problems.
ToRA models significantly outperform open-source models on 10 mathematical reasoning datasets across all scales.
ToRA-Code-34B is the first open-source model that achieves an accuracy exceeding 50% on MATH.
arXiv Detail & Related papers (2023-09-29T17:59:38Z) - A Graphical Modeling Language for Artificial Intelligence Applications
in Automation Systems [69.50862982117127]
An interdisciplinary graphical modeling language that enables the modeling of an AI application as an overall system comprehensible to all disciplines does not yet exist.
This paper presents a graphical modeling language that enables consistent and understandable modeling of AI applications in automation systems at system level.
arXiv Detail & Related papers (2023-06-20T12:06:41Z) - Advancing Reacting Flow Simulations with Data-Driven Models [50.9598607067535]
Key to effective use of machine learning tools in multi-physics problems is to couple them to physical and computer models.
The present chapter reviews some of the open opportunities for the application of data-driven reduced-order modeling of combustion systems.
arXiv Detail & Related papers (2022-09-05T16:48:34Z) - Improving Search by Utilizing State Information in OPTIC Planners
Compilation to LP [1.9686770963118378]
Many planners are domain-independent, allowing their deployment in a variety of domains.
These planners perform Forward Search and call a Linear Programming (LP) solver multiple times at every state to check for consistency and to set bounds on the numeric variables.
This paper suggests a method for identifying information about the specific state being evaluated, allowing the formulation of the equations to facilitate better solver selection and faster LP solving.
arXiv Detail & Related papers (2021-06-15T07:23:31Z) - Automated Machine Learning Techniques for Data Streams [91.3755431537592]
This paper surveys the state-of-the-art open-source AutoML tools, applies them to data collected from streams, and measures how their performance changes over time.
The results show that off-the-shelf AutoML tools can provide satisfactory results but in the presence of concept drift, detection or adaptation techniques have to be applied to maintain the predictive accuracy over time.
arXiv Detail & Related papers (2021-06-14T11:42:46Z) - AHMoSe: A Knowledge-Based Visual Support System for Selecting Regression
Machine Learning Models [2.9998889086656577]
AHMoSe is a visual support system that allows domain experts to better understand, diagnose and compare different regression models.
We describe a use case scenario in the viticulture domain, grape quality prediction, where the system enables users to diagnose and select prediction models that perform better.
arXiv Detail & Related papers (2021-01-28T12:55:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.