MAG-V: A Multi-Agent Framework for Synthetic Data Generation and Verification
- URL: http://arxiv.org/abs/2412.04494v2
- Date: Fri, 10 Jan 2025 22:22:41 GMT
- Title: MAG-V: A Multi-Agent Framework for Synthetic Data Generation and Verification
- Authors: Saptarshi Sengupta, Harsh Vashistha, Kristal Curtis, Akshay Mallipeddi, Abhinav Mathur, Joseph Ross, Liang Gou,
- Abstract summary: MAG-V is a framework to generate a dataset of questions that mimic customer queries.
Our synthetic data can improve agent performance on actual customer queries.
- Score: 5.666070277424383
- Abstract: Extending the capabilities of Large Language Models (LLMs) with functions or tools for environment interaction has led to the emergence of the agent paradigm. In industry, training an LLM is not always feasible because of the scarcity of domain data, legal holds on proprietary customer data, rapidly changing business requirements, and the need to prototype new assistants. Agents provide an elegant solution to the above by relying on the zero-shot reasoning abilities of the underlying LLM and utilizing tools to explore and reason over customer data and respond to user requests. However, there are two concerns here: (I) acquiring large-scale customer queries for agent testing is time-consuming, and (II) high reliance on the tool call sequence (or trajectory) followed by the agent to respond to user queries may lead to unexpected or incorrect behavior. To address this, we propose MAG-V, a multi-agent framework to first generate a dataset of questions that mimic customer queries, and second, reverse-engineer alternate questions from the responses for trajectory verification. Initial results indicate that our synthetic data can improve agent performance on actual customer queries. Furthermore, our trajectory verification methodology, inspired by distant supervision and using traditional machine learning (ML) models, outperforms a GPT-4o judge baseline by 11% in accuracy and matches the performance of a GPT-4 judge on our constructed dataset. Overall, our approach is a step towards unifying diverse task agents into a cohesive framework for achieving an aligned objective.
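To make the verification idea concrete, below is a minimal sketch of distant-supervision-style trajectory checking with a traditional ML model: the tool-call trajectories produced for original and reverse-engineered questions are featurized, and a lightweight classifier predicts whether a candidate trajectory is consistent. The features, toy data, and choice of classifier are illustrative assumptions, not the paper's actual pipeline.

```python
# Illustrative sketch of trajectory verification with a traditional ML model.
# The feature set and classifier are assumptions; MAG-V's actual features
# and models may differ.
from difflib import SequenceMatcher
from sklearn.ensemble import RandomForestClassifier

def featurize(traj_a, traj_b):
    """Compare two tool-call trajectories (lists of tool names)."""
    sm = SequenceMatcher(None, traj_a, traj_b)
    overlap = len(set(traj_a) & set(traj_b)) / max(len(set(traj_a) | set(traj_b)), 1)
    return [
        float(traj_a == traj_b),         # exact trajectory match
        sm.ratio(),                      # ordered sequence similarity
        overlap,                         # Jaccard overlap of tools used
        abs(len(traj_a) - len(traj_b)),  # length difference
    ]

# Toy training data: (reference trajectory, candidate trajectory, consistent?)
pairs = [
    (["search", "fetch", "answer"], ["search", "fetch", "answer"], 1),
    (["search", "fetch", "answer"], ["search", "answer"], 1),
    (["search", "fetch", "answer"], ["calc", "calc", "answer"], 0),
    (["lookup", "answer"], ["lookup", "answer"], 1),
    (["lookup", "answer"], ["search", "fetch", "fetch"], 0),
]
X = [featurize(a, b) for a, b, _ in pairs]
y = [label for _, _, label in pairs]

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict([featurize(["search", "answer"], ["search", "answer"])]))  # -> [1]
```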
Related papers
- Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks.
However, they still struggle with problems requiring multi-step decision-making and environmental feedback.
We propose a framework that can automatically learn a reward model from the environment without human annotations.
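As a rough illustration of that idea, the sketch below learns a reward model purely from environment outcomes: random rollouts in a toy environment are labeled by the environment's own success signal, and a simple classifier is fit to score trajectories. The environment, features, and model are invented for illustration and do not reflect the paper's method.

```python
# Minimal sketch of learning a reward model from environment feedback alone
# (no human labels). Everything here is a placeholder toy setup.
import random
from sklearn.linear_model import LogisticRegression

def rollout(env_step, policy, horizon=5):
    """Collect one trajectory plus the environment's terminal success signal."""
    traj, state = [], 0
    for _ in range(horizon):
        action = policy(state)
        state, done = env_step(state, action)
        traj.append(action)
        if done:
            break
    success = int(state >= 3)  # environment-provided outcome, not a human label
    return traj, success

# Toy environment: action 1 makes progress, action 0 does not.
env_step = lambda s, a: (s + a, s + a >= 3)
policy = lambda s: random.randint(0, 1)

random.seed(0)
data = [rollout(env_step, policy) for _ in range(200)]
X = [[sum(t), len(t)] for t, _ in data]  # crude trajectory features
y = [s for _, s in data]

reward_model = LogisticRegression().fit(X, y)
# The learned model can now score unseen trajectories for planning.
print(reward_model.predict_proba([[3, 3]])[0, 1])
```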
arXiv Detail & Related papers (2025-02-17T18:49:25Z)
- Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning [71.2981957820888]
We propose a novel Star-Agents framework, which automates the enhancement of data quality across datasets.
The framework initially generates diverse instruction data with multiple LLM agents through a bespoke sampling method.
The generated data undergo a rigorous evaluation using a dual-model method that assesses both difficulty and quality.
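A hedged sketch of such a dual-model filter: two independent evaluators score each generated instruction for difficulty and quality, and only samples clearing both thresholds are kept. The heuristic scorers and thresholds below stand in for the LLM judges the framework actually uses.

```python
# Dual-model filtering sketch: keep only samples that pass both a difficulty
# judge and a quality judge. The scorers are placeholder heuristics.
def judge_difficulty(sample: str) -> float:
    return min(len(sample.split()) / 20, 1.0)    # placeholder for an LLM judge

def judge_quality(sample: str) -> float:
    return 1.0 if sample.endswith("?") else 0.5  # placeholder for an LLM judge

def dual_model_filter(samples, d_min=0.3, q_min=0.8):
    return [s for s in samples
            if judge_difficulty(s) >= d_min and judge_quality(s) >= q_min]

generated = [
    "What is 2 + 2?",
    "Explain why transformer attention scales quadratically with sequence length?",
    "hi",
]
print(dual_model_filter(generated))  # keeps only the harder, well-formed question
```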
arXiv Detail & Related papers (2024-11-21T02:30:53Z)
- AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents [52.13695464678006]
This study enhances an LLM-based web agent by simply refining its observation and action space.
AgentOccam surpasses the previous state-of-the-art and concurrent work by 9.8 (+29.4%) and 5.9 (+15.8%) absolute points, respectively.
arXiv Detail & Related papers (2024-10-17T17:50:38Z)
- RAG based Question-Answering for Contextual Response Prediction System [0.4660328753262075]
Large Language Models (LLMs) have shown versatility in various Natural Language Processing (NLP) tasks.
Retrieval Augmented Generation (RAG) emerges as a promising technique to address this challenge.
This paper introduces an end-to-end framework that employs LLMs with RAG capabilities for industry use cases.
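For readers unfamiliar with the pattern, a minimal RAG loop looks roughly like the sketch below: retrieve the passages most relevant to a query, then condition generation on them. The TF-IDF retriever, example documents, and stubbed `generate` function are assumptions for illustration, not the paper's implementation.

```python
# Minimal RAG sketch: TF-IDF retrieval followed by a stubbed generation step.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "Refunds are processed within 5 business days.",
    "Premium support is available 24/7 by phone.",
    "Passwords can be reset from the account settings page.",
]

def retrieve(query: str, k: int = 1):
    """Return the k documents most similar to the query."""
    vec = TfidfVectorizer().fit(docs + [query])
    d, q = vec.transform(docs), vec.transform([query])
    scores = cosine_similarity(q, d)[0]
    return [docs[i] for i in scores.argsort()[::-1][:k]]

def generate(query: str, context: list) -> str:
    # Stand-in for an LLM call conditioned on the retrieved context.
    return f"Based on: {context[0]} -> answer to '{query}'"

query = "How long do refunds take?"
print(generate(query, retrieve(query)))
```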
arXiv Detail & Related papers (2024-09-05T17:14:23Z)
- AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning [93.96463520716759]
Large language model (LLM) agents have demonstrated impressive capabilities in utilizing external tools and knowledge to boost accuracy and reduce hallucinations.
Here, we introduce AvaTaR, a novel and automated framework that optimizes an LLM agent to effectively leverage provided tools, improving performance on a given task.
arXiv Detail & Related papers (2024-06-17T04:20:02Z)
- DCA-Bench: A Benchmark for Dataset Curation Agents [9.60250892491588]
We propose a dataset curation agent benchmark, DCA-Bench, to measure large language models' capability of detecting hidden dataset quality issues.
Specifically, we collect diverse real-world dataset quality issues from eight open dataset platforms as a testbed.
The proposed benchmark can also serve as a testbed for measuring the capability of LLMs in problem discovery rather than just problem-solving.
arXiv Detail & Related papers (2024-06-11T14:02:23Z)
- Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning [56.82041895921434]
Open-source pre-trained Large Language Models (LLMs) exhibit strong language understanding and generation capabilities.
When used as agents to deal with complex problems in the real world, their performance is far inferior to that of large commercial models such as ChatGPT and GPT-4.
arXiv Detail & Related papers (2024-03-29T03:48:12Z)
- MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization [86.61052121715689]
MatPlotAgent is a model-agnostic framework designed to automate scientific data visualization tasks.
MatPlotBench is a high-quality benchmark consisting of 100 human-verified test cases.
arXiv Detail & Related papers (2024-02-18T04:28:28Z)
- Hallucination-minimized Data-to-answer Framework for Financial Decision-makers [1.3781777926017094]
Large Language Models (LLMs) have been applied to build several automation and personalized question-answering prototypes so far.
We present a novel Langchain-based framework that transforms data tables into hierarchical textual data chunks to enable a wide variety of actionable question answering.
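The sketch below shows one plausible reading of "hierarchical textual data chunks": a table is flattened into a table-level summary chunk plus row-level chunks that can each be indexed for retrieval. The schema and wording are assumptions; the paper's LangChain pipeline may structure chunks differently.

```python
# Illustrative table-to-hierarchical-chunks transform (assumed structure):
# one table-level summary chunk, then one chunk per row linked to its parent.
table = {
    "name": "quarterly_revenue",
    "columns": ["quarter", "region", "revenue_musd"],
    "rows": [
        ["Q1", "EMEA", 12.4],
        ["Q1", "APAC", 9.1],
        ["Q2", "EMEA", 13.0],
    ],
}

def to_chunks(table):
    chunks = [{
        "level": "table",
        "text": f"Table {table['name']} with columns {', '.join(table['columns'])} "
                f"and {len(table['rows'])} rows.",
    }]
    for row in table["rows"]:
        pairs = "; ".join(f"{c}={v}" for c, v in zip(table["columns"], row))
        chunks.append({"level": "row", "parent": table["name"], "text": pairs})
    return chunks

for chunk in to_chunks(table):
    print(chunk)
```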
arXiv Detail & Related papers (2023-11-09T22:53:52Z)
- ChatGPT as your Personal Data Scientist [0.9689893038619583]
This paper introduces a ChatGPT-based conversational data-science framework that acts as a "personal data scientist".
Our model pivots around four dialogue states: Data Visualization, Task Formulation, Prediction Engineering, and Result Summary and Recommendation.
In summary, we developed an end-to-end system that not only proves the viability of the novel concept of conversational data science but also underscores the potency of LLMs in solving complex tasks.
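As a toy rendering of those four dialogue states, the sketch below wires them into a simple state machine; the fixed transition order and the stubbed handler are simplifying assumptions, since the real system routes turns with an LLM.

```python
# Four-state dialogue loop (assumed transitions; the actual system is
# LLM-driven rather than a fixed cycle).
from enum import Enum, auto

class State(Enum):
    DATA_VISUALIZATION = auto()
    TASK_FORMULATION = auto()
    PREDICTION_ENGINEERING = auto()
    RESULT_SUMMARY = auto()

NEXT = {
    State.DATA_VISUALIZATION: State.TASK_FORMULATION,
    State.TASK_FORMULATION: State.PREDICTION_ENGINEERING,
    State.PREDICTION_ENGINEERING: State.RESULT_SUMMARY,
    State.RESULT_SUMMARY: State.DATA_VISUALIZATION,  # loop back for a new task
}

def handle(state: State, user_turn: str) -> str:
    # Stand-in for state-specific prompting of the underlying model.
    return f"[{state.name}] responding to: {user_turn}"

state = State.DATA_VISUALIZATION
for turn in ["show me the data", "predict churn", "build features", "summarize"]:
    print(handle(state, turn))
    state = NEXT[state]
```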
arXiv Detail & Related papers (2023-05-23T04:00:16Z)