MAG-V: A Multi-Agent Framework for Synthetic Data Generation and Verification
- URL: http://arxiv.org/abs/2412.04494v2
- Date: Fri, 10 Jan 2025 22:22:41 GMT
- Title: MAG-V: A Multi-Agent Framework for Synthetic Data Generation and Verification
- Authors: Saptarshi Sengupta, Harsh Vashistha, Kristal Curtis, Akshay Mallipeddi, Abhinav Mathur, Joseph Ross, Liang Gou,
- Abstract summary: MAG-V is a framework to generate a dataset of questions that mimic customer queries.
Our synthetic data can improve agent performance on actual customer queries.
- Score: 5.666070277424383
- Abstract: Extending the capabilities of Large Language Models (LLMs) with functions or tools for environment interaction has led to the emergence of the agent paradigm. In industry, training an LLM is not always feasible because of the scarcity of domain data, legal holds on proprietary customer data, rapidly changing business requirements, and the need to prototype new assistants. Agents provide an elegant solution to the above by relying on the zero-shot reasoning abilities of the underlying LLM and utilizing tools to explore and reason over customer data and respond to user requests. However, there are two concerns here: (I) acquiring large-scale customer queries for agent testing is time-consuming, and (II) high reliance on the tool call sequence (or trajectory) followed by the agent to respond to user queries may lead to unexpected or incorrect behavior. To address this, we propose MAG-V, a multi-agent framework to first generate a dataset of questions that mimic customer queries, and second, reverse-engineer alternate questions from the responses for trajectory verification. Initial results indicate that our synthetic data can improve agent performance on actual customer queries. Furthermore, our trajectory verification methodology, inspired by distant supervision and using traditional machine learning (ML) models, outperforms a GPT-4o judge baseline by 11% in accuracy and matches the performance of a GPT-4 judge on our constructed dataset. Overall, our approach is a step towards unifying diverse task agents into a cohesive framework for achieving an aligned objective.
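To make the verification idea concrete, below is a minimal sketch of distant-supervision-style trajectory checking with a traditional ML model: the tool-call trajectories produced for original and reverse-engineered questions are featurized, and a lightweight classifier predicts whether a candidate trajectory is consistent. The features, toy data, and choice of classifier are illustrative assumptions, not the paper's actual pipeline.

```python
# Illustrative sketch of trajectory verification with a traditional ML model.
# The feature set and classifier are assumptions; MAG-V's actual features
# and models may differ.
from difflib import SequenceMatcher
from sklearn.ensemble import RandomForestClassifier

def featurize(traj_a, traj_b):
    """Compare two tool-call trajectories (lists of tool names)."""
    sm = SequenceMatcher(None, traj_a, traj_b)
    overlap = len(set(traj_a) & set(traj_b)) / max(len(set(traj_a) | set(traj_b)), 1)
    return [
        float(traj_a == traj_b),         # exact trajectory match
        sm.ratio(),                      # ordered sequence similarity
        overlap,                         # Jaccard overlap of tools used
        abs(len(traj_a) - len(traj_b)),  # length difference
    ]

# Toy training data: (reference trajectory, candidate trajectory, consistent?)
pairs = [
    (["search", "fetch", "answer"], ["search", "fetch", "answer"], 1),
    (["search", "fetch", "answer"], ["search", "answer"], 1),
    (["search", "fetch", "answer"], ["calc", "calc", "answer"], 0),
    (["lookup", "answer"], ["lookup", "answer"], 1),
    (["lookup", "answer"], ["search", "fetch", "fetch"], 0),
]
X = [featurize(a, b) for a, b, _ in pairs]
y = [label for _, _, label in pairs]

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict([featurize(["search", "answer"], ["search", "answer"])]))  # -> [1]
```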
Related papers
- Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks.
However, they still struggle with problems requiring multi-step decision-making and environmental feedback.
We propose a framework that can automatically learn a reward model from the environment without human annotations.
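As a rough illustration of that idea, the sketch below learns a reward model purely from environment outcomes: random rollouts in a toy environment are labeled by the environment's own success signal, and a simple classifier is fit to score trajectories. The environment, features, and model are invented for illustration and do not reflect the paper's method.

```python
# Minimal sketch of learning a reward model from environment feedback alone
# (no human labels). Everything here is a placeholder toy setup.
import random
from sklearn.linear_model import LogisticRegression

def rollout(env_step, policy, horizon=5):
    """Collect one trajectory plus the environment's terminal success signal."""
    traj, state = [], 0
    for _ in range(horizon):
        action = policy(state)
        state, done = env_step(state, action)
        traj.append(action)
        if done:
            break
    success = int(state >= 3)  # environment-provided outcome, not a human label
    return traj, success

# Toy environment: action 1 makes progress, action 0 does not.
env_step = lambda s, a: (s + a, s + a >= 3)
policy = lambda s: random.randint(0, 1)

random.seed(0)
data = [rollout(env_step, policy) for _ in range(200)]
X = [[sum(t), len(t)] for t, _ in data]  # crude trajectory features
y = [s for _, s in data]

reward_model = LogisticRegression().fit(X, y)
# The learned model can now score unseen trajectories for planning.
print(reward_model.predict_proba([[3, 3]])[0, 1])
```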
arXiv Detail & Related papers (2025-02-17T18:49:25Z)
- Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning [71.2981957820888]
We propose a novel Star-Agents framework, which automates the enhancement of data quality across datasets.
The framework initially generates diverse instruction data with multiple LLM agents through a bespoke sampling method.
The generated data undergo a rigorous evaluation using a dual-model method that assesses both difficulty and quality.
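A hedged sketch of such a dual-model filter: two independent evaluators score each generated instruction for difficulty and quality, and only samples clearing both thresholds are kept. The heuristic scorers and thresholds below stand in for the LLM judges the framework actually uses.

```python
# Dual-model filtering sketch: keep only samples that pass both a difficulty
# judge and a quality judge. The scorers are placeholder heuristics.
def judge_difficulty(sample: str) -> float:
    return min(len(sample.split()) / 20, 1.0)    # placeholder for an LLM judge

def judge_quality(sample: str) -> float:
    return 1.0 if sample.endswith("?") else 0.5  # placeholder for an LLM judge

def dual_model_filter(samples, d_min=0.3, q_min=0.8):
    return [s for s in samples
            if judge_difficulty(s) >= d_min and judge_quality(s) >= q_min]

generated = [
    "What is 2 + 2?",
    "Explain why transformer attention scales quadratically with sequence length?",
    "hi",
]
print(dual_model_filter(generated))  # keeps only the harder, well-formed question
```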
arXiv Detail & Related papers (2024-11-21T02:30:53Z)
- AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents [52.13695464678006]
This study enhances an LLM-based web agent by simply refining its observation and action space.
AgentOccam surpasses the previous state-of-the-art and concurrent work by 9.8 (+29.4%) and 5.9 (+15.8%) absolute points, respectively.
arXiv Detail & Related papers (2024-10-17T17:50:38Z)
- RAG based Question-Answering for Contextual Response Prediction System [0.4660328753262075]
Large Language Models (LLMs) have shown versatility in various Natural Language Processing (NLP) tasks.
Retrieval Augmented Generation (RAG) emerges as a promising technique to address this challenge.
This paper introduces an end-to-end framework that employs LLMs with RAG capabilities for industry use cases.
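For readers unfamiliar with the pattern, a minimal RAG loop looks roughly like the sketch below: retrieve the passages most relevant to a query, then condition generation on them. The TF-IDF retriever, example documents, and stubbed `generate` function are assumptions for illustration, not the paper's implementation.

```python
# Minimal RAG sketch: TF-IDF retrieval followed by a stubbed generation step.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "Refunds are processed within 5 business days.",
    "Premium support is available 24/7 by phone.",
    "Passwords can be reset from the account settings page.",
]

def retrieve(query: str, k: int = 1):
    """Return the k documents most similar to the query."""
    vec = TfidfVectorizer().fit(docs + [query])
    d, q = vec.transform(docs), vec.transform([query])
    scores = cosine_similarity(q, d)[0]
    return [docs[i] for i in scores.argsort()[::-1][:k]]

def generate(query: str, context: list) -> str:
    # Stand-in for an LLM call conditioned on the retrieved context.
    return f"Based on: {context[0]} -> answer to '{query}'"

query = "How long do refunds take?"
print(generate(query, retrieve(query)))
```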
arXiv Detail & Related papers (2024-09-05T17:14:23Z)
- AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning [93.96463520716759]
Large language model (LLM) agents have demonstrated impressive capabilities in utilizing external tools and knowledge to boost accuracy and reduce hallucinations.
Here, we introduce AvaTaR, a novel and automated framework that optimizes an LLM agent to effectively leverage provided tools, improving performance on a given task.
arXiv Detail & Related papers (2024-06-17T04:20:02Z)
- DCA-Bench: A Benchmark for Dataset Curation Agents [9.60250892491588]
We propose a dataset curation agent benchmark, DCA-Bench, to measure large language models' capability of detecting hidden dataset quality issues.
Specifically, we collect diverse real-world dataset quality issues from eight open dataset platforms as a testbed.
The proposed benchmark can also serve as a testbed for measuring the capability of LLMs in problem discovery rather than just problem-solving.
arXiv Detail & Related papers (2024-06-11T14:02:23Z)
- Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning [56.82041895921434]
Open-source pre-trained Large Language Models (LLMs) exhibit strong language understanding and generation capabilities.
When used as agents to deal with complex problems in the real world, their performance is far inferior to that of large commercial models such as ChatGPT and GPT-4.
arXiv Detail & Related papers (2024-03-29T03:48:12Z)
- MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization [86.61052121715689]
MatPlotAgent is a model-agnostic framework designed to automate scientific data visualization tasks.
MatPlotBench is a high-quality benchmark consisting of 100 human-verified test cases.
arXiv Detail & Related papers (2024-02-18T04:28:28Z)
- Hallucination-minimized Data-to-answer Framework for Financial Decision-makers [1.3781777926017094]
Large Language Models (LLMs) have been applied to build several automation and personalized question-answering prototypes so far.
We present a novel Langchain-based framework that transforms data tables into hierarchical textual data chunks to enable a wide variety of actionable question answering.
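The sketch below shows one plausible reading of "hierarchical textual data chunks": a table is flattened into a table-level summary chunk plus row-level chunks that can each be indexed for retrieval. The schema and wording are assumptions; the paper's LangChain pipeline may structure chunks differently.

```python
# Illustrative table-to-hierarchical-chunks transform (assumed structure):
# one table-level summary chunk, then one chunk per row linked to its parent.
table = {
    "name": "quarterly_revenue",
    "columns": ["quarter", "region", "revenue_musd"],
    "rows": [
        ["Q1", "EMEA", 12.4],
        ["Q1", "APAC", 9.1],
        ["Q2", "EMEA", 13.0],
    ],
}

def to_chunks(table):
    chunks = [{
        "level": "table",
        "text": f"Table {table['name']} with columns {', '.join(table['columns'])} "
                f"and {len(table['rows'])} rows.",
    }]
    for row in table["rows"]:
        pairs = "; ".join(f"{c}={v}" for c, v in zip(table["columns"], row))
        chunks.append({"level": "row", "parent": table["name"], "text": pairs})
    return chunks

for chunk in to_chunks(table):
    print(chunk)
```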
arXiv Detail & Related papers (2023-11-09T22:53:52Z)
- ChatGPT as your Personal Data Scientist [0.9689893038619583]
This paper introduces a ChatGPT-based conversational data-science framework that acts as a "personal data scientist".
Our model pivots around four dialogue states: Data Visualization, Task Formulation, Prediction Engineering, and Result Summary and Recommendation.
In summary, we developed an end-to-end system that not only proves the viability of the novel concept of conversational data science but also underscores the potency of LLMs in solving complex tasks.
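As a toy rendering of those four dialogue states, the sketch below wires them into a simple state machine; the fixed transition order and the stubbed handler are simplifying assumptions, since the real system routes turns with an LLM.

```python
# Four-state dialogue loop (assumed transitions; the actual system is
# LLM-driven rather than a fixed cycle).
from enum import Enum, auto

class State(Enum):
    DATA_VISUALIZATION = auto()
    TASK_FORMULATION = auto()
    PREDICTION_ENGINEERING = auto()
    RESULT_SUMMARY = auto()

NEXT = {
    State.DATA_VISUALIZATION: State.TASK_FORMULATION,
    State.TASK_FORMULATION: State.PREDICTION_ENGINEERING,
    State.PREDICTION_ENGINEERING: State.RESULT_SUMMARY,
    State.RESULT_SUMMARY: State.DATA_VISUALIZATION,  # loop back for a new task
}

def handle(state: State, user_turn: str) -> str:
    # Stand-in for state-specific prompting of the underlying model.
    return f"[{state.name}] responding to: {user_turn}"

state = State.DATA_VISUALIZATION
for turn in ["show me the data", "predict churn", "build features", "summarize"]:
    print(handle(state, turn))
    state = NEXT[state]
```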
arXiv Detail & Related papers (2023-05-23T04:00:16Z)