PEER: Expertizing Domain-Specific Tasks with a Multi-Agent Framework and Tuning Methods
- URL: http://arxiv.org/abs/2407.06985v4
- Date: Fri, 30 Aug 2024 06:27:58 GMT
- Title: PEER: Expertizing Domain-Specific Tasks with a Multi-Agent Framework and Tuning Methods
- Authors: Yiying Wang, Xiaojing Li, Binzhu Wang, Yueyang Zhou, Yingru Lin, Han Ji, Hong Chen, Jinshi Zhang, Fei Yu, Zewei Zhao, Song Jin, Renji Gong, Wanqing Xu,
- Abstract summary: GPT-4 shows notable potential but faces the critical tri-lemma of performance, cost, and data privacy.
We introduce the PEER (Plan, Execute, Express, Review) multi-agent framework.
This systematizes domain-specific tasks by integrating precise question decomposition, advanced information retrieval, comprehensive summarization, and rigorous self-assessment.
- Score: 9.604121358026303
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In domain-specific applications, GPT-4, augmented with precise prompts or Retrieval-Augmented Generation (RAG), shows notable potential but faces the critical tri-lemma of performance, cost, and data privacy. High performance requires sophisticated processing techniques, yet managing multiple agents within a complex workflow often proves costly and challenging. To address this, we introduce the PEER (Plan, Execute, Express, Review) multi-agent framework. This systematizes domain-specific tasks by integrating precise question decomposition, advanced information retrieval, comprehensive summarization, and rigorous self-assessment. Given the concerns of cost and data privacy, enterprises are shifting from proprietary models like GPT-4 to custom models, striking a balance between cost, security, and performance. We developed industrial practices leveraging online data and user feedback for efficient model tuning. This study provides best practice guidelines for applying multi-agent systems in domain-specific problem-solving and implementing effective agent tuning strategies. Our empirical studies, particularly in the financial question-answering domain, demonstrate that our approach achieves 95.0% of GPT-4's performance, while effectively managing costs and ensuring data privacy.
Related papers
- Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning [71.2981957820888]
We propose a novel Star-Agents framework, which automates the enhancement of data quality across datasets.
The framework initially generates diverse instruction data with multiple LLM agents through a bespoke sampling method.
The generated data undergo a rigorous evaluation using a dual-model method that assesses both difficulty and quality.
arXiv Detail & Related papers (2024-11-21T02:30:53Z) - The Effect of Scheduling and Preemption on the Efficiency of LLM Inference Serving [8.552242818726347]
INFERMAX is an analytical framework that uses inference cost models to compare various schedulers.
Our findings indicate that preempting requests can reduce GPU costs by 30% compared to avoiding preemptions at all.
arXiv Detail & Related papers (2024-11-12T00:10:34Z) - CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments [90.29937153770835]
We introduce CRMArena, a benchmark designed to evaluate AI agents on realistic tasks grounded in professional work environments.
We show that state-of-the-art LLM agents succeed in less than 40% of the tasks with ReAct prompting, and less than 55% even with function-calling abilities.
Our findings highlight the need for enhanced agent capabilities in function-calling and rule-following to be deployed in real-world work environments.
arXiv Detail & Related papers (2024-11-04T17:30:51Z) - Towards Human-Level Understanding of Complex Process Engineering Schematics: A Pedagogical, Introspective Multi-Agent Framework for Open-Domain Question Answering [0.0]
In the chemical and process industries, Process Flow Diagrams (PFDs) and Piping and Instrumentation Diagrams (P&IDs) are critical for design, construction, and maintenance.
Recent advancements in Generative AI have shown promise in understanding and interpreting process diagrams for Visual Question Answering (VQA)
We propose a secure, on-premises enterprise solution using a hierarchical, multi-agent Retrieval Augmented Generation (RAG) framework.
arXiv Detail & Related papers (2024-08-24T19:34:04Z) - MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains [54.117238759317004]
Massive Multitask Agent Understanding (MMAU) benchmark features comprehensive offline tasks that eliminate the need for complex environment setups.
It evaluates models across five domains, including Tool-use, Directed Acyclic Graph (DAG) QA, Data Science and Machine Learning coding, Contest-level programming and Mathematics.
With a total of 20 meticulously designed tasks encompassing over 3K distinct prompts, MMAU provides a comprehensive framework for evaluating the strengths and limitations of LLM agents.
arXiv Detail & Related papers (2024-07-18T00:58:41Z) - Enhancing the Efficiency and Accuracy of Underlying Asset Reviews in Structured Finance: The Application of Multi-agent Framework [3.022596401099308]
We show that AI can automate the verification of information between loan applications and bank statements effectively.
This research highlights AI's potential to minimize manual errors and streamline due diligence, suggesting a broader application of AI in financial document analysis and risk management.
arXiv Detail & Related papers (2024-05-07T13:09:49Z) - Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning [56.82041895921434]
Open-source pre-trained Large Language Models (LLMs) exhibit strong language understanding and generation capabilities.
When used as agents for dealing with complex problems in the real world, their performance is far inferior to large commercial models such as ChatGPT and GPT-4.
arXiv Detail & Related papers (2024-03-29T03:48:12Z) - Dial-insight: Fine-tuning Large Language Models with High-Quality Domain-Specific Data Preventing Capability Collapse [4.98050508891467]
We propose a two-stage approach for the construction of production prompts designed to yield high-quality data.
This method involves the generation of a diverse array of prompts that encompass a broad spectrum of tasks and exhibit a rich variety of expressions.
We introduce a cost-effective, multi-dimensional quality assessment framework to ensure the integrity of the generated labeling data.
arXiv Detail & Related papers (2024-03-14T08:27:32Z) - Physics-Aware Multifidelity Bayesian Optimization: a Generalized Formulation [0.0]
Multifidelity Bayesian methods (MFBO) allow to include costly high-fidelity responses for a sub-selection of queries only.
State-of-the-art methods rely on a purely data-driven search and do not include explicit information about the physical context.
This paper acknowledges that prior knowledge about the physical domains of engineering problems can be leveraged to accelerate these data-driven searches.
arXiv Detail & Related papers (2023-12-10T09:11:53Z) - Optimal Event Monitoring through Internet Mashup over Multivariate Time
Series [77.34726150561087]
This framework supports the services of model definitions, querying, parameter learning, model evaluations, data monitoring, decision recommendations, and web portals.
We further extend the MTSA data model and query language to support this class of problems for the services of learning, monitoring, and recommendation.
arXiv Detail & Related papers (2022-10-18T16:56:17Z) - Reinforcement Learning with Efficient Active Feature Acquisition [59.91808801541007]
In real-life, information acquisition might correspond to performing a medical test on a patient.
We propose a model-based reinforcement learning framework that learns an active feature acquisition policy.
Key to the success is a novel sequential variational auto-encoder that learns high-quality representations from partially observed states.
arXiv Detail & Related papers (2020-11-02T08:46:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.