Related papers: PEER: Expertizing Domain-Specific Tasks with a Multi-Agent Framework and Tuning Methods

Related papers

Efficient Multimodal Planning Agent for Visual Question-Answering [67.26245301307539]
This paper proposes a method that trains a multimodal planning agent, dynamically decomposing the mRAG pipeline to solve the VQA task.<n>In our experiments, the agent can help reduce redundant computations, cutting search time by over 60% compared to existing methods.
arXiv Detail & Related papers (2026-01-28T14:58:59Z)
Reliable LLM-Based Edge-Cloud-Expert Cascades for Telecom Knowledge Systems [54.916243942641444]
Large language models (LLMs) are emerging as key enablers of automation in domains such as telecommunications.<n>We study an edge-cloud-expert cascaded LLM-based knowledge system that supports decision-making through a question-and-answer pipeline.
arXiv Detail & Related papers (2025-12-23T03:10:09Z)
Building Domain-Specific Small Language Models via Guided Data Generation [5.404790079646315]
Large Language Models (LLMs) have shown remarkable success in supporting a wide range of knowledge-intensive tasks.<n>In specialized domains, there is growing interest in leveraging LLMs to assist subject matter experts with domain-specific challenges.<n>Many open-source models demand significant computational resources for effective domain adaptation and deployment.<n>We present a cost-efficient and scalable training pipeline that combines guided synthetic data generation from a small seed corpus with bottom-up domain data.
arXiv Detail & Related papers (2025-11-23T07:19:31Z)
AstuteRAG-FQA: Task-Aware Retrieval-Augmented Generation Framework for Proprietary Data Challenges in Financial Question Answering [0.0]
We introduce AstuteRAG-FQA, an adaptive RAG framework tailored for Financial Question Answering (FQA)<n>We propose a four-tier task classification: explicit factual, implicit factual, interpretable rationale, and hidden rationale involving implicit causal reasoning.<n>The framework incorporates multi-layered security mechanisms including differential privacy, data anonymization, and role-based access controls to protect sensitive financial information.
arXiv Detail & Related papers (2025-10-31T15:13:03Z)
ORMind: A Cognitive-Inspired End-to-End Reasoning Framework for Operations Research [53.736407871322314]
We introduce ORMind, a cognitive-inspired framework that enhances optimization through counterfactual reasoning.<n>Our approach emulates human cognition, implementing an end-to-end workflow that transforms requirements into mathematical models and executable code.<n>It is currently being tested internally in Lenovo's AI Assistant, with plans to enhance optimization capabilities for both business and consumer customers.
arXiv Detail & Related papers (2025-06-02T05:11:21Z)
Hard Negative Mining for Domain-Specific Retrieval in Enterprise Systems [2.4830284216463]
We propose a scalable hard-negative mining framework tailored specifically for domain-specific enterprise data.<n>Our approach dynamically selects semantically challenging but contextually irrelevant documents to enhance deployed re-ranking models.
arXiv Detail & Related papers (2025-05-23T20:51:20Z)
Review, Refine, Repeat: Understanding Iterative Decoding of AI Agents with Dynamic Evaluation and Selection [71.92083784393418]
Inference-time methods such as Best-of-N (BON) sampling offer a simple yet effective alternative to improve performance. We propose Iterative Agent Decoding (IAD) which combines iterative refinement with dynamic candidate evaluation and selection guided by a verifier.
arXiv Detail & Related papers (2025-04-02T17:40:47Z)
Empowering Large Language Models in Wireless Communication: A Novel Dataset and Fine-Tuning Framework [81.29965270493238]
We develop a specialized dataset aimed at enhancing the evaluation and fine-tuning of large language models (LLMs) for wireless communication applications. The dataset includes a diverse set of multi-hop questions, including true/false and multiple-choice types, spanning varying difficulty levels from easy to hard. We introduce a Pointwise V-Information (PVI) based fine-tuning method, providing a detailed theoretical analysis and justification for its use in quantifying the information content of training data.
arXiv Detail & Related papers (2025-01-16T16:19:53Z)
MAG-V: A Multi-Agent Framework for Synthetic Data Generation and Verification [5.666070277424383]
MAG-V is a framework to generate a dataset of questions that mimic customer queries. Our synthetic data can improve agent performance on actual customer queries.
arXiv Detail & Related papers (2024-11-28T19:36:11Z)
Multi-modal Retrieval Augmented Multi-modal Generation: Datasets, Evaluation Metrics and Strong Baselines [64.61315565501681]
Multi-modal Retrieval Augmented Multi-modal Generation (M$2$RAG) is a novel task that enables foundation models to process multi-modal web content. Despite its potential impact, M$2$RAG remains understudied, lacking comprehensive analysis and high-quality data resources.
arXiv Detail & Related papers (2024-11-25T13:20:19Z)
Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning [71.2981957820888]
We propose a novel Star-Agents framework, which automates the enhancement of data quality across datasets. The framework initially generates diverse instruction data with multiple LLM agents through a bespoke sampling method. The generated data undergo a rigorous evaluation using a dual-model method that assesses both difficulty and quality.
arXiv Detail & Related papers (2024-11-21T02:30:53Z)
The Effect of Scheduling and Preemption on the Efficiency of LLM Inference Serving [8.552242818726347]
INFERMAX is an analytical framework that uses inference cost models to compare various schedulers. Our findings indicate that preempting requests can reduce GPU costs by 30% compared to avoiding preemptions at all.
arXiv Detail & Related papers (2024-11-12T00:10:34Z)
CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments [90.29937153770835]
We introduce CRMArena, a benchmark designed to evaluate AI agents on realistic tasks grounded in professional work environments. We show that state-of-the-art LLM agents succeed in less than 40% of the tasks with ReAct prompting, and less than 55% even with function-calling abilities. Our findings highlight the need for enhanced agent capabilities in function-calling and rule-following to be deployed in real-world work environments.
arXiv Detail & Related papers (2024-11-04T17:30:51Z)
MetaTrading: An Immersion-Aware Model Trading Framework for Vehicular Metaverse Services [94.61039892220037]
We propose an immersion-aware model trading framework that facilitates data provision for services while ensuring privacy through federated learning (FL) We design an incentive mechanism to incentivize metaverse users (MUs) to contribute high-value models under resource constraints. We develop a fully distributed dynamic reward algorithm based on deep reinforcement learning, without accessing any private information about MUs and other MSPs.
arXiv Detail & Related papers (2024-10-25T16:20:46Z)
Optimizing Cross-Client Domain Coverage for Federated Instruction Tuning of Large Language Models [87.49293964617128]
Federated domain-specific instruction tuning (FedDIT) for large language models (LLMs) aims to enhance performance in specialized domains using distributed private and limited data.<n>We empirically establish that cross-client domain coverage, rather than data heterogeneity, is the pivotal factor.<n>We introduce FedDCA, an algorithm that explicitly maximizes this coverage through diversity-oriented client center selection and retrieval-based augmentation.
arXiv Detail & Related papers (2024-09-30T09:34:31Z)
Towards Human-Level Understanding of Complex Process Engineering Schematics: A Pedagogical, Introspective Multi-Agent Framework for Open-Domain Question Answering [0.0]
In the chemical and process industries, Process Flow Diagrams (PFDs) and Piping and Instrumentation Diagrams (P&IDs) are critical for design, construction, and maintenance. Recent advancements in Generative AI have shown promise in understanding and interpreting process diagrams for Visual Question Answering (VQA) We propose a secure, on-premises enterprise solution using a hierarchical, multi-agent Retrieval Augmented Generation (RAG) framework.
arXiv Detail & Related papers (2024-08-24T19:34:04Z)
MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains [54.117238759317004]
Massive Multitask Agent Understanding (MMAU) benchmark features comprehensive offline tasks that eliminate the need for complex environment setups. It evaluates models across five domains, including Tool-use, Directed Acyclic Graph (DAG) QA, Data Science and Machine Learning coding, Contest-level programming and Mathematics. With a total of 20 meticulously designed tasks encompassing over 3K distinct prompts, MMAU provides a comprehensive framework for evaluating the strengths and limitations of LLM agents.
arXiv Detail & Related papers (2024-07-18T00:58:41Z)
Enhancing the Efficiency and Accuracy of Underlying Asset Reviews in Structured Finance: The Application of Multi-agent Framework [3.022596401099308]
We show that AI can automate the verification of information between loan applications and bank statements effectively. This research highlights AI's potential to minimize manual errors and streamline due diligence, suggesting a broader application of AI in financial document analysis and risk management.
arXiv Detail & Related papers (2024-05-07T13:09:49Z)
Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning [56.82041895921434]
Open-source pre-trained Large Language Models (LLMs) exhibit strong language understanding and generation capabilities. When used as agents for dealing with complex problems in the real world, their performance is far inferior to large commercial models such as ChatGPT and GPT-4.
arXiv Detail & Related papers (2024-03-29T03:48:12Z)
Dial-insight: Fine-tuning Large Language Models with High-Quality Domain-Specific Data Preventing Capability Collapse [4.98050508891467]
We propose a two-stage approach for the construction of production prompts designed to yield high-quality data. This method involves the generation of a diverse array of prompts that encompass a broad spectrum of tasks and exhibit a rich variety of expressions. We introduce a cost-effective, multi-dimensional quality assessment framework to ensure the integrity of the generated labeling data.
arXiv Detail & Related papers (2024-03-14T08:27:32Z)
Physics-Aware Multifidelity Bayesian Optimization: a Generalized Formulation [0.0]
Multifidelity Bayesian methods (MFBO) allow to include costly high-fidelity responses for a sub-selection of queries only. State-of-the-art methods rely on a purely data-driven search and do not include explicit information about the physical context. This paper acknowledges that prior knowledge about the physical domains of engineering problems can be leveraged to accelerate these data-driven searches.
arXiv Detail & Related papers (2023-12-10T09:11:53Z)
Optimal Event Monitoring through Internet Mashup over Multivariate Time Series [77.34726150561087]
This framework supports the services of model definitions, querying, parameter learning, model evaluations, data monitoring, decision recommendations, and web portals. We further extend the MTSA data model and query language to support this class of problems for the services of learning, monitoring, and recommendation.
arXiv Detail & Related papers (2022-10-18T16:56:17Z)
Reinforcement Learning with Efficient Active Feature Acquisition [59.91808801541007]
In real-life, information acquisition might correspond to performing a medical test on a patient. We propose a model-based reinforcement learning framework that learns an active feature acquisition policy. Key to the success is a novel sequential variational auto-encoder that learns high-quality representations from partially observed states.
arXiv Detail & Related papers (2020-11-02T08:46:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.