Efficient Multi-Agent System Training with Data Influence-Oriented Tree Search
- URL: http://arxiv.org/abs/2502.00955v1
- Date: Sun, 02 Feb 2025 23:20:16 GMT
- Title: Efficient Multi-Agent System Training with Data Influence-Oriented Tree Search
- Authors: Wentao Shi, Zichun Yu, Fuli Feng, Xiangnan He, Chenyan Xiong,
- Abstract summary: We propose Data Influence-oriented Tree Search (DITS) to guide both tree search and data selection.
By leveraging influence scores, we effectively identify the most impactful data for system improvement.
We derive influence score estimation methods tailored for non-differentiable metrics.
- Score: 59.75749613951193
- License:
- Abstract: Monte Carlo Tree Search (MCTS) based methods provide promising approaches for generating synthetic data to enhance the self-training of Large Language Model (LLM) based multi-agent systems (MAS). These methods leverage Q-values to estimate individual agent contributions. However, relying solely on Q-values to identify informative data may misalign with the data synthesis objective, as the focus should be on selecting data that best enhances model training. To address this discrepancy, we propose Data Influence-oriented Tree Search (DITS), a novel framework that incorporates influence scores to guide both tree search and data selection. By leveraging influence scores, we effectively identify the most impactful data for system improvement, thereby enhancing model performance. Furthermore, we derive influence score estimation methods tailored for non-differentiable metrics, significantly reducing computational overhead by utilizing inference computations. Extensive experiments on eight multi-agent datasets demonstrate the robustness and effectiveness of the proposed methods. Notably, our findings reveal that allocating more inference resources to estimate influence scores, rather than Q-values, during data synthesis can more effectively and efficiently enhance model training.
Related papers
- Data-Efficient Pretraining with Group-Level Data Influence Modeling [49.18903821780051]
Group-Level Data Influence Modeling (Group-MATES) is a novel data-efficient pretraining method.
Group-MATES collects oracle group-level influences by locally probing the pretraining model with data sets.
It then fine-tunes a relational data influence model to approximate oracles as relationship-weighted aggregations of individual influences.
arXiv Detail & Related papers (2025-02-20T16:34:46Z) - Capturing the Temporal Dependence of Training Data Influence [100.91355498124527]
We formalize the concept of trajectory-specific leave-one-out influence, which quantifies the impact of removing a data point during training.
We propose data value embedding, a novel technique enabling efficient approximation of trajectory-specific LOO.
As data value embedding captures training data ordering, it offers valuable insights into model training dynamics.
arXiv Detail & Related papers (2024-12-12T18:28:55Z) - Optimizing LLMs with Direct Preferences: A Data Efficiency Perspective [4.548047308860141]
This study investigates the impact of different type of preference data on model performance.
It aims to reduce their dependency on extensive amounts of preference data, which is expensive to collect.
arXiv Detail & Related papers (2024-10-22T00:11:41Z) - Unveiling the Flaws: Exploring Imperfections in Synthetic Data and Mitigation Strategies for Large Language Models [89.88010750772413]
Synthetic data has been proposed as a solution to address the issue of high-quality data scarcity in the training of large language models (LLMs)
Our work delves into these specific flaws associated with question-answer (Q-A) pairs, a prevalent type of synthetic data, and presents a method based on unlearning techniques to mitigate these flaws.
Our work has yielded key insights into the effective use of synthetic data, aiming to promote more robust and efficient LLM training.
arXiv Detail & Related papers (2024-06-18T08:38:59Z) - Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning [55.96599486604344]
We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process.
We use Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals.
The proposed algorithm employs Direct Preference Optimization (DPO) to update the LLM policy using this newly generated step-level preference data.
arXiv Detail & Related papers (2024-05-01T11:10:24Z) - C-XGBoost: A tree boosting model for causal effect estimation [8.246161706153805]
Causal effect estimation aims at estimating the Average Treatment Effect as well as the Conditional Average Treatment Effect of a treatment to an outcome from the available data.
We propose a new causal inference model, named C-XGBoost, for the prediction of potential outcomes.
arXiv Detail & Related papers (2024-03-31T17:43:37Z) - Striving for data-model efficiency: Identifying data externalities on
group performance [75.17591306911015]
Building trustworthy, effective, and responsible machine learning systems hinges on understanding how differences in training data and modeling decisions interact to impact predictive performance.
We focus on a particular type of data-model inefficiency, in which adding training data from some sources can actually lower performance evaluated on key sub-groups of the population.
Our results indicate that data-efficiency is a key component of both accurate and trustworthy machine learning.
arXiv Detail & Related papers (2022-11-11T16:48:27Z) - Improving Data Quality with Training Dynamics of Gradient Boosting
Decision Trees [1.5605040219256345]
We propose a method based on metrics from training dynamics of Gradient Boosting Decision Trees (GBDTs) to assess the behavior of each training example.
We show results on detecting noisy labels in order clean datasets, improving models' metrics in synthetic and real public datasets, as well as on a industry case in which we deployed a model based on the proposed solution.
arXiv Detail & Related papers (2022-10-20T15:02:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.