Debate, Reflect, and Distill: Multi-Agent Feedback with Tree-Structured Preference Optimization for Efficient Language Model Enhancement
- URL: http://arxiv.org/abs/2506.03541v1
- Date: Wed, 04 Jun 2025 03:52:20 GMT
- Title: Debate, Reflect, and Distill: Multi-Agent Feedback with Tree-Structured Preference Optimization for Efficient Language Model Enhancement
- Authors: Xiaofeng Zhou, Heyan Huang, Lizi Liao,
- Abstract summary: Large Language Models (LLMs) continue to set new standards in knowledge-intensive and complex reasoning tasks. Current techniques, such as static knowledge distillation, resource-intensive reinforcement learning from human feedback, or limited self-reflection, struggle to yield substantial and lasting performance gains. We present a novel Debate and Reflect (D&R) framework that orchestrates multi-turn debates between smaller models and stronger teacher models, eliciting actionable feedback.
- Score: 43.532921045069365
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) continue to set new standards in knowledge-intensive and complex reasoning tasks, yet their high computational demands limit widespread adoption. While distilling large models into smaller ones offers a sustainable solution, current techniques--such as static knowledge distillation, resource-intensive reinforcement learning from human feedback, or limited self-reflection--struggle to yield substantial and lasting performance gains. In this paper, we present a novel Debate and Reflect (D&R) framework that orchestrates multi-turn debates between smaller models and stronger teacher models, eliciting actionable feedback (e.g., error analysis, corrective strategies) to guide student models. Further, we introduce Tree-structured Direct Preference Optimization (T-DPO) to efficiently leverage these debate logs, organizing interactions into a hierarchical format for effective training. Empirical evaluations across diverse NLP benchmarks demonstrate that our approach significantly improves smaller-model accuracy, robustness, and generalization, outperforming conventional baselines by a large margin.
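The abstract describes the pipeline only at a high level, so the following is a minimal illustrative sketch rather than the authors' released code: it assumes debate logs stored as a tree of answer nodes judged by the teacher, mines chosen/rejected pairs from sibling answers (a hypothetical pairing rule), and scores them with the standard DPO objective that T-DPO builds on. The class names, pairing rule, and beta value are assumptions made for illustration.

```python
# Minimal sketch of debate-log mining plus the DPO objective that T-DPO extends.
# Not the authors' implementation: node structure, pairing rule, and beta are assumptions.
import torch
import torch.nn.functional as F


class DebateNode:
    """One turn in a multi-turn debate: a student answer, the teacher's verdict,
    and the revised answers produced in response to the teacher's feedback."""

    def __init__(self, answer, correct, children=None):
        self.answer = answer
        self.correct = correct
        self.children = children or []


def mine_preference_pairs(node, pairs=None):
    """Hypothetical pairing rule: among the children of each node, pair every
    teacher-approved answer (chosen) with every rejected sibling."""
    if pairs is None:
        pairs = []
    winners = [c.answer for c in node.children if c.correct]
    losers = [c.answer for c in node.children if not c.correct]
    pairs.extend((w, l) for w in winners for l in losers)
    for child in node.children:
        mine_preference_pairs(child, pairs)
    return pairs


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss on summed log-probabilities of chosen/rejected responses."""
    margin = (policy_chosen_logp - ref_chosen_logp) - (policy_rejected_logp - ref_rejected_logp)
    return -F.logsigmoid(beta * margin).mean()


if __name__ == "__main__":
    # Toy debate tree: one question spawns a wrong answer and a corrected answer.
    root = DebateNode(answer="Q: 17 * 24 = ?", correct=False, children=[
        DebateNode("17 * 24 = 398", correct=False),
        DebateNode("17 * 24 = 408", correct=True),
    ])
    print(mine_preference_pairs(root))  # [('17 * 24 = 408', '17 * 24 = 398')]
    # Log-probabilities would normally come from the student and a frozen reference model.
    loss = dpo_loss(torch.tensor([-12.3]), torch.tensor([-15.8]),
                    torch.tensor([-13.0]), torch.tensor([-14.9]))
    print(float(loss))
```

In the paper's setting, each node would presumably also carry the teacher's error analysis and corrective strategy, and the hierarchical organization means preferences are compared among answers sharing the same debate context rather than across unrelated samples.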
Related papers
- Model Steering: Learning with a Reference Model Improves Generalization Bounds and Scaling Laws [52.10468229008941]
This paper formalizes an emerging learning paradigm that uses a trained model as a reference to guide and enhance the training of a target model through strategic data selection or weighting. We provide theoretical insights into why this approach improves generalization and data efficiency compared to training without a reference model. Building on these insights, we introduce a novel method for Contrastive Language-Image Pretraining with a reference model, termed DRRho-CLIP.
arXiv Detail & Related papers (2025-05-10T16:55:03Z)
- Detect, Explain, Escalate: Low-Carbon Dialogue Breakdown Management for LLM-Powered Agents [30.13634341221476]
Large Language Models (LLMs) are transforming numerous applications, but their susceptibility to conversational breakdowns remains a critical challenge undermining user trust. This paper introduces a "Detect, Explain, Escalate" framework to manage dialogue breakdowns in LLM-powered agents, emphasizing low-carbon operation.
arXiv Detail & Related papers (2025-04-26T07:51:05Z)
- Debate-Feedback: A Multi-Agent Framework for Efficient Legal Judgment Prediction [7.196065223124077]
We propose a novel legal judgment prediction model based on the Debate-Feedback architecture. Unlike traditional methods, our model achieves significant improvements in efficiency by minimizing the need for large historical datasets.
arXiv Detail & Related papers (2025-04-07T09:34:14Z)
- Distillation and Refinement of Reasoning in Small Language Models for Document Re-ranking [21.23826888841565]
We present a novel approach for training small language models for reasoning-intensive document ranking. We use web data and a teacher LLM to automatically generate high-quality training examples with relevance explanations. Our model ranks third on the leaderboard while using substantially fewer parameters than other approaches.
arXiv Detail & Related papers (2025-04-04T21:27:48Z)
- A Survey of Direct Preference Optimization [103.59317151002693]
Large Language Models (LLMs) have demonstrated unprecedented generative capabilities. Their alignment with human values remains critical for ensuring helpful and harmless deployments. Direct Preference Optimization (DPO) has recently gained prominence as a streamlined alternative.
arXiv Detail & Related papers (2025-03-12T08:45:15Z)
- Optimal Query Allocation in Extractive QA with LLMs: A Learning-to-Defer Framework with Theoretical Guarantees [3.4289478404209826]
Large Language Models excel in generative tasks but exhibit inefficiencies in structured text selection. We propose a Learning-to-Defer framework that allocates queries to specialized experts, ensuring high-confidence predictions.
arXiv Detail & Related papers (2024-10-21T08:21:00Z)
- Adversarial Multi-Agent Evaluation of Large Language Models through Iterative Debates [0.0]
We propose a framework that interprets large language models (LLMs) as advocates within an ensemble of interacting agents.
This approach offers a more dynamic and comprehensive evaluation process compared to traditional human-based assessments or automated metrics.
arXiv Detail & Related papers (2024-10-07T00:22:07Z)
- Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
In-Context Learning (ICL) and Parameter-Efficient Fine-Tuning (PEFT) are currently two mainstream methods for adapting LLMs to downstream tasks.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z)
- When Parameter-efficient Tuning Meets General-purpose Vision-language Models [65.19127815275307]
PETAL revolutionizes the training process by requiring only 0.5% of the total parameters, achieved through a unique mode approximation technique.
Our experiments reveal that PETAL not only outperforms current state-of-the-art methods in most scenarios but also surpasses full fine-tuning models in effectiveness.
arXiv Detail & Related papers (2023-12-16T17:13:08Z)
- RAVEN: In-Context Learning with Retrieval-Augmented Encoder-Decoder Language Models [57.12888828853409]
RAVEN is a model that combines retrieval-augmented masked language modeling and prefix language modeling.
Fusion-in-Context Learning enables the model to leverage more in-context examples without requiring additional training.
Our work underscores the potential of retrieval-augmented encoder-decoder language models for in-context learning.
arXiv Detail & Related papers (2023-08-15T17:59:18Z)
- Enhancing Dialogue Generation via Multi-Level Contrastive Learning [57.005432249952406]
We propose a multi-level contrastive learning paradigm to model the fine-grained quality of the responses with respect to the query.
A Rank-aware Calibration (RC) network is designed to construct the multi-level contrastive optimization objectives.
We build a Knowledge Inference (KI) component to capture the keyword knowledge from the reference during training and exploit such information to encourage the generation of informative words.
arXiv Detail & Related papers (2020-09-19T02:41:04Z)