Aligning Compound AI Systems via System-level DPO
- URL: http://arxiv.org/abs/2502.17721v1
- Date: Mon, 24 Feb 2025 23:25:13 GMT
- Title: Aligning Compound AI Systems via System-level DPO
- Authors: Xiangwen Wang, Yibo Jacky Zhang, Zhoujie Ding, Katherine Tsai, Sanmi Koyejo
- Abstract summary: We propose a system-level DPO (SysDPO) to jointly align compound systems by adapting the DPO to operate on these DAGs. Our exploration provides insights into the alignment of compound AI systems and lays a foundation for future advancements.
- Score: 14.017369528123096
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Compound AI systems, comprising multiple interacting components such as LLM agents and external tools, demonstrate state-of-the-art results across diverse tasks. It is hence crucial to align components within the system to produce consistent results that match human expectations. However, conventional alignment methods, such as Direct Preference Optimization (DPO), are not directly applicable to compound AI systems. These challenges include the non-differentiable interactions between components, making end-to-end gradient optimization infeasible. Additionally, system-level preferences cannot be directly translated into component-level preferences, further complicating alignment. We address the issues by formulating compound AI systems as Directed Acyclic Graphs (DAGs), capturing the connections between agents and the data generation processes. We propose a system-level DPO (SysDPO) to jointly align compound systems by adapting the DPO to operate on these DAGs. We study the joint alignment of an LLM and a diffusion model to demonstrate the effectiveness of our approach. Our exploration provides insights into the alignment of compound AI systems and lays a foundation for future advancements.
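- Illustrative sketch: The abstract does not state the SysDPO objective, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes the system's log-probability factorizes over the DAG as a sum of per-node conditional log-probabilities, and that those per-node values are already available (for a diffusion model they would have to be approximated). The loss is the standard DPO loss for one preference pair. The function names `system_log_prob` and `dpo_loss`, the node names, and all numbers are hypothetical.

```python
import math


def system_log_prob(node_log_probs):
    """Sum per-node conditional log-probabilities (assumed DAG factorization)."""
    return sum(node_log_probs.values())


def dpo_loss(policy_w, policy_l, ref_w, ref_l, beta=0.1):
    """Standard DPO loss for one (preferred, rejected) pair of log-probabilities."""
    margin = beta * ((policy_w - ref_w) - (policy_l - ref_l))
    # -log(sigmoid(margin)), computed in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical two-node system: an LLM writes a caption, a diffusion model
# renders an image from it. All log-probabilities below are made-up values.
policy_preferred = system_log_prob({"llm": -12.0, "diffusion": -40.0})
policy_rejected = system_log_prob({"llm": -15.0, "diffusion": -47.0})
ref_preferred = system_log_prob({"llm": -13.0, "diffusion": -42.0})
ref_rejected = system_log_prob({"llm": -14.0, "diffusion": -45.0})

loss = dpo_loss(policy_preferred, policy_rejected, ref_preferred, ref_rejected)
print(f"system-level DPO loss (illustrative): {loss:.4f}")
```

Because the preference is expressed over whole-system outputs, a single loss value of this kind would drive the updates to every trainable node's parameters, which is one way to read "joint alignment" in the abstract.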
Related papers
- Performant LLM Agentic Framework for Conversational AI [1.6114012813668932]
We introduce the Performant Agentic Framework (PAF), a novel system that assists Large Language Models (LLMs) in selecting appropriate nodes and executing actions in order when traversing complex graphs.
PAF combines LLM-based reasoning with a mathematically grounded vector scoring mechanism, achieving both higher accuracy and reduced latency.
Experiments demonstrate that PAF significantly outperforms baseline methods, paving the way for scalable, real-time Conversational AI systems in complex business environments.
arXiv Detail & Related papers (2025-03-09T02:58:34Z) - How to Correctly do Semantic Backpropagation on Language-based Agentic Systems [23.4193991777817]
We formalize the concept of semantic backpropagation with semantic gradients. This serves as a method for computing directional information about how changes to each component might improve the system's output. Our results on both BIG-Bench Hard and GSM8K show that our approach outperforms existing state-of-the-art methods for solving GASO problems.
arXiv Detail & Related papers (2024-12-04T15:52:03Z) - LLM-based Optimization of Compound AI Systems: A Survey [64.39860384538338]
In a compound AI system, components such as an LLM call, a retriever, a code interpreter, or tools are interconnected.
Recent advancements enable end-to-end optimization of these parameters using an LLM.
This paper presents a survey of the principles and emerging trends in LLM-based optimization of compound AI systems.
arXiv Detail & Related papers (2024-10-21T18:06:25Z) - Adaptive Active Inference Agents for Heterogeneous and Lifelong Federated Learning [4.274943486546923]
We introduce a conceptual agent for heterogeneous pervasive systems that permits setting global systems constraints as high-level SLOs.
We conduct experiments on a physical testbed of devices with different resource types and vendor specifications.
The AIF agent can balance competing SLOs in resource heterogeneous environments to ensure up to 98% fulfillment rate.
arXiv Detail & Related papers (2024-10-09T10:43:29Z) - Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization [75.1240295759264]
We propose an effective framework for Bridging and Modeling Correlations in pairwise data, named BMC. We increase the consistency and informativeness of the pairwise preference signals through targeted modifications. We identify that DPO alone is insufficient to model these correlations and capture nuanced variations.
arXiv Detail & Related papers (2024-08-14T11:29:47Z) - Learning Reward and Policy Jointly from Demonstration and Preference Improves Alignment [58.049113055986375]
We develop a single-stage approach named Alignment with Integrated Human Feedback (AIHF) to train reward models and the policy. The proposed approach admits a suite of efficient algorithms, which can easily reduce to, and leverage, popular alignment algorithms. We demonstrate the efficiency of the proposed solutions with extensive experiments involving alignment problems in LLMs and robotic control problems in MuJoCo.
arXiv Detail & Related papers (2024-06-11T01:20:53Z) - Interactive System-wise Anomaly Detection [66.3766756452743]
Anomaly detection plays a fundamental role in various applications.
It is challenging for existing methods to handle the scenarios where the instances are systems whose characteristics are not readily observed as data.
We develop an end-to-end approach which includes an encoder-decoder module that learns system embeddings.
arXiv Detail & Related papers (2023-04-21T02:20:24Z) - Learning to Decouple Complex Systems [11.674072457685007]
We propose a sequential learning approach for handling irregularly sampled and cluttered sequential observations. We argue that the meta-system evolving within a simplex is governed by projected differential equations (ProjDEs).
arXiv Detail & Related papers (2023-02-03T07:24:58Z) - Quality-Based Conditional Processing in Multi-Biometrics: Application to
Sensor Interoperability [63.05238390013457]
We describe and evaluate the ATVS-UAM fusion approach submitted to the quality-based evaluation of the 2007 BioSecure Multimodal Evaluation Campaign.
Our approach is based on linear logistic regression, in which fused scores tend to be log-likelihood-ratios.
Results show that the proposed approach outperforms all the rule-based fusion schemes.
arXiv Detail & Related papers (2022-11-24T12:11:22Z) - DHA: End-to-End Joint Optimization of Data Augmentation Policy,
Hyper-parameter and Architecture [81.82173855071312]
We propose an end-to-end solution that integrates the AutoML components and returns a ready-to-use model at the end of the search.
DHA achieves state-of-the-art (SOTA) results on various datasets, notably 77.4% accuracy on ImageNet with a cell-based search space.
arXiv Detail & Related papers (2021-09-13T08:12:50Z) - Better Together -- An Ensemble Learner for Combining the Results of
Ready-made Entity Linking Systems [2.163881720692685]
We argue that performance may be optimised by exploiting results from distinct EL systems on the same corpus.
In this paper, we introduce a supervised approach which exploits the output of multiple ready-made EL systems by predicting the correct link on a per-mention basis.
arXiv Detail & Related papers (2021-01-14T14:42:57Z) - Towards an Interface Description Template for AI-enabled Systems [77.34726150561087]
Reuse is a common system architecture approach that seeks to instantiate a system architecture with existing components.
There is currently no framework that guides the selection of necessary information to assess their portability to operate in a system different than the one for which the component was originally purposed.
We present ongoing work on establishing an interface description template that captures the main information of an AI-enabled component.
arXiv Detail & Related papers (2020-07-13T20:30:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.