ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization
- URL: http://arxiv.org/abs/2502.04306v1
- Date: Thu, 06 Feb 2025 18:47:49 GMT
- Title: ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization
- Authors: Yinjie Wang, Ling Yang, Guohao Li, Mengdi Wang, Bryon Aragam
- Abstract summary: We develop ScoreFlow, a high-performance framework for agent workflow optimization.
ScoreFlow incorporates Score-DPO, a novel variant of the direct preference optimization method that accounts for quantitative feedback.
It achieves an 8.2% improvement over existing baselines across question answering, coding, and mathematical reasoning.
- Score: 51.280919773837645
- Abstract: Recent research has leveraged large language model multi-agent systems for complex problem-solving while trying to reduce the manual effort required to build them, driving the development of automated agent workflow optimization methods. However, existing methods remain inflexible due to representational limitations, a lack of adaptability, and poor scalability when relying on discrete optimization techniques. We address these challenges with ScoreFlow, a simple yet high-performance framework that leverages efficient gradient-based optimization in a continuous space. ScoreFlow incorporates Score-DPO, a novel variant of the direct preference optimization method that accounts for quantitative feedback. Across six benchmarks spanning question answering, coding, and mathematical reasoning, ScoreFlow achieves an 8.2% improvement over existing baselines. Moreover, it empowers smaller models to outperform larger ones with lower inference costs. Project: https://github.com/Gen-Verse/ScoreFlow
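The abstract does not spell out the Score-DPO objective. As a rough, hypothetical sketch (not the paper's actual formulation), one way to fold quantitative feedback into a DPO-style loss is to weight each preference pair by the gap between the evaluation scores of its two responses; all names below are illustrative.

```python
import torch
import torch.nn.functional as F

def score_weighted_dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l,
                            score_w, score_l, beta=0.1):
    """Hypothetical score-weighted DPO loss (illustration only).

    logp_w / logp_l: policy log-probs of the preferred / dispreferred output.
    ref_logp_w / ref_logp_l: reference-model log-probs of the same outputs.
    score_w / score_l: scalar evaluation scores (e.g. benchmark accuracy).
    """
    # Standard DPO implicit-reward margin between the two outputs.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # Weight each pair by its score gap, so pairs with clearly separated
    # quantitative feedback contribute more to the gradient.
    weight = torch.sigmoid(score_w - score_l)
    return -(weight * F.logsigmoid(margin)).mean()
```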
Related papers
- Dynamic Noise Preference Optimization for LLM Self-Improvement via Synthetic Data [51.62162460809116]
We introduce Dynamic Noise Preference Optimization (DNPO) to ensure consistent improvements across iterations.
In experiments with Zephyr-7B, DNPO consistently outperforms existing methods, showing an average performance boost of 2.6%.
DNPO shows a significant improvement in model-generated data quality, with a 29.4% win-loss rate gap compared to the baseline in GPT-4 evaluations.
arXiv Detail & Related papers (2025-02-08T01:20:09Z)
- Direct Preference Optimization Using Sparse Feature-Level Constraints [47.15096507230884]
Feature-level constrained Preference Optimization is a novel method designed to simplify the alignment process while ensuring stability.
The approach achieves efficiency by using sparse features activated in a well-trained sparse autoencoder, and maintains alignment quality through a sequential KL divergence constraint.
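The summary above only names the ingredients; as a loose illustration (not the paper's loss), a feature-level constraint could be computed on sparse-autoencoder activations of the policy and reference models rather than on full token distributions. The module and variable names below are hypothetical, and mean-squared error stands in for the divergence term.

```python
import torch

def feature_level_constraint(policy_hidden, ref_hidden, sae_encoder, top_k=64):
    """Illustrative feature-level penalty computed on SAE activations.

    policy_hidden / ref_hidden: hidden states from the policy and reference
    models for the same tokens, shape (tokens, d_model).
    sae_encoder: a frozen, pre-trained sparse autoencoder encoder.
    """
    with torch.no_grad():
        ref_feats = sae_encoder(ref_hidden)       # reference feature activations
    policy_feats = sae_encoder(policy_hidden)
    # Constrain only the features most active under the reference model,
    # keeping the penalty sparse and cheap to evaluate.
    idx = ref_feats.abs().mean(dim=0).topk(top_k).indices
    return ((policy_feats[:, idx] - ref_feats[:, idx]) ** 2).mean()
```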
arXiv Detail & Related papers (2024-11-12T07:54:13Z)
- AFlow: Automating Agentic Workflow Generation [36.61172223528231]
Large language models (LLMs) have demonstrated remarkable potential in solving complex tasks across diverse domains.
We introduce AFlow, an automated framework that efficiently explores this space using Monte Carlo Tree Search.
Empirical evaluations across six benchmark datasets demonstrate AFlow's efficacy, yielding a 5.7% average improvement over state-of-the-art baselines.
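The abstract only names Monte Carlo Tree Search as the search procedure. A generic MCTS skeleton over candidate workflows, with the expansion and evaluation functions left as user-supplied placeholders, might look like the following sketch.

```python
import math
import random

class Node:
    def __init__(self, workflow, parent=None):
        self.workflow, self.parent = workflow, parent
        self.children, self.visits, self.value = [], 0, 0.0

def ucb(node, c=1.4):
    # Upper-confidence bound: trade off average reward against exploration.
    return node.value / (node.visits + 1e-9) + c * math.sqrt(
        math.log(node.parent.visits + 1) / (node.visits + 1e-9))

def mcts(root, expand, evaluate, iterations=100):
    """Generic MCTS loop. `expand(workflow)` proposes modified workflows and
    `evaluate(workflow)` returns a validation score; both are placeholders."""
    for _ in range(iterations):
        node = root
        while node.children:                       # selection
            node = max(node.children, key=ucb)
        for wf in expand(node.workflow):           # expansion
            node.children.append(Node(wf, parent=node))
        leaf = random.choice(node.children) if node.children else node
        reward = evaluate(leaf.workflow)           # simulation / rollout
        while leaf is not None:                    # backpropagation
            leaf.visits += 1
            leaf.value += reward
            leaf = leaf.parent
    return max(root.children, key=lambda n: n.value / (n.visits + 1e-9))
```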
arXiv Detail & Related papers (2024-10-14T17:40:40Z)
- Federated Learning of Large Language Models with Parameter-Efficient Prompt Tuning and Adaptive Optimization [71.87335804334616]
Federated learning (FL) is a promising paradigm to enable collaborative model training with decentralized data.
Training Large Language Models (LLMs) generally requires updating a large number of parameters.
This paper proposes an efficient partial prompt tuning approach to improve performance and efficiency simultaneously.
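The summary does not describe the algorithm in detail; a minimal sketch of the general pattern, federated averaging applied only to a trainable subset of soft-prompt parameters while the backbone model stays frozen, is given below. The closures and index set are hypothetical.

```python
import torch

def local_prompt_update(global_prompt, trainable_idx, train_step, steps=10, lr=1e-3):
    """Tune only the selected rows of a soft prompt on one client's data.
    `train_step(prompt)` is a user-supplied closure returning the local loss."""
    prompt = global_prompt.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([prompt], lr=lr)
    mask = torch.zeros_like(prompt)
    mask[trainable_idx] = 1.0               # partial tuning: other rows stay frozen
    for _ in range(steps):
        opt.zero_grad()
        train_step(prompt).backward()
        prompt.grad *= mask                 # zero out gradients of frozen rows
        opt.step()
    return prompt.detach()

def federated_round(global_prompt, client_train_steps, trainable_idx):
    # FedAvg over prompt parameters only; the backbone LLM never leaves the server.
    updates = [local_prompt_update(global_prompt, trainable_idx, step_fn)
               for step_fn in client_train_steps]
    return torch.stack(updates).mean(dim=0)
```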
arXiv Detail & Related papers (2023-10-23T16:37:59Z)
- FuzzyFlow: Leveraging Dataflow To Find and Squash Program Optimization Bugs [92.47146416628965]
FuzzyFlow is a fault localization and test case extraction framework designed to test program optimizations.
We leverage dataflow program representations to capture a fully reproducible system state and area-of-effect for optimizations.
To reduce testing time, we design an algorithm for minimizing test inputs, trading off memory for recomputation.
arXiv Detail & Related papers (2023-06-28T13:00:17Z)
- Deep Equilibrium Optical Flow Estimation [80.80992684796566]
Recent state-of-the-art (SOTA) optical flow models use finite-step recurrent update operations to emulate traditional algorithms.
These RNNs impose large computation and memory overheads, and are not directly trained to model such stable estimation.
We propose deep equilibrium (DEQ) flow estimators, an approach that directly solves for the flow as the infinite-level fixed point of an implicit layer.
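As a minimal sketch of the deep-equilibrium idea (not the paper's exact solver), the forward pass can find the fixed point by plain iteration and the backward pass can re-attach it to the graph with a single differentiable application of the update cell; the cell `f` is a placeholder.

```python
import torch

def deq_fixed_point(f, x, z0, max_iter=50, tol=1e-4):
    """Solve z* = f(z*, x) by naive fixed-point iteration (DEQ models typically
    use faster root solvers such as Anderson acceleration or Broyden's method)."""
    z = z0
    with torch.no_grad():
        for _ in range(max_iter):
            z_next = f(z, x)
            if (z_next - z).norm() < tol * (z.norm() + 1e-8):
                z = z_next
                break
            z = z_next
    # One differentiable application re-attaches z* to the autograd graph; this
    # "one-step" gradient is a cheap stand-in for the exact implicit gradient.
    return f(z.detach(), x)
```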
arXiv Detail & Related papers (2022-04-18T17:53:44Z)
- Global Matching with Overlapping Attention for Optical Flow Estimation [10.320192824517358]
GMFlowNet is a learning-based matching-optimization framework for optical flow estimation.
It achieves state-of-the-art performance on standard benchmarks.
Thanks to the matching and overlapping attention, GMFlowNet obtains major improvements on the predictions for textureless regions and large motions.
arXiv Detail & Related papers (2022-03-21T20:52:19Z)
- Automatic Tuning of Tensorflow's CPU Backend using Gradient-Free Optimization Algorithms [0.6543507682026964]
Deep learning (DL) applications are built using DL libraries and frameworks such as TensorFlow and PyTorch.
These frameworks expose complex parameters, and tuning them to obtain good training and inference performance is challenging for typical users.
In this paper, we treat the problem of tuning parameters of DL frameworks to improve training and inference performance as a black-box problem.
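As a toy sketch of this black-box view (random search stands in for the gradient-free optimizers the paper actually uses), a tuner can simply sample framework settings, time a user-supplied workload, and keep the fastest configuration; the search-space values below are illustrative.

```python
import random
import time

# Illustrative search space of TensorFlow-style threading knobs.
SEARCH_SPACE = {
    "intra_op_parallelism_threads": [1, 2, 4, 8, 16],
    "inter_op_parallelism_threads": [1, 2, 4],
}

def random_search(benchmark, space=SEARCH_SPACE, trials=20):
    """Black-box tuning loop: sample a configuration, measure wall-clock time,
    keep the best. `benchmark(config)` must apply the config and run the workload."""
    best_cfg, best_time = None, float("inf")
    for _ in range(trials):
        cfg = {name: random.choice(values) for name, values in space.items()}
        start = time.perf_counter()
        benchmark(cfg)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_cfg, best_time = cfg, elapsed
    return best_cfg, best_time
```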
arXiv Detail & Related papers (2021-09-13T19:10:23Z)
- Self Normalizing Flows [65.73510214694987]
We propose a flexible framework for training normalizing flows by replacing expensive terms in the gradient by learned approximate inverses at each layer.
This reduces the computational complexity of each layer's exact update from $\mathcal{O}(D^3)$ to $\mathcal{O}(D^2)$.
We show experimentally that such models are remarkably stable and optimize to similar data likelihood values as their exact gradient counterparts.
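A minimal sketch of the idea for a single linear flow layer, assuming the construction described in the abstract: a learned approximate inverse R is kept close to W^{-1} by a reconstruction loss, and R^T replaces (W^{-1})^T as the gradient of the log-determinant term, avoiding any O(D^3) inverse or determinant. Names and initialization are illustrative.

```python
import torch
import torch.nn as nn

class SelfNormalizingLinear(nn.Module):
    """Illustrative linear flow layer z = W x with a learned approximate inverse R."""
    def __init__(self, dim):
        super().__init__()
        self.W = nn.Parameter(torch.eye(dim) + 0.01 * torch.randn(dim, dim))
        self.R = nn.Parameter(torch.eye(dim) + 0.01 * torch.randn(dim, dim))

    def forward(self, x):
        z = x @ self.W.T
        x_rec = z @ self.R.T                      # approximate inverse pass
        recon_loss = ((x_rec - x) ** 2).mean()    # pushes R toward W^{-1}
        # Surrogate whose gradient w.r.t. W is R^T (detached), approximating the
        # exact gradient of log|det W|, i.e. (W^{-1})^T, at O(D^2) cost. Only the
        # gradient of this term is meaningful, not its value.
        logdet_surrogate = (self.W * self.R.detach().T).sum()
        return z, logdet_surrogate, recon_loss
```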
arXiv Detail & Related papers (2020-11-14T09:51:51Z)