Related papers: Online Joint Fine-tuning of Multi-Agent Flows

Online Joint Fine-tuning of Multi-Agent Flows

URL: http://arxiv.org/abs/2406.04516v3
Date: Tue, 16 Jul 2024 13:46:36 GMT
Title: Online Joint Fine-tuning of Multi-Agent Flows
Authors: Paul Mineiro,
Abstract summary: I describe a procedure for online joint fine-tuning of an entire flow inspired by the Learning to Search framework. The approach leverages simulator access to reduce preferences over entire episodes to preferences over individual node outputs. I apply to the multi-hop QA dataset Musique achieving a state-of-the-art result.
Score: 12.851745991007169
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: A Flow is a collection of component models ("Agents") which constructs the solution to a complex problem via iterative communication. Flows have emerged as state of the art architectures for code generation, and are the raison d'etre for frameworks like Autogen. However, flows are currently constructed via a combination of manual prompt engineering and stagewise supervised learning techniques; the latter is limited to acyclic flows with granular node supervision. In this writeup I describe a procedure for online joint fine-tuning of an entire flow inspired by the Learning to Search framework. The approach leverages simulator access to reduce preferences over entire episodes to preferences over individual node outputs; when the components are language models the latter is a well-studied problem. The approach is applicable to reward-free settings (e.g., text feedback) if an episode evaluator model is available. I apply to the multi-hop QA dataset Musique achieving a state-of-the-art result.

Related papers

COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement [80.18490952057125]
Iterative refinement has emerged as an effective paradigm for enhancing the capabilities of large language models (LLMs) on complex tasks. We propose Context-Wise Order-Agnostic Language Modeling (COrAL) to overcome these challenges. Our approach models multiple token dependencies within manageable context windows, enabling the model to perform iterative refinement internally.
arXiv Detail & Related papers (2024-10-12T23:56:19Z)
ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation [87.39861573270173]
We introduce the novel task of prompt-adaptive workflow generation, where the goal is to automatically tailor a workflow to each user prompt. We propose two LLM-based approaches to tackle this task: a tuning-based method that learns from user-preference data, and a training-free method that uses the LLM to select existing flows. Our work shows that prompt-dependent flow prediction offers a new pathway to improving text-to-image generation quality, complementing existing research directions in the field.
arXiv Detail & Related papers (2024-10-02T16:43:24Z)
Let-It-Flow: Simultaneous Optimization of 3D Flow and Object Clustering [2.763111962660262]
We study the problem of self-supervised 3D scene flow estimation from real large-scale raw point cloud sequences. We propose a novel clustering approach that allows for combination of overlapping soft clusters as well as non-overlapping rigid clusters. Our method especially excels in resolving flow in complicated dynamic scenes with multiple independently moving objects close to each other.
arXiv Detail & Related papers (2024-04-12T10:04:03Z)
D-Flow: Differentiating through Flows for Controlled Generation [37.80603174399585]
We introduce D-Flow, a framework for controlling the generation process by differentiating through the flow. We motivate this framework by our key observation stating that for Diffusion/FM models trained with Gaussian probability paths, differentiating through the generation process projects gradient on the data manifold. We validate our framework on linear and non-linear controlled generation problems including: image and audio inverse problems and conditional molecule generation reaching state of the art performance across all.
arXiv Detail & Related papers (2024-02-21T18:56:03Z)
Multi-Scene Generalized Trajectory Global Graph Solver with Composite Nodes for Multiple Object Tracking [61.69892497726235]
Composite Node Message Passing Network (CoNo-Link) is a framework for modeling ultra-long frames information for association. In addition to the previous method of treating objects as nodes, the network innovatively treats object trajectories as nodes for information interaction. Our model can learn better predictions on longer-time scales by adding composite nodes.
arXiv Detail & Related papers (2023-12-14T14:00:30Z)
Guided Flows for Generative Modeling and Decision Making [55.42634941614435]
We show that Guided Flows significantly improves the sample quality in conditional image generation and zero-shot text synthesis-to-speech. Notably, we are first to apply flow models for plan generation in the offline reinforcement learning setting ax speedup in compared to diffusion models.
arXiv Detail & Related papers (2023-11-22T15:07:59Z)
Flows: Building Blocks of Reasoning and Collaborating AI [24.57836563784203]
Flows are self-contained building blocks of computation, with an isolated state. We demonstrate the potential of Flows on competitive coding, a challenging task on which even GPT-4 struggles. To support rapid and rigorous research, we introduce the aiFlows library embodying Flows.
arXiv Detail & Related papers (2023-08-02T17:14:22Z)
DORE: Document Ordered Relation Extraction based on Generative Framework [56.537386636819626]
This paper investigates the root cause of the underwhelming performance of the existing generative DocRE models. We propose to generate a symbolic and ordered sequence from the relation matrix which is deterministic and easier for model to learn. Experimental results on four datasets show that our proposed method can improve the performance of the generative DocRE models.
arXiv Detail & Related papers (2022-10-28T11:18:10Z)
Decoupled Multi-task Learning with Cyclical Self-Regulation for Face Parsing [71.19528222206088]
We propose a novel Decoupled Multi-task Learning with Cyclical Self-Regulation for face parsing. Specifically, DML-CSR designs a multi-task model which comprises face parsing, binary edge, and category edge detection. Our method achieves the new state-of-the-art performance on the Helen, CelebA-HQ, and LapaMask datasets.
arXiv Detail & Related papers (2022-03-28T02:12:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.