Related papers: Matrix: Peer-to-Peer Multi-Agent Synthetic Data Generation Framework

URL: http://arxiv.org/abs/2511.21686v1
Date: Wed, 26 Nov 2025 18:59:28 GMT
Title: Matrix: Peer-to-Peer Multi-Agent Synthetic Data Generation Framework
Authors: Dong Wang, Yang Li, Ansong Ni, Ching-Feng Yeh, Youssef Emad, Xinjie Lei, Liam Robbins, Karthik Padthe, Hu Xu, Xian Li, Asli Celikyilmaz, Ramya Raghavendra, Lifei Huang, Carole-Jean Wu, Shang-Wen Li
Abstract summary: We present Matrix, a decentralized framework for multi-agent synthesis. Matrix represents both control and data flow as serialized messages passed through distributed queues. We evaluate Matrix across diverse synthesis scenarios, such as multi-agent collaborative dialogue, web-based reasoning data extraction, and tool-use trajectory generation in customer service environments.
Score: 32.3041485160475
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Synthetic data has become increasingly important for training large language models, especially when real data is scarce, expensive, or privacy-sensitive. Many such generation tasks require coordinated multi-agent workflows, where specialized agents collaborate to produce data that is higher quality, more diverse, and structurally richer. However, existing frameworks for multi-agent synthesis often depend on a centralized orchestrator, creating scalability bottlenecks, or are hardcoded for specific domains, limiting flexibility. We present Matrix, a decentralized framework that represents both control and data flow as serialized messages passed through distributed queues. This peer-to-peer design eliminates the central orchestrator. Each task progresses independently through lightweight agents, while compute-intensive operations, such as LLM inference or containerized environments, are handled by distributed services. Built on Ray, Matrix scales to tens of thousands of concurrent agentic workflows and provides a modular, configurable design that enables easy adaptation to a wide range of data generation workflows. We evaluate Matrix across diverse synthesis scenarios, such as multi-agent collaborative dialogue, web-based reasoning data extraction, and tool-use trajectory generation in customer service environments. In all cases, Matrix achieves 2–15× higher data generation throughput under identical hardware resources, without compromising output quality.
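Illustrative sketch (not from the paper): the abstract's core idea, that each task travels as a serialized message carrying both its data and its control state, and that agents hand it directly to the next peer's queue, can be approximated with the Python standard library. This is a toy under stated assumptions, not Matrix's API and not Ray. The agent names (`writer`, `critic`, `sink`), the message fields, and the `run` helper are all hypothetical, and the string append stands in for a call to a distributed LLM service.

```python
import json
import queue
import threading

# Hypothetical agent roles; illustrative only, not taken from the paper.
PIPELINE = ["writer", "critic", "sink"]


def make_message(task_id, prompt):
    # One serialized message holds the data (history) and the control
    # state (step, i.e. which agent should act next).
    return json.dumps({"task_id": task_id, "history": [prompt], "step": 0})


def agent_loop(name, inboxes, results):
    """Consume serialized messages, do local work, forward to the next peer."""
    while True:
        raw = inboxes[name].get()
        if raw is None:  # shutdown sentinel
            break
        msg = json.loads(raw)
        if name == "sink":
            results.put(msg)  # task finished; collect it
            continue
        # Placeholder for compute-heavy work done by an external service.
        msg["history"].append(f"{name} handled task {msg['task_id']}")
        msg["step"] += 1
        next_agent = PIPELINE[msg["step"]]
        # Peer-to-peer hand-off: no central component routes this step.
        inboxes[next_agent].put(json.dumps(msg))


def run(prompts):
    inboxes = {name: queue.Queue() for name in PIPELINE}
    results = queue.Queue()
    workers = [
        threading.Thread(target=agent_loop, args=(name, inboxes, results))
        for name in PIPELINE
    ]
    for worker in workers:
        worker.start()
    # Seed the first agent's queue; after this, tasks progress independently.
    for task_id, prompt in enumerate(prompts):
        inboxes[PIPELINE[0]].put(make_message(task_id, prompt))
    finished = [results.get() for _ in prompts]
    for name in PIPELINE:
        inboxes[name].put(None)
    for worker in workers:
        worker.join()
    return sorted(finished, key=lambda m: m["task_id"])


if __name__ == "__main__":
    for message in run(["Draft a support dialogue", "Draft a tool-use trace"]):
        print(message["task_id"], message["history"])
```

In this sketch `run` only seeds tasks and collects finished ones; every intermediate routing decision is read from the message itself, which is the property the abstract attributes to the peer-to-peer design.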

Related papers

Beyond Quantity: Trajectory Diversity Scaling for Code Agents [51.71414642763219]
Trajectory Diversity Scaling is a data synthesis framework for code agents that scales performance through diversity rather than raw volume. TDScaling integrates four innovations: (1) a Business Cluster mechanism that captures real-service logical dependencies; (2) a blueprint-driven multi-agent paradigm that enforces trajectory coherence; and (3) an adaptive evolution mechanism that steers toward long-tail scenarios.
arXiv Detail & Related papers (2026-02-03T07:43:03Z)
ComAgent: Multi-LLM based Agentic AI Empowered Intelligent Wireless Networks [62.031889234230725]
6G networks rely on complex cross-layer optimization. Manually translating high-level intents into mathematical formulations remains a bottleneck. We present ComAgent, a multi-LLM agentic AI framework.
arXiv Detail & Related papers (2026-01-27T13:43:59Z)
A Versatile Multimodal Agent for Multimedia Content Generation [66.86040734610073]
We propose a MultiMedia-Agent designed to automate complex content creation tasks. Our agent system includes a data generation pipeline, a tool library for content creation, and a set of metrics for evaluating preference alignment.
arXiv Detail & Related papers (2026-01-06T18:49:47Z)
FABRIC: Framework for Agent-Based Realistic Intelligence Creation [3.940391073007047]
Large language models (LLMs) are increasingly deployed as agents, expected to decompose goals, invoke tools, and verify results in dynamic environments. We present a unified framework for synthesizing agentic data using only LLMs, without any human-in-the-loop supervision.
arXiv Detail & Related papers (2025-10-20T18:20:22Z)
Multi-Agent Tool-Integrated Policy Optimization [67.12841355267678]
Large language models (LLMs) increasingly rely on multi-turn tool-integrated planning for knowledge-intensive and complex reasoning tasks. Existing implementations typically rely on a single agent, but they suffer from limited context length and noisy tool responses. No existing methods support effective reinforcement learning post-training of tool-integrated multi-agent frameworks.
arXiv Detail & Related papers (2025-10-06T10:44:04Z)
Efficient and Scalable Agentic AI with Heterogeneous Systems [1.8921715645847679]
AI agents are emerging as a dominant workload in a wide range of applications, promising to be the vehicle that delivers the promised benefits of AI to enterprises and consumers. To scale AI agent usage, we need efficient and scalable deployment and agent-serving infrastructure. We present a system design for dynamic orchestration of AI agent workloads on heterogeneous compute infrastructure.
arXiv Detail & Related papers (2025-07-25T19:02:42Z)
HAWK: A Hierarchical Workflow Framework for Multi-Agent Collaboration [3.2588674134593942]
Multi-agent systems face persistent challenges in cross-platform interoperability, dynamic task scheduling, and efficient resource sharing. We propose Hierarchical Agent (Hawk), a modular framework comprising five layers (User, Operator, Agent, Resource) and supported by sixteen standardized interfaces. Hawk delivers an end-to-end pipeline covering task parsing, workflow orchestration, intelligent scheduling, resource invocation, and data synchronization.
arXiv Detail & Related papers (2025-07-05T15:03:53Z)
Fine-tuning Multimodal Transformers on Edge: A Parallel Split Learning Approach [1.297210402524609]
Split Learning partitions models at a designated cut-layer to offload compute-intensive operations to the server. We present MPSL, a parallel SL approach for computationally efficient fine-tuning of multimodal transformers in a distributed manner. MPSL employs lightweight client-side tokenizers and a unified modality-agnostic encoder, allowing flexible adaptation to task-specific needs.
arXiv Detail & Related papers (2025-02-10T11:10:41Z)
A Collaborative Multi-Agent Approach to Retrieval-Augmented Generation Across Diverse Data [0.0]
Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs). Traditional RAG systems typically use a single-agent architecture to handle query generation, data retrieval, and response synthesis. This paper proposes a multi-agent RAG system to address these limitations.
arXiv Detail & Related papers (2024-12-08T07:18:19Z)
Very Large-Scale Multi-Agent Simulation in AgentScope [112.98986800070581]
We develop new features and components for AgentScope, a user-friendly multi-agent platform. We propose an actor-based distributed mechanism towards great scalability and high efficiency. We also provide a web-based interface for conveniently monitoring and managing a large number of agents.
arXiv Detail & Related papers (2024-07-25T05:50:46Z)
TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series [61.436361263605114]
Time series data are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations. We introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series.
arXiv Detail & Related papers (2023-05-19T10:11:21Z)
Multi-modal AsynDGAN: Learn From Distributed Medical Image Data without Sharing Private Information [55.866673486753115]
We propose an extendable and elastic learning framework to preserve privacy and security. The proposed framework is named distributed Asynchronized Discriminator Generative Adversarial Networks (AsynDGAN).
arXiv Detail & Related papers (2020-12-15T20:41:24Z)

This list is automatically generated from the titles and abstracts of the papers on this site.