SpeechAgents: Human-Communication Simulation with Multi-Modal
Multi-Agent Systems
- URL: http://arxiv.org/abs/2401.03945v1
- Date: Mon, 8 Jan 2024 15:01:08 GMT
- Title: SpeechAgents: Human-Communication Simulation with Multi-Modal
Multi-Agent Systems
- Authors: Dong Zhang, Zhaowei Li, Pengyu Wang, Xin Zhang, Yaqian Zhou, Xipeng
Qiu
- Abstract summary: Large Language Model (LLM)-based multi-agent systems have demonstrated promising performance in simulating human society.
We propose SpeechAgents, a multi-modal LLM-based multi-agent system designed for simulating human communication.
- Score: 53.94772445896213
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Human communication is a complex and diverse process that not only involves
multiple factors such as language, commonsense, and cultural backgrounds but
also requires the participation of multimodal information, such as speech.
Large Language Model (LLM)-based multi-agent systems have demonstrated
promising performance in simulating human society. Can we leverage LLM-based
multi-agent systems to simulate human communication as well? Current LLM-based
multi-agent systems, however, mainly rely on text as the primary medium. In this paper,
we propose SpeechAgents, a multi-modal LLM-based multi-agent system designed
for simulating human communication. SpeechAgents utilizes a multi-modal LLM as
the control center for each individual agent and employs multi-modal signals as
the medium for messages exchanged among agents. Additionally, we propose
Multi-Agent Tuning to enhance the multi-agent capabilities of LLM without
compromising general abilities. To strengthen and evaluate the effectiveness of
human communication simulation, we build the Human-Communication Simulation
Benchmark. Experimental results demonstrate that SpeechAgents can simulate
human communication dialogues with consistent content, authentic rhythm, and
rich emotions, and demonstrate excellent scalability even with up to 25 agents,
which can be applied to tasks such as drama creation and audio novel generation.
Code and models will be open-sourced at https://github.com/0nutation/SpeechAgents
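To make the architecture concrete, here is a minimal sketch in Python of the kind of message loop such a system implies: agents take turns, each seeing the shared history and producing a reply. This is an illustration only, not the authors' implementation; the `Message`/`Agent` types, the `respond` stub (standing in for a multi-modal LLM call), and the round-robin scheduling are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Message:
    """A multi-modal message: text plus (optionally) a speech waveform."""
    speaker: str
    text: str
    audio: list = field(default_factory=list)  # placeholder for waveform samples

@dataclass
class Agent:
    """One participant; `respond` stands in for a multi-modal LLM call."""
    name: str
    respond: Callable[[List[Message]], str]

def simulate_dialogue(agents: List[Agent], turns: int) -> List[Message]:
    """Round-robin exchange: each agent sees the full history and replies."""
    history: List[Message] = []
    for t in range(turns):
        agent = agents[t % len(agents)]
        reply = agent.respond(history)
        history.append(Message(speaker=agent.name, text=reply))
    return history

# Stub "models" that simply acknowledge the previous speaker.
alice = Agent("Alice", lambda h: f"Alice replies to {h[-1].speaker}" if h else "Alice opens")
bob = Agent("Bob", lambda h: f"Bob replies to {h[-1].speaker}")

transcript = simulate_dialogue([alice, bob], turns=4)
for m in transcript:
    print(f"{m.speaker}: {m.text}")
```

In a real system each `respond` call would consume and emit speech signals rather than strings, but the turn-taking skeleton stays the same.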
Related papers
- Very Large-Scale Multi-Agent Simulation in AgentScope [115.83581238212611]
We develop new features and components for AgentScope, a user-friendly multi-agent platform.
We propose an actor-based distributed mechanism towards great scalability and high efficiency.
We provide a web-based interface for conveniently monitoring and managing a large number of agents.
arXiv Detail & Related papers (2024-07-25T05:50:46Z)
- Hello Again! LLM-powered Personalized Agent for Long-term Dialogue [63.65128176360345]
We introduce a model-agnostic framework, the Long-term Dialogue Agent (LD-Agent)
It incorporates three independently tunable modules dedicated to event perception, persona extraction, and response generation.
The effectiveness, generality, and cross-domain capabilities of LD-Agent are empirically demonstrated.
arXiv Detail & Related papers (2024-06-09T21:58:32Z)
- Discrete Multimodal Transformers with a Pretrained Large Language Model for Mixed-Supervision Speech Processing [17.92378239787507]
We present a decoder-only Discrete Multimodal Language Model (DMLM)
DMLM can be flexibly applied to multiple tasks (ASR, T2S, S2TT, etc.) and modalities (text, speech, vision)
Our results show that DMLM benefits significantly, across multiple tasks and datasets, from a combination of supervised and unsupervised training.
arXiv Detail & Related papers (2024-06-04T20:08:25Z)
- LLMArena: Assessing Capabilities of Large Language Models in Dynamic Multi-Agent Environments [35.926581910260076]
We introduce LLMArena, a framework for evaluating the capabilities of large language models in multi-agent dynamic environments.
LLMArena employs TrueSkill scoring to assess crucial abilities of LLM agents, including spatial reasoning, strategic planning, numerical reasoning, risk assessment, communication, opponent modeling, and team collaboration.
We conduct extensive experiments and human evaluation across LLMs of different sizes and types, showing that LLMs still have a long way to go towards becoming fully autonomous agents.
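The TrueSkill-based ranking that LLMArena uses can be pictured with a simpler rating scheme. The Elo-style update below is a simplified stand-in (TrueSkill additionally models per-agent uncertainty as a Gaussian); it shows how pairwise match outcomes translate into skill scores. The function name and K-factor are illustrative, not from the paper.

```python
def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """One Elo-style rating update (a simplified stand-in for TrueSkill,
    which additionally tracks per-player uncertainty).
    score_a is 1.0 if A wins, 0.0 if A loses, 0.5 for a draw."""
    # Expected score for A given the current rating gap.
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta

# Two equally rated agents: the winner gains exactly what the loser drops.
a, b = elo_update(1500.0, 1500.0, score_a=1.0)
print(a, b)  # 1516.0 1484.0
```

Repeating such updates over many head-to-head games in the arena yields a leaderboard of agent abilities, which is the role TrueSkill plays in LLMArena.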
arXiv Detail & Related papers (2024-02-26T11:31:48Z) - TESS: A Multi-intent Parser for Conversational Multi-Agent Systems with
Decentralized Natural Language Understanding Models [6.470108226184637]
Multi-agent systems complicate the natural language understanding of user intents.
We propose an efficient parsing and orchestration pipeline algorithm to handle multi-intent utterances from the user.
arXiv Detail & Related papers (2023-12-19T03:39:23Z) - Large Language Model Enhanced Multi-Agent Systems for 6G Communications [94.45712802626794]
We propose a multi-agent system with customized communication knowledge and tools for solving communication related tasks using natural language.
We validate the effectiveness of the proposed multi-agent system by designing a semantic communication system.
arXiv Detail & Related papers (2023-12-13T02:35:57Z) - AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model [33.072967177313025]
We present Any-Modality Augmented Language Model (AnyMAL), a unified model that reasons over diverse input modality signals.
AnyMAL inherits the powerful text-based reasoning abilities of state-of-the-art LLMs, including LLaMA-2 (70B).
We conduct comprehensive empirical analysis comprising both human and automatic evaluations, and demonstrate state-of-the-art performance on various multimodal tasks.
arXiv Detail & Related papers (2023-09-27T22:50:51Z) - Large AI Model Empowered Multimodal Semantic Communications [51.17527319441436]
We propose a Large AI Model-based Multimodal SC (LAM-MSC) framework.
We first present the SC-based Multimodal Alignment (MMA).
Then, a personalized LLM-based Knowledge Base (LKB) is proposed.
Finally, we apply Conditional Generative Adversarial Network-based Channel Estimation (CGE) to obtain Channel State Information (CSI).
arXiv Detail & Related papers (2023-09-03T19:24:34Z) - Building Cooperative Embodied Agents Modularly with Large Language
Models [104.57849816689559]
We address challenging multi-agent cooperation problems with decentralized control, raw sensory observations, costly communication, and multi-objective tasks instantiated in various embodied environments.
We harness the commonsense knowledge, reasoning ability, language comprehension, and text generation prowess of LLMs and seamlessly incorporate them into a cognitive-inspired modular framework.
Our experiments on C-WAH and TDW-MAT demonstrate that CoELA driven by GPT-4 can surpass strong planning-based methods and exhibit emergent effective communication.
arXiv Detail & Related papers (2023-07-05T17:59:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.