SpeechAgents: Human-Communication Simulation with Multi-Modal
Multi-Agent Systems
- URL: http://arxiv.org/abs/2401.03945v1
- Date: Mon, 8 Jan 2024 15:01:08 GMT
- Title: SpeechAgents: Human-Communication Simulation with Multi-Modal
Multi-Agent Systems
- Authors: Dong Zhang, Zhaowei Li, Pengyu Wang, Xin Zhang, Yaqian Zhou, Xipeng
Qiu
- Abstract summary: Large Language Model (LLM)-based multi-agent systems have demonstrated promising performance in simulating human society.
We propose SpeechAgents, a multi-modal LLM-based multi-agent system designed for simulating human communication.
- Score: 53.94772445896213
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Human communication is a complex and diverse process that not only involves
multiple factors such as language, commonsense, and cultural backgrounds but
also requires the participation of multimodal information, such as speech.
Large Language Model (LLM)-based multi-agent systems have demonstrated
promising performance in simulating human society. Can we leverage LLM-based
multi-agent systems to simulate human communication? Current LLM-based
multi-agent systems, however, mainly rely on text as the primary medium. In this
paper, we propose SpeechAgents, a multi-modal LLM-based multi-agent system
designed for simulating human communication. SpeechAgents utilizes a
multi-modal LLM as the control center for each individual agent and employs
multi-modal signals as the medium for messages exchanged among agents.
Additionally, we propose Multi-Agent Tuning to enhance the multi-agent
capabilities of the LLM without compromising its general abilities. To
strengthen and evaluate the effectiveness of human communication simulation, we
build the Human-Communication Simulation Benchmark. Experimental results
demonstrate that SpeechAgents can simulate human communication dialogues with
consistent content, authentic rhythm, and rich emotions, and exhibits excellent
scalability even with up to 25 agents, enabling tasks such as drama creation
and audio novel generation. Code and models will be open-sourced at
https://github.com/0nutation/SpeechAgents
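
The abstract describes an architecture in which each agent's control center is a multi-modal LLM and agents exchange multi-modal (speech-carrying) messages. The following is a minimal sketch of that message-passing pattern, not the authors' implementation: all names here (Message, MultiModalLLM, Agent, run_dialogue) are hypothetical, and the speech tokens are placeholders for whatever discrete speech representation the real system uses.

```python
# Minimal sketch of a multi-modal multi-agent dialogue loop.
# Hypothetical names throughout; not taken from the SpeechAgents codebase.
from dataclasses import dataclass, field


@dataclass
class Message:
    sender: str
    text: str                                            # textual content
    speech_tokens: list = field(default_factory=list)    # placeholder for discrete speech units


class MultiModalLLM:
    """Stand-in for the multi-modal LLM acting as each agent's control center."""

    def generate(self, history: list) -> Message:
        # A real system would decode text and speech tokens jointly;
        # here we only return a placeholder reply to the last speaker.
        last = history[-1].sender if history else "nobody"
        return Message(sender="", text=f"(reply to {last})", speech_tokens=[0, 1, 2])


class Agent:
    def __init__(self, name: str, llm: MultiModalLLM):
        self.name, self.llm = name, llm

    def respond(self, history: list) -> Message:
        msg = self.llm.generate(history)
        msg.sender = self.name
        return msg


def run_dialogue(agents: list, turns: int) -> list:
    """Round-robin exchange of multi-modal messages among agents."""
    history: list = []
    for t in range(turns):
        speaker = agents[t % len(agents)]
        history.append(speaker.respond(history))
    return history


if __name__ == "__main__":
    llm = MultiModalLLM()
    cast = [Agent(n, llm) for n in ("Alice", "Bob", "Carol")]
    for m in run_dialogue(cast, 6):
        print(m.sender, m.text, m.speech_tokens)
```

In this sketch the shared message history plays the role of the communication medium; the paper's reported scaling to 25 agents would correspond to simply growing the cast list.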
Related papers
- Spontaneous Emergence of Agent Individuality through Social Interactions in LLM-Based Communities [0.0]
We study the emergence of agency from scratch by using Large Language Model (LLM)-based agents.
By analyzing this multi-agent simulation, we report valuable new insights into how social norms, cooperation, and personality traits can emerge spontaneously.
arXiv Detail & Related papers (2024-11-05T16:49:33Z)
- Synergistic Simulations: Multi-Agent Problem Solving with Large Language Models [36.571597246832326]
Large Language Models (LLMs) have increasingly demonstrated the ability to facilitate the development of multi-agent systems.
This paper aims to integrate agents & world interaction into a single simulation where multiple agents can work together to solve a problem.
We implement two simulations: a physical studio apartment with two roommates, and another where agents collaborate to complete a programming task.
arXiv Detail & Related papers (2024-09-14T21:53:35Z)
- Very Large-Scale Multi-Agent Simulation in AgentScope [112.98986800070581]
We develop new features and components for AgentScope, a user-friendly multi-agent platform.
We propose an actor-based distributed mechanism to achieve high scalability and efficiency.
We also provide a web-based interface for conveniently monitoring and managing a large number of agents.
arXiv Detail & Related papers (2024-07-25T05:50:46Z)
- Hello Again! LLM-powered Personalized Agent for Long-term Dialogue [63.65128176360345]
We introduce a model-agnostic framework, the Long-term Dialogue Agent (LD-Agent).
It incorporates three independently tunable modules dedicated to event perception, persona extraction, and response generation.
The effectiveness, generality, and cross-domain capabilities of LD-Agent are empirically demonstrated.
arXiv Detail & Related papers (2024-06-09T21:58:32Z)
- Discrete Multimodal Transformers with a Pretrained Large Language Model for Mixed-Supervision Speech Processing [17.92378239787507]
We present a decoder-only Discrete Multimodal Language Model (DMLM).
DMLM can be flexibly applied to multiple tasks (ASR, T2S, S2TT, etc.) and modalities (text, speech, vision).
Our results show that DMLM benefits significantly, across multiple tasks and datasets, from a combination of supervised and unsupervised training.
arXiv Detail & Related papers (2024-06-04T20:08:25Z)
- Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs [67.59291068131438]
Motion-Agent is a conversational framework designed for general human motion generation, editing, and understanding.
Motion-Agent employs an open-source pre-trained language model to develop a generative agent, MotionLLM, that bridges the gap between motion and text.
arXiv Detail & Related papers (2024-05-27T09:57:51Z)
- TESS: A Multi-intent Parser for Conversational Multi-Agent Systems with Decentralized Natural Language Understanding Models [6.470108226184637]
Multi-agent systems complicate the natural language understanding of user intents.
We propose an efficient parsing and orchestration pipeline algorithm to service multi-intent utterances from the user.
arXiv Detail & Related papers (2023-12-19T03:39:23Z)
- Large Language Model Enhanced Multi-Agent Systems for 6G Communications [94.45712802626794]
We propose a multi-agent system with customized communication knowledge and tools for solving communication-related tasks using natural language.
We validate the effectiveness of the proposed multi-agent system by designing a semantic communication system.
arXiv Detail & Related papers (2023-12-13T02:35:57Z)
- Building Cooperative Embodied Agents Modularly with Large Language Models [104.57849816689559]
We address challenging multi-agent cooperation problems with decentralized control, raw sensory observations, costly communication, and multi-objective tasks instantiated in various embodied environments.
We harness the commonsense knowledge, reasoning ability, language comprehension, and text generation prowess of LLMs and seamlessly incorporate them into a cognitive-inspired modular framework.
Our experiments on C-WAH and TDW-MAT demonstrate that CoELA driven by GPT-4 can surpass strong planning-based methods and exhibit emergent effective communication.
arXiv Detail & Related papers (2023-07-05T17:59:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.