SpeechAgents: Human-Communication Simulation with Multi-Modal
Multi-Agent Systems
- URL: http://arxiv.org/abs/2401.03945v1
- Date: Mon, 8 Jan 2024 15:01:08 GMT
- Title: SpeechAgents: Human-Communication Simulation with Multi-Modal
Multi-Agent Systems
- Authors: Dong Zhang, Zhaowei Li, Pengyu Wang, Xin Zhang, Yaqian Zhou, Xipeng
Qiu
- Abstract summary: Large Language Model (LLM)-based multi-agent systems have demonstrated promising performance in simulating human society.
We propose SpeechAgents, a multi-modal LLM-based multi-agent system designed for simulating human communication.
- Score: 53.94772445896213
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Human communication is a complex and diverse process that not only involves
multiple factors such as language, commonsense, and cultural backgrounds but
also requires the participation of multimodal information, such as speech.
Large Language Model (LLM)-based multi-agent systems have demonstrated
promising performance in simulating human society. Can we leverage LLM-based
multi-agent systems to simulate human communication as well? Current LLM-based
multi-agent systems, however, mainly rely on text as the primary medium. In this paper,
we propose SpeechAgents, a multi-modal LLM-based multi-agent system designed
for simulating human communication. SpeechAgents utilizes a multi-modal LLM as
the control center for each individual agent and employs multi-modal signals as
the medium for messages exchanged among agents. Additionally, we propose
Multi-Agent Tuning to enhance the multi-agent capabilities of LLM without
compromising general abilities. To strengthen and evaluate the effectiveness of
human communication simulation, we build the Human-Communication Simulation
Benchmark. Experimental results demonstrate that SpeechAgents can simulate
human communication dialogues with consistent content, authentic rhythm, and
rich emotions, and demonstrate excellent scalability even with up to 25 agents,
which can be applied to tasks such as drama creation and audio novel generation.
Code and models will be open-sourced at https://github.com/0nutation/SpeechAgents
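To make the architecture concrete, here is a minimal sketch in Python of the kind of message loop such a system implies: agents take turns, each seeing the shared history and producing a reply. This is an illustration only, not the authors' implementation; the `Message`/`Agent` types, the `respond` stub (standing in for a multi-modal LLM call), and the round-robin scheduling are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Message:
    """A multi-modal message: text plus (optionally) a speech waveform."""
    speaker: str
    text: str
    audio: list = field(default_factory=list)  # placeholder for waveform samples

@dataclass
class Agent:
    """One participant; `respond` stands in for a multi-modal LLM call."""
    name: str
    respond: Callable[[List[Message]], str]

def simulate_dialogue(agents: List[Agent], turns: int) -> List[Message]:
    """Round-robin exchange: each agent sees the full history and replies."""
    history: List[Message] = []
    for t in range(turns):
        agent = agents[t % len(agents)]
        reply = agent.respond(history)
        history.append(Message(speaker=agent.name, text=reply))
    return history

# Stub "models" that simply acknowledge the previous speaker.
alice = Agent("Alice", lambda h: f"Alice replies to {h[-1].speaker}" if h else "Alice opens")
bob = Agent("Bob", lambda h: f"Bob replies to {h[-1].speaker}")

transcript = simulate_dialogue([alice, bob], turns=4)
for m in transcript:
    print(f"{m.speaker}: {m.text}")
```

In a real system each `respond` call would consume and emit speech signals rather than strings, but the turn-taking skeleton stays the same.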
Related papers
- Very Large-Scale Multi-Agent Simulation in AgentScope [115.83581238212611]
We develop new features and components for AgentScope, a user-friendly multi-agent platform.
We propose an actor-based distributed mechanism towards great scalability and high efficiency.
We provide a web-based interface for conveniently monitoring and managing a large number of agents.
arXiv Detail & Related papers (2024-07-25T05:50:46Z)
- Hello Again! LLM-powered Personalized Agent for Long-term Dialogue [63.65128176360345]
We introduce a model-agnostic framework, the Long-term Dialogue Agent (LD-Agent)
It incorporates three independently tunable modules dedicated to event perception, persona extraction, and response generation.
The effectiveness, generality, and cross-domain capabilities of LD-Agent are empirically demonstrated.
arXiv Detail & Related papers (2024-06-09T21:58:32Z)
- Discrete Multimodal Transformers with a Pretrained Large Language Model for Mixed-Supervision Speech Processing [17.92378239787507]
We present a decoder-only Discrete Multimodal Language Model (DMLM)
DMLM can be flexibly applied to multiple tasks (ASR, T2S, S2TT, etc.) and modalities (text, speech, vision)
Our results show that DMLM benefits significantly, across multiple tasks and datasets, from a combination of supervised and unsupervised training.
arXiv Detail & Related papers (2024-06-04T20:08:25Z)
- LLMArena: Assessing Capabilities of Large Language Models in Dynamic Multi-Agent Environments [35.926581910260076]
We introduce LLMArena, a framework for evaluating the capabilities of large language models in multi-agent dynamic environments.
LLMArena employs TrueSkill scoring to assess crucial abilities of LLM agents, including spatial reasoning, strategic planning, numerical reasoning, risk assessment, communication, opponent modeling, and team collaboration.
We conduct extensive experiments and human evaluation across LLMs of different sizes and types, showing that LLMs still have a long way to go towards becoming fully autonomous agents.
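The TrueSkill-based ranking that LLMArena uses can be pictured with a simpler rating scheme. The Elo-style update below is a simplified stand-in (TrueSkill additionally models per-agent uncertainty as a Gaussian); it shows how pairwise match outcomes translate into skill scores. The function name and K-factor are illustrative, not from the paper.

```python
def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """One Elo-style rating update (a simplified stand-in for TrueSkill,
    which additionally tracks per-player uncertainty).
    score_a is 1.0 if A wins, 0.0 if A loses, 0.5 for a draw."""
    # Expected score for A given the current rating gap.
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta

# Two equally rated agents: the winner gains exactly what the loser drops.
a, b = elo_update(1500.0, 1500.0, score_a=1.0)
print(a, b)  # 1516.0 1484.0
```

Repeating such updates over many head-to-head games in the arena yields a leaderboard of agent abilities, which is the role TrueSkill plays in LLMArena.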
arXiv Detail & Related papers (2024-02-26T11:31:48Z) - TESS: A Multi-intent Parser for Conversational Multi-Agent Systems with
Decentralized Natural Language Understanding Models [6.470108226184637]
Multi-agent systems complicate the natural language understanding of user intents.
We propose an efficient parsing and orchestration pipeline algorithm to handle multi-intent utterances from the user.
arXiv Detail & Related papers (2023-12-19T03:39:23Z) - Large Language Model Enhanced Multi-Agent Systems for 6G Communications [94.45712802626794]
We propose a multi-agent system with customized communication knowledge and tools for solving communication related tasks using natural language.
We validate the effectiveness of the proposed multi-agent system by designing a semantic communication system.
arXiv Detail & Related papers (2023-12-13T02:35:57Z) - AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model [33.072967177313025]
We present Any-Modality Augmented Language Model (AnyMAL), a unified model that reasons over diverse input modality signals.
AnyMAL inherits the powerful text-based reasoning abilities of state-of-the-art LLMs, including LLaMA-2 (70B).
We conduct comprehensive empirical analysis comprising both human and automatic evaluations, and demonstrate state-of-the-art performance on various multimodal tasks.
arXiv Detail & Related papers (2023-09-27T22:50:51Z) - Large AI Model Empowered Multimodal Semantic Communications [51.17527319441436]
We propose a Large AI Model-based Multimodal SC (LAM-MSC) framework.
We first present the SC-based Multimodal Alignment (MMA).
Then, a personalized LLM-based Knowledge Base (LKB) is proposed.
Finally, we apply Conditional Generative Adversarial Network-based Channel Estimation (CGE) to obtain Channel State Information (CSI).
arXiv Detail & Related papers (2023-09-03T19:24:34Z) - Building Cooperative Embodied Agents Modularly with Large Language
Models [104.57849816689559]
We address challenging multi-agent cooperation problems with decentralized control, raw sensory observations, costly communication, and multi-objective tasks instantiated in various embodied environments.
We harness the commonsense knowledge, reasoning ability, language comprehension, and text generation prowess of LLMs and seamlessly incorporate them into a cognitive-inspired modular framework.
Our experiments on C-WAH and TDW-MAT demonstrate that CoELA driven by GPT-4 can surpass strong planning-based methods and exhibit emergent effective communication.
arXiv Detail & Related papers (2023-07-05T17:59:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.