Estuary: A Framework For Building Multimodal Low-Latency Real-Time Socially Interactive Agents
- URL: http://arxiv.org/abs/2410.20116v1
- Date: Sat, 26 Oct 2024 08:08:12 GMT
- Title: Estuary: A Framework For Building Multimodal Low-Latency Real-Time Socially Interactive Agents
- Authors: Spencer Lin, Basem Rizk, Miru Jun, Andy Artze, Caitlin Sullivan, Sharon Mozgai, Scott Fisher
- Abstract summary: Estuary is a framework for the development of low-latency, real-time Socially Interactive Agents.
It can be run entirely off-cloud to maximize configurability, controllability, reproducibility of studies, and speed of agent response times.
- Abstract: The rise in capability and ubiquity of generative artificial intelligence (AI) technologies has enabled its application to the field of Socially Interactive Agents (SIAs). Despite rising interest in modern AI-powered components used for real-time SIA research, substantial friction remains due to the absence of a standardized and universal SIA framework. To target this absence, we developed Estuary: a multimodal (text, audio, and soon video) framework which facilitates the development of low-latency, real-time SIAs. Estuary seeks to reduce repeat work between studies and to provide a flexible platform that can be run entirely off-cloud to maximize configurability, controllability, reproducibility of studies, and speed of agent response times. We are able to do this by constructing a robust multimodal framework which incorporates current and future components seamlessly into a modular and interoperable architecture.
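The abstract describes Estuary's architecture only at a high level. As a loose illustration of what a modular, interoperable SIA pipeline of this kind can look like, the sketch below chains swappable off-cloud components behind a common interface. All names here (SIAComponent, Pipeline, EchoDialogue) are hypothetical and are not taken from Estuary's actual API.

```python
# Hypothetical sketch only: Estuary's real interfaces are not described in the abstract.
# It illustrates the general idea of a modular SIA pipeline in which off-cloud
# components (e.g., ASR, dialogue, TTS) can be swapped without changing orchestration.
import asyncio
from abc import ABC, abstractmethod


class SIAComponent(ABC):
    """A pluggable pipeline stage, e.g. speech recognition, dialogue, or synthesis."""

    @abstractmethod
    async def process(self, message: dict) -> dict:
        ...


class EchoDialogue(SIAComponent):
    """Placeholder dialogue module standing in for a locally hosted language model."""

    async def process(self, message: dict) -> dict:
        return {"modality": "text", "content": f"Agent heard: {message['content']}"}


class Pipeline:
    """Runs components in order; each stage consumes the previous stage's output."""

    def __init__(self, components: list[SIAComponent]) -> None:
        self.components = components

    async def run(self, message: dict) -> dict:
        for component in self.components:
            message = await component.process(message)
        return message


async def main() -> None:
    pipeline = Pipeline([EchoDialogue()])
    reply = await pipeline.run({"modality": "text", "content": "Hello, agent."})
    print(reply["content"])


if __name__ == "__main__":
    asyncio.run(main())
```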
Related papers
- Transforming the Hybrid Cloud for Emerging AI Workloads [81.15269563290326]
This white paper envisions transforming hybrid cloud systems to meet the growing complexity of AI workloads.
The proposed framework addresses critical challenges in energy efficiency, performance, and cost-effectiveness.
This joint initiative aims to establish hybrid clouds as secure, efficient, and sustainable platforms.
arXiv Detail & Related papers (2024-11-20T11:57:43Z)
- Asynchronous Tool Usage for Real-Time Agents [61.3041983544042]
We introduce asynchronous AI agents capable of parallel processing and real-time tool-use.
Our key contribution is an event-driven finite-state machine architecture for agent execution and prompting.
This work presents both a conceptual framework and practical tools for creating AI agents capable of fluid, multitasking interactions.
arXiv Detail & Related papers (2024-10-28T23:57:19Z)
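The summary above only names the event-driven finite-state machine idea. The minimal sketch below is a generic illustration of that pattern, not the paper's implementation; the states and event names are invented for the example.

```python
# Hypothetical sketch: events (user input, tool results) drive transitions
# between agent states such as LISTENING, THINKING, and ACTING.
from enum import Enum, auto


class State(Enum):
    LISTENING = auto()
    THINKING = auto()
    ACTING = auto()


class AgentFSM:
    """Generic event-driven state machine for agent execution."""

    # (current_state, event) -> next_state
    TRANSITIONS = {
        (State.LISTENING, "user_message"): State.THINKING,
        (State.THINKING, "tool_call"): State.ACTING,
        (State.THINKING, "response_ready"): State.LISTENING,
        (State.ACTING, "tool_result"): State.THINKING,
    }

    def __init__(self) -> None:
        self.state = State.LISTENING

    def dispatch(self, event: str) -> State:
        """Apply an event; unrecognized events leave the state unchanged."""
        self.state = self.TRANSITIONS.get((self.state, event), self.state)
        return self.state


fsm = AgentFSM()
for event in ["user_message", "tool_call", "tool_result", "response_ready"]:
    print(f"{event} -> {fsm.dispatch(event).name}")
```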
- OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization [66.22117723598872]
We introduce an open-source framework designed to facilitate the development of multimodal web agents.
We first train the base model with imitation learning to acquire basic abilities.
We then let the agent explore the open web and collect feedback on its trajectories.
arXiv Detail & Related papers (2024-10-25T15:01:27Z)
- Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence [79.5316642687565]
Existing multi-agent frameworks often struggle with integrating diverse capable third-party agents.
We propose the Internet of Agents (IoA), a novel framework that addresses these limitations.
IoA introduces an agent integration protocol, an instant-messaging-like architecture design, and dynamic mechanisms for agent teaming and conversation flow control.
arXiv Detail & Related papers (2024-07-09T17:33:24Z)
- LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models [50.259006481656094]
We present a novel interactive application aimed at understanding the internal mechanisms of large vision-language models.
Our interface is designed to enhance the interpretability of the image patches, which are instrumental in generating an answer.
We present a case study of how our application can aid in understanding failure mechanisms in a popular large multi-modal model: LLaVA.
arXiv Detail & Related papers (2024-04-03T23:57:34Z)
- AgentScope: A Flexible yet Robust Multi-Agent Platform [66.64116117163755]
AgentScope is a developer-centric multi-agent platform with message exchange as its core communication mechanism.
The abundant syntactic tools, built-in agents and service functions, user-friendly interfaces for application demonstration and utility monitor, zero-code programming workstation, and automatic prompt tuning mechanism significantly lower the barriers to both development and deployment.
arXiv Detail & Related papers (2024-02-21T04:11:28Z)
- Conceptual Framework for Autonomous Cognitive Entities [0.9285295512807729]
This paper introduces the Autonomous Cognitive Entity model, a novel framework for a cognitive architecture.
The model is designed to harness the capabilities of the latest generative AI technologies, including large language models (LLMs) and multimodal generative models (MMMs).
The ACE framework also incorporates mechanisms for handling failures and adapting actions, thereby enhancing the robustness and flexibility of autonomous agents.
arXiv Detail & Related papers (2023-10-03T15:53:55Z)
- Multi-Agent Collaboration: Harnessing the Power of Intelligent LLM Agents [0.0]
We present a novel framework for enhancing the capabilities of large language models (LLMs) by leveraging the power of multi-agent systems.
Our framework introduces a collaborative environment where multiple intelligent agent components, each with distinctive attributes and roles, work together to handle complex tasks more efficiently and effectively.
arXiv Detail & Related papers (2023-06-05T23:55:37Z)
- Accelerating the Development of Multimodal, Integrative-AI Systems with Platform for Situated Intelligence [1.595445991573573]
We describe Platform for Situated Intelligence, an open-source framework for multimodal, integrative-AI systems.
We provide a brief, high-level overview of the framework and its main affordances, and discuss its implications for HRI.
arXiv Detail & Related papers (2020-10-12T23:53:12Z)