SIGMA: An Open-Source Interactive System for Mixed-Reality Task Assistance Research
- URL: http://arxiv.org/abs/2405.13035v1
- Date: Thu, 16 May 2024 21:21:09 GMT
- Title: SIGMA: An Open-Source Interactive System for Mixed-Reality Task Assistance Research
- Authors: Dan Bohus, Sean Andrist, Nick Saw, Ann Paradiso, Ishani Chakraborty, Mahdi Rad,
- Abstract summary: We introduce an open-source system called SIGMA as a platform for conducting research on task-assistive agents in mixed-reality scenarios.
We present the system's core capabilities, discuss its overall design and implementation, and outline directions for future research enabled by the system.
- Score: 5.27467559535251
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce an open-source system called SIGMA (short for "Situated Interactive Guidance, Monitoring, and Assistance") as a platform for conducting research on task-assistive agents in mixed-reality scenarios. The system leverages the sensing and rendering affordances of a head-mounted mixed-reality device in conjunction with large language and vision models to guide users step by step through procedural tasks. We present the system's core capabilities, discuss its overall design and implementation, and outline directions for future research enabled by the system. SIGMA is easily extensible and provides a useful basis for future research at the intersection of mixed reality and AI. By open-sourcing an end-to-end implementation, we aim to lower the barrier to entry, accelerate research in this space, and chart a path towards community-driven end-to-end evaluation of large language, vision, and multimodal models in the context of real-world interactive applications.
Related papers
- LEGENT: Open Platform for Embodied Agents [60.71847900126832]
We introduce LEGENT, an open, scalable platform for developing embodied agents using Large Language Models (LLMs) and Large Multimodal Models (LMMs)
LEGENT offers a rich, interactive 3D environment with communicable and actionable agents, paired with a user-friendly interface.
In experiments, an embryonic vision-language-action model trained on LEGENT-generated data surpasses GPT-4V in embodied tasks.
arXiv Detail & Related papers (2024-04-28T16:50:12Z) - LLM-Based Multi-Agent Systems for Software Engineering: Vision and the Road Ahead [14.834072370183106]
This paper envisions the evolution of Multi-Agent (LMA) systems in addressing complex and multi-faceted software engineering challenges.
By examining the role of LMA systems in future software engineering practices, this vision paper highlights the potential applications and emerging challenges.
arXiv Detail & Related papers (2024-04-07T07:05:40Z) - LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models [50.259006481656094]
We present a novel interactive application aimed towards understanding the internal mechanisms of large vision-language models.
Our interface is designed to enhance the interpretability of the image patches, which are instrumental in generating an answer.
We present a case study of how our application can aid in understanding failure mechanisms in a popular large multi-modal model: LLaVA.
arXiv Detail & Related papers (2024-04-03T23:57:34Z) - WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks? [83.19032025950986]
We study the use of large language model-based agents for interacting with software via web browsers.
WorkArena is a benchmark of 33 tasks based on the widely-used ServiceNow platform.
BrowserGym is an environment for the design and evaluation of such agents.
arXiv Detail & Related papers (2024-03-12T14:58:45Z) - A Survey on Context-Aware Multi-Agent Systems: Techniques, Challenges
and Future Directions [1.1458366773578277]
Research interest in autonomous agents is on the rise as an emerging topic.
The challenge lies in enabling these agents to learn, reason, and navigate uncertainties in dynamic environments.
Context awareness emerges as a pivotal element in fortifying multi-agent systems.
arXiv Detail & Related papers (2024-02-03T00:27:22Z) - Agent AI: Surveying the Horizons of Multimodal Interaction [83.18367129924997]
"Agent AI" is a class of interactive systems that can perceive visual stimuli, language inputs, and other environmentally-grounded data.
We envision a future where people can easily create any virtual reality or simulated scene and interact with agents embodied within the virtual environment.
arXiv Detail & Related papers (2024-01-07T19:11:18Z) - IMTLab: An Open-Source Platform for Building, Evaluating, and Diagnosing
Interactive Machine Translation Systems [94.39110258587887]
We present IMTLab, an open-source end-to-end interactive machine translation (IMT) system platform.
IMTLab treats the whole interactive translation process as a task-oriented dialogue with a human-in-the-loop setting.
arXiv Detail & Related papers (2023-10-17T11:29:04Z) - CSM-H-R: A Context Modeling Framework in Supporting Reasoning Automation for Interoperable Intelligent Systems and Privacy Protection [0.07499722271664144]
We propose a novel framework for automation of High Level Context (HLC) reasoning across intelligent systems at scale.
The design of the framework supports the sharing and inter context among intelligent systems and the components for handling CSMs and the management of hierarchy, relationship, and transition.
The implementation of the framework experiments on the HLC reasoning into vector and matrix computing and presents the potential to reach next level of automation.
arXiv Detail & Related papers (2023-08-21T22:21:15Z) - Self-Adaptive Large Language Model (LLM)-Based Multiagent Systems [0.0]
We propose the integration of large language models (LLMs) into multiagent systems.
We anchor our methodology on the MAPE-K model, which is renowned for its robust support in monitoring, analyzing, planning, and executing system adaptations.
arXiv Detail & Related papers (2023-07-12T14:26:46Z) - Retrieval-Enhanced Machine Learning [110.5237983180089]
We describe a generic retrieval-enhanced machine learning framework, which includes a number of existing models as special cases.
REML challenges information retrieval conventions, presenting opportunities for novel advances in core areas, including optimization.
REML research agenda lays a foundation for a new style of information access research and paves a path towards advancing machine learning and artificial intelligence.
arXiv Detail & Related papers (2022-05-02T21:42:45Z) - INODE: Building an End-to-End Data Exploration System in Practice
[Extended Vision] [30.411996388471817]
INODE is an end-to-end data exploration system.
We demonstrate it in three significant use cases in the fields of Cancer Biomarker Reearch, Research and Innovation Policy Making, and Astrophysics.
arXiv Detail & Related papers (2021-04-09T05:04:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.