SIGMA: An Open-Source Interactive System for Mixed-Reality Task Assistance Research
- URL: http://arxiv.org/abs/2405.13035v1
- Date: Thu, 16 May 2024 21:21:09 GMT
- Title: SIGMA: An Open-Source Interactive System for Mixed-Reality Task Assistance Research
- Authors: Dan Bohus, Sean Andrist, Nick Saw, Ann Paradiso, Ishani Chakraborty, Mahdi Rad,
- Abstract summary: We introduce an open-source system called SIGMA as a platform for conducting research on task-assistive agents in mixed-reality scenarios.
We present the system's core capabilities, discuss its overall design and implementation, and outline directions for future research enabled by the system.
- Score: 5.27467559535251
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce an open-source system called SIGMA (short for "Situated Interactive Guidance, Monitoring, and Assistance") as a platform for conducting research on task-assistive agents in mixed-reality scenarios. The system leverages the sensing and rendering affordances of a head-mounted mixed-reality device in conjunction with large language and vision models to guide users step by step through procedural tasks. We present the system's core capabilities, discuss its overall design and implementation, and outline directions for future research enabled by the system. SIGMA is easily extensible and provides a useful basis for future research at the intersection of mixed reality and AI. By open-sourcing an end-to-end implementation, we aim to lower the barrier to entry, accelerate research in this space, and chart a path towards community-driven end-to-end evaluation of large language, vision, and multimodal models in the context of real-world interactive applications.
Related papers
- GUI Agents with Foundation Models: A Comprehensive Survey [52.991688542729385]
This survey consolidates recent research on (M)LLM-based GUI agents.
We highlight key innovations in data, frameworks, and applications.
We hope this paper will inspire further developments in the field of (M)LLM-based GUI agents.
arXiv Detail & Related papers (2024-11-07T17:28:10Z) - Constraining Participation: Affordances of Feedback Features in Interfaces to Large Language Models [49.74265453289855]
Large language models (LLMs) are now accessible to anyone with a computer, a web browser, and an internet connection via browser-based interfaces.
This paper examines the affordances of interactive feedback features in ChatGPT's interface, analysing how they shape user input and participation in iteration.
arXiv Detail & Related papers (2024-08-27T13:50:37Z) - LEGENT: Open Platform for Embodied Agents [60.71847900126832]
We introduce LEGENT, an open, scalable platform for developing embodied agents using Large Language Models (LLMs) and Large Multimodal Models (LMMs).
LEGENT offers a rich, interactive 3D environment with communicable and actionable agents, paired with a user-friendly interface.
In experiments, an embryonic vision-language-action model trained on LEGENT-generated data surpasses GPT-4V in embodied tasks.
arXiv Detail & Related papers (2024-04-28T16:50:12Z) - LLM-Based Multi-Agent Systems for Software Engineering: Vision and the Road Ahead [14.834072370183106]
This paper envisions the evolution of LLM-based Multi-Agent (LMA) systems in addressing complex and multi-faceted software engineering challenges.
By examining the role of LMA systems in future software engineering practices, this vision paper highlights the potential applications and emerging challenges.
arXiv Detail & Related papers (2024-04-07T07:05:40Z) - LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models [50.259006481656094]
We present a novel interactive application aimed towards understanding the internal mechanisms of large vision-language models.
Our interface is designed to enhance the interpretability of the image patches, which are instrumental in generating an answer.
We present a case study of how our application can aid in understanding failure mechanisms in a popular large multi-modal model: LLaVA.
arXiv Detail & Related papers (2024-04-03T23:57:34Z) - WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks? [83.19032025950986]
We study the use of large language model-based agents for interacting with software via web browsers.
WorkArena is a benchmark of 33 tasks based on the widely-used ServiceNow platform.
BrowserGym is an environment for the design and evaluation of such agents.
arXiv Detail & Related papers (2024-03-12T14:58:45Z) - A Survey on Context-Aware Multi-Agent Systems: Techniques, Challenges and Future Directions [1.1458366773578277]
Research interest in autonomous agents is on the rise as an emerging topic.
The challenge lies in enabling these agents to learn, reason, and navigate uncertainties in dynamic environments.
Context awareness emerges as a pivotal element in fortifying multi-agent systems.
arXiv Detail & Related papers (2024-02-03T00:27:22Z) - CSM-H-R: A Context Modeling Framework in Supporting Reasoning Automation for Interoperable Intelligent Systems and Privacy Protection [0.07499722271664144]
We propose a novel framework for automation of High Level Context (HLC) reasoning across intelligent systems at scale.
The design of the framework supports the sharing and inter context among intelligent systems and the components for handling CSMs and the management of hierarchy, relationship, and transition.
The implementation of the framework experiments with expressing HLC reasoning as vector and matrix computation and shows the potential to reach the next level of automation.
arXiv Detail & Related papers (2023-08-21T22:21:15Z) - Self-Adaptive Large Language Model (LLM)-Based Multiagent Systems [0.0]
We propose the integration of large language models (LLMs) into multiagent systems.
We anchor our methodology on the MAPE-K model, which is renowned for its robust support in monitoring, analyzing, planning, and executing system adaptations.
arXiv Detail & Related papers (2023-07-12T14:26:46Z) - Retrieval-Enhanced Machine Learning [110.5237983180089]
We describe a generic retrieval-enhanced machine learning framework, which includes a number of existing models as special cases.
REML challenges information retrieval conventions, presenting opportunities for novel advances in core areas, including optimization.
The REML research agenda lays a foundation for a new style of information access research and paves a path towards advancing machine learning and artificial intelligence.
arXiv Detail & Related papers (2022-05-02T21:42:45Z) - INODE: Building an End-to-End Data Exploration System in Practice [Extended Vision] [30.411996388471817]
INODE is an end-to-end data exploration system.
We demonstrate it in three significant use cases in the fields of Cancer Biomarker Research, Research and Innovation Policy Making, and Astrophysics.
arXiv Detail & Related papers (2021-04-09T05:04:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.