SIGMA: An Open-Source Interactive System for Mixed-Reality Task Assistance Research
- URL: http://arxiv.org/abs/2405.13035v1
- Date: Thu, 16 May 2024 21:21:09 GMT
- Title: SIGMA: An Open-Source Interactive System for Mixed-Reality Task Assistance Research
- Authors: Dan Bohus, Sean Andrist, Nick Saw, Ann Paradiso, Ishani Chakraborty, Mahdi Rad
- Abstract summary: We introduce an open-source system called SIGMA as a platform for conducting research on task-assistive agents in mixed-reality scenarios.
We present the system's core capabilities, discuss its overall design and implementation, and outline directions for future research enabled by the system.
- Score: 5.27467559535251
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce an open-source system called SIGMA (short for "Situated Interactive Guidance, Monitoring, and Assistance") as a platform for conducting research on task-assistive agents in mixed-reality scenarios. The system leverages the sensing and rendering affordances of a head-mounted mixed-reality device in conjunction with large language and vision models to guide users step by step through procedural tasks. We present the system's core capabilities, discuss its overall design and implementation, and outline directions for future research enabled by the system. SIGMA is easily extensible and provides a useful basis for future research at the intersection of mixed reality and AI. By open-sourcing an end-to-end implementation, we aim to lower the barrier to entry, accelerate research in this space, and chart a path towards community-driven end-to-end evaluation of large language, vision, and multimodal models in the context of real-world interactive applications.
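As a rough illustration of the step-by-step guidance loop the abstract describes (a minimal sketch, not SIGMA's actual implementation; the model and device hooks are left as placeholder callables), the core interaction could be organized like this:

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    instruction: str
    done: bool = False


def guide_task(task: str,
               plan_with_llm: Callable[[str], List[str]],
               step_completed: Callable[[str, bytes], bool],
               capture_frame: Callable[[], bytes],
               render: Callable[[str], None]) -> None:
    """Walk a user through `task` one step at a time.

    `plan_with_llm` asks a language model to break the task into steps,
    `step_completed` asks a vision model whether the current camera frame
    shows the step as finished, `capture_frame` reads the headset camera,
    and `render` displays text in the user's field of view. All four are
    placeholders for whatever models and devices a concrete system uses.
    """
    steps = [Step(s) for s in plan_with_llm(task)]
    for i, step in enumerate(steps, start=1):
        render(f"Step {i}/{len(steps)}: {step.instruction}")
        while not step.done:
            frame = capture_frame()            # egocentric view from the headset
            step.done = step_completed(step.instruction, frame)
    render("Task complete!")
```

Keeping planning (language model), step verification (vision model), and headset I/O behind separate interfaces mirrors the kind of modularity an extensible research platform needs.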
Related papers
- Beyond Self-Talk: A Communication-Centric Survey of LLM-Based Multi-Agent Systems [11.522282769053817]
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in reasoning, planning, and decision-making.
Researchers have begun incorporating LLMs into multi-agent systems to tackle tasks beyond the scope of single-agent setups.
This survey serves as a catalyst for further innovation, fostering more robust, scalable, and intelligent multi-agent systems.
arXiv Detail & Related papers (2025-02-20T07:18:34Z)
- GUI Agents: A Survey [129.94551809688377]
Graphical User Interface (GUI) agents, powered by Large Foundation Models, have emerged as a transformative approach to automating human-computer interaction.
Motivated by the growing interest and fundamental importance of GUI agents, we provide a comprehensive survey that categorizes their benchmarks, evaluation metrics, architectures, and training methods.
arXiv Detail & Related papers (2024-12-18T04:48:28Z)
- GUI Agents with Foundation Models: A Comprehensive Survey [91.97447457550703]
This survey consolidates recent research on (M)LLM-based GUI agents.
We identify key challenges and propose future research directions.
We hope this survey will inspire further advancements in the field of (M)LLM-based GUI agents.
arXiv Detail & Related papers (2024-11-07T17:28:10Z)
- Constraining Participation: Affordances of Feedback Features in Interfaces to Large Language Models [49.74265453289855]
Large language models (LLMs) are now accessible to anyone with a computer, a web browser, and an internet connection via browser-based interfaces.
This paper examines the affordances of interactive feedback features in ChatGPT's interface, analysing how they shape user input and participation in iteration.
arXiv Detail & Related papers (2024-08-27T13:50:37Z)
- LLM-Based Multi-Agent Systems for Software Engineering: Literature Review, Vision and the Road Ahead [14.834072370183106]
This paper explores the transformative potential of integrating Large Language Models into Multi-Agent (LMA) systems.
By leveraging the collaborative and specialized abilities of multiple agents, LMA systems enable autonomous problem-solving, improve robustness, and provide scalable solutions for managing the complexity of real-world software projects.
arXiv Detail & Related papers (2024-04-07T07:05:40Z)
- LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models [50.259006481656094]
We present a novel interactive application aimed at understanding the internal mechanisms of large vision-language models.
Our interface is designed to enhance the interpretability of the image patches, which are instrumental in generating an answer.
We present a case study of how our application can aid in understanding failure mechanisms in a popular large multi-modal model: LLaVA.
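As a hedged illustration of the kind of patch-level interpretability this tool targets (a generic sketch, not LVLM-Interpret's actual method; the layer choice, token indices, and grid size are assumptions), one could aggregate a layer's attention from answer tokens to image-patch tokens into a heatmap:

```python
import numpy as np
from typing import List, Tuple


def patch_relevance(attn: np.ndarray,
                    answer_token_ids: List[int],
                    patch_token_ids: List[int],
                    grid: Tuple[int, int]) -> np.ndarray:
    """Aggregate attention from generated answer tokens to image-patch tokens.

    attn: attention weights of one layer, shape (heads, queries, keys).
    Returns a (rows, cols) heatmap of how strongly the answer attended
    to each image patch.
    """
    mean_attn = attn.mean(axis=0)                               # average over heads
    rel = mean_attn[np.ix_(answer_token_ids, patch_token_ids)]  # answer -> patch block
    per_patch = rel.mean(axis=0)                                # average over answer tokens
    return per_patch.reshape(grid)
```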
arXiv Detail & Related papers (2024-04-03T23:57:34Z)
- WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks? [83.19032025950986]
We study the use of large language model-based agents for interacting with software via web browsers.
WorkArena is a benchmark of 33 tasks based on the widely-used ServiceNow platform.
BrowserGym is an environment for the design and evaluation of such agents.
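A minimal sketch of the observe-act loop such web-agent environments typically expose (following the Gymnasium-style reset/step convention; `env` and `agent` here are placeholders, not BrowserGym's actual API):

```python
def run_episode(env, agent, max_steps: int = 50) -> float:
    """Generic loop for a gym-style web-agent environment: the agent sees an
    observation (e.g., page DOM or accessibility tree plus the task goal) and
    returns an action (e.g., a click or keystroke) until the episode ends."""
    obs, info = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(obs)      # e.g., an LLM prompted with the observation
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        if terminated or truncated:
            break
    return total_reward
```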
arXiv Detail & Related papers (2024-03-12T14:58:45Z)
- A Survey on Context-Aware Multi-Agent Systems: Techniques, Challenges and Future Directions [1.0488897291370285]
Research interest in autonomous agents, an emerging topic, is on the rise.
The challenge lies in enabling these agents to learn, reason, and navigate uncertainties in dynamic environments.
Context awareness emerges as a pivotal element in fortifying multi-agent systems.
arXiv Detail & Related papers (2024-02-03T00:27:22Z)
- CSM-H-R: A Context Modeling Framework in Supporting Reasoning Automation for Interoperable Intelligent Systems and Privacy Protection [0.07499722271664144]
We propose a novel framework for the automation of High Level Context (HLC) reasoning across intelligent systems at scale.
The design of the framework supports the sharing and interoperation of context among intelligent systems, and includes components for handling CSMs and for managing hierarchy, relationship, and transition.
The implementation of the framework experiments with translating HLC reasoning into vector and matrix computing and shows the potential to reach the next level of automation.
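As a toy illustration of what "HLC reasoning as vector and matrix computing" could look like (the states and numbers below are hypothetical, not taken from the paper), context states can be encoded as one-hot vectors and transitions as a matrix:

```python
import numpy as np

# Hypothetical example: three high-level context states for a room.
states = ["empty", "meeting", "presentation"]


def one_hot(state: str) -> np.ndarray:
    v = np.zeros(len(states))
    v[states.index(state)] = 1.0
    return v


# Row i, column j: weight of moving from state i to state j.
# (Illustrative numbers only.)
transition = np.array([
    [0.2, 0.7, 0.1],   # from "empty"
    [0.1, 0.5, 0.4],   # from "meeting"
    [0.3, 0.3, 0.4],   # from "presentation"
])


def next_state(current: str) -> str:
    scores = one_hot(current) @ transition   # reasoning step as a matrix product
    return states[int(np.argmax(scores))]


print(next_state("empty"))   # -> "meeting"
```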
arXiv Detail & Related papers (2023-08-21T22:21:15Z)
- Retrieval-Enhanced Machine Learning [110.5237983180089]
We describe a generic retrieval-enhanced machine learning framework, which includes a number of existing models as special cases.
REML challenges information retrieval conventions, presenting opportunities for novel advances in core areas, including optimization.
The REML research agenda lays a foundation for a new style of information access research and paves a path towards advancing machine learning and artificial intelligence.
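A minimal sketch of the retrieval-enhanced prediction pattern (generic, not the REML framework itself; `retrieve` and `predict` are placeholder callables):

```python
from typing import Callable, List


def retrieval_enhanced_predict(query: str,
                               retrieve: Callable[[str, int], List[str]],
                               predict: Callable[[str, List[str]], str],
                               k: int = 5) -> str:
    """Couple a retrieval component with a prediction model: the model
    conditions on both the input and the k retrieved items rather than
    relying only on what is stored in its parameters."""
    evidence = retrieve(query, k)      # e.g., nearest neighbours from a corpus index
    return predict(query, evidence)    # e.g., a generator conditioned on the evidence
```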
arXiv Detail & Related papers (2022-05-02T21:42:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.