SIGMA: An Open-Source Interactive System for Mixed-Reality Task Assistance Research
- URL: http://arxiv.org/abs/2405.13035v1
- Date: Thu, 16 May 2024 21:21:09 GMT
- Title: SIGMA: An Open-Source Interactive System for Mixed-Reality Task Assistance Research
- Authors: Dan Bohus, Sean Andrist, Nick Saw, Ann Paradiso, Ishani Chakraborty, Mahdi Rad
- Abstract summary: We introduce an open-source system called SIGMA as a platform for conducting research on task-assistive agents in mixed-reality scenarios.
We present the system's core capabilities, discuss its overall design and implementation, and outline directions for future research enabled by the system.
- Score: 5.27467559535251
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce an open-source system called SIGMA (short for "Situated Interactive Guidance, Monitoring, and Assistance") as a platform for conducting research on task-assistive agents in mixed-reality scenarios. The system leverages the sensing and rendering affordances of a head-mounted mixed-reality device in conjunction with large language and vision models to guide users step by step through procedural tasks. We present the system's core capabilities, discuss its overall design and implementation, and outline directions for future research enabled by the system. SIGMA is easily extensible and provides a useful basis for future research at the intersection of mixed reality and AI. By open-sourcing an end-to-end implementation, we aim to lower the barrier to entry, accelerate research in this space, and chart a path towards community-driven end-to-end evaluation of large language, vision, and multimodal models in the context of real-world interactive applications.
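As a rough illustration of the step-by-step guidance loop the abstract describes (a minimal sketch, not SIGMA's actual implementation; the model and device hooks are left as placeholder callables), the core interaction could be organized like this:

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    instruction: str
    done: bool = False


def guide_task(task: str,
               plan_with_llm: Callable[[str], List[str]],
               step_completed: Callable[[str, bytes], bool],
               capture_frame: Callable[[], bytes],
               render: Callable[[str], None]) -> None:
    """Walk a user through `task` one step at a time.

    `plan_with_llm` asks a language model to break the task into steps,
    `step_completed` asks a vision model whether the current camera frame
    shows the step as finished, `capture_frame` reads the headset camera,
    and `render` displays text in the user's field of view. All four are
    placeholders for whatever models and devices a concrete system uses.
    """
    steps = [Step(s) for s in plan_with_llm(task)]
    for i, step in enumerate(steps, start=1):
        render(f"Step {i}/{len(steps)}: {step.instruction}")
        while not step.done:
            frame = capture_frame()            # egocentric view from the headset
            step.done = step_completed(step.instruction, frame)
    render("Task complete!")
```

Keeping planning (language model), step verification (vision model), and headset I/O behind separate interfaces mirrors the kind of modularity an extensible research platform needs.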
Related papers
- Beyond Self-Talk: A Communication-Centric Survey of LLM-Based Multi-Agent Systems [11.522282769053817]
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in reasoning, planning, and decision-making.
Researchers have begun incorporating LLMs into multi-agent systems to tackle tasks beyond the scope of single-agent setups.
This survey serves as a catalyst for further innovation, fostering more robust, scalable, and intelligent multi-agent systems.
arXiv Detail & Related papers (2025-02-20T07:18:34Z)
- GUI Agents: A Survey [129.94551809688377]
Graphical User Interface (GUI) agents, powered by Large Foundation Models, have emerged as a transformative approach to automating human-computer interaction.
Motivated by the growing interest and fundamental importance of GUI agents, we provide a comprehensive survey that categorizes their benchmarks, evaluation metrics, architectures, and training methods.
arXiv Detail & Related papers (2024-12-18T04:48:28Z)
- GUI Agents with Foundation Models: A Comprehensive Survey [91.97447457550703]
This survey consolidates recent research on (M)LLM-based GUI agents.
We identify key challenges and propose future research directions.
We hope this survey will inspire further advancements in the field of (M)LLM-based GUI agents.
arXiv Detail & Related papers (2024-11-07T17:28:10Z)
- Constraining Participation: Affordances of Feedback Features in Interfaces to Large Language Models [49.74265453289855]
Large language models (LLMs) are now accessible to anyone with a computer, a web browser, and an internet connection via browser-based interfaces.
This paper examines the affordances of interactive feedback features in ChatGPT's interface, analysing how they shape user input and participation in iteration.
arXiv Detail & Related papers (2024-08-27T13:50:37Z)
- LLM-Based Multi-Agent Systems for Software Engineering: Literature Review, Vision and the Road Ahead [14.834072370183106]
This paper explores the transformative potential of integrating Large Language Models into Multi-Agent (LMA) systems.
By leveraging the collaborative and specialized abilities of multiple agents, LMA systems enable autonomous problem-solving, improve robustness, and provide scalable solutions for managing the complexity of real-world software projects.
arXiv Detail & Related papers (2024-04-07T07:05:40Z)
- LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models [50.259006481656094]
We present a novel interactive application aimed at understanding the internal mechanisms of large vision-language models.
Our interface is designed to enhance the interpretability of the image patches, which are instrumental in generating an answer.
We present a case study of how our application can aid in understanding failure mechanisms in a popular large multi-modal model: LLaVA.
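As a hedged illustration of the kind of patch-level interpretability this tool targets (a generic sketch, not LVLM-Interpret's actual method; the layer choice, token indices, and grid size are assumptions), one could aggregate a layer's attention from answer tokens to image-patch tokens into a heatmap:

```python
import numpy as np
from typing import List, Tuple


def patch_relevance(attn: np.ndarray,
                    answer_token_ids: List[int],
                    patch_token_ids: List[int],
                    grid: Tuple[int, int]) -> np.ndarray:
    """Aggregate attention from generated answer tokens to image-patch tokens.

    attn: attention weights of one layer, shape (heads, queries, keys).
    Returns a (rows, cols) heatmap of how strongly the answer attended
    to each image patch.
    """
    mean_attn = attn.mean(axis=0)                               # average over heads
    rel = mean_attn[np.ix_(answer_token_ids, patch_token_ids)]  # answer -> patch block
    per_patch = rel.mean(axis=0)                                # average over answer tokens
    return per_patch.reshape(grid)
```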
arXiv Detail & Related papers (2024-04-03T23:57:34Z)
- WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks? [83.19032025950986]
We study the use of large language model-based agents for interacting with software via web browsers.
WorkArena is a benchmark of 33 tasks based on the widely-used ServiceNow platform.
BrowserGym is an environment for the design and evaluation of such agents.
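A minimal sketch of the observe-act loop such web-agent environments typically expose (following the Gymnasium-style reset/step convention; `env` and `agent` here are placeholders, not BrowserGym's actual API):

```python
def run_episode(env, agent, max_steps: int = 50) -> float:
    """Generic loop for a gym-style web-agent environment: the agent sees an
    observation (e.g., page DOM or accessibility tree plus the task goal) and
    returns an action (e.g., a click or keystroke) until the episode ends."""
    obs, info = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(obs)      # e.g., an LLM prompted with the observation
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        if terminated or truncated:
            break
    return total_reward
```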
arXiv Detail & Related papers (2024-03-12T14:58:45Z)
- A Survey on Context-Aware Multi-Agent Systems: Techniques, Challenges and Future Directions [1.0488897291370285]
Research interest in autonomous agents, an emerging topic, is on the rise.
The challenge lies in enabling these agents to learn, reason, and navigate uncertainties in dynamic environments.
Context awareness emerges as a pivotal element in fortifying multi-agent systems.
arXiv Detail & Related papers (2024-02-03T00:27:22Z)
- CSM-H-R: A Context Modeling Framework in Supporting Reasoning Automation for Interoperable Intelligent Systems and Privacy Protection [0.07499722271664144]
We propose a novel framework for the automation of High Level Context (HLC) reasoning across intelligent systems at scale.
The design of the framework supports the sharing and interoperation of context among intelligent systems, and includes components for handling CSMs and for managing hierarchy, relationship, and transition.
The implementation of the framework experiments with translating HLC reasoning into vector and matrix computing and shows the potential to reach the next level of automation.
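As a toy illustration of what "HLC reasoning as vector and matrix computing" could look like (the states and numbers below are hypothetical, not taken from the paper), context states can be encoded as one-hot vectors and transitions as a matrix:

```python
import numpy as np

# Hypothetical example: three high-level context states for a room.
states = ["empty", "meeting", "presentation"]


def one_hot(state: str) -> np.ndarray:
    v = np.zeros(len(states))
    v[states.index(state)] = 1.0
    return v


# Row i, column j: weight of moving from state i to state j.
# (Illustrative numbers only.)
transition = np.array([
    [0.2, 0.7, 0.1],   # from "empty"
    [0.1, 0.5, 0.4],   # from "meeting"
    [0.3, 0.3, 0.4],   # from "presentation"
])


def next_state(current: str) -> str:
    scores = one_hot(current) @ transition   # reasoning step as a matrix product
    return states[int(np.argmax(scores))]


print(next_state("empty"))   # -> "meeting"
```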
arXiv Detail & Related papers (2023-08-21T22:21:15Z)
- Retrieval-Enhanced Machine Learning [110.5237983180089]
We describe a generic retrieval-enhanced machine learning framework, which includes a number of existing models as special cases.
REML challenges information retrieval conventions, presenting opportunities for novel advances in core areas, including optimization.
The REML research agenda lays a foundation for a new style of information access research and paves a path towards advancing machine learning and artificial intelligence.
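A minimal sketch of the retrieval-enhanced prediction pattern (generic, not the REML framework itself; `retrieve` and `predict` are placeholder callables):

```python
from typing import Callable, List


def retrieval_enhanced_predict(query: str,
                               retrieve: Callable[[str, int], List[str]],
                               predict: Callable[[str, List[str]], str],
                               k: int = 5) -> str:
    """Couple a retrieval component with a prediction model: the model
    conditions on both the input and the k retrieved items rather than
    relying only on what is stored in its parameters."""
    evidence = retrieve(query, k)      # e.g., nearest neighbours from a corpus index
    return predict(query, evidence)    # e.g., a generator conditioned on the evidence
```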
arXiv Detail & Related papers (2022-05-02T21:42:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.