Related papers: Hallmarks of Human-Machine Collaboration: A framework for assessment in the DARPA Communicating with Computers Program

URL: http://arxiv.org/abs/2102.04958v1
Date: Tue, 9 Feb 2021 17:13:53 GMT
Title: Hallmarks of Human-Machine Collaboration: A framework for assessment in the DARPA Communicating with Computers Program
Authors: Robyn Kozierok, John Aberdeen, Cheryl Clark, Christopher Garay, Bradley Goodman, Tonia Korves, Lynette Hirschman, Patricia L. McDermott, Matthew W. Peterson
Abstract summary: We describe a framework for evaluating systems engaged in open-ended complex scenarios. We identify the Key Properties that must be exhibited by successful systems. Hallmarks are intended to serve as goals in guiding research direction.
Score: 0.851218146348961
License: http://creativecommons.org/licenses/by/4.0/
Abstract: There is a growing desire to create computer systems that can communicate effectively to collaborate with humans on complex, open-ended activities. Assessing these systems presents significant challenges. We describe a framework for evaluating systems engaged in open-ended complex scenarios where evaluators do not have the luxury of comparing performance to a single right answer. This framework has been used to evaluate human-machine creative collaborations across story and music generation, interactive block building, and exploration of molecular mechanisms in cancer. These activities are fundamentally different from the more constrained tasks performed by most contemporary personal assistants as they are generally open-ended, with no single correct solution, and often no obvious completion criteria. We identified the Key Properties that must be exhibited by successful systems. From there we identified "Hallmarks" of success -- capabilities and features that evaluators can observe that would be indicative of progress toward achieving a Key Property. In addition to being a framework for assessment, the Key Properties and Hallmarks are intended to serve as goals in guiding research direction.

Related papers

Closing the Evaluation Gap: Developing a Behavior-Oriented Framework for Assessing Virtual Teamwork Competency [6.169364905804677]
This study develops a behavior-oriented framework for assessing virtual teamwork competencies among engineering students. Using focus group interviews combined with the Critical Incident Technique, the study identified three key dimensions. The resulting framework provides a foundation for more effective assessment practices.
arXiv Detail & Related papers (2025-04-20T08:12:27Z)
Autotelic Reinforcement Learning: Exploring Intrinsic Motivations for Skill Acquisition in Open-Ended Environments [1.104960878651584]
This paper presents a comprehensive overview of autotelic Reinforcement Learning (RL), emphasizing the role of intrinsic motivations in the open-ended formation of skill repertoires. We delineate the distinctions between knowledge-based and competence-based intrinsic motivations, illustrating how these concepts inform the development of autonomous agents capable of generating and pursuing self-defined goals.
arXiv Detail & Related papers (2025-02-06T14:37:46Z)
Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration [51.452664740963066]
Collaborative Gym is a framework enabling asynchronous, tripartite interaction among agents, humans, and task environments. We instantiate Co-Gym with three representative tasks in both simulated and real-world conditions. Our findings reveal that collaborative agents consistently outperform their fully autonomous counterparts in task performance.
arXiv Detail & Related papers (2024-12-20T09:21:15Z)
GUI Agents: A Survey [129.94551809688377]
Graphical User Interface (GUI) agents, powered by Large Foundation Models, have emerged as a transformative approach to automating human-computer interaction. Motivated by the growing interest and fundamental importance of GUI agents, we provide a comprehensive survey that categorizes their benchmarks, evaluation metrics, architectures, and training methods.
arXiv Detail & Related papers (2024-12-18T04:48:28Z)
Constrained Human-AI Cooperation: An Inclusive Embodied Social Intelligence Challenge [47.74313897705183]
CHAIC is an inclusive embodied social intelligence challenge designed to test social perception and cooperation in embodied agents. In CHAIC, the goal is for an embodied agent equipped with egocentric observations to assist a human who may be operating under physical constraints. We benchmark planning- and learning-based baselines on the challenge and introduce a new method that leverages large language models and behavior modeling.
arXiv Detail & Related papers (2024-11-04T04:41:12Z)
Fostering Microservice Maintainability Assurance through a Comprehensive Framework [0.0]
This project aims to offer maintainability assurance for microservice-based systems. It introduces an automated assessment framework tailored to microservice architecture. The framework addresses various levels, from artifacts to holistic views of system characteristics.
arXiv Detail & Related papers (2024-07-23T22:45:29Z)
Evaluating Human-AI Collaboration: A Review and Methodological Framework [4.41358655687435]
The use of artificial intelligence (AI) in working environments with individuals, known as Human-AI Collaboration (HAIC), has become essential. Evaluating HAIC's effectiveness remains challenging due to the complex interaction of the components involved. This paper provides a detailed analysis of existing HAIC evaluation approaches and develops a fresh paradigm for more effectively evaluating these systems.
arXiv Detail & Related papers (2024-07-09T12:52:22Z)
WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks? [83.19032025950986]
We study the use of large language model-based agents for interacting with software via web browsers. WorkArena is a benchmark of 33 tasks based on the widely-used ServiceNow platform. BrowserGym is an environment for the design and evaluation of such agents.
arXiv Detail & Related papers (2024-03-12T14:58:45Z)
Understanding the Application of Utility Theory in Robotics and Artificial Intelligence: A Survey [5.168741399695988]
Utility is a unifying concept in economics, game theory, and operations research, and also in robotics and AI. This paper introduces a utility-oriented needs paradigm to describe and evaluate inter and outer relationships among agents' interactions.
arXiv Detail & Related papers (2023-06-15T18:55:48Z)
Learning Action-Effect Dynamics for Hypothetical Vision-Language Reasoning Task [50.72283841720014]
We propose a novel learning strategy that can improve reasoning about the effects of actions. We demonstrate the effectiveness of our proposed approach and discuss its advantages over previous baselines in terms of performance, data efficiency, and generalization capability.
arXiv Detail & Related papers (2022-12-07T05:41:58Z)
Automatic Context-Driven Inference of Engagement in HMI: A Survey [6.479224589451863]
This paper presents a survey on engagement inference for human-machine interaction. It covers interdisciplinary definitions, engagement components and factors, publicly available datasets, ground truth assessment, and the most commonly used features and methods. It serves as a guide for the development of future human-machine interaction interfaces with reliable context-aware engagement inference capability.
arXiv Detail & Related papers (2022-09-30T10:46:13Z)
Autonomous Open-Ended Learning of Tasks with Non-Stationary Interdependencies [64.0476282000118]
Intrinsic motivations have been shown to generate a task-agnostic signal for allocating training time among goals. While most work on intrinsically motivated open-ended learning focuses on scenarios where goals are independent of each other, only a few studies have addressed the autonomous acquisition of interdependent tasks. In particular, we first deepen the analysis of a previous system, showing the importance of incorporating information about the relationships between tasks at a higher level of the architecture. Then we introduce H-GRAIL, a new system that extends the previous one by adding a new learning layer to store the autonomously acquired sequences
arXiv Detail & Related papers (2022-05-16T10:43:01Z)
Human-Algorithm Collaboration: Achieving Complementarity and Avoiding Unfairness [92.26039686430204]
We show that even in carefully-designed systems, complementary performance can be elusive. First, we provide a theoretical framework for modeling simple human-algorithm systems. Next, we use this model to prove conditions where complementarity is impossible.
arXiv Detail & Related papers (2022-02-17T18:44:41Z)
Watch-And-Help: A Challenge for Social Perception and Human-AI Collaboration [116.28433607265573]
We introduce Watch-And-Help (WAH), a challenge for testing social intelligence in AI agents. In WAH, an AI agent needs to help a human-like agent perform a complex household task efficiently. We build VirtualHome-Social, a multi-agent household environment, and provide a benchmark including both planning and learning based baselines.
arXiv Detail & Related papers (2020-10-19T21:48:31Z)

This list is automatically generated from the titles and abstracts of the papers on this site.