Improving the State of the Art for Training Human-AI Teams: Technical
Report #3 -- Analysis of Testbed Alternatives
- URL: http://arxiv.org/abs/2309.03213v1
- Date: Tue, 29 Aug 2023 14:06:30 GMT
- Title: Improving the State of the Art for Training Human-AI Teams: Technical
Report #3 -- Analysis of Testbed Alternatives
- Authors: Lillian Asiala, James E. McCarthy, Lixiao Huang
- Abstract summary: Sonalysts is working on an initiative to expand its expertise in teaming to Human-Artificial Intelligence (AI) teams.
To provide a foundation for that research, Sonalysts is investigating the development of a Synthetic Task Environment.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sonalysts is working on an initiative to expand our current expertise in
teaming to Human-Artificial Intelligence (AI) teams by developing original
research in this area. To provide a foundation for that research, Sonalysts is
investigating the development of a Synthetic Task Environment (STE). In a
previous report, we documented the findings of a recent outreach effort in
which we asked military Subject Matter Experts (SMEs) and other researchers in
the Human-AI teaming domain to identify the qualities that they most valued in
a testbed. A surprising finding from that outreach was that several respondents
recommended that our team look into existing human-AI teaming testbeds, rather
than creating something new. Based on that recommendation, we conducted a
systematic investigation of the associated landscape. In this report, we
describe the results of that investigation. Building on the survey results, we
developed testbed evaluation criteria, identified potential testbeds, and
conducted qualitative and quantitative evaluations of candidate testbeds. The
evaluation process led to five candidate testbeds for the research team to
consider. In the coming months, we will assess the viability of the various
alternatives and begin to execute our program of research.
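The quantitative evaluation step described above can be pictured as a weighted scoring matrix over candidate testbeds. The sketch below is a minimal illustration of that idea only; the criteria names, weights, candidates, and ratings are hypothetical placeholders and do not come from the report.

```python
# Minimal sketch of a weighted-criteria evaluation of candidate testbeds.
# All criteria, weights, candidates, and ratings are hypothetical placeholders,
# not values taken from the Sonalysts report.

CRITERIA_WEIGHTS = {
    "team_task_fidelity": 0.30,
    "ai_teammate_support": 0.25,
    "data_logging": 0.20,
    "extensibility": 0.15,
    "cost_and_licensing": 0.10,
}

# Qualitative ratings (1-5) assigned to each candidate on each criterion.
candidate_scores = {
    "Testbed A": {"team_task_fidelity": 4, "ai_teammate_support": 3,
                  "data_logging": 5, "extensibility": 4, "cost_and_licensing": 2},
    "Testbed B": {"team_task_fidelity": 3, "ai_teammate_support": 5,
                  "data_logging": 3, "extensibility": 3, "cost_and_licensing": 4},
}

def weighted_score(ratings: dict[str, int]) -> float:
    """Combine per-criterion ratings into a single weighted score."""
    return sum(CRITERIA_WEIGHTS[criterion] * rating
               for criterion, rating in ratings.items())

# Rank candidates from highest to lowest weighted score.
ranking = sorted(candidate_scores.items(),
                 key=lambda item: weighted_score(item[1]),
                 reverse=True)

for name, ratings in ranking:
    print(f"{name}: {weighted_score(ratings):.2f}")
```

In an actual application of this approach, the report's own criteria and weights would replace the placeholders, and the resulting quantitative ranking would be combined with the qualitative review described in the abstract to arrive at the short list of candidate testbeds.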
Related papers
- The Story is Not the Science: Execution-Grounded Evaluation of Mechanistic Interpretability Research [56.80927148740585]
We address the challenges of scalability and rigor by flipping the dynamic and developing AI agents as research evaluators. We use mechanistic interpretability research as a testbed, build standardized research output, and develop MechEvalAgent. Our work demonstrates the potential of AI agents to transform research evaluation and pave the way for rigorous scientific practices.
arXiv Detail & Related papers (2026-02-05T19:00:02Z) - From Task Executors to Research Partners: Evaluating AI Co-Pilots Through Workflow Integration in Biomedical Research [0.16174969956296248]
This rapid review examines benchmarking practices for AI systems in preclinical biomedical research. A process-oriented evaluation framework is proposed that addresses four critical dimensions absent from current benchmarks. These dimensions are essential for evaluating AI systems as research co-pilots rather than as isolated task executors.
arXiv Detail & Related papers (2025-12-04T14:37:46Z) - AstaBench: Rigorous Benchmarking of AI Agents with a Scientific Research Suite [75.58737079136942]
We present AstaBench, a suite that provides the first holistic measure of agentic ability to perform scientific research. Our suite comes with the first scientific research environment with production-grade search tools. Our evaluation of 57 agents across 22 agent classes reveals several interesting findings.
arXiv Detail & Related papers (2025-10-24T17:10:26Z) - Towards Personalized Deep Research: Benchmarks and Evaluations [56.581105664044436]
We introduce Personalized Deep Research Bench, the first benchmark for evaluating personalization in Deep Research Agents (DRAs). It pairs 50 diverse research tasks with 25 authentic user profiles that combine structured persona attributes with dynamic real-world contexts, yielding 250 realistic user-task queries. Our experiments on a range of systems highlight current capabilities and limitations in handling personalized deep research.
arXiv Detail & Related papers (2025-09-29T17:39:17Z) - ResearcherBench: Evaluating Deep AI Research Systems on the Frontiers of Scientific Inquiry [22.615102398311432]
We introduce ResearcherBench, the first benchmark focused on evaluating the capabilities of deep AI research systems. We compiled a dataset of 65 research questions expertly selected from real-world scientific scenarios. OpenAI Deep Research and Gemini Deep Research significantly outperform other systems, with particular strength in open-ended consulting questions.
arXiv Detail & Related papers (2025-07-22T06:51:26Z) - AI4Research: A Survey of Artificial Intelligence for Scientific Research [55.5452803680643]
We present a comprehensive survey on AI for Research (AI4Research). We first introduce a systematic taxonomy to classify five mainstream tasks in AI4Research. We identify key research gaps and highlight promising future directions.
arXiv Detail & Related papers (2025-07-02T17:19:20Z) - SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks [87.29946641069068]
We present SciArena, an open and collaborative platform for evaluating foundation models on scientific literature tasks. By leveraging collective intelligence, SciArena offers a community-driven evaluation of model performance on open-ended scientific tasks. We release SciArena-Eval, a meta-evaluation benchmark based on our collected preference data.
arXiv Detail & Related papers (2025-07-01T17:51:59Z) - The AI Imperative: Scaling High-Quality Peer Review in Machine Learning [49.87236114682497]
We argue that AI-assisted peer review must become an urgent research and infrastructure priority. We propose specific roles for AI in enhancing factual verification, guiding reviewer performance, assisting authors in quality improvement, and supporting ACs in decision-making.
arXiv Detail & Related papers (2025-06-09T18:37:14Z) - On Benchmarking Human-Like Intelligence in Machines [77.55118048492021]
We argue that current AI evaluation paradigms are insufficient for assessing human-like cognitive capabilities.
We identify a set of key shortcomings: a lack of human-validated labels, inadequate representation of human response variability and uncertainty, and reliance on simplified and ecologically-invalid tasks.
arXiv Detail & Related papers (2025-02-27T20:21:36Z) - Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation [58.064940977804596]
A plethora of new AI models and tools has been proposed, promising to empower researchers and academics worldwide to conduct their research more effectively and efficiently.
Ethical concerns regarding shortcomings of these tools and potential for misuse take a particularly prominent place in our discussion.
arXiv Detail & Related papers (2025-02-07T18:26:45Z) - A Decade of Action Quality Assessment: Largest Systematic Survey of Trends, Challenges, and Future Directions [8.27542607031299]
Action Quality Assessment (AQA) has far-reaching implications in areas such as low-cost physiotherapy, sports training, and workforce development.
We systematically review over 200 research papers using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework.
This survey provides a detailed analysis of research trends, performance comparisons, challenges, and future directions.
arXiv Detail & Related papers (2025-02-05T01:33:24Z) - On Evaluating Explanation Utility for Human-AI Decision Making in NLP [39.58317527488534]
We review existing metrics suitable for application-grounded evaluation.
We demonstrate the importance of reassessing the state of the art to form and study human-AI teams.
arXiv Detail & Related papers (2024-07-03T23:53:27Z) - ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models [56.08917291606421]
ResearchAgent is an AI-based system for ideation and operationalization of novel work.
ResearchAgent automatically defines novel problems, proposes methods and designs experiments, while iteratively refining them.
We experimentally validate our ResearchAgent on scientific publications across multiple disciplines.
arXiv Detail & Related papers (2024-04-11T13:36:29Z) - SurveyAgent: A Conversational System for Personalized and Efficient Research Survey [50.04283471107001]
This paper introduces SurveyAgent, a novel conversational system designed to provide personalized and efficient research survey assistance to researchers.
SurveyAgent integrates three key modules: Knowledge Management for organizing papers, Recommendation for discovering relevant literature, and Query Answering for engaging with content on a deeper level.
Our evaluation demonstrates SurveyAgent's effectiveness in streamlining research activities, showcasing its capability to facilitate how researchers interact with scientific literature.
arXiv Detail & Related papers (2024-04-09T15:01:51Z) - Search-Based Fairness Testing: An Overview [4.453735522794044]
Biases in AI systems raise ethical and societal concerns.
This paper reviews current research on fairness testing, particularly its application through search-based testing.
arXiv Detail & Related papers (2023-11-10T16:47:56Z) - Improving the State of the Art for Training Human-AI Teams: Technical
Report #2 -- Results of Researcher Knowledge Elicitation Survey [0.0]
Sonalysts has begun an internal initiative to explore the training of Human-AI teams.
The first step in this effort is to develop a Synthetic Task Environment (STE) that is capable of facilitating research on Human-AI teams.
arXiv Detail & Related papers (2023-08-29T13:54:32Z) - Improving the State of the Art for Training Human-AI Teams: Technical
Report #1 -- Results of Subject-Matter Expert Knowledge Elicitation Survey [0.0]
Sonalysts has begun an internal initiative to explore the training of human-AI teams.
We decided to use Joint All-Domain Command and Control (JADC2) as a focus point.
We engaged a number of Subject-Matter Experts (SMEs) with Command and Control experience to gain insight into developing an STE that embodied the teaming challenges associated with JADC2.
arXiv Detail & Related papers (2023-08-29T13:42:52Z) - ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate [57.71597869337909]
We build a multi-agent referee team called ChatEval to autonomously discuss and evaluate the quality of generated responses from different models.
Our analysis shows that ChatEval transcends mere textual scoring, offering a human-mimicking evaluation process for reliable assessments.
arXiv Detail & Related papers (2023-08-14T15:13:04Z) - Position: AI Evaluation Should Learn from How We Test Humans [65.36614996495983]
We argue that psychometrics, a theory originating in the 20th century for human assessment, could be a powerful solution to the challenges in today's AI evaluations.
arXiv Detail & Related papers (2023-06-18T09:54:33Z) - Survey of Aspect-based Sentiment Analysis Datasets [55.61047894397937]
Aspect-based sentiment analysis (ABSA) is a natural language processing problem that requires analyzing user-generated reviews.
Numerous yet scattered corpora for ABSA make it difficult for researchers to identify corpora best suited for a specific ABSA subtask quickly.
This study aims to present a database of corpora that can be used to train and assess autonomous ABSA systems.
arXiv Detail & Related papers (2022-04-11T16:23:36Z) - An Uncommon Task: Participatory Design in Legal AI [64.54460979588075]
We examine a notable yet understudied AI design process in the legal domain that took place over a decade ago.
We show how an interactive simulation methodology allowed computer scientists and lawyers to become co-designers.
arXiv Detail & Related papers (2022-03-08T15:46:52Z) - Scaling up Search Engine Audits: Practical Insights for Algorithm
Auditing [68.8204255655161]
We set up experiments for eight search engines with hundreds of virtual agents placed in different regions.
We demonstrate the successful performance of our research infrastructure across multiple data collections.
We conclude that virtual agents are a promising avenue for monitoring the performance of algorithms over long periods of time.
arXiv Detail & Related papers (2021-06-10T15:49:58Z) - Human-AI Symbiosis: A Survey of Current Approaches [18.252264744963394]
We highlight various aspects of works on the human-AI team such as the flow of complementing, task horizon, model representation, knowledge level, and teaming goal.
We hope that this survey will provide a clearer connection between works on human-AI teaming and guidance for new researchers in this area.
arXiv Detail & Related papers (2021-03-18T02:39:28Z) - Robustness Gym: Unifying the NLP Evaluation Landscape [91.80175115162218]
Deep neural networks are often brittle when deployed in real-world systems.
Recent research has focused on testing the robustness of such models.
We propose a solution in the form of Robustness Gym, a simple and extensible evaluation toolkit.
arXiv Detail & Related papers (2021-01-13T02:37:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.