Improving the State of the Art for Training Human-AI Teams: Technical
Report #3 -- Analysis of Testbed Alternatives
- URL: http://arxiv.org/abs/2309.03213v1
- Date: Tue, 29 Aug 2023 14:06:30 GMT
- Title: Improving the State of the Art for Training Human-AI Teams: Technical
Report #3 -- Analysis of Testbed Alternatives
- Authors: Lillian Asiala, James E. McCarthy, Lixiao Huang
- Abstract summary: Sonalysts is working on an initiative to expand its expertise in teaming to Human-Artificial Intelligence (AI) teams.
To provide a foundation for that research, Sonalysts is investigating the development of a Synthetic Task Environment.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sonalysts is working on an initiative to expand our current expertise in
teaming to Human-Artificial Intelligence (AI) teams by developing original
research in this area. To provide a foundation for that research, Sonalysts is
investigating the development of a Synthetic Task Environment (STE). In a
previous report, we documented the findings of a recent outreach effort in
which we asked military Subject Matter Experts (SMEs) and other researchers in
the Human-AI teaming domain to identify the qualities that they most valued in
a testbed. A surprising finding from that outreach was that several respondents
recommended that our team look into existing human-AI teaming testbeds, rather
than creating something new. Based on that recommendation, we conducted a
systematic investigation of the associated landscape. In this report, we
describe the results of that investigation. Building on the survey results, we
developed testbed evaluation criteria, identified potential testbeds, and
conducted qualitative and quantitative evaluations of candidate testbeds. The
evaluation process led to five candidate testbeds for the research team to
consider. In the coming months, we will assess the viability of the various
alternatives and begin to execute our program of research.
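
The testbed down-select described above combined evaluation criteria drawn from the earlier surveys with qualitative and quantitative scoring of candidate testbeds. As a rough illustration of how such a multi-criteria comparison might be organized, the Python sketch below computes weighted scores for candidate testbeds and keeps the top few for closer review; the criterion names, weights, and ratings are hypothetical assumptions for illustration only and are not taken from the report.

```python
# Minimal sketch of a weighted-criteria testbed comparison.
# NOTE: criteria, weights, rating scale, and candidate names are hypothetical;
# the report does not publish its scoring rubric here.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    name: str
    ratings: Dict[str, int]  # criterion -> rating on an assumed 1-5 scale


# Hypothetical evaluation criteria and weights (weights sum to 1.0)
WEIGHTS = {
    "teaming_fidelity": 0.30,
    "configurability": 0.25,
    "data_logging": 0.20,
    "ai_agent_integration": 0.15,
    "cost_and_availability": 0.10,
}


def weighted_score(candidate: Candidate) -> float:
    """Collapse per-criterion ratings into one weighted score."""
    return sum(w * candidate.ratings.get(c, 0) for c, w in WEIGHTS.items())


def shortlist(candidates: List[Candidate], top_n: int = 5) -> List[Candidate]:
    """Rank candidates by weighted score and keep the top_n for closer review."""
    return sorted(candidates, key=weighted_score, reverse=True)[:top_n]


if __name__ == "__main__":
    demo = [
        Candidate("Testbed A", {"teaming_fidelity": 4, "configurability": 3,
                                "data_logging": 5, "ai_agent_integration": 2,
                                "cost_and_availability": 4}),
        Candidate("Testbed B", {"teaming_fidelity": 5, "configurability": 4,
                                "data_logging": 3, "ai_agent_integration": 4,
                                "cost_and_availability": 2}),
    ]
    for c in shortlist(demo):
        print(f"{c.name}: {weighted_score(c):.2f}")
```

A simple weighted sum like this keeps the qualitative ratings and the resulting quantitative ranking in one place, mirroring the mix of qualitative and quantitative evaluation described in the abstract.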
Related papers
- On Evaluating Explanation Utility for Human-AI Decision Making in NLP [39.58317527488534]
We review existing metrics suitable for application-grounded evaluation.
We demonstrate the importance of reassessing the state of the art to form and study human-AI teams.
arXiv Detail & Related papers (2024-07-03T23:53:27Z)
- SurveyAgent: A Conversational System for Personalized and Efficient Research Survey [50.04283471107001]
This paper introduces SurveyAgent, a novel conversational system designed to provide personalized and efficient research survey assistance to researchers.
SurveyAgent integrates three key modules: Knowledge Management for organizing papers, Recommendation for discovering relevant literature, and Query Answering for engaging with content on a deeper level.
Our evaluation demonstrates SurveyAgent's effectiveness in streamlining research activities, showcasing its capability to facilitate how researchers interact with scientific literature.
arXiv Detail & Related papers (2024-04-09T15:01:51Z)
- Search-Based Fairness Testing: An Overview [4.453735522794044]
Biases in AI systems raise ethical and societal concerns.
This paper reviews current research on fairness testing, particularly its application through search-based testing.
arXiv Detail & Related papers (2023-11-10T16:47:56Z)
- Improving the State of the Art for Training Human-AI Teams: Technical Report #2 -- Results of Researcher Knowledge Elicitation Survey [0.0]
Sonalysts has begun an internal initiative to explore the training of Human-AI teams.
The first step in this effort is to develop a Synthetic Task Environment (STE) that is capable of facilitating research on Human-AI teams.
arXiv Detail & Related papers (2023-08-29T13:54:32Z)
- Improving the State of the Art for Training Human-AI Teams: Technical Report #1 -- Results of Subject-Matter Expert Knowledge Elicitation Survey [0.0]
Sonalysts has begun an internal initiative to explore the training of human-AI teams.
We decided to use Joint All-Domain Command and Control (JADC2) as a focus point.
We engaged a number of Subject-Matter Experts (SMEs) with Command and Control experience to gain insight into developing an STE that embodied the teaming challenges associated with JADC2.
arXiv Detail & Related papers (2023-08-29T13:42:52Z)
- ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate [57.71597869337909]
We build a multi-agent referee team called ChatEval to autonomously discuss and evaluate the quality of generated responses from different models.
Our analysis shows that ChatEval transcends mere textual scoring, offering a human-mimicking evaluation process for reliable assessments.
arXiv Detail & Related papers (2023-08-14T15:13:04Z)
- Survey of Aspect-based Sentiment Analysis Datasets [55.61047894397937]
Aspect-based sentiment analysis (ABSA) is a natural language processing problem that requires analyzing user-generated reviews.
The numerous yet scattered corpora for ABSA make it difficult for researchers to quickly identify the corpora best suited to a specific ABSA subtask.
This study aims to present a database of corpora that can be used to train and assess autonomous ABSA systems.
arXiv Detail & Related papers (2022-04-11T16:23:36Z)
- An Uncommon Task: Participatory Design in Legal AI [64.54460979588075]
We examine a notable yet understudied AI design process in the legal domain that took place over a decade ago.
We show how an interactive simulation methodology allowed computer scientists and lawyers to become co-designers.
arXiv Detail & Related papers (2022-03-08T15:46:52Z)
- Scaling up Search Engine Audits: Practical Insights for Algorithm Auditing [68.8204255655161]
We set up experiments for eight search engines with hundreds of virtual agents placed in different regions.
We demonstrate the successful performance of our research infrastructure across multiple data collections.
We conclude that virtual agents are a promising venue for monitoring the performance of algorithms across long periods of time.
arXiv Detail & Related papers (2021-06-10T15:49:58Z)
- Human-AI Symbiosis: A Survey of Current Approaches [18.252264744963394]
We highlight various aspects of work on human-AI teams, such as the flow of complementing, task horizon, model representation, knowledge level, and teaming goal.
We hope that the survey provides a clearer connection between works on human-AI teaming and offers guidance to new researchers in this area.
arXiv Detail & Related papers (2021-03-18T02:39:28Z)
- Robustness Gym: Unifying the NLP Evaluation Landscape [91.80175115162218]
Deep neural networks are often brittle when deployed in real-world systems.
Recent research has focused on testing the robustness of such models.
We propose a solution in the form of Robustness Gym, a simple and extensible evaluation toolkit.
arXiv Detail & Related papers (2021-01-13T02:37:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.