Improving the State of the Art for Training Human-AI Teams: Technical
Report #3 -- Analysis of Testbed Alternatives
- URL: http://arxiv.org/abs/2309.03213v1
- Date: Tue, 29 Aug 2023 14:06:30 GMT
- Title: Improving the State of the Art for Training Human-AI Teams: Technical
Report #3 -- Analysis of Testbed Alternatives
- Authors: Lillian Asiala, James E. McCarthy, Lixiao Huang
- Abstract summary: Sonalysts is working on an initiative to expand its expertise in teaming to Human-Artificial Intelligence (AI) teams.
To provide a foundation for that research, Sonalysts is investigating the development of a Synthetic Task Environment.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sonalysts is working on an initiative to expand our current expertise in
teaming to Human-Artificial Intelligence (AI) teams by developing original
research in this area. To provide a foundation for that research, Sonalysts is
investigating the development of a Synthetic Task Environment (STE). In a
previous report, we documented the findings of a recent outreach effort in
which we asked military Subject Matter Experts (SMEs) and other researchers in
the Human-AI teaming domain to identify the qualities that they most valued in
a testbed. A surprising finding from that outreach was that several respondents
recommended that our team look into existing human-AI teaming testbeds, rather
than creating something new. Based on that recommendation, we conducted a
systematic investigation of the associated landscape. In this report, we
describe the results of that investigation. Building on the survey results, we
developed testbed evaluation criteria, identified potential testbeds, and
conducted qualitative and quantitative evaluations of candidate testbeds. The
evaluation process led to five candidate testbeds for the research team to
consider. In the coming months, we will assess the viability of the various
alternatives and begin to execute our program of research.
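The report's abstract does not describe the scoring mechanics behind the "qualitative and quantitative evaluations," but a weighted-criteria matrix is one common way such candidate comparisons are carried out. The sketch below is a minimal, hypothetical illustration of that approach; the criterion names, weights, and ratings are invented for the example and are not taken from the report.
```python
# Hypothetical sketch of a weighted-criteria evaluation of candidate testbeds.
# Criterion names, weights, and ratings are illustrative only; they are not
# drawn from the Sonalysts report.

CRITERIA = {                      # weight per evaluation criterion (sums to 1.0)
    "task_realism": 0.30,
    "ai_teammate_support": 0.25,
    "instrumentation": 0.25,
    "extensibility": 0.20,
}

# Each candidate testbed is rated 1-5 on every criterion.
candidates = {
    "Testbed A": {"task_realism": 4, "ai_teammate_support": 3,
                  "instrumentation": 5, "extensibility": 4},
    "Testbed B": {"task_realism": 5, "ai_teammate_support": 2,
                  "instrumentation": 3, "extensibility": 3},
}

def weighted_score(ratings: dict) -> float:
    """Combine per-criterion ratings into a single weighted score."""
    return sum(CRITERIA[c] * ratings[c] for c in CRITERIA)

# Rank candidates from best to worst by weighted score.
for name, ratings in sorted(candidates.items(),
                            key=lambda kv: weighted_score(kv[1]),
                            reverse=True):
    print(f"{name}: {weighted_score(ratings):.2f}")
```
In practice, the weights would come from the stakeholder priorities elicited in the earlier surveys, and the per-criterion ratings from the qualitative and quantitative reviews of each candidate testbed.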
Related papers
- Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation [58.064940977804596]
A plethora of new AI models and tools has been proposed, promising to empower researchers and academics worldwide to conduct their research more effectively and efficiently.
Ethical concerns regarding shortcomings of these tools and potential for misuse take a particularly prominent place in our discussion.
arXiv Detail & Related papers (2025-02-07T18:26:45Z) - A Decade of Action Quality Assessment: Largest Systematic Survey of Trends, Challenges, and Future Directions [8.27542607031299]
Action Quality Assessment (AQA) has far-reaching implications in areas such as low-cost physiotherapy, sports training, and workforce development.
We systematically review over 200 research papers using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework.
This survey provides a detailed analysis of research trends, performance comparisons, challenges, and future directions.
arXiv Detail & Related papers (2025-02-05T01:33:24Z) - On Evaluating Explanation Utility for Human-AI Decision Making in NLP [39.58317527488534]
We review existing metrics suitable for application-grounded evaluation.
We demonstrate the importance of reassessing the state of the art to form and study human-AI teams.
arXiv Detail & Related papers (2024-07-03T23:53:27Z) - ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models [56.08917291606421]
ResearchAgent is an AI-based system for ideation and operationalization of novel work.
ResearchAgent automatically defines novel problems, proposes methods and designs experiments, while iteratively refining them.
We experimentally validate our ResearchAgent on scientific publications across multiple disciplines.
arXiv Detail & Related papers (2024-04-11T13:36:29Z) - Search-Based Fairness Testing: An Overview [4.453735522794044]
Biases in AI systems raise ethical and societal concerns.
This paper reviews current research on fairness testing, particularly its application through search-based testing.
arXiv Detail & Related papers (2023-11-10T16:47:56Z) - Improving the State of the Art for Training Human-AI Teams: Technical
Report #2 -- Results of Researcher Knowledge Elicitation Survey [0.0]
Sonalysts has begun an internal initiative to explore the training of Human-AI teams.
The first step in this effort is to develop a Synthetic Task Environment (STE) that is capable of facilitating research on Human-AI teams.
arXiv Detail & Related papers (2023-08-29T13:54:32Z) - Improving the State of the Art for Training Human-AI Teams: Technical
Report #1 -- Results of Subject-Matter Expert Knowledge Elicitation Survey [0.0]
Sonalysts has begun an internal initiative to explore the training of human-AI teams.
We decided to use Joint All-Domain Command and Control (JADC2) as a focus point.
We engaged a number of Subject-Matter Experts (SMEs) with Command and Control experience to gain insight into developing an STE that embodied the teaming challenges associated with JADC2.
arXiv Detail & Related papers (2023-08-29T13:42:52Z) - ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate [57.71597869337909]
We build a multi-agent referee team called ChatEval to autonomously discuss and evaluate the quality of generated responses from different models.
Our analysis shows that ChatEval transcends mere textual scoring, offering a human-mimicking evaluation process for reliable assessments.
arXiv Detail & Related papers (2023-08-14T15:13:04Z) - Survey of Aspect-based Sentiment Analysis Datasets [55.61047894397937]
Aspect-based sentiment analysis (ABSA) is a natural language processing problem that requires analyzing user-generated reviews.
Numerous yet scattered corpora for ABSA make it difficult for researchers to identify corpora best suited for a specific ABSA subtask quickly.
This study aims to present a database of corpora that can be used to train and assess autonomous ABSA systems.
arXiv Detail & Related papers (2022-04-11T16:23:36Z) - Scaling up Search Engine Audits: Practical Insights for Algorithm
Auditing [68.8204255655161]
We set up experiments for eight search engines with hundreds of virtual agents placed in different regions.
We demonstrate the successful performance of our research infrastructure across multiple data collections.
We conclude that virtual agents are a promising avenue for monitoring the performance of algorithms over long periods of time.
arXiv Detail & Related papers (2021-06-10T15:49:58Z) - Robustness Gym: Unifying the NLP Evaluation Landscape [91.80175115162218]
Deep neural networks are often brittle when deployed in real-world systems.
Recent research has focused on testing the robustness of such models.
We propose a solution in the form of Robustness Gym, a simple and extensible evaluation toolkit.
arXiv Detail & Related papers (2021-01-13T02:37:54Z)