Test and Evaluation Framework for Multi-Agent Systems of Autonomous
Intelligent Agents
- URL: http://arxiv.org/abs/2101.10430v1
- Date: Mon, 25 Jan 2021 21:42:27 GMT
- Title: Test and Evaluation Framework for Multi-Agent Systems of Autonomous
Intelligent Agents
- Authors: Erin Lanus, Ivan Hernandez, Adam Dachowicz, Laura Freeman, Melanie
Grande, Andrew Lang, Jitesh H. Panchal, Anthony Patrick, Scott Welch
- Abstract summary: We consider the challenges of developing a unifying test and evaluation framework for complex ensembles of cyber-physical systems with embedded artificial intelligence.
We propose a framework that incorporates test and evaluation throughout not only the development life cycle, but continues into operation as the system learns and adapts.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Test and evaluation is a necessary process for ensuring that engineered
systems perform as intended under a variety of conditions, both expected and
unexpected. In this work, we consider the unique challenges of developing a
unifying test and evaluation framework for complex ensembles of cyber-physical
systems with embedded artificial intelligence. We propose a framework that
incorporates test and evaluation throughout not only the development life
cycle, but continues into operation as the system learns and adapts in a noisy,
changing, and contended environment. The framework accounts for the challenges
of testing the integration of diverse systems at various hierarchical scales of
composition while respecting that testing time and resources are limited. A
generic use case is provided for illustrative purposes and research directions
emerging as a result of exploring the use case via the framework are suggested.
Related papers
- AI-Compass: A Comprehensive and Effective Multi-module Testing Tool for AI Systems [26.605694684145313]
In this study, we design and implement a testing tool, AI-Compass, to comprehensively and effectively evaluate AI systems.
The tool extensively assesses adversarial robustness and model interpretability, and performs neuron analysis.
Our research sheds light on a general solution for the AI systems testing landscape.
arXiv Detail & Related papers (2024-11-09T11:15:17Z)
- Algorithmic Scenario Generation as Quality Diversity Optimization [8.010900084313414]
The increasing complexity of robots and autonomous agents that interact with people highlights the critical need for approaches that systematically test them before deployment.
This review paper describes the insights that we have gained from working on each component of the framework, and shows how integrating these components leads to the discovery of a diverse range of realistic and challenging scenarios.
arXiv Detail & Related papers (2024-09-07T05:20:41Z)
- Coupled Requirements-driven Testing of CPS: From Simulation To Reality [5.7736484832934325]
Failures in safety-critical Cyber-Physical Systems (CPS) can lead to severe incidents impacting physical infrastructure or even harming humans.
Current simulation and field testing practices, particularly in the domain of small Unmanned Aerial Systems (sUAS), are ad-hoc and lack a thorough, structured testing process.
We have developed an initial framework for validating CPS, specifically focusing on sUAS and robotic applications.
arXiv Detail & Related papers (2024-03-24T20:32:12Z)
- Evaluating General-Purpose AI with Psychometrics [43.85432514910491]
We discuss the need for a comprehensive and accurate evaluation of general-purpose AI systems such as large language models.
Current evaluation methodology, mostly based on benchmarks of specific tasks, falls short of adequately assessing these versatile AI systems.
To tackle these challenges, we suggest transitioning from task-oriented evaluation to construct-oriented evaluation.
arXiv Detail & Related papers (2023-10-25T05:38:38Z)
- From Static Benchmarks to Adaptive Testing: Psychometrics in AI Evaluation [60.14902811624433]
We discuss a paradigm shift from static evaluation methods to adaptive testing.
This involves estimating the characteristics and value of each test item in the benchmark and dynamically adjusting items in real-time.
We analyze the current approaches, advantages, and underlying reasons for adopting psychometrics in AI evaluation.
arXiv Detail & Related papers (2023-06-18T09:54:33Z)
- Testing System Intelligence [0.902877390685954]
We argue that building intelligent systems passing the replacement test involves a series of technical problems that are outside the scope of current AI.
We suggest that the replacement test, based on the complementarity of skills between human and machine, can lead to a multitude of intelligence concepts.
arXiv Detail & Related papers (2023-05-19T06:46:32Z)
- Interactive System-wise Anomaly Detection [66.3766756452743]
Anomaly detection plays a fundamental role in various applications.
It is challenging for existing methods to handle the scenarios where the instances are systems whose characteristics are not readily observed as data.
We develop an end-to-end approach which includes an encoder-decoder module that learns system embeddings.
arXiv Detail & Related papers (2023-04-21T02:20:24Z)
- A Domain-Agnostic Approach for Characterization of Lifelong Learning
Systems [128.63953314853327]
"Lifelong Learning" systems are capable of 1) Continuous Learning, 2) Transfer and Adaptation, and 3) Scalability.
We show that this suite of metrics can inform the development of varied and complex Lifelong Learning systems.
arXiv Detail & Related papers (2023-01-18T21:58:54Z)
- Multi Agent System for Machine Learning Under Uncertainty in Cyber
Physical Manufacturing System [78.60415450507706]
Recent advancements in predictive machine learning have led to its application in various use cases in manufacturing.
Most research has focused on maximising predictive accuracy without addressing the uncertainty associated with it.
In this paper, we determine the sources of uncertainty in machine learning and establish the success criteria of a machine learning system to function well under uncertainty.
arXiv Detail & Related papers (2021-07-28T10:28:05Z)
- Integrated Benchmarking and Design for Reproducible and Accessible
Evaluation of Robotic Agents [61.36681529571202]
We describe a new concept for reproducible robotics research that integrates development and benchmarking.
One of the central components of this setup is the Duckietown Autolab, a standardized setup that is itself relatively low-cost and reproducible.
We validate the system by analyzing the repeatability of experiments conducted using the infrastructure and show that there is low variance across different robot hardware and across different remote labs.
arXiv Detail & Related papers (2020-09-09T15:31:29Z)
- Is Your Goal-Oriented Dialog Model Performing Really Well? Empirical
Analysis of System-wise Evaluation [114.48767388174218]
This paper presents an empirical analysis of different types of dialog systems composed of different modules in different settings.
Our results show that a pipeline dialog system trained using fine-grained supervision signals at different component levels often obtains better performance than the systems that use joint or end-to-end models trained on coarse-grained labels.
arXiv Detail & Related papers (2020-05-15T05:20:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.