Assessing Trustworthiness of Autonomous Systems
- URL: http://arxiv.org/abs/2305.03411v2
- Date: Thu, 11 May 2023 10:14:05 GMT
- Title: Assessing Trustworthiness of Autonomous Systems
- Authors: Gregory Chance and Dhaminda B. Abeywickrama and Beckett LeClair and
Owen Kerr and Kerstin Eder
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As Autonomous Systems (AS) become more ubiquitous in society, take on
greater responsibility for our safety, and interact with us more frequently, it
is essential that they are trustworthy. Assessing the trustworthiness of AS is a
mandatory challenge for the verification and development community. This will
require appropriate standards and suitable metrics that can serve to judge the
trustworthiness of AS objectively and comparatively across the broad range of
current and future applications. The meta-expression 'trustworthiness' is
examined in the context of AS, capturing the relevant qualities that comprise
this term in the literature. Recent developments in standards and frameworks
that support the assurance of autonomous systems are reviewed. A list of key
challenges is identified for the community, and we present an outline of a
process that can be used as a trustworthiness assessment framework for AS.
Related papers
- Cross-Modality Safety Alignment (2024-06-21)
  We introduce a novel safety alignment challenge called Safe Inputs but Unsafe Output (SIUO) to evaluate cross-modality safety alignment. To investigate this problem empirically, we developed SIUO, a cross-modality benchmark encompassing 9 critical safety domains, such as self-harm, illegal activities, and privacy violations. Our findings reveal substantial safety vulnerabilities in both closed- and open-source LVLMs, underscoring the inadequacy of current models to reliably interpret and respond to complex, real-world scenarios.
- When to Trust LLMs: Aligning Confidence with Response Quality (2024-04-26)
  We propose a CONfidence-Quality-ORDer-preserving alignment approach (CONQORD), which integrates quality reward and order-preserving alignment reward functions. Experiments demonstrate that CONQORD significantly improves the alignment between confidence and response accuracy.
- On Specifying for Trustworthiness (2022-06-22)
  We look across a range of AS domains, considering the resilience, trust, functionality, verifiability, security, and governance and regulation of AS. We highlight the intellectual challenges involved in specifying for trustworthiness in AS that cut across domains and are exacerbated by the inherent uncertainty of the environments in which AS need to operate.
- Towards a multi-stakeholder value-based assessment framework for algorithmic systems (2022-05-09)
  We develop a value-based assessment framework that visualizes closeness and tensions between values. We give guidelines on how to operationalize them, while opening up the evaluation and deliberation process to a wide range of stakeholders.
- Designing for Responsible Trust in AI Systems: A Communication Perspective (2022-04-29)
  We draw from communication theories and the literature on trust in technologies to develop a conceptual model called MATCH. We highlight transparency and interaction as AI systems' affordances that present a wide range of trustworthiness cues to users, and propose a checklist of requirements to help technology creators identify appropriate cues to use.
- Defining Security Requirements with the Common Criteria: Applications, Adoptions, and Challenges (2022-01-19)
  The adoption of ICT products with security properties depends on consumers' confidence and markets' trust in the security functionalities. The Common Criteria for Information Technology Security Evaluation (often referred to as the Common Criteria or CC) is an international standard for cyber security certification. Best practices for developing Protection Profiles, recommendations, and future directions for trusted cybersecurity advancement are presented.
- Reliability Testing for Natural Language Processing Systems (2021-05-06)
  We argue for the need for reliability testing and contextualize it among existing work on improving accountability. We show how adversarial attacks can be reframed for this goal via a framework for developing reliability tests.
- How Trustworthy are Performance Evaluations for Basic Vision Tasks? (2020-08-08)
  This paper examines performance evaluation criteria for basic vision tasks involving sets of objects, namely object detection, instance-level segmentation, and multi-object tracking. The rankings of algorithms by an existing criterion can fluctuate with different choices of parameters, making their evaluations unreliable. This work suggests a notion of trustworthiness for performance criteria, which requires (i) robustness to parameters for reliability, (ii) contextual meaningfulness in sanity tests, and (iii) consistency with mathematical requirements such as the metric properties.
- Quantifying Assurance in Learning-enabled Systems (2020-06-18)
  Dependability assurance of systems embedding machine learning components is a key step for their use in safety-critical applications. This paper develops a quantitative notion of assurance that a learning-enabled system (LES) is dependable, as a core component of its assurance case. We illustrate the utility of assurance measures by applying them to a real-world autonomous aviation system.
This list is automatically generated from the titles and abstracts of the papers on this site.