Rethinking Certification for Trustworthy Machine Learning-Based
Applications
- URL: http://arxiv.org/abs/2305.16822v4
- Date: Sun, 22 Oct 2023 19:31:17 GMT
- Title: Rethinking Certification for Trustworthy Machine Learning-Based
Applications
- Authors: Marco Anisetti and Claudio A. Ardagna and Nicola Bena and Ernesto
Damiani
- Abstract summary: Machine Learning (ML) is increasingly used to implement advanced applications with non-deterministic behavior.
Existing certification schemes are not immediately applicable to non-deterministic applications built on ML models.
This article analyzes the challenges and deficiencies of current certification schemes, discusses open research issues, and proposes a first certification scheme for ML-based applications.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine Learning (ML) is increasingly used to implement advanced applications
with non-deterministic behavior, which operate on the cloud-edge continuum. The
pervasive adoption of ML is urgently calling for assurance solutions assessing
applications' non-functional properties (e.g., fairness, robustness, privacy)
with the aim to improve their trustworthiness. Certification has been clearly
identified by policymakers, regulators, and industrial stakeholders as the
preferred assurance technique to address this pressing need. Unfortunately,
existing certification schemes are not immediately applicable to
non-deterministic applications built on ML models. This article analyzes the
challenges and deficiencies of current certification schemes, discusses open
research issues, and proposes a first certification scheme for ML-based
applications.
Related papers
- Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph [85.51252685938564]
Uncertainty quantification (UQ) is becoming increasingly recognized as a critical component of applications that rely on machine learning (ML).
As with other ML models, large language models (LLMs) are prone to make incorrect predictions, "hallucinate" by fabricating claims, or simply generate low-quality output for a given input.
We introduce a novel benchmark that implements a collection of state-of-the-art UQ baselines, and provides an environment for controllable and consistent evaluation of novel techniques.
arXiv Detail & Related papers (2024-06-21T20:06:31Z)
- Simulation-based Safety Assurance for an AVP System incorporating Learning-Enabled Components [0.6526824510982802]
Testing, verification, and validation of AD/ADAS safety-critical applications remain among the main challenges.
We explain the simulation-based development platform that is designed to verify and validate safety-critical learning-enabled systems.
arXiv Detail & Related papers (2023-09-28T09:00:31Z)
- MLGuard: Defend Your Machine Learning Model! [3.4069804433026314]
We propose MLGuard, a new approach to specify contracts for Machine Learning applications.
Our work is intended to provide the overarching framework required for building ML applications and monitoring their safety.
arXiv Detail & Related papers (2023-09-04T06:08:11Z)
- Vulnerability of Machine Learning Approaches Applied in IoT-based Smart Grid: A Review [51.31851488650698]
Machine learning (ML) is increasingly used in the Internet-of-Things (IoT)-based smart grid.
Adversarial distortion injected into the power signal will greatly affect the system's normal control and operation.
It is imperative to conduct vulnerability assessments for MLsgAPPs applied in the context of safety-critical power systems.
arXiv Detail & Related papers (2023-08-30T03:29:26Z)
- AdvCat: Domain-Agnostic Robustness Assessment for Cybersecurity-Critical Applications with Categorical Inputs [29.907921481157974]
Robustness against adversarial attacks is one of the key trust concerns for Machine Learning deployment.
We propose a provably optimal yet highly efficient adversarial robustness assessment protocol for a wide band of ML-driven cybersecurity-critical applications.
We demonstrate the use of the domain-agnostic robustness assessment method with substantial experimental study on fake news detection and intrusion detection problems.
arXiv Detail & Related papers (2022-12-13T18:12:02Z)
- Toward Certification of Machine-Learning Systems for Low Criticality Airborne Applications [0.0]
Possible airborne applications of machine learning (ML) include safety-critical functions.
Current certification standards for the aviation industry were developed prior to the ML renaissance.
There are some fundamental incompatibilities between traditional design assurance approaches and certain aspects of ML-based systems.
arXiv Detail & Related papers (2022-09-28T10:13:28Z)
- Joint Differentiable Optimization and Verification for Certified Reinforcement Learning [91.93635157885055]
In model-based reinforcement learning for safety-critical control systems, it is important to formally certify system properties.
We propose a framework that jointly conducts reinforcement learning and formal verification.
arXiv Detail & Related papers (2022-01-28T16:53:56Z)
- Trusted Artificial Intelligence: Towards Certification of Machine Learning Applications [5.7576910363986]
The TÜV AUSTRIA Group, in cooperation with the Institute for Machine Learning at the Johannes Kepler University Linz, proposes a certification process.
The holistic approach attempts to evaluate and verify the aspects of secure software development, functional requirements, data quality, data protection, and ethics.
The audit catalog can be applied to low-risk applications within the scope of supervised learning.
arXiv Detail & Related papers (2021-03-31T08:59:55Z)
- White Paper Machine Learning in Certified Systems [70.24215483154184]
The DEEL Project set up the ML Certification 3 Workgroup (WG), established by the Institut de Recherche Technologique Saint Exupéry de Toulouse (IRT).
arXiv Detail & Related papers (2021-03-18T21:14:30Z)
- Explanations of Machine Learning predictions: a mandatory step for its application to Operational Processes [61.20223338508952]
Credit Risk Modelling plays a paramount role.
Recent machine and deep learning techniques have been applied to the task.
We suggest using the LIME technique to tackle the explainability problem in this field.
arXiv Detail & Related papers (2020-12-30T10:27:59Z)
- Certified Reinforcement Learning with Logic Guidance [78.2286146954051]
We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs).
The algorithm is guaranteed to synthesise a control policy whose traces satisfy the specification with maximal probability.
arXiv Detail & Related papers (2019-02-02T20:09:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.