Operationalizing Assurance Cases for Data Scientists: A Showcase of
Concepts and Tooling in the Context of Test Data Quality for Machine Learning
- URL: http://arxiv.org/abs/2312.04917v1
- Date: Fri, 8 Dec 2023 09:34:46 GMT
- Title: Operationalizing Assurance Cases for Data Scientists: A Showcase of
Concepts and Tooling in the Context of Test Data Quality for Machine Learning
- Authors: Lisa Jöckel, Michael Kläs, Janek Groß, Pascal Gerber, Markus
Scholz, Jonathan Eberle, Marc Teschner, Daniel Seifert, Richard Hawkins, John
Molloy, Jens Ottnad
- Abstract summary: Assurance Cases (ACs) are an established approach in safety engineering to argue quality claims in a structured way.
We propose a framework to support the operationalization of ACs for Machine Learning (ML) components based on technologies that data scientists use on a daily basis: Python and Jupyter Notebook.
Results from the application of the framework, documented through notebooks, can be integrated into existing AC tools.
- Score: 1.6403311770639912
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Assurance Cases (ACs) are an established approach in safety engineering to
argue quality claims in a structured way. In the context of quality assurance
for Machine Learning (ML)-based software components, ACs are also being
discussed and appear promising. Tools for operationalizing ACs do exist, yet
mainly focus on supporting safety engineers on the system level. However,
assuring the quality of an ML component within the system is commonly the
responsibility of data scientists, who are usually less familiar with these
tools. To address this gap, we propose a framework to support the
operationalization of ACs for ML components based on technologies that data
scientists use on a daily basis: Python and Jupyter Notebook. Our aim is to
make the process of creating ML-related evidence in ACs more effective. Results
from the application of the framework, documented through notebooks, can be
integrated into existing AC tools. We illustrate the application of the
framework on an example excerpt concerned with the quality of the test data.
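To make the idea concrete, the following is a minimal, hypothetical sketch of what a notebook cell producing evidence for a test data quality claim might look like. The evidence schema, metric, and threshold below are illustrative assumptions, not the framework's actual API.

```python
# Hypothetical notebook cell producing AC evidence for a test data
# quality claim. The evidence schema below is illustrative only; the
# paper's framework API is not spelled out in the abstract.
import json
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# Claim under evaluation: the test set covers the value ranges observed
# in the training data (a simple per-feature range-coverage proxy).
train_min, train_max = X_train.min(axis=0), X_train.max(axis=0)
test_min, test_max = X_test.min(axis=0), X_test.max(axis=0)
coverage = np.mean(
    (test_max - test_min) / np.maximum(train_max - train_min, 1e-12))

evidence = {
    "claim": "Test data covers the feature ranges seen in training data",
    "metric": "mean per-feature range coverage",
    "value": round(float(coverage), 3),
    "threshold": 0.8,
    "passed": bool(coverage >= 0.8),
}
# Serialized records like this could be attached to the corresponding
# node in an external assurance case tool.
print(json.dumps(evidence, indent=2))
```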
Related papers
- OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use [101.57043903478257]
The dream to create AI assistants as capable and versatile as the fictional J.A.R.V.I.S. from Iron Man has long captivated imaginations. With the evolution of (multi-modal) large language models ((M)LLMs), this dream is closer to reality. This survey aims to consolidate the state of OS Agents research, providing insights to guide both academic inquiry and industrial development.
arXiv Detail & Related papers (2025-08-06T14:33:45Z) - SPARQL Query Generation with LLMs: Measuring the Impact of Training Data Memorization and Knowledge Injection [81.78173888579941]
Large Language Models (LLMs) are considered well suited to increasing the quality of question-answering functionality. LLMs are trained on web data, where researchers have no control over whether the benchmark or the knowledge graph was already included in the training data. This paper introduces a novel method that evaluates the quality of LLMs by generating a SPARQL query from a natural-language question.
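As an illustration of that evaluation idea, here is a hedged sketch: an LLM (represented by a placeholder function) translates a question into SPARQL, the query runs against the public DBpedia endpoint via SPARQLWrapper, and the answers are compared against a gold set. The placeholder and the exact-match scoring are assumptions, not the paper's pipeline.

```python
# Illustrative sketch: LLM-generated SPARQL, executed and scored.
# `llm_generate_sparql` is a hypothetical stand-in for a model call.
# Requires network access to dbpedia.org.
from SPARQLWrapper import SPARQLWrapper, JSON

def llm_generate_sparql(question: str) -> str:
    # Placeholder: a real system would prompt an LLM here. A fixed
    # query is returned so the sketch does not depend on a live model.
    return """
        PREFIX dbr: <http://dbpedia.org/resource/>
        PREFIX dbo: <http://dbpedia.org/ontology/>
        SELECT ?capital WHERE { dbr:Germany dbo:capital ?capital . }
    """

def evaluate(question: str, expected: set[str]) -> bool:
    endpoint = SPARQLWrapper("https://dbpedia.org/sparql")
    endpoint.setQuery(llm_generate_sparql(question))
    endpoint.setReturnFormat(JSON)
    rows = endpoint.query().convert()["results"]["bindings"]
    answers = {row["capital"]["value"] for row in rows}
    return answers == expected  # exact-match scoring, one option of many

print(evaluate("What is the capital of Germany?",
               {"http://dbpedia.org/resource/Berlin"}))
```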
arXiv Detail & Related papers (2025-07-18T12:28:08Z) - Training Language Models to Generate Quality Code with Program Analysis Feedback [66.0854002147103]
Code generation with large language models (LLMs) is increasingly adopted in production but fails to ensure code quality. We propose REAL, a reinforcement learning framework that incentivizes LLMs to generate production-quality code.
arXiv Detail & Related papers (2025-05-28T17:57:47Z) - OntoGSN: An Ontology for Dynamic Management of Assurance Cases [0.3999851878220878]
We present OntoGSN: an ontology and supporting OWL tooling for managing ACs in the Goal Structuring Notation (GSN) standard. OntoGSN offers a knowledge representation and a queryable graph that can be automatically evaluated and updated. We demonstrate the utility of our contributions in an example involving assurance of robustness in large language models.
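A rough sketch of the "queryable graph" idea, using rdflib and SPARQL: the `gsn:` namespace and property names below are invented for illustration and are not OntoGSN's actual vocabulary.

```python
# Minimal sketch: GSN elements as RDF triples, queried with SPARQL.
# The namespace, classes, and properties are assumptions for the demo.
from rdflib import Graph, Namespace, RDF, Literal

GSN = Namespace("http://example.org/gsn#")  # hypothetical namespace
g = Graph()
g.add((GSN.G1, RDF.type, GSN.Goal))
g.add((GSN.G1, GSN.statement, Literal("The ML component is robust")))
g.add((GSN.S1, RDF.type, GSN.Solution))
g.add((GSN.G1, GSN.supportedBy, GSN.S1))
g.add((GSN.G2, RDF.type, GSN.Goal))  # a goal with no solution yet

# Find goals that are not yet supported by any solution.
unsupported = g.query("""
    PREFIX gsn: <http://example.org/gsn#>
    SELECT ?goal WHERE {
      ?goal a gsn:Goal .
      FILTER NOT EXISTS { ?goal gsn:supportedBy ?s . }
    }""")
for row in unsupported:
    print(row.goal)
```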
arXiv Detail & Related papers (2025-05-20T08:15:16Z) - Learning to Ask: When LLM Agents Meet Unclear Instruction [55.65312637965779]
Large language models (LLMs) can leverage external tools for addressing a range of tasks unattainable through language skills alone.
We evaluate the performance of LLMs' tool use under imperfect instructions, analyze the error patterns, and build a challenging tool-use benchmark called Noisy ToolBench.
We propose a novel framework, Ask-when-Needed (AwN), which prompts LLMs to ask questions to users whenever they encounter obstacles due to unclear instructions.
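The AwN loop might look roughly like the sketch below; `llm_detect_ambiguity` and `llm_execute` are hypothetical stand-ins for model calls, not the paper's actual code.

```python
# Sketch of the Ask-when-Needed idea: before acting, the agent checks
# whether an instruction is ambiguous and, if so, asks the user rather
# than guessing. Both helper functions are hypothetical stand-ins.
def llm_detect_ambiguity(instruction: str) -> str | None:
    # Toy heuristic in place of a real LLM ambiguity check.
    if "file" in instruction and "which" not in instruction:
        return "Which file should I operate on?"
    return None

def llm_execute(instruction: str, clarification: str | None) -> str:
    return f"executed {instruction!r} (clarification: {clarification!r})"

def ask_when_needed(instruction: str, user_reply=lambda q: "report.csv") -> str:
    question = llm_detect_ambiguity(instruction)
    clarification = user_reply(question) if question else None
    return llm_execute(instruction, clarification)

print(ask_when_needed("delete the file"))
```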
arXiv Detail & Related papers (2024-08-31T23:06:12Z) - SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering [79.07755560048388]
SWE-agent is a system that enables LM agents to autonomously use computers to solve software engineering tasks.
SWE-agent's custom agent-computer interface (ACI) significantly enhances an agent's ability to create and edit code files, navigate entire repositories, and execute tests and other programs.
We evaluate SWE-agent on SWE-bench and HumanEvalFix, achieving state-of-the-art performance on both, with pass@1 rates of 12.5% and 87.7%, respectively.
arXiv Detail & Related papers (2024-05-06T17:41:33Z) - ChatSOS: LLM-based knowledge Q&A system for safety engineering [0.0]
This study introduces an LLM-based Q&A system for safety engineering, enhancing the comprehension and response accuracy of the model.
We employ prompt engineering to incorporate external knowledge databases, thus enriching the LLM with up-to-date and reliable information.
Our findings indicate that the integration of external knowledge significantly augments the capabilities of LLM for in-depth problem analysis and autonomous task assignment.
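A minimal sketch of the described prompt-engineering pattern: retrieve entries from an external knowledge base and prepend them to the question before the LLM call. The toy lexical retrieval and prompt template are assumptions; ChatSOS's actual system is not shown here.

```python
# Hedged sketch of knowledge-augmented prompting: fetch relevant entries
# from an external knowledge base and build a grounded prompt. A real
# system would typically use embedding-based retrieval.
KNOWLEDGE_BASE = [
    "Flammable gas alarms must trigger below 25% of the lower explosive limit.",
    "Pressure vessels require inspection intervals defined by local regulation.",
]

def retrieve(question: str, k: int = 1) -> list[str]:
    # Toy lexical overlap scoring, purely for illustration.
    scored = sorted(KNOWLEDGE_BASE,
                    key=lambda doc: -len(set(question.lower().split())
                                         & set(doc.lower().split())))
    return scored[:k]

def build_prompt(question: str) -> str:
    context = "\n".join(retrieve(question))
    return (f"Answer using only the context below.\n"
            f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")

print(build_prompt("At what level should a flammable gas alarm trigger?"))
```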
arXiv Detail & Related papers (2023-12-14T03:25:23Z) - Machine Learning-Enabled Software and System Architecture Frameworks [48.87872564630711]
Stakeholders with data science and machine learning concerns, such as data scientists and data engineers, are not yet included in existing architecture frameworks.
We surveyed 61 subject matter experts from over 25 organizations in 10 countries.
arXiv Detail & Related papers (2023-08-09T21:54:34Z) - ECS -- an Interactive Tool for Data Quality Assurance [63.379471124899915]
We present a novel approach for the assurance of data quality.
For this purpose, the mathematical basics are first discussed and the approach is presented using multiple examples.
This results in the detection of data points with potentially harmful properties for use in safety-critical systems.
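The abstract does not spell out the mathematics behind ECS, so the sketch below swaps in a standard Mahalanobis-distance outlier check as a stand-in for flagging data points with potentially harmful properties.

```python
# Stand-in for ECS-style detection: flag points far from the bulk of
# the data via squared Mahalanobis distance. This is a generic outlier
# check, not the paper's actual method.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3))
data[0] = [8.0, -7.5, 9.0]  # an injected anomalous point

mean = data.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(data, rowvar=False))
diff = data - mean
# Squared Mahalanobis distance of each point from the sample mean.
d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

threshold = np.quantile(d2, 0.99)
flagged = np.where(d2 > threshold)[0]
print("potentially harmful points:", flagged)
```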
arXiv Detail & Related papers (2023-07-10T06:49:18Z) - Quality Assurance in MLOps Setting: An Industrial Perspective [0.11470070927586014]
Machine learning (ML) is widely used in industry to provide the core functionality of production systems.
Due to production demand and time constraints, automated software engineering practices are highly applicable.
This paper examines the QA challenges that arise in industrial MLOps and conceptualizes modular strategies to deal with data integrity and data quality.
arXiv Detail & Related papers (2022-11-23T05:02:24Z) - An Empirical Evaluation of Flow Based Programming in the Machine
Learning Deployment Context [11.028123436097616]
Data Oriented Architecture (DOA) is an emerging approach that can support data scientists and software developers when addressing the challenges of deploying ML systems.
This paper proposes to consider Flow-Based Programming (FBP) as a paradigm for creating DOA applications.
We empirically evaluate FBP in the context of ML deployment on four applications that represent typical data science projects.
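For readers unfamiliar with the paradigm, here is a minimal FBP-flavored sketch in Python: components are independent units exchanging data over connections, modeled here as generator pipelines. It illustrates the idea only, not the toolkit evaluated in the paper.

```python
# Minimal Flow-Based-Programming-style sketch: each component knows only
# its input and output ports; the network is wired separately.
def read_records(rows):
    for row in rows:
        yield row

def clean(records):
    # Drop records with missing values.
    for r in records:
        if r.get("value") is not None:
            yield r

def score(records):
    for r in records:
        yield {**r, "score": r["value"] * 2}

# Wiring the network of components.
pipeline = score(clean(read_records([
    {"value": 1}, {"value": None}, {"value": 3}])))
print(list(pipeline))  # [{'value': 1, 'score': 2}, {'value': 3, 'score': 6}]
```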
arXiv Detail & Related papers (2022-04-27T09:08:48Z) - What is Software Quality for AI Engineers? Towards a Thinning of the Fog [9.401273164668092]
The goal of this study is to investigate the software quality assurance strategies adopted during the development, integration, and maintenance of AI/ML components and code.
A qualitative analysis of the interview data identified 12 issues in the development of AI/ML components.
The results of this study should guide future work on software quality assurance processes and techniques for AI/ML components.
arXiv Detail & Related papers (2022-03-23T19:43:35Z) - Realistic simulation of users for IT systems in cyber ranges [63.20765930558542]
We instrument each machine by means of an external agent to generate user activity.
This agent combines both deterministic and deep-learning-based methods to adapt to different environments.
We also propose conditional text generation models to facilitate the creation of conversations and documents.
arXiv Detail & Related papers (2021-11-23T10:53:29Z) - Technology Readiness Levels for Machine Learning Systems [107.56979560568232]
Development and deployment of machine learning systems can be executed easily with modern tools, but the process is typically rushed and treated as a means to an end.
We have developed a proven systems engineering approach for machine learning development and deployment.
Our "Machine Learning Technology Readiness Levels" framework defines a principled process to ensure robust, reliable, and responsible systems.
arXiv Detail & Related papers (2021-01-11T15:54:48Z) - Collective Knowledge: organizing research projects as a database of
reusable components and portable workflows with common APIs [0.2538209532048866]
This article provides the motivation and overview of the Collective Knowledge framework (CK or cKnowledge).
The CK concept is to decompose research projects into reusable components that encapsulate research artifacts.
The long-term goal is to accelerate innovation by connecting researchers and practitioners to share and reuse all their knowledge.
arXiv Detail & Related papers (2020-11-02T17:42:59Z) - Technology Readiness Levels for AI & ML [79.22051549519989]
Development of machine learning systems can be executed easily with modern tools, but the process is typically rushed and treated as a means to an end.
Engineering systems follow well-defined processes and testing standards to streamline development for high-quality, reliable results.
We propose a proven systems engineering approach for machine learning development and deployment.
arXiv Detail & Related papers (2020-06-21T17:14:34Z)