Engineering Safety Requirements for Autonomous Driving with Large Language Models
- URL: http://arxiv.org/abs/2403.16289v1
- Date: Sun, 24 Mar 2024 20:40:51 GMT
- Title: Engineering Safety Requirements for Autonomous Driving with Large Language Models
- Authors: Ali Nouri, Beatriz Cabrero-Daniel, Fredrik Törner, Håkan Sivencrona, Christian Berger
- Abstract summary: Large Language Models (LLMs) can play a key role in automatically refining and decomposing requirements after each update.
This study proposes a prototype of a pipeline of prompts and LLMs that receives an item definition and outputs solutions in the form of safety requirements.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Changes and updates in the requirement artifacts, which can be frequent in the automotive domain, are a challenge for SafetyOps. Large Language Models (LLMs), with their impressive natural language understanding and generating capabilities, can play a key role in automatically refining and decomposing requirements after each update. In this study, we propose a prototype of a pipeline of prompts and LLMs that receives an item definition and outputs solutions in the form of safety requirements. This pipeline also performs a review of the requirement dataset and identifies redundant or contradictory requirements. We first identified the necessary characteristics for performing a Hazard Analysis and Risk Assessment (HARA) and then defined tests to assess an LLM's capability in meeting these criteria. We used design science with multiple iterations and let experts from different companies evaluate each cycle quantitatively and qualitatively. Finally, the prototype was implemented at a case company and the responsible team evaluated its efficiency.
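The pipeline described above is easy to picture as a chain of prompt calls. The sketch below is a minimal illustration, not the authors' implementation: `call_llm` stands in for any chat-completion client, and all prompt wording is invented for the example.
```python
def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call (e.g., a chat completion)."""
    raise NotImplementedError("wire up your LLM client here")

def derive_safety_requirements(item_definition: str) -> list[str]:
    # Step 1: item definition in, candidate safety requirements out.
    prompt = (
        "You are a functional-safety engineer. Given the item definition "
        "below, list candidate safety requirements, one per line.\n\n"
        f"Item definition:\n{item_definition}"
    )
    return [line.strip() for line in call_llm(prompt).splitlines() if line.strip()]

def review_requirements(requirements: list[str]) -> str:
    # Step 2: review the requirement set for redundancy and contradiction.
    numbered = "\n".join(f"{i + 1}. {r}" for i, r in enumerate(requirements))
    prompt = (
        "Review the following safety requirements and flag any pairs that "
        "are redundant or contradictory, citing their numbers.\n\n" + numbered
    )
    return call_llm(prompt)
```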
Related papers
- Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?
Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases.
We introduce LOFT, a benchmark of real-world tasks requiring context of up to millions of tokens, designed to evaluate LCLMs' performance on in-context retrieval and reasoning.
Our findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks.
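To make the contrast concrete, here is a minimal sketch of the two routes being compared, assuming a generic `call_llm` stub and a user-supplied `retrieve` function; LOFT's actual task formats and prompts are not reproduced.
```python
from typing import Callable

def call_llm(prompt: str) -> str:
    """Placeholder for any chat-completion client."""
    raise NotImplementedError

def answer_in_context(corpus: list[str], question: str) -> str:
    # Long-context route: the whole corpus goes into the prompt and the
    # model does the "retrieval" implicitly while reasoning.
    documents = "\n\n".join(corpus)  # up to millions of tokens for an LCLM
    return call_llm(f"Documents:\n{documents}\n\nQuestion: {question}")

def answer_with_rag(retrieve: Callable[[str, int], list[str]],
                    question: str) -> str:
    # Conventional route: an external retriever narrows the corpus first.
    top_docs = "\n\n".join(retrieve(question, 5))
    return call_llm(f"Documents:\n{top_docs}\n\nQuestion: {question}")
```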
arXiv Detail & Related papers (2024-06-19T00:28:58Z)
- Welcome Your New AI Teammate: On Safety Analysis by Leashing Large Language Models
"Hazard Analysis & Risk Assessment" (HARA) is an essential step to start the safety requirements specification.
We propose a framework to support a higher degree of automation of HARA with Large Language Models (LLMs)
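One plausible shape for such automation is sketched below under assumptions: the prompts and the `call_llm` stub are illustrative, not the proposed framework. It chains two steps, identifying hazardous events and then rating each with the ISO 26262 severity, exposure, and controllability parameters.
```python
def call_llm(prompt: str) -> str:
    """Placeholder for any chat-completion client."""
    raise NotImplementedError

def identify_hazards(function_description: str) -> list[str]:
    # Step 1: enumerate hazardous events for a vehicle function.
    prompt = (
        "List hazardous events that could arise from malfunctioning "
        "behaviour of this function, one per line:\n" + function_description
    )
    return [h.strip() for h in call_llm(prompt).splitlines() if h.strip()]

def rate_hazard(hazard: str) -> str:
    # Step 2: rate with the ISO 26262 risk parameters.
    prompt = (
        "Rate the hazardous event below with severity (S0-S3), exposure "
        "(E0-E4), and controllability (C0-C3):\n" + hazard
    )
    return call_llm(prompt)
```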
arXiv Detail & Related papers (2024-03-14T16:56:52Z)
- Quantitative Assurance and Synthesis of Controllers from Activity Diagrams
Probabilistic model checking is a widely used formal verification technique for automatically verifying qualitative and quantitative properties.
However, it requires expertise in formal modelling, which makes it inaccessible to researchers and engineers without that background.
We propose a comprehensive verification framework for ADs, including a new profile for probability, time, and quality annotations, a semantic interpretation of ADs in three Markov models, and a set of transformation rules from activity diagrams to the PRISM language.
Most importantly, we developed transformation algorithms and implemented them, using model-based techniques, in a tool called QASCAD for fully automated verification.
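As a toy illustration of the transformation idea (not QASCAD's actual rules), the sketch below turns an activity-diagram-like edge list with branch probabilities into a PRISM DTMC module.
```python
def to_prism(edges: dict[int, list[tuple[float, int]]], n_states: int) -> str:
    # Each node becomes a value of state variable s; each probabilistic
    # branch becomes one update in a PRISM command.
    lines = ["dtmc", "module Activity",
             f"  s : [0..{n_states - 1}] init 0;"]
    for src, branches in edges.items():
        updates = " + ".join(f"{p}:(s'={dst})" for p, dst in branches)
        lines.append(f"  [] s={src} -> {updates};")
    lines.append("endmodule")
    return "\n".join(lines)

# Example: from node 0, branch to 1 with 0.9 or to 2 with 0.1;
# both paths absorb at node 3 (self-loop avoids a deadlock state).
print(to_prism({0: [(0.9, 1), (0.1, 2)],
                1: [(1.0, 3)],
                2: [(1.0, 3)],
                3: [(1.0, 3)]}, 4))
```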
arXiv Detail & Related papers (2024-02-29T22:40:39Z)
- PROXYQA: An Alternative Framework for Evaluating Long-Form Text Generation with Large Language Models
ProxyQA is an innovative framework dedicated to assessing long-form text generation.
It comprises in-depth human-curated meta-questions spanning various domains, each accompanied by specific proxy-questions with pre-annotated answers.
It assesses the generated content's quality through the evaluator's accuracy in addressing the proxy-questions.
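The scoring idea reduces to a short loop: an evaluator answers each proxy-question from the generated text alone, and accuracy against the pre-annotated answers is the score. The sketch below assumes a generic `call_llm` stub and a deliberately naive answer-matching rule.
```python
def call_llm(prompt: str) -> str:
    """Placeholder for the evaluator model."""
    raise NotImplementedError

def proxyqa_score(generated_text: str,
                  proxy_qas: list[tuple[str, str]]) -> float:
    # proxy_qas pairs each proxy-question with its pre-annotated answer.
    correct = 0
    for question, gold_answer in proxy_qas:
        predicted = call_llm(
            "Using only the text below, answer the question.\n\n"
            f"Text:\n{generated_text}\n\nQuestion: {question}"
        )
        # Naive substring match stands in for a real answer-matching rule.
        correct += int(gold_answer.lower() in predicted.lower())
    return correct / len(proxy_qas)
```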
arXiv Detail & Related papers (2024-01-26T18:12:25Z)
- TaskBench: Benchmarking Large Language Models for Task Automation
We introduce TaskBench, a framework to evaluate the capability of large language models (LLMs) in task automation.
Specifically, task decomposition, tool selection, and parameter prediction are assessed.
Our approach combines automated construction with rigorous human verification, ensuring high consistency with human evaluation.
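A hypothetical sketch of what scoring those three axes could look like, using plain set-overlap F1 between a predicted plan and a gold plan; TaskBench's real metrics are more elaborate.
```python
def f1(pred: set, gold: set) -> float:
    # Set-overlap F1 between predicted and gold elements.
    if not pred or not gold:
        return 0.0
    p = len(pred & gold) / len(pred)
    r = len(pred & gold) / len(gold)
    return 2 * p * r / (p + r) if p + r else 0.0

def score_plan(pred: dict, gold: dict) -> dict[str, float]:
    # One score per axis: decomposition, tool selection, parameters.
    return {
        "task_decomposition": f1(set(pred["subtasks"]), set(gold["subtasks"])),
        "tool_selection": f1(set(pred["tools"]), set(gold["tools"])),
        "parameter_prediction": f1(set(pred["params"].items()),
                                   set(gold["params"].items())),
    }
```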
arXiv Detail & Related papers (2023-11-30T18:02:44Z)
- CLOMO: Counterfactual Logical Modification with Large Language Models
We introduce a novel task, Counterfactual Logical Modification (CLOMO), and a high-quality human-annotated benchmark.
In this task, LLMs must adeptly alter a given argumentative text to uphold a predetermined logical relationship.
We propose an innovative evaluation metric, the Self-Evaluation Score (SES), to directly evaluate the natural language output of LLMs.
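The published SES definition is not reproduced here; the sketch below is one assumed reading of a self-evaluation score, where a model is asked whether each rewrite satisfies the required logical relation and the verdicts are averaged.
```python
def call_llm(prompt: str) -> str:
    """Placeholder for the evaluating model."""
    raise NotImplementedError

def self_evaluation_score(rewrites: list[tuple[str, str]]) -> float:
    # rewrites pairs each modified text with its target logical relation.
    votes = []
    for rewritten_text, relation in rewrites:
        verdict = call_llm(
            f"Does the following text satisfy the logical relation "
            f"'{relation}'? Answer yes or no.\n\n{rewritten_text}"
        )
        votes.append(verdict.strip().lower().startswith("yes"))
    return sum(votes) / len(votes)
```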
arXiv Detail & Related papers (2023-11-29T08:29:54Z)
- Advancing Requirements Engineering through Generative AI: Assessing the Role of LLMs
Large language models (LLMs) have shown significant promise in diverse domains, including natural language processing, code generation, and program understanding.
This chapter explores the potential of LLMs in driving Requirements Engineering processes, aiming to improve the efficiency and accuracy of requirements-related tasks.
arXiv Detail & Related papers (2023-10-21T11:29:31Z)
- Generative Judge for Evaluating Alignment
We propose Auto-J, a generative judge with 13B parameters, designed to address the challenges of evaluating alignment.
Our model is trained on user queries and LLM-generated responses from a wide range of real-world scenarios.
Experimentally, Auto-J outperforms a series of strong competitors, including both open-source and closed-source models.
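Querying a generative judge typically looks like the sketch below; the prompt template is illustrative and `call_llm` is a placeholder, not Auto-J's actual interface.
```python
def call_llm(prompt: str) -> str:
    """Placeholder for the judge model."""
    raise NotImplementedError

def judge(query: str, response_a: str, response_b: str) -> str:
    # Pairwise comparison: the judge explains its reasoning, then picks
    # the better response.
    prompt = (
        "You are an impartial judge. Compare the two responses to the user "
        "query, explain your reasoning, and end with 'A' or 'B' for the "
        f"better response.\n\nQuery: {query}\n\n"
        f"Response A:\n{response_a}\n\nResponse B:\n{response_b}"
    )
    return call_llm(prompt)
```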
arXiv Detail & Related papers (2023-10-09T07:27:15Z)
- Natural Language Processing for Requirements Formalization: How to Derive New Approaches?
We present and discuss principal ideas and state-of-the-art methodologies from the field of NLP.
We discuss two different approaches in detail and highlight the iterative development of rule sets.
The presented methods are demonstrated on two industrial use cases from the automotive and railway domains.
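A minimal example of the rule-set idea: a single regular expression that extracts the actor and behaviour from "shall"-style requirement sentences. Production rule sets are far richer; only the pattern-matching shape is shown, and the example requirement is invented.
```python
import re

# One rule: "<actor> shall <behaviour>."
RULE = re.compile(r"^(?P<actor>.+?)\s+shall\s+(?P<behaviour>.+)\.$")

def parse_requirement(sentence: str) -> dict | None:
    match = RULE.match(sentence.strip())
    return match.groupdict() if match else None

print(parse_requirement("The braking system shall engage within 100 ms."))
# {'actor': 'The braking system', 'behaviour': 'engage within 100 ms'}
```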
arXiv Detail & Related papers (2023-09-23T05:45:19Z)
- Technical Report on Neural Language Models and Few-Shot Learning for Systematic Requirements Processing in MDSE
This paper is based on the analysis of an open-source set of automotive requirements.
We derive domain-specific language constructs that help avoid ambiguities in requirements and increase the level of formality.
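One common way such constructs raise formality is by constraining requirements to fixed sentence patterns, in the spirit of template languages such as EARS. The patterns below are illustrative examples, not the constructs derived in the paper.
```python
import re

# EARS-style templates: event-driven, state-driven, ubiquitous.
PATTERNS = [
    re.compile(r"^When .+, the .+ shall .+\.$"),
    re.compile(r"^While .+, the .+ shall .+\.$"),
    re.compile(r"^The .+ shall .+\.$"),
]

def conforms(requirement: str) -> bool:
    # A requirement is accepted only if it fits one of the templates.
    return any(p.match(requirement.strip()) for p in PATTERNS)

print(conforms("When the door is open, the interior light shall turn on."))  # True
print(conforms("Lights should probably come on sometimes."))                 # False
```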
arXiv Detail & Related papers (2022-11-16T18:06:25Z)
- A Research Agenda for Artificial Intelligence in the Field of Flexible Production Systems
Production companies struggle to quickly adapt their production control to fluctuating demands or changing requirements.
Control approaches that encapsulate production functions as services have shown promise for increasing the flexibility of Cyber-Physical Production Systems.
A remaining challenge of such approaches, however, is finding production plans based on provided functionalities for a given set of requirements, especially when there is no direct (i.e., syntactic) match between demanded and provided functions.
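A common remedy is to match demanded and provided functions semantically rather than by name. The sketch below uses plain token overlap as a stand-in for a real embedding model; only the matching shape matters, and all function names are invented.
```python
def similarity(a: str, b: str) -> float:
    # Jaccard overlap of word sets; swap in embedding cosine similarity
    # for a real system.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def match_functions(demanded: list[str], provided: list[str],
                    threshold: float = 0.3) -> dict[str, str | None]:
    matches = {}
    for need in demanded:
        best = max(provided, key=lambda p: similarity(need, p), default=None)
        matches[need] = (best if best is not None
                         and similarity(need, best) >= threshold else None)
    return matches

print(match_functions(["drill hole", "weld seam"],
                      ["drill hole service", "paint part"]))
# {'drill hole': 'drill hole service', 'weld seam': None}
```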
arXiv Detail & Related papers (2021-12-31T14:38:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.