A Survey on the Application of Large Language Models in Scenario-Based Testing of Automated Driving Systems
- URL: http://arxiv.org/abs/2505.16587v1
- Date: Thu, 22 May 2025 12:25:44 GMT
- Title: A Survey on the Application of Large Language Models in Scenario-Based Testing of Automated Driving Systems
- Authors: Yongqi Zhao, Ji Zhou, Dong Bi, Tomislav Mihalj, Jia Hu, Arno Eichberger
- Abstract summary: The emergence of Large Language Models (LLMs) has introduced new opportunities to reinforce scenario-based testing. The paper concludes by outlining five open challenges and potential research directions.
- Score: 6.608557716494977
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The safety and reliability of Automated Driving Systems (ADSs) must be validated prior to large-scale deployment. Among existing validation approaches, scenario-based testing has been regarded as a promising method to improve testing efficiency and reduce associated costs. Recently, the emergence of Large Language Models (LLMs) has introduced new opportunities to reinforce this approach. While an increasing number of studies have explored the use of LLMs in the field of automated driving, a dedicated review focusing on their application within scenario-based testing remains absent. This survey addresses this gap by systematically categorizing the roles played by LLMs across the various phases of scenario-based testing, drawing from both academic research and industrial practice. In addition, key characteristics of LLMs and corresponding usage strategies are comprehensively summarized. The paper concludes by outlining five open challenges and potential research directions. To support ongoing research efforts, a continuously updated repository of recent advancements and relevant open-source tools is made available at: https://github.com/ftgTUGraz/LLM4ADSTest.
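For readers new to the topic, a minimal sketch of how an LLM can be slotted into the scenario-generation phase follows. This is an illustrative example, not a method from the survey; `call_llm` is a hypothetical stand-in for any chat-completion client and is stubbed so the snippet runs standalone.

```python
import json

def call_llm(prompt: str) -> str:
    # Hypothetical LLM call; replace with any chat-completion client.
    # Stubbed with a canned response so the sketch runs standalone.
    return json.dumps({
        "road": "two-lane rural",
        "ego_speed_mps": 20.0,
        "actors": [{"type": "vehicle", "behavior": "cut-in", "ttc_s": 2.5}],
    })

PROMPT = (
    "Generate one safety-critical driving scenario as JSON with keys "
    "'road', 'ego_speed_mps', and 'actors' (each with 'type', 'behavior', 'ttc_s')."
)

def generate_scenario() -> dict:
    scenario = json.loads(call_llm(PROMPT))
    # Basic schema check before handing the scenario to a simulator.
    assert {"road", "ego_speed_mps", "actors"} <= scenario.keys()
    return scenario

if __name__ == "__main__":
    print(generate_scenario())
```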
Related papers
- MLLM-CL: Continual Learning for Multimodal Large Language Models [62.90736445575181]
We introduce MLLM-CL, a novel benchmark encompassing domain and ability continual learning. Our approach can integrate domain-specific knowledge and functional abilities with minimal forgetting, significantly outperforming existing methods.
arXiv Detail & Related papers (2025-06-05T17:58:13Z)
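To make "minimal forgetting" in the entry above concrete, the toy computation below shows how forgetting is commonly quantified in continual learning (average final accuracy and the drop from each task's best accuracy); the numbers are made up for illustration.

```python
# acc[i][j] = accuracy on task j after finishing training on task i (made-up numbers).
acc = [
    [0.90, 0.00, 0.00],
    [0.85, 0.88, 0.00],
    [0.80, 0.84, 0.91],
]

T = len(acc)
avg_acc = sum(acc[T - 1][j] for j in range(T)) / T
# Forgetting: best accuracy ever reached on a task minus its final accuracy.
forgetting = sum(
    max(acc[i][j] for i in range(j, T)) - acc[T - 1][j] for j in range(T - 1)
) / (T - 1)
print(f"average accuracy = {avg_acc:.3f}, forgetting = {forgetting:.3f}")
```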
- Requirements-Driven Automated Software Testing: A Systematic Review [13.67495800498868]
This systematic literature review (SLR) explores the landscape of REDAST by analyzing requirements input, transformation techniques, test outcomes, evaluation methods, and existing limitations. The study synthesizes the current state of REDAST research, highlights trends, and proposes future directions.
arXiv Detail & Related papers (2025-02-25T23:13:09Z)
- The Potential of LLMs in Automating Software Testing: From Generation to Reporting [0.0]
Manual testing, while effective, can be time-consuming and costly, leading to an increased demand for automated methods. Recent advancements in Large Language Models (LLMs) have significantly influenced software engineering. This paper explores an agent-oriented approach to automated software testing, using LLMs to reduce human intervention and enhance testing efficiency.
arXiv Detail & Related papers (2024-12-31T02:06:46Z)
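As a rough illustration of the agent-oriented idea in the entry above (our sketch, not the paper's implementation), an LLM can be asked to draft a unit test for a function under test and the result executed directly; `call_llm` is a hypothetical stand-in, stubbed so the example runs.

```python
def call_llm(prompt: str) -> str:
    # Hypothetical LLM call; returns a canned pytest-style test for the stub.
    return (
        "def test_slugify():\n"
        "    assert slugify('Hello World') == 'hello-world'\n"
    )

def slugify(text: str) -> str:
    return "-".join(text.lower().split())

test_code = call_llm(f"Write a pytest test for the function {slugify.__name__}.")
namespace = {"slugify": slugify}
exec(test_code, namespace)   # define the generated test
namespace["test_slugify"]()  # run it; raises AssertionError on failure
print("generated test passed")
```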
- Exploring Critical Testing Scenarios for Decision-Making Policies: An LLM Approach [14.32199539218175]
This paper proposes an adaptable Large Language Model (LLM)-driven online testing framework to explore critical and diverse testing scenarios. Specifically, we design a "generate-test-feedback" pipeline with templated prompt engineering to harness the world knowledge and reasoning abilities of LLMs.
arXiv Detail & Related papers (2024-12-09T17:27:04Z)
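The "generate-test-feedback" pipeline in the entry above can be pictured as the loop below. This is a minimal sketch under our own assumptions: `llm_generate_scenario` stands in for a templated LLM prompt and `run_test` for policy execution in simulation, both stubbed so the loop runs.

```python
import random

def llm_generate_scenario(feedback: str) -> dict:
    # Hypothetical LLM call: propose a scenario given feedback from the last run.
    return {"cut_in_gap_m": random.uniform(2.0, 15.0)}

def run_test(scenario: dict) -> bool:
    # Stub for executing the policy under test; True means a failure occurred.
    return scenario["cut_in_gap_m"] < 5.0

feedback = "no runs yet"
failures = []
for i in range(10):
    scenario = llm_generate_scenario(feedback)
    failed = run_test(scenario)
    feedback = f"run {i}: scenario={scenario}, failure={failed}"
    if failed:
        failures.append(scenario)
print(f"found {len(failures)} failing scenarios")
```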
- PentestAgent: Incorporating LLM Agents to Automated Penetration Testing [6.815381197173165]
Manual penetration testing is time-consuming and expensive.
Recent advancements in large language models (LLMs) offer new opportunities for enhancing penetration testing.
We propose PentestAgent, a novel LLM-based automated penetration testing framework.
arXiv Detail & Related papers (2024-11-07T21:10:39Z)
- AutoPT: How Far Are We from the End2End Automated Web Penetration Testing? [54.65079443902714]
We introduce AutoPT, an automated penetration testing agent based on the principle of PSM driven by LLMs.
Our results show that AutoPT outperforms the baseline framework ReAct on the GPT-4o mini model.
arXiv Detail & Related papers (2024-11-02T13:24:30Z)
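Reading PSM as a testing state machine driven by an LLM (our interpretation for illustration only, not AutoPT's code), the control flow might resemble the sketch below, where an LLM-backed policy picks the next transition from the allowed set.

```python
# States and allowed transitions of a toy penetration-testing state machine.
TRANSITIONS = {
    "recon":   ["scan", "stop"],
    "scan":    ["exploit", "stop"],
    "exploit": ["report"],
    "report":  [],
    "stop":    [],
}

def choose_action(state: str, options: list[str]) -> str:
    # A real agent would prompt an LLM with the state and observations;
    # here we deterministically pick the first option.
    return options[0]

state = "recon"
trace = [state]
while TRANSITIONS[state]:
    state = choose_action(state, TRANSITIONS[state])
    trace.append(state)
print(" -> ".join(trace))  # recon -> scan -> exploit -> report
```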
- Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph [83.90988015005934]
Uncertainty quantification is a key element of machine learning applications. We introduce a novel benchmark that implements a collection of state-of-the-art UQ baselines. We conduct a large-scale empirical investigation of UQ and normalization techniques across eleven tasks, identifying the most effective approaches.
arXiv Detail & Related papers (2024-06-21T20:06:31Z)
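As a concrete toy example of a sequence-level UQ baseline (our sketch, not LM-Polygraph's API), mean token entropy averages the Shannon entropy of each generated token's predictive distribution; higher values indicate a less certain generation.

```python
import math

def mean_token_entropy(token_distributions: list[dict[str, float]]) -> float:
    # Average Shannon entropy of the per-token predictive distributions.
    entropies = [
        -sum(p * math.log(p) for p in dist.values() if p > 0)
        for dist in token_distributions
    ]
    return sum(entropies) / len(entropies)

# Made-up per-token distributions for a three-token generation.
dists = [
    {"yes": 0.9, "no": 0.1},
    {"the": 0.5, "a": 0.5},
    {".": 0.99, "!": 0.01},
]
print(f"mean token entropy = {mean_token_entropy(dists):.3f}")
```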
- Automatic benchmarking of large multimodal models via iterative experiment programming [71.78089106671581]
We present APEx, the first framework for automatic benchmarking of LMMs.
Given a research question expressed in natural language, APEx leverages a large language model (LLM) and a library of pre-specified tools to generate a set of experiments for the model at hand and progressively compiles the findings into a scientific report.
The report drives the testing procedure: based on the current status of the investigation, APEx chooses which experiments to perform and whether the results are sufficient to draw conclusions.
arXiv Detail & Related papers (2024-06-18T06:43:46Z)
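The iterative loop described above might be skeletonized as follows (our reading, not APEx's API): an LLM-backed planner proposes the next experiment, results accumulate in the report, and a stopping rule decides when the evidence suffices.

```python
def propose_experiment(history: list[tuple[str, float]]) -> str:
    # Hypothetical LLM planner; here it simply cycles through a fixed pool.
    pool = ["count_objects", "read_text", "spatial_reasoning"]
    return pool[len(history) % len(pool)]

def run_experiment(name: str) -> float:
    # Stub: returns a made-up accuracy for each experiment.
    return {"count_objects": 0.62, "read_text": 0.81, "spatial_reasoning": 0.44}[name]

def conclusive(history: list[tuple[str, float]]) -> bool:
    return len(history) >= 3  # fixed stop rule stands in for the LLM's judgment

history = []
while not conclusive(history):
    name = propose_experiment(history)
    history.append((name, run_experiment(name)))
for name, score in history:  # the accumulated "report"
    print(f"{name}: {score:.2f}")
```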
- AutoSurvey: Large Language Models Can Automatically Write Surveys [77.0458309675818]
Traditional survey paper creation faces challenges due to the vast volume and complexity of information.
This paper introduces AutoSurvey, a speedy and well-organized methodology for automating the creation of comprehensive literature surveys.
Our contributions include a comprehensive solution to the survey problem, a reliable evaluation method, and experimental validation demonstrating AutoSurvey's effectiveness.
arXiv Detail & Related papers (2024-06-10T12:56:06Z)
- Test Oracle Automation in the era of LLMs [52.69509240442899]
Large Language Models (LLMs) have demonstrated remarkable proficiency in tackling diverse software testing tasks.
This paper aims to enable discussions on the potential of using LLMs for test oracle automation, along with the challenges that may emerge during the generation of various types of oracles.
arXiv Detail & Related papers (2024-05-21T13:19:10Z)
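To make "test oracle" concrete (a toy sketch under our own assumptions, not the paper's tooling): an LLM proposes an executable assertion about expected behavior, and that assertion then serves as the oracle for the function under test.

```python
def normalize_phone(raw: str) -> str:
    return "".join(ch for ch in raw if ch.isdigit())

def llm_propose_oracle(func_name: str, example_input: str) -> str:
    # A real system would prompt an LLM; we return a canned assertion.
    return f"assert {func_name}({example_input!r}) == '5551234567'"

oracle = llm_propose_oracle("normalize_phone", "(555) 123-4567")
exec(oracle, {"normalize_phone": normalize_phone})  # raises if the oracle is violated
print("oracle held:", oracle)
```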
- Automating Research Synthesis with Domain-Specific Large Language Model Fine-Tuning [0.9110413356918055]
This research pioneers the use of fine-tuned Large Language Models (LLMs) to automate Systematic Literature Reviews (SLRs).
Our study employed the latest fine-tuning methodologies together with open-sourced LLMs, and demonstrated a practical and efficient approach to automating the final execution stages of an SLR process.
The results maintained high factual fidelity in LLM responses and were validated through the replication of an existing PRISMA-conforming SLR.
arXiv Detail & Related papers (2024-04-08T00:08:29Z)
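For orientation, the snippet below is a minimal causal-LM fine-tuning loop with Hugging Face transformers, shrunk to GPT-2 and a one-example corpus purely for illustration; the paper's actual models, data, and hyperparameters are not reproduced here.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
tok.pad_token = tok.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

# A single made-up SLR-style example; a real run would use a domain corpus.
examples = ["Q: Does the study report inclusion criteria? A: Yes, in Section 2."]
batch = tok(examples, return_tensors="pt", padding=True)
labels = batch["input_ids"].clone()  # real pipelines mask padding with -100

opt = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for _ in range(3):  # a few gradient steps on the domain data
    out = model(**batch, labels=labels)
    out.loss.backward()
    opt.step()
    opt.zero_grad()
print(f"final loss: {out.loss.item():.3f}")
```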
- A Comprehensive Survey on Test-Time Adaptation under Distribution Shifts [117.72709110877939]
Test-time adaptation (TTA) has the potential to adapt a pre-trained model to unlabeled data during testing, before making predictions. We categorize TTA into several distinct groups based on the form of test data, namely, test-time domain adaptation, test-time batch adaptation, and online test-time adaptation.
arXiv Detail & Related papers (2023-03-27T16:32:21Z)
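One well-known online TTA recipe is entropy minimization over normalization parameters (in the spirit of Tent); the PyTorch sketch below is our illustration, not code from the survey.

```python
import torch
import torch.nn as nn

# Tiny classifier with a BatchNorm layer whose affine parameters we adapt.
model = nn.Sequential(nn.Linear(8, 16), nn.BatchNorm1d(16), nn.ReLU(), nn.Linear(16, 3))
model.train()  # keep BN in train mode so its statistics follow the test batch

# Adapt only the normalization parameters, leaving the rest of the model frozen.
bn_params = [p for m in model.modules() if isinstance(m, nn.BatchNorm1d)
             for p in m.parameters()]
opt = torch.optim.SGD(bn_params, lr=1e-3)

x_test = torch.randn(32, 8)  # an unlabeled test batch
for _ in range(5):
    probs = model(x_test).softmax(dim=1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()
    entropy.backward()
    opt.step()
    opt.zero_grad()
print(f"final batch entropy: {entropy.item():.3f}")
```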
- A Survey on Scenario-Based Testing for Automated Driving Systems in High-Fidelity Simulation [26.10081199009559]
Testing the system on the road is the approach closest to the real world and the most desirable, but it is incredibly costly.
A popular alternative is to evaluate an ADS's performance in some well-designed challenging scenarios, a.k.a. scenario-based testing.
High-fidelity simulators have been widely used in this setting to maximize flexibility and convenience in testing what-if scenarios.
arXiv Detail & Related papers (2021-12-02T03:41:33Z)
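To ground the notion of a "well-designed challenging scenario", a common criticality measure in scenario-based testing is time-to-collision (TTC); the short sketch below (our example, not from the survey) flags a simulated what-if scenario as critical when TTC drops below a threshold.

```python
def time_to_collision(gap_m: float, ego_speed_mps: float, lead_speed_mps: float) -> float:
    # TTC between the ego and a lead vehicle; infinite if the ego is not closing in.
    closing_speed = ego_speed_mps - lead_speed_mps
    return gap_m / closing_speed if closing_speed > 0 else float("inf")

# A cut-in what-if scenario: ego at 20 m/s, lead slows to 12 m/s at an 18 m gap.
ttc = time_to_collision(gap_m=18.0, ego_speed_mps=20.0, lead_speed_mps=12.0)
CRITICAL_TTC_S = 3.0
print(f"TTC = {ttc:.2f} s -> {'critical' if ttc < CRITICAL_TTC_S else 'benign'}")
```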