Can Large Language Models Automate the Refinement of Cellular Network Specifications?
- URL: http://arxiv.org/abs/2507.04214v1
- Date: Sun, 06 Jul 2025 02:40:04 GMT
- Title: Can Large Language Models Automate the Refinement of Cellular Network Specifications?
- Authors: Jianshuo Dong, Tianyi Zhang, Feng Yan, Yuanjie Li, Hewu Li, Han Qiu,
- Abstract summary: This paper investigates the feasibility of Large Language Models (LLMs) for automated cellular network specification refinement. Top models can discover security-related weaknesses in over 127 out of 200 test cases within five trials. Evaluations on 30 cellular attacks identify open challenges for achieving full automation.
- Score: 12.648016409257501
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cellular networks serve billions of users globally, yet concerns about reliability and security persist due to weaknesses in 3GPP standards. Traditional analysis methods, including manual inspection and automated tools, struggle to keep pace with the ever-expanding cellular network specifications. This paper investigates the feasibility of Large Language Models (LLMs) for automated cellular network specification refinement. To advance this effort, we leverage 200,000+ approved 3GPP Change Requests (CRs) that document specification revisions, constructing a valuable dataset for domain tasks. We introduce CR-eval, a principled evaluation framework, and benchmark 16 state-of-the-art LLMs, demonstrating that top models can discover security-related weaknesses in over 127 out of 200 test cases within five trials. To bridge potential gaps, we explore LLM specialization techniques, including fine-tuning an 8B model to match or surpass advanced LLMs like GPT-4o and DeepSeek-R1. Evaluations on 30 cellular attacks identify open challenges for achieving full automation. These findings confirm that LLMs can automate the refinement of cellular network specifications and provide valuable insights to guide future research in this direction.
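The "within five trials" protocol amounts to a pass@k-style success criterion: a test case counts as solved if at least one of k sampled responses surfaces the documented weakness. Below is a minimal sketch of such an evaluation loop; `query_llm` and `matches_ground_truth` are hypothetical placeholders, not CR-eval's actual interfaces.

```python
import random

def query_llm(spec_excerpt: str) -> str:
    """Hypothetical stand-in for a model call; swap in a real LLM client."""
    return random.choice(["weakness: unauthenticated NAS message accepted",
                          "no issue found"])

def matches_ground_truth(response: str, cr_revision: str) -> bool:
    """Hypothetical checker; the paper may use an LLM judge or expert review."""
    return "weakness" in response.lower()

def pass_at_k(test_cases: list[dict], k: int = 5) -> int:
    """Count cases solved in at least one of k independent trials,
    mirroring the 'within five trials' criterion in the abstract."""
    solved = 0
    for case in test_cases:
        if any(matches_ground_truth(query_llm(case["spec"]), case["cr"])
               for _ in range(k)):
            solved += 1
    return solved

if __name__ == "__main__":
    cases = [{"spec": "3GPP excerpt ...", "cr": "CR-xxxx"} for _ in range(200)]
    print(pass_at_k(cases), "out of", len(cases), "solved within 5 trials")
```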
Related papers
- The Digital Cybersecurity Expert: How Far Have We Come? [49.89857422097055]
We develop CSEBenchmark, a fine-grained cybersecurity evaluation framework based on 345 knowledge points expected of cybersecurity experts. We evaluate 12 popular large language models (LLMs) on CSEBenchmark and find that even the best-performing model achieves only 85.42% overall accuracy. By identifying and addressing specific knowledge gaps in each LLM, we achieve up to an 84% improvement in correcting previously incorrect predictions.
arXiv Detail & Related papers (2025-04-16T05:36:28Z)
- AutoPT: How Far Are We from the End2End Automated Web Penetration Testing? [54.65079443902714]
We introduce AutoPT, an LLM-driven automated penetration testing agent based on the PSM principle.
Our results show that AutoPT outperforms the baseline framework ReAct on the GPT-4o mini model.
arXiv Detail & Related papers (2024-11-02T13:24:30Z)
- AttackQA: Development and Adoption of a Dataset for Assisting Cybersecurity Operations using Fine-tuned and Open-Source LLMs [0.0]
Large language models (LLMs) are fine-tuned to generate responses to user queries.
In this study, we develop a cybersecurity question-answering (Q&A) dataset, called AttackQA.
We employ it to build a RAG-based Q&A system designed for analysts in security operations centers.
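As an illustration of the retrieve-then-generate pattern such a system follows, here is a minimal sketch using TF-IDF retrieval over toy documents; the corpus, retriever, and `answer_with_llm` stub are illustrative assumptions, not AttackQA's actual pipeline.

```python
# Minimal retrieval-augmented Q&A sketch: retrieve the most relevant
# documents, then ground the model's answer in them.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "T1059: Adversaries may abuse command interpreters to execute commands.",
    "T1566: Phishing messages deliver malicious attachments or links.",
]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(docs)

def retrieve(question: str, top_k: int = 1) -> list[str]:
    """Return the top_k documents most similar to the question."""
    scores = cosine_similarity(vectorizer.transform([question]), doc_matrix)[0]
    ranked = scores.argsort()[::-1][:top_k]
    return [docs[i] for i in ranked]

def answer_with_llm(question: str) -> str:
    """Assemble a grounded prompt; the LLM call itself is a placeholder."""
    context = "\n".join(retrieve(question))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

print(answer_with_llm("How do attackers use phishing?"))
```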
arXiv Detail & Related papers (2024-11-01T23:03:40Z)
- ADVLLM: Iterative Self-Tuning LLMs for Enhanced Jailbreaking Capabilities [63.603861880022954]
We introduce ADV-LLM, an iterative self-tuning process that crafts adversarial LLMs with enhanced jailbreak ability. Our framework significantly reduces the computational cost of generating adversarial suffixes while achieving nearly 100% ASR on various open-source LLMs. It exhibits strong attack transferability to closed-source models, achieving 99% ASR on GPT-3.5 and 49% ASR on GPT-4, despite being optimized solely on Llama3.
arXiv Detail & Related papers (2024-10-24T06:36:12Z)
- Towards Automated Penetration Testing: Introducing LLM Benchmark, Analysis, and Improvements [1.4433703131122861]
This paper introduces an open benchmark for automated penetration testing using large language models (LLMs). We first evaluate the performance of LLMs, including GPT-4o and Llama 3.1-405B, using the state-of-the-art PentestGPT tool. Our findings reveal that while Llama 3.1 demonstrates an edge over GPT-4o, both models currently fall short of performing end-to-end penetration testing even with minimal human assistance.
arXiv Detail & Related papers (2024-10-22T16:18:41Z)
- Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs [60.32717556756674]
This paper introduces a systematic evaluation framework to assess Large Language Models in detecting cryptographic misuses.
Our in-depth analysis of 11,940 LLM-generated reports highlights that the inherent instabilities in LLMs can lead to over half of the reports being false positives.
The optimized approach achieves a remarkable detection rate of nearly 90%, surpassing traditional methods and uncovering previously unknown misuses in established benchmarks.
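One way to damp the instability the summary describes is to keep only findings that recur across repeated runs on the same code. The sketch below shows such a majority-vote filter; the threshold and report format are illustrative assumptions, not the paper's actual optimization.

```python
# Majority-vote filter over repeated LLM misuse reports: a finding survives
# only if enough independent runs reproduce it, suppressing one-off
# hallucinated positives.
from collections import Counter

def stable_findings(report_runs: list[list[str]], min_votes: int = 3) -> list[str]:
    """Keep findings that recur in at least min_votes independent runs."""
    votes = Counter(f for run in report_runs for f in set(run))
    return [finding for finding, n in votes.items() if n >= min_votes]

runs = [
    ["ECB mode in AES cipher", "hardcoded key"],
    ["ECB mode in AES cipher"],
    ["ECB mode in AES cipher", "weak IV"],
    ["ECB mode in AES cipher", "hardcoded key"],
]
print(stable_findings(runs))  # ['ECB mode in AES cipher']
```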
arXiv Detail & Related papers (2024-07-23T15:31:26Z)
- CellularLint: A Systematic Approach to Identify Inconsistent Behavior in Cellular Network Specifications [12.370608043864944]
We introduce a semi-automatic framework for inconsistency detection within the standards of 4G and 5G.
Our proposed method uses a revamped few-shot learning mechanism on domain-adapted large language models.
In our investigation, we focused on the Non-Access Stratum (NAS) and the security specifications of 4G and 5G networks, ultimately uncovering 157 inconsistencies with 82.67% accuracy.
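To make the few-shot setup concrete, the sketch below assembles a prompt from labeled specification-sentence pairs before querying a model; the demonstrations and labels are invented for illustration and are not drawn from CellularLint's dataset.

```python
# Few-shot prompt assembly for inconsistency detection: labeled demonstration
# pairs precede the query pair, steering the model toward the task format.
FEW_SHOT_EXAMPLES = [
    {"pair": ("The UE shall reject unprotected NAS messages.",
              "The UE may accept unprotected NAS messages before security setup."),
     "label": "inconsistent"},
    {"pair": ("The AMF shall initiate authentication.",
              "Authentication is initiated by the AMF."),
     "label": "consistent"},
]

def build_prompt(sentence_a: str, sentence_b: str) -> str:
    """Prepend labeled demonstrations, then ask about the new pair."""
    parts = []
    for ex in FEW_SHOT_EXAMPLES:
        a, b = ex["pair"]
        parts.append(f"A: {a}\nB: {b}\nLabel: {ex['label']}")
    parts.append(f"A: {sentence_a}\nB: {sentence_b}\nLabel:")
    return "\n\n".join(parts)

print(build_prompt("The UE shall start timer T3510.",
                   "Timer T3510 is never started by the UE."))
```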
arXiv Detail & Related papers (2024-07-18T17:48:46Z)
- AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models [95.09157454599605]
Large Language Models (LLMs) are becoming increasingly powerful, but they still exhibit significant but subtle weaknesses. Traditional benchmarking approaches cannot thoroughly pinpoint specific model deficiencies. We introduce a unified framework, AutoDetect, to automatically expose weaknesses in LLMs across various tasks.
arXiv Detail & Related papers (2024-06-24T15:16:45Z)
- NYU CTF Bench: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security [28.125179435861316]
Large Language Models (LLMs) are being deployed across various domains, but their capacity to solve Capture the Flag (CTF) challenges has not been thoroughly evaluated. We develop a novel method to assess LLMs in solving CTF challenges by creating a scalable, open-source benchmark database. This database includes metadata for LLM testing and adaptive learning, compiling a diverse range of CTF challenges from popular competitions.
arXiv Detail & Related papers (2024-06-08T22:21:42Z)
- Large Language Models in Wireless Application Design: In-Context Learning-enhanced Automatic Network Intrusion Detection [11.509880721677156]
We propose a pre-trained LLM-empowered framework to perform fully automatic network intrusion detection.
With experiments on a real network intrusion detection dataset, in-context learning proves to be highly beneficial.
We show that for GPT-4, testing accuracy and F1-Score can be improved by 90%.
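For a concrete picture of in-context learning on this task, the sketch below embeds labeled flow records as demonstrations ahead of the query record; the features, labels, and prompt format are illustrative assumptions, not the paper's actual dataset or prompt.

```python
# In-context learning for intrusion detection: labeled flow records serve as
# demonstrations, and the model is asked to label the final, unlabeled flow.
DEMOS = [
    ("duration=0.2s, bytes=120, flags=SYN,SYN,SYN, dst_port=22", "intrusion"),
    ("duration=35s, bytes=48210, flags=SYN,ACK,FIN, dst_port=443", "benign"),
]

def icl_prompt(record: str) -> str:
    """Build a classification prompt from labeled demonstrations."""
    lines = ["Classify each network flow as 'benign' or 'intrusion'.\n"]
    for features, label in DEMOS:
        lines.append(f"Flow: {features}\nLabel: {label}\n")
    lines.append(f"Flow: {record}\nLabel:")
    return "\n".join(lines)

print(icl_prompt("duration=0.1s, bytes=60, flags=SYN, dst_port=3389"))
```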
arXiv Detail & Related papers (2024-05-17T02:56:31Z)
- Large Language Models as Automated Aligners for benchmarking Vision-Language Models [48.4367174400306]
Vision-Language Models (VLMs) have reached a new level of sophistication, showing notable competence in executing intricate cognition and reasoning tasks.
Existing evaluation benchmarks, primarily relying on rigid, hand-crafted datasets, face significant limitations in assessing the alignment of these increasingly anthropomorphic models with human intelligence.
In this work, we address these limitations via Auto-Bench, which explores LLMs as proficient curators, measuring the alignment between VLMs and human intelligence and values through automatic data curation and assessment.
arXiv Detail & Related papers (2023-11-24T16:12:05Z)