Related papers: Retrieval Augmented Generation Based LLM Evaluation For Protocol State Machine Inference With Chain-of-Thought Reasoning

URL: http://arxiv.org/abs/2502.15727v1
Date: Thu, 30 Jan 2025 01:03:49 GMT
Title: Retrieval Augmented Generation Based LLM Evaluation For Protocol State Machine Inference With Chain-of-Thought Reasoning
Authors: Youssef Maklad, Fares Wael, Wael Elsersy, Ali Hamdi
Abstract summary: This paper presents a novel approach to evaluate the efficiency of a RAG-based agentic Large Language Model (LLM) architecture in network packet seed generation for protocol fuzzing. Our experiments demonstrate significant improvements of up to 18.19%, 14.81%, and 23.45% in BLEU, ROUGE, and WER, respectively, over baseline models.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper presents a novel approach to evaluate the efficiency of a RAG-based agentic Large Language Model (LLM) architecture in network packet seed generation for network protocol fuzzing. Enhanced by chain-of-thought (CoT) prompting techniques, the proposed approach focuses on improving the structural quality of the seeds in order to guide protocol fuzzing frameworks toward a wide exploration of the protocol state space. Our method leverages RAG and text embeddings in two stages. In the first stage, the agent dynamically consults a knowledge base of Request For Comments (RFC) documents to answer queries about the protocol Finite State Machine (FSM), then iteratively reasons over the retrieved knowledge to refine its output and place seeds properly. In the second stage, we evaluate the structural quality of the agent's output using metrics such as BLEU, ROUGE, and Word Error Rate (WER), comparing the generated packets against ground-truth packets. Our experiments demonstrate significant improvements of up to 18.19%, 14.81%, and 23.45% in BLEU, ROUGE, and WER, respectively, over baseline models. These results confirm the potential of this approach to improve LLM-based protocol fuzzing frameworks for the identification of hidden vulnerabilities.
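The second stage compares generated packets with ground-truth packets using, among other metrics, WER. As a rough illustration of that metric: WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference tokens, so lower is better. This is a minimal sketch, not the authors' evaluation code; it assumes packets are rendered as text and tokenized on whitespace, and the FTP-style messages in the example are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, computed as a word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")
    # dp[i][j] = minimum edits turning ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference tokens
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis tokens
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match if cost == 0)
            )
    return dp[len(ref)][len(hyp)] / len(ref)


# Hypothetical ground-truth vs. generated FTP-style message sequence.
reference = "USER anonymous PASS guest RETR file.txt"
generated = "USER anonymous PASS guest LIST"
print(f"WER = {word_error_rate(reference, generated):.3f}")  # WER = 0.333
```

Here one substitution (RETR to LIST) and one deletion (file.txt) over six reference tokens give 2/6. A reduction in WER is what counts as an improvement for this metric.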

Related papers

Efficient Document Retrieval with G-Retriever
We propose an enhanced approach that replaces the Prize-Collecting Steiner Tree (PCST) method with an attention-based sub-graph construction technique. We encode both node and edge attributes, leading to richer graph representations. Experimental evaluations on the WebQSP dataset demonstrate that our approach is competitive with, and marginally better than, the original method.
arXiv (2025-04-21T08:27:26Z)
Chain-of-Retrieval Augmented Generation
This paper introduces an approach for training o1-like RAG models that retrieve and reason over relevant information step by step before generating the final answer. Our proposed method, CoRAG, allows the model to dynamically reformulate the query based on the evolving state.
arXiv (2025-01-24T09:12:52Z)
RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework
This paper introduces RAGEval, a framework designed to assess RAG systems across diverse scenarios. With a focus on factual accuracy, we propose three novel metrics: Completeness, Hallucination, and Irrelevance. Experimental results show that RAGEval outperforms zero-shot and one-shot methods in terms of clarity, safety, conformity, and richness of generated samples.
arXiv (2024-08-02T13:35:11Z)
On Unified Prompt Tuning for Request Quality Assurance in Public Code Review
We propose a unified framework called UniPCR to complete developer-based request quality assurance (i.e., the subtasks of predicting request necessity and recommending tags) under a Masked Language Model (MLM). Experimental results on the Public Code Review dataset for the time span 2011-2022 demonstrate that our UniPCR framework adapts to both subtasks and outperforms state-of-the-art methods for request quality assurance on accuracy-based metrics.
arXiv (2024-04-11T17:41:28Z)
Sequencing Matters: A Generate-Retrieve-Generate Model for Building Conversational Agents
This paper describes the work of the Georgetown InfoSense group on the challenges presented by TREC iKAT 2023. Our submitted runs outperform the median runs by a significant margin, with superior nDCG across various cutoff values and a higher overall success rate. Our solution uses Large Language Models (LLMs) to generate initial answers, BM25 for answer grounding, logistic regression for passage quality filtering, and LLMs again for final answer generation.
arXiv (2023-11-16T02:37:58Z)
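The answer-grounding step in the generate-retrieve-generate pipeline relies on BM25. As a hedged illustration, the sketch below scores passages with the standard Okapi BM25 formula (common defaults k1 = 1.5, b = 0.75, and the non-negative "+1" IDF variant used by Lucene); using the LLM's draft answer as the query, and the toy passages, are assumptions for illustration rather than the group's actual setup.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Okapi BM25 score of each document for a whitespace-tokenized query."""
    docs = [doc.lower().split() for doc in documents]
    if not docs:
        return []
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs or 1.0  # guard against all-empty docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # "+1" inside the log keeps IDF non-negative (Lucene-style variant).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            norm = k1 * (1 - b + b * len(d) / avgdl)  # document-length normalization
            score += idf * freq * (k1 + 1) / (freq + norm)
        scores.append(score)
    return scores


# Hypothetical usage: ground a draft LLM answer against candidate passages.
passages = [
    "the eiffel tower is in paris",
    "bm25 is a ranking function",
    "paris is the capital of france",
]
draft_answer = "the capital of france is paris"
scores = bm25_scores(draft_answer, passages)
best = max(range(len(passages)), key=scores.__getitem__)
print(passages[best])  # paris is the capital of france
```

The passage that shares the most (and rarest) terms with the draft wins; in the described pipeline, a later filtering step and a second LLM pass would then work from such retrieved passages.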
Faithful Explanations of Black-box NLP Models Using LLM-generated Counterfactuals
Causal explanations of predictions of NLP systems are essential to ensure safety and establish trust. Existing methods often fall short of explaining model predictions effectively or efficiently. We propose two approaches for counterfactual (CF) approximation.
arXiv (2023-10-01T07:31:04Z)
Re-Reading Improves Reasoning in Large Language Models
We introduce a simple yet general and effective prompting method, Re2, to enhance the reasoning capabilities of off-the-shelf Large Language Models (LLMs). Unlike most thought-eliciting prompting methods, such as Chain-of-Thought (CoT), Re2 shifts the focus to the input by processing questions twice, thereby enhancing the understanding process. We evaluate Re2 on extensive reasoning benchmarks across 14 datasets, spanning 112 experiments, to validate its effectiveness and generality.
arXiv (2023-09-12T14:36:23Z)
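Re2 works purely at the prompt level: the question is presented twice before the model answers, and it can be combined with a thought-eliciting cue such as CoT. A minimal sketch of such a prompt builder follows; the exact wording ("Read the question again:") and the default CoT cue are assumptions modeled on the re-reading idea, so the paper's own template may differ.

```python
def re2_prompt(question: str, thought_prompt: str = "Let's think step by step.") -> str:
    """Build a re-reading style prompt in which the question appears twice.

    The template wording is an assumption, not necessarily the paper's exact text.
    """
    return (
        f"Q: {question}\n"
        f"Read the question again: {question}\n"  # second pass over the input
        f"A: {thought_prompt}"                     # optional thought-eliciting cue (e.g. CoT)
    )


print(re2_prompt("A farmer has 3 pens with 4 sheep each. How many sheep are there?"))
```

Because the change is confined to the input text, the same builder can wrap any off-the-shelf model call without altering decoding.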
Pointwise Mutual Information Based Metric and Decoding Strategy for Faithful Generation in Document Grounded Dialogs
Existing metrics measure the degree of similarity between the generated response and the document's content. We propose a new metric that uses (conditional) pointwise mutual information (PMI) between the generated response and the source document. PMI quantifies the extent to which the document influences the generated response. We build on this idea to create a new decoding technique that incorporates PMI into the response generation process to produce more faithful responses.
arXiv (2023-05-20T13:34:34Z)
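The PMI metric above can be read as a log-likelihood ratio: how much more probable the response r becomes when the document d is in the model's context. Assuming the conditioning context is the dialogue history h (an assumption here), conditional PMI is log p(r | d, h) - log p(r | h), which, for an autoregressive model, is a sum of per-token log-probability differences. The sketch takes those per-token log-probabilities as given; how they are obtained from a language model, and any length normalization the paper may apply, are left out.

```python
import math


def conditional_pmi(logp_with_doc, logp_without_doc):
    """Conditional PMI between a response and a grounding document.

    Each argument lists per-token log-probabilities log p(r_t | r_<t, ...) for the
    same response tokens, scored once with the document in context and once
    without it (the dialogue history is assumed present in both).
    Returns log p(r | d, h) - log p(r | h); positive means the document made
    the response more likely.
    """
    if len(logp_with_doc) != len(logp_without_doc):
        raise ValueError("both lists must score the same response tokens")
    return sum(logp_with_doc) - sum(logp_without_doc)


# Toy numbers: each token is twice as likely when the document is present.
with_doc = [math.log(0.5), math.log(0.5)]
without_doc = [math.log(0.25), math.log(0.25)]
print(round(conditional_pmi(with_doc, without_doc), 3))  # 1.386
```

The result equals log(0.25 / 0.0625) = log 4. A response that ignores the document would score near zero, which is what makes the quantity usable as a faithfulness signal.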
Parallel Hierarchical Transformer with Attention Alignment for Abstractive Multi-Document Summarization
Abstractive Multi-Document Summarization (MDS) poses challenges in representing and covering its lengthy, linked sources. This study develops a Parallel Hierarchical Transformer (PHT) with attention alignment for MDS.
arXiv (2022-08-16T17:02:48Z)
UnifieR: A Unified Retriever for Large-Scale Retrieval
Large-scale retrieval aims to recall relevant documents from a huge collection given a query. Recent retrieval methods based on pre-trained language models (PLMs) can be coarsely categorized into either dense-vector or lexicon-based paradigms. We propose a new learning framework, UnifieR, which unifies dense-vector and lexicon-based retrieval in one model with a dual-representing capability.
arXiv (2022-05-23T11:01:59Z)
Contextual Fine-to-Coarse Distillation for Coarse-grained Response Selection in Open-Domain Conversations
We propose a Contextual Fine-to-Coarse (CFC) distilled model for coarse-grained response selection in open-domain conversations. To evaluate the performance of our proposed model, we construct two new datasets based on the Reddit comments dump and Twitter corpus.
arXiv (2021-09-24T08:22:35Z)

This list is automatically generated from the titles and abstracts of the papers on this site.