Evaluating the Reliability of Digital Forensic Evidence Discovered by Large Language Model: A Case Study
- URL: http://arxiv.org/abs/2602.20202v1
- Date: Sun, 22 Feb 2026 18:20:49 GMT
- Title: Evaluating the Reliability of Digital Forensic Evidence Discovered by Large Language Model: A Case Study
- Authors: Jeel Piyushkumar Khatiwala, Daniel Kwaku Ntiamoah Addai, Weifeng Xu
- Abstract summary: This paper proposes a structured framework that automates forensic artifact extraction, refines data through large language model (LLM) analysis, and validates results using a Digital Forensic Knowledge Graph (DFKG). The framework is evaluated on a 13 GB forensic image dataset containing 61 applications, 2,864 databases, and 5,870 tables. A case study shows the framework's effectiveness, achieving over 95 percent accuracy in artifact extraction, strong support for chain-of-custody adherence, and robust contextual consistency in forensic relationships.
- Score: 1.7102309907119588
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The growing reliance on AI-identified digital evidence raises significant concerns about its reliability, particularly as large language models (LLMs) are increasingly integrated into forensic investigations. This paper proposes a structured framework that automates forensic artifact extraction, refines data through LLM-driven analysis, and validates results using a Digital Forensic Knowledge Graph (DFKG). Evaluated on a 13 GB forensic image dataset containing 61 applications, 2,864 databases, and 5,870 tables, the framework ensures artifact traceability and evidentiary consistency through deterministic Unique Identifiers (UIDs) and forensic cross-referencing. We propose this methodology to address challenges in ensuring the credibility and forensic integrity of AI-identified evidence, reducing classification errors, and advancing scalable, auditable methodologies. A comprehensive case study on this dataset demonstrates the framework's effectiveness, achieving over 95 percent accuracy in artifact extraction, strong support for chain-of-custody adherence, and robust contextual consistency in forensic relationships. Key results validate the framework's ability to enhance reliability, reduce errors, and establish a legally sound paradigm for AI-assisted digital forensics.
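The deterministic UIDs described in the abstract can be sketched as a content hash over an artifact's identifying fields. This is a minimal illustrative sketch, not the paper's actual implementation: the field names, the SHA-256 choice, and the 16-character truncation are all assumptions.

```python
import hashlib
import json

def deterministic_uid(artifact: dict) -> str:
    """Derive a stable identifier from an artifact's identifying fields.

    Hashing a canonical JSON serialization means the same artifact always
    maps to the same UID, regardless of field order or pipeline run,
    which is what traceability and cross-referencing require.
    """
    # sort_keys and fixed separators make the serialization canonical,
    # so field ordering never changes the digest.
    canonical = json.dumps(artifact, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

# Hypothetical artifact record; the fields are illustrative assumptions.
artifact = {
    "app": "com.example.messenger",
    "database": "messages.db",
    "table": "chat_log",
    "row_hash": "a1b2c3",
}
uid = deterministic_uid(artifact)
```

Because the UID is a pure function of the artifact's content, any later stage of the pipeline (LLM refinement, DFKG validation) can recompute it and confirm it is referring to the same underlying evidence.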
Related papers
- MERMAID: Memory-Enhanced Retrieval and Reasoning with Multi-Agent Iterative Knowledge Grounding for Veracity Assessment [8.649665560258702]
We propose a memory-enhanced veracity assessment framework that tightly couples the retrieval and reasoning processes. MERMAID integrates agent-driven search, structured knowledge representations, and a persistent memory module within a Reason-Action style iterative process. We evaluate MERMAID on three fact-checking benchmarks and two claim-verification datasets using multiple LLMs.
arXiv Detail & Related papers (2026-01-29T22:12:33Z) - REVEAL: Reasoning-enhanced Forensic Evidence Analysis for Explainable AI-generated Image Detection [30.963994372913092]
We introduce REVEAL-Bench, the first reasoning-enhanced multimodal benchmark for AI-generated image detection. Our framework integrates detection with a novel expert-grounded reinforcement learning approach. REVEAL significantly enhances detection accuracy, explanation fidelity, and robust cross-model generalization.
arXiv Detail & Related papers (2025-11-28T13:11:08Z) - AutoMalDesc: Large-Scale Script Analysis for Cyber Threat Research [81.04845910798387]
Generating natural language explanations for threat detections remains an open problem in cybersecurity research. We present AutoMalDesc, an automated static analysis summarization framework that operates independently at scale. We publish our complete dataset of more than 100K script samples, including an annotated seed dataset (0.9K), along with our methodology and evaluation framework.
arXiv Detail & Related papers (2025-11-17T13:05:25Z) - AI-Generated Image Detection: An Empirical Study and Future Research Directions [6.891145787446519]
Threats posed by AI-generated media, particularly deepfakes, raise significant challenges for forensics. Although several forensic methods have been proposed, they suffer from three critical gaps. These limitations hinder fair comparison, obscure true robustness, and restrict deployment in security-critical applications.
arXiv Detail & Related papers (2025-11-04T18:13:48Z) - ReliabilityRAG: Effective and Provably Robust Defense for RAG-based Web-Search [69.60882125603133]
We present ReliabilityRAG, a framework for adversarial robustness that explicitly leverages reliability information of retrieved documents. Our work is a significant step towards more effective, provably robust defenses against retrieved-corpus corruption in RAG.
arXiv Detail & Related papers (2025-09-27T22:36:42Z) - ForensicsData: A Digital Forensics Dataset for Large Language Models [0.0]
ForensicsData is an extensive Question-Context-Answer (Q-C-A) dataset sourced from actual malware analysis reports. A unique workflow that extracts structured data was used to create the dataset. Gemini 2 Flash demonstrated the best performance in aligning generated content with forensic terminology.
arXiv Detail & Related papers (2025-08-31T19:58:24Z) - Propose and Rectify: A Forensics-Driven MLLM Framework for Image Manipulation Localization [49.71303998618939]
This paper presents a novel Propose-Rectify framework that bridges semantic reasoning with forensic-specific analysis. Our framework ensures that initial semantic proposals are systematically validated and enhanced through concrete technical evidence, resulting in comprehensive detection accuracy and localization precision.
arXiv Detail & Related papers (2025-08-25T12:43:53Z) - Deep Learning Models for Robust Facial Liveness Detection [56.08694048252482]
This study introduces a robust solution through novel deep learning models addressing the deficiencies in contemporary anti-spoofing techniques. By innovatively integrating texture analysis and reflective properties associated with genuine human traits, our models distinguish authentic presence from replicas with remarkable precision.
arXiv Detail & Related papers (2025-08-12T17:19:20Z) - Synchronous Faithfulness Monitoring for Trustworthy Retrieval-Augmented Generation [96.78845113346809]
Retrieval-augmented language models (RALMs) have shown strong performance and wide applicability in knowledge-intensive tasks.
This paper proposes SynCheck, a lightweight monitor that leverages fine-grained decoding dynamics to detect unfaithful sentences.
We also introduce FOD, a faithfulness-oriented decoding algorithm guided by beam search for long-form retrieval-augmented generation.
arXiv Detail & Related papers (2024-06-19T16:42:57Z) - Forensicability Assessment of Questioned Images in Recapturing Detection [78.45849869266834]
We propose a forensicability assessment network to quantify the forensicability of the questioned samples.
The low-forensicability samples are rejected before the actual recapturing detection process to improve the efficiency of recapturing systems.
We integrate the trained FANet with practical recapturing detection schemes in face anti-spoofing and recaptured document detection tasks.
arXiv Detail & Related papers (2022-09-05T12:26:01Z) - Hierarchical Decision Ensembles - An Inferential Framework for Uncertain Human-AI Collaboration in Forensic Examinations [0.8122270502556371]
We present an inferential framework for assessing the model and its output.
The framework is designed to calibrate trust in forensic experts by bridging the gap between domain specific knowledge and predictive model results.
arXiv Detail & Related papers (2021-10-31T08:07:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.