Related papers: InsightX Agent: An LMM-based Agentic Framework with Integrated Tools for Reliable X-ray NDT Analysis

InsightX Agent: An LMM-based Agentic Framework with Integrated Tools for Reliable X-ray NDT Analysis

URL: http://arxiv.org/abs/2507.14899v2
Date: Mon, 18 Aug 2025 07:15:10 GMT
Title: InsightX Agent: An LMM-based Agentic Framework with Integrated Tools for Reliable X-ray NDT Analysis
Authors: Jiale Liu, Huan Wang, Yue Zhang, Xiaoyu Luo, Jiaxiang Hu, Zhiliang Liu, Min Xie,
Abstract summary: Non-destructive testing (NDT) is vital for industrial quality assurance.<n>Existing deep-learning-based approaches often lack interactivity, interpretability, and the capacity for critical self-assessment.<n>This paper proposes InsightX Agent, a novel LMM-based agentic framework designed to deliver reliable, interpretable, and interactive NDT analysis.
Score: 16.686848727476644
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Non-destructive testing (NDT), particularly X-ray inspection, is vital for industrial quality assurance, yet existing deep-learning-based approaches often lack interactivity, interpretability, and the capacity for critical self-assessment, limiting their reliability and operator trust. To address these shortcomings, this paper proposes InsightX Agent, a novel LMM-based agentic framework designed to deliver reliable, interpretable, and interactive X-ray NDT analysis. Unlike typical sequential pipelines, InsightX Agent positions a Large Multimodal Model (LMM) as a central orchestrator, coordinating between the Sparse Deformable Multi-Scale Detector (SDMSD) and the Evidence-Grounded Reflection (EGR) tool. The SDMSD generates dense defect region proposals for multi-scale feature maps and sparsifies them through Non-Maximum Suppression (NMS), optimizing detection of small, dense targets in X-ray images while maintaining computational efficiency. The EGR tool guides the LMM agent through a chain-of-thought-inspired review process, incorporating context assessment, individual defect analysis, false positive elimination, confidence recalibration and quality assurance to validate and refine the SDMSD's initial proposals. By strategically employing and intelligently using tools, InsightX Agent moves beyond passive data processing to active reasoning, enhancing diagnostic reliability and providing interpretations that integrate diverse information sources. Experimental evaluations on the GDXray+ dataset demonstrate that InsightX Agent not only achieves a high object detection F1-score of 96.35% but also offers significantly improved interpretability and trustworthiness in its analyses, highlighting the transformative potential of agentic LLM frameworks for industrial inspection tasks.

Related papers

AgentTrace: A Structured Logging Framework for Agent System Observability [0.0]
AgentTrace is a dynamic observability and telemetry framework designed to fill this gap.<n>Unlike traditional logging systems, AgentTrace emphasizes continuous, introspectable trace capture.<n>Our research highlights how AgentTrace can enable more reliable agent deployment, fine-grained risk analysis, and informed trust calibration.
arXiv Detail & Related papers (2026-02-07T04:04:59Z)
Agentic Confidence Calibration [67.50096917021521]
Holistic Trajectory (HTC) is a novel diagnostic framework for AI agents.<n>HTC consistently surpasses strong baselines in both calibration and discrimination.<n>HTC provides interpretability by revealing the signals behind failure.
arXiv Detail & Related papers (2026-01-22T09:08:25Z)
Explainable and Fine-Grained Safeguarding of LLM Multi-Agent Systems via Bi-Level Graph Anomaly Detection [76.91230292971115]
Large language model (LLM)-based multi-agent systems (MAS) have shown strong capabilities in solving complex tasks.<n>XG-Guard is an explainable and fine-grained safeguarding framework for detecting malicious agents in MAS.
arXiv Detail & Related papers (2025-12-21T13:46:36Z)
LLM-YOLOMS: Large Language Model-based Semantic Interpretation and Fault Diagnosis for Wind Turbine Components [5.383947139043873]
This study proposes an integrated framework that combines YOLOMS with a large language model (LLM) for intelligent fault analysis and diagnosis.<n> Specifically, YOLOMS employs multi-scale detection and sliding-window cropping to enhance fault feature extraction.<n>This module converts YOLOMS detection results into structured textual representations enriched with both qualitative and quantitative attributes.
arXiv Detail & Related papers (2025-11-13T15:14:34Z)
CXRAgent: Director-Orchestrated Multi-Stage Reasoning for Chest X-Ray Interpretation [62.0150409256153]
We propose CXRAgent, a director-orchestrated, multi-stage agent for CXR interpretation.<n>The agent strategically orchestrates a set of CXR-analysis tools, with outputs normalized and verified by the Evidence-driven Validator.<n>Experiments on various CXR interpretation tasks show that CXRAgent delivers strong performance, providing visual evidence and generalizes well to clinical tasks of different complexity.
arXiv Detail & Related papers (2025-10-24T10:31:30Z)
MAVUL: Multi-Agent Vulnerability Detection via Contextual Reasoning and Interactive Refinement [9.377934769326416]
MAVUL is a novel multi-agent vulnerability detection system that integrates contextual reasoning and interactive refinement.<n>We show MAVUL significantly outperforms existing multi-agent systems with over 62% higher pairwise accuracy and single-agent systems with over 600% higher average performance.
arXiv Detail & Related papers (2025-09-30T22:21:43Z)
Rethinking Evaluation of Infrared Small Target Detection [105.59753496831739]
This paper introduces a hybrid-level metric incorporating pixel- and target-level performance, proposing a systematic error analysis method, and emphasizing the importance of cross-dataset evaluation.<n>An open-source toolkit has be released to facilitate standardized benchmarking.
arXiv Detail & Related papers (2025-09-21T02:45:07Z)
Diagnostics of cognitive failures in multi-agent expert systems using dynamic evaluation protocols and subsequent mutation of the processing context [0.0]
This work introduces a diagnostic framework for expert systems that not only evaluates but also facilitates the transfer of expert behaviour into LLM-powered agents.<n>We demonstrate the framework on a multi-agent recruiter-assistant system, showing that it uncovers latent cognitive failures.
arXiv Detail & Related papers (2025-09-18T19:08:03Z)
DetectAnyLLM: Towards Generalizable and Robust Detection of Machine-Generated Text Across Domains and Models [60.713908578319256]
We propose Direct Discrepancy Learning (DDL) to optimize the detector with task-oriented knowledge.<n>Built upon this, we introduce DetectAnyLLM, a unified detection framework that achieves state-of-the-art MGTD performance.<n>MIRAGE samples human-written texts from 10 corpora across 5 text-domains, which are then re-generated or revised using 17 cutting-edge LLMs.
arXiv Detail & Related papers (2025-09-15T10:59:57Z)
A Large Language Model-Empowered Agent for Reliable and Robust Structural Analysis [14.754785659805869]
Large language models (LLMs) have exhibited remarkable capabilities across diverse open-domain tasks, yet their application in specialized domains such as civil engineering remains largely unexplored.<n>This paper starts bridging this gap by evaluating and enhancing the reliability and robustness of LLMs in structural analysis of beams.<n> Experimental results demonstrate that the agent achieves accuracy exceeding 99.0% on the benchmark dataset, exhibiting reliable and robust performance across diverse conditions.
arXiv Detail & Related papers (2025-06-27T04:16:53Z)
RadFabric: Agentic AI System with Reasoning Capability for Radiology [61.25593938175618]
RadFabric is a multi agent, multimodal reasoning framework that unifies visual and textual analysis for comprehensive CXR interpretation.<n>System employs specialized CXR agents for pathology detection, an Anatomical Interpretation Agent to map visual findings to precise anatomical structures, and a Reasoning Agent powered by large multimodal reasoning models to synthesize visual, anatomical, and clinical data into transparent and evidence based diagnoses.
arXiv Detail & Related papers (2025-06-17T03:10:33Z)
Agent-based Condition Monitoring Assistance with Multimodal Industrial Database Retrieval Augmented Generation [3.8451399765175016]
Condition monitoring (CM) plays a crucial role in ensuring reliability and efficiency in the process industry.<n>This work integrates large language model (LLM)-based reasoning agents with CM to address analyst and industry needs.<n>We propose MindRAG, a modular framework combining multimodal retrieval-augmented generation (RAG) with novel vector store structures designed specifically for CM data.
arXiv Detail & Related papers (2025-06-10T21:04:18Z)
Ensuring Reliability of Curated EHR-Derived Data: The Validation of Accuracy for LLM/ML-Extracted Information and Data (VALID) Framework [0.0]
We propose a comprehensive framework for evaluating the quality of clinical data extracted by large language models (LLMs)<n>The framework integrates variable-level performance benchmarking against expert human abstraction, automated verification checks for internal consistency and plausibility, and replication analyses.<n>This multidimensional approach enables the identification of variables most in need of improvement, systematic detection of latent errors, and confirmation of dataset fitness-for-purpose in real-world research.
arXiv Detail & Related papers (2025-06-09T20:59:16Z)
IDA-Bench: Evaluating LLMs on Interactive Guided Data Analysis [60.32962597618861]
IDA-Bench is a novel benchmark evaluating large language models in multi-round interactive scenarios.<n>Agent performance is judged by comparing its final numerical output to the human-derived baseline.<n>Even state-of-the-art coding agents (like Claude-3.7-thinking) succeed on 50% of the tasks, highlighting limitations not evident in single-turn tests.
arXiv Detail & Related papers (2025-05-23T09:37:52Z)
CBM-RAG: Demonstrating Enhanced Interpretability in Radiology Report Generation with Multi-Agent RAG and Concept Bottleneck Models [1.7042756021131187]
This paper presents an automated radiology report generation framework that combines Concept Bottleneck Models (CBMs) with a Multi-Agent Retrieval-Augmented Generation (RAG) system.<n>CBMs map chest X-ray features to human-understandable clinical concepts, enabling transparent disease classification.<n>RAG system integrates multi-agent collaboration and external knowledge to produce contextually rich, evidence-based reports.
arXiv Detail & Related papers (2025-04-29T16:14:55Z)
AlignRAG: Leveraging Critique Learning for Evidence-Sensitive Retrieval-Augmented Reasoning [61.28113271728859]
RAG has become a widely adopted paradigm for enabling knowledge-grounded large language models (LLMs)<n>Standard RAG pipelines often fail to ensure that model reasoning remains consistent with the evidence retrieved, leading to factual inconsistencies or unsupported conclusions.<n>In this work, we reinterpret RAG as Retrieval-Augmented Reasoning and identify a central but underexplored problem: textitReasoning Misalignment.
arXiv Detail & Related papers (2025-04-21T04:56:47Z)
Enhancing LLM Reliability via Explicit Knowledge Boundary Modeling [48.15636223774418]
Large language models (LLMs) are prone to hallucination stemming from misaligned self-awareness.<n>We propose the Explicit Knowledge Boundary Modeling framework to integrate fast and slow reasoning systems to harmonize reliability and usability.
arXiv Detail & Related papers (2025-03-04T03:16:02Z)
Unsupervised Model Diagnosis [49.36194740479798]
This paper proposes Unsupervised Model Diagnosis (UMO) to produce semantic counterfactual explanations without any user guidance. Our approach identifies and visualizes changes in semantics, and then matches these changes to attributes from wide-ranging text sources.
arXiv Detail & Related papers (2024-10-08T17:59:03Z)
AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents [74.16170899755281]
We introduce AgentBoard, a pioneering comprehensive benchmark and accompanied open-source evaluation framework tailored to analytical evaluation of LLM agents.<n>AgentBoard offers a fine-grained progress rate metric that captures incremental advancements as well as a comprehensive evaluation toolkit.<n>This not only sheds light on the capabilities and limitations of LLM agents but also propels the interpretability of their performance to the forefront.
arXiv Detail & Related papers (2024-01-24T01:51:00Z)
Word-Level ASR Quality Estimation for Efficient Corpus Sampling and Post-Editing through Analyzing Attentions of a Reference-Free Metric [5.592917884093537]
The potential of quality estimation (QE) metrics is introduced and evaluated as a novel tool to enhance explainable artificial intelligence (XAI) in ASR systems. The capabilities of the NoRefER metric are explored in identifying word-level errors to aid post-editors in refining ASR hypotheses.
arXiv Detail & Related papers (2024-01-20T16:48:55Z)

This list is automatically generated from the titles and abstracts of the papers in this site.