Related papers: Interpretable AI for Time-Series: Multi-Model Heatmap Fusion with Global Attention and NLP-Generated Explanations

URL: http://arxiv.org/abs/2507.00234v1
Date: Mon, 30 Jun 2025 20:04:35 GMT
Title: Interpretable AI for Time-Series: Multi-Model Heatmap Fusion with Global Attention and NLP-Generated Explanations
Authors: Jiztom Kavalakkatt Francis, Matthew J Darr
Abstract summary: We present a novel framework for enhancing model interpretability by integrating heatmaps produced by ResNet and a restructured 2D Transformer with globally weighted input saliency. Our method merges gradient-weighted activation maps (ResNet) and Transformer attention rollout into a unified visualization, achieving full spatial-temporal alignment. Empirical evaluations on clinical (ECG arrhythmia detection) and industrial datasets demonstrate significant improvements.
Score: 1.331812695405053
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this paper, we present a novel framework for enhancing model interpretability by integrating heatmaps produced separately by ResNet and a restructured 2D Transformer with globally weighted input saliency. We address the critical problem of spatial-temporal misalignment in existing interpretability methods, where convolutional networks fail to capture global context and Transformers lack localized precision, a limitation that impedes actionable insights in safety-critical domains like healthcare and industrial monitoring. Our method merges gradient-weighted activation maps (ResNet) and Transformer attention rollout into a unified visualization, achieving full spatial-temporal alignment while preserving real-time performance. Empirical evaluations on clinical (ECG arrhythmia detection) and industrial (energy consumption prediction) datasets demonstrate significant improvements: the hybrid framework achieves 94.1% accuracy (F1 0.93) on the PhysioNet dataset and reduces regression error to RMSE = 0.28 kWh (R^2 = 0.95) on the UCI Energy Appliance dataset, outperforming standalone ResNet, Transformer, and InceptionTime baselines by 3.8-12.4%. An NLP module translates fused heatmaps into domain-specific narratives (e.g., "Elevated ST-segment between 2-4 seconds suggests myocardial ischemia"), validated via BLEU-4 (0.586) and ROUGE-L (0.650) scores. By formalizing interpretability as causal fidelity and spatial-temporal alignment, our approach bridges the gap between technical outputs and stakeholder understanding, offering a scalable solution for transparent, time-aware decision-making.
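
Illustrative sketch (not from the paper): the abstract does not state how the two heatmaps are combined, so the snippet below only shows one simple way such a fusion could look. It assumes both maps already share the same time-by-channel grid, rescales each to [0, 1], and blends them with a hypothetical weight `alpha`. The function names `min_max_normalize` and `fuse_heatmaps` are invented for this example.

```python
from typing import List

Heatmap = List[List[float]]


def min_max_normalize(heatmap: Heatmap) -> Heatmap:
    """Rescale a 2D heatmap to [0, 1]; a constant map becomes all zeros."""
    flat = [v for row in heatmap for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        return [[0.0 for _ in row] for row in heatmap]
    return [[(v - lo) / span for v in row] for row in heatmap]


def fuse_heatmaps(cnn_map: Heatmap, attn_map: Heatmap, alpha: float = 0.5) -> Heatmap:
    """Blend two same-shaped heatmaps as alpha * cnn + (1 - alpha) * attention.

    ASSUMPTION: a convex combination after min-max normalization; the paper's
    actual fusion rule (with global weighting) may differ.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    same_rows = len(cnn_map) == len(attn_map)
    if not same_rows or any(len(a) != len(b) for a, b in zip(cnn_map, attn_map)):
        raise ValueError("heatmaps must have the same shape")
    a = min_max_normalize(cnn_map)
    b = min_max_normalize(attn_map)
    return [
        [alpha * x + (1.0 - alpha) * y for x, y in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]


# Toy 2x2 maps standing in for a Grad-CAM-style map and an attention-rollout map.
cnn_map = [[0.0, 2.0], [4.0, 8.0]]
attn_map = [[1.0, 1.0], [3.0, 5.0]]
fused = fuse_heatmaps(cnn_map, attn_map, alpha=0.5)
print(fused)
```

Normalizing before blending keeps either source from dominating purely because of its raw scale; with `alpha` closer to 1 the result leans toward the convolutional map, and closer to 0 toward the attention map.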

Related papers

FUTransUNet-GradCAM: A Hybrid Transformer-U-Net with Self-Attention and Explainable Visualizations for Foot Ulcer Segmentation [0.0]
Automated segmentation of diabetic foot ulcers (DFUs) plays a critical role in clinical diagnosis, therapeutic planning, and longitudinal wound monitoring. Traditional convolutional neural networks (CNNs) provide strong localization capabilities but struggle to model long-range spatial dependencies. We propose FUTransUNet, a hybrid architecture that integrates the global attention mechanism of Vision Transformers (ViTs) into the U-Net framework.
arXiv Detail & Related papers (2025-08-04T11:05:14Z)
Large Language Models for Automating Clinical Data Standardization: HL7 FHIR Use Case [0.2516393111664279]
We introduce a semi-automated approach to convert structured clinical datasets into HL7 FHIR format. In an initial benchmark, resource identification achieved a perfect F1-score, with GPT-4o outperforming Llama 3.2. Error analysis revealed occasional hallucinations of non-existent attributes and mismatches in granularity, which more detailed prompts can mitigate.
arXiv Detail & Related papers (2025-07-03T17:32:57Z)
ReconMOST: Multi-Layer Sea Temperature Reconstruction with Observations-Guided Diffusion [48.540756751934836]
ReconMOST is a data-driven guided diffusion model framework for multi-layer sea temperature reconstruction. Our method extends ML-based SST reconstruction to a global, multi-layer setting, handling over 92.5% missing data.
arXiv Detail & Related papers (2025-06-12T06:27:22Z)
Graph-Based Fault Diagnosis for Rotating Machinery: Adaptive Segmentation and Structural Feature Integration [0.0]
This paper proposes a graph-based framework for robust and interpretable multiclass fault diagnosis in rotating machinery. It integrates entropy-optimized signal segmentation, time-frequency feature extraction, and graph-theoretic modeling to transform vibration signals into structured representations. The proposed method achieves high diagnostic accuracy when evaluated on two benchmark datasets.
arXiv Detail & Related papers (2025-04-29T13:34:52Z)
Memory-efficient Low-latency Remote Photoplethysmography through Temporal-Spatial State Space Duality [15.714133129768323]
ME-r is a memory-efficient algorithm built on temporal-spatial state space duality. It efficiently captures subtle periodic variations across facial frames while maintaining minimal computational overhead. Our solution enables real-time inference with only 3.6 MB memory usage and 9.46 ms latency.
arXiv Detail & Related papers (2025-04-02T14:34:04Z)
An Interpretable Implicit-Based Approach for Modeling Local Spatial Effects: A Case Study of Global Gross Primary Productivity [9.352810748734157]
In Earth sciences, unobserved factors exhibit non-stationary distributions, causing the relationships between features and targets to display spatial heterogeneity. In geographic machine learning tasks, conventional statistical learning methods often struggle to capture spatial heterogeneity. We propose a novel perspective: simultaneously modeling common features across different locations alongside spatial differences using deep neural networks.
arXiv Detail & Related papers (2025-02-10T05:44:54Z)
CARE Transformer: Mobile-Friendly Linear Visual Transformer via Decoupled Dual Interaction [77.8576094863446]
We propose a new deCoupled duAl-interactive lineaR attEntion (CARE) mechanism. We first propose an asymmetrical feature decoupling strategy that asymmetrically decouples the learning process for local inductive bias and long-range dependencies. By adopting a decoupled learning way and fully exploiting complementarity across features, our method can achieve both high efficiency and accuracy.
arXiv Detail & Related papers (2024-11-25T07:56:13Z)
Cross Space and Time: A Spatio-Temporal Unitized Model for Traffic Flow Forecasting [16.782154479264126]
Predicting spatio-temporal traffic flow presents challenges due to complex interactions between spatial and temporal factors. Existing approaches address these dimensions in isolation, neglecting their critical interdependencies. In this paper, we introduce the Adaptive Spatio-Temporal Unitized Cell (ASTUC), a unified framework designed to capture both spatial and temporal dependencies.
arXiv Detail & Related papers (2024-11-14T07:34:31Z)
Upscaling Global Hourly GPP with Temporal Fusion Transformer (TFT) [0.0]
Gross Primary Productivity is crucial for evaluating climate change initiatives. Estimates are currently only available from sparsely distributed eddy covariance tower sites. This research explored a novel upscaling solution using Temporal Fusion Transformer (TFT).
arXiv Detail & Related papers (2023-06-23T23:29:05Z)
Global-to-Local Modeling for Video-based 3D Human Pose and Shape Estimation [53.04781510348416]
Video-based 3D human pose and shape estimations are evaluated by intra-frame accuracy and inter-frame smoothness. We propose to structurally decouple the modeling of long-term and short-term correlations in an end-to-end framework, Global-to-Local Transformer (GLoT). Our GLoT surpasses previous state-of-the-art methods with the lowest model parameters on popular benchmarks, i.e., 3DPW, MPI-INF-3DHP, and Human3.6M.
arXiv Detail & Related papers (2023-03-26T14:57:49Z)
Inertial Hallucinations -- When Wearable Inertial Devices Start Seeing Things [82.15959827765325]
We propose a novel approach to multimodal sensor fusion for Ambient Assisted Living (AAL). We address two major shortcomings of standard multimodal approaches, limited area coverage and reduced reliability. Our new framework fuses the concept of modality hallucination with triplet learning to train a model with different modalities to handle missing sensors at inference time.
arXiv Detail & Related papers (2022-07-14T10:04:18Z)
Federated Learning for Energy-limited Wireless Networks: A Partial Model Aggregation Approach [79.59560136273917]
Limited communication resources (bandwidth and energy) and data heterogeneity across devices are the main bottlenecks for federated learning (FL). We first devise a novel FL framework with partial model aggregation (PMA). The proposed PMA-FL improves accuracy by 2.72% and 11.6% on two typical heterogeneous datasets.
arXiv Detail & Related papers (2022-04-20T19:09:52Z)
Speaker Representation Learning using Global Context Guided Channel and Time-Frequency Transformations [67.18006078950337]
We use the global context information to enhance important channels and recalibrate salient time-frequency locations. The proposed modules, together with a popular ResNet based model, are evaluated on the VoxCeleb1 dataset.
arXiv Detail & Related papers (2020-09-02T01:07:29Z)
