Grad-ELLM: Gradient-based Explanations for Decoder-only LLMs
- URL: http://arxiv.org/abs/2601.03089v1
- Date: Tue, 06 Jan 2026 15:22:39 GMT
- Title: Grad-ELLM: Gradient-based Explanations for Decoder-only LLMs
- Authors: Xin Huang, Antoni B. Chan
- Abstract summary: Grad-ELLM is a gradient-based attribution method for decoder-only transformer-based Large Language Models. We introduce two faithfulness metrics, $π$-Soft-NC and $π$-Soft-NS, which provide fairer comparisons. Experimental results show that Grad-ELLM consistently achieves higher faithfulness than other attribution methods.
- Score: 52.15785423211181
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse tasks, yet their black-box nature raises concerns about transparency and faithfulness. Input attribution methods aim to highlight each input token's contribution to the model's output, but existing approaches are typically model-agnostic and do not focus on transformer-specific architectures, leading to limited faithfulness. To address this, we propose Grad-ELLM, a gradient-based attribution method for decoder-only transformer-based LLMs. By aggregating channel importance from gradients of the output logit with respect to attention layers and spatial importance from attention maps, Grad-ELLM generates heatmaps at each generation step without requiring architectural modifications. Additionally, we introduce two faithfulness metrics, $π$-Soft-NC and $π$-Soft-NS, which are modifications of Soft-NC/NS that provide fairer comparisons by controlling the amount of information kept when perturbing the text. We evaluate Grad-ELLM on sentiment classification, question answering, and open-generation tasks using different models. Experimental results show that Grad-ELLM consistently achieves higher faithfulness than other attribution methods.
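The aggregation the abstract describes (Grad-CAM-style channel weights from logit gradients, combined with the attention map's spatial pattern) can be sketched roughly as follows. This is a toy illustration on pre-computed tensors, not the authors' implementation; the per-head mean pooling, ReLU, and normalization choices are assumptions carried over from the Grad-CAM recipe:

```python
import numpy as np

def gradellm_heatmap(attn, grad):
    """Toy Grad-CAM-style token attribution for one attention layer.

    attn: (heads, seq, seq) attention map at one layer
    grad: (heads, seq, seq) gradient of the chosen output logit w.r.t. attn
    Returns per-token scores for the current (last) query position.
    """
    # Channel (head) importance: average gradient per head (Grad-CAM pooling).
    channel_w = grad.mean(axis=(1, 2))                # (heads,)
    # Spatial importance: weight each head's attention map and combine.
    weighted = channel_w[:, None, None] * attn
    heatmap = np.maximum(weighted.sum(axis=0), 0.0)   # ReLU, as in Grad-CAM
    scores = heatmap[-1]                              # last query position
    s = scores.sum()
    return scores / s if s > 0 else scores

rng = np.random.default_rng(0)
attn = np.abs(rng.normal(size=(4, 6, 6)))   # stand-in attention weights
grad = np.abs(rng.normal(size=(4, 6, 6)))   # stand-in logit gradients
scores = gradellm_heatmap(attn, grad)       # one score per input token
```

In a real setting, `grad` would come from a backward pass of the generated token's logit through the model (e.g. via a backward hook on each attention module), repeated at every generation step.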
Related papers
- GradAlign: Gradient-Aligned Data Selection for LLM Reinforcement Learning [55.03441672267886]
We propose GradAlign, a gradient-aligned data selection method for reinforcement learning. We evaluate GradAlign across three data regimes: unreliable reward signals, distribution imbalance, and low-utility training corpora.
arXiv Detail & Related papers (2026-02-25T01:54:50Z) - GrAInS: Gradient-based Attribution for Inference-Time Steering of LLMs and VLMs [56.93583799109029]
GrAInS is an inference-time steering approach that operates across both language-only and vision-language models and tasks. During inference, GrAInS steers hidden activations at transformer layers guided by token-level attribution signals, and normalizes activations to preserve representational scale. It consistently outperforms both fine-tuning and existing steering baselines.
arXiv Detail & Related papers (2025-07-24T02:34:13Z) - GradMetaNet: An Equivariant Architecture for Learning on Gradients [27.271084807773107]
We introduce GradMetaNet, a novel architecture for learning on gradients. We prove theoretical results for GradMetaNet and show that previous approaches cannot approximate natural gradient-based functions. We then demonstrate GradMetaNet's effectiveness on a diverse set of gradient-based tasks.
arXiv Detail & Related papers (2025-07-02T12:22:39Z) - Can Gradient Descent Simulate Prompting? [56.60154660021178]
We ask whether gradient updates can replicate the effects of conditioning on new information. Gradient-descent training recovers some (and occasionally all) of prompted model performance. Results suggest new avenues for long-context modeling.
arXiv Detail & Related papers (2025-06-26T04:06:20Z) - LibraGrad: Balancing Gradient Flow for Universally Better Vision Transformer Attributions [17.88069510398486]
Why do gradient-based explanations struggle with Transformers, and how can we improve them?
We identify flow imbalances in Transformers that violate FullGrad-completeness, a critical property for gradient-based attribution that CNNs naturally possess.
We introduce LibraGrad -- a theoretically grounded post-hoc approach that corrects gradient imbalances through pruning and scaling of backward paths.
arXiv Detail & Related papers (2024-11-24T15:02:52Z) - Classifier-guided Gradient Modulation for Enhanced Multimodal Learning [50.7008456698935]
Classifier-Guided Gradient Modulation (CGGM) is a novel method to balance multimodal learning with gradients.
We conduct extensive experiments on four multimodal datasets: UPMC-Food 101, CMU-MOSI, IEMOCAP and BraTS.
CGGM outperforms all the baselines and other state-of-the-art methods consistently.
arXiv Detail & Related papers (2024-11-03T02:38:43Z) - Language Models as Zero-shot Lossless Gradient Compressors: Towards General Neural Parameter Prior Models [56.00251589760559]
Large language models (LLMs) can act as gradient priors in a zero-shot setting. We introduce LM-GC, a novel method that integrates LLMs with arithmetic coding. Experiments indicate that LM-GC surpasses existing state-of-the-art lossless compression methods.
arXiv Detail & Related papers (2024-09-26T13:38:33Z) - No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations [30.9134119244757]
FUNGI is a method to enhance the features of transformer encoders by leveraging self-supervised gradients.
Our method is simple: given any pretrained model, we first compute gradients from various self-supervised objectives for each input.
The resulting features are evaluated on k-nearest neighbor classification over 11 datasets from vision, 5 from natural language processing, and 2 from audio.
arXiv Detail & Related papers (2024-07-15T17:58:42Z) - Enhancing Large Language Model Performance with Gradient-Based Parameter Selection [32.88329156118533]
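The gradients-as-features idea in FUNGI can be sketched in a few lines: for each input, take the gradient of some self-supervised objective with respect to the model's weights and use the flattened, normalized gradient as the feature vector. The sketch below uses a linear embedding and a toy embedding-norm objective purely for illustration; the paper's actual self-supervised objectives and models are different:

```python
import numpy as np

def fungi_features(W, xs):
    """Toy gradients-as-features: one gradient vector per input.

    W:  (d_out, d_in) weights of a linear 'model'
    xs: (n, d_in) batch of inputs
    Returns (n, d_out * d_in) L2-normalized gradient features.
    """
    feats = []
    for x in xs:
        z = W @ x
        # Gradient of the toy objective ||W x||^2 w.r.t. W: 2 (W x) x^T.
        grad = 2.0 * np.outer(z, x)
        g = grad.ravel()
        feats.append(g / (np.linalg.norm(g) + 1e-8))
    return np.stack(feats)

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 8))
xs = rng.normal(size=(3, 8))
F = fungi_features(W, xs)   # feed these into k-NN classification
```

The resulting rows can be compared with cosine similarity for k-nearest-neighbor classification, mirroring the evaluation described in the entry above.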
Gradient-Mask Tuning (GMT) is a method that selectively updates parameters during training based on their gradient information. Our empirical results across various tasks demonstrate that GMT not only outperforms traditional fine-tuning methods but also elevates the upper limits of LLM performance.
arXiv Detail & Related papers (2024-06-21T17:42:52Z) - Contextual Gradient Scaling for Few-Shot Learning [24.19934081878197]
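The core mechanic of gradient-mask tuning (update only the parameters whose gradients look most informative, freeze the rest) can be sketched as a single masked SGD step. The magnitude-based top-k criterion below is an assumed, simplified stand-in for GMT's actual selection rule:

```python
import numpy as np

def masked_update(params, grads, lr=0.1, keep_ratio=0.2):
    """One SGD step that touches only the top `keep_ratio` fraction of
    parameters by gradient magnitude; all others are frozen."""
    flat = np.abs(grads).ravel()
    k = max(1, int(len(flat) * keep_ratio))
    thresh = np.partition(flat, -k)[-k]     # k-th largest |gradient|
    mask = np.abs(grads) >= thresh          # 1 for updated params, 0 otherwise
    return params - lr * grads * mask

params = np.zeros(10)
grads = np.arange(1.0, 11.0)               # distinct magnitudes 1..10
new = masked_update(params, grads, lr=1.0, keep_ratio=0.2)
```

With `keep_ratio=0.2` only the two largest-gradient parameters move; the design choice is that sparse updates concentrate learning capacity where the loss is most sensitive.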
We propose contextual gradient scaling (CxGrad) for model-agnostic meta-learning (MAML).
CxGrad scales gradient norms of the backbone to facilitate learning task-specific knowledge in the inner-loop.
Experimental results show that CxGrad effectively encourages the backbone to learn task-specific knowledge in the inner-loop.
arXiv Detail & Related papers (2021-10-20T03:05:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.