Grad-ELLM: Gradient-based Explanations for Decoder-only LLMs
- URL: http://arxiv.org/abs/2601.03089v1
- Date: Tue, 06 Jan 2026 15:22:39 GMT
- Title: Grad-ELLM: Gradient-based Explanations for Decoder-only LLMs
- Authors: Xin Huang, Antoni B. Chan
- Abstract summary: Grad-ELLM is a gradient-based attribution method for decoder-only transformer-based Large Language Models. We introduce two faithfulness metrics, $π$-Soft-NC and $π$-Soft-NS, which provide fairer comparisons. Experimental results show that Grad-ELLM consistently achieves higher faithfulness than other attribution methods.
- Score: 52.15785423211181
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse tasks, yet their black-box nature raises concerns about transparency and faithfulness. Input attribution methods aim to highlight each input token's contribution to the model's output, but existing approaches are typically model-agnostic and do not focus on transformer-specific architectures, leading to limited faithfulness. To address this, we propose Grad-ELLM, a gradient-based attribution method for decoder-only transformer-based LLMs. By aggregating channel importance from gradients of the output logit with respect to attention layers and spatial importance from attention maps, Grad-ELLM generates heatmaps at each generation step without requiring architectural modifications. Additionally, we introduce two faithfulness metrics, $π$-Soft-NC and $π$-Soft-NS, which are modifications of Soft-NC/NS that provide fairer comparisons by controlling the amount of information kept when perturbing the text. We evaluate Grad-ELLM on sentiment classification, question answering, and open-generation tasks using different models. Experimental results show that Grad-ELLM consistently achieves higher faithfulness than other attribution methods.
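The aggregation the abstract describes (Grad-CAM-style channel weights from logit gradients, combined with the attention map's spatial pattern) can be sketched roughly as follows. This is a toy illustration on pre-computed tensors, not the authors' implementation; the per-head mean pooling, ReLU, and normalization choices are assumptions carried over from the Grad-CAM recipe:

```python
import numpy as np

def gradellm_heatmap(attn, grad):
    """Toy Grad-CAM-style token attribution for one attention layer.

    attn: (heads, seq, seq) attention map at one layer
    grad: (heads, seq, seq) gradient of the chosen output logit w.r.t. attn
    Returns per-token scores for the current (last) query position.
    """
    # Channel (head) importance: average gradient per head (Grad-CAM pooling).
    channel_w = grad.mean(axis=(1, 2))                # (heads,)
    # Spatial importance: weight each head's attention map and combine.
    weighted = channel_w[:, None, None] * attn
    heatmap = np.maximum(weighted.sum(axis=0), 0.0)   # ReLU, as in Grad-CAM
    scores = heatmap[-1]                              # last query position
    s = scores.sum()
    return scores / s if s > 0 else scores

rng = np.random.default_rng(0)
attn = np.abs(rng.normal(size=(4, 6, 6)))   # stand-in attention weights
grad = np.abs(rng.normal(size=(4, 6, 6)))   # stand-in logit gradients
scores = gradellm_heatmap(attn, grad)       # one score per input token
```

In a real setting, `grad` would come from a backward pass of the generated token's logit through the model (e.g. via a backward hook on each attention module), repeated at every generation step.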
Related papers
- GradAlign: Gradient-Aligned Data Selection for LLM Reinforcement Learning [55.03441672267886]
We propose GradAlign, a gradient-aligned data selection method for reinforcement learning. We evaluate GradAlign across three data regimes: unreliable reward signals, distribution imbalance, and low-utility training corpora.
arXiv Detail & Related papers (2026-02-25T01:54:50Z) - GrAInS: Gradient-based Attribution for Inference-Time Steering of LLMs and VLMs [56.93583799109029]
GrAInS is an inference-time steering approach that operates across both language-only and vision-language models and tasks. During inference, GrAInS steers hidden activations at transformer layers guided by token-level attribution signals, and normalizes activations to preserve representational scale. It consistently outperforms both fine-tuning and existing steering baselines.
arXiv Detail & Related papers (2025-07-24T02:34:13Z) - GradMetaNet: An Equivariant Architecture for Learning on Gradients [27.271084807773107]
We introduce GradMetaNet, a novel architecture for learning on gradients. We prove theoretical results for GradMetaNet and show that previous approaches cannot approximate natural gradient-based functions. We then demonstrate GradMetaNet's effectiveness on a diverse set of gradient-based tasks.
arXiv Detail & Related papers (2025-07-02T12:22:39Z) - Can Gradient Descent Simulate Prompting? [56.60154660021178]
We ask whether gradient updates can replicate the effects of conditioning on new information. Gradient-descent training recovers some (and occasionally all) of prompted model performance. Results suggest new avenues for long-context modeling.
arXiv Detail & Related papers (2025-06-26T04:06:20Z) - LibraGrad: Balancing Gradient Flow for Universally Better Vision Transformer Attributions [17.88069510398486]
Why do gradient-based explanations struggle with Transformers, and how can we improve them?
We identify flow imbalances in Transformers that violate FullGrad-completeness, a critical property for gradient-based attribution that CNNs naturally possess.
We introduce LibraGrad -- a theoretically grounded post-hoc approach that corrects gradient imbalances through pruning and scaling of backward paths.
arXiv Detail & Related papers (2024-11-24T15:02:52Z) - Classifier-guided Gradient Modulation for Enhanced Multimodal Learning [50.7008456698935]
Classifier-Guided Gradient Modulation (CGGM) is a novel method to balance multimodal learning with gradients.
We conduct extensive experiments on four multimodal datasets: UPMC-Food 101, CMU-MOSI, IEMOCAP and BraTS.
CGGM outperforms all the baselines and other state-of-the-art methods consistently.
arXiv Detail & Related papers (2024-11-03T02:38:43Z) - Language Models as Zero-shot Lossless Gradient Compressors: Towards General Neural Parameter Prior Models [56.00251589760559]
Large language models (LLMs) can act as gradient priors in a zero-shot setting. We introduce LM-GC, a novel method that integrates LLMs with arithmetic coding. Experiments indicate that LM-GC surpasses existing state-of-the-art lossless compression methods.
arXiv Detail & Related papers (2024-09-26T13:38:33Z) - No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations [30.9134119244757]
FUNGI is a method to enhance the features of transformer encoders by leveraging self-supervised gradients.
Our method is simple: given any pretrained model, we first compute gradients from various self-supervised objectives for each input.
The resulting features are evaluated on k-nearest neighbor classification over 11 datasets from vision, 5 from natural language processing, and 2 from audio.
arXiv Detail & Related papers (2024-07-15T17:58:42Z) - Enhancing Large Language Model Performance with Gradient-Based Parameter Selection [32.88329156118533]
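The gradients-as-features idea in FUNGI can be sketched in a few lines: for each input, take the gradient of some self-supervised objective with respect to the model's weights and use the flattened, normalized gradient as the feature vector. The sketch below uses a linear embedding and a toy embedding-norm objective purely for illustration; the paper's actual self-supervised objectives and models are different:

```python
import numpy as np

def fungi_features(W, xs):
    """Toy gradients-as-features: one gradient vector per input.

    W:  (d_out, d_in) weights of a linear 'model'
    xs: (n, d_in) batch of inputs
    Returns (n, d_out * d_in) L2-normalized gradient features.
    """
    feats = []
    for x in xs:
        z = W @ x
        # Gradient of the toy objective ||W x||^2 w.r.t. W: 2 (W x) x^T.
        grad = 2.0 * np.outer(z, x)
        g = grad.ravel()
        feats.append(g / (np.linalg.norm(g) + 1e-8))
    return np.stack(feats)

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 8))
xs = rng.normal(size=(3, 8))
F = fungi_features(W, xs)   # feed these into k-NN classification
```

The resulting rows can be compared with cosine similarity for k-nearest-neighbor classification, mirroring the evaluation described in the entry above.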
Gradient-Mask Tuning (GMT) is a method that selectively updates parameters during training based on their gradient information. Our empirical results across various tasks demonstrate that GMT not only outperforms traditional fine-tuning methods but also elevates the upper limits of LLM performance.
arXiv Detail & Related papers (2024-06-21T17:42:52Z) - Contextual Gradient Scaling for Few-Shot Learning [24.19934081878197]
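The core mechanic of gradient-mask tuning (update only the parameters whose gradients look most informative, freeze the rest) can be sketched as a single masked SGD step. The magnitude-based top-k criterion below is an assumed, simplified stand-in for GMT's actual selection rule:

```python
import numpy as np

def masked_update(params, grads, lr=0.1, keep_ratio=0.2):
    """One SGD step that touches only the top `keep_ratio` fraction of
    parameters by gradient magnitude; all others are frozen."""
    flat = np.abs(grads).ravel()
    k = max(1, int(len(flat) * keep_ratio))
    thresh = np.partition(flat, -k)[-k]     # k-th largest |gradient|
    mask = np.abs(grads) >= thresh          # 1 for updated params, 0 otherwise
    return params - lr * grads * mask

params = np.zeros(10)
grads = np.arange(1.0, 11.0)               # distinct magnitudes 1..10
new = masked_update(params, grads, lr=1.0, keep_ratio=0.2)
```

With `keep_ratio=0.2` only the two largest-gradient parameters move; the design choice is that sparse updates concentrate learning capacity where the loss is most sensitive.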
We propose contextual gradient scaling (CxGrad) for model-agnostic meta-learning (MAML).
CxGrad scales gradient norms of the backbone to facilitate learning task-specific knowledge in the inner-loop.
Experimental results show that CxGrad effectively encourages the backbone to learn task-specific knowledge in the inner-loop.
arXiv Detail & Related papers (2021-10-20T03:05:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.