Related papers: AtP*: An efficient and scalable method for localizing LLM behaviour to components

AtP*: An efficient and scalable method for localizing LLM behaviour to components

URL: http://arxiv.org/abs/2403.00745v1
Date: Fri, 1 Mar 2024 18:43:51 GMT
Title: AtP*: An efficient and scalable method for localizing LLM behaviour to components
Authors: J\'anos Kram\'ar, Tom Lieberum, Rohin Shah, Neel Nanda (Google DeepMind)
Abstract summary: Attribution Patching (AtP) is a fast gradient-based approximation to Activation Patching. We present the first systematic study of AtP and alternative methods for faster activation patching.
Score: 6.47684348405662
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Activation Patching is a method of directly computing causal attributions of behavior to model components. However, applying it exhaustively requires a sweep with cost scaling linearly in the number of model components, which can be prohibitively expensive for SoTA Large Language Models (LLMs). We investigate Attribution Patching (AtP), a fast gradient-based approximation to Activation Patching and find two classes of failure modes of AtP which lead to significant false negatives. We propose a variant of AtP called AtP*, with two changes to address these failure modes while retaining scalability. We present the first systematic study of AtP and alternative methods for faster activation patching and show that AtP significantly outperforms all other investigated methods, with AtP* providing further significant improvement. Finally, we provide a method to bound the probability of remaining false negatives of AtP* estimates.

Related papers

Improved Methods for Model Pruning and Knowledge Distillation [3.8993503758122663]
MAMA Pruning is a performance optimization technique for large language models like R1 or o3-mini.<n>It effectively reduces model size and computational complexity while maintaining performance comparable to the original unpruned model even at extreme pruned levels.<n>Preliminary experimental results show that our method outperforms and be comparable to state-of-the-art methods across various pruning levels and different downstream computational linguistics tasks.
arXiv Detail & Related papers (2025-05-20T07:53:40Z)
AB-Cache: Training-Free Acceleration of Diffusion Models via Adams-Bashforth Cached Feature Reuse [19.13826316844611]
Diffusion models have demonstrated remarkable success in generative tasks, yet their iterative denoising process results in slow inference. We provide a theoretical understanding by analyzing the denoising process through the second-order Adams-Bashforth method. We propose a novel caching-based acceleration approach for diffusion models, instead of directly reusing cached results.
arXiv Detail & Related papers (2025-04-13T08:29:58Z)
Joint Transmit and Pinching Beamforming for Pinching Antenna Systems (PASS): Optimization-Based or Learning-Based? [89.05848771674773]
A novel antenna system ()-enabled downlink multi-user multiple-input single-output (MISO) framework is proposed. It consists of multiple waveguides, which equip numerous low-cost antennas, named (PAs) The positions of PAs can be reconfigured to both spanning large-scale path and space.
arXiv Detail & Related papers (2025-02-12T18:54:10Z)
FTP: A Fine-grained Token-wise Pruner for Large Language Models via Token Routing [17.01412432658081]
Large language models (LLMs) have demonstrated superior performance across various tasks by adhering to scaling laws. We propose a fine-grained token-wise pruning approach for the LLMs, which presents a learnable router to adaptively identify the less important tokens. Our approach achieves state-of-the-art (SOTA) pruning results, surpassing other existing pruning methods.
arXiv Detail & Related papers (2024-12-16T07:09:46Z)
DPPA: Pruning Method for Large Language Model to Model Merging [39.13317231533299]
We introduce a dual-stage method termed Dynamic Pruning Partition Amplification (DPPA) to tackle the challenge of merging complex fine-tuned models. We show that our method maintains a mere 20% of domain-specific parameters and yet delivers a performance comparable to other methodologies. Our method displays outstanding performance post-pruning, leading to a significant improvement of nearly 20% performance in model merging.
arXiv Detail & Related papers (2024-03-05T09:12:49Z)
Fine-Tuning Language Models with Advantage-Induced Policy Alignment [80.96507425217472]
We propose a novel algorithm for aligning large language models to human preferences. We show that it consistently outperforms PPO in language tasks by a large margin. We also provide a theoretical justification supporting the design of our loss function.
arXiv Detail & Related papers (2023-06-04T01:59:40Z)
Post-Processing Temporal Action Detection [134.26292288193298]
Temporal Action Detection (TAD) methods typically take a pre-processing step in converting an input varying-length video into a fixed-length snippet representation sequence. This pre-processing step would temporally downsample the video, reducing the inference resolution and hampering the detection performance in the original temporal resolution. We introduce a novel model-agnostic post-processing method without model redesign and retraining.
arXiv Detail & Related papers (2022-11-27T19:50:37Z)
An Accelerated Doubly Stochastic Gradient Method with Faster Explicit Model Identification [97.28167655721766]
We propose a novel doubly accelerated gradient descent (ADSGD) method for sparsity regularized loss minimization problems. We first prove that ADSGD can achieve a linear convergence rate and lower overall computational complexity.
arXiv Detail & Related papers (2022-08-11T22:27:22Z)
Plug-and-Play Few-shot Object Detection with Meta Strategy and Explicit Localization Inference [78.41932738265345]
This paper proposes a plug detector that can accurately detect the objects of novel categories without fine-tuning process. We introduce two explicit inferences into the localization process to reduce its dependence on annotated data. It shows a significant lead in both efficiency, precision, and recall under varied evaluation protocols.
arXiv Detail & Related papers (2021-10-26T03:09:57Z)
AgFlow: Fast Model Selection of Penalized PCA via Implicit Regularization Effects of Gradient Flow [64.81110234990888]
Principal component analysis (PCA) has been widely used as an effective technique for feature extraction and dimension reduction. In the High Dimension Low Sample Size (HDLSS) setting, one may prefer modified principal components, with penalized loadings. We propose Approximated Gradient Flow (AgFlow) as a fast model selection method for penalized PCA.
arXiv Detail & Related papers (2021-10-07T08:57:46Z)
The Effect of Multi-step Methods on Overestimation in Deep Reinforcement Learning [6.181642248900806]
Multi-step (also called n-step) methods in reinforcement learning have been shown to be more efficient than the 1-step method. We show that both MDDPG and MMDDPG are significantly less affected by the overestimation problem than DDPG with 1-step backup. We also discuss the advantages and disadvantages of different ways to do multi-step expansion in order to reduce approximation error.
arXiv Detail & Related papers (2020-06-23T01:35:54Z)
BERT Loses Patience: Fast and Robust Inference with Early Exit [91.26199404912019]
We propose Patience-based Early Exit as a plug-and-play technique to improve the efficiency and robustness of a pretrained language model. Our approach improves inference efficiency as it allows the model to make a prediction with fewer layers.
arXiv Detail & Related papers (2020-06-07T13:38:32Z)

This list is automatically generated from the titles and abstracts of the papers in this site.