Related papers: ProxySPEX: Inference-Efficient Interpretability via Sparse Feature Interactions in LLMs

URL: http://arxiv.org/abs/2505.17495v1
Date: Fri, 23 May 2025 05:44:01 GMT
Title: ProxySPEX: Inference-Efficient Interpretability via Sparse Feature Interactions in LLMs
Authors: Landon Butler, Abhineet Agarwal, Justin Singh Kang, Yigit Efe Erginbas, Bin Yu, Kannan Ramchandran
Abstract summary: Large Language Models (LLMs) have achieved remarkable performance by capturing complex interactions between input features. To identify these interactions, most existing approaches require enumerating all possible combinations of features up to a given order. We propose ProxySPEX, an interaction attribution algorithm that fits gradient boosted trees to masked outputs and then extracts the important interactions.
Score: 14.222006330730311
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) have achieved remarkable performance by capturing complex interactions between input features. To identify these interactions, most existing approaches require enumerating all possible combinations of features up to a given order, causing them to scale poorly with the number of inputs $n$. Recently, Kang et al. (2025) proposed SPEX, an information-theoretic approach that uses interaction sparsity to scale to $n \approx 10^3$ features. SPEX greatly improves upon prior methods but requires tens of thousands of model inferences, which can be prohibitive for large models. In this paper, we observe that LLM feature interactions are often hierarchical -- higher-order interactions are accompanied by their lower-order subsets -- which enables more efficient discovery. To exploit this hierarchy, we propose ProxySPEX, an interaction attribution algorithm that first fits gradient boosted trees to masked LLM outputs and then extracts the important interactions. Experiments across four challenging high-dimensional datasets show that ProxySPEX more faithfully reconstructs LLM outputs by 20% over marginal attribution approaches while using $10\times$ fewer inferences than SPEX. By accounting for interactions, ProxySPEX identifies features that influence model output over 20% more than those selected by marginal approaches. Further, we apply ProxySPEX to two interpretability tasks: data attribution, where we identify interactions among CIFAR-10 training samples that influence test predictions, and mechanistic interpretability, where we uncover interactions between attention heads, both within and across layers, on a question-answering task. ProxySPEX identifies interactions that enable more aggressive pruning of heads than marginal approaches.
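
Illustrative sketch (not from the paper): the two-stage idea in the abstract, fitting a boosted-tree proxy to masked outputs and then reading interactions off the proxy, can be mimicked on a toy problem. Everything below is an assumption made for illustration: `toy_model` is a hypothetical stand-in for an LLM scored on a masked input, the proxy is a hand-rolled boosting loop over depth-2 symmetric trees rather than a production gradient boosting library, all 16 masks are enumerated instead of sampled, and interactions are extracted by simple inclusion-exclusion on the proxy rather than by the authors' extraction procedure.

```python
import itertools

def toy_model(mask):
    """Hypothetical stand-in for an LLM output on a masked input.
    Features 0 and 1 interact; feature 2 has a purely marginal effect."""
    return 2.0 * mask[0] * mask[1] + 1.0 * mask[2]

def fit_pair_tree(X, resid, n):
    """Fit one depth-2 symmetric tree: choose the feature pair whose
    four cells (00, 01, 10, 11) best explain the residuals."""
    best = None
    for j, k in itertools.combinations(range(n), 2):
        cells = {}
        for x, r in zip(X, resid):
            cells.setdefault((x[j], x[k]), []).append(r)
        means = {c: sum(v) / len(v) for c, v in cells.items()}
        cost = sum((r - means[(x[j], x[k])]) ** 2 for x, r in zip(X, resid))
        if best is None or cost < best[0]:
            best = (cost, j, k, means)
    return best[1:]

def fit_proxy(X, y, n, n_trees=100, lr=0.5):
    """Least-squares boosting: each tree is fit to the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    trees = []
    for _ in range(n_trees):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        j, k, means = fit_pair_tree(X, resid, n)
        trees.append((j, k, means))
        pred = [p + lr * means[(x[j], x[k])] for p, x in zip(pred, X)]
    return base, lr, trees

def proxy_predict(model, x):
    """Evaluate the fitted proxy on a mask (no call to toy_model)."""
    base, lr, trees = model
    return base + lr * sum(m.get((x[j], x[k]), 0.0) for j, k, m in trees)

def pairwise_interactions(model, n):
    """Inclusion-exclusion on the proxy: g({i,j}) - g({i}) - g({j}) + g({})."""
    def g(on):
        return proxy_predict(model, tuple(1 if i in on else 0 for i in range(n)))
    return {(i, j): g({i, j}) - g({i}) - g({j}) + g(set())
            for i, j in itertools.combinations(range(n), 2)}

n = 4
X = list(itertools.product((0, 1), repeat=n))  # every mask; real use would sample
y = [toy_model(x) for x in X]                  # the only "model inferences"
model = fit_proxy(X, y, n)
scores = pairwise_interactions(model, n)
top_pair = max(scores, key=lambda p: abs(scores[p]))
print(top_pair, round(scores[top_pair], 3))
```

Once the proxy is fit, every interaction query goes to the proxy rather than to the underlying model. In this toy setting the pair (0, 1) should dominate the scores, since it is the only true interaction in `toy_model`.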

Related papers

SPEX: Scaling Feature Interaction Explanations for LLMs [22.651273612351346]
Spectral Explainer (SPEX) is a model-agnostic interaction attribution algorithm that efficiently scales to large input lengths. For large inputs, SPEX outperforms marginal attribution methods by up to 20%. For one of our datasets, HotpotQA, SPEX provides interactions that align with human annotations.
arXiv Detail & Related papers (2025-02-19T16:49:55Z)
iLOCO: Distribution-Free Inference for Feature Interactions [4.56754610152086]
We develop a new model-agnostic metric for measuring the importance of pairwise feature interactions. We also introduce an ensemble learning method for calculating the iLOCO metric and confidence intervals. We validate our iLOCO metric and our confidence intervals on both synthetic and real data sets.
arXiv Detail & Related papers (2025-02-10T16:49:46Z)
MMoE: Enhancing Multimodal Models with Mixtures of Multimodal Interaction Experts [92.76662894585809]
We introduce an approach to enhance multimodal models, which we call Multimodal Mixtures of Experts (MMoE). MMoE can be applied to various types of models to improve their performance.
arXiv Detail & Related papers (2023-11-16T05:31:21Z)
AgentCF: Collaborative Learning with Autonomous Language Agents for Recommender Systems [112.76941157194544]
We propose AgentCF for simulating user-item interactions in recommender systems through agent-based collaborative filtering. We creatively consider not only users but also items as agents, and develop a collaborative learning approach that optimizes both kinds of agents together. Overall, the optimized agents exhibit diverse interaction behaviors within our framework, including user-item, user-user, item-item, and collective interactions.
arXiv Detail & Related papers (2023-10-13T16:37:14Z)
Self-Supervised Neuron Segmentation with Multi-Agent Reinforcement Learning [53.00683059396803]
Masked image modeling (MIM) has been widely used due to its simplicity and effectiveness in recovering original information from masked images. We propose a decision-based MIM that utilizes reinforcement learning (RL) to automatically search for the optimal image masking ratio and masking strategy. Our approach has a significant advantage over alternative self-supervised methods on the task of neuron segmentation.
arXiv Detail & Related papers (2023-10-06T10:40:46Z)
Boundary-aware Supervoxel-level Iteratively Refined Interactive 3D Image Segmentation with Multi-agent Reinforcement Learning [33.181732857907384]
We propose to model interactive image segmentation with a Markov decision process (MDP) and solve it with reinforcement learning (RL). Considering the large exploration space for voxel-wise prediction, multi-agent reinforcement learning is adopted, where the voxel-level policy is shared among agents. Experimental results on four benchmark datasets show that the proposed method significantly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-03-19T15:52:56Z)
IPCC-TP: Utilizing Incremental Pearson Correlation Coefficient for Joint Multi-Agent Trajectory Prediction [73.25645602768158]
IPCC-TP is a novel relevance-aware module based on Incremental Pearson Correlation Coefficient to improve multi-agent interaction modeling. Our module can be conveniently embedded into existing multi-agent prediction methods to extend original motion distribution decoders.
arXiv Detail & Related papers (2023-03-01T15:16:56Z)
Detecting Arbitrary Order Beneficial Feature Interactions for Recommender Systems [15.824220659063046]
HIRS is the first work that directly generates beneficial feature interactions of arbitrary orders. We exploit three properties of beneficial feature interactions, and propose deep-infomax-based methods to guide the interaction generation. Our experimental results show that HIRS outperforms state-of-the-art algorithms by up to 5% in terms of recommendation accuracy.
arXiv Detail & Related papers (2022-06-28T05:27:45Z)
Unlimited Neighborhood Interaction for Heterogeneous Trajectory Prediction [97.40338982628094]
We propose a simple yet effective Unlimited Neighborhood Interaction Network (UNIN), which predicts trajectories of heterogeneous agents in multiple categories. Specifically, the proposed unlimited neighborhood interaction module generates the fused features of all agents involved in an interaction simultaneously. A hierarchical graph attention module is proposed to obtain category-to-category interaction and agent-to-agent interaction.
arXiv Detail & Related papers (2021-07-31T13:36:04Z)
Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning [92.05556163518999]
MARL exacerbates matters by imposing various constraints on communication and observability. For value-based methods, it poses challenges in accurately representing the optimal value function. For policy gradient methods, it makes training the critic difficult and exacerbates the problem of the lagging critic. We show that from a learning theory perspective, both problems can be addressed by accurately representing the associated action-value function.
arXiv Detail & Related papers (2021-05-31T23:08:05Z)
Neural Graph Matching based Collaborative Filtering [13.086302251856756]
We identify two different types of attribute interactions, inner and cross interactions. Existing models do not distinguish these two types of attribute interactions. We propose a neural Graph Matching based Collaborative Filtering model (GMCF). Our model outperforms state-of-the-art models.
arXiv Detail & Related papers (2021-05-10T01:51:46Z)
Cascaded Human-Object Interaction Recognition [175.60439054047043]
We introduce a cascade architecture for a multi-stage, coarse-to-fine HOI understanding. At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network. With our carefully-designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding.
arXiv Detail & Related papers (2020-03-09T17:05:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.