Sparse Autoencoders for Sequential Recommendation Models: Interpretation and Flexible Control
- URL: http://arxiv.org/abs/2507.12202v1
- Date: Wed, 16 Jul 2025 12:57:43 GMT
- Title: Sparse Autoencoders for Sequential Recommendation Models: Interpretation and Flexible Control
- Authors: Anton Klenitskiy, Konstantin Polev, Daria Denisova, Alexey Vasilev, Dmitry Simakov, Gleb Gusev
- Abstract summary: Sparse autoencoders (SAE) have been shown to be a promising unsupervised approach for extracting interpretable features from language models. We show that SAE can be successfully applied to a transformer trained on a sequential recommendation task. We demonstrate that the features learned by SAE can be used to effectively and flexibly control the model's behavior.
- Score: 5.045834768309381
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many current state-of-the-art models for sequential recommendations are based on transformer architectures. Interpretation and explanation of such black-box models is an important research question, as a better understanding of their internals can help understand, influence, and control their behavior, which is very important in a variety of real-world applications. Recently, sparse autoencoders (SAE) have been shown to be a promising unsupervised approach for extracting interpretable features from language models. These autoencoders learn to reconstruct hidden states of the transformer's internal layers from sparse linear combinations of directions in their activation space. This paper focuses on the application of SAE to the sequential recommendation domain. We show that this approach can be successfully applied to a transformer trained on a sequential recommendation task: the learned directions turn out to be more interpretable and monosemantic than the original hidden state dimensions. Moreover, we demonstrate that the features learned by SAE can be used to effectively and flexibly control the model's behavior, providing end-users with a straightforward method to adjust their recommendations to different custom scenarios and contexts.
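To make the idea concrete, below is a minimal sketch of a sparse autoencoder over transformer hidden states, together with a simple feature-steering helper in the spirit of the control mechanism described in the abstract. The dimensions, the L1 coefficient, and the function names are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal SAE sketch (assumed hyperparameters; not the paper's exact implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        # Encoder maps a hidden state to a wider, non-negative feature vector.
        self.encoder = nn.Linear(d_model, d_features)
        # Decoder rows act as the learned "directions" in activation space.
        self.decoder = nn.Linear(d_features, d_model, bias=True)

    def encode(self, h: torch.Tensor) -> torch.Tensor:
        return F.relu(self.encoder(h))

    def forward(self, h: torch.Tensor):
        f = self.encode(h)
        h_hat = self.decoder(f)
        return h_hat, f


def sae_loss(h, h_hat, f, l1_coef: float = 1e-3):
    # Reconstruction error plus an L1 sparsity penalty on feature activations.
    return F.mse_loss(h_hat, h) + l1_coef * f.abs().sum(dim=-1).mean()


def steer(sae: SparseAutoencoder, h: torch.Tensor, feature_idx: int, scale: float):
    # Amplify (scale > 1) or suppress (scale = 0) a single learned feature,
    # then decode back to a hidden state for the rest of the recommender.
    f = sae.encode(h)
    f[..., feature_idx] = f[..., feature_idx] * scale
    return sae.decoder(f)
```

In this sketch, the hidden states `h` would be collected from an internal layer of the sequential recommendation transformer over user interaction sequences, and `steer` illustrates how one interpretable feature might be clamped or amplified to adjust recommendations for a custom scenario.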
Related papers
- Recurrent Expansion: A Pathway Toward the Next Generation of Deep Learning [0.26107298043931204]
Recurrent Expansion (RE) is a new learning paradigm that advances beyond conventional Machine Learning (ML) and Deep Learning (DL). RE emphasizes multiple mappings of data through identical deep architectures and analyzes their internal representations (i.e., feature maps) in conjunction with observed performance signals such as loss. A scalable and adaptive variant, Sc-HMVRE, introduces selective mechanisms and scale diversity for real-world deployment.
arXiv Detail & Related papers (2025-07-04T19:26:48Z) - Neural Network Reprogrammability: A Unified Theme on Model Reprogramming, Prompt Tuning, and Prompt Instruction [55.914891182214475]
We introduce neural network reprogrammability as a unifying framework for model adaptation. We present a taxonomy that categorizes such information manipulation approaches across four key dimensions. We also analyze remaining technical challenges and ethical considerations.
arXiv Detail & Related papers (2025-06-05T05:42:27Z) - Interpret the Internal States of Recommendation Model with Sparse Autoencoder [28.234859617081295]
RecSAE is an automated and generalizable probing framework that interprets Recommenders with Sparse AutoEncoders. It extracts interpretable latents from the internal states of recommendation models and links them to semantic concepts for interpretation. RecSAE does not alter the original models during interpretation and also enables targeted de-biasing of models based on the interpreted results.
arXiv Detail & Related papers (2024-11-09T08:22:31Z) - Breaking Determinism: Fuzzy Modeling of Sequential Recommendation Using Discrete State Space Diffusion Model [66.91323540178739]
Sequential recommendation (SR) aims to predict items that users may be interested in based on their historical behavior.
We revisit SR from a novel information-theoretic perspective and find that sequential modeling methods fail to adequately capture randomness and unpredictability of user behavior.
Inspired by fuzzy information processing theory, this paper introduces the fuzzy sets of interaction sequences to overcome the limitations and better capture the evolution of users' real interests.
arXiv Detail & Related papers (2024-10-31T14:52:01Z) - Causality-Enhanced Behavior Sequence Modeling in LLMs for Personalized Recommendation [47.29682938439268]
We propose a novel Counterfactual Fine-Tuning (CFT) method to improve user preference modeling.
We employ counterfactual reasoning to identify the causal effects of behavior sequences on model output.
Experiments on real-world datasets demonstrate that CFT effectively improves behavior sequence modeling.
arXiv Detail & Related papers (2024-10-30T08:41:13Z) - Attention Mechanisms Don't Learn Additive Models: Rethinking Feature Importance for Transformers [12.986126243018452]
Transformers are structurally incapable of representing linear or additive surrogate models used for feature attribution. We introduce the Softmax-Linked Additive Log Odds Model (SLALOM), a novel surrogate model specifically designed to align with the transformer framework. We highlight SLALOM's unique efficiency-quality curve by showing that SLALOM can produce explanations with substantially higher fidelity than competing surrogate models.
arXiv Detail & Related papers (2024-05-22T11:14:00Z) - RecExplainer: Aligning Large Language Models for Explaining Recommendation Models [50.74181089742969]
Large language models (LLMs) have demonstrated remarkable intelligence in understanding, reasoning, and instruction following.
This paper presents the initial exploration of using LLMs as surrogate models to explain black-box recommender models.
To facilitate an effective alignment, we introduce three methods: behavior alignment, intention alignment, and hybrid alignment.
arXiv Detail & Related papers (2023-11-18T03:05:43Z) - Reformulating Sequential Recommendation: Learning Dynamic User Interest with Content-enriched Language Modeling [18.297332953450514]
We propose LANCER, which leverages the semantic understanding capabilities of pre-trained language models to generate personalized recommendations.
Our approach bridges the gap between language models and recommender systems, resulting in more human-like recommendations.
arXiv Detail & Related papers (2023-09-19T08:54:47Z) - MISSRec: Pre-training and Transferring Multi-modal Interest-aware Sequence Representation for Recommendation [61.45986275328629]
We propose MISSRec, a multi-modal pre-training and transfer learning framework for sequential recommendation.
On the user side, we design a Transformer-based encoder-decoder model, where the contextual encoder learns to capture the sequence-level multi-modal user interests.
On the candidate item side, we adopt a dynamic fusion module to produce user-adaptive item representation.
arXiv Detail & Related papers (2023-08-22T04:06:56Z) - Disentanglement via Latent Quantization [60.37109712033694]
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space.
We demonstrate the broad applicability of this approach by adding it to both basic data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
arXiv Detail & Related papers (2023-05-28T06:30:29Z) - Towards Universal Sequence Representation Learning for Recommender Systems [98.02154164251846]
We present a novel universal sequence representation learning approach, named UniSRec.
The proposed approach utilizes the associated description text of items to learn transferable representations across different recommendation scenarios.
Our approach can be effectively transferred to new recommendation domains or platforms in a parameter-efficient way.
arXiv Detail & Related papers (2022-06-13T07:21:56Z) - InteL-VAEs: Adding Inductive Biases to Variational Auto-Encoders via Intermediary Latents [60.785317191131284]
We introduce a simple and effective method for learning VAEs with controllable biases by using an intermediary set of latent variables.
In particular, it allows us to impose desired properties like sparsity or clustering on learned representations.
We show that this, in turn, allows InteL-VAEs to learn both better generative models and representations.
arXiv Detail & Related papers (2021-06-25T16:34:05Z)