Behavior Tokens Speak Louder: Disentangled Explainable Recommendation with Behavior Vocabulary
- URL: http://arxiv.org/abs/2512.15614v1
- Date: Wed, 17 Dec 2025 17:24:24 GMT
- Title: Behavior Tokens Speak Louder: Disentangled Explainable Recommendation with Behavior Vocabulary
- Authors: Xinshun Feng, Mingzhe Liu, Yi Qiao, Tongyu Zhu, Leilei Sun, Shuai Wang,
- Abstract summary: BEAT is a framework that tokenizes user and item behaviors into discrete, interpretable sequences. We show that BEAT improves zero-shot recommendation performance while generating coherent and informative explanations.
- Score: 22.925582428795437
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in explainable recommendations have explored the integration of language models to analyze natural language rationales for user-item interactions. Despite their potential, existing methods often rely on ID-based representations that obscure semantic meaning and impose structural constraints on language models, thereby limiting their applicability in open-ended scenarios. These challenges are intensified by the complex nature of real-world interactions, where diverse user intents are entangled and collaborative signals rarely align with linguistic semantics. To overcome these limitations, we propose BEAT, a unified and transferable framework that tokenizes user and item behaviors into discrete, interpretable sequences. We construct a behavior vocabulary via a vector-quantized autoencoding process that disentangles macro-level interests and micro-level intentions from graph-based representations. We then introduce multi-level semantic supervision to bridge the gap between behavioral signals and language space. A semantic alignment regularization mechanism is designed to embed behavior tokens directly into the input space of frozen language models. Experiments on three public datasets show that BEAT improves zero-shot recommendation performance while generating coherent and informative explanations. Further analysis demonstrates that our behavior tokens capture fine-grained semantics and offer a plug-and-play interface for integrating complex behavior patterns into large language models.
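The abstract's core mechanism is tokenizing continuous behavior embeddings into entries of a discrete "behavior vocabulary" via vector quantization. As a rough illustration only (a sketch with hypothetical sizes and a random codebook, not the authors' implementation), nearest-neighbor codebook lookup can be written as:

```python
import numpy as np

# Hedged sketch of vector-quantized tokenization: a continuous
# (e.g., graph-based) behavior embedding is mapped to the index of
# its nearest codebook vector; that index is the "behavior token".
rng = np.random.default_rng(0)

vocab_size, dim = 8, 4                        # hypothetical sizes
codebook = rng.normal(size=(vocab_size, dim))  # behavior vocabulary

def tokenize(z: np.ndarray) -> int:
    """Return the index of the nearest codebook vector."""
    dists = np.linalg.norm(codebook - z, axis=1)
    return int(np.argmin(dists))

z = rng.normal(size=dim)        # a continuous behavior embedding
token = tokenize(z)             # discrete, interpretable token id
quantized = codebook[token]     # vector that stands in for z downstream
```

In the paper's setting, such token embeddings are additionally aligned with the input space of a frozen language model; this sketch covers only the quantization step.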
Related papers
- SignVLA: A Gloss-Free Vision-Language-Action Framework for Real-Time Sign Language-Guided Robotic Manipulation [1.4175612723267692]
We present the first sign language-driven Vision-Language-Action (VLA) framework for intuitive human-robot interaction. Unlike conventional approaches that rely on gloss annotations as intermediate supervision, the proposed system adopts a gloss-free paradigm. We focus on a real-time alphabet-level finger-spelling interface that provides a robust and low-latency communication channel for robotic control.
arXiv Detail & Related papers (2026-02-26T01:16:27Z)
- Stable Language Guidance for Vision-Language-Action Models [62.80963701282789]
Residual Semantic Steering is a probabilistic framework that disentangles physical affordance from semantic execution. RSS achieves state-of-the-art robustness, maintaining performance even under adversarial linguistic perturbations.
arXiv Detail & Related papers (2026-01-07T16:16:10Z)
- Language-Guided Grasp Detection with Coarse-to-Fine Learning for Robotic Manipulation [31.386822229629455]
We propose Language-Guided Grasp Detection (LGGD) with a coarse-to-fine learning paradigm for robotic manipulation. This design enables fine-grained visual-semantic alignment and improves the feasibility of the predicted grasps with respect to task instructions. Experiments on the OCID-VLG and Grasp-Anything++ datasets show that LGGD surpasses existing language-guided grasping methods.
arXiv Detail & Related papers (2025-12-24T09:16:42Z)
- Temporal Sparse Autoencoders: Leveraging the Sequential Nature of Language for Interpretability [31.30541946703775]
Translating internal representations and computations of models into concepts that humans can understand is a key goal of interpretability. Recent dictionary learning methods such as Sparse Autoencoders provide a promising route to discover human-interpretable features. But they exhibit a bias towards shallow, token-specific, or noisy features, such as "the phrase 'The' at the start of sentences".
arXiv Detail & Related papers (2025-10-30T17:59:30Z)
- SimStep: Chain-of-Abstractions for Incremental Specification and Debugging of AI-Generated Interactive Simulations [16.00479720281197]
Chain-of-Abstractions (CoA) is a way to recover programming's core affordances. CoA decomposes the synthesis process into a sequence of cognitively meaningful, task-aligned representations. SimStep is an authoring environment for teachers that scaffolds simulation creation through four intermediate abstractions.
arXiv Detail & Related papers (2025-07-13T14:54:17Z)
- CodeDiffuser: Attention-Enhanced Diffusion Policy via VLM-Generated Code for Instruction Ambiguity [23.77040677368575]
We introduce a novel robotic manipulation framework that can accomplish tasks specified by potentially ambiguous natural language. This framework employs a Vision-Language Model (VLM) to interpret abstract concepts in natural language instructions. We show that our approach excels across challenging manipulation tasks involving language ambiguity, contact-rich manipulation, and multi-object interactions.
arXiv Detail & Related papers (2025-06-19T23:42:03Z)
- Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models [55.19497659895122]
We introduce methods for discovering and applying sparse feature circuits. These are causally implicated subnetworks of human-interpretable features for explaining language model behaviors.
arXiv Detail & Related papers (2024-03-28T17:56:07Z)
- Understanding Before Recommendation: Semantic Aspect-Aware Review Exploitation via Large Language Models [53.337728969143086]
Recommendation systems harness user-item interactions like clicks and reviews to learn their representations.
Previous studies improve recommendation accuracy and interpretability by modeling user preferences across various aspects and intents.
We introduce a chain-based prompting approach to uncover semantic aspect-aware interactions.
arXiv Detail & Related papers (2023-12-26T15:44:09Z)
- Neural Abstructions: Abstractions that Support Construction for Grounded Language Learning [69.1137074774244]
Leveraging language interactions effectively requires addressing limitations in the two most common approaches to language grounding.
We introduce the idea of neural abstructions: a set of constraints on the inference procedure of a label-conditioned generative model.
We show that with this method a user population is able to build a semantic modification for an open-ended house task in Minecraft.
arXiv Detail & Related papers (2021-07-20T07:01:15Z) - Unsupervised Speech Representation Learning for Behavior Modeling using
Triplet Enhanced Contextualized Networks [28.957236790411585]
We exploit the stationary properties of human behavior within an interaction and present a representation learning method to capture behavioral information from speech.
We present an encoder-decoder based Deep Contextualized Network (DCN) as well as a Triplet-Enhanced DCN (TE-DCN) framework to capture the behavioral context.
arXiv Detail & Related papers (2021-04-01T22:44:23Z)
- Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z)
- Inferring Temporal Compositions of Actions Using Probabilistic Automata [61.09176771931052]
We propose to express temporal compositions of actions as semantic regular expressions and derive an inference framework using probabilistic automata.
Our approach is different from existing works that either predict long-range complex activities as unordered sets of atomic actions, or retrieve videos using natural language sentences.
arXiv Detail & Related papers (2020-04-28T00:15:26Z)
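The last entry above scores temporal compositions of actions with a probabilistic automaton. As a toy illustration (a hypothetical two-state automaton with made-up probabilities, not the paper's framework), a forward pass over per-frame action probabilities for the composition "walk then sit" might look like:

```python
import numpy as np

# Hedged sketch: state 0 expects "walk", state 1 expects "sit"; at each
# frame the automaton stays in its state or advances to the next one.
probs = np.array([            # per-frame P(action) for [walk, sit]
    [0.9, 0.1],
    [0.8, 0.2],
    [0.2, 0.8],
    [0.1, 0.9],
])
stay, advance = 0.5, 0.5      # hypothetical transition probabilities

alpha = np.array([probs[0, 0], 0.0])   # forward variable at t = 0
for t in range(1, len(probs)):
    alpha = np.array([
        alpha[0] * stay * probs[t, 0],
        (alpha[0] * advance + alpha[1] * stay) * probs[t, 1],
    ])
score = alpha[1]   # probability of having completed "walk then sit"
```

Richer regular expressions (alternation, repetition) would compile to automata with more states, but the forward recursion has the same shape.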
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.