Behavior Tokens Speak Louder: Disentangled Explainable Recommendation with Behavior Vocabulary
- URL: http://arxiv.org/abs/2512.15614v1
- Date: Wed, 17 Dec 2025 17:24:24 GMT
- Title: Behavior Tokens Speak Louder: Disentangled Explainable Recommendation with Behavior Vocabulary
- Authors: Xinshun Feng, Mingzhe Liu, Yi Qiao, Tongyu Zhu, Leilei Sun, Shuai Wang,
- Abstract summary: BEAT is a framework that tokenizes user and item behaviors into discrete, interpretable sequences. We show that BEAT improves zero-shot recommendation performance while generating coherent and informative explanations.
- Score: 22.925582428795437
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in explainable recommendations have explored the integration of language models to analyze natural language rationales for user-item interactions. Despite their potential, existing methods often rely on ID-based representations that obscure semantic meaning and impose structural constraints on language models, thereby limiting their applicability in open-ended scenarios. These challenges are intensified by the complex nature of real-world interactions, where diverse user intents are entangled and collaborative signals rarely align with linguistic semantics. To overcome these limitations, we propose BEAT, a unified and transferable framework that tokenizes user and item behaviors into discrete, interpretable sequences. We construct a behavior vocabulary via a vector-quantized autoencoding process that disentangles macro-level interests and micro-level intentions from graph-based representations. We then introduce multi-level semantic supervision to bridge the gap between behavioral signals and language space. A semantic alignment regularization mechanism is designed to embed behavior tokens directly into the input space of frozen language models. Experiments on three public datasets show that BEAT improves zero-shot recommendation performance while generating coherent and informative explanations. Further analysis demonstrates that our behavior tokens capture fine-grained semantics and offer a plug-and-play interface for integrating complex behavior patterns into large language models.
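The abstract's core mechanism is tokenizing continuous behavior embeddings into entries of a discrete "behavior vocabulary" via vector quantization. As a rough illustration only (a sketch with hypothetical sizes and a random codebook, not the authors' implementation), nearest-neighbor codebook lookup can be written as:

```python
import numpy as np

# Hedged sketch of vector-quantized tokenization: a continuous
# (e.g., graph-based) behavior embedding is mapped to the index of
# its nearest codebook vector; that index is the "behavior token".
rng = np.random.default_rng(0)

vocab_size, dim = 8, 4                        # hypothetical sizes
codebook = rng.normal(size=(vocab_size, dim))  # behavior vocabulary

def tokenize(z: np.ndarray) -> int:
    """Return the index of the nearest codebook vector."""
    dists = np.linalg.norm(codebook - z, axis=1)
    return int(np.argmin(dists))

z = rng.normal(size=dim)        # a continuous behavior embedding
token = tokenize(z)             # discrete, interpretable token id
quantized = codebook[token]     # vector that stands in for z downstream
```

In the paper's setting, such token embeddings are additionally aligned with the input space of a frozen language model; this sketch covers only the quantization step.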
Related papers
- SignVLA: A Gloss-Free Vision-Language-Action Framework for Real-Time Sign Language-Guided Robotic Manipulation [1.4175612723267692]
We present the first sign language-driven Vision-Language-Action (VLA) framework for intuitive human-robot interaction. Unlike conventional approaches that rely on gloss annotations as intermediate supervision, the proposed system adopts a gloss-free paradigm. We focus on a real-time alphabet-level finger-spelling interface that provides a robust and low-latency communication channel for robotic control.
arXiv Detail & Related papers (2026-02-26T01:16:27Z)
- Stable Language Guidance for Vision-Language-Action Models [62.80963701282789]
Residual Semantic Steering is a probabilistic framework that disentangles physical affordance from semantic execution. RSS achieves state-of-the-art robustness, maintaining performance even under adversarial linguistic perturbations.
arXiv Detail & Related papers (2026-01-07T16:16:10Z)
- Language-Guided Grasp Detection with Coarse-to-Fine Learning for Robotic Manipulation [31.386822229629455]
We propose Language-Guided Grasp Detection (LGGD) with a coarse-to-fine learning paradigm for robotic manipulation. This design enables fine-grained visual-semantic alignment and improves the feasibility of the predicted grasps with respect to task instructions. Experiments on the OCID-VLG and Grasp-Anything++ datasets show that LGGD surpasses existing language-guided grasping methods.
arXiv Detail & Related papers (2025-12-24T09:16:42Z)
- Temporal Sparse Autoencoders: Leveraging the Sequential Nature of Language for Interpretability [31.30541946703775]
Translating internal representations and computations of models into concepts that humans can understand is a key goal of interpretability. Recent dictionary learning methods such as Sparse Autoencoders provide a promising route to discover human-interpretable features. But they exhibit a bias towards shallow, token-specific, or noisy features, such as "the phrase 'The' at the start of sentences".
arXiv Detail & Related papers (2025-10-30T17:59:30Z)
- SimStep: Chain-of-Abstractions for Incremental Specification and Debugging of AI-Generated Interactive Simulations [16.00479720281197]
Chain-of-Abstractions (CoA) is a way to recover programming's core affordances. CoA decomposes the synthesis process into a sequence of cognitively meaningful, task-aligned representations. SimStep is an authoring environment for teachers that scaffolds simulation creation through four intermediate abstractions.
arXiv Detail & Related papers (2025-07-13T14:54:17Z)
- CodeDiffuser: Attention-Enhanced Diffusion Policy via VLM-Generated Code for Instruction Ambiguity [23.77040677368575]
We introduce a novel robotic manipulation framework that can accomplish tasks specified by potentially ambiguous natural language. This framework employs a Vision-Language Model (VLM) to interpret abstract concepts in natural language instructions. We show that our approach excels across challenging manipulation tasks involving language ambiguity, contact-rich manipulation, and multi-object interactions.
arXiv Detail & Related papers (2025-06-19T23:42:03Z)
- Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models [55.19497659895122]
We introduce methods for discovering and applying sparse feature circuits. These are causally implicated subnetworks of human-interpretable features for explaining language model behaviors.
arXiv Detail & Related papers (2024-03-28T17:56:07Z)
- Understanding Before Recommendation: Semantic Aspect-Aware Review Exploitation via Large Language Models [53.337728969143086]
Recommendation systems harness user-item interactions like clicks and reviews to learn their representations.
Previous studies improve recommendation accuracy and interpretability by modeling user preferences across various aspects and intents.
We introduce a chain-based prompting approach to uncover semantic aspect-aware interactions.
arXiv Detail & Related papers (2023-12-26T15:44:09Z)
- Neural Abstructions: Abstractions that Support Construction for Grounded Language Learning [69.1137074774244]
Leveraging language interactions effectively requires addressing limitations in the two most common approaches to language grounding.
We introduce the idea of neural abstructions: a set of constraints on the inference procedure of a label-conditioned generative model.
We show that with this method a user population is able to build a semantic modification for an open-ended house task in Minecraft.
arXiv Detail & Related papers (2021-07-20T07:01:15Z) - Unsupervised Speech Representation Learning for Behavior Modeling using
Triplet Enhanced Contextualized Networks [28.957236790411585]
We exploit the stationary properties of human behavior within an interaction and present a representation learning method to capture behavioral information from speech.
We present an encoder-decoder based Deep Contextualized Network (DCN) as well as a Triplet-Enhanced DCN (TE-DCN) framework to capture the behavioral context.
arXiv Detail & Related papers (2021-04-01T22:44:23Z)
- Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z)
- Inferring Temporal Compositions of Actions Using Probabilistic Automata [61.09176771931052]
We propose to express temporal compositions of actions as semantic regular expressions and derive an inference framework using probabilistic automata.
Our approach is different from existing works that either predict long-range complex activities as unordered sets of atomic actions, or retrieve videos using natural language sentences.
arXiv Detail & Related papers (2020-04-28T00:15:26Z)
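The last entry above scores temporal compositions of actions with a probabilistic automaton. As a toy illustration (a hypothetical two-state automaton with made-up probabilities, not the paper's framework), a forward pass over per-frame action probabilities for the composition "walk then sit" might look like:

```python
import numpy as np

# Hedged sketch: state 0 expects "walk", state 1 expects "sit"; at each
# frame the automaton stays in its state or advances to the next one.
probs = np.array([            # per-frame P(action) for [walk, sit]
    [0.9, 0.1],
    [0.8, 0.2],
    [0.2, 0.8],
    [0.1, 0.9],
])
stay, advance = 0.5, 0.5      # hypothetical transition probabilities

alpha = np.array([probs[0, 0], 0.0])   # forward variable at t = 0
for t in range(1, len(probs)):
    alpha = np.array([
        alpha[0] * stay * probs[t, 0],
        (alpha[0] * advance + alpha[1] * stay) * probs[t, 1],
    ])
score = alpha[1]   # probability of having completed "walk then sit"
```

Richer regular expressions (alternation, repetition) would compile to automata with more states, but the forward recursion has the same shape.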
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.