Learning Unified User Quantized Tokenizers for User Representation
- URL: http://arxiv.org/abs/2508.00956v1
- Date: Fri, 01 Aug 2025 08:35:32 GMT
- Title: Learning Unified User Quantized Tokenizers for User Representation
- Authors: Chuan He, Yang Chen, Wuliang Huang, Tianyi Zheng, Jianhu Chen, Bin Dou, Yice Luo, Yun Zhu, Baokun Wang, Yongchao Liu, Xing Fu, Yu Cheng, Chuntao Hong, Weiqiang Wang, Xin-Wei Yao
- Score: 33.38662746945411
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-source user representation learning plays a critical role in enabling personalized services on web platforms (e.g., Alipay). While prior works have adopted late-fusion strategies to combine heterogeneous data sources, they suffer from three key limitations: lack of unified representation frameworks, scalability and storage issues in data compression, and inflexible cross-task generalization. To address these challenges, we propose U^2QT (Unified User Quantized Tokenizers), a novel framework that integrates cross-domain knowledge transfer with early fusion of heterogeneous domains. Our framework employs a two-stage architecture: first, a causal Q-Former projects domain-specific features into a shared causal representation space to preserve inter-modality dependencies; second, a multi-view RQ-VAE discretizes causal embeddings into compact tokens through shared and source-specific codebooks, enabling efficient storage while maintaining semantic coherence. Experimental results showcase U^2QT's advantages across diverse downstream tasks, outperforming task-specific baselines in future behavior prediction and recommendation tasks while achieving efficiency gains in storage and computation. The unified tokenization framework enables seamless integration with language models and supports industrial-scale applications.
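To make the two-stage design concrete, the following is a minimal sketch of residual quantization against a shared codebook plus a source-specific codebook, in the spirit of the multi-view RQ-VAE described in the abstract. All names, dimensions, and the two-level codebook split are illustrative assumptions; the paper's causal Q-Former, training losses, and actual codebook layout are not reproduced here.
```python
# Hypothetical sketch only: residual quantization with a shared codebook and
# per-source codebooks, loosely following the multi-view RQ-VAE idea above.
# Dimensions, codebook sizes, and the two-level split are assumptions.
import torch
import torch.nn as nn

class MultiViewResidualQuantizer(nn.Module):
    def __init__(self, dim=64, codebook_size=256, num_sources=3):
        super().__init__()
        self.shared = nn.Embedding(codebook_size, dim)  # shared across sources
        self.per_source = nn.ModuleList(
            [nn.Embedding(codebook_size, dim) for _ in range(num_sources)]
        )

    @staticmethod
    def _nearest(codebook, x):
        # Return the index and vector of the nearest codeword for each row of x.
        dists = torch.cdist(x, codebook.weight)  # (batch, codebook_size)
        idx = dists.argmin(dim=-1)               # (batch,)
        return idx, codebook(idx)

    def forward(self, z, source_id):
        # Level 1: quantize the embedding against the shared codebook.
        shared_idx, shared_q = self._nearest(self.shared, z)
        # Level 2: quantize the residual with the source-specific codebook.
        src_idx, src_q = self._nearest(self.per_source[source_id], z - shared_q)
        tokens = torch.stack([shared_idx, src_idx], dim=-1)  # compact token pair
        return tokens, shared_q + src_q                      # tokens + reconstruction

# Usage: in the paper, a causal Q-Former would produce z; here z is random.
quantizer = MultiViewResidualQuantizer()
z = torch.randn(8, 64)                     # 8 users, 64-d causal embeddings
tokens, recon = quantizer(z, source_id=0)  # each user -> 2 discrete token ids
print(tokens.shape)                        # torch.Size([8, 2])
```
A trained RQ-VAE would additionally learn the codebooks end-to-end with reconstruction and commitment losses and a straight-through estimator; the sketch shows only the inference-time lookup that turns a continuous user embedding into compact tokens.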
Related papers
- Edge-Assisted Collaborative Fine-Tuning for Multi-User Personalized Artificial Intelligence Generated Content (AIGC) [38.59865959433328]
Cloud-based solutions aid in computation but often fall short in addressing privacy risks, personalization efficiency, and communication costs. We propose a novel cluster-aware hierarchical federated aggregation framework. We show that the framework achieves accelerated convergence while maintaining practical viability for scalable multi-user personalized AIGC services.
arXiv Detail & Related papers (2025-08-06T06:07:24Z)
- PanMatch: Unleashing the Potential of Large Vision Models for Unified Matching Models [80.65273820998875]
We present PanMatch, a versatile foundation model for robust correspondence matching. Our key insight is that any two-frame correspondence matching task can be addressed within a 2D displacement estimation framework. PanMatch achieves multi-task integration by endowing displacement estimation algorithms with unprecedented generalization capabilities.
arXiv Detail & Related papers (2025-07-11T08:18:52Z)
- MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents [84.62985963113245]
We introduce MEM1, an end-to-end reinforcement learning framework that enables agents to operate with constant memory across long multi-turn tasks. At each turn, MEM1 updates a compact shared internal state that jointly supports memory consolidation and reasoning. We show that MEM1-7B improves performance by 3.5x while reducing memory usage by 3.7x compared to Qwen2.5-14B-Instruct on a 16-objective multi-hop QA task.
arXiv Detail & Related papers (2025-06-18T19:44:46Z)
- Token Communication-Driven Multimodal Large Models in Resource-Constrained Multiuser Networks [7.137830911253685]
Multimodal large models pose challenges for deploying intelligent applications at the wireless edge, where constraints manifest as limited bandwidth, computational capacity, and stringent latency requirements. We propose a token communication paradigm that facilitates decentralized deployment across user devices and edge infrastructure.
arXiv Detail & Related papers (2025-05-06T14:17:05Z)
- Federated Fine-Tuning of LLMs: Framework Comparison and Research Directions [59.5243730853157]
Federated learning (FL) provides a privacy-preserving solution for fine-tuning pre-trained large language models (LLMs) using distributed private datasets. This article conducts a comparative analysis of three advanced federated LLM (FedLLM) frameworks that integrate knowledge distillation (KD) and split learning (SL) to mitigate these issues.
arXiv Detail & Related papers (2025-01-08T11:37:06Z)
- Learning Multi-Aspect Item Palette: A Semantic Tokenization Framework for Generative Recommendation [55.99632509895994]
We introduce LAMIA, a novel approach for multi-aspect semantic tokenization. Unlike RQ-VAE, which uses a single embedding, LAMIA learns an "item palette": a collection of independent and semantically parallel embeddings (see the sketch after this list). Our results demonstrate significant improvements in recommendation accuracy over existing methods.
arXiv Detail & Related papers (2024-09-11T13:49:48Z)
- SPAN: Unlocking Pyramid Representations for Gigapixel Histopathological Images [8.026588319629528]
Whole slide images (WSIs) present fundamental computational challenges due to their gigapixel-scale resolutions and sparse, irregularly distributed informative regions. We propose a novel sparse-native computational framework that preserves exact spatial relationships. We develop Sparse Pyramid Attention Networks (SPAN), incorporating a hierarchical sparse pyramid attention architecture with shifted windows.
arXiv Detail & Related papers (2024-06-13T17:14:30Z)
- Scalable Federated Unlearning via Isolated and Coded Sharding [76.12847512410767]
Federated unlearning has emerged as a promising paradigm for erasing the effect of client-level data.
This paper proposes a scalable federated unlearning framework based on isolated sharding and coded computing.
arXiv Detail & Related papers (2024-01-29T08:41:45Z)
- A Unified One-Step Solution for Aspect Sentiment Quad Prediction [3.428123050377681]
Aspect sentiment quad prediction (ASQP) is a challenging yet significant subtask in aspect-based sentiment analysis.
We release two new datasets for ASQP, which contain the following characteristics: larger size, more words per sample, and higher density.
We propose a unified one-step solution for ASQP, namely One-ASQP, to detect the aspect categories and to identify the aspect-opinion-sentiment triplets simultaneously.
arXiv Detail & Related papers (2023-06-07T05:00:01Z)
- X2Parser: Cross-Lingual and Cross-Domain Framework for Task-Oriented Compositional Semantic Parsing [51.81533991497547]
Task-oriented compositional semantic parsing (TCSP) handles complex nested user queries.
We present X2Parser, a transferable Cross-lingual and Cross-domain Parser for TCSP.
We propose to predict flattened intents and slots representations separately and cast both prediction tasks into sequence labeling problems.
arXiv Detail & Related papers (2021-06-07T16:40:05Z)
- COBRA: Contrastive Bi-Modal Representation Algorithm [43.33840912256077]
We present a novel framework that aims to train two modalities in a joint fashion inspired by Contrastive Predictive Coding (CPC) and Noise Contrastive Estimation (NCE) paradigms.
We empirically show that this framework reduces the modality gap significantly and generates a robust and task agnostic joint-embedding space.
We outperform existing work on four diverse downstream tasks spanning seven benchmark cross-modal datasets.
arXiv Detail & Related papers (2020-05-07T18:20:12Z)
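As referenced in the LAMIA entry above, here is a toy sketch contrasting a single quantized item embedding (RQ-VAE style) with an "item palette" of independent per-aspect embeddings. Aspect count, dimensions, and names are illustrative assumptions, not LAMIA's actual design.
```python
# Hypothetical contrast: one embedding per item vs. a palette of independent,
# semantically parallel per-aspect embeddings. All sizes are assumptions.
import torch

dim, num_aspects = 64, 4

single_embedding = torch.randn(dim)      # RQ-VAE style: one vector per item
palette = torch.randn(num_aspects, dim)  # palette style: one vector per aspect

# Each palette row can be quantized independently, so tokens for one aspect
# (e.g., category) can change without perturbing tokens for the others.
print(single_embedding.shape, palette.shape)  # torch.Size([64]) torch.Size([4, 64])
```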
This list is automatically generated from the titles and abstracts of the papers on this site.