Query as Anchor: Scenario-Adaptive User Representation via Large Language Model
- URL: http://arxiv.org/abs/2602.14492v2
- Date: Tue, 17 Feb 2026 02:44:08 GMT
- Title: Query as Anchor: Scenario-Adaptive User Representation via Large Language Model
- Authors: Jiahao Yuan, Yike Xu, Jinyong Wen, Baokun Wang, Ziyi Gao, Xiaotong Lin, Yun Liu, Xing Fu, Yu Cheng, Yongchao Liu, Weiqiang Wang, Zhongle Xie
- Abstract summary: We propose Query-as-Anchor, a framework shifting user modeling from static encoding to dynamic, query-aware synthesis. We first construct UserU, an industrial-scale pre-training dataset that aligns behavioral sequences with user understanding semantics. We introduce Cluster-based Soft Prompt Tuning to enforce discriminative latent structures. For deployment, anchoring queries at sequence termini enables KV-cache-accelerated inference with negligible incremental latency.
- Score: 28.30329175937291
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Industrial-scale user representation learning requires balancing robust universality with acute task-sensitivity. However, existing paradigms primarily yield static, task-agnostic embeddings that struggle to reconcile the divergent requirements of downstream scenarios within unified vector spaces. Furthermore, heterogeneous multi-source data introduces inherent noise and modality conflicts, degrading representation quality. We propose Query-as-Anchor, a framework that shifts user modeling from static encoding to dynamic, query-aware synthesis. To empower Large Language Models (LLMs) with deep user understanding, we first construct UserU, an industrial-scale pre-training dataset that aligns multi-modal behavioral sequences with user understanding semantics, and our Q-Anchor Embedding architecture integrates hierarchical coarse-to-fine encoders into dual-tower LLMs via joint contrastive-autoregressive optimization for query-aware user representation. To bridge the gap between general pre-training and specialized business logic, we further introduce Cluster-based Soft Prompt Tuning to enforce discriminative latent structures, effectively aligning model attention with scenario-specific modalities. For deployment, anchoring queries at sequence termini enables KV-cache-accelerated inference with negligible incremental latency. Evaluations on 10 Alipay industrial benchmarks show consistent SOTA performance, strong scalability, and efficient deployment. Large-scale online A/B testing in Alipay's production system across two real-world scenarios further validates its practical effectiveness. Our code is prepared for public release and will be available at: https://github.com/JhCircle/Q-Anchor.
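The deployment claim above, that anchoring the scenario query at the end of the behavior sequence lets the long behavior prefix's KV cache be reused across queries, can be illustrated with a short sketch. The following is a minimal, hypothetical example and not the authors' released code: it uses an off-the-shelf causal LM ("gpt2" as a stand-in for the production LLM backbone), a made-up serialized behavior log, and simple mean pooling of the query suffix as the query-aware user embedding; the paper's dual-tower architecture, hierarchical coarse-to-fine encoders, and contrastive-autoregressive training are not reproduced.

```python
# Minimal sketch (assumptions, not the authors' implementation): reuse the KV cache of a
# long behavior prefix and recompute only the short scenario query appended at the end.
import copy
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")            # stand-in backbone, not the paper's LLM
model = AutoModel.from_pretrained("gpt2").eval()

# Hypothetical serialized behavior log and scenario queries.
behavior_text = "pay:coffee_shop view:travel_insurance transfer:rent pay:metro_card"
queries = ["Is this user likely to accept a credit offer?",
           "Does this user show interest in travel products?"]

with torch.no_grad():
    # 1) Encode the (long) behavior sequence once and keep its key/value cache.
    beh = tok(behavior_text, return_tensors="pt")
    cache = model(**beh, use_cache=True).past_key_values

    # 2) For each query anchored at the sequence terminus, reuse a copy of the cache,
    #    so the incremental cost is only the few query tokens.
    for q in queries:
        q_ids = tok(q, return_tensors="pt").input_ids
        full_mask = torch.ones(1, beh.input_ids.size(1) + q_ids.size(1), dtype=torch.long)
        out = model(input_ids=q_ids,
                    attention_mask=full_mask,
                    past_key_values=copy.deepcopy(cache),  # keep the original prefix cache intact
                    use_cache=True)
        user_vec = out.last_hidden_state.mean(dim=1)       # query-aware user embedding (pooled suffix)
        print(q, tuple(user_vec.shape))
```

In this toy setup the behavior prefix is processed once no matter how many scenario queries are issued, which is the source of the "negligible incremental latency" claimed for deployment.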
Related papers
- Operationalization of Machine Learning with Serverless Architecture: An Industrial Implementation for Harmonized System Code Prediction [0.0]
This paper presents a serverless MLOps framework orchestrating the complete ML lifecycle, from data ingestion, training, deployment, and monitoring to retraining, using event-driven pipelines and managed services. We demonstrate practical applicability through an industrial implementation for Harmonized System (HS) code prediction, a compliance-critical task where short, unstructured product descriptions are mapped to standardized codes used by customs authorities in global trade. Our solution couples a custom text embedding with multiple deep learning architectures, with Text-CNN achieving 98 percent accuracy on ground truth data.
arXiv Detail & Related papers (2026-02-19T05:59:55Z) - OFA-MAS: One-for-All Multi-Agent System Topology Design based on Mixture-of-Experts Graph Generative Models [57.94189874119267]
Multi-Agent Systems (MAS) offer a powerful paradigm for solving complex problems. Current graph learning-based design methodologies often adhere to a "one-for-one" paradigm. We propose OFA-TAD, a one-for-all framework that generates adaptive collaboration graphs for any task described in natural language.
arXiv Detail & Related papers (2026-01-19T12:23:44Z) - A Simple and Effective Framework for Symmetric Consistent Indexing in Large-Scale Dense Retrieval [11.72564658353791]
Dense retrieval has become the industry standard in large-scale information retrieval systems due to its high efficiency and competitive accuracy. The widely adopted dual-tower encoding architecture introduces inherent challenges, primarily representational space misalignment and retrieval index inconsistency. This paper proposes a simple and effective framework named SCI, comprising two synergistic modules. We provide theoretical guarantees for our approach, with its effectiveness validated by results across public datasets and real-world e-commerce datasets.
arXiv Detail & Related papers (2025-12-15T08:11:24Z) - Leveraging Generative Models for Real-Time Query-Driven Text Summarization in Large-Scale Web Search [54.987957691350665]
Query-Driven Text Summarization (QDTS) aims to generate concise and informative summaries from textual documents based on a given query. Traditional extractive summarization models, based primarily on ranking candidate summary segments, have been the dominant approach in industrial applications. We propose a novel framework to pioneer the application of generative models to real-time QDTS in industrial web search.
arXiv Detail & Related papers (2025-08-28T08:51:51Z) - GLiClass: Generalist Lightweight Model for Sequence Classification Tasks [49.2639069781367]
We propose GLiClass, a novel method that adapts the GLiNER architecture for sequence classification tasks. Our approach achieves strong accuracy and efficiency comparable to embedding-based methods, while maintaining the flexibility needed for zero-shot and few-shot learning scenarios.
arXiv Detail & Related papers (2025-08-11T06:22:25Z) - Learning Unified User Quantized Tokenizers for User Representation [32.942924291891636]
U2QT (Unified User Quantized Tokenizers) is a novel framework that integrates cross-domain knowledge transfer with early fusion of heterogeneous domains. Our framework employs a two-stage architecture: first, we use the Qwen3 Embedding model to derive a compact yet expressive feature representation. Second, a multi-view RQ-VAE discretizes causal embeddings into compact tokens through shared and source-specific codebooks (a minimal residual-quantization sketch is given after this list).
arXiv Detail & Related papers (2025-08-01T08:35:32Z) - Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition [95.54406667705999]
Pangu Embedded is an efficient Large Language Model (LLM) reasoner developed on Ascend Neural Processing Units (NPUs). It addresses the significant computational costs and inference latency challenges prevalent in existing reasoning-optimized LLMs. It delivers rapid responses and state-of-the-art reasoning quality within a single, unified model architecture.
arXiv Detail & Related papers (2025-05-28T14:03:02Z) - SpecRouter: Adaptive Routing for Multi-Level Speculative Decoding in Large Language Models [21.933379266533098]
Large Language Models (LLMs) present a critical trade-off between inference quality and computational cost. Existing serving strategies often employ fixed model scales or static two-stage speculative decoding. This paper introduces SpecRouter, a novel framework that reimagines LLM inference as an adaptive routing problem.
arXiv Detail & Related papers (2025-05-12T15:46:28Z) - LONGER: Scaling Up Long Sequence Modeling in Industrial Recommenders [23.70714095931094]
LONGER is a Long-sequence Optimized traNsformer for GPU-Efficient Recommenders. It consistently outperforms strong baselines in offline metrics and online A/B testing.
arXiv Detail & Related papers (2025-05-07T13:54:26Z) - COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement [80.18490952057125]
Iterative refinement has emerged as an effective paradigm for enhancing the capabilities of large language models (LLMs) on complex tasks.
We propose Context-Wise Order-Agnostic Language Modeling (COrAL) to overcome these challenges.
Our approach models multiple token dependencies within manageable context windows, enabling the model to perform iterative refinement internally.
arXiv Detail & Related papers (2024-10-12T23:56:19Z) - CART: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling [53.97609687516371]
Cross-modal retrieval aims to search for instances that are semantically related to the query through the interaction of different modal data. Traditional solutions utilize a single-tower or dual-tower framework to explicitly compute the score between queries and candidates. We propose a generative cross-modal retrieval framework (CART) based on coarse-to-fine semantic modeling.
arXiv Detail & Related papers (2024-06-25T12:47:04Z) - X2Parser: Cross-Lingual and Cross-Domain Framework for Task-Oriented Compositional Semantic Parsing [51.81533991497547]
Task-oriented compositional semantic parsing (TCSP) handles complex nested user queries.
We present X2Parser, a transferable Cross-lingual and Cross-domain Parser for TCSP.
We propose to predict flattened intents and slots representations separately and cast both prediction tasks into sequence labeling problems.
arXiv Detail & Related papers (2021-06-07T16:40:05Z)
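As referenced in the U2QT entry above, residual quantization over a shared codebook plus a source-specific codebook can be sketched in a few lines. The example below is hypothetical and not the U2QT implementation: the dimensions, codebook sizes, and source ids are made up, it uses plain nearest-neighbor lookup, and it omits the VAE training losses and the upstream Qwen3 embedding step.

```python
# Minimal sketch (hypothetical): two-level residual quantization where level 1 uses a
# codebook shared across data sources and level 2 uses a per-source codebook.
import torch

torch.manual_seed(0)
dim, n_codes, n_sources = 64, 256, 3
shared_codebook = torch.randn(n_codes, dim)              # level 1: shared across sources
source_codebooks = torch.randn(n_sources, n_codes, dim)  # level 2: one codebook per source

def nearest_code(x, codebook):
    """Return (index, code vector) of the nearest codebook entry for each row of x."""
    dists = torch.cdist(x, codebook)                      # (batch, n_codes)
    idx = dists.argmin(dim=-1)
    return idx, codebook[idx]

def quantize(user_emb, source_id):
    """Discretize continuous user embeddings into two token ids (shared, source-specific)."""
    idx1, code1 = nearest_code(user_emb, shared_codebook)
    residual = user_emb - code1                           # what the shared level failed to capture
    idx2, code2 = nearest_code(residual, source_codebooks[source_id])
    recon = code1 + code2                                 # approximate reconstruction
    return (idx1, idx2), recon

user_emb = torch.randn(4, dim)                            # e.g., embeddings from an upstream encoder
tokens, recon = quantize(user_emb, source_id=1)
print(tokens[0].tolist(), tokens[1].tolist(), float((user_emb - recon).pow(2).mean()))
```

The resulting (shared, source-specific) index pairs are the "compact tokens" in the sense of the blurb; a trained RQ-VAE would learn both codebooks jointly rather than using random ones as here.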
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.