Towards Better Query Classification with Multi-Expert Knowledge
Condensation in JD Ads Search
- URL: http://arxiv.org/abs/2308.01098v3
- Date: Sun, 19 Nov 2023 14:42:39 GMT
- Title: Towards Better Query Classification with Multi-Expert Knowledge
Condensation in JD Ads Search
- Authors: Kun-Peng Ning, Ming Pang, Zheng Fang, Xue Jiang, Xi-Wei Zhao,
Chang-Ping Peng, Zhan-Gang Lin, Jing-He Hu, Jing-Ping Shao
- Abstract summary: The shallow FastText model is widely used for efficient online inference.
BERT is an effective alternative, but it incurs higher online inference latency and more expensive computing costs.
We propose knowledge condensation to boost the classification performance of the online FastText model under strict low-latency constraints.
- Score: 12.701416688678622
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Search query classification, as an effective way to understand user intents,
is of great importance in real-world online ads systems. To ensure low latency, a shallow
model (e.g., FastText) is widely used for efficient online inference. However, the
representation ability of the FastText model is insufficient, resulting in poor
classification performance, especially on low-frequency queries and tailed categories.
Using a deeper and more complex model (e.g., BERT) is an effective solution, but it incurs
higher online inference latency and more expensive computing costs. Balancing inference
efficiency and classification performance is therefore of great practical importance. To
overcome this challenge, we propose knowledge condensation (KC), a simple yet effective
knowledge distillation framework that boosts the classification performance of the online
FastText model under strict low-latency constraints. Specifically, we train an offline
BERT model to retrieve more potentially relevant data. Benefiting from its powerful
semantic representation, relevant labels not exposed in the historical data are added to
the training set, yielding a better-trained FastText model. Moreover, we propose a novel
distribution-diverse multi-expert learning strategy to further improve the mining of
relevant data. By training multiple BERT models on different data distributions, each
expert performs better on high-, middle-, or low-frequency search queries, and the
ensemble over these distributions makes retrieval more powerful. We have deployed two
versions of this framework in JD search, and both offline experiments and online A/B
testing on multiple datasets have validated the effectiveness of the proposed approach.
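The retrieval-and-augmentation loop described in the abstract can be summarized in a short sketch. The helper names, the score threshold, and the use of simple score averaging across experts are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def condense_training_data(click_log, label_space, experts, threshold=0.5):
    """Label-augmentation step of knowledge condensation (KC), sketched.

    click_log   : iterable of (query, set_of_observed_labels) pairs
    label_space : list of all category labels
    experts     : offline BERT classifiers, one per query-frequency band
                  (high / middle / low), each exposing
                  predict_proba(query) -> per-label probability array
    threshold   : assumed score cut-off for mining extra labels
    """
    augmented = []
    for query, observed in click_log:
        # Ensemble the distribution-diverse experts by averaging their scores
        # (simple averaging is an assumption; the paper's rule may differ).
        scores = np.mean([e.predict_proba(query) for e in experts], axis=0)
        # Add relevant labels that were never exposed in the historical data.
        mined = {label for label, p in zip(label_space, scores)
                 if p >= threshold and label not in observed}
        augmented.append((query, set(observed) | mined))
    return augmented  # then used to train the online FastText classifier
```

The augmented query-label pairs would then feed standard FastText supervised training, leaving the online model's size and latency unchanged.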
Related papers
- Improving embedding with contrastive fine-tuning on small datasets with expert-augmented scores [12.86467344792873]
The proposed method uses soft labels derived from expert-augmented scores to fine-tune embedding models.
The paper evaluates the method using a Q&A dataset from an online shopping website and eight expert models.
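A minimal sketch of fine-tuning with expert-derived soft labels; the in-batch similarity matrix and the KL objective are assumptions, since the summary does not specify the loss:

```python
import torch
import torch.nn.functional as F

def soft_label_finetune_loss(query_emb, doc_emb, expert_scores, temperature=0.05):
    """Hedged sketch: align model similarities with expert-augmented soft labels.

    query_emb, doc_emb : (batch, dim) embeddings from the model being fine-tuned
    expert_scores      : (batch, batch) soft relevance scores aggregated from
                         the expert models (assumed non-negative)
    """
    sims = query_emb @ doc_emb.T / temperature                    # in-batch similarities
    target = expert_scores / (expert_scores.sum(dim=1, keepdim=True) + 1e-8)
    return F.kl_div(F.log_softmax(sims, dim=1), target, reduction="batchmean")
```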
arXiv Detail & Related papers (2024-08-19T01:59:25Z) - CLEFT: Language-Image Contrastive Learning with Efficient Large Language Model and Prompt Fine-Tuning [4.004641316826348]
We introduce a novel language-image Contrastive Learning method with an Efficient large language model and prompt Fine-Tuning (CLEFT).
Our method demonstrates state-of-the-art performance on multiple chest X-ray and mammography datasets.
The proposed parameter-efficient framework can reduce the total trainable model size by 39% and reduce the trainable language model to only 4% compared with the current BERT encoder.
arXiv Detail & Related papers (2024-07-30T17:57:32Z) - Deep Bag-of-Words Model: An Efficient and Interpretable Relevance Architecture for Chinese E-Commerce [31.076432176267335]
We propose the deep Bag-of-Words (DeepBoW) model, an efficient and interpretable relevance architecture for Chinese e-commerce.
Our approach encodes the query and the product into sparse BoW representations, each a set of word-weight pairs.
The relevance score is measured by accumulating over the words matched between the sparse BoW representations of the query and the product.
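A toy sketch of that matching rule; combining matched words by a sum of weight products is an assumption, since the summary only says the score accumulates over matched words:

```python
def deep_bow_relevance(query_bow, product_bow):
    """Sketch of sparse Bag-of-Words matching.

    query_bow, product_bow : dict mapping word -> learned weight (the sparse
    BoW representations produced by the encoder).
    """
    matched = query_bow.keys() & product_bow.keys()
    return sum(query_bow[w] * product_bow[w] for w in matched)

# Example: a toy query/product pair with hand-written weights.
q = {"red": 0.7, "dress": 1.2, "summer": 0.4}
p = {"dress": 1.0, "red": 0.9, "cotton": 0.3}
print(deep_bow_relevance(q, p))  # 0.7*0.9 + 1.2*1.0 = 1.83
```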
arXiv Detail & Related papers (2024-07-12T16:18:05Z) - CELA: Cost-Efficient Language Model Alignment for CTR Prediction [71.85120354973073]
Click-Through Rate (CTR) prediction holds a paramount position in recommender systems.
Recent efforts have sought to mitigate these challenges by integrating Pre-trained Language Models (PLMs).
We propose Cost-Efficient Language Model Alignment (CELA) for CTR prediction.
arXiv Detail & Related papers (2024-05-17T07:43:25Z) - Contrastive Transformer Learning with Proximity Data Generation for
Text-Based Person Search [60.626459715780605]
Given a descriptive text query, text-based person search aims to retrieve the best-matched target person from an image gallery.
Such a cross-modal retrieval task is quite challenging due to the significant modality gap, fine-grained differences, and the insufficiency of annotated data.
In this paper, we propose a simple yet effective dual Transformer model for text-based person search.
arXiv Detail & Related papers (2023-11-15T16:26:49Z) - Don't Be So Sure! Boosting ASR Decoding via Confidence Relaxation [7.056222499095849]
Beam search seeks the transcript with the greatest likelihood computed using the predicted distribution.
We show that recently proposed Self-Supervised Learning (SSL)-based ASR models tend to yield exceptionally confident predictions.
We propose a decoding procedure that improves the performance of fine-tuned ASR models.
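As a heavily hedged illustration, one simple way to relax over-confident per-frame distributions before beam search is temperature scaling; the actual procedure in the cited paper may differ:

```python
import numpy as np

def relax_confidence(logits, temperature=1.5):
    """Illustrative only: flatten over-confident ASR distributions via
    temperature scaling before beam search. The temperature value and the
    use of plain softmax scaling are assumptions.

    logits : (frames, vocab) array of per-frame scores from the acoustic model
    """
    scaled = logits / temperature
    scaled -= scaled.max(axis=-1, keepdims=True)        # numerical stability
    probs = np.exp(scaled)
    return probs / probs.sum(axis=-1, keepdims=True)    # flatter distributions
```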
arXiv Detail & Related papers (2022-12-27T06:42:26Z) - Prompt-driven efficient Open-set Semi-supervised Learning [52.30303262499391]
Open-set semi-supervised learning (OSSL), which investigates a more practical scenario where out-of-distribution (OOD) samples are contained only in unlabeled data, has attracted growing interest.
We propose a prompt-driven efficient OSSL framework, called OpenPrompt, which can propagate class information from labeled to unlabeled data with only a small number of trainable parameters.
arXiv Detail & Related papers (2022-09-28T16:25:08Z) - Evaluating BERT-based Pre-training Language Models for Detecting
Misinformation [2.1915057426589746]
It is challenging to control the quality of online information due to the lack of supervision over all the information posted online.
There is a need for automated rumour detection techniques to limit the adverse effects of spreading misinformation.
This study proposes using BERT-based pre-trained language models to encode text data into vectors and neural network models to classify these vectors for misinformation detection.
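A minimal sketch of that encode-then-classify pipeline using Hugging Face Transformers; the CLS-token pooling and the two-layer classifier head are assumptions:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
classifier = torch.nn.Sequential(
    torch.nn.Linear(768, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 2),   # 2 classes: misinformation / not misinformation
)

def predict(text: str) -> torch.Tensor:
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        vector = encoder(**inputs).last_hidden_state[:, 0]  # [CLS] embedding
    return classifier(vector).softmax(dim=-1)
```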
arXiv Detail & Related papers (2022-03-15T08:54:36Z) - Fast Class-wise Updating for Online Hashing [196.14748396106955]
This paper presents a novel supervised online hashing scheme, termed Fast Class-wise Updating for Online Hashing (FCOH).
A class-wise updating method is developed to decompose the binary code learning and alternately renew the hash functions in a class-wise fashion, which alleviates the burden of handling large amounts of training batches.
To further achieve online efficiency, we propose a semi-relaxation optimization, which accelerates the online training by treating different binary constraints independently.
arXiv Detail & Related papers (2020-12-01T07:41:54Z) - MixKD: Towards Efficient Distillation of Large-scale Language Models [129.73786264834894]
We propose MixKD, a data-agnostic distillation framework, to endow the resulting model with stronger generalization ability.
We prove from a theoretical perspective that, under reasonable conditions, MixKD gives rise to a smaller gap between the generalization error and the empirical error.
Experiments under a limited-data setting and ablation studies further demonstrate the advantages of the proposed approach.
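A hedged sketch of mixup-style distillation along the lines the name MixKD suggests; interpolating input embeddings and matching teacher outputs on the mixed inputs is an assumption about the exact formulation:

```python
import torch
import torch.nn.functional as F

def mixkd_loss(student, teacher, emb_a, emb_b, alpha=0.4):
    """Mixup-based distillation sketch (assumed formulation).

    emb_a, emb_b : (batch, seq, dim) input embeddings of two example batches.
    student, teacher : callables mapping embeddings to class logits.
    """
    lam = torch.distributions.Beta(alpha, alpha).sample()
    mixed = lam * emb_a + (1 - lam) * emb_b           # mixup in embedding space
    with torch.no_grad():
        teacher_logits = teacher(mixed)
    student_logits = student(mixed)
    return F.kl_div(F.log_softmax(student_logits, dim=-1),
                    F.softmax(teacher_logits, dim=-1),
                    reduction="batchmean")
```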
arXiv Detail & Related papers (2020-11-01T18:47:51Z) - AliExpress Learning-To-Rank: Maximizing Online Model Performance without
Going Online [60.887637616379926]
This paper proposes an evaluator-generator framework for learning-to-rank.
It consists of an evaluator that generalizes to evaluate recommendations involving the context, and a generator that maximizes the evaluator score by reinforcement learning.
Our method achieves a significant improvement in terms of Conversion Rate (CR) over the industrial-level fine-tuned model in online A/B tests.
arXiv Detail & Related papers (2020-03-25T10:27:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.