Towards Better Query Classification with Multi-Expert Knowledge
Condensation in JD Ads Search
- URL: http://arxiv.org/abs/2308.01098v3
- Date: Sun, 19 Nov 2023 14:42:39 GMT
- Title: Towards Better Query Classification with Multi-Expert Knowledge
Condensation in JD Ads Search
- Authors: Kun-Peng Ning, Ming Pang, Zheng Fang, Xue Jiang, Xi-Wei Zhao,
Chang-Ping Peng, Zhan-Gang Lin, Jing-He Hu, Jing-Ping Shao
- Abstract summary: The shallow FastText model is widely used for efficient online inference.
BERT is an effective alternative, but it incurs higher online inference latency and more expensive computing costs.
We propose knowledge condensation to boost the classification performance of the online FastText model under strict low-latency constraints.
- Score: 12.701416688678622
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Search query classification, as an effective way to understand user intents,
is of great importance in real-world online ads systems. To ensure low latency, a shallow
model (e.g., FastText) is widely used for efficient online inference. However, the
representation ability of the FastText model is insufficient, resulting in poor
classification performance, especially on low-frequency queries and tailed categories.
Using a deeper and more complex model (e.g., BERT) is an effective solution, but it incurs
higher online inference latency and more expensive computing costs. Balancing inference
efficiency and classification performance is therefore of great practical importance. To
overcome this challenge, we propose knowledge condensation (KC), a simple yet effective
knowledge distillation framework that boosts the classification performance of the online
FastText model under strict low-latency constraints. Specifically, we train an offline
BERT model to retrieve more potentially relevant data. Benefiting from its powerful
semantic representation, relevant labels not exposed in the historical data are added to
the training set, yielding a better-trained FastText model. Moreover, we propose a novel
distribution-diverse multi-expert learning strategy to further improve the mining of
relevant data. By training multiple BERT models on different data distributions, each
expert performs better on high-, middle-, or low-frequency search queries, and the
ensemble over these distributions makes retrieval more powerful. We have deployed two
versions of this framework in JD search, and both offline experiments and online A/B
testing on multiple datasets have validated the effectiveness of the proposed approach.
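The retrieval-and-augmentation loop described in the abstract can be summarized in a short sketch. The helper names, the score threshold, and the use of simple score averaging across experts are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def condense_training_data(click_log, label_space, experts, threshold=0.5):
    """Label-augmentation step of knowledge condensation (KC), sketched.

    click_log   : iterable of (query, set_of_observed_labels) pairs
    label_space : list of all category labels
    experts     : offline BERT classifiers, one per query-frequency band
                  (high / middle / low), each exposing
                  predict_proba(query) -> per-label probability array
    threshold   : assumed score cut-off for mining extra labels
    """
    augmented = []
    for query, observed in click_log:
        # Ensemble the distribution-diverse experts by averaging their scores
        # (simple averaging is an assumption; the paper's rule may differ).
        scores = np.mean([e.predict_proba(query) for e in experts], axis=0)
        # Add relevant labels that were never exposed in the historical data.
        mined = {label for label, p in zip(label_space, scores)
                 if p >= threshold and label not in observed}
        augmented.append((query, set(observed) | mined))
    return augmented  # then used to train the online FastText classifier
```

The augmented query-label pairs would then feed standard FastText supervised training, leaving the online model's size and latency unchanged.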
Related papers
- Improving embedding with contrastive fine-tuning on small datasets with expert-augmented scores [12.86467344792873]
The proposed method uses soft labels derived from expert-augmented scores to fine-tune embedding models.
The paper evaluates the method using a Q&A dataset from an online shopping website and eight expert models.
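A minimal sketch of fine-tuning with expert-derived soft labels; the in-batch similarity matrix and the KL objective are assumptions, since the summary does not specify the loss:

```python
import torch
import torch.nn.functional as F

def soft_label_finetune_loss(query_emb, doc_emb, expert_scores, temperature=0.05):
    """Hedged sketch: align model similarities with expert-augmented soft labels.

    query_emb, doc_emb : (batch, dim) embeddings from the model being fine-tuned
    expert_scores      : (batch, batch) soft relevance scores aggregated from
                         the expert models (assumed non-negative)
    """
    sims = query_emb @ doc_emb.T / temperature                    # in-batch similarities
    target = expert_scores / (expert_scores.sum(dim=1, keepdim=True) + 1e-8)
    return F.kl_div(F.log_softmax(sims, dim=1), target, reduction="batchmean")
```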
arXiv Detail & Related papers (2024-08-19T01:59:25Z) - CLEFT: Language-Image Contrastive Learning with Efficient Large Language Model and Prompt Fine-Tuning [4.004641316826348]
We introduce a novel language-image Contrastive Learning method with an Efficient large language model and prompt Fine-Tuning (CLEFT).
Our method demonstrates state-of-the-art performance on multiple chest X-ray and mammography datasets.
The proposed parameter-efficient framework can reduce the total trainable model size by 39% and reduce the trainable language model to only 4% compared with the current BERT encoder.
arXiv Detail & Related papers (2024-07-30T17:57:32Z) - Deep Bag-of-Words Model: An Efficient and Interpretable Relevance Architecture for Chinese E-Commerce [31.076432176267335]
We propose the deep Bag-of-Words (DeepBoW) model, an efficient and interpretable relevance architecture for Chinese e-commerce.
Our approach encodes the query and the product into sparse BoW representations, each a set of word-weight pairs.
The relevance score is measured by accumulating over the words matched between the sparse BoW representations of the query and the product.
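A toy sketch of that matching rule; combining matched words by a sum of weight products is an assumption, since the summary only says the score accumulates over matched words:

```python
def deep_bow_relevance(query_bow, product_bow):
    """Sketch of sparse Bag-of-Words matching.

    query_bow, product_bow : dict mapping word -> learned weight (the sparse
    BoW representations produced by the encoder).
    """
    matched = query_bow.keys() & product_bow.keys()
    return sum(query_bow[w] * product_bow[w] for w in matched)

# Example: a toy query/product pair with hand-written weights.
q = {"red": 0.7, "dress": 1.2, "summer": 0.4}
p = {"dress": 1.0, "red": 0.9, "cotton": 0.3}
print(deep_bow_relevance(q, p))  # 0.7*0.9 + 1.2*1.0 = 1.83
```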
arXiv Detail & Related papers (2024-07-12T16:18:05Z) - CELA: Cost-Efficient Language Model Alignment for CTR Prediction [71.85120354973073]
Click-Through Rate (CTR) prediction holds a paramount position in recommender systems.
Recent efforts have sought to mitigate these challenges by integrating Pre-trained Language Models (PLMs).
We propose Cost-Efficient Language Model Alignment (CELA) for CTR prediction.
arXiv Detail & Related papers (2024-05-17T07:43:25Z) - Contrastive Transformer Learning with Proximity Data Generation for
Text-Based Person Search [60.626459715780605]
Given a descriptive text query, text-based person search aims to retrieve the best-matched target person from an image gallery.
Such a cross-modal retrieval task is quite challenging due to the significant modality gap, fine-grained differences, and the insufficiency of annotated data.
In this paper, we propose a simple yet effective dual Transformer model for text-based person search.
arXiv Detail & Related papers (2023-11-15T16:26:49Z) - Don't Be So Sure! Boosting ASR Decoding via Confidence Relaxation [7.056222499095849]
Beam search seeks the transcript with the greatest likelihood computed using the predicted distribution.
We show that recently proposed Self-Supervised Learning (SSL)-based ASR models tend to yield exceptionally confident predictions.
We propose a decoding procedure that improves the performance of fine-tuned ASR models.
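As a heavily hedged illustration, one simple way to relax over-confident per-frame distributions before beam search is temperature scaling; the actual procedure in the cited paper may differ:

```python
import numpy as np

def relax_confidence(logits, temperature=1.5):
    """Illustrative only: flatten over-confident ASR distributions via
    temperature scaling before beam search. The temperature value and the
    use of plain softmax scaling are assumptions.

    logits : (frames, vocab) array of per-frame scores from the acoustic model
    """
    scaled = logits / temperature
    scaled -= scaled.max(axis=-1, keepdims=True)        # numerical stability
    probs = np.exp(scaled)
    return probs / probs.sum(axis=-1, keepdims=True)    # flatter distributions
```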
arXiv Detail & Related papers (2022-12-27T06:42:26Z) - Prompt-driven efficient Open-set Semi-supervised Learning [52.30303262499391]
Open-set semi-supervised learning (OSSL), which investigates a more practical scenario where out-of-distribution (OOD) samples are contained only in unlabeled data, has attracted growing interest.
We propose a prompt-driven efficient OSSL framework, called OpenPrompt, which can propagate class information from labeled to unlabeled data with only a small number of trainable parameters.
arXiv Detail & Related papers (2022-09-28T16:25:08Z) - Evaluating BERT-based Pre-training Language Models for Detecting
Misinformation [2.1915057426589746]
It is challenging to control the quality of online information due to the lack of supervision over all the information posted online.
There is a need for automated rumour detection techniques to limit the adverse effects of spreading misinformation.
This study proposes using BERT-based pre-trained language models to encode text data into vectors and neural network models to classify these vectors for misinformation detection.
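A minimal sketch of that encode-then-classify pipeline using Hugging Face Transformers; the CLS-token pooling and the two-layer classifier head are assumptions:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
classifier = torch.nn.Sequential(
    torch.nn.Linear(768, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 2),   # 2 classes: misinformation / not misinformation
)

def predict(text: str) -> torch.Tensor:
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        vector = encoder(**inputs).last_hidden_state[:, 0]  # [CLS] embedding
    return classifier(vector).softmax(dim=-1)
```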
arXiv Detail & Related papers (2022-03-15T08:54:36Z) - Fast Class-wise Updating for Online Hashing [196.14748396106955]
This paper presents a novel supervised online hashing scheme, termed Fast Class-wise Updating for Online Hashing (FCOH).
A class-wise updating method is developed to decompose the binary code learning and alternately renew the hash functions in a class-wise fashion, which alleviates the burden of handling large amounts of training batches.
To further achieve online efficiency, we propose a semi-relaxation optimization, which accelerates the online training by treating different binary constraints independently.
arXiv Detail & Related papers (2020-12-01T07:41:54Z) - MixKD: Towards Efficient Distillation of Large-scale Language Models [129.73786264834894]
We propose MixKD, a data-agnostic distillation framework, to endow the resulting model with stronger generalization ability.
We prove from a theoretical perspective that, under reasonable conditions, MixKD gives rise to a smaller gap between the generalization error and the empirical error.
Experiments under a limited-data setting and ablation studies further demonstrate the advantages of the proposed approach.
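A hedged sketch of mixup-style distillation along the lines the name MixKD suggests; interpolating input embeddings and matching teacher outputs on the mixed inputs is an assumption about the exact formulation:

```python
import torch
import torch.nn.functional as F

def mixkd_loss(student, teacher, emb_a, emb_b, alpha=0.4):
    """Mixup-based distillation sketch (assumed formulation).

    emb_a, emb_b : (batch, seq, dim) input embeddings of two example batches.
    student, teacher : callables mapping embeddings to class logits.
    """
    lam = torch.distributions.Beta(alpha, alpha).sample()
    mixed = lam * emb_a + (1 - lam) * emb_b           # mixup in embedding space
    with torch.no_grad():
        teacher_logits = teacher(mixed)
    student_logits = student(mixed)
    return F.kl_div(F.log_softmax(student_logits, dim=-1),
                    F.softmax(teacher_logits, dim=-1),
                    reduction="batchmean")
```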
arXiv Detail & Related papers (2020-11-01T18:47:51Z) - AliExpress Learning-To-Rank: Maximizing Online Model Performance without
Going Online [60.887637616379926]
This paper proposes an evaluator-generator framework for learning-to-rank.
It consists of an evaluator that generalizes to evaluate recommendations involving the context, and a generator that maximizes the evaluator score by reinforcement learning.
Our method achieves a significant improvement in terms of Conversion Rate (CR) over the industrial-level fine-tuned model in online A/B tests.
arXiv Detail & Related papers (2020-03-25T10:27:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.