Towards Better Query Classification with Multi-Expert Knowledge
Condensation in JD Ads Search
- URL: http://arxiv.org/abs/2308.01098v3
- Date: Sun, 19 Nov 2023 14:42:39 GMT
- Title: Towards Better Query Classification with Multi-Expert Knowledge
Condensation in JD Ads Search
- Authors: Kun-Peng Ning, Ming Pang, Zheng Fang, Xue Jiang, Xi-Wei Zhao,
Chang-Ping Peng, Zhan-Gang Lin, Jing-He Hu, Jing-Ping Shao
- Abstract summary: The shallow FastText model is widely used for efficient online inference.
BERT is an effective solution, but it incurs higher online inference latency and more expensive computing costs.
We propose knowledge condensation to boost the classification performance of the online FastText model under strict low-latency constraints.
- Score: 12.701416688678622
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Search query classification, as an effective way to understand user intents,
is of great importance in real-world online ads systems. To ensure low latency, a shallow
model (e.g., FastText) is widely used for efficient online inference. However, the
representation ability of FastText is insufficient, resulting in poor classification
performance, especially on low-frequency queries and tail categories. Using a deeper and
more complex model (e.g., BERT) is an effective solution, but it incurs higher online
inference latency and more expensive computing costs. Thus, balancing inference efficiency
and classification performance is of great practical importance. To overcome this
challenge, in this paper, we propose knowledge condensation (KC), a simple yet effective
knowledge distillation framework that boosts the classification performance of the online
FastText model under strict low-latency constraints. Specifically, we train an offline
BERT model to retrieve more potentially relevant data. Benefiting from its powerful
semantic representation, relevant labels not exposed in the historical data are added to
the training set to train a better FastText model. Moreover, a novel distribution-diverse
multi-expert learning strategy is proposed to further improve the mining of relevant data.
By training multiple BERT models on different data distributions, each expert performs
better on high-, middle-, or low-frequency search queries, and the ensemble across
distributions yields a more powerful retrieval ability. We have deployed two versions of
this framework in JD search, and both offline experiments on multiple datasets and online
A/B testing have validated the effectiveness of the proposed approach.
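As a rough illustration of the pipeline described in the abstract, the sketch below shows how an ensemble of frequency-bucket BERT experts could retrieve additional category labels and enrich the FastText training data. The function names, the averaging ensemble, and the confidence threshold are assumptions made for illustration only; the paper's actual implementation may differ.

```python
# Minimal sketch of the knowledge-condensation idea, assuming each "expert"
# is a callable (e.g., a fine-tuned BERT classifier) that maps a query to one
# probability per category. Names and the 0.5 threshold are illustrative.

from typing import Callable, List, Sequence, Tuple

Expert = Callable[[str], Sequence[float]]  # query -> per-category probabilities


def ensemble_retrieve_labels(
    query: str,
    experts: List[Expert],        # one expert per frequency bucket (high/mid/low)
    categories: List[str],
    threshold: float = 0.5,       # assumed confidence cutoff
) -> List[str]:
    """Average the experts' category scores and keep confident labels."""
    totals = [0.0] * len(categories)
    for expert in experts:
        probs = expert(query)
        totals = [t + p for t, p in zip(totals, probs)]
    averaged = [t / len(experts) for t in totals]
    return [c for c, s in zip(categories, averaged) if s >= threshold]


def condense_training_set(
    historical_data: List[Tuple[str, List[str]]],  # (query, labels seen in logs)
    experts: List[Expert],
    categories: List[str],
) -> List[Tuple[str, List[str]]]:
    """Augment each query's label set with labels retrieved by the offline
    BERT ensemble; the enriched data is then used to train FastText."""
    condensed = []
    for query, labels in historical_data:
        retrieved = ensemble_retrieve_labels(query, experts, categories)
        condensed.append((query, sorted(set(labels) | set(retrieved))))
    return condensed
```

Under this reading, the condensed query-label pairs would be serialized into FastText's `__label__` format and passed to `fasttext.train_supervised`, so that online inference stays on the shallow low-latency model while the BERT experts remain offline.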