An End-to-End Solution for Named Entity Recognition in eCommerce Search
- URL: http://arxiv.org/abs/2012.07553v1
- Date: Fri, 11 Dec 2020 04:58:13 GMT
- Title: An End-to-End Solution for Named Entity Recognition in eCommerce Search
- Authors: Xiang Cheng, Mitchell Bowden, Bhushan Ramesh Bhange, Priyanka Goyal,
Thomas Packer, Faizan Javed
- Abstract summary: Named entity recognition (NER) is a critical step in modern search query understanding.
Recent research shows promising results on shared benchmark NER tasks using deep learning methods.
This paper demonstrates an end-to-end solution to address these challenges.
- Score: 7.240345005177374
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Named entity recognition (NER) is a critical step in modern search query
understanding. In the domain of eCommerce, identifying the key entities, such
as brand and product type, can help a search engine retrieve relevant products
and therefore offer an engaging shopping experience. Recent research shows
promising results on shared benchmark NER tasks using deep learning methods,
but there are still unique challenges in the industry regarding domain
knowledge, training data, and model production. This paper demonstrates an
end-to-end solution to address these challenges. The core of our solution is a
novel model training framework "TripleLearn" which iteratively learns from
three separate training datasets, instead of one training set as is
traditionally done. Using this approach, the best model lifts the F1 score from
69.5 to 93.3 on the holdout test data. In our offline experiments, TripleLearn
improved the model performance compared to traditional training approaches
which use a single set of training data. Moreover, in the online A/B test, we
see significant improvements in user engagement and revenue conversion. The
model has been live on homedepot.com for more than 9 months, boosting search
conversions and revenue. Beyond our application, this TripleLearn framework, as
well as the end-to-end process, is model-independent and problem-independent,
so it can be generalized to more industrial applications, especially to the
eCommerce industry which has similar data foundations and problems.
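The abstract states only that TripleLearn iteratively learns from three separate training datasets rather than a single one; it does not specify the model, the iteration order, or the stopping rule. The following is a minimal hypothetical sketch of what such an iterative multi-dataset loop could look like, using an illustrative frequency-lexicon tagger as a stand-in model. The names (`triple_learn`, `train_round`, `predict`), the round-robin rotation over datasets, and the toy model are all assumptions for illustration, not the authors' actual algorithm.

```python
# Illustrative sketch of an iterative three-dataset training loop in the
# spirit of TripleLearn. The paper's abstract gives no implementation
# details; the lexicon model and rotation order below are assumptions.
from collections import Counter, defaultdict

def train_round(model, dataset):
    """Update per-token label counts from one labeled dataset."""
    for token, label in dataset:
        model[token][label] += 1
    return model

def predict(model, token):
    """Return the most frequent label seen for a token, or 'O' (outside)."""
    counts = model.get(token)
    return counts.most_common(1)[0][0] if counts else "O"

def triple_learn(datasets, rounds=3):
    """Iteratively refine one model by cycling over the three datasets."""
    model = defaultdict(Counter)
    for _ in range(rounds):
        for dataset in datasets:
            train_round(model, dataset)
    return model

# Toy usage with three hypothetical eCommerce NER datasets of
# (token, entity-label) pairs:
d1 = [("dewalt", "BRAND"), ("drill", "PRODUCT_TYPE")]
d2 = [("dewalt", "BRAND"), ("saw", "PRODUCT_TYPE")]
d3 = [("cordless", "O"), ("drill", "PRODUCT_TYPE")]
model = triple_learn([d1, d2, d3])
print(predict(model, "dewalt"))  # → BRAND
```

The point of the sketch is only the training structure: a single model state is refined by repeated passes over three distinct datasets, instead of one pass over a single merged set.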
Related papers
- ConDa: Fast Federated Unlearning with Contribution Dampening [46.074452659791575]
ConDa is a framework that performs efficient unlearning by tracking down the parameters which affect the global model for each client.
We perform experiments on multiple datasets and demonstrate that ConDa is effective at forgetting a client's data.
arXiv Detail & Related papers (2024-10-05T12:45:35Z) - Continual Learning with Pre-Trained Models: A Survey [61.97613090666247]
Continual Learning aims to overcome the catastrophic forgetting of previously acquired knowledge when learning new knowledge.
This paper presents a comprehensive survey of the latest advancements in PTM-based CL.
arXiv Detail & Related papers (2024-01-29T18:27:52Z) - Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines [83.65380507372483]
Large pre-trained models can dramatically reduce the amount of task-specific data required to solve a problem, but they often fail to capture domain-specific nuances out of the box.
This paper shows how to leverage recent advances in NLP and multi-modal learning to augment a pre-trained model with search engine retrieval.
arXiv Detail & Related papers (2023-11-29T05:33:28Z) - QUADRo: Dataset and Models for QUestion-Answer Database Retrieval [97.84448420852854]
Given a database (DB) of question/answer (q/a) pairs, it is possible to answer a target question by scanning the DB for similar questions.
We build a large scale DB of 6.3M q/a pairs, using public questions, and design a new system based on neural IR and a q/a pair reranker.
We show that our DB-based approach is competitive with Web-based methods, i.e., a QA system built on top of the BING search engine.
arXiv Detail & Related papers (2023-03-30T00:42:07Z) - ZhichunRoad at Amazon KDD Cup 2022: MultiTask Pre-Training for
E-Commerce Product Search [4.220439000486713]
We propose a robust multilingual model to improve the quality of search results.
In the pre-training stage, we adopt masked language modeling (MLM), classification, and contrastive learning tasks.
In the fine-tuning stage, we use confident learning, the exponential moving average method (EMA), adversarial training (FGM), and the regularized dropout strategy (R-Drop).
arXiv Detail & Related papers (2023-01-31T07:31:34Z) - Commonsense Knowledge Salience Evaluation with a Benchmark Dataset in
E-commerce [42.726755541409545]
In e-commerce, the salience of commonsense knowledge (CSK) is beneficial for widespread applications such as product search and recommendation.
However, many existing CSK collections rank statements solely by confidence scores, and there is no information about which ones are salient from a human perspective.
In this work, we define the task of supervised salience evaluation, where given a CSK triple, the model is required to learn whether the triple is salient or not.
arXiv Detail & Related papers (2022-05-22T15:01:23Z) - Winning solutions and post-challenge analyses of the ChaLearn AutoDL
challenge 2019 [112.36155380260655]
This paper reports the results and post-challenge analyses of ChaLearn's AutoDL challenge series.
Results show that DL methods dominated, though popular Neural Architecture Search (NAS) was impractical.
A high-level modular organization emerged featuring a "meta-learner", "data ingestor", "model selector", "model/learner", and "evaluator".
arXiv Detail & Related papers (2022-01-11T06:21:18Z) - Ex-Model: Continual Learning from a Stream of Trained Models [12.27992745065497]
We argue that continual learning systems should exploit the availability of compressed information in the form of trained models.
We introduce and formalize a new paradigm named "Ex-Model Continual Learning" (ExML), where an agent learns from a sequence of previously trained models instead of raw data.
arXiv Detail & Related papers (2021-12-13T09:46:16Z) - What Stops Learning-based 3D Registration from Working in the Real
World? [53.68326201131434]
This work identifies the sources of 3D point cloud registration failures, analyzes the reasons behind them, and proposes solutions.
Ultimately, this translates to a best-practice 3D registration network (BPNet), constituting the first learning-based method able to handle previously-unseen objects in real-world data.
Our model generalizes to real data without any fine-tuning, reaching an accuracy of up to 67% on point clouds of unseen objects obtained with a commercial sensor.
arXiv Detail & Related papers (2021-11-19T19:24:27Z) - Imitate TheWorld: A Search Engine Simulation Platform [13.011052642314421]
We build a simulated search engine, AESim, that provides feedback on generated pages via a well-trained discriminator.
Unlike previous simulation platforms, which lose their connection with the real world, ours relies on real search data.
Our experiments also show AESim can better reflect the online performance of ranking models than classic ranking metrics.
arXiv Detail & Related papers (2021-07-16T03:55:33Z) - Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.