Bootstrapping Named Entity Recognition in E-Commerce with Positive
Unlabeled Learning
- URL: http://arxiv.org/abs/2005.11075v1
- Date: Fri, 22 May 2020 09:35:30 GMT
- Title: Bootstrapping Named Entity Recognition in E-Commerce with Positive
Unlabeled Learning
- Authors: Hanchu Zhang, Leonhard Hennig, Christoph Alt, Changjian Hu, Yao Meng,
Chao Wang
- Abstract summary: We present a bootstrapped positive-unlabeled learning algorithm that integrates domain-specific linguistic features to quickly and efficiently expand the seed dictionary.
The model achieves an average F1 score of 72.02% on a novel dataset of product descriptions, an improvement of 3.63% over a baseline BiLSTM classifier.
- Score: 13.790883865748004
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Named Entity Recognition (NER) in domains like e-commerce is an understudied
problem due to the lack of annotated datasets. Recognizing novel entity types
in this domain, such as products, components, and attributes, is challenging
because of their linguistic complexity and the low coverage of existing
knowledge resources. To address this problem, we present a bootstrapped
positive-unlabeled learning algorithm that integrates domain-specific
linguistic features to quickly and efficiently expand the seed dictionary. The
model achieves an average F1 score of 72.02% on a novel dataset of product
descriptions, an improvement of 3.63% over a baseline BiLSTM classifier, and in
particular exhibits better recall (4.96% on average).
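The abstract describes expanding a seed dictionary by bootstrapping. The loop below is a minimal sketch of that general idea, not the authors' method: it swaps the paper's classifier and domain-specific linguistic features for a simple context-overlap score. The function name `bootstrap_dictionary`, the `threshold` and `max_rounds` parameters, and the toy sentences are all assumptions made for illustration. In the spirit of positive-unlabeled learning, tokens outside the dictionary are treated as unlabeled rather than as negatives.

```python
from collections import defaultdict


def bootstrap_dictionary(sentences, seed, threshold=0.5, max_rounds=5):
    """Toy bootstrapping loop (hypothetical helper, not from the paper).

    Grows `seed` with tokens whose (previous word, next word) contexts
    overlap with the contexts of tokens already in the dictionary.
    """
    dictionary = set(seed)

    # Map each token to the set of (previous word, next word) contexts it occurs in.
    contexts = defaultdict(set)
    for sent in sentences:
        padded = ["<s>"] + list(sent) + ["</s>"]
        for i in range(1, len(padded) - 1):
            contexts[padded[i]].add((padded[i - 1], padded[i + 1]))

    for _ in range(max_rounds):
        # Contexts observed around tokens currently treated as positive.
        positive_contexts = set()
        for tok in dictionary:
            positive_contexts |= contexts.get(tok, set())

        # Score every unlabeled token by the share of its contexts that are
        # also positive contexts; unlabeled tokens are never marked negative.
        new_entries = set()
        for tok, ctxs in contexts.items():
            if tok in dictionary:
                continue
            score = len(ctxs & positive_contexts) / len(ctxs)
            if score >= threshold:
                new_entries.add(tok)

        if not new_entries:
            break  # The dictionary has stopped growing.
        dictionary |= new_entries

    return dictionary


# Toy product-description fragments (made up for this example).
sentences = [
    ["the", "laptop", "has"],
    ["the", "tablet", "has"],
    ["a", "phone", "with"],
    ["a", "tablet", "with"],
    ["we", "ship", "fast"],
]
print(sorted(bootstrap_dictionary(sentences, {"laptop"})))
# prints ['laptop', 'phone', 'tablet']
```

Starting from the single seed "laptop", the first round adds "tablet" (it shares the context "the _ has"), and the second round adds "phone" through the context that "tablet" contributes ("a _ with"). A real system would replace the overlap score with a trained classifier and would need safeguards against semantic drift as the dictionary grows.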
Related papers
- Named Entity Recognition Under Domain Shift via Metric Learning for Life Sciences [55.185456382328674]
We investigate the applicability of transfer learning for enhancing a named entity recognition model.
Our model consists of two stages: 1) entity grouping in the source domain, which incorporates knowledge from annotated events to establish relations between entities, and 2) entity discrimination in the target domain, which relies on pseudo labeling and contrastive learning to enhance discrimination between the entities in the two domains.
arXiv Detail & Related papers (2024-01-19T03:49:28Z)
- Enhanced E-Commerce Attribute Extraction: Innovating with Decorative Relation Correction and LLAMA 2.0-Based Annotation [4.81846973621209]
We propose a pioneering framework that integrates BERT for classification, a Conditional Random Fields (CRFs) layer for attribute value extraction, and Large Language Models (LLMs) for data annotation.
Our approach capitalizes on the robust representation learning of BERT, synergized with the sequence decoding prowess of CRFs, to adeptly identify and extract attribute values.
Our methodology is rigorously validated on various datasets, including Walmart, BestBuy's e-commerce NER dataset, and the CoNLL dataset.
arXiv Detail & Related papers (2023-12-09T08:26:30Z)
- IXA/Cogcomp at SemEval-2023 Task 2: Context-enriched Multilingual Named Entity Recognition using Knowledge Bases [53.054598423181844]
We present a novel NER cascade approach comprising three steps.
We empirically demonstrate the significance of external knowledge bases in accurately classifying fine-grained and emerging entities.
Our system exhibits robust performance in the MultiCoNER2 shared task, even in the low-resource language setting.
arXiv Detail & Related papers (2023-04-20T20:30:34Z)
- ANEC: An Amharic Named Entity Corpus and Transformer Based Recognizer [0.0]
We present an Amharic named entity recognition system based on bidirectional long short-term memory with a conditional random fields layer.
Our named entity recognition system achieves an F_1 score of 93%, which is the new state-of-the-art result for Amharic named entity recognition.
arXiv Detail & Related papers (2022-07-02T09:50:37Z)
- QUEACO: Borrowing Treasures from Weakly-labeled Behavior Data for Query Attribute Value Extraction [57.56700153507383]
This paper proposes a unified query attribute value extraction system in e-commerce search named QUEACO.
For the NER phase, QUEACO adopts a novel teacher-student network, where a teacher network that is trained on the strongly-labeled data generates pseudo-labels.
For the AVN phase, we also leverage the weakly-labeled query-to-attribute behavior data to normalize surface form attribute values from queries into canonical forms from products.
arXiv Detail & Related papers (2021-08-19T03:24:23Z)
- Biomedical Named Entity Recognition at Scale [6.85316573653194]
We present a single trainable NER model that obtains new state-of-the-art results on seven public biomedical benchmarks.
This model is freely available within a production-grade code base as part of the open-source Spark NLP library.
arXiv Detail & Related papers (2020-11-12T11:10:17Z)
- Named Entity Recognition for Social Media Texts with Semantic Augmentation [70.44281443975554]
Existing approaches for named entity recognition suffer from data sparsity problems when conducted on short and informal texts.
We propose a neural-based approach to NER for social media texts where both local (from running text) and augmented semantics are taken into account.
arXiv Detail & Related papers (2020-10-29T10:06:46Z)
- Automatic Validation of Textual Attribute Values in E-commerce Catalog by Learning with Limited Labeled Data [61.789797281676606]
We propose a novel meta-learning latent variable approach, called MetaBridge.
It can learn transferable knowledge from a subset of categories with limited labeled data.
It can capture the uncertainty of never-seen categories with unlabeled data.
arXiv Detail & Related papers (2020-06-15T21:31:05Z)
- Self-paced Contrastive Learning with Hybrid Memory for Domain Adaptive Object Re-ID [55.21702895051287]
Domain adaptive object re-ID aims to transfer the learned knowledge from the labeled source domain to the unlabeled target domain.
We propose a novel self-paced contrastive learning framework with hybrid memory.
Our method outperforms state-of-the-art methods on multiple domain adaptation tasks of object re-ID.
arXiv Detail & Related papers (2020-06-04T09:12:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.