Related papers: Bootstrapping Named Entity Recognition in E-Commerce with Positive Unlabeled Learning

URL: http://arxiv.org/abs/2005.11075v1
Date: Fri, 22 May 2020 09:35:30 GMT
Title: Bootstrapping Named Entity Recognition in E-Commerce with Positive Unlabeled Learning
Authors: Hanchu Zhang, Leonhard Hennig, Christoph Alt, Changjian Hu, Yao Meng, Chao Wang
Abstract summary: We present a bootstrapped positive-unlabeled learning algorithm that integrates domain-specific linguistic features to quickly and efficiently expand the seed dictionary. The model achieves an average F1 score of 72.02% on a novel dataset of product descriptions, an improvement of 3.63% over a baseline BiLSTM classifier.
Score: 13.790883865748004
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Named Entity Recognition (NER) in domains like e-commerce is an understudied problem due to the lack of annotated datasets. Recognizing novel entity types in this domain, such as products, components, and attributes, is challenging because of their linguistic complexity and the low coverage of existing knowledge resources. To address this problem, we present a bootstrapped positive-unlabeled learning algorithm that integrates domain-specific linguistic features to quickly and efficiently expand the seed dictionary. The model achieves an average F1 score of 72.02% on a novel dataset of product descriptions, an improvement of 3.63% over a baseline BiLSTM classifier, and in particular exhibits better recall (4.96% on average).
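The abstract's idea of bootstrapping a seed dictionary can be illustrated with a minimal sketch. It is not the authors' algorithm: the scorer below is a hypothetical stand-in for a trained positive-unlabeled classifier, and the function names, the `threshold` and `min_count` parameters, and the toy sentences are all assumptions made for illustration. Tokens found in the dictionary are treated as positive, all other tokens stay unlabeled, and unlabeled tokens that score highly are added to the dictionary for the next round.

```python
from collections import Counter


def label_with_dictionary(tokens, dictionary):
    """Mark dictionary tokens as positive (1); everything else stays unlabeled (None)."""
    return [1 if tok.lower() in dictionary else None for tok in tokens]


def score_unlabeled(sentences, dictionary, min_count):
    """Toy stand-in for a PU classifier (an assumption, not the paper's model).

    Scores each unlabeled token by the fraction of its occurrences that sit
    directly next to a dictionary token. Tokens seen fewer than `min_count`
    times are skipped.
    """
    near, total = Counter(), Counter()
    for tokens in sentences:
        labels = label_with_dictionary(tokens, dictionary)
        for i, tok in enumerate(tokens):
            if labels[i] is not None:
                continue  # already positive, nothing to score
            key = tok.lower()
            total[key] += 1
            neighbours = labels[max(0, i - 1):i] + labels[i + 1:i + 2]
            if 1 in neighbours:
                near[key] += 1
    return {tok: near[tok] / total[tok] for tok in total if total[tok] >= min_count}


def bootstrap(sentences, seed, rounds=3, threshold=0.9, min_count=2):
    """Expand the seed dictionary until no new entries pass the threshold."""
    dictionary = {tok.lower() for tok in seed}
    for _ in range(rounds):
        scores = score_unlabeled(sentences, dictionary, min_count)
        new_entries = {tok for tok, s in scores.items() if s >= threshold}
        if not new_entries:
            break  # converged: nothing left to add
        dictionary |= new_entries
    return dictionary


# Hypothetical product-description snippets, already tokenized.
sentences = [
    ["new", "laptop", "battery", "included"],
    ["laptop", "charger", "sold", "separately"],
    ["laptop", "battery", "lasts", "long"],
]
expanded = bootstrap(sentences, seed={"laptop"})
print(sorted(expanded))  # ['battery', 'laptop']
```

In this toy run "battery" is added because both of its occurrences sit next to the seed term, while tokens seen only once are ignored. A real system would replace the co-occurrence scorer with a learned model and would need to guard against dictionary drift across rounds.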

Related papers

SNaRe: Domain-aware Data Generation for Low-Resource Event Detection [84.82139313614255]
Event Detection is critical for enabling reasoning in highly specialized domains such as biomedicine, law, and epidemiology. We introduce SNaRe, a domain-aware synthetic data generation framework composed of three components: Scout, Narrator, and Refiner. Scout extracts triggers from unlabeled target domain data and curates a high-quality domain-specific trigger list. Narrator, conditioned on these triggers, generates high-quality domain-aligned sentences, and Refiner identifies additional event mentions.
arXiv Detail & Related papers (2025-02-24T18:20:42Z)
Enhancing Disinformation Detection with Explainable AI and Named Entity Replacement [0.1374949083138427]
We show that non-informative elements (e.g., URLs and emoticons) should be pseudo-anonymized before training to avoid model bias. We evaluate this methodology with an internal dataset and an external dataset, before and after applying extended data preprocessing and named entity replacement. The results show that our proposal improves the performance of a disinformation classification method on external test data by 65.78% on average, without a significant decrease in internal test performance.
arXiv Detail & Related papers (2025-02-07T12:01:26Z)
Named Entity Recognition Under Domain Shift via Metric Learning for Life Sciences [55.185456382328674]
We investigate the applicability of transfer learning for enhancing a named entity recognition model. Our model consists of two stages: 1) entity grouping in the source domain, which incorporates knowledge from annotated events to establish relations between entities, and 2) entity discrimination in the target domain, which relies on pseudo labeling and contrastive learning to enhance discrimination between the entities in the two domains.
arXiv Detail & Related papers (2024-01-19T03:49:28Z)
Enhanced E-Commerce Attribute Extraction: Innovating with Decorative Relation Correction and LLAMA 2.0-Based Annotation [4.81846973621209]
We propose a pioneering framework that integrates BERT for classification, a Conditional Random Fields (CRFs) layer for attribute value extraction, and Large Language Models (LLMs) for data annotation. Our approach capitalizes on the robust representation learning of BERT, synergized with the sequence decoding prowess of CRFs, to adeptly identify and extract attribute values. Our methodology is rigorously validated on various datasets, including Walmart, BestBuy's e-commerce NER dataset, and the CoNLL dataset.
arXiv Detail & Related papers (2023-12-09T08:26:30Z)
IXA/Cogcomp at SemEval-2023 Task 2: Context-enriched Multilingual Named Entity Recognition using Knowledge Bases [53.054598423181844]
We present a novel NER cascade approach comprising three steps. We empirically demonstrate the significance of external knowledge bases in accurately classifying fine-grained and emerging entities. Our system exhibits robust performance in the MultiCoNER2 shared task, even in the low-resource language setting.
arXiv Detail & Related papers (2023-04-20T20:30:34Z)
ANEC: An Amharic Named Entity Corpus and Transformer Based Recognizer [0.0]
We present an Amharic named entity recognition system based on bidirectional long short-term memory with a conditional random fields layer. Our named entity recognition system achieves an F_1 score of 93%, which is the new state-of-the-art result for Amharic named entity recognition.
arXiv Detail & Related papers (2022-07-02T09:50:37Z)
QUEACO: Borrowing Treasures from Weakly-labeled Behavior Data for Query Attribute Value Extraction [57.56700153507383]
This paper proposes a unified query attribute value extraction system in e-commerce search named QUEACO. For the NER phase, QUEACO adopts a novel teacher-student network, where a teacher network that is trained on the strongly-labeled data generates pseudo-labels. For the AVN phase, we also leverage the weakly-labeled query-to-attribute behavior data to normalize surface form attribute values from queries into canonical forms from products.
arXiv Detail & Related papers (2021-08-19T03:24:23Z)
Biomedical Named Entity Recognition at Scale [6.85316573653194]
We present a single trainable NER model that obtains new state-of-the-art results on seven public biomedical benchmarks. This model is freely available within a production-grade code base as part of the open-source Spark NLP library.
arXiv Detail & Related papers (2020-11-12T11:10:17Z)
Named Entity Recognition for Social Media Texts with Semantic Augmentation [70.44281443975554]
Existing approaches for named entity recognition suffer from data sparsity problems when conducted on short and informal texts. We propose a neural-based approach to NER for social media texts where both local (from running text) and augmented semantics are taken into account.
arXiv Detail & Related papers (2020-10-29T10:06:46Z)
Automatic Validation of Textual Attribute Values in E-commerce Catalog by Learning with Limited Labeled Data [61.789797281676606]
We propose a novel meta-learning latent variable approach, called MetaBridge. It can learn transferable knowledge from a subset of categories with limited labeled data. It can capture the uncertainty of never-seen categories with unlabeled data.
arXiv Detail & Related papers (2020-06-15T21:31:05Z)
Self-paced Contrastive Learning with Hybrid Memory for Domain Adaptive Object Re-ID [55.21702895051287]
Domain adaptive object re-ID aims to transfer the learned knowledge from the labeled source domain to the unlabeled target domain. We propose a novel self-paced contrastive learning framework with hybrid memory. Our method outperforms state-of-the-art methods on multiple domain adaptation tasks of object re-ID.
arXiv Detail & Related papers (2020-06-04T09:12:44Z)

This list is automatically generated from the titles and abstracts of the papers on this site.