Improving Tagging Consistency and Entity Coverage for Chemical Identification in Full-text Articles
- URL: http://arxiv.org/abs/2111.10584v1
- Date: Sat, 20 Nov 2021 13:13:58 GMT
- Title: Improving Tagging Consistency and Entity Coverage for Chemical Identification in Full-text Articles
- Authors: Hyunjae Kim, Mujeen Sung, Wonjin Yoon, Sungjoon Park, Jaewoo Kang
- Abstract summary: This paper is a technical report on our system submitted to the chemical identification task of the BioCreative VII Track 2 challenge.
We aim to improve tagging consistency and entity coverage using various methods.
In the official evaluation of the challenge, our system was ranked 1st in NER by significantly outperforming the baseline model.
- Score: 17.24298646089662
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper is a technical report on our system submitted to the chemical
identification task of the BioCreative VII Track 2 challenge. The main feature
of this challenge is that the data consists of full-text articles, while
current datasets usually consist of only titles and abstracts. To effectively
address the problem, we aim to improve tagging consistency and entity coverage
using various methods such as majority voting within the same articles for
named entity recognition (NER) and a hybrid approach that combines a dictionary
and a neural model for normalization. In the experiments on the NLM-Chem
dataset, we show that our methods improve models' performance, particularly in
terms of recall. Finally, in the official evaluation of the challenge, our
system was ranked 1st in NER by significantly outperforming the baseline model
and more than 80 submissions from 16 teams.
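The two methods named in the abstract, article-level majority voting for NER and a dictionary-plus-neural hybrid for normalization, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The voting rule is an assumption: an exact surface string is kept only if more than half of its occurrences in the article were predicted as chemicals, and then every occurrence is tagged. The token-boundary regex, the `normalize_key` lowercasing, the placeholder concept IDs, and the `neural_fallback` callable standing in for a trained neural normalizer are likewise hypothetical.

```python
import re
from typing import Callable, Dict, List, Tuple

# Character offsets [start, end) of a predicted chemical mention.
Span = Tuple[int, int]


def find_occurrences(text: str, mention: str) -> List[Span]:
    """Return every occurrence of `mention` in `text` that is not glued to a
    neighbouring letter or digit (an assumed notion of a token boundary)."""
    pattern = r"(?<![A-Za-z0-9])" + re.escape(mention) + r"(?![A-Za-z0-9])"
    return [(m.start(), m.end()) for m in re.finditer(pattern, text)]


def majority_vote(text: str, predicted: List[Span]) -> List[Span]:
    """Article-level majority voting over exact surface strings (assumed rule):
    if more than half of a string's occurrences were predicted as chemicals,
    tag all of them; otherwise drop all of them."""
    predicted_set = set(predicted)
    selected = set()
    for mention in {text[start:end] for start, end in predicted}:
        occurrences = find_occurrences(text, mention)
        tagged = sum(1 for occ in occurrences if occ in predicted_set)
        if 2 * tagged > len(occurrences):
            selected.update(occurrences)
    # Remove spans that lie entirely inside a longer selected span.
    kept = [
        span for span in selected
        if not any(other != span and other[0] <= span[0] and span[1] <= other[1]
                   for other in selected)
    ]
    return sorted(kept)


def normalize_key(mention: str) -> str:
    """Assumed dictionary lookup key: lowercase with collapsed whitespace."""
    return " ".join(mention.lower().split())


def hybrid_normalize(
    mention: str,
    dictionary: Dict[str, str],
    neural_fallback: Callable[[str], str],
) -> str:
    """Try an exact dictionary match first; only unmatched mentions are sent to
    the (hypothetical) neural normalization model."""
    concept_id = dictionary.get(normalize_key(mention))
    if concept_id is not None:
        return concept_id
    return neural_fallback(mention)


def placeholder_neural_model(mention: str) -> str:
    """Hypothetical stand-in for a trained neural normalizer."""
    return "ID-FROM-NEURAL-MODEL"


if __name__ == "__main__":
    text = "Cisplatin was added. Cisplatin levels rose. Later, Cisplatin was removed."
    occurrences = find_occurrences(text, "Cisplatin")
    # The model tagged 2 of the 3 mentions, so voting tags all three.
    print(majority_vote(text, occurrences[:2]))

    # Toy dictionary with placeholder IDs (not real identifiers).
    dictionary = {normalize_key("Aspirin"): "ID-0001"}
    print(hybrid_normalize("ASPIRIN", dictionary, placeholder_neural_model))
    print(hybrid_normalize("acetylsalicylic acid", dictionary, placeholder_neural_model))
```

Under this assumed rule, a string that the model tags in most places also gets tagged where the model missed it, which is one plausible route to the recall gains the abstract reports; the cost is that a string tagged wrongly in most places is propagated everywhere. Sending only dictionary misses to the neural model keeps exact matches cheap and deterministic.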
Related papers
- SUMIE: A Synthetic Benchmark for Incremental Entity Summarization [6.149024468471498]
No existing dataset adequately tests how well language models can incrementally update entity summaries.
We introduce SUMIE, a fully synthetic dataset designed to expose real-world IES challenges.
This dataset effectively highlights problems like incorrect entity association and incomplete information presentation.
(2024-06-07T16:49:21Z)
- Striking Gold in Advertising: Standardization and Exploration of Ad Text Generation [5.3558730908641525]
We propose the first benchmark dataset, CAMERA, to standardize the task of ad text generation (ATG).
Our experiments show the current state and the remaining challenges.
We also explore how existing metrics in ATG and an LLM-based evaluator align with human evaluations.
(2023-09-21T12:51:24Z)
- NER-to-MRC: Named-Entity Recognition Completely Solving as Machine Reading Comprehension [29.227500985892195]
We frame NER as a machine reading comprehension problem, called NER-to-MRC.
We transform the NER task into a form that the model can solve efficiently with MRC.
We achieve state-of-the-art performance without external data, with up to an 11.24% improvement on the WNUT-16 dataset.
(2023-05-06T08:05:22Z)
- Modeling Entities as Semantic Points for Visual Information Extraction in the Wild [55.91783742370978]
We propose an alternative approach to precisely and robustly extract key information from document images.
We explicitly model entities as semantic points, i.e., center points of entities are enriched with semantic information describing the attributes and relationships of different entities.
The proposed method can achieve significantly enhanced performance on entity labeling and linking, compared with previous state-of-the-art models.
(2023-03-23T08:21:16Z)
- Nested Named Entity Recognition from Medical Texts: An Adaptive Shared Network Architecture with Attentive CRF [53.55504611255664]
We propose a novel method, referred to as ASAC, to address the difficulty posed by nested entities.
The proposed method contains two key modules: the adaptive shared (AS) part and the attentive conditional random field (ACRF) module.
Our model learns better entity representations by capturing the implicit distinctions and relationships between different categories of entities.
(2022-11-09T09:23:56Z)
- Questioning the Validity of Summarization Datasets and Improving Their Factual Consistency [14.974996886744083]
We release SummFC, a filtered summarization dataset with improved factual consistency.
We argue that our dataset should become a valid benchmark for developing and evaluating summarization systems.
(2022-10-31T15:04:20Z)
- Entity-Graph Enhanced Cross-Modal Pretraining for Instance-level Product Retrieval [152.3504607706575]
This research aims to conduct weakly-supervised multi-modal instance-level product retrieval for fine-grained product categories.
We first contribute the Product1M dataset and define two practical instance-level retrieval tasks.
We train a more effective cross-modal model that can adaptively incorporate key concept information from the multi-modal data.
(2022-06-17T15:40:45Z)
- Chemical Identification and Indexing in PubMed Articles via BERT and Text-to-Text Approaches [3.7462395049372894]
The BioCreative VII Track 2 challenge consists of named entity recognition, entity linking (or entity normalization), and topic indexing tasks.
We achieve our best performance with BERT-based BioMegatron models.
In addition to conventional NER methods, we attempt both named entity recognition and entity linking with a novel text-to-text or "prompt" based method.
(2021-11-30T18:21:06Z)
- Unsupervised Domain Adaptive Learning via Synthetic Data for Person Re-identification [101.1886788396803]
Person re-identification (re-ID) has gained increasing attention due to its widespread applications in video surveillance.
Unfortunately, mainstream deep learning methods still require a large quantity of labeled data to train models.
In this paper, we develop a data collector to automatically generate synthetic re-ID samples in a computer game, and construct a data labeler to simultaneously annotate them.
(2021-09-12T15:51:41Z)
- Distantly-Supervised Named Entity Recognition with Noise-Robust Learning and Language Model Augmented Self-Training [66.80558875393565]
We study the problem of training named entity recognition (NER) models using only distantly-labeled data.
We propose a noise-robust learning scheme consisting of a new loss function and a noisy label removal step.
Our method achieves superior performance, outperforming existing distantly-supervised NER models by significant margins.
(2021-09-10T17:19:56Z)
- DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks [88.62288327934499]
We propose a novel augmentation method with language models trained on the linearized labeled sentences.
Our method is applicable to both supervised and semi-supervised settings.
(2020-11-03T07:49:15Z)