TEAM-Atreides at SemEval-2022 Task 11: On leveraging data augmentation
and ensemble to recognize complex Named Entities in Bangla
- URL: http://arxiv.org/abs/2204.09964v1
- Date: Thu, 21 Apr 2022 08:40:17 GMT
- Title: TEAM-Atreides at SemEval-2022 Task 11: On leveraging data augmentation
and ensemble to recognize complex Named Entities in Bangla
- Authors: Nazia Tasnim, Md. Istiak Hossain Shihab, Asif Shahriyar Sushmit,
Steven Bethard and Farig Sadeque
- Abstract summary: We describe our contribution to SemEval 2022 Task 11 on identifying complex Named Entities.
We leveraged an ensemble of multiple ELECTRA-based models pretrained exclusively on the Bangla language.
We also present the outcomes of our experiments on architectural decisions, dataset augmentations, and post-competition findings.
- Score: 11.963792253163247
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many areas, such as the biological and healthcare domains, artistic works, and
organization names, have nested, overlapping, or discontinuous entity mentions
that may even be syntactically or semantically ambiguous in practice.
Traditional sequence tagging algorithms are unable to recognize these complex
mentions because they may violate the assumptions upon which sequence tagging
schemes are founded. In this paper, we describe our contribution to SemEval
2022 Task 11 on identifying such complex Named Entities. We leveraged an
ensemble of multiple ELECTRA-based models pretrained exclusively on the
Bangla language, together with ELECTRA-based models pretrained on English, to
achieve competitive performance on Track-11. Besides providing a
system description, we also present the outcomes of our experiments on
architectural decisions, dataset augmentations, and post-competition findings.
Related papers
- Semi-Supervised One-Shot Imitation Learning [83.94646047695412]
One-shot Imitation Learning aims to imbue AI agents with the ability to learn a new task from a single demonstration.
We introduce the semi-supervised OSIL problem setting, where the learning agent is presented with a large dataset of trajectories.
We develop an algorithm specifically applicable to this semi-supervised OSIL setting.
arXiv Detail & Related papers (2024-08-09T18:11:26Z)
- Divide, Conquer, and Combine: Mixture of Semantic-Independent Experts
for Zero-Shot Dialogue State Tracking [83.40120598637665]
Zero-shot transfer learning for Dialogue State Tracking (DST) helps to handle a variety of task-oriented dialogue domains without the cost of collecting in-domain data.
Existing works mainly study common data- or model-level augmentation methods to enhance the generalization.
We present a simple and effective "divide, conquer and combine" solution, which explicitly disentangles the semantics of seen data.
arXiv Detail & Related papers (2023-06-01T08:21:20Z)
- BanglaCoNER: Towards Robust Bangla Complex Named Entity Recognition [0.0]
We present the winning solution of Bangla Complex Named Entity Recognition Challenge.
The dataset consisted of 15300 sentences for training and 800 sentences for validation, in the .conll format.
Our findings also demonstrate the efficacy of Deep Learning models such as BanglaBERT for NER in Bangla language.
arXiv Detail & Related papers (2023-03-16T13:31:31Z)
- Disambiguation of Company names via Deep Recurrent Networks [101.90357454833845]
We propose a Siamese LSTM Network approach to extract -- via supervised learning -- an embedding of company name strings.
We analyse how an Active Learning approach to prioritise the samples to be labelled leads to a more efficient overall learning pipeline.
arXiv Detail & Related papers (2023-03-07T15:07:57Z)
- SU-NLP at SemEval-2022 Task 11: Complex Named Entity Recognition with
Entity Linking [0.0]
We developed an unsupervised entity linking pipeline that detects potential entity mentions with the help of Wikipedia.
Our results showed that our pipeline improved performance significantly, especially for complex entities in low-context settings.
arXiv Detail & Related papers (2022-03-22T16:09:34Z)
- USTC-NELSLIP at SemEval-2022 Task 11: Gazetteer-Adapted Integration
Network for Multilingual Complex Named Entity Recognition [41.26523047041553]
This paper describes the system developed by the USTC-NELSLIP team for SemEval-2022 Task 11 Multilingual Complex Named Entities Recognition (MultiCoNER).
We propose a gazetteer-adapted integration network (GAIN) to improve the performance of language models for recognizing complex named entities.
arXiv Detail & Related papers (2022-03-07T09:05:37Z)
- DAMO-NLP at SemEval-2022 Task 11: A Knowledge-based System for
Multilingual Named Entity Recognition [94.1865071914727]
MultiCoNER aims at detecting semantically ambiguous named entities in short and low-context settings for multiple languages.
Our team DAMO-NLP proposes a knowledge-based system, where we build a multilingual knowledge base based on Wikipedia.
Given an input sentence, our system effectively retrieves related contexts from the knowledge base.
Our system wins 10 out of 13 tracks in the MultiCoNER shared task.
arXiv Detail & Related papers (2022-03-01T15:29:35Z)
- LMN at SemEval-2022 Task 11: A Transformer-based System for English
Named Entity Recognition [0.0]
We present our participation in the English track of SemEval-2022 Task 11: Multilingual Complex Named Entity Recognition.
Inspired by the recent advances in pretrained Transformer language models, we propose a simple yet effective Transformer-based baseline for the task.
Our proposed approach shows competitive results on the leaderboard, ranking 12th out of 30 teams.
arXiv Detail & Related papers (2022-02-13T05:46:14Z)
- SynSetExpan: An Iterative Framework for Joint Entity Set Expansion and
Synonym Discovery [66.24624547470175]
SynSetExpan is a novel framework that enables two tasks to mutually enhance each other.
We create the first large-scale Synonym-Enhanced Set Expansion dataset via crowdsourcing.
Experiments on the SE2 dataset and previous benchmarks demonstrate the effectiveness of SynSetExpan for both entity set expansion and synonym discovery tasks.
arXiv Detail & Related papers (2020-09-29T07:32:17Z)
- Grounded Situation Recognition [56.18102368133022]
We introduce Grounded Situation Recognition (GSR), a task that requires producing structured semantic summaries of images.
GSR presents important technical challenges: identifying semantic saliency, categorizing and localizing a large and diverse set of entities.
We show initial findings on three exciting future directions enabled by our models: conditional querying, visual chaining, and grounded semantic aware image retrieval.
arXiv Detail & Related papers (2020-03-26T17:57:52Z)