USTC-NELSLIP at SemEval-2023 Task 2: Statistical Construction and Dual
Adaptation of Gazetteer for Multilingual Complex NER
- URL: http://arxiv.org/abs/2305.02517v1
- Date: Thu, 4 May 2023 03:00:46 GMT
- Title: USTC-NELSLIP at SemEval-2023 Task 2: Statistical Construction and Dual
Adaptation of Gazetteer for Multilingual Complex NER
- Authors: Jun-Yu Ma, Jia-Chen Gu, Jiajun Qi, Zhen-Hua Ling, Quan Liu, Xiaoyi
Zhao
- Abstract summary: This paper describes the system developed by the USTC-NELSLIP team for SemEval-2023 Task 2 Multilingual Complex Named Entity Recognition (MultiCoNER II).
The proposed method is applied to XLM-R with a gazetteer built from Wikidata, and shows great generalization ability across different tracks.
- Score: 36.39635200544498
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper describes the system developed by the USTC-NELSLIP team for
SemEval-2023 Task 2 Multilingual Complex Named Entity Recognition (MultiCoNER
II). A method named Statistical Construction and Dual Adaptation of Gazetteer
(SCDAG) is proposed for Multilingual Complex NER. The method first uses a
statistics-based approach to construct a gazetteer. Second, the
representations of the gazetteer network and the language model are adapted by
minimizing the KL divergence between them at both the sentence level and the
entity level. Finally, the two networks are integrated for supervised
named entity recognition (NER) training. The proposed method is applied to
XLM-R with a gazetteer built from Wikidata, and shows great generalization
ability across different tracks. Experimental results and detailed analysis
verify the effectiveness of the proposed method. The official results show that
our system ranked 1st on one track (Hindi) in this task.
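To make the dual adaptation step concrete, the following is a minimal sketch of how sentence-level and entity-level KL alignment between a gazetteer network and a language model could look in PyTorch. All names, the pooling choices, and the symmetric form of the KL term are illustrative assumptions; this is not the authors' released implementation.
```python
# Minimal sketch of the dual (sentence- and entity-level) KL adaptation
# described in the abstract. Pooling choices and the symmetric KL form are
# illustrative assumptions, not the authors' code.
import torch
import torch.nn.functional as F


def kl_between(p_logits: torch.Tensor, q_logits: torch.Tensor) -> torch.Tensor:
    """Symmetric KL divergence between two representations, treated as
    distributions via a softmax (an assumption of this sketch)."""
    p = F.log_softmax(p_logits, dim=-1)
    q = F.log_softmax(q_logits, dim=-1)
    return 0.5 * (
        F.kl_div(p, q, log_target=True, reduction="batchmean")
        + F.kl_div(q, p, log_target=True, reduction="batchmean")
    )


def dual_adaptation_loss(lm_hidden, gaz_hidden, entity_spans):
    """lm_hidden, gaz_hidden: (batch, seq_len, dim) token representations from
    the language model (e.g. XLM-R) and the gazetteer network.
    entity_spans: list of (batch_idx, start, end) gold entity spans."""
    # Sentence-level adaptation: mean-pool tokens, then align the two networks.
    sent_loss = kl_between(lm_hidden.mean(dim=1), gaz_hidden.mean(dim=1))

    # Entity-level adaptation: pool each gold entity span and align the pair.
    ent_losses = []
    for b, start, end in entity_spans:
        lm_ent = lm_hidden[b, start:end].mean(dim=0)
        gaz_ent = gaz_hidden[b, start:end].mean(dim=0)
        ent_losses.append(kl_between(lm_ent.unsqueeze(0), gaz_ent.unsqueeze(0)))
    ent_loss = torch.stack(ent_losses).mean() if ent_losses else sent_loss.new_zeros(())

    return sent_loss + ent_loss
```
In the full system described above, such an adaptation term would be combined with the supervised NER objective once the two networks are integrated.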
Related papers
- Dual-Alignment Pre-training for Cross-lingual Sentence Embedding [79.98111074307657]
We propose a dual-alignment pre-training (DAP) framework for cross-lingual sentence embedding.
We introduce a novel representation translation learning (RTL) task, where the model learns to use one side's contextualized token representations to reconstruct their translation counterparts.
Our approach significantly improves sentence embeddings.
arXiv Detail & Related papers (2023-05-16T03:53:30Z)
- Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944]
A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data.
We design a simple but effective ensemble-based framework that combines various transfer learning techniques.
We also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
arXiv Detail & Related papers (2023-01-22T18:22:55Z)
- DualNER: A Dual-Teaching framework for Zero-shot Cross-lingual Named
Entity Recognition [27.245171237640502]
DualNER is a framework that makes full use of both the annotated source-language corpus and unlabeled target-language text.
We combine two complementary learning paradigms of NER, i.e., sequence labeling and span prediction, into a unified multi-task framework.
arXiv Detail & Related papers (2022-11-15T12:50:59Z)
- CROP: Zero-shot Cross-lingual Named Entity Recognition with Multilingual
Labeled Sequence Translation [113.99145386490639]
Cross-lingual NER can transfer knowledge between languages via aligned cross-lingual representations or machine translation results.
We propose a Cross-lingual Entity Projection framework (CROP) to enable zero-shot cross-lingual NER.
We adopt a multilingual labeled sequence translation model to project the tagged sequence back to the target language and label the target raw sentence.
arXiv Detail & Related papers (2022-10-13T13:32:36Z)
- A Dual-Contrastive Framework for Low-Resource Cross-Lingual Named Entity
Recognition [5.030581940990434]
Cross-lingual Named Entity Recognition (NER) has recently become a research hotspot because it can alleviate the data scarcity problem for low-resource languages.
In this paper, we describe our novel dual-contrastive framework ConCNER for cross-lingual NER under the scenario of limited source-language labeled data.
arXiv Detail & Related papers (2022-04-02T07:59:13Z)
- USTC-NELSLIP at SemEval-2022 Task 11: Gazetteer-Adapted Integration
Network for Multilingual Complex Named Entity Recognition [41.26523047041553]
This paper describes the system developed by the USTC-NELSLIP team for SemEval-2022 Task 11 Multilingual Complex Named Entity Recognition (MultiCoNER).
We propose a gazetteer-adapted integration network (GAIN) to improve the performance of language models for recognizing complex named entities.
arXiv Detail & Related papers (2022-03-07T09:05:37Z)
- An Attention Ensemble Approach for Efficient Text Classification of
Indian Languages [0.0]
This paper focuses on the coarse-grained technical domain identification of short text documents in Marathi, a Devanagari script-based Indian language.
A hybrid CNN-BiLSTM attention ensemble model is proposed that competently combines the intermediate sentence representations generated by the convolutional neural network and the bidirectional long short-term memory, leading to efficient text classification.
Experimental results show that the proposed model outperforms various baseline machine learning and deep learning models in the given task, giving the best validation accuracy of 89.57% and f1-score of 0.8875.
arXiv Detail & Related papers (2021-02-20T07:31:38Z)
- Explicit Alignment Objectives for Multilingual Bidirectional Encoders [111.65322283420805]
We present a new method for learning multilingual encoders, AMBER (Aligned Multilingual Bi-directional EncodeR).
AMBER is trained on additional parallel data using two explicit alignment objectives that align the multilingual representations at different granularities.
Experimental results show that AMBER obtains gains of up to 1.1 average F1 score on sequence tagging and up to 27.3 average accuracy on retrieval over the XLM-R-large model.
arXiv Detail & Related papers (2020-10-15T18:34:13Z)
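For intuition only, the sketch below illustrates a generic sentence-level alignment objective on parallel data, of the kind such explicit alignment methods build on. It is a common contrastive formulation with hypothetical names and hyperparameters, not AMBER's actual training objective or code.
```python
# Generic illustration of a sentence-level alignment term on parallel data.
# This is a standard contrastive formulation and is NOT claimed to be AMBER's
# exact objective; function name and temperature are illustrative assumptions.
import torch
import torch.nn.functional as F


def sentence_alignment_loss(src_repr: torch.Tensor,
                            tgt_repr: torch.Tensor,
                            temperature: float = 0.05) -> torch.Tensor:
    """src_repr, tgt_repr: (batch, dim) pooled encodings of translation pairs,
    where row i of each tensor encodes the same sentence in two languages."""
    src = F.normalize(src_repr, dim=-1)
    tgt = F.normalize(tgt_repr, dim=-1)
    logits = src @ tgt.t() / temperature          # similarity of every src/tgt pair
    labels = torch.arange(src.size(0), device=src.device)
    # Each source sentence should retrieve its own translation, and vice versa.
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))
```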