DAMO-NLP at SemEval-2023 Task 2: A Unified Retrieval-augmented System
for Multilingual Named Entity Recognition
- URL: http://arxiv.org/abs/2305.03688v3
- Date: Wed, 17 May 2023 03:14:00 GMT
- Title: DAMO-NLP at SemEval-2023 Task 2: A Unified Retrieval-augmented System
for Multilingual Named Entity Recognition
- Authors: Zeqi Tan, Shen Huang, Zixia Jia, Jiong Cai, Yinghui Li, Weiming Lu,
Yueting Zhuang, Kewei Tu, Pengjun Xie, Fei Huang and Yong Jiang
- Abstract summary: The MultiCoNER RNum2 shared task aims to tackle multilingual named entity recognition (NER) in fine-grained and noisy scenarios.
Previous top systems in the MultiCoNER RNum1 either incorporate the knowledge bases or gazetteers.
We propose a unified retrieval-augmented system (U-RaNER) for fine-grained multilingual NER.
- Score: 94.90258603217008
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The MultiCoNER \RNum{2} shared task aims to tackle multilingual named entity
recognition (NER) in fine-grained and noisy scenarios, and it inherits the
semantic ambiguity and low-context setting of the MultiCoNER \RNum{1} task. To
cope with these problems, the previous top systems in the MultiCoNER \RNum{1}
either incorporate the knowledge bases or gazetteers. However, they still
suffer from insufficient knowledge, limited context length, single retrieval
strategy. In this paper, our team \textbf{DAMO-NLP} proposes a unified
retrieval-augmented system (U-RaNER) for fine-grained multilingual NER. We
perform error analysis on the previous top systems and reveal that their
performance bottleneck lies in insufficient knowledge. Also, we discover that
the limited context length causes the retrieval knowledge to be invisible to
the model. To enhance the retrieval context, we incorporate the entity-centric
Wikidata knowledge base, while utilizing the infusion approach to broaden the
contextual scope of the model. Also, we explore various search strategies and
refine the quality of retrieval knowledge. Our system\footnote{We will release
the dataset, code, and scripts of our system at {\small
\url{https://github.com/modelscope/AdaSeq/tree/master/examples/U-RaNER}}.} wins
9 out of 13 tracks in the MultiCoNER \RNum{2} shared task. Additionally, we
compared our system with ChatGPT, one of the large language models which have
unlocked strong capabilities on many tasks. The results show that there is
still much room for improvement for ChatGPT on the extraction task.
Related papers
- CUNI Submission to MRL 2023 Shared Task on Multi-lingual Multi-task
Information Retrieval [5.97515243922116]
We present the Charles University system for the MRL2023 Shared Task on Multi-lingual Multi-task Information Retrieval.
The goal of the shared task was to develop systems for named entity recognition and question answering in several under-represented languages.
Our solutions to both subtasks rely on the translate-test approach.
arXiv Detail & Related papers (2023-10-25T10:22:49Z) - Learning to Rank Context for Named Entity Recognition Using a Synthetic Dataset [6.633914491587503]
We propose to generate a synthetic context retrieval training dataset using Alpaca.
Using this dataset, we train a neural context retriever based on a BERT model that is able to find relevant context for NER.
We show that our method outperforms several retrieval baselines for the NER task on an English literary dataset composed of the first chapter of 40 books.
arXiv Detail & Related papers (2023-10-16T06:53:12Z) - FonMTL: Towards Multitask Learning for the Fon Language [1.9370453715137865]
We present the first explorative approach to multitask learning, for model capabilities enhancement in Natural Language Processing for the Fon language.
We leverage two language model heads as encoders to build shared representations for the inputs, and we use linear layers blocks for classification relative to each task.
Our results on the NER and POS tasks for Fon, show competitive (or better) performances compared to several multilingual pretrained language models finetuned on single tasks.
arXiv Detail & Related papers (2023-08-28T03:26:21Z) - End-to-end Knowledge Retrieval with Multi-modal Queries [50.01264794081951]
ReMuQ requires a system to retrieve knowledge from a large corpus by integrating contents from both text and image queries.
We introduce a retriever model ReViz'' that can directly process input text and images to retrieve relevant knowledge in an end-to-end fashion.
We demonstrate superior performance in retrieval on two datasets under zero-shot settings.
arXiv Detail & Related papers (2023-06-01T08:04:12Z) - IXA/Cogcomp at SemEval-2023 Task 2: Context-enriched Multilingual Named
Entity Recognition using Knowledge Bases [53.054598423181844]
We present a novel NER cascade approach comprising three steps.
We empirically demonstrate the significance of external knowledge bases in accurately classifying fine-grained and emerging entities.
Our system exhibits robust performance in the MultiCoNER2 shared task, even in the low-resource language setting.
arXiv Detail & Related papers (2023-04-20T20:30:34Z) - UniKGQA: Unified Retrieval and Reasoning for Solving Multi-hop Question
Answering Over Knowledge Graph [89.98762327725112]
Multi-hop Question Answering over Knowledge Graph(KGQA) aims to find the answer entities that are multiple hops away from the topic entities mentioned in a natural language question.
We propose UniKGQA, a novel approach for multi-hop KGQA task, by unifying retrieval and reasoning in both model architecture and parameter learning.
arXiv Detail & Related papers (2022-12-02T04:08:09Z) - MultiCoNER: A Large-scale Multilingual dataset for Complex Named Entity
Recognition [15.805414696789796]
We present MultiCoNER, a large multilingual dataset for Named Entity Recognition that covers 3 domains (Wiki sentences, questions, and search queries) across 11 languages.
This dataset is designed to represent contemporary challenges in NER, including low-context scenarios.
arXiv Detail & Related papers (2022-08-30T20:45:54Z) - UnifieR: A Unified Retriever for Large-Scale Retrieval [84.61239936314597]
Large-scale retrieval is to recall relevant documents from a huge collection given a query.
Recent retrieval methods based on pre-trained language models (PLM) can be coarsely categorized into either dense-vector or lexicon-based paradigms.
We propose a new learning framework, UnifieR which unifies dense-vector and lexicon-based retrieval in one model with a dual-representing capability.
arXiv Detail & Related papers (2022-05-23T11:01:59Z) - DAMO-NLP at SemEval-2022 Task 11: A Knowledge-based System for
Multilingual Named Entity Recognition [94.1865071914727]
MultiCoNER aims at detecting semantically ambiguous named entities in short and low-context settings for multiple languages.
Our team DAMO-NLP proposes a knowledge-based system, where we build a multilingual knowledge base based on Wikipedia.
Given an input sentence, our system effectively retrieves related contexts from the knowledge base.
Our system wins 10 out of 13 tracks in the MultiCoNER shared task.
arXiv Detail & Related papers (2022-03-01T15:29:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.