DaN+: Danish Nested Named Entities and Lexical Normalization
- URL: http://arxiv.org/abs/2105.11301v1
- Date: Mon, 24 May 2021 14:35:21 GMT
- Authors: Barbara Plank, Kristian Nørgaard Jensen and Rob van der Goot
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces DaN+, a new multi-domain corpus and annotation
guidelines for Danish nested named entities (NEs) and lexical normalization to
support research on cross-lingual cross-domain learning for a less-resourced
language. We empirically assess three strategies to model the two-layer Named
Entity Recognition (NER) task. We compare transfer capabilities from German
versus in-language annotation from scratch. We examine language-specific versus
multilingual BERT, and study the effect of lexical normalization on NER. Our
results show that 1) the most robust strategy is multi-task learning, which is
rivaled by multi-label decoding, 2) BERT-based NER models are sensitive to
domain shifts, and 3) in-language BERT and lexical normalization are the most
beneficial on the least canonical data. Our results also show that an
out-of-domain setup remains challenging, while performance on news plateaus
quickly. This highlights the importance of cross-domain evaluation of
cross-lingual transfer.
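The two strategies named in the abstract differ in how the two annotation layers are represented. A minimal sketch (not the authors' code; tags and the example sentence are illustrative) of how two-layer BIO annotations can be fused into joint tags for multi-label decoding, or kept as separate sequences for a multi-task setup with one tagging head per layer:

```python
def to_multilabel(outer_tags, inner_tags):
    """Multi-label decoding: fuse the two BIO layers into one joint tag per token."""
    return [f"{o}|{i}" for o, i in zip(outer_tags, inner_tags)]

def to_multitask(joint_tags):
    """Multi-task learning instead keeps the layers separate, one head per layer."""
    outer, inner = zip(*(t.split("|") for t in joint_tags))
    return list(outer), list(inner)

# Illustrative nesting: an ORG whose first token also contains a LOC.
tokens = ["Copenhagen", "Business", "School"]
outer  = ["B-ORG", "I-ORG", "I-ORG"]
inner  = ["B-LOC", "O", "O"]

joint = to_multilabel(outer, inner)
# joint == ["B-ORG|B-LOC", "I-ORG|O", "I-ORG|O"]
assert to_multitask(joint) == (outer, inner)
```

The trade-off is that the joint label space grows with the product of the two layers' tag sets, while the multi-task variant keeps each head's label space small at the cost of coordinating two predictions per token.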
Related papers
- mCL-NER: Cross-Lingual Named Entity Recognition via Multi-view Contrastive Learning (arXiv, 2023-08-17)
  Cross-lingual named entity recognition (CrossNER) faces challenges stemming from uneven performance due to the scarcity of multilingual corpora.
  We propose Multi-view Contrastive Learning for Cross-lingual Named Entity Recognition (mCL-NER).
  Our experiments on the XTREME benchmark, spanning 40 languages, demonstrate the superiority of mCL-NER over prior data-driven and model-based approaches.
- IXA/Cogcomp at SemEval-2023 Task 2: Context-enriched Multilingual Named Entity Recognition using Knowledge Bases (arXiv, 2023-04-20)
  We present a novel NER cascade approach comprising three steps.
  We empirically demonstrate the significance of external knowledge bases in accurately classifying fine-grained and emerging entities.
  Our system exhibits robust performance in the MultiCoNER2 shared task, even in the low-resource language setting.
- Few-Shot Nested Named Entity Recognition (arXiv, 2022-12-02)
  This paper is the first dedicated to studying the few-shot nested NER task.
  We propose a Biaffine-based Contrastive Learning (BCL) framework that learns contextual dependencies to distinguish nested entities.
  BCL outperformed three baseline models on the 1-shot and 5-shot tasks in terms of F1 score.
- DualNER: A Dual-Teaching Framework for Zero-shot Cross-lingual Named Entity Recognition (arXiv, 2022-11-15)
  DualNER is a framework that makes full use of both an annotated source-language corpus and unlabeled target-language text.
  We combine two complementary learning paradigms of NER, i.e., sequence labeling and span prediction, into a unified multi-task framework.
- CROP: Zero-shot Cross-lingual Named Entity Recognition with Multilingual Labeled Sequence Translation (arXiv, 2022-10-13)
  Cross-lingual NER can transfer knowledge between languages via aligned cross-lingual representations or machine translation results.
  We propose a Cross-lingual Entity Projection framework (CROP) to enable zero-shot cross-lingual NER.
  We adopt a multilingual labeled sequence translation model to project the tagged sequence back to the target language and label the target raw sentence.
- A Dual-Contrastive Framework for Low-Resource Cross-Lingual Named Entity Recognition (arXiv, 2022-04-02)
  Cross-lingual Named Entity Recognition (NER) has recently become a research hotspot because it can alleviate the data-hungry problem for low-resource languages.
  In this paper, we describe our novel dual-contrastive framework, ConCNER, for cross-lingual NER under the scenario of limited source-language labeled data.
- On Cross-Lingual Retrieval with Multilingual Text Encoders (arXiv, 2021-12-21)
  We study the suitability of state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks.
  We benchmark their performance in unsupervised ad-hoc sentence- and document-level CLIR experiments.
  We evaluate multilingual encoders fine-tuned in a supervised fashion (i.e., learning to rank) on English relevance data in a series of zero-shot language and domain transfer CLIR experiments.
- Generalised Unsupervised Domain Adaptation of Neural Machine Translation with Cross-Lingual Data Selection (arXiv, 2021-09-09)
  We propose a cross-lingual data selection method to extract in-domain sentences for the missing language side from a large generic monolingual corpus.
  Our proposed method trains an adaptive layer on top of multilingual BERT by contrastive learning to align the representations of the source and target languages.
  We evaluate our cross-lingual data selection method on NMT across five diverse domains in three language pairs, as well as a real-world scenario of translation for COVID-19.
- Learning Domain-Specialised Representations for Cross-Lingual Biomedical Entity Linking (arXiv, 2021-05-30)
  We propose a novel cross-lingual biomedical entity linking task (XL-BEL).
  We first investigate the ability of standard knowledge-agnostic as well as knowledge-enhanced monolingual and multilingual LMs beyond the standard monolingual English BEL task.
  We then address the challenge of transferring domain-specific knowledge in resource-rich languages to resource-poor ones.
- GATE: Graph Attention Transformer Encoder for Cross-lingual Relation and Event Extraction (arXiv, 2020-10-06)
  We introduce graph convolutional networks (GCNs) with universal dependency parses to learn language-agnostic sentence representations.
  GCNs struggle to model words with long-range dependencies or words that are not directly connected in the dependency tree.
  We propose to utilize the self-attention mechanism to learn the dependencies between words at different syntactic distances.
This list is automatically generated from the titles and abstracts of the papers in this site.