kpfriends at SemEval-2022 Task 2: NEAMER -- Named Entity Augmented
Multi-word Expression Recognizer
- URL: http://arxiv.org/abs/2204.08102v1
- Date: Sun, 17 Apr 2022 22:58:33 GMT
- Title: kpfriends at SemEval-2022 Task 2: NEAMER -- Named Entity Augmented
Multi-word Expression Recognizer
- Authors: Min Sik Oh
- Abstract summary: This system is inspired by the non-compositionality characteristics shared between named entities and idiomatic expressions.
We achieve SOTA with an F1 of 0.9395 during the post-evaluation phase and observe improved training stability.
Lastly, we experiment with non-compositionality knowledge transfer, cross-lingual fine-tuning and locality features.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present NEAMER -- Named Entity Augmented Multi-word Expression Recognizer.
This system is inspired by the non-compositionality characteristics shared between
named entities and idiomatic expressions. We utilize transfer learning and
locality features to enhance the idiom classification task. This system is our
submission to the SemEval-2022 Task 2 (Multilingual Idiomaticity Detection and
Sentence Embedding) Subtask A OneShot shared task. We achieve SOTA with an F1 of
0.9395 during the post-evaluation phase. We also observe improved training stability.
Lastly, we experiment with non-compositionality knowledge transfer,
cross-lingual fine-tuning and locality features, which we also introduce in
this paper.
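As a rough sketch of the transfer-learning recipe described above (assuming a HuggingFace transformers workflow and an xlm-roberta-base backbone; the backbone, label counts, directory names, and two-stage details below are illustrative assumptions, not the authors' exact pipeline), an encoder can first be fine-tuned on NER and then reused to initialize the idiomaticity classifier:

```python
# Illustrative two-stage transfer: NER fine-tuning, then idiomaticity
# classification. Model name, label counts, and directory names are
# assumptions for this sketch; the training loops are omitted.
from transformers import (
    AutoModelForSequenceClassification,
    AutoModelForTokenClassification,
    AutoTokenizer,
)

BASE = "xlm-roberta-base"  # any multilingual encoder could stand in here
tokenizer = AutoTokenizer.from_pretrained(BASE)

# Stage 1: fine-tune the encoder on a named entity recognition dataset
# (token classification). The actual training loop, e.g. with
# transformers.Trainer, is omitted in this sketch.
ner_model = AutoModelForTokenClassification.from_pretrained(BASE, num_labels=9)
# ... train ner_model on NER data here ...
ner_model.save_pretrained("ner-finetuned")
tokenizer.save_pretrained("ner-finetuned")

# Stage 2: reuse the NER-tuned encoder for binary idiomaticity
# classification. The token-classification head is discarded and a fresh
# sequence-classification head is initialized on top of the transferred
# encoder weights (transformers will warn about the newly initialized head).
idiom_model = AutoModelForSequenceClassification.from_pretrained(
    "ner-finetuned", num_labels=2
)
# ... fine-tune idiom_model on SemEval-2022 Task 2 Subtask A data,
# optionally adding locality features as extra inputs ...
```

The point of the second stage is that the encoder weights carry over from the NER checkpoint while only the classification head is re-initialized, which is one simple way to realize the kind of non-compositionality knowledge transfer the abstract refers to.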
Related papers
- Presence or Absence: Are Unknown Word Usages in Dictionaries? [6.185216877366987]
We evaluate our system in the AXOLOTL-24 shared task for Finnish, Russian and German languages.
We use a graph-based clustering approach to predict mappings between unknown word usages and dictionary entries.
Our system ranks first in Finnish and German, and second in Russian on the Subtask 2 test phase leaderboard.
arXiv Detail & Related papers (2024-06-02T07:57:45Z) - A Tale of Two Languages: Large-Vocabulary Continuous Sign Language Recognition from Spoken Language Supervision [74.972172804514]
We introduce a multi-task Transformer model, CSLR2, that is able to ingest a signing sequence and output in a joint embedding space between signed language and spoken language text.
New dataset annotations provide continuous sign-level annotations for six hours of test videos, and will be made publicly available.
Our model significantly outperforms the previous state of the art on both tasks.
arXiv Detail & Related papers (2024-05-16T17:19:06Z) - Beyond Shared Vocabulary: Increasing Representational Word Similarities
across Languages for Multilingual Machine Translation [9.794506112999823]
In this paper, we define word-level information transfer pathways via word equivalence classes and rely on graph networks to fuse word embeddings across languages.
Our experiments demonstrate the advantages of our approach: 1) embeddings of words with similar meanings are better aligned across languages, 2) our method achieves consistent BLEU improvements of up to 2.3 points for high- and low-resource MNMT, and 3) less than 1.0% additional trainable parameters are required with a limited increase in computational costs.
arXiv Detail & Related papers (2023-05-23T16:11:00Z) - DAMO-NLP at SemEval-2023 Task 2: A Unified Retrieval-augmented System
for Multilingual Named Entity Recognition [94.90258603217008]
The MultiCoNER II shared task aims to tackle multilingual named entity recognition (NER) in fine-grained and noisy scenarios.
Previous top systems in MultiCoNER I incorporate either knowledge bases or gazetteers.
We propose a unified retrieval-augmented system (U-RaNER) for fine-grained multilingual NER.
arXiv Detail & Related papers (2023-05-05T16:59:26Z) - IXA/Cogcomp at SemEval-2023 Task 2: Context-enriched Multilingual Named
Entity Recognition using Knowledge Bases [53.054598423181844]
We present a novel NER cascade approach comprising three steps.
We empirically demonstrate the significance of external knowledge bases in accurately classifying fine-grained and emerging entities.
Our system exhibits robust performance in the MultiCoNER2 shared task, even in the low-resource language setting.
arXiv Detail & Related papers (2023-04-20T20:30:34Z) - Multilingual Word Sense Disambiguation with Unified Sense Representation [55.3061179361177]
We propose building knowledge-based and supervision-based Multilingual Word Sense Disambiguation (MWSD) systems.
We build unified sense representations for multiple languages and address the annotation scarcity problem for MWSD by transferring annotations from rich-sourced languages to poorer ones.
Evaluations on the SemEval-13 and SemEval-15 datasets demonstrate the effectiveness of our methodology.
arXiv Detail & Related papers (2022-10-14T01:24:03Z) - DAMO-NLP at SemEval-2022 Task 11: A Knowledge-based System for
Multilingual Named Entity Recognition [94.1865071914727]
MultiCoNER aims at detecting semantically ambiguous named entities in short and low-context settings for multiple languages.
Our team DAMO-NLP proposes a knowledge-based system, where we build a multilingual knowledge base based on Wikipedia.
Given an input sentence, our system effectively retrieves related contexts from the knowledge base.
Our system wins 10 out of 13 tracks in the MultiCoNER shared task.
arXiv Detail & Related papers (2022-03-01T15:29:35Z) - Wiki to Automotive: Understanding the Distribution Shift and its impact
on Named Entity Recognition [0.0]
Transfer learning is often unable to replicate the performance of pre-trained models on text of niche domains like Automotive.
We focus on performing the Named Entity Recognition (NER) task as it requires strong lexical, syntactic and semantic understanding by the model.
Fine-tuning the language models with automotive domain text did not make significant improvements to the NER performance.
arXiv Detail & Related papers (2021-12-01T05:13:47Z) - Interpretability Analysis for Named Entity Recognition to Understand
System Predictions and How They Can Improve [49.878051587667244]
We examine the performance of several variants of LSTM-CRF architectures for named entity recognition.
We find that context representations do contribute to system performance, but that the main factor driving high performance is learning the name tokens themselves.
We enlist human annotators to evaluate whether entity types can be inferred from context alone and find that, for the majority of errors made by the context-only system, people are also unable to infer the entity type, although there is some room for improvement.
arXiv Detail & Related papers (2020-04-09T14:37:12Z)
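The last entry above examines LSTM-CRF architectures for NER. A minimal BiLSTM-CRF tagger along those lines can be sketched as follows (assuming PyTorch and the pytorch-crf package; the vocabulary size, tag set, and dimensions are illustrative assumptions, not taken from that paper):

```python
# A minimal BiLSTM-CRF tagger sketch. Embedding/hidden sizes, vocabulary,
# and the 9-label tag set are placeholder assumptions.
import torch
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf


class BiLSTMCRF(nn.Module):
    def __init__(self, vocab_size: int, num_tags: int,
                 embed_dim: int = 100, hidden_dim: int = 256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim // 2, batch_first=True,
                            bidirectional=True)
        self.hidden2tag = nn.Linear(hidden_dim, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def _emissions(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Per-token emission scores over the tag set.
        lstm_out, _ = self.lstm(self.embedding(token_ids))
        return self.hidden2tag(lstm_out)

    def loss(self, token_ids, tags, mask):
        # Negative log-likelihood of the gold tag sequence under the CRF.
        return -self.crf(self._emissions(token_ids), tags, mask=mask)

    def decode(self, token_ids, mask):
        # Viterbi decoding of the most likely tag sequence.
        return self.crf.decode(self._emissions(token_ids), mask=mask)


# Tiny usage example with random data (batch of 2, max length 5).
model = BiLSTMCRF(vocab_size=1000, num_tags=9)
tokens = torch.randint(1, 1000, (2, 5))
tags = torch.randint(0, 9, (2, 5))
mask = torch.tensor([[1, 1, 1, 1, 1], [1, 1, 1, 0, 0]], dtype=torch.bool)
print(model.loss(tokens, tags, mask))   # training objective
print(model.decode(tokens, mask))       # predicted tag index sequences
```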
This list is automatically generated from the titles and abstracts of the papers in this site.