Rosetta Stone at KSAA-RD Shared Task: A Hop From Language Modeling To Word–Definition Alignment
- URL: http://arxiv.org/abs/2310.15823v3
- Date: Sun, 21 Jan 2024 12:40:48 GMT
- Title: Rosetta Stone at KSAA-RD Shared Task: A Hop From Language Modeling To Word–Definition Alignment
- Authors: Ahmed ElBakry, Mohamed Gabr, Muhammad ElNokrashy, Badr AlKhamissi
- Abstract summary: This work focuses on deriving a vector representation of an Arabic word from its accompanying description.
For the first subtask, our approach relies on an ensemble of finetuned Arabic BERT-based models, predicting the word embedding for a given definition.
In contrast, the most effective solution for the second subtask involves translating the English test definitions into Arabic.
- Score: 2.6672466522084948
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: A Reverse Dictionary is a tool enabling users to discover a word based on its
provided definition, meaning, or description. Such a technique proves valuable
in various scenarios, aiding language learners who possess a description of a
word without its identity, and benefiting writers seeking precise terminology.
These scenarios often exemplify what is referred to as the
"Tip-of-the-Tongue" (TOT) phenomenon. In this work, we present our winning
solution for the Arabic Reverse Dictionary shared task. This task focuses on
deriving a vector representation of an Arabic word from its accompanying
description. The shared task encompasses two distinct subtasks: the first
involves an Arabic definition as input, while the second employs an English
definition. For the first subtask, our approach relies on an ensemble of
finetuned Arabic BERT-based models, predicting the word embedding for a given
definition. The final representation is obtained by averaging the output
embeddings from each model within the ensemble. In contrast, the most effective
solution for the second subtask involves translating the English test
definitions into Arabic and applying them to the finetuned models originally
trained for the first subtask. This straightforward method achieves the highest
score across both subtasks.
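A minimal sketch of the subtask-1 ensemble-averaging idea, assuming the Hugging Face transformers library; the checkpoint names are illustrative stand-ins for the fine-tuned Arabic BERT variants (the abstract does not pin exact models), and the [CLS] vector stands in for the fine-tuned embedding-prediction head:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Illustrative Arabic BERT checkpoints; the actual ensemble members are
# fine-tuned variants not specified in the abstract.
CHECKPOINTS = [
    "aubmindlab/bert-base-arabertv2",
    "CAMeL-Lab/bert-base-arabic-camelbert-mix",
]

@torch.no_grad()
def predict_word_embedding(definition: str) -> torch.Tensor:
    """Predict a word embedding from an Arabic definition, averaged over the ensemble."""
    predictions = []
    for name in CHECKPOINTS:
        tokenizer = AutoTokenizer.from_pretrained(name)
        model = AutoModel.from_pretrained(name).eval()
        batch = tokenizer(definition, return_tensors="pt", truncation=True)
        hidden = model(**batch).last_hidden_state   # (1, seq_len, dim)
        # Stand-in for the fine-tuned prediction head: take the [CLS] vector.
        predictions.append(hidden[:, 0].squeeze(0))  # (dim,)
    # The final representation is the mean of the ensemble members' outputs.
    return torch.stack(predictions).mean(dim=0)
```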
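The subtask-2 pipeline then reduces to translate-then-reuse: a sketch under the same assumptions, where the Helsinki-NLP/opus-mt-en-ar checkpoint is an assumed stand-in for whatever English-to-Arabic translator is used, and predict_word_embedding is the ensemble function sketched above:

```python
from transformers import pipeline

# Assumed EN->AR machine-translation model, not the one from the paper.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-ar")

def predict_from_english(definition_en: str):
    """Translate the English definition to Arabic, then reuse the subtask-1 ensemble."""
    definition_ar = translator(definition_en)[0]["translation_text"]
    return predict_word_embedding(definition_ar)
```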
Related papers
- Presence or Absence: Are Unknown Word Usages in Dictionaries? [6.185216877366987]
We evaluate our system on the AXOLOTL-24 shared task for Finnish, Russian, and German.
We use a graph-based clustering approach to predict mappings between unknown word usages and dictionary entries.
Our system ranks first in Finnish and German, and second in Russian, on the Subtask 2 test-phase leaderboard.
arXiv Detail & Related papers (2024-06-02T07:57:45Z) - Sõnajaht: Definition Embeddings and Semantic Search for Reverse Dictionary Creation [0.21485350418225246]
We present an information-retrieval-based reverse dictionary system using modern pre-trained language models and approximate nearest neighbors search algorithms.
The proposed approach is applied to an existing Estonian language lexicon resource, Sõnaveeb (word web), with the purpose of enhancing and enriching it by introducing cross-lingual reverse dictionary functionality powered by semantic search.
arXiv Detail & Related papers (2024-04-30T10:21:14Z) - Did You Read the Instructions? Rethinking the Effectiveness of Task
Definitions in Instruction Learning [74.70157466822612]
We systematically study the role of task definitions in instruction learning.
We find that model performance drops substantially when content describing the task output is removed.
We propose two strategies to help models better leverage task instructions.
arXiv Detail & Related papers (2023-06-01T21:11:24Z) - CompoundPiece: Evaluating and Improving Decompounding Performance of
Language Models [77.45934004406283]
We systematically study decompounding, the task of splitting compound words into their constituents.
We introduce a dataset of 255k compound and non-compound words across 56 diverse languages obtained from Wiktionary.
We introduce a novel methodology to train dedicated models for decompounding.
arXiv Detail & Related papers (2023-05-23T16:32:27Z) - Semeval-2022 Task 1: CODWOE -- Comparing Dictionaries and Word
Embeddings [1.5293427903448025]
We focus on relating opaque word vectors with human-readable definitions.
This problem naturally divides into two subtasks: converting definitions into embeddings, and converting embeddings into definitions.
This task was conducted in a multilingual setting, using comparable sets of embeddings trained homogeneously.
arXiv Detail & Related papers (2022-05-27T09:40:33Z) - IRB-NLP at SemEval-2022 Task 1: Exploring the Relationship Between Words
and Their Semantic Representations [0.0]
We present our findings based on the descriptive, exploratory, and predictive data analysis conducted on the CODWOE dataset.
We give a detailed overview of the systems that we designed for Definition Modeling and Reverse Dictionary tasks.
arXiv Detail & Related papers (2022-05-13T18:15:20Z) - A Unified Model for Reverse Dictionary and Definition Modelling [7.353994554197792]
We train a dual-way neural dictionary to guess words from definitions (reverse dictionary) and produce definitions given words (definition modelling).
Our method learns the two tasks simultaneously, and handles unknown words via embeddings.
It casts a word or a definition to the same representation space through a shared layer, then generates the other form from there, in a multi-task fashion.
arXiv Detail & Related papers (2022-05-09T23:52:39Z) - Connect-the-Dots: Bridging Semantics between Words and Definitions via
Aligning Word Sense Inventories [47.03271152494389]
Word Sense Disambiguation aims to automatically identify the exact meaning of one word according to its context.
Existing supervised models struggle to make correct predictions on rare word senses due to limited training data.
We propose a gloss alignment algorithm that can align definition sentences with the same meaning from different sense inventories to collect rich lexical knowledge.
arXiv Detail & Related papers (2021-10-27T00:04:33Z) - RUSSE'2020: Findings of the First Taxonomy Enrichment Task for the
Russian language [70.27072729280528]
This paper describes the results of the first shared task on taxonomy enrichment for the Russian language.
16 teams participated in the task, demonstrating strong results, with more than half of them outperforming the provided baseline.
arXiv Detail & Related papers (2020-05-22T13:30:37Z) - Words aren't enough, their order matters: On the Robustness of Grounding
Visual Referring Expressions [87.33156149634392]
We critically examine RefCOCOg, a standard benchmark for visual referring expression recognition.
We show that 83.7% of test instances do not require reasoning on linguistic structure.
We propose two methods, one based on contrastive learning and the other based on multi-task learning, to increase the robustness of ViLBERT.
arXiv Detail & Related papers (2020-05-04T17:09:15Z) - Lexical Sememe Prediction using Dictionary Definitions by Capturing
Local Semantic Correspondence [94.79912471702782]
Sememes, defined as the minimum semantic units of human languages, have been proven useful in many NLP tasks.
We propose a Sememe Correspondence Pooling (SCorP) model, which captures the local correspondence between definition words and sememes to predict sememes.
We evaluate our model and baseline methods on HowNet, a well-known sememe knowledge base, and find that our model achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-01-16T17:30:36Z)