SemEval-2022 Task 2: Multilingual Idiomaticity Detection and Sentence
Embedding
- URL: http://arxiv.org/abs/2204.10050v1
- Date: Thu, 21 Apr 2022 12:20:52 GMT
- Title: SemEval-2022 Task 2: Multilingual Idiomaticity Detection and Sentence
Embedding
- Authors: Harish Tayyar Madabushi, Edward Gow-Smith, Marcos Garcia, Carolina
Scarton, Marco Idiart, Aline Villavicencio
- Abstract summary: This paper presents the shared task on Multilingualaticity Detection and Sentence Embedding.
It consists of two subtasks: (a) a binary classification one aimed at identifying whether a sentence contains an idiomatic expression, and (b) a task based on semantic text similarity which requires the model to adequately represent potentially idiomatic expressions in context.
The task had close to 100 registered participants organised into twenty five teams making over 650 and 150 submissions in the practice and evaluation phases respectively.
- Score: 12.843166994677286
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: This paper presents the shared task on Multilingual Idiomaticity Detection
and Sentence Embedding, which consists of two subtasks: (a) a binary
classification one aimed at identifying whether a sentence contains an
idiomatic expression, and (b) a task based on semantic text similarity which
requires the model to adequately represent potentially idiomatic expressions in
context. Each subtask includes different settings regarding the amount of
training data. Besides the task description, this paper introduces the datasets
in English, Portuguese, and Galician and their annotation procedure, the
evaluation metrics, and a summary of the participant systems and their results.
The task had close to 100 registered participants organised into twenty five
teams making over 650 and 150 submissions in the practice and evaluation phases
respectively.
Related papers
- SemEval-2024 Task 8: Multidomain, Multimodel and Multilingual Machine-Generated Text Detection [68.858931667807]
Subtask A is a binary classification task determining whether a text is written by a human or generated by a machine.
Subtask B is to detect the exact source of a text, discerning whether it is written by a human or generated by a specific LLM.
Subtask C aims to identify the changing point within a text, at which the authorship transitions from human to machine.
arXiv Detail & Related papers (2024-04-22T13:56:07Z) - IITK at SemEval-2024 Task 1: Contrastive Learning and Autoencoders for Semantic Textual Relatedness in Multilingual Texts [4.78482610709922]
This paper describes our system developed for the SemEval-2024 Task 1: Semantic Textual Relatedness.
The challenge is focused on automatically detecting the degree of relatedness between pairs of sentences for 14 languages.
arXiv Detail & Related papers (2024-04-06T05:58:42Z) - AAdaM at SemEval-2024 Task 1: Augmentation and Adaptation for Multilingual Semantic Textual Relatedness [16.896143197472114]
This paper presents our system developed for the SemEval-2024 Task 1: Semantic Textual Relatedness for African and Asian languages.
We propose using machine translation for data augmentation to address the low-resource challenge of limited training data.
We achieve competitive results in the shared task: our system performs the best among all ranked teams in both subtask A (supervised learning) and subtask C (cross-lingual transfer)
arXiv Detail & Related papers (2024-04-01T21:21:15Z) - SemEval-2022 Task 7: Identifying Plausible Clarifications of Implicit
and Underspecified Phrases in Instructional Texts [1.3586926359715774]
We describe SemEval-2022 Task 7, a shared task on rating the plausibility of clarifications in instructional texts.
The dataset for this task consists of manually clarified how-to guides for which we generated alternative clarifications and collected human plausibility judgements.
The task of participating systems was to automatically determine the plausibility of a clarification in the respective context.
arXiv Detail & Related papers (2023-09-21T14:19:04Z) - RuArg-2022: Argument Mining Evaluation [69.87149207721035]
This paper is a report of the organizers on the first competition of argumentation analysis systems dealing with Russian language texts.
A corpus containing 9,550 sentences (comments on social media posts) on three topics related to the COVID-19 pandemic was prepared.
The system that won the first place in both tasks used the NLI (Natural Language Inference) variant of the BERT architecture.
arXiv Detail & Related papers (2022-06-18T17:13:37Z) - OCHADAI at SemEval-2022 Task 2: Adversarial Training for Multilingual
Idiomaticity Detection [4.111899441919165]
We propose a multilingual adversarial training model for determining whether a sentence contains an idiomatic expression.
Our model relies on pre-trained contextual representations from different multi-lingual state-of-the-art transformer-based language models.
arXiv Detail & Related papers (2022-06-07T05:52:43Z) - Benchmarking Generalization via In-Context Instructions on 1,600+
Language Tasks [95.06087720086133]
Natural-Instructions v2 is a collection of 1,600+ diverse language tasks and their expert written instructions.
The benchmark covers 70+ distinct task types, such as tagging, in-filling, and rewriting.
This benchmark enables large-scale evaluation of cross-task generalization of the models.
arXiv Detail & Related papers (2022-04-16T03:12:30Z) - Handshakes AI Research at CASE 2021 Task 1: Exploring different
approaches for multilingual tasks [0.22940141855172036]
The aim of the CASE 2021 Shared Task 1 was to detect and classify socio-political and crisis event information in a multilingual setting.
Our submission contained entries in all of the subtasks, and the scores obtained validated our research finding.
arXiv Detail & Related papers (2021-10-29T07:58:49Z) - SemEval-2020 Task 10: Emphasis Selection for Written Text in Visual
Media [50.29389719723529]
We present the main findings and compare the results of SemEval-2020 Task 10, Emphasis Selection for Written Text in Visual Media.
The goal of this shared task is to design automatic methods for emphasis selection.
The analysis of systems submitted to the task indicates that BERT and RoBERTa were the most common choice of pre-trained models used.
arXiv Detail & Related papers (2020-08-07T17:24:53Z) - BURT: BERT-inspired Universal Representation from Twin Structure [89.82415322763475]
BURT (BERT inspired Universal Representation from Twin Structure) is capable of generating universal, fixed-size representations for input sequences of any granularity.
Our proposed BURT adopts the Siamese network, learning sentence-level representations from natural language inference dataset and word/phrase-level representations from paraphrasing dataset.
We evaluate BURT across different granularities of text similarity tasks, including STS tasks, SemEval2013 Task 5(a) and some commonly used word similarity tasks.
arXiv Detail & Related papers (2020-04-29T04:01:52Z) - A Survey on Contextual Embeddings [48.04732268018772]
Contextual embeddings assign each word a representation based on its context, capturing uses of words across varied contexts and encoding knowledge that transfers across languages.
We review existing contextual embedding models, cross-lingual polyglot pre-training, the application of contextual embeddings in downstream tasks, model compression, and model analyses.
arXiv Detail & Related papers (2020-03-16T15:22:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.