Overview of ADoBo 2021: Automatic Detection of Unassimilated Borrowings in the Spanish Press
- URL: http://arxiv.org/abs/2110.15682v1
- Date: Fri, 29 Oct 2021 11:07:59 GMT
- Title: Overview of ADoBo 2021: Automatic Detection of Unassimilated Borrowings in the Spanish Press
- Authors: Elena Álvarez Mellado, Luis Espinosa Anke, Julio Gonzalo Arroyo, Constantine Lignos, Jordi Porta Zamorano
- Abstract summary: This paper summarizes the main findings of the ADoBo 2021 shared task, proposed in the context of IberLef 2021.
In this task, we invited participants to detect lexical borrowings (coming mostly from English) in Spanish newswire texts.
We provided participants with an annotated corpus of lexical borrowings which we split into training, development and test splits.
- Score: 8.950918531231158
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: This paper summarizes the main findings of the ADoBo 2021 shared task,
proposed in the context of IberLef 2021. In this task, we invited participants
to detect lexical borrowings (coming mostly from English) in Spanish newswire
texts. This task was framed as a sequence classification problem using BIO
encoding. We provided participants with an annotated corpus of lexical
borrowings which we split into training, development and test splits. We
received submissions from 4 teams with 9 different system runs overall. The
results, which range from F1 scores of 37 to 85, suggest that this is a
challenging task, especially when out-of-domain or OOV words are considered,
and that traditional methods informed with lexicographic information would
benefit from taking advantage of current NLP trends.
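The abstract frames borrowing detection as sequence classification with BIO encoding: each token is labeled B (beginning of a borrowing span), I (inside a span), or O (outside). A minimal sketch of how spans are recovered from such tags follows; the sentence and labels are invented illustrations, not taken from the ADoBo corpus:

```python
# Minimal illustration of BIO encoding for borrowing detection.
# The tokens and tags below are invented examples (not from the ADoBo
# corpus): B marks the first token of a borrowing, I a continuation,
# and O any token outside a borrowing span.

def bio_spans(tokens, tags):
    """Recover (start, end, text) spans from a BIO-tagged sequence."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag.startswith("B"):
            if start is not None:  # close a span that a new B interrupts
                spans.append((start, i, " ".join(tokens[start:i])))
            start = i
        elif tag == "O":
            if start is not None:  # close the current span at an O tag
                spans.append((start, i, " ".join(tokens[start:i])))
                start = None
    if start is not None:  # span that runs to the end of the sentence
        spans.append((start, len(tokens), " ".join(tokens[start:])))
    return spans

tokens = ["El", "nuevo", "single", "es", "puro", "hip", "hop"]
tags   = ["O",  "O",     "B",      "O",  "O",    "B",   "I"]
print(bio_spans(tokens, tags))
# → [(2, 3, 'single'), (5, 7, 'hip hop')]
```

Entity-level F1 scores like those reported above (37 to 85) are then computed over these recovered spans, counting a prediction as correct only when both boundaries match the gold span exactly.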
Related papers
- BLP-2023 Task 2: Sentiment Analysis [7.725694295666573]
We present an overview of the BLP Sentiment Shared Task, organized as part of the inaugural BLP 2023 workshop.
The task is defined as the detection of sentiment in a given piece of social media text.
This paper provides a detailed account of the task setup, including dataset development and evaluation setup.
arXiv Detail & Related papers (2023-10-24T21:00:41Z)
- ICDAR 2023 Competition on Structured Text Extraction from Visually-Rich Document Images [198.35937007558078]
The competition opened on 30th December, 2022 and closed on 24th March, 2023.
There are 35 participants and 91 valid submissions received for Track 1, and 15 participants and 26 valid submissions received for Track 2.
According to the performance of the submissions, we believe there is still a large gap on the expected information extraction performance for complex and zero-shot scenarios.
arXiv Detail & Related papers (2023-06-05T22:20:52Z)
- RuArg-2022: Argument Mining Evaluation [69.87149207721035]
This paper is a report of the organizers on the first competition of argumentation analysis systems dealing with Russian language texts.
A corpus containing 9,550 sentences (comments on social media posts) on three topics related to the COVID-19 pandemic was prepared.
The system that won the first place in both tasks used the NLI (Natural Language Inference) variant of the BERT architecture.
arXiv Detail & Related papers (2022-06-18T17:13:37Z)
- ITTC @ TREC 2021 Clinical Trials Track [54.141379782822206]
The task focuses on the problem of matching eligible clinical trials to topics constituting a summary of a patient's admission notes.
We explore different ways of representing trials and topics using NLP techniques, and then use a common retrieval model to generate the ranked list of relevant trials for each topic.
The results from all our submitted runs are well above the median scores for all topics, but there is still plenty of scope for improvement.
arXiv Detail & Related papers (2022-02-16T04:56:47Z)
- DSC-IITISM at FinCausal 2021: Combining POS tagging with Attention-based Contextual Representations for Identifying Causal Relationships in Financial Documents [0.0]
Causality detection has applications in information retrieval, event prediction, question answering, financial analysis, and market research.
In this study, we explore several methods to identify and extract cause-effect pairs in financial documents using transformers.
Our best methodology achieves an F1-Score of 0.9551, and an Exact Match Score of 0.8777 on the blind test.
arXiv Detail & Related papers (2021-10-31T13:09:19Z)
- CAiRE in DialDoc21: Data Augmentation for Information-Seeking Dialogue System [55.43871578056878]
In DialDoc21 competition, our system achieved 74.95 F1 score and 60.74 Exact Match score in subtask 1, and 37.72 SacreBLEU score in subtask 2.
arXiv Detail & Related papers (2021-06-07T11:40:55Z)
- Constraint 2021: Machine Learning Models for COVID-19 Fake News Detection Shared Task [0.7614628596146599]
We address the challenge of classifying COVID-19 related social media posts as either fake or real.
In our system, we address this challenge by applying classical machine learning algorithms together with several linguistic features.
We find our best performing system to be based on a linear SVM, which obtains a weighted average F1 score of 95.19% on test data.
arXiv Detail & Related papers (2021-01-11T05:57:32Z)
- Exploring Text-transformers in AAAI 2021 Shared Task: COVID-19 Fake News Detection in English [30.61407811064534]
We describe our system for the AAAI 2021 shared task of COVID-19 Fake News Detection in English.
We propose an ensemble of different pre-trained language models, including BERT, RoBERTa, and ERNIE.
We also conduct an extensive analysis of the samples that are not correctly classified.
arXiv Detail & Related papers (2021-01-07T04:01:13Z)
- WNUT-2020 Task 2: Identification of Informative COVID-19 English Tweets [21.41654078561586]
We describe how we construct a corpus of 10K Tweets and organize the development and evaluation phases for this task.
We present a brief summary of results obtained from the final system evaluation submissions of 55 teams.
arXiv Detail & Related papers (2020-10-16T08:28:05Z)
- SemEval-2020 Task 10: Emphasis Selection for Written Text in Visual Media [50.29389719723529]
We present the main findings and compare the results of SemEval-2020 Task 10, Emphasis Selection for Written Text in Visual Media.
The goal of this shared task is to design automatic methods for emphasis selection.
The analysis of systems submitted to the task indicates that BERT and RoBERTa were the most common choice of pre-trained models used.
arXiv Detail & Related papers (2020-08-07T17:24:53Z)
- Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing [73.37262264915739]
We show that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch results in substantial gains.
Our experiments show that domain-specific pretraining serves as a solid foundation for a wide range of biomedical NLP tasks.
arXiv Detail & Related papers (2020-07-31T00:04:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.