OPI at SemEval 2023 Task 9: A Simple But Effective Approach to
Multilingual Tweet Intimacy Analysis
- URL: http://arxiv.org/abs/2304.07130v1
- Date: Fri, 14 Apr 2023 13:49:28 GMT
- Title: OPI at SemEval 2023 Task 9: A Simple But Effective Approach to
Multilingual Tweet Intimacy Analysis
- Authors: Sławomir Dadas
- Abstract summary: This paper describes our submission to the SemEval 2023 multilingual tweet intimacy analysis shared task.
The goal of the task was to assess the level of intimacy of Twitter posts in ten languages.
Our method was ranked first in five out of ten language subtasks, obtaining the highest average score across all languages.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper describes our submission to the SemEval 2023 multilingual tweet
intimacy analysis shared task. The goal of the task was to assess the level of
intimacy of Twitter posts in ten languages. The proposed approach consists of
several steps. First, we perform in-domain pre-training to create a language
model adapted to Twitter data. In the next step, we train an ensemble of
regression models to expand the training set with pseudo-labeled examples. The
extended dataset is used to train the final solution. Our method was ranked
first in five out of ten language subtasks, obtaining the highest average score
across all languages.
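The abstract's pipeline maps naturally onto a small training script. Below is a minimal sketch of steps 2 and 3, assuming a PyTorch/HuggingFace-style workflow; the checkpoint name `tweet-adapted-lm` (standing in for the output of step 1's in-domain pre-training), the hyperparameters, and the `fine_tune`/`predict` helpers are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed output of step 1: a multilingual LM further pre-trained with the
# masked-language-modeling objective on unlabeled tweets (in-domain
# pre-training). The checkpoint name is illustrative, not from the paper.
ADAPTED = "tweet-adapted-lm"

def fine_tune(model, tokenizer, labeled, epochs=1, lr=2e-5, device="cpu"):
    """Assumed standard regression fine-tuning: MSE loss on a single logit."""
    model.train().to(device)
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loader = DataLoader(labeled, batch_size=16, shuffle=True,
                        collate_fn=lambda batch: batch)
    for _ in range(epochs):
        for batch in loader:
            texts, ys = zip(*batch)  # batch of (tweet, intimacy score) pairs
            enc = tokenizer(list(texts), padding=True, truncation=True,
                            return_tensors="pt").to(device)
            preds = model(**enc).logits.squeeze(-1)
            loss = torch.nn.functional.mse_loss(
                preds, torch.tensor(ys, dtype=torch.float, device=device))
            opt.zero_grad(); loss.backward(); opt.step()

def predict(model, tokenizer, texts, device="cpu"):
    """Score a list of tweets with the regression head."""
    model.eval().to(device)
    out = []
    with torch.no_grad():
        for batch in DataLoader(texts, batch_size=32):
            enc = tokenizer(list(batch), padding=True, truncation=True,
                            return_tensors="pt").to(device)
            out.extend(model(**enc).logits.squeeze(-1).tolist())
    return np.array(out)

def pseudo_label(labeled, unlabeled, num_models=5):
    """Step 2: train an ensemble of regressors on the gold data and label
    extra unlabeled tweets with the average ensemble prediction."""
    tokenizer = AutoTokenizer.from_pretrained(ADAPTED)
    scores = []
    for seed in range(num_models):
        torch.manual_seed(seed)  # different seeds -> diverse ensemble members
        model = AutoModelForSequenceClassification.from_pretrained(
            ADAPTED, num_labels=1)  # num_labels=1 -> one regression output
        fine_tune(model, tokenizer, labeled)
        scores.append(predict(model, tokenizer, unlabeled))
    return list(zip(unlabeled, np.mean(scores, axis=0)))

# Step 3: train the final solution on the extended dataset.
# extended = list(labeled) + pseudo_label(labeled, unlabeled)
```

In this sketch, ensemble diversity comes only from the random seed; an actual system could also vary architectures, hyperparameters, or data folds.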
Related papers
- tmn at SemEval-2023 Task 9: Multilingual Tweet Intimacy Detection using XLM-T, Google Translate, and Ensemble Learning [2.28438857884398]
The paper describes a transformer-based system designed for SemEval-2023 Task 9: Multilingual Tweet Intimacy Analysis.
The purpose of the task was to predict the intimacy of tweets on a scale from 1 (not intimate at all) to 5 (very intimate).
arXiv Detail & Related papers (2023-04-08T15:50:16Z)
- Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944]
A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data.
We design a simple but effective ensemble-based framework that combines various transfer learning techniques.
We also propose a low-cost transfer learning method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
arXiv Detail & Related papers (2023-01-22T18:22:55Z)
- Meta-Learning a Cross-lingual Manifold for Semantic Parsing [75.26271012018861]
Localizing a semantic parser to support new languages requires effective cross-lingual generalization.
We introduce a first-order meta-learning algorithm to train a semantic parser with maximal sample efficiency during cross-lingual transfer.
Results across six languages on ATIS demonstrate that our combination of steps yields accurate semantic parsers sampling ≤10% of source training data in each new language.
arXiv Detail & Related papers (2022-09-26T10:42:17Z)
- Pretraining Approaches for Spoken Language Recognition: TalTech Submission to the OLR 2021 Challenge [0.0]
The paper is based on our submission to the Oriental Language Recognition 2021 Challenge.
For the constrained track, we first trained a Conformer-based encoder-decoder model for multilingual automatic speech recognition.
For the unconstrained task, we relied on both externally available pretrained models as well as external data.
arXiv Detail & Related papers (2022-05-14T15:17:08Z)
- Por Qué Não Utiliser Alla Språk? Mixed Training with Gradient Optimization in Few-Shot Cross-Lingual Transfer [2.7213511121305465]
We propose a one-step mixed training method that trains on both source and target data.
We use one model to handle all target languages simultaneously to avoid excessively language-specific models.
Our proposed method achieves state-of-the-art performance on all tasks and outperforms target-adapting by a large margin.
arXiv Detail & Related papers (2022-04-29T04:05:02Z)
- Bridging Cross-Lingual Gaps During Leveraging the Multilingual Sequence-to-Sequence Pretraining for Text Generation [80.16548523140025]
We extend the vanilla pretrain-finetune pipeline with an extra code-switching restore task to bridge the gap between the pre-training and fine-tuning stages.
Our approach narrows the cross-lingual sentence representation distance and improves low-frequency word translation with trivial computational cost.
arXiv Detail & Related papers (2022-04-16T16:08:38Z)
- Cross-lingual Intermediate Fine-tuning improves Dialogue State Tracking [84.50302759362698]
We enhance the transfer learning process by intermediate fine-tuning of pretrained multilingual models.
We use parallel and conversational movie subtitles datasets to design cross-lingual intermediate tasks.
We achieve impressive improvements (> 20% on goal accuracy) on the parallel MultiWoZ dataset and Multilingual WoZ dataset.
arXiv Detail & Related papers (2021-09-28T11:22:38Z)
- Facebook AI's WMT20 News Translation Task Submission [69.92594751788403]
This paper describes Facebook AI's submission to WMT20 shared news translation task.
We focus on the low resource setting and participate in two language pairs, Tamil -> English and Inuktitut -> English.
We approach the low resource problem using two main strategies, leveraging all available data and adapting the system to the target news domain.
arXiv Detail & Related papers (2020-11-16T21:49:00Z)
- Mixed-Lingual Pre-training for Cross-lingual Summarization [54.4823498438831]
Cross-lingual Summarization aims at producing a summary in the target language for an article in the source language.
We propose a solution based on mixed-lingual pre-training that leverages both cross-lingual tasks like translation and monolingual tasks like masked language models.
Our model achieves an improvement of 2.82 (English to Chinese) and 1.15 (Chinese to English) ROUGE-1 scores over state-of-the-art results.
arXiv Detail & Related papers (2020-10-18T00:21:53Z)
- NLPDove at SemEval-2020 Task 12: Improving Offensive Language Detection with Cross-lingual Transfer [10.007363787391952]
This paper describes our approach to the task of identifying offensive language in a multilingual setting.
We investigate two data augmentation strategies: using additional semi-supervised labels with different thresholds (see the thresholding sketch after this list) and cross-lingual transfer with data selection.
Our multilingual systems achieved competitive results in Greek, Danish, and Turkish at OffensEval 2020.
arXiv Detail & Related papers (2020-08-04T06:20:50Z)
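As a complement to the ensemble-averaging sketch above, here is a minimal sketch of confidence-based pseudo-label selection, in the spirit of the thresholded semi-supervised labels mentioned in the NLPDove entry; the agreement measure (across-model standard deviation) and the threshold value are my assumptions, not details from that paper.

```python
import numpy as np

def select_pseudo_labels(texts, ensemble_scores, max_std=0.3):
    """Keep only examples on which the ensemble members agree.

    ensemble_scores: array of shape (num_models, num_examples) with one
    predicted score per model per example. An example is accepted when the
    across-model standard deviation is below `max_std`; its pseudo-label is
    the ensemble mean.
    """
    scores = np.asarray(ensemble_scores)
    std = scores.std(axis=0)
    mean = scores.mean(axis=0)
    keep = std < max_std  # low disagreement -> more trustworthy pseudo-label
    return [(t, m) for t, m, k in zip(texts, mean, keep) if k]

# Example: three models scoring four tweets; only confidently-scored
# tweets survive the filter.
demo = select_pseudo_labels(
    ["t1", "t2", "t3", "t4"],
    [[1.2, 3.9, 2.0, 4.8], [1.3, 2.1, 2.1, 4.7], [1.1, 3.0, 1.9, 4.9]],
)
print(demo)  # t2 is dropped: its three scores disagree too much
```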