HFL at SemEval-2022 Task 8: A Linguistics-inspired Regression Model with
Data Augmentation for Multilingual News Similarity
- URL: http://arxiv.org/abs/2204.04844v1
- Date: Mon, 11 Apr 2022 03:08:37 GMT
- Title: HFL at SemEval-2022 Task 8: A Linguistics-inspired Regression Model with
Data Augmentation for Multilingual News Similarity
- Authors: Zihang Xu, Ziqing Yang, Yiming Cui, Zhigang Chen
- Abstract summary: This paper describes our system designed for SemEval-2022 Task 8: Multilingual News Article Similarity.
We proposed a linguistics-inspired model trained with a few task-specific strategies.
Our system ranked 1st on the leaderboard while achieving a Pearson's Correlation Coefficient of 0.818 on the official evaluation set.
- Score: 16.454545004093735
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper describes our system designed for SemEval-2022 Task 8:
Multilingual News Article Similarity. We proposed a linguistics-inspired model
trained with a few task-specific strategies. The main techniques of our system
are: 1) data augmentation, 2) multi-label loss, 3) adapted R-Drop, 4) samples
reconstruction with the head-tail combination. We also present a brief analysis
of some negative methods like two-tower architecture. Our system ranked 1st on
the leaderboard while achieving a Pearson's Correlation Coefficient of 0.818 on
the official evaluation set.
Related papers
- BAMO at SemEval-2024 Task 9: BRAINTEASER: A Novel Task Defying Common Sense [0.04096453902709291]
This paper outlines our approach to SemEval 2024 Task 9, BRAINTEASER: A Novel Task Defying Common Sense.
The dataset comprises multi-choice questions that challenge models to think "outside of the box"
Our best method achieves an overall accuracy of 85 percent on the sentence puzzles subtask.
arXiv Detail & Related papers (2024-06-07T14:01:56Z) - Team QUST at SemEval-2024 Task 8: A Comprehensive Study of Monolingual
and Multilingual Approaches for Detecting AI-generated Text [0.1499944454332829]
This paper presents the participation of team QUST in Task 8 SemEval 2024.
We first performed data augmentation and cleaning on the dataset to enhance model training efficiency and accuracy.
In the monolingual task, we evaluated traditional deep-learning methods, multiscale positive-unlabeled framework (MPU), fine-tuning, adapters and ensemble methods.
arXiv Detail & Related papers (2024-02-19T08:22:51Z) - Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944]
A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data.
We design a simple but effective ensemble-based framework that combines various transfer learning techniques.
We also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
arXiv Detail & Related papers (2023-01-22T18:22:55Z) - From English to More Languages: Parameter-Efficient Model Reprogramming
for Cross-Lingual Speech Recognition [50.93943755401025]
We propose a new parameter-efficient learning framework based on neural model reprogramming for cross-lingual speech recognition.
We design different auxiliary neural architectures focusing on learnable pre-trained feature enhancement.
Our methods outperform existing ASR tuning architectures and their extension with self-supervised losses.
arXiv Detail & Related papers (2023-01-19T02:37:56Z) - UU-Tax at SemEval-2022 Task 3: Improving the generalizability of
language models for taxonomy classification through data augmentation [0.0]
This paper addresses the SemEval-2022 Task 3 PreTENS: Presupposed Taxonomies evaluating Neural Network Semantics.
The goal of the task is to identify if a sentence is deemed acceptable or not, depending on the taxonomic relationship that holds between a noun pair contained in the sentence.
We propose an effective way to enhance the robustness and the generalizability of language models for better classification.
arXiv Detail & Related papers (2022-10-07T07:41:28Z) - RoBLEURT Submission for the WMT2021 Metrics Task [72.26898579202076]
We present our submission to the Shared Metrics Task: RoBLEURT.
Our model reaches state-of-the-art correlations with the WMT 2020 human annotations upon 8 out of 10 to-English language pairs.
arXiv Detail & Related papers (2022-04-28T08:49:40Z) - Improving Neural Machine Translation by Bidirectional Training [85.64797317290349]
We present a simple and effective pretraining strategy -- bidirectional training (BiT) for neural machine translation.
Specifically, we bidirectionally update the model parameters at the early stage and then tune the model normally.
Experimental results show that BiT pushes the SOTA neural machine translation performance across 15 translation tasks on 8 language pairs significantly higher.
arXiv Detail & Related papers (2021-09-16T07:58:33Z) - The USYD-JD Speech Translation System for IWSLT 2021 [85.64797317290349]
This paper describes the University of Sydney& JD's joint submission of the IWSLT 2021 low resource speech translation task.
We trained our models with the officially provided ASR and MT datasets.
To achieve better translation performance, we explored the most recent effective strategies, including back translation, knowledge distillation, multi-feature reranking and transductive finetuning.
arXiv Detail & Related papers (2021-07-24T09:53:34Z) - ZJUKLAB at SemEval-2021 Task 4: Negative Augmentation with Language
Model for Reading Comprehension of Abstract Meaning [16.151203366447962]
We explain the algorithms used to learn our models and the process of tuning the algorithms and selecting the best model.
Inspired by the similarity of the ReCAM task and the language pre-training, we propose a simple yet effective technology, namely, negative augmentation with language model.
Our models achieve the 4th rank on both official test sets of Subtask 1 and Subtask 2 with an accuracy of 87.9% and an accuracy of 92.8%, respectively.
arXiv Detail & Related papers (2021-02-25T13:03:05Z) - Mixed-Lingual Pre-training for Cross-lingual Summarization [54.4823498438831]
Cross-lingual Summarization aims at producing a summary in the target language for an article in the source language.
We propose a solution based on mixed-lingual pre-training that leverages both cross-lingual tasks like translation and monolingual tasks like masked language models.
Our model achieves an improvement of 2.82 (English to Chinese) and 1.15 (Chinese to English) ROUGE-1 scores over state-of-the-art results.
arXiv Detail & Related papers (2020-10-18T00:21:53Z) - FiSSA at SemEval-2020 Task 9: Fine-tuned For Feelings [2.362412515574206]
In this paper, we present our approach for sentiment classification on Spanish-English code-mixed social media data.
We explore both monolingual and multilingual models with the standard fine-tuning method.
Although two-step fine-tuning improves sentiment classification performance over the base model, the large multilingual XLM-RoBERTa model achieves best weighted F1-score.
arXiv Detail & Related papers (2020-07-24T14:48:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.