UCAS-IIE-NLP at SemEval-2023 Task 12: Enhancing Generalization of
Multilingual BERT for Low-resource Sentiment Analysis
- URL: http://arxiv.org/abs/2306.01093v1
- Date: Thu, 1 Jun 2023 19:10:09 GMT
- Title: UCAS-IIE-NLP at SemEval-2023 Task 12: Enhancing Generalization of
Multilingual BERT for Low-resource Sentiment Analysis
- Authors: Dou Hu, Lingwei Wei, Yaxin Liu, Wei Zhou, Songlin Hu
- Abstract summary: This paper describes our system designed for SemEval-2023 Task 12: Sentiment analysis for African languages.
Specifically, we design a lexicon-based multilingual BERT to facilitate language adaptation and sentiment-aware representation learning.
Our system achieved competitive results, largely outperforming baselines on both multilingual and zero-shot sentiment classification subtasks.
- Score: 24.542445315345464
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper describes our system designed for SemEval-2023 Task 12: Sentiment
analysis for African languages. The challenge faced by this task is the
scarcity of labeled data and linguistic resources in low-resource settings. To
alleviate these issues, we propose SACL-XLMR, a generalized multilingual system for
sentiment analysis on low-resource languages. Specifically, we design a
lexicon-based multilingual BERT to facilitate language adaptation and
sentiment-aware representation learning. In addition, we apply a supervised
adversarial contrastive learning technique to learn sentiment-spread structured
representations and enhance model generalization. Our system achieved
competitive results, largely outperforming baselines on both multilingual and
zero-shot sentiment classification subtasks. Notably, the system obtained the
1st rank on the zero-shot classification subtask in the official ranking.
Extensive experiments demonstrate the effectiveness of our system.
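The abstract describes two components: a lexicon-enhanced multilingual encoder for language adaptation, and supervised adversarial contrastive learning for generalization. The paper's own implementation is not reproduced in this listing, so the block below is only a minimal PyTorch sketch of how such a setup could be wired together, assuming an XLM-R backbone from Hugging Face Transformers, a standard supervised contrastive loss, and an FGM-style adversarial perturbation of the input embeddings; all names, the three-way label set, and the hyperparameters are illustrative and not taken from the paper.
```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss: same-label embeddings attract, others repel."""
    features = F.normalize(features, dim=-1)
    sim = features @ features.T / temperature
    self_mask = torch.eye(features.size(0), dtype=torch.bool, device=features.device)
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~self_mask
    sim = sim.masked_fill(self_mask, float("-inf"))              # ignore self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)   # log p(j | i)
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    return -(log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_counts).mean()


tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
encoder = AutoModel.from_pretrained("xlm-roberta-base")
classifier = torch.nn.Linear(encoder.config.hidden_size, 3)      # negative / neutral / positive


def training_loss(texts, labels, epsilon=1e-2):
    """Clean cross-entropy + contrastive loss, plus a contrastive loss on an
    FGM-style adversarial view of the input embeddings."""
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    embeds = encoder.get_input_embeddings()(enc["input_ids"]).detach().requires_grad_(True)
    cls = encoder(inputs_embeds=embeds,
                  attention_mask=enc["attention_mask"]).last_hidden_state[:, 0]
    clean_loss = F.cross_entropy(classifier(cls), labels) + \
        supervised_contrastive_loss(cls, labels)

    # Perturb the input embeddings along the gradient of the clean loss.
    grad = torch.autograd.grad(clean_loss, embeds, retain_graph=True)[0]
    adv_embeds = embeds + epsilon * grad / (grad.norm(dim=-1, keepdim=True) + 1e-12)
    adv_cls = encoder(inputs_embeds=adv_embeds,
                      attention_mask=enc["attention_mask"]).last_hidden_state[:, 0]
    # Contrast clean and adversarial views under the same sentiment labels.
    adv_loss = supervised_contrastive_loss(torch.cat([cls, adv_cls]),
                                           torch.cat([labels, labels]))
    return clean_loss + adv_loss
```
In an actual training loop this loss would be backpropagated through the encoder and classifier; epsilon, the temperature, and the equal loss weighting here are placeholders rather than values reported in the paper.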
Related papers
- Revisiting non-English Text Simplification: A Unified Multilingual
Benchmark [14.891068432456262]
This paper introduces the MultiSim benchmark, a collection of 27 resources in 12 distinct languages containing over 1.7 million complex-simple sentence pairs.
Our experiments using MultiSim with pre-trained multilingual language models reveal exciting performance improvements from multilingual training in non-English settings.
arXiv Detail & Related papers (2023-05-25T03:03:29Z) - NLNDE at SemEval-2023 Task 12: Adaptive Pretraining and Source Language
Selection for Low-Resource Multilingual Sentiment Analysis [11.05909046179595]
This paper describes our system developed for the SemEval-2023 Task 12 "Sentiment Analysis for Low-resource African languages using Twitter dataset".
Our key finding is that adapting the pretrained model to the target language and task using a small yet relevant corpus improves performance remarkably, by more than 10 F1 points (a minimal sketch of this kind of adaptive pretraining appears after this list).
In the shared task, our system wins 8 out of 15 tracks and, in particular, performs best in the multilingual evaluation.
arXiv Detail & Related papers (2023-04-28T21:02:58Z) - Multilingual Word Sense Disambiguation with Unified Sense Representation [55.3061179361177]
We propose building knowledge-based and supervised Multilingual Word Sense Disambiguation (MWSD) systems.
We build unified sense representations for multiple languages and address the annotation scarcity problem for MWSD by transferring annotations from rich-sourced languages to poorer ones.
Evaluations on the SemEval-13 and SemEval-15 datasets demonstrate the effectiveness of our methodology.
arXiv Detail & Related papers (2022-10-14T01:24:03Z) - Multi-level Contrastive Learning for Cross-lingual Spoken Language
Understanding [90.87454350016121]
We develop novel code-switching schemes to generate hard negative examples for contrastive learning at all levels.
We also develop a label-aware joint model to leverage label semantics for cross-lingual knowledge transfer.
arXiv Detail & Related papers (2022-05-07T13:44:28Z) - Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of
Code-Mixed Clinical Texts [56.72488923420374]
Pre-trained language models (LMs) have shown great potential for cross-lingual transfer in low-resource settings.
We show the few-shot cross-lingual transfer property of LMs for named entity recognition (NER) and apply it to a low-resource, real-world challenge: de-identification of code-mixed (Spanish-Catalan) clinical notes in the stroke domain.
arXiv Detail & Related papers (2022-04-10T21:46:52Z) - PALI-NLP at SemEval-2022 Task 4: Discriminative Fine-tuning of Deep
Transformers for Patronizing and Condescending Language Detection [4.883341580669763]
We propose a novel Transformer-based model and its ensembles to accurately understand such language context for PCL detection.
To facilitate comprehension of the subtle and subjective nature of PCL, two fine-tuning strategies are applied.
The system achieves remarkable results on the official ranking, namely 1st in Subtask 1 and 5th in Subtask 2.
arXiv Detail & Related papers (2022-03-09T10:05:10Z) - Intent Classification Using Pre-Trained Embeddings For Low Resource
Languages [67.40810139354028]
Building Spoken Language Understanding systems that do not rely on language-specific Automatic Speech Recognition is an important yet less explored problem in language processing.
We present a comparative study aimed at employing a pre-trained acoustic model to perform Spoken Language Understanding in low resource scenarios.
We perform experiments across three different languages: English, Sinhala, and Tamil each with different data sizes to simulate high, medium, and low resource scenarios.
arXiv Detail & Related papers (2021-10-18T13:06:59Z) - AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages
with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z) - Improving the Lexical Ability of Pretrained Language Models for
Unsupervised Neural Machine Translation [127.81351683335143]
Cross-lingual pretraining requires models to align the lexical- and high-level representations of the two languages.
Previous research has shown that the poor performance of such pretraining on low-resource and distant language pairs is because the representations are not sufficiently aligned.
In this paper, we enhance the bilingual masked language model pretraining with lexical-level information by using type-level cross-lingual subword embeddings.
arXiv Detail & Related papers (2021-03-18T21:17:58Z) - ANDES at SemEval-2020 Task 12: A jointly-trained BERT multilingual model
for offensive language detection [0.6445605125467572]
We jointly trained a single model by fine-tuning Multilingual BERT to tackle the task across all the proposed languages.
Our single model had competitive results, with a performance close to top-performing systems.
arXiv Detail & Related papers (2020-08-13T16:07:00Z)
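The NLNDE entry above attributes its largest gains to adapting the pretrained model to the target language with a small but relevant corpus. The block below is a minimal sketch of that kind of continued masked-language-model pretraining, assuming the Hugging Face Transformers and Datasets libraries; the corpus file, base model, and hyperparameters are placeholders, not the cited paper's setup.
```python
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# "target_language_corpus.txt" is a placeholder for any small, relevant
# monolingual corpus in the target language.
corpus = load_dataset("text", data_files={"train": "target_language_corpus.txt"})["train"]
corpus = corpus.map(lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
                    batched=True, remove_columns=["text"])

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
args = TrainingArguments(output_dir="adapted-xlmr", num_train_epochs=3,
                         per_device_train_batch_size=16, learning_rate=5e-5)
Trainer(model=model, args=args, train_dataset=corpus, data_collator=collator).train()
```
The adapted checkpoint saved under "adapted-xlmr" can then be fine-tuned on the labeled sentiment data exactly as a stock multilingual encoder would be.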