NLPDove at SemEval-2020 Task 12: Improving Offensive Language Detection
with Cross-lingual Transfer
- URL: http://arxiv.org/abs/2008.01354v1
- Date: Tue, 4 Aug 2020 06:20:50 GMT
- Title: NLPDove at SemEval-2020 Task 12: Improving Offensive Language Detection
with Cross-lingual Transfer
- Authors: Hwijeen Ahn and Jimin Sun and Chan Young Park and Jungyun Seo
- Abstract summary: This paper describes our approach to the task of identifying offensive languages in a multilingual setting.
We investigate two data augmentation strategies: using additional semi-supervised labels with different thresholds and cross-lingual transfer with data selection.
Our multilingual systems achieved competitive results in Greek, Danish, and Turkish at OffensEval 2020.
- Score: 10.007363787391952
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper describes our approach to the task of identifying offensive
languages in a multilingual setting. We investigate two data augmentation
strategies: using additional semi-supervised labels with different thresholds
and cross-lingual transfer with data selection. Leveraging the semi-supervised
dataset resulted in performance improvements compared to the baseline trained
solely with the manually-annotated dataset. We propose a new metric,
Translation Embedding Distance, to measure the transferability of instances for
cross-lingual data selection. We also introduce various preprocessing steps
tailored for social media text along with methods to fine-tune the pre-trained
multilingual BERT (mBERT) for offensive language identification. Our
multilingual systems achieved competitive results in Greek, Danish, and Turkish
at OffensEval 2020.
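The abstract names two concrete mechanisms: filtering semi-supervised labels by a confidence threshold, and a Translation Embedding Distance for cross-lingual data selection. The sketch below is a hypothetical illustration of both ideas; the threshold value, the embedding model, and the cosine-to-centroid formulation of the distance are assumptions for illustration, not the paper's published implementation.

```python
import numpy as np


def filter_pseudo_labels(examples, threshold=0.9):
    """Keep semi-supervised examples whose model confidence clears a threshold.

    `examples` is a list of (text, confidence) pairs; 0.9 is an
    illustrative cutoff, not the paper's reported setting.
    """
    return [text for text, conf in examples if conf >= threshold]


def translation_embedding_distance(src_embedding, tgt_embeddings):
    """One plausible reading of Translation Embedding Distance: the cosine
    distance between a (translated) source instance's sentence embedding
    and the centroid of target-language embeddings. A lower distance would
    mark the instance as more transferable for data selection.
    """
    centroid = tgt_embeddings.mean(axis=0)
    cosine = np.dot(src_embedding, centroid) / (
        np.linalg.norm(src_embedding) * np.linalg.norm(centroid)
    )
    return 1.0 - cosine
```

Under this reading, source-language instances would be ranked by ascending distance and only the closest ones kept for cross-lingual transfer.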
Related papers
- Constrained Decoding for Cross-lingual Label Projection [27.567195418950966]
Cross-lingual transfer using multilingual LLMs has become a popular learning paradigm for low-resource languages with no labeled training data.
However, for NLP tasks that involve fine-grained predictions on words and phrases, the performance of zero-shot cross-lingual transfer learning lags far behind supervised fine-tuning methods.
arXiv Detail & Related papers (2024-02-05T15:57:32Z)
- Optimal Transport Posterior Alignment for Cross-lingual Semantic Parsing [68.47787275021567]
Cross-lingual semantic parsing transfers parsing capability from a high-resource language (e.g., English) to low-resource languages with scarce training data.
We propose a new approach to cross-lingual semantic parsing by explicitly minimizing cross-lingual divergence between latent variables using Optimal Transport.
arXiv Detail & Related papers (2023-07-09T04:52:31Z)
- Transfer-Free Data-Efficient Multilingual Slot Labeling [82.02076369811402]
Slot labeling is a core component of task-oriented dialogue (ToD) systems.
To mitigate the inherent data scarcity issue, current research on multilingual ToD assumes that sufficient English-language annotated data are always available.
We propose a two-stage slot labeling approach (termed TWOSL) which transforms standard multilingual sentence encoders into effective slot labelers.
arXiv Detail & Related papers (2023-05-22T22:47:32Z)
- Efficiently Aligned Cross-Lingual Transfer Learning for Conversational Tasks using Prompt-Tuning [98.60739735409243]
Cross-lingual transfer of language models trained on high-resource languages like English has been widely studied for many NLP tasks.
We introduce XSGD for cross-lingual alignment pretraining, a parallel and large-scale multilingual conversation dataset.
To facilitate aligned cross-lingual representations, we develop an efficient prompt-tuning-based method for learning alignment prompts.
arXiv Detail & Related papers (2023-04-03T18:46:01Z)
- Bridging Cross-Lingual Gaps During Leveraging the Multilingual Sequence-to-Sequence Pretraining for Text Generation [80.16548523140025]
We extend the vanilla pretrain-finetune pipeline with extra code-switching restore task to bridge the gap between the pretrain and finetune stages.
Our approach could narrow the cross-lingual sentence representation distance and improve low-frequency word translation with trivial computational cost.
arXiv Detail & Related papers (2022-04-16T16:08:38Z)
- MultiEURLEX -- A multi-lingual and multi-label legal document classification dataset for zero-shot cross-lingual transfer [13.24356999779404]
We introduce MULTI-EURLEX, a new multilingual dataset for topic classification of legal documents.
The dataset comprises 65k European Union (EU) laws, officially translated in 23 languages, annotated with multiple labels from the EUROVOC taxonomy.
We use the dataset as a testbed for zero-shot cross-lingual transfer, where we exploit annotated training documents in one language (source) to classify documents in another language (target).
arXiv Detail & Related papers (2021-09-02T12:52:55Z)
- MCL@IITK at SemEval-2021 Task 2: Multilingual and Cross-lingual Word-in-Context Disambiguation using Augmented Data, Signals, and Transformers [1.869621561196521]
We present our approach to SemEval 2021 Task 2: Multilingual and Cross-lingual Word-in-Context Disambiguation (MCL-WiC).
The goal is to detect whether a given word common to both the sentences evokes the same meaning.
We submit systems for both settings: Multilingual and Cross-Lingual.
arXiv Detail & Related papers (2021-04-04T08:49:28Z)
- Unsupervised Cross-lingual Adaptation for Sequence Tagging and Beyond [58.80417796087894]
Cross-lingual adaptation with multilingual pre-trained language models (mPTLMs) mainly consists of two lines of works: zero-shot approach and translation-based approach.
We propose a novel framework to consolidate the zero-shot approach and the translation-based approach for better adaptation performance.
arXiv Detail & Related papers (2020-10-23T13:47:01Z)
- A Study of Cross-Lingual Ability and Language-specific Information in Multilingual BERT [60.9051207862378]
Multilingual BERT works remarkably well on cross-lingual transfer tasks.
Data size and context window size are crucial factors for transferability.
There is a computationally cheap but effective approach to improving the cross-lingual ability of multilingual BERT.
arXiv Detail & Related papers (2020-04-20T11:13:16Z)