problemConquero at SemEval-2020 Task 12: Transformer and Soft label-based approaches
- URL: http://arxiv.org/abs/2007.10877v1
- Date: Tue, 21 Jul 2020 15:06:58 GMT
- Title: problemConquero at SemEval-2020 Task 12: Transformer and Soft label-based approaches
- Authors: Karishma Laud, Jagriti Singh, Randeep Kumar Sahu, Ashutosh Modi
- Abstract summary: We present various systems submitted by our team problemConquero for SemEval-2020 Shared Task 12: Multilingual Offensive Language Identification in Social Media.
We participated in all three sub-tasks of OffensEval-2020, and our final submissions during the evaluation phase included transformer-based approaches and a soft label-based approach.
- Score: 2.434159858639793
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present various systems submitted by our team
problemConquero for SemEval-2020 Shared Task 12: Multilingual Offensive Language
Identification in Social Media. We participated in all three sub-tasks of
OffensEval-2020, and our final submissions during the evaluation phase included
transformer-based approaches and a soft label-based approach. For sub-task A
(offensive tweet identification), we submitted a BERT-based fine-tuned model for
each language. For sub-task B (automatic categorization of offense types), we
submitted a RoBERTa-based fine-tuned model. For sub-task C (offense target
identification), we submitted two models: one using soft labels and the other a
BERT-based fine-tuned model. Our ranks for sub-task A were Greek 19 out of 37,
Turkish 22 out of 46, Danish 26 out of 39, Arabic 39 out of 53, and English 20
out of 85. We achieved a rank of 28 out of 43 for sub-task B. Our best rank for
sub-task C was 20 out of 39, using the BERT-based fine-tuned model.
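Below is a minimal sketch of the two ingredients the abstract describes: fine-tuning a BERT-style classifier (sub-tasks A and B) and training against soft labels (sub-task C). It assumes the Hugging Face transformers library; the checkpoint name, hyperparameters, and the soft-label cross-entropy shown here are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch: fine-tune a BERT-style classifier with hard or soft labels.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=3)  # e.g. sub-task C targets
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

def train_step(texts, targets, soft=False):
    """One update step. `targets` is a LongTensor of class ids or,
    when soft=True, a FloatTensor of per-class probabilities."""
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=128, return_tensors="pt")
    logits = model(**batch).logits
    if soft:
        # Cross-entropy against a full probability distribution (soft labels).
        loss = -(targets * F.log_softmax(logits, dim=-1)).sum(-1).mean()
    else:
        loss = F.cross_entropy(logits, targets)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

# Hard labels (sub-tasks A/B style) vs. soft labels (sub-task C style):
train_step(["sample tweet one", "sample tweet two"], torch.tensor([1, 0]))
train_step(["sample tweet one"], torch.tensor([[0.7, 0.2, 0.1]]), soft=True)
```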
Related papers
- SemEval-2024 Task 8: Multidomain, Multimodel and Multilingual Machine-Generated Text Detection [68.858931667807]
Subtask A is a binary classification task determining whether a text is written by a human or generated by a machine.
Subtask B is to detect the exact source of a text, discerning whether it is written by a human or generated by a specific LLM.
Subtask C aims to identify the changing point within a text, at which the authorship transitions from human to machine.
arXiv Detail & Related papers (2024-04-22T13:56:07Z)
- Unify word-level and span-level tasks: NJUNLP's Participation for the WMT2023 Quality Estimation Shared Task [59.46906545506715]
We introduce the NJUNLP team to the WMT 2023 Quality Estimation (QE) shared task.
Our team submitted predictions for the English-German language pair on both sub-tasks.
Our models achieved the best results in English-German for both word-level and fine-grained error span detection sub-tasks.
arXiv Detail & Related papers (2023-09-23T01:52:14Z)
- CL-UZH at SemEval-2023 Task 10: Sexism Detection through Incremental Fine-Tuning and Multi-Task Learning with Label Descriptions [0.0]
The goal of the SemEval shared task Towards Explainable Detection of Online Sexism (EDOS 2023) is to detect sexism in English social media posts.
We present our submitted systems for all three subtasks, based on a multi-task model that has been fine-tuned on a range of related tasks.
We implement multi-task learning by formulating each task as binary pairwise text classification, where the dataset and label descriptions are given along with the input text.
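As a concrete illustration, here is a minimal sketch of that pairwise formulation, assuming the Hugging Face transformers tokenizer; the task names and label descriptions below are hypothetical, not the paper's actual ones.

```python
# Sketch: each (task, label) pair becomes a binary instance whose input
# pairs a label description with the post; the model answers yes/no.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Hypothetical label descriptions for two related (sub-)tasks.
label_descriptions = {
    "edos_a_sexist": "This post is sexist.",
    "edos_b_threat": "This sexist post contains threats.",
}

def make_instance(task_label, text):
    # Encoded as a text pair: [CLS] description [SEP] post [SEP].
    return tokenizer(label_descriptions[task_label], text,
                     truncation=True, max_length=128)

enc = make_instance("edos_a_sexist", "An example social media post.")
print(tokenizer.decode(enc["input_ids"]))
```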
arXiv Detail & Related papers (2023-06-06T17:59:49Z)
- BJTU-WeChat's Systems for the WMT22 Chat Translation Task [66.81525961469494]
This paper introduces the joint submission of the Beijing Jiaotong University and WeChat AI to the WMT'22 chat translation task for English-German.
Building on the Transformer architecture, we apply several effective variants.
Our systems achieve COMET scores of 0.810 and 0.946.
arXiv Detail & Related papers (2022-11-28T02:35:04Z)
- The USYD-JD Speech Translation System for IWSLT 2021 [85.64797317290349]
This paper describes the University of Sydney & JD joint submission to the IWSLT 2021 low-resource speech translation task.
We trained our models with the officially provided ASR and MT datasets.
To achieve better translation performance, we explored the most recent effective strategies, including back translation, knowledge distillation, multi-feature reranking and transductive finetuning.
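As a sketch of one of those strategies, back-translation uses a reverse (target-to-source) model to turn monolingual target-side text into synthetic parallel pairs. The Marian checkpoint and language direction below are illustrative assumptions, not the system's actual setup.

```python
# Sketch of back-translation for a hypothetical de->en task: translate
# monolingual English text back into German to synthesize parallel data.
from transformers import MarianMTModel, MarianTokenizer

name = "Helsinki-NLP/opus-mt-en-de"  # reverse direction: target -> source
tok = MarianTokenizer.from_pretrained(name)
reverse_model = MarianMTModel.from_pretrained(name)

monolingual_target = ["The weather is nice today.", "I missed the train."]
batch = tok(monolingual_target, padding=True, return_tensors="pt")
synthetic_source = tok.batch_decode(reverse_model.generate(**batch),
                                    skip_special_tokens=True)

# Mix these synthetic (source, target) pairs into the real training data.
synthetic_pairs = list(zip(synthetic_source, monolingual_target))
```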
arXiv Detail & Related papers (2021-07-24T09:53:34Z)
- UPB at SemEval-2020 Task 12: Multilingual Offensive Language Detection on Social Media by Fine-tuning a Variety of BERT-based Models [0.0]
This paper describes our Transformer-based solutions for identifying offensive language on Twitter in five languages.
These solutions were employed in Subtask A of the OffensEval 2020 shared task.
arXiv Detail & Related papers (2020-10-26T14:28:29Z)
- QiaoNing at SemEval-2020 Task 4: Commonsense Validation and Explanation system based on ensemble of language model [2.728575246952532]
In this paper, we present the language model system submitted to the SemEval-2020 Task 4 competition: "Commonsense Validation and Explanation".
We implemented transfer learning using pretrained language models (BERT, XLNet, RoBERTa, and ALBERT) and fine-tuned them on this task.
The ensembled model solves this problem better, reaching 95.9% accuracy on subtask A, only 3% below human accuracy.
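A minimal sketch of such an ensemble, averaging each model's predicted class probabilities; the checkpoint paths are placeholders for the paper's fine-tuned BERT/XLNet/RoBERTa/ALBERT models.

```python
# Sketch: average the class probabilities of several fine-tuned models.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoints = ["./bert-ft", "./xlnet-ft", "./roberta-ft", "./albert-ft"]

def ensemble_predict(text):
    probs = []
    for ckpt in checkpoints:
        tok = AutoTokenizer.from_pretrained(ckpt)
        model = AutoModelForSequenceClassification.from_pretrained(ckpt)
        batch = tok(text, return_tensors="pt")
        with torch.no_grad():
            probs.append(model(**batch).logits.softmax(dim=-1))
    # Mean probability across models, then the most likely class.
    return torch.stack(probs).mean(dim=0).argmax(dim=-1).item()
```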
arXiv Detail & Related papers (2020-09-06T05:12:50Z)
- BUT-FIT at SemEval-2020 Task 4: Multilingual commonsense [1.433758865948252]
This paper describes the work of the BUT-FIT team at SemEval 2020 Task 4 - Commonsense Validation and Explanation.
In subtasks A and B, our submissions are based on pretrained language representation models (namely ALBERT) and data augmentation.
We experimented with solving the task for another language, Czech, by means of multilingual models and a machine-translated dataset.
We show that with a strong machine translation system, our system can be used in another language with a small accuracy loss.
arXiv Detail & Related papers (2020-08-17T12:45:39Z)
- LTIatCMU at SemEval-2020 Task 11: Incorporating Multi-Level Features for Multi-Granular Propaganda Span Identification [70.1903083747775]
This paper describes our submission for the task of Propaganda Span Identification in news articles.
We introduce a BERT-BiLSTM based span-level propaganda classification model that identifies which token spans within the sentence are indicative of propaganda.
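A minimal sketch of a BERT-BiLSTM token-level classifier in this spirit; the checkpoint and layer sizes are illustrative assumptions, not the paper's exact architecture.

```python
# Sketch: BERT token embeddings -> BiLSTM -> per-token propaganda logits.
import torch.nn as nn
from transformers import AutoModel

class BertBiLSTMTagger(nn.Module):
    def __init__(self, num_tags=2, lstm_hidden=256):
        super().__init__()
        self.bert = AutoModel.from_pretrained("bert-base-cased")
        self.lstm = nn.LSTM(self.bert.config.hidden_size, lstm_hidden,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * lstm_hidden, num_tags)

    def forward(self, input_ids, attention_mask):
        states = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        lstm_out, _ = self.lstm(states)
        return self.classifier(lstm_out)  # (batch, seq_len, num_tags)
```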
arXiv Detail & Related papers (2020-08-11T16:14:47Z)
- LT@Helsinki at SemEval-2020 Task 12: Multilingual or language-specific BERT? [0.42056926734482064]
This paper presents the different models submitted by the LT@Helsinki team for SemEval 2020 Shared Task 12.
Our team participated in sub-tasks A and C, titled offensive language identification and offense target identification, respectively.
In both cases we used the Bidirectional Encoder Representations from Transformers (BERT) model, pre-trained by Google and fine-tuned by us on the OLID and SOLID datasets.
arXiv Detail & Related papers (2020-08-03T12:03:17Z)
- Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for Offensive Language Detection [55.445023584632175]
We build an offensive language detection system, which combines multi-task learning with BERT-based models.
Our model achieves a 91.51% F1 score in English Sub-task A, which is comparable to the first-place result.
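A minimal sketch of such a multi-task setup: a shared BERT encoder with one classification head per OffensEval sub-task. The head sizes follow the OLID label sets; the checkpoint and pooling choice are illustrative assumptions.

```python
# Sketch: shared encoder, one head per sub-task, trained jointly.
import torch.nn as nn
from transformers import AutoModel

class MultiTaskOffenseModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = AutoModel.from_pretrained("bert-base-cased")
        dim = self.encoder.config.hidden_size
        self.heads = nn.ModuleDict({
            "A": nn.Linear(dim, 2),  # offensive vs. not offensive
            "B": nn.Linear(dim, 2),  # targeted vs. untargeted
            "C": nn.Linear(dim, 3),  # individual / group / other target
        })

    def forward(self, input_ids, attention_mask, task):
        # Use the [CLS] token's final hidden state as the sentence vector.
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        return self.heads[task](hidden[:, 0])
```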
arXiv Detail & Related papers (2020-04-28T11:27:24Z)