ReINTEL Challenge 2020: Exploiting Transfer Learning Models for Reliable
Intelligence Identification on Vietnamese Social Network Sites
- URL: http://arxiv.org/abs/2102.10794v3
- Date: Wed, 24 Feb 2021 03:08:43 GMT
- Title: ReINTEL Challenge 2020: Exploiting Transfer Learning Models for Reliable
Intelligence Identification on Vietnamese Social Network Sites
- Authors: Kim Thi-Thanh Nguyen, Kiet Van Nguyen
- Abstract summary: This paper presents the system that we propose for the Reliable Intelligence Identification on Vietnamese Social Network Sites (ReINTEL) task.
In this task, the VLSP 2020 organizers provide a dataset with approximately 6,000 training news/posts annotated with reliable or unreliable labels, and a test set consisting of 2,000 unlabeled examples.
In our experiments, we achieve an AUC score of 94.52% on the private test set provided by the ReINTEL organizers.
- Score: 0.38073142980733
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper presents the system that we propose for the Reliable
Intelligence Identification on Vietnamese Social Network Sites (ReINTEL) task
of the Vietnamese Language and Speech Processing 2020 (VLSP 2020) Shared Task.
In this task, the VLSP 2020 organizers provide a dataset with approximately
6,000 training news/posts annotated with reliable or unreliable labels, and a
test set consisting of 2,000 unlabeled examples. In this paper, we conduct
experiments on different transfer learning models, namely bert4news and
PhoBERT, fine-tuned to predict whether a news item is reliable or not. In our
experiments, we achieve an AUC score of 94.52% on the private test set
provided by the ReINTEL organizers.
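Below is a minimal sketch, assuming the HuggingFace transformers and datasets libraries, of the fine-tuning setup the abstract describes: PhoBERT fine-tuned as a binary reliable/unreliable classifier and scored with ROC-AUC. The toy data, column names, and hyperparameters are illustrative assumptions, not the authors' configuration; bert4news could be swapped in by changing the checkpoint name.

    # Minimal sketch: fine-tune PhoBERT as a reliable/unreliable classifier.
    # Toy data and hyperparameters are assumptions, not the paper's settings.
    import numpy as np
    from datasets import Dataset
    from sklearn.metrics import roc_auc_score
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    MODEL_NAME = "vinai/phobert-base"  # public PhoBERT checkpoint

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME,
                                                               num_labels=2)

    def tokenize(batch):
        # PhoBERT works best on word-segmented Vietnamese; max length is 256.
        return tokenizer(batch["text"], padding="max_length",
                         truncation=True, max_length=256)

    def compute_metrics(eval_pred):
        logits, labels = eval_pred
        logits = logits - logits.max(axis=-1, keepdims=True)  # stable softmax
        probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
        # Score the probability of the "unreliable" class (label 1) with ROC-AUC.
        return {"roc_auc": roc_auc_score(labels, probs[:, 1])}

    # Toy in-memory splits; replace with the ReINTEL training and test data.
    train_ds = Dataset.from_dict({"text": ["tin dang tin cay", "tin gia mao"],
                                  "label": [0, 1]}).map(tokenize, batched=True)
    dev_ds = Dataset.from_dict({"text": ["tin dang tin cay", "tin gia mao"],
                                "label": [0, 1]}).map(tokenize, batched=True)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="phobert-reintel", num_train_epochs=3,
                               per_device_train_batch_size=16),
        train_dataset=train_ds,
        eval_dataset=dev_ds,
        compute_metrics=compute_metrics,
    )
    trainer.train()
    print(trainer.evaluate())  # reports eval_roc_auc among the evaluation metrics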
Related papers
- BJTU-WeChat's Systems for the WMT22 Chat Translation Task [66.81525961469494]
This paper introduces the joint submission of the Beijing Jiaotong University and WeChat AI to the WMT'22 chat translation task for English-German.
Our systems are based on the Transformer, to which we apply several effective variants.
They achieve COMET scores of 0.810 and 0.946.
arXiv Detail & Related papers (2022-11-28T02:35:04Z) - UrduFake@FIRE2020: Shared Track on Fake News Identification in Urdu [62.6928395368204]
This paper gives an overview of the first shared task at FIRE 2020 on fake news detection in the Urdu language.
The goal is to identify fake news using a dataset composed of 900 annotated news articles for training and 400 news articles for testing.
The dataset contains news in five domains: (i) Health, (ii) Sports, (iii) Showbiz, (iv) Technology, and (v) Business.
arXiv Detail & Related papers (2022-07-25T03:46:51Z) - Overview of the Shared Task on Fake News Detection in Urdu at FIRE 2020 [62.6928395368204]
The task was posed as a binary classification problem, in which the goal is to differentiate between real and fake news.
We provided a dataset divided into 900 annotated news articles for training and 400 news articles for testing.
42 teams from 6 different countries (India, China, Egypt, Germany, Pakistan, and the UK) registered for the task.
arXiv Detail & Related papers (2022-07-25T03:41:32Z) - Overview of the Shared Task on Fake News Detection in Urdu at FIRE 2021 [55.41644538483948]
The goal of the shared task is to motivate the community to come up with efficient methods for solving this vital problem.
The training set contains 1,300 annotated news articles (750 real, 550 fake), while the testing set contains 300 news articles (200 real, 100 fake).
The best performing system obtained an F1-macro score of 0.679, which is lower than the past year's best result of 0.907 F1-macro.
arXiv Detail & Related papers (2022-07-11T18:58:36Z) - NLPBK at VLSP-2020 shared task: Compose transformer pretrained models
for Reliable Intelligence Identification on Social network [0.0]
This paper describes our method for tuning a transformer-based pretrained model to adapt it to the Reliable Intelligence Identification on Vietnamese SNSs problem.
We also propose a model that combines BERT-based pretrained models with metadata features of SNS documents, such as the number of comments, number of likes, and attached images (a sketch of this combination appears after the list below).
With appropriate training techniques, our model achieves a ROC-AUC of 0.9392 on the public test set, and the final version reaches the second-best ROC-AUC (0.9513) on the private test set.
arXiv Detail & Related papers (2021-01-29T16:19:28Z) - ReINTEL: A Multimodal Data Challenge for Responsible Information
Identification on Social Network Sites [7.653131137068877]
This paper reports on the ReINTEL Shared Task for Responsible Information Identification on social network sites.
Given a piece of news with its textual content, visual content, and metadata, participants are required to classify whether the news is 'reliable' or 'unreliable'.
We introduce a novel human-annotated dataset of over 10,000 news collected from a social network in Vietnam.
arXiv Detail & Related papers (2020-12-16T12:17:08Z) - Leveraging Transfer Learning for Reliable Intelligence Identification on
Vietnamese SNSs (ReINTEL) [0.8602553195689513]
We exploit both monolingual and multilingual pre-trained models.
Our team achieved a ROC-AUC score of 0.9378 on the private test set.
arXiv Detail & Related papers (2020-12-10T15:43:50Z) - WeChat Neural Machine Translation Systems for WMT20 [61.03013964996131]
Our system is based on the Transformer with effective variants and the DTMT architecture.
In our experiments, we employ data selection, several synthetic data generation approaches, advanced fine-tuning approaches, and self-BLEU based model ensembling.
Our constrained Chinese-to-English system achieves a case-sensitive BLEU score of 36.9, which is the highest among all submissions.
arXiv Detail & Related papers (2020-10-01T08:15:09Z) - Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for
Offensive Language Detection [55.445023584632175]
We build an offensive language detection system, which combines multi-task learning with BERT-based models.
Our model achieves a 91.51% F1 score on English Sub-task A, which is comparable to the first-place result.
arXiv Detail & Related papers (2020-04-28T11:27:24Z)