ReINTEL: A Multimodal Data Challenge for Responsible Information
Identification on Social Network Sites
- URL: http://arxiv.org/abs/2012.08895v1
- Date: Wed, 16 Dec 2020 12:17:08 GMT
- Title: ReINTEL: A Multimodal Data Challenge for Responsible Information
Identification on Social Network Sites
- Authors: Duc-Trong Le, Xuan-Son Vu, Nhu-Dung To, Huu-Quang Nguyen, Thuy-Trinh
Nguyen, Linh Le, Anh-Tuan Nguyen, Minh-Duc Hoang, Nghia Le, Huyen Nguyen and
Hoang D. Nguyen
- Abstract summary: This paper reports on the ReINTEL Shared Task for Responsible Information Identification on social network sites.
Given a piece of news with its textual content, visual content, and metadata, participants are required to classify the news as 'reliable' or 'unreliable'.
We introduce a novel human-annotated dataset of over 10,000 news items collected from a social network in Vietnam.
- Score: 7.653131137068877
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper reports on the ReINTEL Shared Task for Responsible Information
Identification on social network sites, which is hosted at the seventh annual
workshop on Vietnamese Language and Speech Processing (VLSP 2020). Given a
piece of news with respective textual, visual content and metadata,
participants are required to classify whether the news is `reliable' or
`unreliable'. In order to generate a fair benchmark, we introduce a novel
human-annotated dataset of over 10,000 news items collected from a social
network in Vietnam. All models are evaluated in terms of AUC-ROC score, a
typical evaluation metric for classification. The competition was run on the
Codalab platform. Within two months, the challenge attracted over 60
participants and recorded nearly 1,000 submission entries.
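The AUC-ROC score used for ranking can be computed without any ML framework via the rank-based (Mann-Whitney) formulation. Below is a minimal sketch in plain Python; the function name and the toy labels/scores are illustrative, not part of the challenge's official evaluation code.

```python
def auc_roc(labels, scores):
    """Rank-based AUC-ROC (Mann-Whitney U); tied scores get the average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # group tied scores together
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    pos_ranks = [r for r, y in zip(ranks, labels) if y == 1]
    n_pos, n_neg = len(pos_ranks), len(labels) - len(pos_ranks)
    return (sum(pos_ranks) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Toy example (hypothetical labeling: 1 = 'unreliable', 0 = 'reliable')
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
print(auc_roc(y, p))  # 0.75
```

An AUC of 1.0 means the classifier ranks every unreliable item above every reliable one; 0.5 is chance level, which is why AUC-ROC is a natural metric for a class-imbalanced reliability task like this one.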
Related papers
- Overview of the VLSP 2022 -- Abmusu Shared Task: A Data Challenge for
Vietnamese Abstractive Multi-document Summarization [0.6827423171182151]
The goal of Abmusu shared task is to develop summarization systems that could create abstractive summaries automatically for a set of documents on a topic.
We build a human-annotated dataset of 1,839 documents in 600 clusters, collected from Vietnamese news in 8 categories.
Models are evaluated and ranked in terms of ROUGE2-F1 score, the most typical evaluation metric for the document summarization problem.
arXiv Detail & Related papers (2023-11-27T04:01:13Z) - DeVAn: Dense Video Annotation for Video-Language Models [68.70692422636313]
We present a novel human-annotated dataset for evaluating the ability of visual-language models to generate descriptions for real-world video clips.
The dataset contains 8.5K YouTube video clips of 20-60 seconds in duration and covers a wide range of topics and interests.
arXiv Detail & Related papers (2023-10-08T08:02:43Z) - Slovo: Russian Sign Language Dataset [83.93252084624997]
This paper presents the Russian Sign Language (RSL) video dataset Slovo, produced using crowdsourcing platforms.
The dataset contains 20,000 FullHD recordings, divided into 1,000 classes of isolated RSL gestures received from 194 signers.
arXiv Detail & Related papers (2023-05-23T21:00:42Z) - UrduFake@FIRE2020: Shared Track on Fake News Identification in Urdu [62.6928395368204]
This paper gives the overview of the first shared task at FIRE 2020 on fake news detection in the Urdu language.
The goal is to identify fake news using a dataset composed of 900 annotated news articles for training and 400 news articles for testing.
The dataset contains news in five domains: (i) Health, (ii) Sports, (iii) Showbiz, (iv) Technology, and (v) Business.
arXiv Detail & Related papers (2022-07-25T03:46:51Z) - Overview of the Shared Task on Fake News Detection in Urdu at FIRE 2020 [62.6928395368204]
The task was posed as a binary classification task, in which the goal is to differentiate between real and fake news.
We provided a dataset divided into 900 annotated news articles for training and 400 news articles for testing.
42 teams from 6 different countries (India, China, Egypt, Germany, Pakistan, and the UK) registered for the task.
arXiv Detail & Related papers (2022-07-25T03:41:32Z) - Overview of the Shared Task on Fake News Detection in Urdu at FIRE 2021 [55.41644538483948]
The goal of the shared task is to motivate the community to come up with efficient methods for solving this vital problem.
The training set contains 1300 annotated news articles -- 750 real news, 550 fake news, while the testing set contains 300 news articles -- 200 real, 100 fake news.
The best performing system obtained an F1-macro score of 0.679, which is lower than the past year's best result of 0.907 F1-macro.
arXiv Detail & Related papers (2022-07-11T18:58:36Z) - VLSP 2021 Shared Task: Vietnamese Machine Reading Comprehension [2.348805691644086]
This article presents details of the organization of the shared task, an overview of the methods employed by shared-task participants, and the results.
We provide a benchmark dataset named UIT-ViQuAD 2.0 for evaluating the MRC task and question answering systems for the Vietnamese language.
The UIT-ViQuAD 2.0 dataset motivates more researchers to explore Vietnamese machine reading comprehension, question answering, and question generation.
arXiv Detail & Related papers (2022-03-22T00:44:41Z) - ReINTEL Challenge 2020: Exploiting Transfer Learning Models for Reliable
Intelligence Identification on Vietnamese Social Network Sites [0.38073142980733]
This paper presents the system that we propose for the Reliable Intelligence Identification on Vietnamese Social Network Sites (ReINTEL) task.
In this task, VLSP 2020 provides a dataset with approximately 6,000 training news/posts annotated with reliable or unreliable labels, and a test set consisting of 2,000 examples without labels.
In our experiments, we achieved an AUC score of 94.52% on the private test set from ReINTEL's organizers.
arXiv Detail & Related papers (2021-02-22T06:17:33Z) - A Sentence Cloze Dataset for Chinese Machine Reading Comprehension [64.07894249743767]
We propose a new task called Sentence Cloze-style Machine Reading Comprehension (SC-MRC).
The proposed task aims to fill the right candidate sentences into a passage that has several blanks.
We built a Chinese dataset called CMRC 2019 to evaluate the difficulty of the SC-MRC task.
arXiv Detail & Related papers (2020-04-07T04:09:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.