Accenture at CheckThat! 2020: If you say so: Post-hoc fact-checking of
claims using transformer-based models
- URL: http://arxiv.org/abs/2009.02431v1
- Date: Sat, 5 Sep 2020 01:44:11 GMT
- Title: Accenture at CheckThat! 2020: If you say so: Post-hoc fact-checking of
claims using transformer-based models
- Authors: Evan Williams, Paul Rodrigues, Valerie Novak
- Abstract summary: We introduce the strategies used by the Accenture Team for the CLEF 2020 CheckThat! Lab, Task 1, on English and Arabic.
This shared task evaluated whether a claim in social media text should be professionally fact checked.
We utilized BERT and RoBERTa models to identify claims in social media text that a professional fact-checker should review.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce the strategies used by the Accenture Team for the CLEF 2020
CheckThat! Lab, Task 1, on English and Arabic. This shared task evaluated
whether a claim in social media text should be professionally fact-checked. To
a journalist, a statement presented as fact that would interest a large
audience requires professional fact-checking before dissemination. We
utilized BERT and RoBERTa models to identify claims in social media text that a
professional fact-checker should review, and rank these in priority order for
the fact-checker. For the English challenge, we fine-tuned a RoBERTa model and
added an extra mean pooling layer and a dropout layer to enhance
generalizability to unseen text. For the Arabic task, we fine-tuned
Arabic-language BERT models and demonstrate the use of back-translation to
amplify the minority class and balance the dataset. The work presented here
placed 1st in the English track, and 1st, 2nd, 3rd, and 4th in the Arabic
track.
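As a rough illustration of the English-track setup described above, the sketch below places a masked mean-pooling layer and a dropout layer on top of a RoBERTa encoder before a classification head. The checkpoint name, dropout rate, and head dimensions are illustrative assumptions, not values reported in the paper.

```python
# Minimal sketch of a RoBERTa classifier with mean pooling and dropout.
# Hyperparameters here (roberta-base, dropout 0.1) are assumptions.
import torch
import torch.nn as nn
from transformers import RobertaModel, RobertaTokenizer

class CheckWorthinessClassifier(nn.Module):
    def __init__(self, dropout_rate: float = 0.1, num_labels: int = 2):
        super().__init__()
        self.encoder = RobertaModel.from_pretrained("roberta-base")
        self.dropout = nn.Dropout(dropout_rate)
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        # Mean-pool over real tokens only; padding positions are masked out.
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
        return self.classifier(self.dropout(pooled))

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = CheckWorthinessClassifier()
batch = tokenizer(["The mayor said crime fell 40% this year."],
                  return_tensors="pt", padding=True, truncation=True)
logits = model(batch["input_ids"], batch["attention_mask"])  # shape (1, 2)
```

The claim ranking described in the abstract can then be obtained by sorting tweets by the softmax score of the check-worthy class.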
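The Arabic-track back-translation could look roughly like the sketch below: minority-class (check-worthy) tweets are round-tripped Arabic to English and back to Arabic to generate paraphrased copies that rebalance the training set. The MarianMT checkpoints are illustrative stand-ins, not necessarily the translation system the authors used.

```python
# Hedged sketch of back-translation for minority-class oversampling.
# The Helsinki-NLP checkpoints are illustrative stand-ins.
from transformers import MarianMTModel, MarianTokenizer

def load(name):
    return MarianTokenizer.from_pretrained(name), MarianMTModel.from_pretrained(name)

ar_en_tok, ar_en = load("Helsinki-NLP/opus-mt-ar-en")
en_ar_tok, en_ar = load("Helsinki-NLP/opus-mt-en-ar")

def translate(texts, tok, model):
    batch = tok(texts, return_tensors="pt", padding=True, truncation=True)
    return tok.batch_decode(model.generate(**batch), skip_special_tokens=True)

def back_translate(arabic_texts):
    # Arabic -> English -> Arabic yields paraphrases of the originals.
    return translate(translate(arabic_texts, ar_en_tok, ar_en), en_ar_tok, en_ar)

# Augment only the minority (check-worthy) class to balance the dataset.
minority_tweets = ["..."]  # check-worthy Arabic tweets from the training set
augmented = minority_tweets + back_translate(minority_tweets)
```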
Related papers
- ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic [51.922112625469836]
We present ArabicMMLU, the first multi-task language understanding benchmark for the Arabic language.
Our data comprises 40 tasks and 14,575 multiple-choice questions in Modern Standard Arabic (MSA) and is carefully constructed by collaborating with native speakers in the region.
Our evaluations of 35 models reveal substantial room for improvement, particularly among the best open-source models.
arXiv Detail & Related papers (2024-02-20T09:07:41Z)
- Arabic Tweet Act: A Weighted Ensemble Pre-Trained Transformer Model for Classifying Arabic Speech Acts on Twitter [0.32885740436059047]
This paper proposes a Twitter dialectal Arabic speech act classification approach based on a transformer deep learning neural network.
We propose a BERT-based weighted ensemble learning approach that integrates the advantages of various BERT models for dialectal Arabic speech act classification.
The results show that the best-performing model is araBERTv2-Twitter, with a macro-averaged F1 score of 0.73 and an accuracy of 0.84.
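As a minimal sketch of the weighted-ensemble idea in this entry, the snippet below averages per-model class probabilities with normalized weights (for example, each model's dev-set score); the model count, weights, and dummy inputs are illustrative assumptions.

```python
# Hedged sketch of a weighted ensemble over BERT-variant predictions.
import numpy as np

def weighted_ensemble(prob_matrices, weights):
    """prob_matrices: per-model (n_samples, n_classes) softmax outputs."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                              # normalize the model weights
    stacked = np.stack(prob_matrices)            # (n_models, n_samples, n_classes)
    combined = np.tensordot(w, stacked, axes=1)  # weighted average over models
    return combined.argmax(axis=-1)              # final class per sample

# Dummy softmax outputs standing in for three fine-tuned BERT variants
# (e.g., araBERTv2-Twitter plus two others); weights are illustrative.
rng = np.random.default_rng(0)
p1, p2, p3 = (rng.dirichlet(np.ones(2), size=4) for _ in range(3))
preds = weighted_ensemble([p1, p2, p3], weights=[0.73, 0.70, 0.68])
```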
arXiv Detail & Related papers (2024-01-30T19:01:24Z)
- AceGPT, Localizing Large Language Models in Arabic [73.39989503874634]
The paper proposes a comprehensive solution that includes pre-training with Arabic texts, Supervised Fine-Tuning (SFT) utilizing native Arabic instructions, and GPT-4 responses in Arabic.
The goal is to cultivate culturally cognizant and value-aligned Arabic LLMs capable of accommodating the diverse, application-specific needs of Arabic-speaking communities.
arXiv Detail & Related papers (2023-09-21T13:20:13Z)
- Cross-Lingual NER for Financial Transaction Data in Low-Resource Languages [70.25418443146435]
We propose an efficient modeling framework for cross-lingual named entity recognition in semi-structured text data.
We employ two independent datasets of SMSs in English and Arabic, each carrying semi-structured banking transaction information.
With access to only 30 labeled samples, our model can generalize the recognition of merchants, amounts, and other fields from English to Arabic.
arXiv Detail & Related papers (2023-07-16T00:45:42Z)
- The Effect of Normalization for Bi-directional Amharic-English Neural Machine Translation [53.907805815477126]
This paper presents the first relatively large-scale Amharic-English parallel sentence dataset.
We build bi-directional Amharic-English translation models by fine-tuning the existing Facebook M2M100 pre-trained model.
The results show that the normalization of Amharic homophone characters increases the performance of Amharic-English machine translation in both directions.
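The homophone normalization this entry evaluates can be sketched as a simple character-level mapping that collapses identically pronounced Ge'ez letters to one canonical form; the pairs below are a small assumed subset of the Amharic homophone sets, for illustration only.

```python
# Hedged sketch of Amharic homophone normalization: map letters that are
# pronounced the same to a single canonical character before training.
# This covers only an assumed subset of the homophone sets.
HOMOPHONE_MAP = str.maketrans({
    "ሐ": "ሀ", "ኀ": "ሀ",  # "ha"-sounding variants -> canonical ሀ
    "ሠ": "ሰ",             # "se" variant -> canonical ሰ
    "ዐ": "አ",             # glottal "a" variant -> canonical አ
    "ፀ": "ጸ",             # "tse" variant -> canonical ጸ
})

def normalize(text: str) -> str:
    return text.translate(HOMOPHONE_MAP)

print(normalize("ሠላም"))  # -> "ሰላም" ("peace"), spelled with the canonical ሰ
```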
arXiv Detail & Related papers (2022-10-27T07:18:53Z)
- Tencent AI Lab - Shanghai Jiao Tong University Low-Resource Translation System for the WMT22 Translation Task [49.916963624249355]
This paper describes Tencent AI Lab - Shanghai Jiao Tong University (TAL-SJTU) Low-Resource Translation systems for the WMT22 shared task.
We participate in the general translation task on English↔Livonian.
Our system is based on M2M100 with novel techniques that adapt it to the target language pair.
arXiv Detail & Related papers (2022-10-17T04:34:09Z)
- Z-Index at CheckThat! Lab 2022: Check-Worthiness Identification on Tweet Text [2.0887898772540217]
We describe our participation in Subtask-1A: Check-worthiness of tweets (English, Dutch and Spanish) of CheckThat! lab at CLEF 2022.
We performed standard preprocessing steps and applied different models to identify whether a given text is worthy of fact checking or not.
We also used BERT multilingual (BERT-m) and XLM-RoBERTa-base pre-trained models for the experiments.
arXiv Detail & Related papers (2022-07-15T06:21:35Z)
- Overview of the CLEF--2021 CheckThat! Lab on Detecting Check-Worthy Claims, Previously Fact-Checked Claims, and Fake News [21.574997165145486]
We describe the fourth edition of the CheckThat! Lab, part of the 2021 Conference and Labs of the Evaluation Forum (CLEF).
The lab evaluates technology supporting tasks related to factuality, and covers Arabic, Bulgarian, English, Spanish, and Turkish.
arXiv Detail & Related papers (2021-09-23T06:10:36Z)
- Accenture at CheckThat! 2021: Interesting claim identification and ranking with contextually sensitive lexical training data augmentation [0.0]
This paper discusses the approach used by the Accenture Team for CLEF2021 CheckThat! Lab, Task 1.
It identifies whether a claim made in social media would be interesting to a wide audience and should be fact-checked.
Twitter training and test data were provided in English, Arabic, Spanish, Turkish, and Bulgarian.
arXiv Detail & Related papers (2021-07-12T18:46:47Z)
- TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data [113.29476656550342]
We present TaBERT, a pretrained LM that jointly learns representations for NL sentences and tables.
TaBERT is trained on a large corpus of 26 million tables and their English contexts.
Implementation of the model will be available at http://fburl.com/TaBERT.
arXiv Detail & Related papers (2020-05-17T17:26:40Z)
- An Empirical Study of Pre-trained Transformers for Arabic Information Extraction [25.10651348642055]
We pre-train a customized bilingual BERT, dubbed GigaBERT, specifically for Arabic NLP and English-to-Arabic zero-shot transfer learning.
We study GigaBERT's effectiveness on zero-shot transfer across four IE tasks: named entity recognition, part-of-speech tagging, argument role labeling, and relation extraction.
Our best model significantly outperforms mBERT, XLM-RoBERTa, and AraBERT in both the supervised and zero-shot transfer settings.
arXiv Detail & Related papers (2020-04-30T00:01:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.