Z-Index at CheckThat! Lab 2022: Check-Worthiness Identification on Tweet Text
- URL: http://arxiv.org/abs/2207.07308v1
- Date: Fri, 15 Jul 2022 06:21:35 GMT
- Title: Z-Index at CheckThat! Lab 2022: Check-Worthiness Identification on Tweet Text
- Authors: Prerona Tarannum, Firoj Alam, Md. Arid Hasan, Sheak Rashed Haider Noori
- Abstract summary: We describe our participation in Subtask-1A: Check-worthiness of tweets (English, Dutch and Spanish) of the CheckThat! lab at CLEF 2022.
We performed standard preprocessing steps and applied different models to identify whether a given text is worthy of fact-checking or not.
We also used BERT multilingual (BERT-m) and XLM-RoBERTa-base pre-trained models for the experiments.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The wide use of social media and digital technologies facilitates sharing various news and information about events and activities. Alongside positive information, misleading and false information also spreads on social media. There have been efforts to identify such misleading information, both manually by human experts and with automatic tools. Manual effort does not scale well due to the high volume of online information containing factual claims. Therefore, automatically identifying check-worthy claims can be very useful for human experts. In this study, we describe our participation in Subtask-1A: Check-worthiness of tweets (English, Dutch and Spanish) of the CheckThat! lab at CLEF 2022. We performed standard preprocessing steps and applied different models to identify whether a given text is worthy of fact-checking or not. We used oversampling to balance the dataset and applied SVM and Random Forest (RF) classifiers with TF-IDF representations. We also used the BERT multilingual (BERT-m) and XLM-RoBERTa-base pre-trained models for the experiments. We used BERT-m for the official submissions, and our systems ranked 3rd, 5th, and 12th in Spanish, Dutch, and English, respectively. In further experiments, our evaluation shows that the transformer models (BERT-m and XLM-RoBERTa-base) outperform SVM and RF for Dutch and English, while a different pattern is observed for Spanish.
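To make the pipeline above concrete, here is a minimal sketch of the classical baseline (TF-IDF features, random oversampling, SVM and Random Forest) using scikit-learn and imbalanced-learn. The toy tweets, n-gram range, and hyperparameters are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch of the classical baseline: TF-IDF + oversampling + SVM/RF.
# Toy data and hyperparameters are assumptions, not the paper's setup.
from imblearn.over_sampling import RandomOverSampler
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Stand-ins for preprocessed tweets and binary check-worthiness labels
# (1 = contains a claim worth fact-checking, 0 = not check-worthy).
texts = [
    "government says unemployment fell by 5 percent last quarter",
    "good morning everyone, have a great day",
    "the new vaccine is 95 percent effective according to the trial",
    "that movie last night was hilarious",
    "officials claim the bridge repairs cost 2 million dollars",
    "can't wait for the weekend",
]
labels = [1, 0, 1, 0, 1, 0]

# TF-IDF representation of the tweet text (n-gram range is an assumption).
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(texts)

# Random oversampling to balance the two classes, as in the abstract.
X_bal, y_bal = RandomOverSampler(random_state=42).fit_resample(X, labels)

# The two classical models used in the paper: SVM and Random Forest.
svm = LinearSVC().fit(X_bal, y_bal)
rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_bal, y_bal)

query = vectorizer.transform(["new tax cuts will save families 2000 dollars"])
print(svm.predict(query), rf.predict(query))
```

A similarly hedged sketch of the transformer route, fine-tuning multilingual BERT for the same binary decision with the Hugging Face transformers library; the checkpoint name, learning rate, and epoch count are assumptions, and `texts`/`labels` are reused from the sketch above.

```python
# Sketch of fine-tuning BERT-m for check-worthiness classification.
# Checkpoint, learning rate, and epoch count are illustrative assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=2
)

# Tokenize the toy tweets into a single padded batch.
batch = tokenizer(texts, padding=True, truncation=True, max_length=128,
                  return_tensors="pt")
targets = torch.tensor(labels)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few passes over the toy batch
    optimizer.zero_grad()
    loss = model(**batch, labels=targets).loss
    loss.backward()
    optimizer.step()
```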
Related papers
- Cross-lingual Transfer Learning for Check-worthy Claim Identification over Twitter
Misinformation spread over social media has become an undeniable infodemic.
We present a systematic study of six approaches for cross-lingual check-worthiness estimation across pairs of five diverse languages with the help of the multilingual BERT (mBERT) model.
Our results show that for some language pairs, zero-shot cross-lingual transfer is possible and can perform as well as monolingual models trained on the target language.
arXiv Detail & Related papers (2022-11-09T18:18:53Z)
- Detecting Text Formality: A Study of Text Classification Approaches
This work proposes the first, to our knowledge, systematic study of formality detection methods based on statistical, neural, and Transformer-based machine learning approaches.
We conducted three types of experiments -- monolingual, multilingual, and cross-lingual.
The study shows that the Char BiLSTM model outperforms Transformer-based ones on the monolingual and multilingual formality classification tasks.
arXiv Detail & Related papers (2022-04-19T16:23:07Z)
- BERTuit: Understanding Spanish language in Twitter through a native transformer
We present BERTuit, the largest transformer proposed so far for the Spanish language, pre-trained on a massive dataset of 230M Spanish tweets.
Our motivation is to provide a powerful resource to better understand Spanish Twitter and to be used on applications focused on this social network.
arXiv Detail & Related papers (2022-04-07T14:28:51Z)
- Zero Shot Crosslingual Eye-Tracking Data Prediction using Multilingual Transformer Models
We describe our submission to the CMCL 2022 shared task on predicting human reading patterns for a multilingual dataset.
Our model uses text representations from transformers and some hand-engineered features with a regression layer on top to predict statistical measures of mean and standard deviation.
We train an end-to-end model to extract meaningful information from different languages and test our model on two separate datasets.
arXiv Detail & Related papers (2022-03-30T17:11:48Z)
- Matching Tweets With Applicable Fact-Checks Across Languages
We focus on automatically finding existing fact-checks for claims made in social media posts (tweets).
We conduct both classification and retrieval experiments, in monolingual (English only), multilingual (Spanish, Portuguese), and cross-lingual (Hindi-English) settings.
We present promising results for "match" classification (93% average accuracy) in four language pairs.
arXiv Detail & Related papers (2022-02-14T23:33:02Z)
- FacTeR-Check: Semi-automated fact-checking through Semantic Similarity and Natural Language Inference
FacTeR-Check enables retrieval of fact-checked information, verification of unchecked claims, and tracking of dangerous information over social media.
The architecture is validated using a new dataset called NLI19-SP that is publicly released with COVID-19 related hoaxes and tweets from Spanish social media.
Our results show state-of-the-art performance on the individual benchmarks, as well as producing useful analysis of the evolution over time of 61 different hoaxes.
arXiv Detail & Related papers (2021-10-27T15:44:54Z)
- fBERT: A Neural Transformer for Identifying Offensive Content
fBERT is a BERT model retrained on SOLID, the largest English offensive language identification corpus available, with over 1.4 million offensive instances.
We evaluate fBERT's performance on identifying offensive content on multiple English datasets and we test several thresholds for selecting instances from SOLID.
The fBERT model will be made freely available to the community.
arXiv Detail & Related papers (2021-09-10T19:19:26Z)
- Transfer Learning for Mining Feature Requests and Bug Reports from Tweets and App Store Reviews
Existing approaches fail to detect feature requests and bug reports with high recall and acceptable precision.
We train both monolingual and multilingual BERT models and compare the performance with state-of-the-art methods.
arXiv Detail & Related papers (2021-08-02T06:51:13Z)
- It's not Greek to mBERT: Inducing Word-Level Translations from Multilingual BERT
Multilingual BERT (mBERT) learns rich cross-lingual representations that allow for transfer across languages.
We study the word-level translation information embedded in mBERT and present two simple methods that expose remarkable translation capabilities with no fine-tuning.
arXiv Detail & Related papers (2020-10-16T09:49:32Z)
- Explicit Alignment Objectives for Multilingual Bidirectional Encoders
We present a new method for learning multilingual encoders, AMBER (Aligned Multilingual Bi-directional EncodeR).
AMBER is trained on additional parallel data using two explicit alignment objectives that align the multilingual representations at different granularities.
Experimental results show that AMBER obtains gains of up to 1.1 average F1 score on sequence tagging and up to 27.3 average accuracy on retrieval over the XLMR-large model.
arXiv Detail & Related papers (2020-10-15T18:34:13Z)
- Accenture at CheckThat! 2020: If you say so: Post-hoc fact-checking of claims using transformer-based models
We introduce the strategies used by the Accenture Team for the CLEF 2020 CheckThat! Lab, Task 1, on English and Arabic.
This shared task evaluated whether a claim in social media text should be professionally fact checked.
We utilized BERT and RoBERTa models to identify claims in social media text that a professional fact-checker should review.
arXiv Detail & Related papers (2020-09-05T01:44:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.