Evaluating BERT-based Pre-training Language Models for Detecting Misinformation
- URL: http://arxiv.org/abs/2203.07731v1
- Date: Tue, 15 Mar 2022 08:54:36 GMT
- Title: Evaluating BERT-based Pre-training Language Models for Detecting Misinformation
- Authors: Rini Anggrainingsih, Ghulam Mubashar Hassan and Amitava Datta
- Abstract summary: It is challenging to control the quality of online information due to the lack of supervision over all the information posted online.
There is a need for automated rumour detection techniques to limit the adverse effects of spreading misinformation.
This study proposes using BERT-based pre-trained language models to encode text data into vectors and neural network models to classify these vectors to detect misinformation.
- Score: 2.1915057426589746
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: It is challenging to control the quality of online information due to the
lack of supervision over all the information posted online. Manual checking is
almost impossible given the vast number of posts made on online media and how
quickly they spread. Therefore, there is a need for automated rumour detection
techniques to limit the adverse effects of spreading misinformation. Previous
studies mainly focused on finding and extracting the significant features of
text data. However, extracting features is time-consuming and not a highly
effective process. This study proposes using BERT-based pre-trained language
models to encode text data into vectors and neural network models to classify
these vectors to detect misinformation. Furthermore, the performance of
different language models (LMs) with different numbers of trainable parameters
was compared. The proposed technique was tested on different short and long
text datasets, and its results were compared with those of state-of-the-art
techniques on the same datasets. The results show that the proposed technique
performs better than the state-of-the-art techniques. We also tested the
proposed technique on the combined datasets; the results demonstrate that
larger training and testing data sizes considerably improve the technique's
performance.
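A minimal sketch of the pipeline described above, assuming the Hugging Face transformers library and PyTorch: a pre-trained BERT encoder turns each post into a fixed-size vector, and a small feed-forward network classifies that vector as rumour or non-rumour. The checkpoint name, hidden size, and classifier head below are illustrative choices, not the paper's exact configuration.

```python
# Hedged sketch: BERT-based text encoding + neural network classifier.
# All hyperparameters here are placeholders, not the paper's settings.
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

ENCODER = "bert-base-uncased"  # any BERT-family checkpoint could be swapped in


class MisinformationClassifier(nn.Module):
    def __init__(self, encoder_name: str = ENCODER, hidden: int = 128):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        dim = self.encoder.config.hidden_size
        self.head = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(), nn.Dropout(0.1), nn.Linear(hidden, 2)
        )

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls_vec = out.last_hidden_state[:, 0]  # [CLS] embedding as the text vector
        return self.head(cls_vec)              # logits: [non-rumour, rumour]


tok = AutoTokenizer.from_pretrained(ENCODER)
model = MisinformationClassifier()
batch = tok(["Breaking: miracle cure confirmed by anonymous sources!"],
            padding=True, truncation=True, max_length=128, return_tensors="pt")
logits = model(batch["input_ids"], batch["attention_mask"])
print(logits.softmax(dim=-1))
```

Freezing or fine-tuning the encoder changes the number of trainable parameters, which is the kind of comparison across language models that the abstract mentions.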
Related papers
- Investigating the Impact of Semi-Supervised Methods with Data Augmentation on Offensive Language Detection in Romanian Language [2.2823100315094624]
Offensive language detection is a crucial task in today's digital landscape.
Building robust offensive language detection models requires large amounts of labeled data.
Semi-supervised learning offers a feasible solution by utilizing labeled and unlabeled data.
arXiv Detail & Related papers (2024-07-29T15:02:51Z)
- Probing Language Models for Pre-training Data Detection [11.37731401086372]
We propose to utilize the probing technique for pre-training data detection by examining the model's internal activations.
Our method is simple and effective and leads to more trustworthy pre-training data detection.
arXiv Detail & Related papers (2024-06-03T13:58:04Z)
- Improving Sampling Methods for Fine-tuning SentenceBERT in Text Streams [49.3179290313959]
This study explores the efficacy of seven text sampling methods designed to selectively fine-tune language models.
We precisely assess the impact of these methods on fine-tuning the SBERT model using four different loss functions.
Our findings indicate that Softmax loss and Batch All Triplets loss are particularly effective for text stream classification.
arXiv Detail & Related papers (2024-03-18T23:41:52Z)
- Towards Better Query Classification with Multi-Expert Knowledge Condensation in JD Ads Search [12.701416688678622]
The shallow FastText model is widely used for efficient online inference.
BERT is an effective solution, but it causes higher online inference latency and more expensive computing costs.
We propose knowledge condensation to boost the classification performance of the online FastText model under strict low latency constraints.
arXiv Detail & Related papers (2023-08-02T12:05:01Z)
- DetectLLM: Leveraging Log Rank Information for Zero-Shot Detection of Machine-Generated Text [26.02072055825044]
We introduce two novel zero-shot methods for detecting machine-generated text by leveraging the log rank information.
One is called DetectLLM-LRR, which is fast and efficient, and the other is called DetectLLM-NPR, which is more accurate but slower due to the need for perturbations (a rough log-rank scoring sketch appears after this list).
Our experiments on three datasets and seven language models show that our proposed methods improve over the state of the art by 3.9 and 1.75 AUROC points absolute.
arXiv Detail & Related papers (2023-05-23T11:18:30Z)
- Transformer-based approaches to Sentiment Detection [55.41644538483948]
We examined the performance of four different types of state-of-the-art transformer models for text classification.
The RoBERTa transformer model performs best on the test dataset with a score of 82.6% and is highly recommended for quality predictions.
arXiv Detail & Related papers (2023-03-13T17:12:03Z)
- AugGPT: Leveraging ChatGPT for Text Data Augmentation [59.76140039943385]
We propose a text data augmentation approach based on ChatGPT (named AugGPT).
AugGPT rephrases each sentence in the training samples into multiple conceptually similar but semantically different samples.
Experiment results on few-shot learning text classification tasks show the superior performance of the proposed AugGPT approach.
arXiv Detail & Related papers (2023-02-25T06:58:16Z)
- Panning for gold: Lessons learned from the platform-agnostic automated detection of political content in textual data [48.7576911714538]
We discuss how these techniques can be used to detect political content across different platforms.
We compare the performance of three groups of detection techniques relying on dictionaries, supervised machine learning, or neural networks.
Our results show the limited impact of preprocessing on model performance, with the best results for less noisy data being achieved by neural network- and machine-learning-based models.
arXiv Detail & Related papers (2022-07-01T15:23:23Z)
- Sample Efficient Approaches for Idiomaticity Detection [6.481818246474555]
This work explores sample efficient methods of idiomaticity detection.
In particular, we study the impact of Pattern Exploit Training (PET), a few-shot method of classification, and BERTRAM, an efficient method of creating contextual embeddings.
Our experiments show that while PET improves performance on English, these methods are much less effective on Portuguese and Galician, leading to overall performance about on par with vanilla mBERT.
arXiv Detail & Related papers (2022-05-23T13:46:35Z)
- Human-in-the-Loop Disinformation Detection: Stance, Sentiment, or Something Else? [93.91375268580806]
Both politics and pandemics have recently provided ample motivation for the development of machine learning-enabled disinformation (a.k.a. fake news) detection algorithms.
Existing literature has focused primarily on the fully-automated case, but the resulting techniques cannot reliably detect disinformation on the varied topics, sources, and time scales required for military applications.
By leveraging an already-available analyst as a human-in-the-loop, canonical machine learning techniques of sentiment analysis, aspect-based sentiment analysis, and stance detection become plausible methods to use for a partially-automated disinformation detection system.
arXiv Detail & Related papers (2021-11-09T13:30:34Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To tackle this, we propose to apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
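The DetectLLM entry above leverages token log-rank information for zero-shot detection of machine-generated text. The sketch below scores a text by the ratio of its average token log-likelihood magnitude to its average token log-rank under a causal language model; this is only one plausible reading of "log rank information", and the exact statistic, model, and threshold used by DetectLLM-LRR may differ (gpt2 here is just a placeholder scoring model).

```python
# Hedged sketch: a generic log-likelihood / log-rank score for a text,
# in the spirit of log-rank-based zero-shot detectors (not necessarily
# the exact DetectLLM-LRR formula, which is defined in the paper itself).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # placeholder scoring model
tok = AutoTokenizer.from_pretrained(MODEL)
lm = AutoModelForCausalLM.from_pretrained(MODEL).eval()


@torch.no_grad()
def log_rank_score(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids      # (1, T)
    logits = lm(ids).logits[:, :-1]                      # predict token i+1 from its prefix
    targets = ids[:, 1:]                                 # observed next tokens
    logprobs = torch.log_softmax(logits, dim=-1)
    tok_logprob = logprobs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # rank of each observed token among the vocabulary (1 = most likely)
    ranks = (logprobs > tok_logprob.unsqueeze(-1)).sum(dim=-1) + 1
    avg_loglik = tok_logprob.mean()
    avg_logrank = torch.log(ranks.float()).mean()
    # higher ratio = tokens are both likely and highly ranked, which such
    # detectors treat as a hint of machine-generated text (assumption)
    return (avg_loglik.abs() / (avg_logrank + 1e-8)).item()


print(log_rank_score("The quick brown fox jumps over the lazy dog."))
```

A threshold on this score, tuned on held-out human and machine text, would turn it into a binary detector.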
This list is automatically generated from the titles and abstracts of the papers on this site.