Employing Sentence Space Embedding for Classification of Data Stream from Fake News Domain
- URL: http://arxiv.org/abs/2407.10807v1
- Date: Mon, 15 Jul 2024 15:23:21 GMT
- Title: Employing Sentence Space Embedding for Classification of Data Stream from Fake News Domain
- Authors: Paweł Zyblewski, Jakub Klikowski, Weronika Borek-Marciniec, Paweł Ksieniewicz,
- Abstract summary: This paper presents an approach to natural language data stream classification using the sentence space method.
It allows the use of convolutional deep networks dedicated to image classification to solve the task of recognizing fake news based on text data.
Based on the real-life Fakeddit dataset, the proposed approach was compared with state-of-the-art algorithms for data stream classification.
- Score: 0.24999074238880487
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Tabular data is considered the last unconquered castle of deep learning, yet the task of data stream classification is stated to be an equally important and demanding research area. Due to the temporal constraints, it is assumed that deep learning methods are not the optimal solution for application in this field. However, excluding the entire -- and prevalent -- group of methods seems rather rash given the progress that has been made in recent years in its development. For this reason, the following paper is the first to present an approach to natural language data stream classification using the sentence space method, which allows for encoding text into the form of a discrete digital signal. This allows the use of convolutional deep networks dedicated to image classification to solve the task of recognizing fake news based on text data. Based on the real-life Fakeddit dataset, the proposed approach was compared with state-of-the-art algorithms for data stream classification based on generalization ability and time complexity.
Related papers
- Downstream-Pretext Domain Knowledge Traceback for Active Learning [138.02530777915362]
We propose a downstream-pretext domain knowledge traceback (DOKT) method that traces the data interactions of downstream knowledge and pre-training guidance.
DOKT consists of a traceback diversity indicator and a domain-based uncertainty estimator.
Experiments conducted on ten datasets show that our model outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-20T01:34:13Z) - LatentForensics: Towards frugal deepfake detection in the StyleGAN latent space [2.629091178090276]
We propose a deepfake detection method that operates in the latent space of a state-of-the-art generative adversarial network (GAN) trained on high-quality face images.
Experimental results on standard datasets reveal that the proposed approach outperforms other state-of-the-art deepfake classification methods.
arXiv Detail & Related papers (2023-03-30T08:36:48Z) - The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, namely Cascaded Forward (CaFo) algorithm, which does not rely on BP optimization as that in FF.
Unlike FF, our framework directly outputs label distributions at each cascaded block, which does not require generation of additional negative samples.
In our framework each block can be trained independently, so it can be easily deployed into parallel acceleration systems.
arXiv Detail & Related papers (2023-03-17T02:01:11Z) - Bi-level Alignment for Cross-Domain Crowd Counting [113.78303285148041]
Current methods rely on external data for training an auxiliary task or apply an expensive coarse-to-fine estimation.
We develop a new adversarial learning based method, which is simple and efficient to apply.
We evaluate our approach on five real-world crowd counting benchmarks, where we outperform existing approaches by a large margin.
arXiv Detail & Related papers (2022-05-12T02:23:25Z) - Dominant Set-based Active Learning for Text Classification and its
Application to Online Social Media [0.0]
We present a novel pool-based active learning method for the training of large unlabeled corpus with minimum annotation cost.
Our proposed method does not have any parameters to be tuned, making it dataset-independent.
Our method achieves a higher performance in comparison to the state-of-the-art active learning strategies.
arXiv Detail & Related papers (2022-01-28T19:19:03Z) - Active Weighted Aging Ensemble for Drifted Data Stream Classification [2.277447144331876]
Concept drift destabilizes the performance of the classification model and seriously degrades its quality.
The proposed method has been evaluated through computer experiments using both real and generated data streams.
The results confirm the high quality of the proposed algorithm over state-of-the-art methods.
arXiv Detail & Related papers (2021-12-19T13:52:53Z) - MD-CSDNetwork: Multi-Domain Cross Stitched Network for Deepfake
Detection [80.83725644958633]
Current deepfake generation methods leave discriminative artifacts in the frequency spectrum of fake images and videos.
We present a novel approach, termed as MD-CSDNetwork, for combining the features in the spatial and frequency domains to mine a shared discriminative representation.
arXiv Detail & Related papers (2021-09-15T14:11:53Z) - The Application of Active Query K-Means in Text Classification [0.0]
Active learning is a state-of-art machine learning approach to deal with an abundance of unlabeled data.
Traditional unsupervised k-means clustering is first modified into a semi-supervised version in this research.
A novel attempt is applied to further extend the algorithm into active learning scenario with Penalized Min-Max-selection.
After tested on a Chinese news dataset, it shows a consistent increase in accuracy while lowering the cost in training.
arXiv Detail & Related papers (2021-07-16T03:06:35Z) - TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for
Unsupervised Sentence Embedding Learning [53.32740707197856]
We present a new state-of-the-art unsupervised method based on pre-trained Transformers and Sequential Denoising Auto-Encoder (TSDAE)
It can achieve up to 93.1% of the performance of in-domain supervised approaches.
arXiv Detail & Related papers (2021-04-14T17:02:18Z) - Revisiting Mahalanobis Distance for Transformer-Based Out-of-Domain
Detection [60.88952532574564]
This paper conducts a thorough comparison of out-of-domain intent detection methods.
We evaluate multiple contextual encoders and methods, proven to be efficient, on three standard datasets for intent classification.
Our main findings show that fine-tuning Transformer-based encoders on in-domain data leads to superior results.
arXiv Detail & Related papers (2021-01-11T09:10:58Z) - Focus on Semantic Consistency for Cross-domain Crowd Understanding [34.560447389853614]
Some domain adaptation algorithms try to liberate it by training models with synthetic data.
We found that a mass of estimation errors in the background areas impede the performance of the existing methods.
In this paper, we propose a domain adaptation method to eliminate it.
arXiv Detail & Related papers (2020-02-20T08:51:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.