Feature Extraction of Text for Deep Learning Algorithms: Application on
Fake News Detection
- URL: http://arxiv.org/abs/2010.05496v2
- Date: Tue, 3 Nov 2020 11:32:14 GMT
- Title: Feature Extraction of Text for Deep Learning Algorithms: Application on
Fake News Detection
- Authors: HyeonJun Kim
- Abstract summary: It will be shown that deep learning algorithms using only the alphabet frequencies of a news article's original text, without any information about the sequence of the letters, can classify fake and trustworthy news with high accuracy.
Alphabet frequencies appear to contain features useful for understanding the complex context or meaning of the original text.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Feature extraction is an important process in machine learning and deep
learning, as it makes algorithms both more efficient and more accurate. In
natural language processing for deception detection, such as fake news
detection, several statistical feature-extraction methods have been introduced
(e.g. N-grams). This research shows that deep learning algorithms using only
the alphabet frequencies of a news article's original text, with no information
about the sequence of the letters, can classify fake and trustworthy news with
high accuracy (85\%). Since this pre-processing method makes the data notably
compact while still retaining the features the classifier needs, alphabet
frequencies appear to contain useful features for understanding the complex
context or meaning of the original text.
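The pre-processing step the abstract describes can be sketched as a small function that maps a text to a 26-dimensional vector of normalized letter frequencies, discarding all order information. This is a minimal illustration of the idea, not the author's actual code; the function name and normalization choice are assumptions.

```python
from collections import Counter
import string

def alphabet_frequencies(text: str) -> list[float]:
    """Return a 26-dimensional vector of normalized letter frequencies.

    Order information is discarded entirely: only how often each letter
    a-z occurs in the lower-cased text is kept, so the representation is
    notably compact compared to the original document.
    """
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    total = len(letters)
    if total == 0:
        return [0.0] * 26
    counts = Counter(letters)
    return [counts.get(c, 0) / total for c in string.ascii_lowercase]

vec = alphabet_frequencies("Fake news detection")
print(len(vec))             # 26
print(round(sum(vec), 6))   # 1.0
```

A vector like this could then be fed to any standard classifier; the paper's point is that even such an order-free representation carries enough signal to separate fake from trustworthy news.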
Related papers
- Learning Robust Named Entity Recognizers From Noisy Data With Retrieval Augmentation [67.89838237013078]
Named entity recognition (NER) models often struggle with noisy inputs.
We propose a more realistic setting in which only noisy text and its NER labels are available.
We employ a multi-view training framework that improves robust NER without retrieving text during inference.
arXiv Detail & Related papers (2024-07-26T07:30:41Z) - Efficiently Leveraging Linguistic Priors for Scene Text Spotting [63.22351047545888]
This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models.
We generate text distributions that align well with scene text datasets, removing the need for in-domain fine-tuning.
Experimental results show that our method not only improves recognition accuracy but also enables more accurate localization of words.
arXiv Detail & Related papers (2024-02-27T01:57:09Z) - A Novel Enhanced Move Recognition Algorithm Based on Pre-trained Models
with Positional Embeddings [6.688643243555054]
The recognition of abstracts is crucial for effectively locating the content and clarifying the article.
This paper proposes a novel enhanced move recognition algorithm with an improved pre-trained model and a gated network with attention mechanism for unstructured abstracts of Chinese scientific and technological papers.
arXiv Detail & Related papers (2023-08-14T03:20:28Z) - A Deep Learning Anomaly Detection Method in Textual Data [0.45687771576879593]
We propose using deep learning and transformer architectures combined with classical machine learning algorithms.
We used multiple machine learning methods, such as Sentence Transformers, Autos, Logistic Regression, and distance-calculation methods, to predict anomalies.
arXiv Detail & Related papers (2022-11-25T05:18:13Z) - Refining neural network predictions using background knowledge [68.35246878394702]
We show that logical background knowledge can be used in a learning system to compensate for a lack of labeled training data.
We introduce differentiable refinement functions that find a corrected prediction close to the original prediction.
This algorithm finds optimal refinements on complex SAT formulas in significantly fewer iterations and frequently finds solutions where gradient descent cannot.
arXiv Detail & Related papers (2022-06-10T10:17:59Z) - Development of Fake News Model using Machine Learning through Natural
Language Processing [0.7120858995754653]
We use machine learning algorithms for the identification of fake news.
Simple classification alone is not sufficient for fake news detection.
By integrating machine learning with text-based processing, we can detect fake news.
arXiv Detail & Related papers (2022-01-19T09:26:15Z) - Lexically Aware Semi-Supervised Learning for OCR Post-Correction [90.54336622024299]
Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents.
Previous work has demonstrated the utility of neural post-correction methods on recognition of less-well-resourced languages.
We present a semi-supervised learning method that makes it possible to utilize raw images to improve performance.
arXiv Detail & Related papers (2021-11-04T04:39:02Z) - Hidden Markov Based Mathematical Model dedicated to Extract Ingredients
from Recipe Text [0.0]
Part-of-speech tagging (POS tagging) is a pre-processing task that requires an annotated corpus.
I built a mathematical model based on Hidden Markov structures and obtained high accuracy in extracting ingredients from recipe text.
arXiv Detail & Related papers (2021-09-28T14:38:11Z) - Does a Hybrid Neural Network based Feature Selection Model Improve Text
Classification? [9.23545668304066]
We propose a hybrid feature selection method for obtaining relevant features.
We then present three ways of implementing a feature selection and neural network pipeline.
We also observed a slight increase in accuracy on some datasets.
arXiv Detail & Related papers (2021-01-22T09:12:19Z) - Predicting What You Already Know Helps: Provable Self-Supervised
Learning [60.27658820909876]
Self-supervised representation learning solves auxiliary prediction tasks (known as pretext tasks) without requiring labeled data.
We show a mechanism exploiting the statistical connections between certain reconstruction-based pretext tasks that guarantees learning a good representation.
We prove the linear layer yields small approximation error even for complex ground-truth function classes.
arXiv Detail & Related papers (2020-08-03T17:56:13Z) - TextScanner: Reading Characters in Order for Robust Scene Text
Recognition [60.04267660533966]
TextScanner is an alternative approach for scene text recognition.
It generates pixel-wise, multi-channel segmentation maps for character class, position and order.
It also adopts RNN for context modeling and performs paralleled prediction for character position and class.
arXiv Detail & Related papers (2019-12-28T07:52:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.