A Deep Learning Anomaly Detection Method in Textual Data
- URL: http://arxiv.org/abs/2211.13900v1
- Date: Fri, 25 Nov 2022 05:18:13 GMT
- Title: A Deep Learning Anomaly Detection Method in Textual Data
- Authors: Amir Jafari
- Abstract summary: We propose using deep learning and transformer architectures combined with classical machine learning algorithms.
We used multiple machine learning methods such as Sentence Transformers, Auto Encoders, Logistic Regression and Distance calculation methods to predict anomalies.
- Score: 0.45687771576879593
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this article, we propose using deep learning and transformer architectures combined with classical machine learning algorithms to detect and identify anomalies in textual data. The deep learning model provides crucial contextual information about the text, with every textual context converted into a numerical representation. We use multiple machine learning methods, such as Sentence Transformers, Auto Encoders, Logistic Regression, and distance calculation methods, to predict anomalies. The method is tested on text data into which synthetic data from different sources is injected, either as anomalies or as targets. Different methods and algorithms from the field of outlier detection are explained, and the results of the best-performing technique are presented. These results suggest that our algorithm could reduce false positive rates compared with the other anomaly detection methods we tested.
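As a reading aid only, the following is a minimal sketch of the kind of pipeline the abstract describes, not the authors' implementation: sentences are embedded with a sentence transformer, a small autoencoder is trained on embeddings of normal text, and sentences with unusually large reconstruction error are flagged (distance to the embedding centroid is also computed). The model name, network sizes, threshold rule, and toy corpus are all assumptions, and the paper's logistic-regression stage is omitted.

```python
# Hedged sketch of an embedding-based text anomaly detector.
# Assumptions: sentence-transformers and torch are installed; the model name,
# threshold rule, and toy corpus are illustrative, not the paper's exact setup.
import numpy as np
import torch
import torch.nn as nn
from sentence_transformers import SentenceTransformer

normal_texts = [
    "The quarterly report was filed on time.",
    "The meeting is scheduled for Monday morning.",
    "Please review the attached budget summary.",
]
test_texts = normal_texts + ["xqzv lorem 9@@@ unrelated injected string"]

# 1) Convert text to numerical representations with a sentence transformer.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
X_train = encoder.encode(normal_texts, convert_to_numpy=True)
X_test = encoder.encode(test_texts, convert_to_numpy=True)

# 2) Train a small autoencoder on embeddings of normal text only.
dim = X_train.shape[1]
autoencoder = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, dim))
opt = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
x = torch.tensor(X_train, dtype=torch.float32)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(autoencoder(x), x)
    loss.backward()
    opt.step()

# 3) Score test sentences: reconstruction error, plus distance to the centroid
#    of the normal embeddings; large scores suggest anomalies.
with torch.no_grad():
    xt = torch.tensor(X_test, dtype=torch.float32)
    recon_err = ((autoencoder(xt) - xt) ** 2).mean(dim=1).numpy()
centroid = X_train.mean(axis=0)
dist = np.linalg.norm(X_test - centroid, axis=1)

# Simple threshold from the normal portion of the test set (an assumption).
threshold = recon_err[: len(normal_texts)].mean() + 3 * recon_err[: len(normal_texts)].std()
for text, e, d in zip(test_texts, recon_err, dist):
    flag = "ANOMALY" if e > threshold else "normal"
    print(f"{flag:8s} recon_err={e:.4f} dist={d:.2f}  {text}")
```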
Related papers
- Efficiently Leveraging Linguistic Priors for Scene Text Spotting [63.22351047545888]
This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models.
We generate text distributions that align well with scene text datasets, removing the need for in-domain fine-tuning.
Experimental results show that our method not only improves recognition accuracy but also enables more accurate localization of words.
arXiv Detail & Related papers (2024-02-27T01:57:09Z) - Exploring Machine Learning and Transformer-based Approaches for Deceptive Text Classification: A Comparative Analysis [0.0]
This study presents a comparative analysis of machine learning and transformer-based approaches for deceptive text classification.
We investigate the effectiveness of traditional machine learning algorithms and state-of-the-art transformer models, such as BERT, XLNET, DistilBERT, and RoBERTa.
The results of this study shed light on the strengths and limitations of machine learning and transformer-based methods for deceptive text classification (a toy comparison in this spirit is sketched after this list).
arXiv Detail & Related papers (2023-08-10T10:07:00Z) - LRANet: Towards Accurate and Efficient Scene Text Detection with Low-Rank Approximation Network [63.554061288184165]
We propose a novel parameterized text shape method based on low-rank approximation (a minimal numerical sketch of the low-rank idea appears after this list).
By exploring the shape correlation among different text contours, our method achieves consistency, compactness, simplicity, and robustness in shape representation.
We implement an accurate and efficient arbitrary-shaped text detector named LRANet.
arXiv Detail & Related papers (2023-06-27T02:03:46Z) - Evaluating BERT-based Pre-training Language Models for Detecting Misinformation [2.1915057426589746]
It is challenging to control the quality of online information due to the lack of supervision over all the information posted online.
There is a need for automated rumour detection techniques to limit the adverse effects of spreading misinformation.
This study proposes using BERT-based pre-trained language models to encode text data into vectors and neural network models to classify these vectors to detect misinformation.
arXiv Detail & Related papers (2022-03-15T08:54:36Z) - Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information [55.75018546938499]
We propose the speaker embedding-aware neural diarization (SEND) method, which predicts power-set encoded labels.
Our method achieves a lower diarization error rate than target-speaker voice activity detection.
arXiv Detail & Related papers (2021-11-28T12:51:04Z) - Human-in-the-Loop Disinformation Detection: Stance, Sentiment, or Something Else? [93.91375268580806]
Both politics and pandemics have recently provided ample motivation for the development of machine learning-enabled disinformation (a.k.a. fake news) detection algorithms.
Existing literature has focused primarily on the fully-automated case, but the resulting techniques cannot reliably detect disinformation on the varied topics, sources, and time scales required for military applications.
By leveraging an already-available analyst as a human-in-the-loop, canonical machine learning techniques of sentiment analysis, aspect-based sentiment analysis, and stance detection become plausible methods to use for a partially-automated disinformation detection system.
arXiv Detail & Related papers (2021-11-09T13:30:34Z) - Lexically Aware Semi-Supervised Learning for OCR Post-Correction [90.54336622024299]
Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents.
Previous work has demonstrated the utility of neural post-correction methods on recognition of less-well-resourced languages.
We present a semi-supervised learning method that makes it possible to utilize raw images to improve performance.
arXiv Detail & Related papers (2021-11-04T04:39:02Z) - A Fast Randomized Algorithm for Massive Text Normalization [26.602776972067936]
We present FLAN, a scalable randomized algorithm to clean and canonicalize massive text data.
Our algorithm relies on the Jaccard similarity between words to suggest correction results (a toy illustration of this similarity appears after this list).
Our experimental results on real-world datasets demonstrate the efficiency and efficacy of FLAN.
arXiv Detail & Related papers (2021-10-06T19:18:17Z) - Autoregressive Belief Propagation for Decoding Block Codes [113.38181979662288]
We revisit recent methods that employ graph neural networks for decoding error correcting codes.
Our method violates the symmetry conditions that enable the other methods to train exclusively with the zero-word.
Despite not having the luxury of training on a single word, and the inability to train on more than a small fraction of the relevant sample space, we demonstrate effective training.
arXiv Detail & Related papers (2021-01-23T17:14:55Z) - Text Detection on Roughly Placed Books by Leveraging a Learning-based Model Trained with Another Domain Data [0.30458514384586394]
In this paper, we focus on how to generate bounding boxes that are appropriate to grasp text areas on books.
We develop algorithms that construct the bounding boxes by improving and leveraging the results of a learning-based method.
Our algorithms can utilize different learning-based approaches to detect scene texts.
arXiv Detail & Related papers (2020-06-26T05:53:23Z)
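For the deceptive text classification and BERT-based misinformation detection entries above, here is a rough, hedged sketch of the kind of comparison they describe: a TF-IDF plus logistic regression baseline next to a pretrained transformer used as a frozen sentence encoder feeding the same classifier. The model choice, toy texts, and labels are assumptions, and frozen embeddings stand in for the fine-tuning those papers actually perform.

```python
# Rough baseline comparison: bag-of-words features vs. transformer embeddings,
# both feeding logistic regression. Toy texts/labels are invented; fine-tuning
# (as in the cited papers) is replaced by frozen embeddings for brevity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sentence_transformers import SentenceTransformer

texts = [
    "Miracle pill cures everything overnight, doctors hate it",
    "City council approves new budget for road repairs",
    "You won a free prize, click this link immediately",
    "Local library extends weekend opening hours",
]
labels = [1, 0, 1, 0]  # 1 = deceptive/misinformation, 0 = legitimate (toy labels)

# Classical baseline: TF-IDF n-grams + logistic regression.
tfidf_clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
tfidf_clf.fit(texts, labels)

# Transformer baseline: frozen sentence embeddings + the same classifier.
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice
emb_clf = LogisticRegression().fit(encoder.encode(texts), labels)

query = ["Click now to claim your guaranteed cash reward"]
print("tfidf prediction:    ", tfidf_clf.predict(query))
print("embedding prediction:", emb_clf.predict(encoder.encode(query)))
```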
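The LRANet entry parameterizes arbitrary text shapes through low-rank approximation of contour representations. The sketch below is only a numerical illustration of that idea under assumed data: contours are stacked as rows of sampled (x, y) coordinates and compressed with a plain truncated SVD, not the learned regression that LRANet itself uses.

```python
# Toy low-rank approximation of text contour points with truncated SVD.
# Each row stacks the (x, y) coordinates of one contour sampled at N points;
# the contour shapes and the chosen rank are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
num_contours, num_points = 50, 16
t = np.linspace(0, 2 * np.pi, num_points)

# Synthetic "text-like" contours: stretched ellipses with small noise.
contours = []
for _ in range(num_contours):
    w, h = rng.uniform(2, 6), rng.uniform(0.5, 1.5)
    xy = np.concatenate([w * np.cos(t), h * np.sin(t)])  # shape (2 * num_points,)
    contours.append(xy + 0.05 * rng.standard_normal(xy.shape))
A = np.stack(contours)  # (num_contours, 2 * num_points)

# Truncated SVD: keep only k basis shapes and their coefficients.
k = 4
U, S, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(S[:k]) @ Vt[:k]

err = np.linalg.norm(A - A_k) / np.linalg.norm(A)
print(f"rank-{k} relative reconstruction error: {err:.4f}")
```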
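The FLAN entry states that corrections are suggested via Jaccard similarity between words. Below is a toy illustration of that similarity computed over character 3-gram sets against a tiny assumed vocabulary; FLAN's randomized hashing and large-scale machinery are omitted.

```python
# Toy word normalization via Jaccard similarity on character 3-gram sets.
# The vocabulary, n-gram size, and threshold are illustrative assumptions;
# FLAN's locality-sensitive hashing and scaling tricks are not shown.
def ngrams(word: str, n: int = 3) -> set:
    padded = f"##{word.lower()}##"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

vocabulary = ["anomaly", "detection", "transformer", "regression"]

def suggest(token: str, threshold: float = 0.3) -> str:
    grams = ngrams(token)
    best = max(vocabulary, key=lambda w: jaccard(grams, ngrams(w)))
    return best if jaccard(grams, ngrams(best)) >= threshold else token

print(suggest("anomlay"))    # -> anomaly
print(suggest("detectoin"))  # -> detection
print(suggest("zzzz"))       # -> zzzz (no sufficiently close match)
```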