An Intelligent CNN-VAE Text Representation Technology Based on Text
Semantics for Comprehensive Big Data
- URL: http://arxiv.org/abs/2008.12522v1
- Date: Fri, 28 Aug 2020 07:39:45 GMT
- Title: An Intelligent CNN-VAE Text Representation Technology Based on Text
Semantics for Comprehensive Big Data
- Authors: Genggeng Liu, Canyang Guo, Lin Xie, Wenxi Liu, Naixue Xiong and
Guolong Chen
- Abstract summary: A text feature representation model based on a convolutional neural network (CNN) and a variational autoencoder (VAE) is proposed.
The proposed model outperforms baselines when its representations are used with the k-nearest neighbor (KNN), random forest (RF), and support vector machine (SVM) classification algorithms.
- Score: 15.680918844684454
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the era of big data, the vast amount of text data generated on the
Internet has given birth to a variety of text representation methods. In natural
language processing (NLP), text representation transforms text into vectors
that can be processed by computers without losing the original semantic
information. However, existing methods struggle to effectively extract the
semantic features among words and to distinguish polysemy. Therefore, a text
feature representation model based on a convolutional neural network (CNN) and
a variational autoencoder (VAE) is proposed to extract text features, and the
resulting representations are applied to text classification tasks. The CNN
extracts features from the text vectors to capture the semantics among words,
while the VAE is introduced to make the text feature space more consistent with
a Gaussian distribution. In addition, the output of an improved word2vec model
is employed as the input of the proposed model to distinguish different
meanings of the same word in different contexts. Experimental results show that
the proposed model outperforms baselines when its representations are used with
the k-nearest neighbor (KNN), random forest (RF), and support vector machine
(SVM) classification algorithms.
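
A minimal sketch of the described pipeline, assuming PyTorch; the hyperparameters (embedding size, filter width, latent dimension) are illustrative placeholders, not the paper's settings. A 1-D convolution extracts features among words from pre-trained word vectors, and a VAE head with the reparameterization trick maps them into a Gaussian-shaped latent space:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNNVAE(nn.Module):
    """Illustrative CNN-VAE text encoder (not the authors' code):
    a 1-D CNN extracts features among words, and a VAE head maps
    them to a latent space regularized toward a Gaussian."""

    def __init__(self, embed_dim=300, num_filters=128, latent_dim=64):
        super().__init__()
        # Convolution over the word axis captures local semantics among words.
        self.conv = nn.Conv1d(embed_dim, num_filters, kernel_size=3, padding=1)
        self.fc_mu = nn.Linear(num_filters, latent_dim)      # mean of q(z|x)
        self.fc_logvar = nn.Linear(num_filters, latent_dim)  # log-variance of q(z|x)

    def forward(self, x):
        # x: (batch, seq_len, embed_dim) pre-trained (e.g. word2vec) vectors.
        h = F.relu(self.conv(x.transpose(1, 2)))   # (batch, filters, seq_len)
        h = h.max(dim=2).values                    # max-over-time pooling
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterization trick: z = mu + sigma * eps stays differentiable.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return z, mu, logvar

def kl_to_standard_gaussian(mu, logvar):
    # KL(q(z|x) || N(0, I)) pulls the feature space toward a Gaussian,
    # as the abstract describes; a reconstruction term would be added in training.
    return -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
```

The latent vector `z` would then serve as the fixed-length text representation handed to the KNN, RF, or SVM classifiers.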
Related papers
- RegaVAE: A Retrieval-Augmented Gaussian Mixture Variational Auto-Encoder
for Language Modeling [79.56442336234221]
We introduce RegaVAE, a retrieval-augmented language model built upon the variational auto-encoder (VAE).
It encodes the text corpus into a latent space, capturing current and future information from both source and target text.
Experimental results on various datasets demonstrate significant improvements in text generation quality and hallucination removal.
arXiv Detail & Related papers (2023-10-16T16:42:01Z)
- Description-Based Text Similarity [59.552704474862004]
We identify the need to search for texts based on abstract descriptions of their content.
We propose an alternative model that significantly improves performance when used in standard nearest neighbor search.
arXiv Detail & Related papers (2023-05-21T17:14:31Z)
- SLCNN: Sentence-Level Convolutional Neural Network for Text Classification [0.0]
Convolutional neural networks (CNNs) have shown remarkable success in the task of text classification.
New baseline models using CNNs have been studied for text classification.
Results show that the proposed models perform better, particularly on longer documents.
arXiv Detail & Related papers (2023-01-27T13:16:02Z)
- Text Revision by On-the-Fly Representation Optimization [76.11035270753757]
Current state-of-the-art methods formulate text revision tasks as sequence-to-sequence learning problems.
We present an iterative in-place editing approach for text revision, which requires no parallel data.
It achieves performance competitive with, and sometimes better than, state-of-the-art supervised methods on text simplification.
arXiv Detail & Related papers (2022-04-15T07:38:08Z)
- TextConvoNet: A Convolutional Neural Network based Architecture for Text Classification [0.34410212782758043]
We present a CNN-based architecture TextConvoNet that not only extracts the intra-sentence n-gram features but also captures the inter-sentence n-gram features in input text data.
The experimental results show that the presented TextConvoNet outperforms state-of-the-art machine learning and deep learning models for text classification purposes.
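Read loosely, the intra-/inter-sentence idea can be pictured with two 2-D convolutions over a (sentences x words x embedding) tensor. The sketch below is an interpretation under assumed kernel shapes, not the published TextConvoNet architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoAxisTextCNN(nn.Module):
    """Toy model: one kernel slides along words within a sentence
    (intra-sentence n-grams), another spans adjacent sentences
    (inter-sentence n-grams)."""

    def __init__(self, embed_dim=100, filters=64, num_classes=2):
        super().__init__()
        # Kernel (1, 3): word trigrams inside a single sentence.
        self.intra = nn.Conv2d(embed_dim, filters, kernel_size=(1, 3), padding=(0, 1))
        # Kernel (3, 1): the same word position across adjacent sentences.
        self.inter = nn.Conv2d(embed_dim, filters, kernel_size=(3, 1), padding=(1, 0))
        self.fc = nn.Linear(2 * filters, num_classes)

    def forward(self, x):
        # x: (batch, sentences, words, embed_dim) -> channels-first layout.
        x = x.permute(0, 3, 1, 2)
        a = F.relu(self.intra(x)).amax(dim=(2, 3))  # global max pooling
        b = F.relu(self.inter(x)).amax(dim=(2, 3))
        return self.fc(torch.cat([a, b], dim=1))
```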
arXiv Detail & Related papers (2022-03-10T06:09:56Z)
- Text Smoothing: Enhance Various Data Augmentation Methods on Text Classification Tasks [47.5423959822716]
We propose an efficient data augmentation method, termed text smoothing, which converts a sentence from its one-hot representation to a controllable smoothed representation.
The smoothed representation is the probability distribution over candidate tokens obtained from a pre-trained masked language model.
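A rough sketch of that conversion, assuming a HuggingFace `bert-base-uncased` masked language model and an interpolation weight `lam` that is a placeholder rather than the paper's setting:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def smooth(sentence: str, lam: float = 0.5) -> torch.Tensor:
    """Return a (seq_len, vocab) matrix mixing each token's one-hot
    vector with the MLM's predicted token distribution."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        probs = F.softmax(mlm(**enc).logits[0], dim=-1)  # (seq_len, vocab)
    onehot = F.one_hot(enc["input_ids"][0], num_classes=probs.size(-1)).float()
    # Controllable smoothed representation: lam keeps the original token,
    # (1 - lam) injects plausible alternatives from the language model.
    return lam * onehot + (1 - lam) * probs

augmented = smooth("the movie was great", lam=0.5)
```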
arXiv Detail & Related papers (2022-02-28T14:54:08Z)
- Dependency Parsing based Semantic Representation Learning with Graph Neural Network for Enhancing Expressiveness of Text-to-Speech [49.05471750563229]
We propose a semantic representation learning method based on a graph neural network that takes the dependency relations of a sentence into account.
We show that our proposed method outperforms a baseline using vanilla BERT features on both the LJSpeech and Blizzard Challenge 2013 datasets.
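As an illustration only (the paper's exact GNN is not specified here), a single message-passing layer over dependency edges might look like the following; the feature size and the undirected treatment of the parse are assumptions:

```python
import torch
import torch.nn as nn

class DepGraphLayer(nn.Module):
    """Toy graph layer: each word's BERT-style feature is updated by
    averaging the features of words it is linked to in the dependency parse."""

    def __init__(self, dim=768):
        super().__init__()
        self.lin = nn.Linear(2 * dim, dim)

    def forward(self, feats, edges):
        # feats: (n_words, dim); edges: list of (head, dependent) index pairs.
        n = feats.size(0)
        adj = torch.zeros(n, n)
        for h, d in edges:
            adj[h, d] = adj[d, h] = 1.0          # treat the parse as undirected
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        neighbor_mean = (adj @ feats) / deg      # aggregate dependency neighbors
        return torch.relu(self.lin(torch.cat([feats, neighbor_mean], dim=-1)))
```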
arXiv Detail & Related papers (2021-04-14T13:09:51Z)
- Contextualized Spoken Word Representations from Convolutional Autoencoders [2.28438857884398]
This paper proposes a convolutional autoencoder based neural architecture to model syntactically and semantically adequate contextualized representations of varying-length spoken words.
The proposed model demonstrated its robustness when compared to two other language-based models.
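A toy sketch of such an architecture, assuming MFCC-style acoustic frames as input and using adaptive pooling to absorb the varying word lengths; the layer sizes are illustrative, not the paper's:

```python
import torch
import torch.nn as nn

class ConvWordAutoencoder(nn.Module):
    """Toy convolutional autoencoder: compresses a variable-length
    sequence of acoustic frames into a fixed-size word embedding."""

    def __init__(self, n_feats=40, hidden=128, embed=64, target_len=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(n_feats, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(target_len),   # normalizes varying lengths
            nn.Conv1d(hidden, embed, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Conv1d(embed, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(hidden, n_feats, kernel_size=5, padding=2),
        )

    def forward(self, x):
        # x: (batch, n_feats, frames); the reconstruction has target_len
        # frames, so a training loss would compare against a pooled target.
        code = self.encoder(x)
        return self.decoder(code), code.mean(dim=2)  # reconstruction, embedding
```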
arXiv Detail & Related papers (2020-07-06T16:48:11Z)
- Context based Text-generation using LSTM networks [0.5330240017302621]
The proposed model is trained to generate text for a given set of input words along with a context vector.
The results are evaluated based on the semantic closeness of the generated text to the given context.
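A minimal sketch of the conditioning idea, assuming the context vector is simply concatenated to each word embedding (an assumption, not necessarily the paper's exact mechanism):

```python
import torch
import torch.nn as nn

class ContextLSTM(nn.Module):
    """Toy generator: a context vector is concatenated to every word
    embedding so each step is conditioned on the given context."""

    def __init__(self, vocab=10000, embed=128, ctx=64, hidden=256):
        super().__init__()
        self.emb = nn.Embedding(vocab, embed)
        self.lstm = nn.LSTM(embed + ctx, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, tokens, context):
        # tokens: (batch, seq); context: (batch, ctx)
        e = self.emb(tokens)
        c = context.unsqueeze(1).expand(-1, e.size(1), -1)
        h, _ = self.lstm(torch.cat([e, c], dim=-1))
        return self.out(h)  # next-token logits at each position
```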
arXiv Detail & Related papers (2020-04-30T18:39:25Z)
- Improve Variational Autoencoder for Text Generation with Discrete Latent Bottleneck [52.08901549360262]
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning.
However, when paired with a strong auto-regressive decoder, VAEs tend to ignore their latent variables.
We propose a principled approach that enforces an implicit latent feature matching in a more compact latent space.
arXiv Detail & Related papers (2020-04-22T14:41:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.