An Intelligent CNN-VAE Text Representation Technology Based on Text
Semantics for Comprehensive Big Data
- URL: http://arxiv.org/abs/2008.12522v1
- Date: Fri, 28 Aug 2020 07:39:45 GMT
- Title: An Intelligent CNN-VAE Text Representation Technology Based on Text
Semantics for Comprehensive Big Data
- Authors: Genggeng Liu, Canyang Guo, Lin Xie, Wenxi Liu, Naixue Xiong and
Guolong Chen
- Abstract summary: A text feature representation model based on a convolutional neural network (CNN) and a variational autoencoder (VAE) is proposed.
The proposed model outperforms baselines when its representations are used with the k-nearest neighbor (KNN), random forest (RF), and support vector machine (SVM) classification algorithms.
- Score: 15.680918844684454
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the era of big data, the vast amount of text data generated on the
Internet has given birth to a variety of text representation methods. In natural
language processing (NLP), text representation transforms text into vectors
that can be processed by computers without losing the original semantic
information. However, existing methods struggle to effectively extract the
semantic features among words and to distinguish polysemy. Therefore, a text
feature representation model based on a convolutional neural network (CNN) and
a variational autoencoder (VAE) is proposed to extract text features, and the
resulting representations are applied to text classification tasks. The CNN
extracts features from the text vectors to capture the semantics among words,
while the VAE is introduced to make the text feature space more consistent with
a Gaussian distribution. In addition, the output of an improved word2vec model
is employed as the input of the proposed model to distinguish different
meanings of the same word in different contexts. Experimental results show that
the proposed model outperforms baselines when its representations are used with
the k-nearest neighbor (KNN), random forest (RF), and support vector machine
(SVM) classification algorithms.
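
A minimal sketch of the described pipeline, assuming PyTorch; the hyperparameters (embedding size, filter width, latent dimension) are illustrative placeholders, not the paper's settings. A 1-D convolution extracts features among words from pre-trained word vectors, and a VAE head with the reparameterization trick maps them into a Gaussian-shaped latent space:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNNVAE(nn.Module):
    """Illustrative CNN-VAE text encoder (not the authors' code):
    a 1-D CNN extracts features among words, and a VAE head maps
    them to a latent space regularized toward a Gaussian."""

    def __init__(self, embed_dim=300, num_filters=128, latent_dim=64):
        super().__init__()
        # Convolution over the word axis captures local semantics among words.
        self.conv = nn.Conv1d(embed_dim, num_filters, kernel_size=3, padding=1)
        self.fc_mu = nn.Linear(num_filters, latent_dim)      # mean of q(z|x)
        self.fc_logvar = nn.Linear(num_filters, latent_dim)  # log-variance of q(z|x)

    def forward(self, x):
        # x: (batch, seq_len, embed_dim) pre-trained (e.g. word2vec) vectors.
        h = F.relu(self.conv(x.transpose(1, 2)))   # (batch, filters, seq_len)
        h = h.max(dim=2).values                    # max-over-time pooling
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterization trick: z = mu + sigma * eps stays differentiable.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return z, mu, logvar

def kl_to_standard_gaussian(mu, logvar):
    # KL(q(z|x) || N(0, I)) pulls the feature space toward a Gaussian,
    # as the abstract describes; a reconstruction term would be added in training.
    return -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
```

The latent vector `z` would then serve as the fixed-length text representation handed to the KNN, RF, or SVM classifiers.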
Related papers
- RegaVAE: A Retrieval-Augmented Gaussian Mixture Variational Auto-Encoder
for Language Modeling [79.56442336234221]
We introduce RegaVAE, a retrieval-augmented language model built upon the variational auto-encoder (VAE).
It encodes the text corpus into a latent space, capturing current and future information from both source and target text.
Experimental results on various datasets demonstrate significant improvements in text generation quality and hallucination removal.
arXiv Detail & Related papers (2023-10-16T16:42:01Z)
- Description-Based Text Similarity [59.552704474862004]
We identify the need to search for texts based on abstract descriptions of their content.
We propose an alternative model that significantly improves performance when used in standard nearest neighbor search.
arXiv Detail & Related papers (2023-05-21T17:14:31Z)
- SLCNN: Sentence-Level Convolutional Neural Network for Text Classification [0.0]
Convolutional neural networks (CNNs) have shown remarkable success in the task of text classification.
New baseline models using CNNs have been studied for text classification.
Results show that the proposed models perform better, particularly on longer documents.
arXiv Detail & Related papers (2023-01-27T13:16:02Z)
- Text Revision by On-the-Fly Representation Optimization [76.11035270753757]
Current state-of-the-art methods formulate text revision tasks as sequence-to-sequence learning problems.
We present an iterative in-place editing approach for text revision, which requires no parallel data.
It achieves performance competitive with, and sometimes better than, state-of-the-art supervised methods on text simplification.
arXiv Detail & Related papers (2022-04-15T07:38:08Z)
- TextConvoNet: A Convolutional Neural Network based Architecture for Text Classification [0.34410212782758043]
We present a CNN-based architecture TextConvoNet that not only extracts the intra-sentence n-gram features but also captures the inter-sentence n-gram features in input text data.
The experimental results show that the presented TextConvoNet outperforms state-of-the-art machine learning and deep learning models for text classification purposes.
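Read loosely, the intra-/inter-sentence idea can be pictured with two 2-D convolutions over a (sentences x words x embedding) tensor. The sketch below is an interpretation under assumed kernel shapes, not the published TextConvoNet architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoAxisTextCNN(nn.Module):
    """Toy model: one kernel slides along words within a sentence
    (intra-sentence n-grams), another spans adjacent sentences
    (inter-sentence n-grams)."""

    def __init__(self, embed_dim=100, filters=64, num_classes=2):
        super().__init__()
        # Kernel (1, 3): word trigrams inside a single sentence.
        self.intra = nn.Conv2d(embed_dim, filters, kernel_size=(1, 3), padding=(0, 1))
        # Kernel (3, 1): the same word position across adjacent sentences.
        self.inter = nn.Conv2d(embed_dim, filters, kernel_size=(3, 1), padding=(1, 0))
        self.fc = nn.Linear(2 * filters, num_classes)

    def forward(self, x):
        # x: (batch, sentences, words, embed_dim) -> channels-first layout.
        x = x.permute(0, 3, 1, 2)
        a = F.relu(self.intra(x)).amax(dim=(2, 3))  # global max pooling
        b = F.relu(self.inter(x)).amax(dim=(2, 3))
        return self.fc(torch.cat([a, b], dim=1))
```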
arXiv Detail & Related papers (2022-03-10T06:09:56Z)
- Text Smoothing: Enhance Various Data Augmentation Methods on Text Classification Tasks [47.5423959822716]
We propose an efficient data augmentation method, termed text smoothing, which converts a sentence from its one-hot representation to a controllable smoothed representation.
The smoothed representation is the probability distribution over candidate tokens obtained from a pre-trained masked language model.
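A rough sketch of that conversion, assuming a HuggingFace `bert-base-uncased` masked language model and an interpolation weight `lam` that is a placeholder rather than the paper's setting:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def smooth(sentence: str, lam: float = 0.5) -> torch.Tensor:
    """Return a (seq_len, vocab) matrix mixing each token's one-hot
    vector with the MLM's predicted token distribution."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        probs = F.softmax(mlm(**enc).logits[0], dim=-1)  # (seq_len, vocab)
    onehot = F.one_hot(enc["input_ids"][0], num_classes=probs.size(-1)).float()
    # Controllable smoothed representation: lam keeps the original token,
    # (1 - lam) injects plausible alternatives from the language model.
    return lam * onehot + (1 - lam) * probs

augmented = smooth("the movie was great", lam=0.5)
```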
arXiv Detail & Related papers (2022-02-28T14:54:08Z)
- Dependency Parsing based Semantic Representation Learning with Graph Neural Network for Enhancing Expressiveness of Text-to-Speech [49.05471750563229]
We propose a semantic representation learning method based on a graph neural network that takes the dependency relations of a sentence into account.
We show that our proposed method outperforms a baseline using vanilla BERT features on both the LJSpeech and Blizzard Challenge 2013 datasets.
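As an illustration only (the paper's exact GNN is not specified here), a single message-passing layer over dependency edges might look like the following; the feature size and the undirected treatment of the parse are assumptions:

```python
import torch
import torch.nn as nn

class DepGraphLayer(nn.Module):
    """Toy graph layer: each word's BERT-style feature is updated by
    averaging the features of words it is linked to in the dependency parse."""

    def __init__(self, dim=768):
        super().__init__()
        self.lin = nn.Linear(2 * dim, dim)

    def forward(self, feats, edges):
        # feats: (n_words, dim); edges: list of (head, dependent) index pairs.
        n = feats.size(0)
        adj = torch.zeros(n, n)
        for h, d in edges:
            adj[h, d] = adj[d, h] = 1.0          # treat the parse as undirected
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        neighbor_mean = (adj @ feats) / deg      # aggregate dependency neighbors
        return torch.relu(self.lin(torch.cat([feats, neighbor_mean], dim=-1)))
```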
arXiv Detail & Related papers (2021-04-14T13:09:51Z)
- Contextualized Spoken Word Representations from Convolutional Autoencoders [2.28438857884398]
This paper proposes a convolutional autoencoder based neural architecture to model syntactically and semantically adequate contextualized representations of varying-length spoken words.
The proposed model demonstrated its robustness when compared to two other language-based models.
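A toy sketch of such an architecture, assuming MFCC-style acoustic frames as input and using adaptive pooling to absorb the varying word lengths; the layer sizes are illustrative, not the paper's:

```python
import torch
import torch.nn as nn

class ConvWordAutoencoder(nn.Module):
    """Toy convolutional autoencoder: compresses a variable-length
    sequence of acoustic frames into a fixed-size word embedding."""

    def __init__(self, n_feats=40, hidden=128, embed=64, target_len=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(n_feats, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(target_len),   # normalizes varying lengths
            nn.Conv1d(hidden, embed, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Conv1d(embed, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(hidden, n_feats, kernel_size=5, padding=2),
        )

    def forward(self, x):
        # x: (batch, n_feats, frames); the reconstruction has target_len
        # frames, so a training loss would compare against a pooled target.
        code = self.encoder(x)
        return self.decoder(code), code.mean(dim=2)  # reconstruction, embedding
```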
arXiv Detail & Related papers (2020-07-06T16:48:11Z)
- Context based Text-generation using LSTM networks [0.5330240017302621]
The proposed model is trained to generate text for a given set of input words along with a context vector.
The results are evaluated based on the semantic closeness of the generated text to the given context.
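A minimal sketch of the conditioning idea, assuming the context vector is simply concatenated to each word embedding (an assumption, not necessarily the paper's exact mechanism):

```python
import torch
import torch.nn as nn

class ContextLSTM(nn.Module):
    """Toy generator: a context vector is concatenated to every word
    embedding so each step is conditioned on the given context."""

    def __init__(self, vocab=10000, embed=128, ctx=64, hidden=256):
        super().__init__()
        self.emb = nn.Embedding(vocab, embed)
        self.lstm = nn.LSTM(embed + ctx, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, tokens, context):
        # tokens: (batch, seq); context: (batch, ctx)
        e = self.emb(tokens)
        c = context.unsqueeze(1).expand(-1, e.size(1), -1)
        h, _ = self.lstm(torch.cat([e, c], dim=-1))
        return self.out(h)  # next-token logits at each position
```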
arXiv Detail & Related papers (2020-04-30T18:39:25Z)
- Improve Variational Autoencoder for Text Generation with Discrete Latent Bottleneck [52.08901549360262]
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning.
However, when paired with a strong auto-regressive decoder, VAEs tend to ignore their latent variables.
We propose a principled approach that enforces an implicit latent feature matching in a more compact latent space.
arXiv Detail & Related papers (2020-04-22T14:41:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.