Context based Text-generation using LSTM networks
- URL: http://arxiv.org/abs/2005.00048v1
- Date: Thu, 30 Apr 2020 18:39:25 GMT
- Title: Context based Text-generation using LSTM networks
- Authors: Sivasurya Santhanam
- Abstract summary: The proposed model is trained to generate text for a given set of input words along with a context vector.
The results are evaluated based on the semantic closeness of the generated text to the given context.
- Score: 0.5330240017302621
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Long short-term memory (LSTM) units in sequence-based models are used in
translation, question-answering systems, and classification tasks because of their
ability to learn long-term dependencies. In natural language generation, LSTM
networks give impressive results in text-generation models by learning language
models with grammatically stable syntax. The downside is that the network does not
learn about context: it learns only the input-output mapping and generates text for
a given set of input words irrespective of pragmatics. Because the model is trained
without any such context, there is no semantic consistency among the generated
sentences. The proposed model is trained to generate text for a given set of input
words together with a context vector. A context vector is similar to a paragraph
vector that captures the semantic meaning (context) of the sentence. Several methods
of extracting context vectors are proposed in this work. While training the language
model, context vectors are trained along with the input-output sequences. Due to
this structure, the model learns the relation among the input words, the context
vector, and the target word. Given a set of context terms, a well-trained model will
generate text around the provided context. Based on how the context vectors are
computed, the model is evaluated in two variations (word importance and word
clustering). For the word-clustering method, suitable embeddings across various
domains are also explored. The results are evaluated based on the semantic closeness
of the generated text to the given context.
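A minimal sketch of the architecture described in the abstract is given below, assuming a PyTorch implementation: the module name ContextLSTMGenerator, the chosen dimensions, and the concrete forms of the two context-vector variants (importance-weighted averaging for word importance, cluster centroids for word clustering) are illustrative assumptions, not the paper's actual code.

```python
import torch
import torch.nn as nn


class ContextLSTMGenerator(nn.Module):  # hypothetical name, not from the paper
    def __init__(self, vocab_size, embed_dim=100, context_dim=100, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Each time step receives the word embedding concatenated with the
        # sentence-level context vector, so the LSTM can relate input words,
        # context vector and target word.
        self.lstm = nn.LSTM(embed_dim + context_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids, context_vec):
        # token_ids: (batch, seq_len); context_vec: (batch, context_dim)
        emb = self.embed(token_ids)                                 # (B, T, E)
        ctx = context_vec.unsqueeze(1).expand(-1, emb.size(1), -1)  # (B, T, C)
        hidden, _ = self.lstm(torch.cat([emb, ctx], dim=-1))
        return self.out(hidden)                                     # next-word logits


def context_by_word_importance(word_vecs, weights):
    """'Word importance' variant (assumed form): importance-weighted average
    of the context-term embeddings, e.g. with TF-IDF-style weights."""
    w = torch.tensor(weights, dtype=torch.float32).unsqueeze(1)
    return (torch.stack(word_vecs) * w).sum(dim=0) / w.sum()


def context_by_word_clustering(cluster_word_vecs):
    """'Word clustering' variant (assumed form): centroid of the embeddings
    of the words in the selected cluster."""
    return torch.stack(cluster_word_vecs).mean(dim=0)
```

Training would then minimize the usual cross-entropy between the logits and the target words, with the context vector held fixed across all positions of a sentence.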
Related papers
- A Case Study on Context-Aware Neural Machine Translation with Multi-Task Learning [49.62044186504516]
In document-level neural machine translation (DocNMT), multi-encoder approaches are common in encoding context and source sentences.
Recent studies have shown that the context encoder generates noise and makes the model robust to the choice of context.
This paper further investigates this observation by explicitly modelling context encoding through multi-task learning (MTL) to make the model sensitive to the choice of context.
arXiv Detail & Related papers (2024-07-03T12:50:49Z)
- Detecting out-of-distribution text using topological features of transformer-based language models [0.5735035463793009]
We explore the use of topological features of self-attention maps from transformer-based language models to detect when input text is out of distribution.
We evaluate our approach on BERT and compare it to a traditional OOD approach using CLS embeddings.
Our results show that our approach outperforms CLS embeddings in distinguishing in-distribution samples from far-out-of-domain samples, but struggles with near or same-domain datasets.
arXiv Detail & Related papers (2023-11-22T02:04:35Z)
- Pre-Training to Learn in Context [138.0745138788142]
The in-context learning ability of language models is not fully exploited because they are not explicitly trained to learn in context.
We propose PICL (Pre-training for In-Context Learning), a framework to enhance the language models' in-context learning ability.
Our experiments show that PICL is more effective and task-generalizable than a range of baselines, outperforming larger language models with nearly 4x parameters.
arXiv Detail & Related papers (2023-05-16T03:38:06Z)
- Context-aware Fine-tuning of Self-supervised Speech Models [56.95389222319555]
We study the use of context, i.e., surrounding segments, during fine-tuning.
We propose a new approach called context-aware fine-tuning.
We evaluate the proposed approach using the SLUE and Libri-light benchmarks for several downstream tasks.
arXiv Detail & Related papers (2022-12-16T15:46:15Z)
- Syntax-Enhanced Pre-trained Model [49.1659635460369]
We study the problem of leveraging the syntactic structure of text to enhance pre-trained models such as BERT and RoBERTa.
Existing methods utilize the syntax of text either in the pre-training stage or in the fine-tuning stage, so they suffer from a discrepancy between the two stages.
We present a model that utilizes the syntax of text in both pre-training and fine-tuning stages.
arXiv Detail & Related papers (2020-12-28T06:48:04Z)
- Improving Text Generation with Student-Forcing Optimal Transport [122.11881937642401]
We propose using optimal transport (OT) to match the sequences generated in training and testing modes.
An extension is also proposed to improve the OT learning, based on the structural and contextual information of the text sequences.
The effectiveness of the proposed method is validated on machine translation, text summarization, and text generation tasks.
arXiv Detail & Related papers (2020-10-12T19:42:25Z)
- How LSTM Encodes Syntax: Exploring Context Vectors and Semi-Quantization on Natural Text [2.881185491084005]
We learn a language model where syntactic structures are implicitly given.
We show that the context update vectors, i.e. the outputs of internal gates, are approximately quantized to binary or ternary values (see the probing sketch after this list).
For some dimensions in the context vector, we show that their activations are highly correlated with the depth of phrase structures.
We also show that natural clusters of the functional words and the parts of speech that trigger phrases are represented in a small but principal subspace of the LSTM's context-update vector.
arXiv Detail & Related papers (2020-10-01T12:49:01Z)
- Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed later, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
arXiv Detail & Related papers (2020-09-24T07:21:14Z)
- An Intelligent CNN-VAE Text Representation Technology Based on Text Semantics for Comprehensive Big Data [15.680918844684454]
A text feature representation model based on a convolutional neural network (CNN) and a variational autoencoder (VAE) is proposed.
The proposed model yields better results with k-nearest neighbor (KNN), random forest (RF), and support vector machine (SVM) classification algorithms.
arXiv Detail & Related papers (2020-08-28T07:39:45Z)
- Logic Constrained Pointer Networks for Interpretable Textual Similarity [11.142649867439406]
We introduce a novel pointer network based model with a sentinel gating function to align constituent chunks.
We improve this base model with a loss function to equally penalize misalignments in both sentences, ensuring the alignments are bidirectional.
The model achieves an F1 score of 97.73 and 96.32 on the benchmark SemEval datasets for the chunk alignment task.
arXiv Detail & Related papers (2020-07-15T13:01:44Z)
- Distributional semantic modeling: a revised technique to train term/word vector space models applying the ontology-related approach [36.248702416150124]
We design a new technique for distributional semantic modeling with a neural network-based approach to learn distributed term representations (or term embeddings).
Vec2graph is a Python library for visualizing word embeddings (term embeddings in our case) as dynamic and interactive graphs.
arXiv Detail & Related papers (2020-03-06T18:27:39Z)
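For the "How LSTM Encodes Syntax" entry above, the sketch below shows one hedged way to probe whether an LSTM's per-step cell-state changes (taken here as a stand-in for that paper's "context update vectors", which is an assumption) concentrate near a few discrete values; a randomly initialized cell keeps the snippet self-contained, whereas a real probe would load a trained language model.

```python
# Hedged probing sketch (not that paper's code): histogram an LSTM's per-step
# cell-state changes to look for the near-binary/ternary concentration the
# entry describes. The cell is randomly initialized only to keep the snippet
# runnable; in practice one would use a trained language model's LSTM.
import torch
import torch.nn as nn

torch.manual_seed(0)
cell = nn.LSTMCell(input_size=50, hidden_size=64)
h, c = torch.zeros(1, 64), torch.zeros(1, 64)

updates = []
for x in torch.randn(20, 1, 50):             # stand-in for embedded tokens
    h, c_new = cell(x, (h, c))
    updates.append((c_new - c).squeeze(0))    # per-step "context update" (assumed definition)
    c = c_new

updates = torch.stack(updates)                # (steps, hidden_dim)
# Share of values falling near -1, 0 and +1 after squashing to [-1, 1].
hist = torch.histc(torch.tanh(updates), bins=9, min=-1.0, max=1.0)
print(hist / hist.sum())
```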
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.