Context based Text-generation using LSTM networks
- URL: http://arxiv.org/abs/2005.00048v1
- Date: Thu, 30 Apr 2020 18:39:25 GMT
- Title: Context based Text-generation using LSTM networks
- Authors: Sivasurya Santhanam
- Abstract summary: The proposed model is trained to generate text for a given set of input words along with a context vector.
The results are evaluated based on the semantic closeness of the generated text to the given context.
- Score: 0.5330240017302621
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Long short-term memory (LSTM) units in sequence-based models are used in
translation, question-answering systems, and classification tasks because of their
ability to learn long-term dependencies. In natural language generation, LSTM
networks give impressive results in text-generation models by learning language
models with grammatically stable syntax. The downside is that the network does not
learn about context: it learns only the input-output mapping and generates text for
a given set of input words irrespective of pragmatics. Because the model is trained
without any such context, there is no semantic consistency among the generated
sentences. The proposed model is trained to generate text for a given set of input
words together with a context vector. A context vector is similar to a paragraph
vector that captures the semantic meaning (context) of the sentence. Several methods
of extracting context vectors are proposed in this work. While training the language
model, context vectors are trained along with the input-output sequences. Due to
this structure, the model learns the relation among the input words, the context
vector, and the target word. Given a set of context terms, a well-trained model will
generate text around the provided context. Based on how the context vectors are
computed, the model is evaluated in two variations (word importance and word
clustering). For the word-clustering method, suitable embeddings across various
domains are also explored. The results are evaluated based on the semantic closeness
of the generated text to the given context.
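A minimal sketch of the architecture described in the abstract is given below, assuming a PyTorch implementation: the module name ContextLSTMGenerator, the chosen dimensions, and the concrete forms of the two context-vector variants (importance-weighted averaging for word importance, cluster centroids for word clustering) are illustrative assumptions, not the paper's actual code.

```python
import torch
import torch.nn as nn


class ContextLSTMGenerator(nn.Module):  # hypothetical name, not from the paper
    def __init__(self, vocab_size, embed_dim=100, context_dim=100, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Each time step receives the word embedding concatenated with the
        # sentence-level context vector, so the LSTM can relate input words,
        # context vector and target word.
        self.lstm = nn.LSTM(embed_dim + context_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids, context_vec):
        # token_ids: (batch, seq_len); context_vec: (batch, context_dim)
        emb = self.embed(token_ids)                                 # (B, T, E)
        ctx = context_vec.unsqueeze(1).expand(-1, emb.size(1), -1)  # (B, T, C)
        hidden, _ = self.lstm(torch.cat([emb, ctx], dim=-1))
        return self.out(hidden)                                     # next-word logits


def context_by_word_importance(word_vecs, weights):
    """'Word importance' variant (assumed form): importance-weighted average
    of the context-term embeddings, e.g. with TF-IDF-style weights."""
    w = torch.tensor(weights, dtype=torch.float32).unsqueeze(1)
    return (torch.stack(word_vecs) * w).sum(dim=0) / w.sum()


def context_by_word_clustering(cluster_word_vecs):
    """'Word clustering' variant (assumed form): centroid of the embeddings
    of the words in the selected cluster."""
    return torch.stack(cluster_word_vecs).mean(dim=0)
```

Training would then minimize the usual cross-entropy between the logits and the target words, with the context vector held fixed across all positions of a sentence.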
Related papers
- A Case Study on Context-Aware Neural Machine Translation with Multi-Task Learning [49.62044186504516]
In document-level neural machine translation (DocNMT), multi-encoder approaches are common in encoding context and source sentences.
Recent studies have shown that the context encoder generates noise and makes the model robust to the choice of context.
This paper further investigates this observation by explicitly modelling context encoding through multi-task learning (MTL) to make the model sensitive to the choice of context.
arXiv Detail & Related papers (2024-07-03T12:50:49Z)
- Detecting out-of-distribution text using topological features of transformer-based language models [0.5735035463793009]
We explore the use of topological features of self-attention maps from transformer-based language models to detect when input text is out of distribution.
We evaluate our approach on BERT and compare it to a traditional OOD approach using CLS embeddings.
Our results show that our approach outperforms CLS embeddings in distinguishing in-distribution samples from far-out-of-domain samples, but struggles with near or same-domain datasets.
arXiv Detail & Related papers (2023-11-22T02:04:35Z)
- Pre-Training to Learn in Context [138.0745138788142]
The in-context learning ability of language models is not fully exploited because they are not explicitly trained to learn in context.
We propose PICL (Pre-training for In-Context Learning), a framework to enhance the language models' in-context learning ability.
Our experiments show that PICL is more effective and task-generalizable than a range of baselines, outperforming larger language models with nearly 4x parameters.
arXiv Detail & Related papers (2023-05-16T03:38:06Z)
- Context-aware Fine-tuning of Self-supervised Speech Models [56.95389222319555]
We study the use of context, i.e., surrounding segments, during fine-tuning.
We propose a new approach called context-aware fine-tuning.
We evaluate the proposed approach using the SLUE and Libri-light benchmarks for several downstream tasks.
arXiv Detail & Related papers (2022-12-16T15:46:15Z)
- Syntax-Enhanced Pre-trained Model [49.1659635460369]
We study the problem of leveraging the syntactic structure of text to enhance pre-trained models such as BERT and RoBERTa.
Existing methods utilize the syntax of text either in the pre-training stage or in the fine-tuning stage, so they suffer from a discrepancy between the two stages.
We present a model that utilizes the syntax of text in both pre-training and fine-tuning stages.
arXiv Detail & Related papers (2020-12-28T06:48:04Z)
- Improving Text Generation with Student-Forcing Optimal Transport [122.11881937642401]
We propose using optimal transport (OT) to match the sequences generated in training and testing modes.
An extension is also proposed to improve the OT learning, based on the structural and contextual information of the text sequences.
The effectiveness of the proposed method is validated on machine translation, text summarization, and text generation tasks.
arXiv Detail & Related papers (2020-10-12T19:42:25Z)
- How LSTM Encodes Syntax: Exploring Context Vectors and Semi-Quantization on Natural Text [2.881185491084005]
We learn a language model where syntactic structures are implicitly given.
We show that the context update vectors, i.e. the outputs of internal gates, are approximately quantized to binary or ternary values (see the probing sketch after this list).
For some dimensions in the context vector, we show that their activations are highly correlated with the depth of phrase structures.
We also show that natural clusters of the functional words and the parts of speech that trigger phrases are represented in a small but principal subspace of the LSTM's context-update vector.
arXiv Detail & Related papers (2020-10-01T12:49:01Z)
- Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed later, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
arXiv Detail & Related papers (2020-09-24T07:21:14Z)
- An Intelligent CNN-VAE Text Representation Technology Based on Text Semantics for Comprehensive Big Data [15.680918844684454]
A text feature representation model based on a convolutional neural network (CNN) and a variational autoencoder (VAE) is proposed.
The proposed model yields better results with k-nearest neighbor (KNN), random forest (RF), and support vector machine (SVM) classification algorithms.
arXiv Detail & Related papers (2020-08-28T07:39:45Z)
- Logic Constrained Pointer Networks for Interpretable Textual Similarity [11.142649867439406]
We introduce a novel pointer network based model with a sentinel gating function to align constituent chunks.
We improve this base model with a loss function to equally penalize misalignments in both sentences, ensuring the alignments are bidirectional.
The model achieves an F1 score of 97.73 and 96.32 on the benchmark SemEval datasets for the chunk alignment task.
arXiv Detail & Related papers (2020-07-15T13:01:44Z)
- Distributional semantic modeling: a revised technique to train term/word vector space models applying the ontology-related approach [36.248702416150124]
We design a new technique for distributional semantic modeling with a neural network-based approach to learn distributed term representations (or term embeddings).
Vec2graph is a Python library for visualizing word embeddings (term embeddings in our case) as dynamic and interactive graphs.
arXiv Detail & Related papers (2020-03-06T18:27:39Z)
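For the "How LSTM Encodes Syntax" entry above, the sketch below shows one hedged way to probe whether an LSTM's per-step cell-state changes (taken here as a stand-in for that paper's "context update vectors", which is an assumption) concentrate near a few discrete values; a randomly initialized cell keeps the snippet self-contained, whereas a real probe would load a trained language model.

```python
# Hedged probing sketch (not that paper's code): histogram an LSTM's per-step
# cell-state changes to look for the near-binary/ternary concentration the
# entry describes. The cell is randomly initialized only to keep the snippet
# runnable; in practice one would use a trained language model's LSTM.
import torch
import torch.nn as nn

torch.manual_seed(0)
cell = nn.LSTMCell(input_size=50, hidden_size=64)
h, c = torch.zeros(1, 64), torch.zeros(1, 64)

updates = []
for x in torch.randn(20, 1, 50):             # stand-in for embedded tokens
    h, c_new = cell(x, (h, c))
    updates.append((c_new - c).squeeze(0))    # per-step "context update" (assumed definition)
    c = c_new

updates = torch.stack(updates)                # (steps, hidden_dim)
# Share of values falling near -1, 0 and +1 after squashing to [-1, 1].
hist = torch.histc(torch.tanh(updates), bins=9, min=-1.0, max=1.0)
print(hist / hist.sum())
```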
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.