ProSG: Using Prompt Synthetic Gradients to Alleviate Prompt Forgetting
of RNN-like Language Models
- URL: http://arxiv.org/abs/2311.01981v1
- Date: Fri, 3 Nov 2023 15:34:02 GMT
- Title: ProSG: Using Prompt Synthetic Gradients to Alleviate Prompt Forgetting
of RNN-like Language Models
- Authors: Haotian Luo, Kunming Wu, Cheng Dai, Sixian Ding, Xinhao Chen
- Abstract summary: We propose an architecture that teaches the model to memorize the prompt during generation via synthetic gradients.
We construct a dataset for experiments, and the results demonstrate the effectiveness of our method.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: RNN-like language models have received renewed attention from NLP
researchers in recent years, and several models have made significant progress,
demonstrating performance comparable to traditional transformers. However, due
to the recurrent nature of RNNs, this kind of language model can only store
information in a set of fixed-length state vectors. As a consequence, even
after many improvements and optimizations, they still suffer from forgetfulness
when given complex instructions or prompts. Since prompted generation is the
primary and most widely used function of LMs, solving the problem of forgetting
during generation is of vital importance. In this paper, focusing on easing
prompt forgetting during generation, we propose an architecture that teaches
the model to memorize the prompt during generation via synthetic gradients. To
force the model to memorize the prompt, we derive the states that encode the
prompt and then transform them into a model parameter modification using a
low-rank gradient approximation, which temporarily hard-codes the prompt into
the model parameters. We construct a dataset for experiments, and the results
demonstrate the effectiveness of our method in solving the problem of
forgetfulness during prompted generation. We will release all code upon
acceptance.
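To make the mechanism described in the abstract concrete, below is a minimal, hypothetical PyTorch sketch: encode the prompt into a recurrent state, synthesize a low-rank update from that state, and apply it as a temporary modification of one weight matrix during generation. The module names (TinyRNNLM, PromptSynthesizer), the choice of a GRU, the rank, and the targeted weight matrix are all illustrative assumptions, not the authors' released implementation.

```python
# Sketch of prompt synthetic gradients as described in the abstract (assumptions,
# not the authors' code): derive the prompt-encoding state, map it to a low-rank
# parameter update, apply the update during generation, then restore the weights.
import torch
import torch.nn as nn

class TinyRNNLM(nn.Module):
    """Stand-in RNN-like language model (illustrative only)."""
    def __init__(self, vocab=1000, d=128):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
        self.rnn = nn.GRU(d, d, batch_first=True)
        self.head = nn.Linear(d, vocab)

    def forward(self, ids, h=None):
        out, h = self.rnn(self.emb(ids), h)
        return self.head(out), h

class PromptSynthesizer(nn.Module):
    """Maps the prompt-encoding state to low-rank factors U, V so that
    delta_W = U @ V^T plays the role of an approximated gradient step."""
    def __init__(self, d=128, rank=4):
        super().__init__()
        self.to_u = nn.Linear(d, d * rank)
        self.to_v = nn.Linear(d, d * rank)
        self.d, self.rank = d, rank

    def forward(self, state):                    # state: (1, d)
        u = self.to_u(state).view(self.d, self.rank)
        v = self.to_v(state).view(self.d, self.rank)
        return u @ v.t()                         # (d, d) update of rank <= 4

model, synth = TinyRNNLM(), PromptSynthesizer()
prompt_ids = torch.randint(0, 1000, (1, 16))

# 1) Derive the state that encodes the prompt.
_, h = model(prompt_ids)                         # h: (num_layers=1, batch=1, d)

# 2) Synthesize a low-rank modification from that state and apply it
#    temporarily to the recurrent (hidden-to-hidden) weights.
delta = synth(h.squeeze(0))                      # (d, d)
W = model.rnn.weight_hh_l0                       # GRU weights, shape (3*d, d)
with torch.no_grad():
    W[: delta.shape[0]] += delta                 # hard-code the prompt into one d x d block
    # ... run prompted generation with the modified parameters here ...
    W[: delta.shape[0]] -= delta                 # restore the original weights
```

The low-rank factorization keeps the per-prompt update cheap to compute and easy to undo once generation finishes, which is what makes the modification temporary.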
Related papers
- R-Tuning: Instructing Large Language Models to Say `I Don't Know' [66.11375475253007]
Large language models (LLMs) have revolutionized numerous domains with their impressive performance but still face their challenges.
Previous instruction tuning methods force the model to complete a sentence regardless of whether it possesses the relevant knowledge.
We present a new approach called Refusal-Aware Instruction Tuning (R-Tuning).
Experimental results demonstrate R-Tuning effectively improves a model's ability to answer known questions and refrain from answering unknown questions.
arXiv Detail & Related papers (2023-11-16T08:45:44Z)
- Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers [29.319666323947708]
We present a novel approach that dynamically prunes contextual information while preserving the model's expressiveness.
Our method employs a learnable mechanism that determines which uninformative tokens can be dropped from the context.
Our reference implementation achieves up to a $2\times$ increase in inference throughput and even greater memory savings.
arXiv Detail & Related papers (2023-05-25T07:39:41Z)
- SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks [21.616328837090396]
Spiking Neural Networks (SNNs) leverage sparse and event-driven activations to reduce the computational overhead associated with model inference.
We implement a generative language model with binary, event-driven spiking activation units.
SpikeGPT is the largest backpropagation-trained SNN model to date, rendering it suitable for both the generation and comprehension of natural language.
arXiv Detail & Related papers (2023-02-27T16:43:04Z)
- Prompt Conditioned VAE: Enhancing Generative Replay for Lifelong Learning in Task-Oriented Dialogue [80.05509768165135]
Generative replay methods are widely employed to consolidate past knowledge with generated pseudo samples.
Most existing generative replay methods use only a single task-specific token to control their models.
We propose a novel method, prompt conditioned VAE for lifelong learning, to enhance generative replay by incorporating tasks' statistics.
arXiv Detail & Related papers (2022-10-14T13:12:14Z)
- Confident Adaptive Language Modeling [95.45272377648773]
CALM is a framework for dynamically allocating different amounts of compute per input and generation timestep.
We demonstrate the efficacy of our framework in reducing compute -- a potential speedup of up to $3\times$ -- while provably maintaining high performance.
arXiv Detail & Related papers (2022-07-14T17:00:19Z)
- CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning [92.36705236706678]
"CodeRL" is a new framework for program synthesis tasks through pretrained LMs and deep reinforcement learning.
During inference, we introduce a new generation procedure with a critical sampling strategy.
For the model backbones, we extend the encoder-decoder architecture of CodeT5 with enhanced learning objectives.
arXiv Detail & Related papers (2022-07-05T02:42:15Z)
- Straight to the Gradient: Learning to Use Novel Tokens for Neural Text Generation [4.866431869728018]
We introduce ScaleGrad, a modification straight to the gradient of the loss function, to remedy the degeneration issue of the standard MLE objective.
Empirical results show the effectiveness of our method not only in open-ended generation, but also in directed generation tasks.
arXiv Detail & Related papers (2021-06-14T07:46:30Z)
- Updater-Extractor Architecture for Inductive World State Representations [0.0]
We propose a transformer-based Updater-Extractor architecture and a training procedure that can work with sequences of arbitrary length.
We explicitly train the model to incorporate incoming information into its world state representation.
Empirically, we investigate the model performance on three different tasks, demonstrating its promise.
arXiv Detail & Related papers (2021-04-12T14:30:11Z)
- POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training [93.79766670391618]
We present POINTER, a novel insertion-based approach for hard-constrained text generation.
The proposed method operates by progressively inserting new tokens between existing tokens in a parallel manner.
The resulting coarse-to-fine hierarchy makes the generation process intuitive and interpretable.
arXiv Detail & Related papers (2020-05-01T18:11:54Z)
- Recognizing Long Grammatical Sequences Using Recurrent Networks Augmented With An External Differentiable Stack [73.48927855855219]
Recurrent neural networks (RNNs) are a widely used deep architecture for sequence modeling, generation, and prediction.
RNNs generalize poorly over very long sequences, which limits their applicability to many important temporal processing and time series forecasting problems.
One way to address these shortcomings is to couple an RNN with an external, differentiable memory structure, such as a stack.
In this paper, we improve the memory-augmented RNN with important architectural and state updating mechanisms.
arXiv Detail & Related papers (2020-04-04T14:19:15Z)