Discourse-Aware Prompt Design for Text Generation
- URL: http://arxiv.org/abs/2112.05717v1
- Date: Fri, 10 Dec 2021 18:15:44 GMT
- Title: Discourse-Aware Prompt Design for Text Generation
- Authors: Marjan Ghazvininejad, Vladimir Karpukhin, Asli Celikyilmaz
- Abstract summary: We show that prompt-based conditional text generation can be improved with simple and efficient methods.
First, we show that a higher-level discourse structure of human-written text can be modelled with hierarchical blocking on prefix parameters.
Second, we propose sparse prefix tuning by introducing attention sparsity on the prefix parameters at different layers of the network and learning sparse transformations of the softmax function.
- Score: 13.835916386769474
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current efficient fine-tuning methods (e.g., adapters, prefix-tuning, etc.)
have optimized conditional text generation via training a small set of extra
parameters of the neural language model, while freezing the rest for
efficiency. While showing strong performance on some generation tasks, they
don't generalize across all generation tasks. In this work, we show that prompt-based conditional text generation can be improved with simple and efficient methods that simulate modeling the discourse structure of human-written text.
We introduce two key design choices. First, we show that a higher-level discourse structure of human-written text can be modelled with hierarchical blocking on prefix parameters, which enables spanning different parts of the input and output text and yields more coherent outputs. Second, we propose sparse prefix tuning by introducing attention sparsity on the prefix parameters at different layers of the network and learning sparse transformations of the softmax function.
We find that sparse attention enables prefix-tuning to better control the input contents (salient facts), yielding more efficient tuning of the prefix parameters. Experiments on a wide variety of text generation tasks show that structured design of prefix parameters can achieve results comparable to fine-tuning all parameters, while outperforming standard prefix-tuning on all generation tasks even in low-resource settings.
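The abstract gives no implementation details, so the following is a minimal PyTorch sketch of one plausible reading of the two design choices: prefix key/value parameters partitioned into blocks, with each block visible only to one contiguous section of the sequence (hierarchical blocking), and a sparsemax in place of softmax over the prefix attention scores (attention sparsity). All names here (HierarchicalSparsePrefix, block_mask, tokens_per_block, etc.) are illustrative assumptions, not the authors' released code.
```python
# Illustrative sketch only; module and parameter names are assumptions,
# not the implementation released with the paper.
import torch
import torch.nn as nn


def sparsemax(scores: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """Sparse alternative to softmax (Martins & Astudillo, 2016): projects
    scores onto the probability simplex, driving low scores exactly to zero."""
    zs, _ = torch.sort(scores, dim=dim, descending=True)
    rng = torch.arange(1, scores.size(dim) + 1,
                       device=scores.device, dtype=scores.dtype)
    view = [1] * scores.dim()
    view[dim] = -1
    rng = rng.view(view)
    cssv = zs.cumsum(dim) - 1                 # cumulative sum minus 1
    support = rng * zs > cssv                 # entries kept in the support
    k = support.to(scores.dtype).sum(dim=dim, keepdim=True)
    tau = (torch.where(support, zs, torch.zeros_like(zs)).sum(dim, keepdim=True) - 1) / k
    return torch.clamp(scores - tau, min=0.0)


class HierarchicalSparsePrefix(nn.Module):
    """Prefix parameters split into blocks; each block is attended only by one
    contiguous section of the sequence, and attention over the prefix uses
    sparsemax. The frozen language model itself is not shown."""

    def __init__(self, n_blocks: int, tokens_per_block: int, d_model: int):
        super().__init__()
        self.n_blocks = n_blocks
        self.tokens_per_block = tokens_per_block
        # Only these prefix key/value vectors would receive gradients.
        self.prefix_keys = nn.Parameter(0.02 * torch.randn(n_blocks, tokens_per_block, d_model))
        self.prefix_values = nn.Parameter(0.02 * torch.randn(n_blocks, tokens_per_block, d_model))

    def block_mask(self, seq_len: int) -> torch.Tensor:
        """Boolean [seq_len, n_blocks * tokens_per_block] mask: token t may only
        attend to the prefix block assigned to its section of the sequence."""
        section = torch.arange(seq_len) * self.n_blocks // seq_len
        block_id = torch.arange(self.n_blocks).repeat_interleave(self.tokens_per_block)
        return section.unsqueeze(1) == block_id.unsqueeze(0)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        """hidden: [batch, seq_len, d_model] queries from a frozen LM layer.
        Returns the prefix contribution to that layer's attention output."""
        bsz, seq_len, d_model = hidden.shape
        keys = self.prefix_keys.reshape(-1, d_model)       # [P, d]
        values = self.prefix_values.reshape(-1, d_model)   # [P, d]
        scores = hidden @ keys.T / d_model ** 0.5          # [batch, seq_len, P]
        mask = self.block_mask(seq_len).to(hidden.device)  # hierarchical blocking
        scores = scores.masked_fill(~mask, float("-inf"))
        probs = sparsemax(scores, dim=-1)                  # attention sparsity
        return probs @ values                              # [batch, seq_len, d_model]
```
For example, HierarchicalSparsePrefix(n_blocks=4, tokens_per_block=8, d_model=1024) would allocate 32 prefix tokens, with each quarter of the sequence attending only to its own block of 8. In a full setup, one such module would typically be attached per transformer layer, as in standard prefix-tuning, with the pretrained language model kept frozen.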
Related papers
- Towards Infinite-Long Prefix in Transformer [18.24137806007111]
We study the ability of prompting and context-based fine-tuning methods to match the performance of full-parameter fine-tuning.
We implement an algorithm that only needs to introduce and fine-tune a few extra trainable parameters instead of an infinite-long prefix.
Our method achieves superior or competitive performance compared to existing methods such as full-parameter fine-tuning, P-Tuning V2, and LoRA.
arXiv Detail & Related papers (2024-06-20T06:56:35Z) - RIFF: Learning to Rephrase Inputs for Few-shot Fine-tuning of Language Models [4.085425430499285]
We explore the impact of altering the input text of the original task in conjunction with parameter-efficient fine-tuning methods.
To most effectively rewrite the input text, we train a few-shot paraphrase model with a Maximum-Marginal Likelihood objective.
We show that enriching data with paraphrases at train and test time enhances the performance beyond what can be achieved with parameter-efficient fine-tuning alone.
arXiv Detail & Related papers (2024-03-04T17:58:09Z) - Copy Is All You Need [66.00852205068327]
We formulate text generation as progressively copying text segments from an existing text collection.
Our approach achieves better generation quality according to both automatic and human evaluations, and attains additional performance gains by simply scaling up to larger text collections.
arXiv Detail & Related papers (2023-07-13T05:03:26Z) - LRANet: Towards Accurate and Efficient Scene Text Detection with Low-Rank Approximation Network [63.554061288184165]
We propose a novel parameterized text shape method based on low-rank approximation.
By exploring the shape correlation among different text contours, our method achieves consistency, compactness, simplicity, and robustness in shape representation.
We implement an accurate and efficient arbitrary-shaped text detector named LRANet.
arXiv Detail & Related papers (2023-06-27T02:03:46Z) - Inducer-tuning: Connecting Prefix-tuning and Adapter-tuning [53.72897232951918]
We show that inducer-tuning can close the performance gap between prefix-tuning and fine-tuning.
We suggest a new variant of prefix-tuning -- inducer-tuning, which shares the same mechanism as prefix-tuning while leveraging the residual form found in adapter-tuning.
arXiv Detail & Related papers (2022-10-26T04:39:42Z) - Text Revision by On-the-Fly Representation Optimization [76.11035270753757]
Current state-of-the-art methods formulate text revision tasks as sequence-to-sequence learning problems.
We present an iterative in-place editing approach for text revision, which requires no parallel data.
It achieves competitive and even better performance than state-of-the-art supervised methods on text simplification.
arXiv Detail & Related papers (2022-04-15T07:38:08Z) - Outline to Story: Fine-grained Controllable Story Generation from Cascaded Events [39.577220559911055]
We propose a new task named "Outline to Story" (O2S) as a test bed for fine-grained controllable generation of long text.
We then create datasets for future benchmarks, built by state-of-the-art keyword extraction techniques.
arXiv Detail & Related papers (2021-01-04T08:16:21Z) - Prefix-Tuning: Optimizing Continuous Prompts for Generation [85.6357778621526]
Fine-tuning is the de facto way to leverage large pretrained language models to perform downstream tasks.
We propose prefix-tuning, a lightweight alternative to fine-tuning for natural language generation tasks.
We find that by learning only 0.1% of the parameters, prefix-tuning obtains comparable performance in the full data setting.
arXiv Detail & Related papers (2021-01-01T08:00:36Z) - Improving Text Generation with Student-Forcing Optimal Transport [122.11881937642401]
We propose using optimal transport (OT) to match the sequences generated in training and testing modes.
An extension is also proposed to improve the OT learning, based on the structural and contextual information of the text sequences.
The effectiveness of the proposed method is validated on machine translation, text summarization, and text generation tasks.
arXiv Detail & Related papers (2020-10-12T19:42:25Z)