Learning to Predict Concept Ordering for Common Sense Generation
- URL: http://arxiv.org/abs/2309.06363v1
- Date: Tue, 12 Sep 2023 16:27:18 GMT
- Title: Learning to Predict Concept Ordering for Common Sense Generation
- Authors: Tianhui Zhang, Danushka Bollegala, Bei Peng
- Abstract summary: We study the relationship between the ordering of the input concepts and the quality of the generated sentences.
We find that the BART-large model consistently outperforms all other LMs considered in this study.
Larger GPT3-based large language model (LLM) variants do not necessarily outperform much smaller LMs on this task.
- Score: 32.2052248473022
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prior work has shown that the ordering in which concepts are shown to a
commonsense generator plays an important role, affecting the quality of the
generated sentence. However, it remains a challenge to determine the optimal
ordering of a given set of concepts such that a natural sentence covering all
the concepts could be generated from a pretrained generator. To understand the
relationship between the ordering of the input concepts and the quality of the
generated sentences, we conduct a systematic study considering multiple
language models (LMs) and concept ordering strategies. We find that the
BART-large model consistently outperforms all other LMs considered in this
study when fine-tuned using the ordering of concepts as they appear in the
CommonGen training data, as measured by multiple evaluation metrics. Moreover,
larger GPT3-based large language model (LLM) variants do not necessarily
outperform much smaller LMs on this task, even when fine-tuned on
task-specific training data. Interestingly, human annotators significantly
reorder input concept sets when manually writing sentences covering those
concepts, and this ordering provides the best sentence generations
independently of the LM used for generation, outperforming a probabilistic
concept ordering baseline.
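To make the setup concrete, the sketch below feeds an ordered concept set to a
BART generator in the spirit of the CommonGen task studied here. It is a
minimal illustration rather than the authors' code: the checkpoint name, the
space-joined input format, and the decoding settings are assumptions, and a
CommonGen fine-tuned checkpoint would replace the base model in practice.

```python
# Minimal sketch (not the authors' code): decode a sentence that covers an
# ordered concept set with a BART seq2seq model. The base facebook/bart-large
# checkpoint stands in for a CommonGen fine-tuned model.
from transformers import BartForConditionalGeneration, BartTokenizer

model_name = "facebook/bart-large"  # assumption: swap in a CommonGen fine-tuned checkpoint
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

def generate_from_concepts(concepts):
    """Decode a single sentence from the concepts in the given order."""
    inputs = tokenizer(" ".join(concepts), return_tensors="pt")
    output_ids = model.generate(**inputs, num_beams=5, max_length=32)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# The paper's central question: does the ordering of the same concept set
# change the quality of the generated sentence?
print(generate_from_concepts(["dog", "frisbee", "catch", "throw"]))
print(generate_from_concepts(["throw", "frisbee", "dog", "catch"]))
```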
Related papers
- Is Next Token Prediction Sufficient for GPT? Exploration on Code Logic Comprehension [18.919972400933393]
We propose an advanced pretraining task, "Next Token Prediction+".
Following this pretraining, both Code Llama and StarCoder, the prevalent code domain pretraining models, display significant improvements on our logically equivalent code selection task and the code completion task.
arXiv Detail & Related papers (2024-04-13T03:11:07Z)
- Instruction Position Matters in Sequence Generation with Large Language Models [67.87516654892343]
Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization.
We propose enhancing the instruction-following capability of LLMs by shifting the position of task instructions after the input sentences (a prompt-layout sketch of this idea appears after this list).
arXiv Detail & Related papers (2023-08-23T12:36:57Z)
- GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator [114.8954615026781]
We propose a GAN-style model for encoder-decoder pre-training by introducing an auxiliary discriminator.
GanLM is trained with two pre-training objectives: replaced token detection and replaced token denoising.
Experiments on language generation benchmarks show that GanLM, with its powerful language understanding capability, outperforms various strong pre-trained language models.
arXiv Detail & Related papers (2022-12-20T12:51:11Z)
- Revisiting Generative Commonsense Reasoning: A Pre-Ordering Approach [16.91261958272558]
We argue that the order of the input concepts can affect the pre-trained model's (PTM's) ability to utilize its commonsense knowledge.
We propose a pre-ordering approach that carefully manipulates the order of the given concepts before generation (a generic re-ordering sketch of this idea appears after this list).
arXiv Detail & Related papers (2022-05-26T06:36:53Z)
- Better Language Model with Hypernym Class Prediction [101.8517004687825]
Class-based language models (LMs) have long been used to address context sparsity in $n$-gram LMs.
In this study, we revisit this approach in the context of neural LMs.
arXiv Detail & Related papers (2022-03-21T01:16:44Z)
- Learning and Analyzing Generation Order for Undirected Sequence Models [86.10875837475783]
We train a policy that learns the generation order for a pre-trained, undirected translation model via reinforcement learning.
We show that the translations by our learned orders achieve higher BLEU scores than the outputs decoded from left to right or decoded by the learned order from Mansimov et al.
Our findings could provide more insights on the mechanism of undirected generation models and encourage further research in this direction.
arXiv Detail & Related papers (2021-12-16T18:29:07Z)
- A Minimalist Dataset for Systematic Generalization of Perception, Syntax, and Semantics [131.93113552146195]
We present a new dataset, Handwritten arithmetic with INTegers (HINT), to examine machines' capability of learning generalizable concepts.
In HINT, machines are tasked with learning how concepts are perceived from raw signals such as images.
We undertake extensive experiments with various sequence-to-sequence models, including RNNs, Transformers, and GPT-3.
arXiv Detail & Related papers (2021-03-02T01:32:54Z)
- oLMpics -- On what Language Model Pre-training Captures [84.60594612120173]
We propose eight reasoning tasks, which require operations such as comparison, conjunction, and composition.
A fundamental challenge is to understand whether the performance of an LM on a task should be attributed to the pre-trained representations or to the process of fine-tuning on the task data.
arXiv Detail & Related papers (2019-12-31T12:11:35Z)
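As flagged in the "Instruction Position Matters" entry above, that work is
about where the task instruction sits relative to the input. The sketch below
is a hypothetical illustration of the idea only, not that paper's method: it
builds the same request with the instruction placed before or after the input
so the two layouts can be compared with whatever LLM is available.

```python
# Hypothetical illustration of the "instruction position" idea (not the
# paper's code): build the same request with the instruction before or
# after the input, so the two prompt layouts can be compared with any LLM.
def build_prompt(instruction: str, source: str, instruction_last: bool = False) -> str:
    if instruction_last:
        # Post-instruction layout: the model reads the input before the instruction.
        return f"{source}\n\n{instruction}"
    # Conventional pre-instruction layout.
    return f"{instruction}\n\n{source}"

instruction = "Translate the following sentence into German."
source = "The cat sat on the mat."
print(build_prompt(instruction, source))                         # instruction first
print(build_prompt(instruction, source, instruction_last=True))  # instruction after the input
```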
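Both the probabilistic concept ordering baseline mentioned in the abstract and
the pre-ordering entry above revolve around choosing an input order before
generation. The sketch below is a generic, hypothetical instantiation of that
idea, not the method of either paper: it scores every permutation of a small
concept set with GPT-2's language-modelling loss and keeps the most probable
order.

```python
# Hypothetical sketch of likelihood-based concept pre-ordering (not either
# paper's method): score each permutation of the concept set with GPT-2's
# LM loss and return the lowest-loss (most probable) ordering.
from itertools import permutations

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def order_concepts(concepts):
    """Pick the permutation GPT-2 finds most probable (feasible only for small sets)."""
    best_order, best_loss = None, float("inf")
    for order in permutations(concepts):
        ids = tokenizer(" ".join(order), return_tensors="pt").input_ids
        with torch.no_grad():
            loss = model(ids, labels=ids).loss.item()  # mean per-token negative log-likelihood
        if loss < best_loss:
            best_order, best_loss = order, loss
    return list(best_order)

print(order_concepts(["frisbee", "catch", "dog", "throw"]))
```

Exhaustive scoring is only practical for the three-to-five concept sets typical
of CommonGen; the sketch is meant purely to make the idea of pre-ordering
concrete.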
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.