Learning to Predict Concept Ordering for Common Sense Generation
- URL: http://arxiv.org/abs/2309.06363v1
- Date: Tue, 12 Sep 2023 16:27:18 GMT
- Title: Learning to Predict Concept Ordering for Common Sense Generation
- Authors: Tianhui Zhang, Danushka Bollegala, Bei Peng
- Abstract summary: We study the relationship between the ordering of the input concepts and the quality of the generated sentences.
We find that the BART-large model consistently outperforms all other LMs considered in this study.
Larger GPT3-based large language model (LLM) variants do not necessarily outperform much smaller LMs on this task.
- Score: 32.2052248473022
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prior work has shown that the ordering in which concepts are shown to a
commonsense generator plays an important role, affecting the quality of the
generated sentence. However, it remains a challenge to determine the optimal
ordering of a given set of concepts such that a natural sentence covering all
the concepts could be generated from a pretrained generator. To understand the
relationship between the ordering of the input concepts and the quality of the
generated sentences, we conduct a systematic study considering multiple
language models (LMs) and concept ordering strategies. We find that the
BART-large model consistently outperforms all other LMs considered in this
study when fine-tuned using the ordering of concepts as they appear in the
CommonGen training data, as measured by multiple evaluation metrics. Moreover,
larger GPT3-based large language model (LLM) variants do not necessarily
outperform much smaller LMs on this task, even when fine-tuned on
task-specific training data. Interestingly, human annotators significantly
reorder input concept sets when manually writing sentences covering those
concepts, and this ordering provides the best sentence generations
independently of the LM used for generation, outperforming a probabilistic
concept ordering baseline.
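To make the setup concrete, the sketch below feeds an ordered concept set to a
BART generator in the spirit of the CommonGen task studied here. It is a
minimal illustration rather than the authors' code: the checkpoint name, the
space-joined input format, and the decoding settings are assumptions, and a
CommonGen fine-tuned checkpoint would replace the base model in practice.

```python
# Minimal sketch (not the authors' code): decode a sentence that covers an
# ordered concept set with a BART seq2seq model. The base facebook/bart-large
# checkpoint stands in for a CommonGen fine-tuned model.
from transformers import BartForConditionalGeneration, BartTokenizer

model_name = "facebook/bart-large"  # assumption: swap in a CommonGen fine-tuned checkpoint
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

def generate_from_concepts(concepts):
    """Decode a single sentence from the concepts in the given order."""
    inputs = tokenizer(" ".join(concepts), return_tensors="pt")
    output_ids = model.generate(**inputs, num_beams=5, max_length=32)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# The paper's central question: does the ordering of the same concept set
# change the quality of the generated sentence?
print(generate_from_concepts(["dog", "frisbee", "catch", "throw"]))
print(generate_from_concepts(["throw", "frisbee", "dog", "catch"]))
```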
Related papers
- Is Next Token Prediction Sufficient for GPT? Exploration on Code Logic Comprehension [18.919972400933393]
We propose an advanced pretraining task, "Next Token Prediction+".
Following this pretraining, both Code Llama and StarCoder, the prevalent code domain pretraining models, display significant improvements on our logically equivalent code selection task and the code completion task.
arXiv Detail & Related papers (2024-04-13T03:11:07Z)
- Instruction Position Matters in Sequence Generation with Large Language Models [67.87516654892343]
Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization.
We propose enhancing the instruction-following capability of LLMs by shifting the position of task instructions after the input sentences (a prompt-layout sketch of this idea appears after this list).
arXiv Detail & Related papers (2023-08-23T12:36:57Z)
- GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator [114.8954615026781]
We propose a GAN-style model for encoder-decoder pre-training by introducing an auxiliary discriminator.
GanLM is trained with two pre-training objectives: replaced token detection and replaced token denoising.
Experiments on language generation benchmarks show that GanLM, with its powerful language understanding capability, outperforms various strong pre-trained language models.
arXiv Detail & Related papers (2022-12-20T12:51:11Z)
- Revisiting Generative Commonsense Reasoning: A Pre-Ordering Approach [16.91261958272558]
We argue that the order of the input concepts can affect the pre-trained model's (PTM's) ability to utilize its commonsense knowledge.
We propose a pre-ordering approach that carefully manipulates the order of the given concepts before generation (a generic re-ordering sketch of this idea appears after this list).
arXiv Detail & Related papers (2022-05-26T06:36:53Z)
- Better Language Model with Hypernym Class Prediction [101.8517004687825]
Class-based language models (LMs) have long been used to address context sparsity in $n$-gram LMs.
In this study, we revisit this approach in the context of neural LMs.
arXiv Detail & Related papers (2022-03-21T01:16:44Z)
- Learning and Analyzing Generation Order for Undirected Sequence Models [86.10875837475783]
We train a policy that learns the generation order for a pre-trained, undirected translation model via reinforcement learning.
We show that the translations by our learned orders achieve higher BLEU scores than the outputs decoded from left to right or decoded by the learned order from Mansimov et al.
Our findings could provide more insights on the mechanism of undirected generation models and encourage further research in this direction.
arXiv Detail & Related papers (2021-12-16T18:29:07Z)
- A Minimalist Dataset for Systematic Generalization of Perception, Syntax, and Semantics [131.93113552146195]
We present a new dataset, Handwritten arithmetic with INTegers (HINT), to examine machines' capability of learning generalizable concepts.
In HINT, machines are tasked with learning how concepts are perceived from raw signals such as images.
We undertake extensive experiments with various sequence-to-sequence models, including RNNs, Transformers, and GPT-3.
arXiv Detail & Related papers (2021-03-02T01:32:54Z)
- oLMpics -- On what Language Model Pre-training Captures [84.60594612120173]
We propose eight reasoning tasks, which require operations such as comparison, conjunction, and composition.
A fundamental challenge is to understand whether the performance of an LM on a task should be attributed to the pre-trained representations or to the process of fine-tuning on the task data.
arXiv Detail & Related papers (2019-12-31T12:11:35Z)
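As flagged in the "Instruction Position Matters" entry above, that work is
about where the task instruction sits relative to the input. The sketch below
is a hypothetical illustration of the idea only, not that paper's method: it
builds the same request with the instruction placed before or after the input
so the two layouts can be compared with whatever LLM is available.

```python
# Hypothetical illustration of the "instruction position" idea (not the
# paper's code): build the same request with the instruction before or
# after the input, so the two prompt layouts can be compared with any LLM.
def build_prompt(instruction: str, source: str, instruction_last: bool = False) -> str:
    if instruction_last:
        # Post-instruction layout: the model reads the input before the instruction.
        return f"{source}\n\n{instruction}"
    # Conventional pre-instruction layout.
    return f"{instruction}\n\n{source}"

instruction = "Translate the following sentence into German."
source = "The cat sat on the mat."
print(build_prompt(instruction, source))                         # instruction first
print(build_prompt(instruction, source, instruction_last=True))  # instruction after the input
```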
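Both the probabilistic concept ordering baseline mentioned in the abstract and
the pre-ordering entry above revolve around choosing an input order before
generation. The sketch below is a generic, hypothetical instantiation of that
idea, not the method of either paper: it scores every permutation of a small
concept set with GPT-2's language-modelling loss and keeps the most probable
order.

```python
# Hypothetical sketch of likelihood-based concept pre-ordering (not either
# paper's method): score each permutation of the concept set with GPT-2's
# LM loss and return the lowest-loss (most probable) ordering.
from itertools import permutations

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def order_concepts(concepts):
    """Pick the permutation GPT-2 finds most probable (feasible only for small sets)."""
    best_order, best_loss = None, float("inf")
    for order in permutations(concepts):
        ids = tokenizer(" ".join(order), return_tensors="pt").input_ids
        with torch.no_grad():
            loss = model(ids, labels=ids).loss.item()  # mean per-token negative log-likelihood
        if loss < best_loss:
            best_order, best_loss = order, loss
    return list(best_order)

print(order_concepts(["frisbee", "catch", "dog", "throw"]))
```

Exhaustive scoring is only practical for the three-to-five concept sets typical
of CommonGen; the sketch is meant purely to make the idea of pre-ordering
concrete.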
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.