MURMUR: Modular Multi-Step Reasoning for Semi-Structured Data-to-Text
Generation
- URL: http://arxiv.org/abs/2212.08607v1
- Date: Fri, 16 Dec 2022 17:36:23 GMT
- Title: MURMUR: Modular Multi-Step Reasoning for Semi-Structured Data-to-Text
Generation
- Authors: Swarnadeep Saha, Xinyan Velocity Yu, Mohit Bansal, Ramakanth Pasunuru,
Asli Celikyilmaz
- Abstract summary: We propose MURMUR, a neuro-symbolic modular approach to text generation from semi-structured data with multi-step reasoning.
We conduct experiments on two data-to-text generation tasks, WebNLG and LogicNLG.
- Score: 102.20036684996248
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prompting large language models has enabled significant recent progress in
multi-step reasoning over text. However, when applied to text generation from
semi-structured data (e.g., graphs or tables), these methods typically suffer
from low semantic coverage, hallucination, and logical inconsistency. We
propose MURMUR, a neuro-symbolic modular approach to text generation from
semi-structured data with multi-step reasoning. MURMUR is a best-first search
method that generates reasoning paths using: (1) neural and symbolic modules
with specific linguistic and logical skills, (2) a grammar whose production
rules define valid compositions of modules, and (3) value functions that assess
the quality of each reasoning step. We conduct experiments on two diverse
data-to-text generation tasks, WebNLG and LogicNLG. These tasks differ in
their data representations (graphs and tables) and span multiple linguistic and
logical skills. MURMUR obtains significant improvements over recent few-shot
baselines like direct prompting and chain-of-thought prompting, while also
achieving comparable performance to fine-tuned GPT-2 on out-of-domain data.
Moreover, human evaluation shows that MURMUR generates highly faithful and
correct reasoning paths that lead to 26% more logically consistent summaries on
LogicNLG, compared to direct prompting.
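The abstract names three components driving the search: modules with specific skills, a grammar over valid module compositions, and value functions scoring each step. Below is a minimal, hypothetical Python sketch of how such a best-first search could fit together; the module behaviors, grammar rules, and value function here are invented stand-ins, not the paper's actual implementation.

```python
import heapq

# Toy "modules" with specific skills, standing in for the paper's neural
# and symbolic modules. Both names and behaviors are invented here.
def surface_realize(state: str) -> str:
    """Hypothetical linguistic module: turn a subject|predicate|object
    triple into a rough clause."""
    return state.replace("|", " ") + "."

def conjoin(state: str) -> str:
    """Hypothetical logical module: merge all but the last sentence
    boundary so multiple clauses read as one summary."""
    n = state.count(".") - 1
    return state.replace(".", ";", n) if n > 0 else state

MODULES = {"realize": surface_realize, "conjoin": conjoin}

# A grammar whose production rules say which module may follow which;
# these rules are illustrative, not the paper's grammar.
GRAMMAR = {
    "start": ["realize"],
    "realize": ["conjoin"],
    "conjoin": [],  # terminal: no further expansions
}

def value_fn(text: str) -> float:
    """Stand-in value function scoring a reasoning step. MURMUR uses
    learned, task-specific value functions; this one just rewards
    longer, sentence-final outputs."""
    return len(text) + (1.0 if text.endswith(".") else 0.0)

def best_first_generate(data: str) -> str:
    """Best-first search over reasoning paths: always expand the
    highest-valued state, as permitted by the grammar."""
    # heapq is a min-heap, so scores are negated.
    frontier = [(-value_fn(data), data, "start")]
    best_score, best_state = float("-inf"), data
    while frontier:
        neg_score, state, last_module = heapq.heappop(frontier)
        if -neg_score > best_score:
            best_score, best_state = -neg_score, state
        for name in GRAMMAR[last_module]:
            nxt = MODULES[name](state)
            heapq.heappush(frontier, (-value_fn(nxt), nxt, name))
    return best_state

if __name__ == "__main__":
    # A single WebNLG-style input triple (subject|predicate|object).
    print(best_first_generate("Alan_Turing|birthPlace|London"))
```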
Related papers
- Exploration of Masked and Causal Language Modelling for Text Generation [6.26998839917804]
This paper conducts an extensive comparison of Masked Language Modelling (MLM) and Causal Language Modelling (CLM) approaches for text generation tasks.
We first employ quantitative metrics and then perform a qualitative human evaluation to analyse coherence and grammatical correctness.
The results show that MLM consistently outperforms CLM in text generation across all datasets.
arXiv Detail & Related papers (2024-05-21T09:33:31Z)
- MuSR: Testing the Limits of Chain-of-thought with Multistep Soft Reasoning [63.80739044622555]
We introduce MuSR, a dataset for evaluating language models on soft reasoning tasks specified in a natural language narrative.
This dataset has two crucial features. First, it is created through a novel neurosymbolic synthetic-to-natural generation algorithm.
Second, our dataset instances are free text narratives corresponding to real-world domains of reasoning.
arXiv Detail & Related papers (2023-10-24T17:59:20Z)
- Abstract Meaning Representation-Based Logic-Driven Data Augmentation for Logical Reasoning [27.224364543134094]
We introduce a novel logic-driven data augmentation approach, AMR-LDA.
AMR-LDA first converts the original text into an Abstract Meaning Representation (AMR) graph and then applies logic-driven modifications to it.
The modified AMR graphs are subsequently converted back into text to create augmented data.
arXiv Detail & Related papers (2023-05-21T23:16:26Z)
- Improving Logical-Level Natural Language Generation with Topic-Conditioned Data Augmentation and Logical Form Generation [18.93964332724296]
We propose topic-conditioned data augmentation (TopicDA), which generates logical forms and textual descriptions directly from tables.
We introduce logical form generation (LG), a dual task of Logic2text that requires generating a valid logical form based on a text description of a table.
We also propose a semi-supervised learning approach to jointly train a Logic2text and an LG model with both labeled and augmented data.
arXiv Detail & Related papers (2021-12-12T13:50:18Z)
- LOGEN: Few-shot Logical Knowledge-Conditioned Text Generation with Self-training [76.90793623822866]
We propose a unified framework for logical knowledge-conditioned text generation in the few-shot setting.
Our approach leverages self-training and samples pseudo logical forms based on content and structure consistency.
arXiv Detail & Related papers (2021-12-02T16:49:41Z)
- Logic-Consistency Text Generation from Semantic Parses [32.543257899910216]
This paper first proposes SNOWBALL, a framework for logic-consistent text generation from semantic parses.
Second, it proposes a novel automatic metric, BLEC, for evaluating the logical consistency between the semantic parses and generated texts (a toy sketch of such a consistency check appears after this list).
arXiv Detail & Related papers (2021-08-02T01:12:18Z)
- Logic-Driven Context Extension and Data Augmentation for Logical Reasoning of Text [65.24325614642223]
We propose to identify logical symbols and expressions in the text in order to arrive at the answer.
Based on such logical information, we put forward a context extension framework and a data augmentation algorithm.
Our method achieves state-of-the-art performance, and both the logic-driven context extension framework and the data augmentation algorithm help improve accuracy.
arXiv Detail & Related papers (2021-05-08T10:09:36Z)
- Deep Graph Matching and Searching for Semantic Code Retrieval [76.51445515611469]
We propose an end-to-end deep graph matching and searching model based on graph neural networks.
We first represent both natural language query texts and programming language code snippets as unified graph-structured data.
In particular, DGMS (Deep Graph Matching and Searching) not only captures more structural information for individual query texts or code snippets but also learns the fine-grained similarity between them.
arXiv Detail & Related papers (2020-10-24T14:16:50Z)
- Logical Natural Language Generation from Open-Domain Tables [107.04385677577862]
We propose a new task in which a model generates natural language statements that can be logically entailed by the facts.
To facilitate the study of the proposed logical NLG problem, we use the existing TabFact dataset (Chen et al., 2019), which features a wide range of logical/symbolic inferences.
The new task poses challenges to the existing monotonic generation frameworks due to the mismatch between sequence order and logical order.
arXiv Detail & Related papers (2020-04-22T06:03:10Z)
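As referenced in the SNOWBALL/BLEC entry above, here is a toy, hypothetical sketch of a rule-based consistency check in the spirit of BLEC: it tests whether the logic-bearing tokens of a semantic parse (identifiers and numbers) are mentioned in the generated text. The tokenization rule and the scoring are illustrative assumptions, not BLEC's actual procedure.

```python
import re

def logic_tokens(parse: str) -> set:
    """Extract logic-bearing tokens (identifiers and numbers) from a
    logical form. This token definition is an assumption, not BLEC's."""
    return set(re.findall(r"[a-z_]+|\d+(?:\.\d+)?", parse.lower()))

def consistency_score(parse: str, text: str) -> float:
    """Fraction of the parse's logic-bearing tokens that appear in the
    generated text; 1.0 means every token is covered."""
    tokens = logic_tokens(parse)
    if not tokens:
        return 1.0
    mentioned = {t for t in tokens if t in text.lower()}
    return len(mentioned) / len(tokens)

if __name__ == "__main__":
    parse = "greater(avg(points), 30)"
    text = "The average number of points is greater than 30."
    print(round(consistency_score(parse, text), 2))  # 0.75: "avg" unmatched
```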