Comparing Computational Architectures for Automated Journalism
- URL: http://arxiv.org/abs/2210.04107v1
- Date: Sat, 8 Oct 2022 21:20:52 GMT
- Title: Comparing Computational Architectures for Automated Journalism
- Authors: Yan V. Sym, João Gabriel M. Campos, Marcos M. José, Fabio G. Cozman
- Abstract summary: This study compares the most often employed methods for generating Brazilian Portuguese texts from structured data.
Results suggest that explicit intermediate steps in the generation process produce better texts than those generated by neural end-to-end architectures.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The majority of NLG systems have been designed following either a
template-based or a pipeline-based architecture. Recent neural models for
data-to-text generation have been proposed with an end-to-end deep learning
flavor, in which non-linguistic input is rendered in natural language with no
explicit intermediate representations. This study compares the most often
employed methods for generating Brazilian Portuguese texts from structured
data. Results suggest that explicit intermediate steps in the generation
process produce better texts than those generated by neural end-to-end
architectures, avoiding data hallucination while generalizing better to unseen
inputs. Code and corpus are publicly available.
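To make the architectural contrast concrete, the sketch below shows the kind of explicit, rule-driven mapping a template-based system performs; the record fields and the Portuguese template are invented for illustration and are not taken from the paper's corpus. Pipeline systems split this mapping into intermediate stages (content selection, lexicalization, realization), while end-to-end neural models replace it with a single learned sequence-to-sequence step.

```python
# Minimal sketch of a template-based data-to-text step. The record fields and
# the Portuguese template are invented for illustration; they are not taken
# from the paper's corpus.

def render_template(record: dict) -> str:
    """Fill a fixed sentence template with values from a structured record."""
    template = ("{time_casa} venceu {time_visitante} por {placar} "
                "na partida disputada em {local}.")
    return template.format(**record)

record = {
    "time_casa": "Palmeiras",
    "time_visitante": "Santos",
    "placar": "2 a 1",
    "local": "São Paulo",
}

print(render_template(record))
# Palmeiras venceu Santos por 2 a 1 na partida disputada em São Paulo.
```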
Related papers
- The Whole Truth and Nothing But the Truth: Faithful and Controllable
Dialogue Response Generation with Dataflow Transduction and Constrained
Decoding [65.34601470417967]
We describe a hybrid architecture for dialogue response generation that combines the strengths of neural language modeling and rule-based generation.
Our experiments show that this system outperforms both rule-based and learned approaches in human evaluations of fluency, relevance, and truthfulness.
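As a rough illustration of how constrained decoding can enforce truthfulness, the toy sketch below vetoes next-token candidates that contradict a known dialogue fact before the decoder picks the highest-scoring remaining token. The vocabulary, scoring function, and constraint are invented stand-ins, not the dataflow-transduction system described in the paper.

```python
# Toy sketch of constrained decoding. toy_lm_scores stands in for a neural LM
# that "prefers" a wrong meeting time; the rule-based check vetoes any time
# token that contradicts the known facts. All names here are invented.

VOCAB = ["the", "meeting", "is", "at", "3pm", "5pm", "</s>"]

def toy_lm_scores(prefix):
    """Stand-in for a neural LM: fluent, but inclined to say '3pm'."""
    preferred = ["the", "meeting", "is", "at", "3pm", "</s>"]
    step = len(prefix)
    target = preferred[step] if step < len(preferred) else "</s>"
    scores = {tok: 0.1 for tok in VOCAB}
    scores[target] = 1.0
    if step == 4:
        scores["5pm"] = 0.5  # plausible, but not the model's first choice
    return scores

def allowed(token, facts):
    """Rule-based constraint: a time may only be emitted if it matches the facts."""
    if token in {"3pm", "5pm"}:
        return token == facts["meeting_time"]
    return True

def constrained_greedy_decode(facts, max_len=10):
    prefix = []
    for _ in range(max_len):
        scores = toy_lm_scores(prefix)
        candidates = [tok for tok in VOCAB if allowed(tok, facts)]
        best = max(candidates, key=lambda tok: scores[tok])
        if best == "</s>":
            break
        prefix.append(best)
    return " ".join(prefix)

print(constrained_greedy_decode({"meeting_time": "5pm"}))
# -> "the meeting is at 5pm"; unconstrained greedy decoding would say "3pm"
```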
arXiv Detail & Related papers (2022-09-16T09:00:49Z)
- Deep Latent-Variable Models for Text Generation [7.119436003155924]
Deep neural network-based end-to-end architectures have been widely adopted.
The end-to-end approach conflates all sub-modules, which used to be designed with complex handcrafted rules, into a holistic encoder-decoder architecture.
This dissertation presents how deep latent-variable models can improve over the standard encoder-decoder model for text generation.
arXiv Detail & Related papers (2022-03-03T23:06:39Z)
- AUGNLG: Few-shot Natural Language Generation using Self-trained Data Augmentation [26.016540126949103]
This paper proposes AUGNLG, a novel data augmentation approach that combines a self-trained neural retrieval model with a few-shot learned NLU model.
The proposed system mostly outperforms the state-of-the-art methods on the FewShotWOZ data in both BLEU and Slot Error Rate.
arXiv Detail & Related papers (2021-06-10T08:45:28Z)
- Learning to Synthesize Data for Semantic Parsing [57.190817162674875]
We propose a generative model that captures the composition of programs and maps a program to an utterance.
Due to the simplicity of PCFG and pre-trained BART, our generative model can be efficiently learned from existing data at hand.
We evaluate our method in both in-domain and out-of-domain settings of text-to-SQL parsing on the standard benchmarks of GeoQuery and Spider.
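As a loose illustration of the two-stage idea, the sketch below samples a program from a tiny PCFG and then maps it to a canonical utterance; a deterministic template stands in for the learned BART mapping, and the grammar is invented rather than taken from the paper.

```python
# Toy sketch: sample a program from a small PCFG, then map it to an utterance.
# The grammar, table names, and template are invented for illustration.
import random

PCFG = {
    "QUERY": [(["SELECT", "FIELD", "FROM", "TABLE"], 1.0)],
    "FIELD": [(["city"], 0.5), (["population"], 0.5)],
    "TABLE": [(["geo"], 1.0)],
}

def sample(symbol):
    """Expand a non-terminal by sampling one of its weighted productions."""
    if symbol not in PCFG:
        return [symbol]                      # terminal symbol
    rules, weights = zip(*PCFG[symbol])
    rule = random.choices(rules, weights=weights)[0]
    out = []
    for sym in rule:
        out.extend(sample(sym))
    return out

def program_to_utterance(program):
    """Stand-in for the learned program-to-text model."""
    field = program[1]
    return f"what is the {field} of each entry in the geography table?"

program = sample("QUERY")
print(" ".join(program))             # e.g. SELECT population FROM geo
print(program_to_utterance(program))
```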
arXiv Detail & Related papers (2021-04-12T21:24:02Z)
- SDA: Improving Text Generation with Self Data Augmentation [88.24594090105899]
We propose to improve the standard maximum likelihood estimation (MLE) paradigm by incorporating a self-imitation-learning phase for automatic data augmentation.
Unlike most existing sentence-level augmentation strategies, our method is more general and could be easily adapted to any MLE-based training procedure.
arXiv Detail & Related papers (2021-01-02T01:15:57Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training [93.79766670391618]
We present POINTER, a novel insertion-based approach for hard-constrained text generation.
The proposed method operates by progressively inserting new tokens between existing tokens in a parallel manner.
The resulting coarse-to-fine hierarchy makes the generation process intuitive and interpretable.
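The following toy sketch mimics that coarse-to-fine insertion process: starting from hard lexical constraints, each pass decides for every gap between adjacent tokens whether to insert a word, and generation stops when no gap changes. The hand-written lookup table stands in for the actual pre-trained insertion transformer.

```python
# Toy sketch of insertion-based, coarse-to-fine generation. Decoding starts
# from hard lexical constraints; every pass inserts at most one token into
# each gap between adjacent tokens, in parallel, until no gap changes. The
# ORACLE table is an invented stand-in for the insertion model.

ORACLE = {
    ("sources", "estimates"): "disagree",
    ("disagree", "estimates"): "on",
    ("on", "estimates"): "the",
}

def one_insertion_pass(tokens):
    """Insert at most one token into every gap, in parallel."""
    out = [tokens[0]]
    for left, right in zip(tokens, tokens[1:]):
        inserted = ORACLE.get((left, right))
        if inserted is not None:
            out.append(inserted)
        out.append(right)
    return out

tokens = ["<s>", "sources", "estimates", "</s>"]  # hard constraints
while True:
    new_tokens = one_insertion_pass(tokens)
    if new_tokens == tokens:
        break
    tokens = new_tokens

print(" ".join(tokens[1:-1]))
# stage 1: sources disagree estimates
# stage 2: sources disagree on estimates
# stage 3: sources disagree on the estimates
```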
arXiv Detail & Related papers (2020-05-01T18:11:54Z)
- Unnatural Language Processing: Bridging the Gap Between Synthetic and Natural Language Data [37.542036032277466]
We introduce a technique for "simulation-to-real" transfer in language understanding problems.
Our approach matches or outperforms state-of-the-art models trained on natural language data in several domains.
arXiv Detail & Related papers (2020-04-28T16:41:00Z)
- Improved Code Summarization via a Graph Neural Network [96.03715569092523]
In general, source code summarization techniques take source code as input and output a natural language description.
We present an approach that uses a graph-based neural architecture that better matches the default structure of the AST to generate these summaries.
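As a sketch of the kind of graph such a model consumes, the snippet below parses a Python function into its AST and flattens it into node labels plus parent-child edges; real systems use richer node features and edge types, and this is not the paper's exact construction.

```python
# Sketch of the graph a graph-based summarizer might consume: the source is
# parsed into an AST and flattened into node labels plus parent-child edges.
# Only the input construction is shown; the encoder/decoder are omitted.
import ast

def ast_to_graph(source: str):
    """Return (node_labels, edges); edges are (parent_index, child_index) pairs."""
    tree = ast.parse(source)
    node_labels, edges, index = [], [], {}

    for node in ast.walk(tree):          # first pass: number every AST node
        index[id(node)] = len(node_labels)
        node_labels.append(type(node).__name__)

    for node in ast.walk(tree):          # second pass: record parent-child links
        for child in ast.iter_child_nodes(node):
            edges.append((index[id(node)], index[id(child)]))

    return node_labels, edges

labels, edges = ast_to_graph("def add(a, b):\n    return a + b\n")
print(labels)  # ['Module', 'FunctionDef', 'arguments', 'Return', ...]
print(edges)   # e.g. (0, 1), (1, 2), (1, 3), ...
```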
arXiv Detail & Related papers (2020-04-06T17:36:42Z)