'Don't Get Too Technical with Me': A Discourse Structure-Based Framework
for Science Journalism
- URL: http://arxiv.org/abs/2310.15077v1
- Date: Mon, 23 Oct 2023 16:35:05 GMT
- Title: 'Don't Get Too Technical with Me': A Discourse Structure-Based Framework
for Science Journalism
- Authors: Ronald Cardenas, Bingsheng Yao, Dakuo Wang, Yufang Hou
- Abstract summary: Science journalism refers to the task of reporting technical findings of a scientific paper as a less technical news article to the general public.
We aim to design an automated system to support this real-world task (i.e., automatic science journalism) by introducing a newly-constructed dataset (SciTechNews) with tuples of a publicly-available scientific paper, its corresponding news article, and an expert-written short summary snippet.
- Score: 36.16009435194716
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Science journalism refers to the task of reporting technical findings of a
scientific paper as a less technical news article to the general public
audience. We aim to design an automated system to support this real-world task
(i.e., automatic science journalism) by 1) introducing a newly-constructed and
real-world dataset (SciTechNews), with tuples of a publicly-available
scientific paper, its corresponding news article, and an expert-written short
summary snippet; 2) proposing a novel technical framework that integrates a
paper's discourse structure with its metadata to guide generation; and, 3)
demonstrating with extensive automatic and human experiments that our framework
outperforms other baseline methods (e.g. Alpaca and ChatGPT) in elaborating a
content plan meaningful for the target audience, simplifying the information
selected, and producing a coherent final report in a layman's style.
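To make the plan-then-generate idea in the abstract concrete, here is a minimal sketch: discourse-labeled sections and paper metadata are turned into a simple textual content plan, which then conditions an off-the-shelf instruction-tuned model. The model choice (FLAN-T5), the plan format, and the discourse labels are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch only: discourse structure + metadata -> content plan -> lay-style report.
from transformers import pipeline

# Hypothetical input: sections already labeled with discourse roles, plus paper metadata.
paper = {
    "metadata": {"title": "A Framework for X", "venue": "ACL", "authors": "Doe et al."},
    "sections": [
        {"role": "background", "text": "Prior systems require expert knowledge ..."},
        {"role": "method", "text": "We integrate discourse structure with metadata ..."},
        {"role": "results", "text": "Our framework outperforms strong baselines ..."},
    ],
}

def build_content_plan(paper, keep_roles=("background", "method", "results")):
    """Select which discourse roles to cover, in order, as a simple textual plan."""
    lines = [f"TITLE: {paper['metadata']['title']}"]
    for sec in paper["sections"]:
        if sec["role"] in keep_roles:
            lines.append(f"{sec['role'].upper()}: {sec['text']}")
    return "\n".join(lines)

generator = pipeline("text2text-generation", model="google/flan-t5-base")

plan = build_content_plan(paper)
prompt = (
    "Write a short, non-technical news-style summary for a general audience, "
    "covering the points below in order:\n" + plan
)
report = generator(prompt, max_new_tokens=128)[0]["generated_text"]
print(report)
```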
Related papers
- SciNews: From Scholarly Complexities to Public Narratives -- A Dataset for Scientific News Report Generation [20.994565065595232]
We present a new corpus to facilitate the automated generation of scientific news reports.
Our dataset comprises academic publications and their corresponding scientific news reports across nine disciplines.
We benchmark our dataset employing state-of-the-art text generation models.
arXiv Detail & Related papers (2024-03-26T14:54:48Z) - Large Language Models for Scientific Information Extraction: An
Empirical Study for Virology [0.0]
We champion the use of structured and semantic content representation of discourse-based scholarly communication.
Inspired by tools like Wikipedia infoboxes or structured Amazon product descriptions, we develop an automated approach to produce structured scholarly contribution summaries.
Our results show that finetuned FLAN-T5 with 1000x fewer parameters than the state-of-the-art GPT-davinci is competitive for the task.
arXiv Detail & Related papers (2024-01-18T15:04:55Z) - The Semantic Scholar Open Data Platform [79.4493235243312]
- The Semantic Scholar Open Data Platform [79.4493235243312]
Semantic Scholar (S2) is an open data platform and website aimed at accelerating science by helping scholars discover and understand scientific literature.
We combine public and proprietary data sources using state-of-the-art techniques for scholarly PDF content extraction and automatic knowledge graph construction.
The graph includes advanced semantic features such as structurally parsed text, natural language summaries, and vector embeddings.
arXiv Detail & Related papers (2023-01-24T17:13:08Z) - Citation Trajectory Prediction via Publication Influence Representation
- Citation Trajectory Prediction via Publication Influence Representation Using Temporal Knowledge Graph [52.07771598974385]
Existing approaches mainly rely on mining temporal and graph data from academic articles.
Our framework is composed of three modules: difference-preserved graph embedding, fine-grained influence representation, and learning-based trajectory calculation.
Experiments are conducted on both the APS academic dataset and our contributed AIPatent dataset.
arXiv Detail & Related papers (2022-10-02T07:43:26Z) - NEWTS: A Corpus for News Topic-Focused Summarization [9.872518517174498]
This paper introduces the first topical summarization corpus, based on the well-known CNN/Dailymail dataset.
We evaluate a range of existing techniques and analyze the effectiveness of different prompting methods.
arXiv Detail & Related papers (2022-05-31T10:01:38Z) - Revise and Resubmit: An Intertextual Model of Text-based Collaboration
in Peer Review [52.359007622096684]
Peer review is a key component of the publishing process in most fields of science.
Peer review is a key component of the publishing process in most fields of science.
Existing NLP studies focus on the analysis of individual texts, but editorial assistance often requires modeling interactions between pairs of texts.
arXiv Detail & Related papers (2022-04-22T16:39:38Z) - What's New? Summarizing Contributions in Scientific Literature [85.95906677964815]
We introduce a new task of disentangled paper summarization, which seeks to generate separate summaries for the paper contributions and the context of the work.
We extend the S2ORC corpus of academic articles by adding disentangled "contribution" and "context" reference labels.
We propose a comprehensive automatic evaluation protocol which reports the relevance, novelty, and disentanglement of generated outputs.
arXiv Detail & Related papers (2020-11-06T02:23:01Z) - Hierarchical Bi-Directional Self-Attention Networks for Paper Review
Rating Recommendation [81.55533657694016]
We propose a Hierarchical bi-directional self-attention Network framework (HabNet) for paper review rating prediction and recommendation.
Specifically, we leverage the hierarchical structure of the paper reviews with three levels of encoders: a sentence encoder (level one), an intra-review encoder (level two), and an inter-review encoder (level three).
We are able to identify useful predictors to make the final acceptance decision, as well as to help discover the inconsistency between numerical review ratings and text sentiment conveyed by reviewers.
arXiv Detail & Related papers (2020-11-02T08:07:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.