NumHG: A Dataset for Number-Focused Headline Generation
- URL: http://arxiv.org/abs/2309.01455v1
- Date: Mon, 4 Sep 2023 09:03:53 GMT
- Title: NumHG: A Dataset for Number-Focused Headline Generation
- Authors: Jian-Tao Huang, Chung-Chi Chen, Hen-Hsen Huang, Hsin-Hsi Chen
- Abstract summary: Headline generation, a key task in abstractive summarization, strives to condense a full-length article into a succinct, single line of text.
We introduce a new dataset, the NumHG, and provide over 27,000 annotated numeral-rich news articles for detailed investigation.
We evaluate five well-performing models from previous headline generation tasks using human evaluation in terms of numerical accuracy, reasonableness, and readability.
- Score: 28.57003500212883
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Headline generation, a key task in abstractive summarization, strives to
condense a full-length article into a succinct, single line of text. Notably,
while contemporary encoder-decoder models excel based on the ROUGE metric, they
often falter when it comes to the precise generation of numerals in headlines.
We identify the lack of datasets providing fine-grained annotations for
accurate numeral generation as a major roadblock. To address this, we introduce
a new dataset, the NumHG, and provide over 27,000 annotated numeral-rich news
articles for detailed investigation. Further, we evaluate five well-performing
models from previous headline generation tasks using human evaluation in terms
of numerical accuracy, reasonableness, and readability. Our study reveals a
need for improvement in numerical accuracy, demonstrating the potential of the
NumHG dataset to drive progress in number-focused headline generation and
stimulate further discussions in numeral-focused text generation.
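The "numerical accuracy" criterion above can be pictured with a small automatic check: extract the numerals from the gold headline and count how many reappear in the generated one. The regex, the exact-string matching rule, and the example below are illustrative assumptions, not the evaluation protocol defined in the NumHG paper.
```python
# Minimal sketch of a numeral-accuracy check for generated headlines.
# The regex and exact-match rule are assumptions for illustration only.
import re

NUM_RE = re.compile(r"\d+(?:\.\d+)?")

def extract_numerals(text: str) -> list[str]:
    """Return every numeral string found in the text."""
    return NUM_RE.findall(text)

def numeral_accuracy(generated: str, reference: str) -> float:
    """Fraction of reference-headline numerals reproduced in the generation."""
    gold = extract_numerals(reference)
    if not gold:
        return 1.0  # no numerals to get right
    pred = extract_numerals(generated)
    return sum(1 for n in gold if n in pred) / len(gold)

print(numeral_accuracy("Sales up 12% to $3.4B",
                       "Sales rise 12 percent to $3.4 billion"))  # 1.0
```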
Related papers
- Teaching Large Language Models Number-Focused Headline Generation With Key Element Rationales [11.428237505896218]
Number-focused headline generation is a unique challenge for Large Language Models (LLMs).
We propose a novel chain-of-thought framework that uses rationales comprising key elements of the Topic, Entities, and Numerical reasoning (TEN) in news articles.
Our approach teaches the student LLM automatic generation of rationales with enhanced capability for numerical reasoning and topic-aligned numerical headline generation.
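One concrete way to picture the TEN framework is as a prompt that elicits the rationale before the headline. The template below is a hedged illustration; the paper's actual prompts and teacher-student distillation setup are not reproduced here.
```python
# Illustrative prompt in the spirit of TEN rationales (Topic, Entities,
# Numerical reasoning). The wording is an assumption, not the paper's prompt.
TEN_PROMPT = """Article:
{article}

Before writing the headline, produce a rationale with three parts:
- Topic: the main subject of the article.
- Entities: the key people, organizations, or objects involved.
- Numerical reasoning: which numbers matter and how any derived value
  (sum, difference, rounding) is computed from them.

Then write one headline that uses the correct number.
Rationale:"""

def build_prompt(article: str) -> str:
    """Fill the TEN-style template with a news article."""
    return TEN_PROMPT.format(article=article)
```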
arXiv Detail & Related papers (2025-02-05T12:39:07Z)
- Detecting Document-level Paraphrased Machine Generated Content: Mimicking Human Writing Style and Involving Discourse Features [57.34477506004105]
Machine-generated content poses challenges such as academic plagiarism and the spread of misinformation.
We introduce novel methodologies and datasets to overcome these challenges.
We propose MhBART, an encoder-decoder model designed to emulate human writing style.
We also propose DTransformer, a model that integrates discourse analysis through PDTB preprocessing to encode structural features.
arXiv Detail & Related papers (2024-12-17T08:47:41Z)
- Retrieval is Accurate Generation [99.24267226311157]
We introduce a novel method that selects context-aware phrases from a collection of supporting documents.
Our model achieves the best performance and the lowest latency among several retrieval-augmented baselines.
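As a rough mental model, generation-by-retrieval scores candidate phrases from the supporting documents against the current context and emits the best one. The hash-seeded "encoder" below is a toy stand-in for a trained context/phrase encoder, not the paper's model.
```python
# Toy sketch of phrase retrieval as a generation step.
import numpy as np

def embed(text: str, dim: int = 8) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % 2**32)  # toy encoder
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def next_phrase(context: str, candidates: list[str]) -> str:
    """Pick the candidate phrase whose embedding best matches the context."""
    ctx = embed(context)
    scores = [float(ctx @ embed(p)) for p in candidates]
    return candidates[int(np.argmax(scores))]

docs = ["the stock rose 12 percent", "earnings beat expectations"]
print(next_phrase("Quarterly report:", docs))
```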
arXiv Detail & Related papers (2024-02-27T14:16:19Z)
- Exploring Precision and Recall to assess the quality and diversity of LLMs [82.21278402856079]
We introduce a novel evaluation framework for Large Language Models (LLMs) such as Llama-2 and Mistral.
This approach allows for a nuanced assessment of the quality and diversity of generated text without the need for aligned corpora.
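One common way to realize precision and recall for generative models is via nearest-neighbor support estimates in an embedding space: precision asks whether generations land near real data, recall asks whether real data is covered by generations. The sketch below follows that style and may differ from the paper's exact estimator.
```python
# Embedding-space precision/recall sketch (nearest-neighbor support
# estimates). The estimator details are assumptions, not the paper's.
import numpy as np

def knn_radius(X: np.ndarray, k: int = 3) -> np.ndarray:
    """Distance from each point in X to its k-th nearest neighbor in X."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return np.sort(d, axis=1)[:, k]  # column 0 is the point itself

def coverage(A: np.ndarray, B: np.ndarray, radii: np.ndarray) -> float:
    """Fraction of points in A inside some k-NN ball centered on B."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return float(np.mean((d <= radii[None, :]).any(axis=1)))

real = np.random.randn(200, 16)  # embeddings of human-written text
fake = np.random.randn(200, 16)  # embeddings of model generations
precision = coverage(fake, real, knn_radius(real))  # quality of samples
recall = coverage(real, fake, knn_radius(fake))     # diversity of samples
print(precision, recall)
```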
arXiv Detail & Related papers (2024-02-16T13:53:26Z)
- Optimizing Factual Accuracy in Text Generation through Dynamic Knowledge Selection [71.20871905457174]
Language models (LMs) have revolutionized the way we interact with information, but they often generate nonfactual text.
Previous methods use external knowledge as references to enhance factuality, but they often struggle when irrelevant references get mixed into the context.
We present DKGen, which divides text generation into an iterative process.
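The iterative scheme can be pictured as a loop that re-selects references before each generation step, so irrelevant passages drop out as the draft grows. Both helpers below, `rank_references` and `generate_sentence`, are hypothetical stand-ins for a retriever and a conditional language model; this is not the authors' implementation.
```python
# Hedged sketch of iterative, knowledge-selecting generation in the
# spirit of DKGen.
def rank_references(draft: str, references: list[str], top_k: int = 2) -> list[str]:
    """Toy relevance: prefer references sharing words with the draft so far."""
    overlap = lambda r: len(set(draft.lower().split()) & set(r.lower().split()))
    return sorted(references, key=overlap, reverse=True)[:top_k]

def generate_sentence(draft: str, refs: list[str]) -> str:
    """Placeholder for a conditional LM call; here we just quote a reference."""
    return refs[0]

def iterative_generate(prompt: str, references: list[str], steps: int = 3) -> str:
    draft = prompt
    for _ in range(steps):
        refs = rank_references(draft, references)  # re-select each step
        draft += " " + generate_sentence(draft, refs)
    return draft
```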
arXiv Detail & Related papers (2023-08-30T02:22:40Z)
- How to Choose Pretrained Handwriting Recognition Models for Single Writer Fine-Tuning [23.274139396706264]
Recent advancements in Deep Learning-based Handwritten Text Recognition (HTR) have led to models with remarkable performance on modern and historical manuscripts.
However, these models struggle to match that performance on manuscripts with peculiar characteristics, such as language, paper support, ink, and author handwriting.
In this paper, we take into account large, real benchmark datasets and synthetic ones obtained with a styled Handwritten Text Generation model.
We give a quantitative indication of the most relevant characteristics of such data for obtaining an HTR model able to effectively transcribe manuscripts in small collections with as few as five real fine-tuning lines.
arXiv Detail & Related papers (2023-05-04T07:00:28Z)
- Grounded Keys-to-Text Generation: Towards Factual Open-Ended Generation [92.1582872870226]
We propose a new grounded keys-to-text generation task.
The task is to generate a factual description of an entity given a set of guiding keys and grounding passages.
Inspired by recent QA-based evaluation measures, we propose an automatic metric, MAFE, for factual correctness of generated descriptions.
arXiv Detail & Related papers (2022-12-04T23:59:41Z)
- NumGPT: Improving Numeracy Ability of Generative Pre-trained Models [59.931394234642816]
We propose NumGPT, a generative pre-trained model that explicitly models the numerical properties of numbers in texts.
Specifically, it leverages a prototype-based numeral embedding to encode the mantissa of the number and an individual embedding to encode the exponent of the number.
A numeral-aware loss function is designed to integrate numerals into the pre-training objective of NumGPT.
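The mantissa/exponent decomposition can be made concrete with a small module: each numeral x is written as m x 10^e, the mantissa m is softly assigned to learnable prototypes, and the integer exponent e indexes its own embedding table. The dimensions, ranges, and softmax assignment below are illustrative assumptions, not NumGPT's published architecture.
```python
# Sketch of a prototype-based numeral embedding in the spirit of NumGPT.
import math
import torch
import torch.nn as nn

class NumeralEmbedding(nn.Module):
    def __init__(self, dim: int = 64, n_prototypes: int = 10,
                 min_exp: int = -8, max_exp: int = 8):
        super().__init__()
        self.min_exp = min_exp
        self.exp_emb = nn.Embedding(max_exp - min_exp + 1, dim)  # one per power of ten
        self.protos = nn.Parameter(torch.linspace(1.0, 10.0, n_prototypes))
        self.proto_emb = nn.Embedding(n_prototypes, dim)

    def forward(self, x: float) -> torch.Tensor:
        # Assumes x != 0 and the exponent fits the table; real use would
        # need a zero token and clamping.
        exp = int(math.floor(math.log10(abs(x))))
        mantissa = abs(x) / 10 ** exp          # in [1, 10)
        w = torch.softmax(-(self.protos - mantissa) ** 2, dim=0)  # soft assignment
        m_vec = w @ self.proto_emb.weight      # prototype mixture for mantissa
        e_vec = self.exp_emb(torch.tensor(exp - self.min_exp))
        return m_vec + e_vec

emb = NumeralEmbedding()
print(emb(3.4e3).shape)  # torch.Size([64])
```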
arXiv Detail & Related papers (2021-09-07T15:06:12Z)
- Introducing a new high-resolution handwritten digits data set with writer characteristics [0.0]
We introduce a new handwritten digit data set that we collected.
It contains high-resolution images of handwritten digits together with various writer characteristics.
The multiple writer characteristics we gathered are a novelty of our data set and create new research opportunities.
arXiv Detail & Related papers (2020-11-04T18:18:43Z)