NumHG: A Dataset for Number-Focused Headline Generation
- URL: http://arxiv.org/abs/2309.01455v1
- Date: Mon, 4 Sep 2023 09:03:53 GMT
- Title: NumHG: A Dataset for Number-Focused Headline Generation
- Authors: Jian-Tao Huang, Chung-Chi Chen, Hen-Hsen Huang, Hsin-Hsi Chen
- Abstract summary: Headline generation, a key task in abstractive summarization, strives to condense a full-length article into a succinct, single line of text.
We introduce a new dataset, the NumHG, and provide over 27,000 annotated numeral-rich news articles for detailed investigation.
We evaluate five well-performing models from previous headline generation tasks using human evaluation in terms of numerical accuracy, reasonableness, and readability.
- Score: 28.57003500212883
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Headline generation, a key task in abstractive summarization, strives to
condense a full-length article into a succinct, single line of text. Notably,
while contemporary encoder-decoder models excel based on the ROUGE metric, they
often falter when it comes to the precise generation of numerals in headlines.
We identify the lack of datasets providing fine-grained annotations for
accurate numeral generation as a major roadblock. To address this, we introduce
a new dataset, the NumHG, and provide over 27,000 annotated numeral-rich news
articles for detailed investigation. Further, we evaluate five well-performing
models from previous headline generation tasks using human evaluation in terms
of numerical accuracy, reasonableness, and readability. Our study reveals a
need for improvement in numerical accuracy, demonstrating the potential of the
NumHG dataset to drive progress in number-focused headline generation and
stimulate further discussions in numeral-focused text generation.
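To make the notion of "numerical accuracy" concrete, below is a minimal, hypothetical sketch of an automatic numeral check: it extracts the numerals from a generated headline and measures how many reference numerals are reproduced. The regex and function are our own illustrative assumptions; the paper itself evaluates this dimension with human judges.

```python
import re

# Numerals with an optional decimal part, e.g. "12", "3.4".
NUM_RE = re.compile(r"\d+(?:\.\d+)?")

def numeral_accuracy(generated: str, reference: str) -> float:
    """Fraction of reference-headline numerals reproduced in the generated headline."""
    ref_nums = NUM_RE.findall(reference)
    if not ref_nums:
        return 1.0  # nothing numeric to get wrong
    gen_nums = set(NUM_RE.findall(generated))
    return sum(n in gen_nums for n in ref_nums) / len(ref_nums)

print(numeral_accuracy("Profits rise 12% to $3.4B", "Profits rise 12% to $3.4B"))  # 1.0
print(numeral_accuracy("Profits rise 15% to $3.4B", "Profits rise 12% to $3.4B"))  # 0.5
```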
Related papers
- Retrieval is Accurate Generation [99.24267226311157]
We introduce a novel method that selects context-aware phrases from a collection of supporting documents.
Our model achieves the best performance and the lowest latency among several retrieval-augmented baselines.
arXiv Detail & Related papers (2024-02-27T14:16:19Z)
- Exploring Precision and Recall to assess the quality and diversity of LLMs [82.21278402856079]
We introduce a novel evaluation framework for Large Language Models (LLMs) such as Llama-2 and Mistral.
This approach allows for a nuanced assessment of the quality and diversity of generated text without the need for aligned corpora.
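A minimal sketch of how such a distribution-level precision/recall could be computed without aligned corpora, assuming the framework follows the k-nearest-neighbor support estimates popularized by Kynkäänniemi et al. (2019): precision asks whether generated-text embeddings land near real text, recall asks whether real text is covered by generations. The embedding dimension, sample sizes, and k below are illustrative, not the paper's actual setup.

```python
import numpy as np

def knn_radii(X: np.ndarray, k: int = 3) -> np.ndarray:
    """Distance from each point in X to its k-th nearest neighbor in X."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    d.sort(axis=1)
    return d[:, k]  # index 0 is the point itself (distance 0)

def support_coverage(A: np.ndarray, B: np.ndarray, k: int = 3) -> float:
    """Fraction of points in A that fall inside the estimated support of B."""
    radii = knn_radii(B, k)
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return float((d <= radii[None, :]).any(axis=1).mean())

rng = np.random.default_rng(0)
real = rng.standard_normal((200, 64))    # embeddings of human-written text
gen = rng.standard_normal((200, 64))     # embeddings of model-generated text
precision = support_coverage(gen, real)  # quality: generations near real data
recall = support_coverage(real, gen)     # diversity: real data covered by generations
```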
arXiv Detail & Related papers (2024-02-16T13:53:26Z)
- Optimizing Factual Accuracy in Text Generation through Dynamic Knowledge Selection [71.20871905457174]
Language models (LMs) have revolutionized the way we interact with information, but they often generate nonfactual text.
Previous methods use external knowledge as references to enhance factuality, but they often struggle when irrelevant references are mixed in.
We present DKGen, which divides text generation into an iterative process, dynamically selecting relevant references at each step.
arXiv Detail & Related papers (2023-08-30T02:22:40Z)
- How to Choose Pretrained Handwriting Recognition Models for Single Writer Fine-Tuning [23.274139396706264]
Recent advancements in Deep Learning-based Handwritten Text Recognition (HTR) have led to models with remarkable performance on modern and historical manuscripts.
However, these models struggle to reach the same performance on manuscripts with peculiar characteristics, such as language, paper support, ink, and author handwriting.
In this paper, we take into account large, real benchmark datasets and synthetic ones obtained with a styled Handwritten Text Generation model.
We give a quantitative indication of the most relevant characteristics of such data for obtaining an HTR model able to effectively transcribe manuscripts in small collections with as few as five real fine-tuning lines.
arXiv Detail & Related papers (2023-05-04T07:00:28Z)
- Grounded Keys-to-Text Generation: Towards Factual Open-Ended Generation [92.1582872870226]
We propose a new grounded keys-to-text generation task.
The task is to generate a factual description of an entity given a set of guiding keys and grounding passages.
Inspired by recent QA-based evaluation measures, we propose an automatic metric, MAFE, for factual correctness of generated descriptions.
arXiv Detail & Related papers (2022-12-04T23:59:41Z)
- NumGPT: Improving Numeracy Ability of Generative Pre-trained Models [59.931394234642816]
We propose NumGPT, a generative pre-trained model that explicitly models the numerical properties of numbers in texts.
Specifically, it leverages a prototype-based numeral embedding to encode the mantissa of the number and an individual embedding to encode the exponent of the number.
A numeral-aware loss function is designed to integrate numerals into the pre-training objective of NumGPT.
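A hedged sketch of that mantissa/exponent split: the numeral is decomposed into a mantissa, softly assigned to a small bank of prototype embeddings, plus a separate embedding for its decimal exponent. Module names, dimensions, and the soft-assignment scheme below are illustrative guesses, not NumGPT's actual implementation.

```python
import math
import torch
import torch.nn as nn

class NumeralEmbedding(nn.Module):
    """Illustrative mantissa/exponent numeral embedding (not the paper's code)."""

    def __init__(self, dim: int = 64, n_prototypes: int = 10, max_exp: int = 20):
        super().__init__()
        # Fixed prototype mantissa values; a numeral attends to them softly.
        self.prototype_values = torch.linspace(1.0, 9.9, n_prototypes)
        self.prototypes = nn.Embedding(n_prototypes, dim)
        # One learned embedding per (shifted) decimal exponent.
        self.exponent = nn.Embedding(2 * max_exp + 1, dim)
        self.max_exp = max_exp

    def forward(self, value: float) -> torch.Tensor:
        exp = 0 if value == 0 else math.floor(math.log10(abs(value)))
        exp = max(-self.max_exp, min(self.max_exp, exp))
        mantissa = abs(value) / (10.0 ** exp) if value != 0 else 0.0
        # Soft assignment of the mantissa to the prototype bank.
        w = torch.softmax(-(self.prototype_values - mantissa) ** 2, dim=0)
        mantissa_emb = w @ self.prototypes.weight  # (dim,)
        exponent_emb = self.exponent(torch.tensor(exp + self.max_exp))
        return mantissa_emb + exponent_emb

emb = NumeralEmbedding()
print(emb(27000.0).shape)  # torch.Size([64]); 27000 -> mantissa 2.7, exponent 4
```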
arXiv Detail & Related papers (2021-09-07T15:06:12Z)
- Introducing a new high-resolution handwritten digits data set with writer characteristics [0.0]
We introduce a new handwritten digit data set that we collected.
It contains high-resolution images of handwritten digits together with various writer characteristics.
The multiple writer characteristics gathered are a novelty of our data set and create new research opportunities.
arXiv Detail & Related papers (2020-11-04T18:18:43Z)
- Partially-Aligned Data-to-Text Generation with Distant Supervision [69.15410325679635]
We propose a new generation task called Partially-Aligned Data-to-Text Generation (PADTG).
It is more practical since it utilizes automatically annotated data for training and thus considerably expands the application domains.
Our framework outperforms all baseline models and verifies the feasibility of utilizing partially-aligned data.
arXiv Detail & Related papers (2020-10-03T03:18:52Z)
- Revisiting Challenges in Data-to-Text Generation with Fact Grounding [2.969705152497174]
We introduce a larger-scale dataset, RotoWire-FG (Fact-Grounding), with 50% more data from the years 2017-19.
We achieve improved data fidelity over the state-of-the-art models by integrating a new form of table reconstruction.
arXiv Detail & Related papers (2020-01-12T02:31:07Z)