ETimeline: An Extensive Timeline Generation Dataset based on Large Language Model
- URL: http://arxiv.org/abs/2502.07474v1
- Date: Tue, 11 Feb 2025 11:34:33 GMT
- Title: ETimeline: An Extensive Timeline Generation Dataset based on Large Language Model
- Authors: Xiaochen Liu, Yanan Zhang,
- Abstract summary: We propose ETimeline, which encompasses over $13,000$ news articles, spanning $600$ bilingual timelines across $28$ news domains. This work contributes to timeline generation research and supports a wide range of tasks, including topic generation and event relationships.
- Score: 4.639419073825561
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Timeline generation is of great significance for a comprehensive understanding of the development of events over time. Its goal is to organize news chronologically, which helps to identify patterns and trends that may be obscured when viewing news in isolation, making it easier to track the development of stories and understand the interrelationships between key events. Timelines are now common in various commercial products, but academic research in this area is notably scarce. Additionally, the current datasets are in need of refinement for enhanced utility and expanded coverage. In this paper, we propose ETimeline, which encompasses over $13,000$ news articles, spanning $600$ bilingual timelines across $28$ news domains. Specifically, we gather a candidate pool of more than $120,000$ news articles and employ the large language model (LLM) Pipeline to improve performance, ultimately yielding the ETimeline. The data analysis underscores the appeal of ETimeline. Additionally, we also provide the news pool data for further research and analysis. This work contributes to the advancement of timeline generation research and supports a wide range of tasks, including topic generation and event relationships. We believe that this dataset will serve as a catalyst for innovative research and bridge the gap between academia and industry in understanding the practical application of technology services. The dataset is available at https://zenodo.org/records/11392212
Related papers
- Unfolding the Headline: Iterative Self-Questioning for News Retrieval and Timeline Summarization [93.56166917491487]
This paper proposes CHRONOS - Causal Headline Retrieval for Open-domain News Timeline SummarizatiOn via Iterative Self-Questioning. Our experiments indicate that CHRONOS is not only adept at open-domain timeline summarization, but it also rivals the performance of existing state-of-the-art systems designed for closed-domain applications.
arXiv Detail & Related papers (2025-01-01T16:28:21Z) - Analyzing Temporal Complex Events with Large Language Models? A Benchmark towards Temporal, Long Context Understanding [57.62275091656578]
We refer to the complex events composed of many news articles over an extended period as Temporal Complex Event (TCE).
This paper proposes a novel approach using Large Language Models (LLMs) to systematically extract and analyze the event chain within TCE.
arXiv Detail & Related papers (2024-06-04T16:42:17Z) - A diverse Multilingual News Headlines Dataset from around the World [57.37355895609648]
Babel Briefings is a novel dataset featuring 4.7 million news headlines from August 2020 to November 2021, across 30 languages and 54 locations worldwide.
It serves as a high-quality dataset for training or evaluating language models as well as offering a simple, accessible collection of articles.
arXiv Detail & Related papers (2024-03-28T12:08:39Z) - SciNews: From Scholarly Complexities to Public Narratives -- A Dataset for Scientific News Report Generation [16.61347730523143]
We present a new corpus to facilitate the automated generation of scientific news reports. Our dataset comprises academic publications and their corresponding scientific news reports across nine disciplines. We benchmark our dataset employing state-of-the-art text generation models.
arXiv Detail & Related papers (2024-03-26T14:54:48Z) - Once Upon a $\textit{Time}$ in $\textit{Graph}$: Relative-Time Pretraining for Complex Temporal Reasoning [96.03608822291136]
We make use of the underlying nature of time, and suggest creating a graph structure based on the relative placements of events along the time axis.
Inspired by the graph view, we propose RemeMo, which explicitly connects all temporally-scoped facts by modeling the time relations between any two sentences.
Experimental results show that RemeMo outperforms the baseline T5 on multiple temporal question answering datasets.
arXiv Detail & Related papers (2023-10-23T08:49:00Z) - Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook [95.32949323258251]
Temporal data, notably time series and spatio-temporal data, are prevalent in real-world applications.
Recent advances in large language and other foundational models have spurred increased use in time series and spatio-temporal data mining.
arXiv Detail & Related papers (2023-10-16T09:06:00Z) - Video Timeline Modeling For News Story Understanding [123.03394373132353]
We present a novel problem, namely video timeline modeling.
Our objective is to create a video-associated timeline from a set of videos related to a specific topic, thereby facilitating the content and structure understanding of the story being told.
This problem has significant potential in various real-world applications, for instance, news story summarization.
arXiv Detail & Related papers (2023-09-23T18:24:15Z) - VLSNR:Vision-Linguistics Coordination Time Sequence-aware News Recommendation [0.0]
Multimodal semantics is beneficial for enhancing the comprehension of users' temporal and long-lasting interests.
In our work, we propose a vision-linguistics coordinate time sequence news recommendation.
We also construct a large-scale multimodal news recommendation dataset V-MIND.
arXiv Detail & Related papers (2022-10-06T14:27:37Z) - Deep learning for time series classification [2.0305676256390934]
Time series analysis allows us to visualize and understand the evolution of a process over time.
Time series classification consists of constructing algorithms dedicated to automatically labeling time series data.
Deep learning has emerged as one of the most effective methods for tackling the supervised classification task.
arXiv Detail & Related papers (2020-10-01T17:38:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.