Topic Detection and Tracking with Time-Aware Document Embeddings
- URL: http://arxiv.org/abs/2112.06166v2
- Date: Tue, 26 Mar 2024 04:26:18 GMT
- Title: Topic Detection and Tracking with Time-Aware Document Embeddings
- Authors: Hang Jiang, Doug Beeferman, Weiquan Mao, Deb Roy
- Abstract summary: We design a neural method that fuses temporal and textual information into a single representation of news documents for event detection.
In the retrospective setting, we apply clustering algorithms to the time-aware embeddings and show substantial improvements over baselines on the News2013 data set.
In the online streaming setting, we add our document encoder to an existing state-of-the-art TDT pipeline and demonstrate that it can benefit the overall performance.
- Score: 23.348627263872842
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The time at which a message is communicated is a vital piece of metadata in many real-world natural language processing tasks such as Topic Detection and Tracking (TDT). TDT systems aim to cluster a corpus of news articles by event, and in that context, stories that describe the same event are likely to have been written at around the same time. Prior work on time modeling for TDT takes this into account, but does not capture well how time interacts with the semantic nature of the event. For example, stories about a tropical storm are likely to be written within a short time interval, while stories about a movie release may appear over weeks or months. In our work, we design a neural method that fuses temporal and textual information into a single representation of news documents for event detection. We fine-tune these time-aware document embeddings with a triplet loss architecture, integrate the model into downstream TDT systems, and evaluate the systems on two benchmark TDT data sets in English. In the retrospective setting, we apply clustering algorithms to the time-aware embeddings and show substantial improvements over baselines on the News2013 data set. In the online streaming setting, we add our document encoder to an existing state-of-the-art TDT pipeline and demonstrate that it can benefit the overall performance. We conduct ablation studies on the time representation and fusion algorithm strategies, showing that our proposed model outperforms alternative strategies. Finally, we probe the model to examine how it handles recurring events more effectively than previous TDT systems.
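As a rough illustration of the idea in the abstract (fusing a time signal with a text embedding, then training with a triplet loss), here is a minimal sketch. It is not the authors' model: the sinusoidal time encoding, the fusion by concatenation, the Euclidean distance, and all function names (`time_encoding`, `fuse`, `triplet_loss`) are assumptions chosen for simplicity, and the toy vectors stand in for real document embeddings.

```python
import math

def time_encoding(day, dim=4):
    """Sinusoidal encoding of a publication day (an assumed time representation)."""
    enc = []
    for i in range(dim // 2):
        freq = 1.0 / (10000 ** (2 * i / dim))
        enc.append(math.sin(day * freq))
        enc.append(math.cos(day * freq))
    return enc

def fuse(text_vec, day, dim=4):
    """Fuse text and time by plain concatenation (assumed; the paper compares fusion strategies)."""
    return list(text_vec) + time_encoding(day, dim)

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Toy documents: (hypothetical 2-d text embedding, publication day).
anchor = fuse([1.0, 0.0], day=0)
positive = fuse([0.9, 0.1], day=1)     # same event, published a day later
negative = fuse([1.0, 0.0], day=300)   # identical text, much later (e.g. a recurring event)

loss = triplet_loss(anchor, positive, negative)
print(f"triplet loss on the toy triple: {loss:.3f}")
```

In this sketch the negative has exactly the same text vector as the anchor, so only the time component can separate the two; that is the kind of case (recurring events) where a text-only embedding has nothing to work with.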
Related papers
- Language in the Flow of Time: Time-Series-Paired Texts Weaved into a Unified Temporal Narrative [65.84249211767921]
Texts as Time Series (TaTS) considers the time-series-paired texts to be auxiliary variables of the time series.
TaTS can be plugged into any existing numerical-only time series models and enable them to handle time series data with paired texts effectively.
arXiv Detail & Related papers (2025-02-13T03:43:27Z)
- Mind the Time: Temporally-Controlled Multi-Event Video Generation [65.05423863685866]
We present MinT, a multi-event video generator with temporal control.
Our key insight is to bind each event to a specific period in the generated video, which allows the model to focus on one event at a time.
For the first time in the literature, our model offers control over the timing of events in generated videos.
arXiv Detail & Related papers (2024-12-06T18:52:20Z)
- Retrieval of Temporal Event Sequences from Textual Descriptions [0.0]
We introduce TESRBench, a benchmark for temporal event sequence retrieval from textual descriptions.
We propose TPP-Embedding, a novel model for embedding and retrieving event sequences.
TPP-Embedding demonstrates superior performance over baseline models across TESRBench datasets.
arXiv Detail & Related papers (2024-10-17T21:35:55Z)
- Analyzing Temporal Complex Events with Large Language Models? A Benchmark towards Temporal, Long Context Understanding [57.62275091656578]
We refer to complex events composed of many news articles over an extended period as Temporal Complex Events (TCE).
This paper proposes a novel approach using Large Language Models (LLMs) to systematically extract and analyze the event chain within TCE.
arXiv Detail & Related papers (2024-06-04T16:42:17Z)
- Beyond Trend and Periodicity: Guiding Time Series Forecasting with Textual Cues [9.053923035530152]
This work introduces a novel Text-Guided Time Series Forecasting (TGTSF) task.
By integrating textual cues, such as channel descriptions and dynamic news, TGTSF addresses the critical limitations of traditional methods.
We propose TGForecaster, a robust baseline model that fuses textual cues and time series data using cross-attention mechanisms.
arXiv Detail & Related papers (2024-05-22T10:45:50Z)
- Multi-Sentence Grounding for Long-term Instructional Video [63.27905419718045]
We aim to establish an automatic, scalable pipeline for denoising a large-scale instructional dataset.
We construct a high-quality video-text dataset with multiple descriptive steps supervision, named HowToStep.
arXiv Detail & Related papers (2023-12-21T17:28:09Z)
- Towards Similarity-Aware Time-Series Classification [51.2400839966489]
We study time-series classification (TSC), a fundamental task of time-series data mining.
We propose Similarity-Aware Time-Series Classification (SimTSC), a framework that models similarity information with graph neural networks (GNNs).
arXiv Detail & Related papers (2022-01-05T02:14:57Z)
- Topic-time Heatmaps for Human-in-the-loop Topic Detection and Tracking [3.7057859167913456]
Topic Detection and Tracking (TDT) aims to organize a collection of news media into clusters of stories that pertain to the same real-world event.
To apply TDT models to practical applications such as search engines and discovery tools, human guidance is needed to pin down the scope of an "event" for the corpus of interest.
We generate a visual overview of the entire corpus, allowing the user to select regions of interest from the overview, and then ask a series of questions to affirm (or reject) that the selected documents belong to the same event.
arXiv Detail & Related papers (2021-10-12T19:17:56Z)
- Team RUC_AIM3 Technical Report at Activitynet 2020 Task 2: Exploring Sequential Events Detection for Dense Video Captioning [63.91369308085091]
We propose a novel and simple model for event sequence generation and explore temporal relationships of the event sequence in the video.
The proposed model omits inefficient two-stage proposal generation and directly generates event boundaries conditioned on bi-directional temporal dependency in one pass.
The overall system achieves state-of-the-art performance on the dense-captioning events in video task with 9.894 METEOR score on the challenge testing set.
arXiv Detail & Related papers (2020-06-14T13:21:37Z)