Topic Detection and Tracking with Time-Aware Document Embeddings
- URL: http://arxiv.org/abs/2112.06166v2
- Date: Tue, 26 Mar 2024 04:26:18 GMT
- Title: Topic Detection and Tracking with Time-Aware Document Embeddings
- Authors: Hang Jiang, Doug Beeferman, Weiquan Mao, Deb Roy
- Abstract summary: We design a neural method that fuses temporal and textual information into a single representation of news documents for event detection.
In the retrospective setting, we apply clustering algorithms to the time-aware embeddings and show substantial improvements over baselines on the News2013 data set.
In the online streaming setting, we add our document encoder to an existing state-of-the-art TDT pipeline and demonstrate that it can benefit the overall performance.
- Score: 23.348627263872842
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The time at which a message is communicated is a vital piece of metadata in many real-world natural language processing tasks such as Topic Detection and Tracking (TDT). TDT systems aim to cluster a corpus of news articles by event, and in that context, stories that describe the same event are likely to have been written at around the same time. Prior work on time modeling for TDT takes this into account, but does not well capture how time interacts with the semantic nature of the event. For example, stories about a tropical storm are likely to be written within a short time interval, while stories about a movie release may appear over weeks or months. In our work, we design a neural method that fuses temporal and textual information into a single representation of news documents for event detection. We fine-tune these time-aware document embeddings with a triplet loss architecture, integrate the model into downstream TDT systems, and evaluate the systems on two benchmark TDT data sets in English. In the retrospective setting, we apply clustering algorithms to the time-aware embeddings and show substantial improvements over baselines on the News2013 data set. In the online streaming setting, we add our document encoder to an existing state-of-the-art TDT pipeline and demonstrate that it can benefit the overall performance. We conduct ablation studies on the time representation and fusion algorithm strategies, showing that our proposed model outperforms alternative strategies. Finally, we probe the model to examine how it handles recurring events more effectively than previous TDT systems.
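The abstract does not spell out the fusion architecture, so the following is only a minimal sketch of the general idea: a sinusoidal encoding of the publication time is concatenated with a pooled text embedding, projected into a single document vector, and trained with a standard triplet margin loss so that stories from the same event end up close together. The layer sizes, time encoding, and fusion layers here are illustrative assumptions, not the authors' released model.

```python
# Minimal sketch of a time-aware document embedding trained with a triplet loss.
# Illustrative only: the fusion layer, dimensions, and time encoding are assumptions.
import math
import torch
import torch.nn as nn


class TimeAwareDocEncoder(nn.Module):
    """Fuses a pooled text embedding with a sinusoidal time encoding."""

    def __init__(self, text_dim=768, time_dim=64, out_dim=256):
        super().__init__()
        self.time_dim = time_dim
        self.fuse = nn.Sequential(
            nn.Linear(text_dim + time_dim, out_dim),
            nn.ReLU(),
            nn.Linear(out_dim, out_dim),
        )

    def encode_time(self, timestamps):
        # Sinusoidal encoding of publication time (here: toy "days since epoch"
        # values supplied by the caller); purely an illustrative choice.
        freqs = torch.exp(
            torch.arange(0, self.time_dim, 2, dtype=torch.float, device=timestamps.device)
            * (-math.log(10000.0) / self.time_dim)
        )
        args = timestamps.float().unsqueeze(-1) * freqs
        return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)

    def forward(self, text_emb, timestamps):
        # text_emb: (batch, text_dim) pooled output of any sentence encoder.
        fused = torch.cat([text_emb, self.encode_time(timestamps)], dim=-1)
        return nn.functional.normalize(self.fuse(fused), dim=-1)


# Triplet loss: anchor and positive are stories from the same event,
# the negative comes from a different event.
encoder = TimeAwareDocEncoder()
triplet = nn.TripletMarginLoss(margin=0.5)

anchor_txt, pos_txt, neg_txt = (torch.randn(8, 768) for _ in range(3))
anchor_t, pos_t, neg_t = (torch.rand(8) * 365 for _ in range(3))  # toy "days" values

loss = triplet(
    encoder(anchor_txt, anchor_t),
    encoder(pos_txt, pos_t),
    encoder(neg_txt, neg_t),
)
loss.backward()
```

In the retrospective setting described above, the resulting normalized vectors could then be handed to any standard clustering algorithm (e.g. agglomerative clustering) to group stories by event.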
Related papers
- Efficient Retrieval of Temporal Event Sequences from Textual Descriptions [0.0]
TPP-LLM-Embedding is a unified model for embedding and retrieving event sequences based on natural language descriptions.
Our model encodes both event types and times, generating a sequence-level representation through pooling.
TPP-LLM-Embedding enables efficient retrieval and demonstrates superior performance compared to baseline models across diverse datasets.
arXiv Detail & Related papers (2024-10-17T21:35:55Z) - Analyzing Temporal Complex Events with Large Language Models? A Benchmark towards Temporal, Long Context Understanding [57.62275091656578]
We refer to a complex event composed of many news articles over an extended period as a Temporal Complex Event (TCE).
This paper proposes a novel approach using Large Language Models (LLMs) to systematically extract and analyze the event chain within TCE.
arXiv Detail & Related papers (2024-06-04T16:42:17Z) - Beyond Trend and Periodicity: Guiding Time Series Forecasting with Textual Cues [9.053923035530152]
This work introduces a novel Text-Guided Time Series Forecasting (TGTSF) task.
By integrating textual cues, such as channel descriptions and dynamic news, TGTSF addresses the critical limitations of traditional methods.
We propose TGForecaster, a robust baseline model that fuses textual cues and time series data using cross-attention mechanisms.
arXiv Detail & Related papers (2024-05-22T10:45:50Z) - Multi-Sentence Grounding for Long-term Instructional Video [63.27905419718045]
We aim to establish an automatic, scalable pipeline for denoising a large-scale instructional dataset.
We construct a high-quality video-text dataset with multiple descriptive steps supervision, named HowToStep.
arXiv Detail & Related papers (2023-12-21T17:28:09Z) - Tracking Objects and Activities with Attention for Temporal Sentence Grounding [51.416914256782505]
Temporal sentence grounding (TSG) aims to localize the temporal segment that is semantically aligned with a natural language query in an untrimmed video.
We propose a novel Temporal Sentence Tracking Network (TSTNet), which contains (A) a Cross-modal Targets Generator to generate multi-modal targets and a search space, and (B) a Temporal Sentence Tracker to track the multi-modal targets' behavior and predict the query-related segment.
arXiv Detail & Related papers (2023-02-21T16:42:52Z) - HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training [49.52679453475878]
We propose a Temporal-Aware video-language pre-training framework, HiTeA, for modeling cross-modal alignment between moments and texts.
We achieve state-of-the-art results on 15 well-established video-language understanding and generation tasks.
arXiv Detail & Related papers (2022-12-30T04:27:01Z) - Towards Similarity-Aware Time-Series Classification [51.2400839966489]
We study time-series classification (TSC), a fundamental task of time-series data mining.
We propose Similarity-Aware Time-Series Classification (SimTSC), a framework that models similarity information with graph neural networks (GNNs).
arXiv Detail & Related papers (2022-01-05T02:14:57Z) - Topic-time Heatmaps for Human-in-the-loop Topic Detection and Tracking [3.7057859167913456]
Topic Detection and Tracking (TDT) aims to organize a collection of news media into clusters of stories that pertain to the same real-world event.
To apply TDT models to practical applications such as search engines and discovery tools, human guidance is needed to pin down the scope of an "event" for the corpus of interest.
We generate a visual overview of the entire corpus, allowing the user to select regions of interest from the overview, and then ask a series of questions to affirm (or reject) that the selected documents belong to the same event.
arXiv Detail & Related papers (2021-10-12T19:17:56Z) - Time-Series Representation Learning via Temporal and Contextual Contrasting [14.688033556422337]
We propose an unsupervised Time-Series representation learning framework via Temporal and Contextual Contrasting (TS-TCC).
First, the raw time-series data are transformed into two different yet correlated views using weak and strong augmentations (see the illustrative sketch after this list).
Second, we propose a novel temporal contrasting module to learn robust temporal representations by designing a tough cross-view prediction task.
Third, to further learn discriminative representations, we propose a contextual contrasting module built upon the contexts from the temporal contrasting module.
arXiv Detail & Related papers (2021-06-26T23:56:31Z) - Team RUC_AIM3 Technical Report at Activitynet 2020 Task 2: Exploring Sequential Events Detection for Dense Video Captioning [63.91369308085091]
We propose a novel and simple model for event sequence generation and explore temporal relationships of the event sequence in the video.
The proposed model omits inefficient two-stage proposal generation and directly generates event boundaries conditioned on bi-directional temporal dependency in one pass.
The overall system achieves state-of-the-art performance on the dense-captioning events in video task with 9.894 METEOR score on the challenge testing set.
arXiv Detail & Related papers (2020-06-14T13:21:37Z)
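As a brief illustration of the TS-TCC entry above, the idea of building two correlated views with weak and strong augmentations can be sketched as follows; the specific transforms (jitter, scaling, segment shuffling) and their parameters are assumptions for illustration, not TS-TCC's published implementation.

```python
# Illustrative weak/strong time-series augmentations in the spirit of TS-TCC;
# the exact transforms and parameters are assumptions.
import torch


def weak_augment(x, jitter_std=0.05, scale_range=(0.9, 1.1)):
    # Weak view: small additive noise plus random per-channel scaling.
    scale = torch.empty(x.size(0), 1, x.size(2)).uniform_(*scale_range)
    return x * scale + torch.randn_like(x) * jitter_std


def strong_augment(x, n_segments=5, jitter_std=0.1):
    # Strong view: shuffle contiguous time segments, then add noise.
    out = torch.empty_like(x)
    for i in range(x.size(0)):
        segments = torch.tensor_split(x[i], n_segments, dim=0)
        order = torch.randperm(len(segments)).tolist()
        out[i] = torch.cat([segments[j] for j in order], dim=0)
    return out + torch.randn_like(out) * jitter_std


# Two correlated views of the same batch, e.g. as input to a contrastive objective.
series = torch.randn(8, 128, 3)  # (batch, time steps, channels)
view_weak, view_strong = weak_augment(series), strong_augment(series)
```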
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.