Unsupervised Key Event Detection from Massive Text Corpora
- URL: http://arxiv.org/abs/2206.04153v1
- Date: Wed, 8 Jun 2022 20:31:02 GMT
- Title: Unsupervised Key Event Detection from Massive Text Corpora
- Authors: Yunyi Zhang, Fang Guo, Jiaming Shen, Jiawei Han
- Abstract summary: We propose a new task, key event detection at the intermediate level, aiming to detect key events from a news corpus.
This task can bridge event understanding and structuring and is inherently challenging because of the thematic and temporal closeness of key events.
We develop an unsupervised key event detection framework, EvMine, that extracts temporally frequent peak phrases using a novel ttf-itf score.
- Score: 42.31889135421941
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automated event detection from news corpora is a crucial task towards mining
fast-evolving structured knowledge. As real-world events have different
granularities, from the top-level themes to key events and then to event
mentions corresponding to concrete actions, there are generally two lines of
research: (1) theme detection identifies from a news corpus major themes (e.g.,
"2019 Hong Kong Protests" vs. "2020 U.S. Presidential Election") that have very
distinct semantics; and (2) action extraction extracts from one document
mention-level actions (e.g., "the police hit the left arm of the protester")
that are too fine-grained for comprehending the event. In this paper, we
propose a new task, key event detection at the intermediate level, aiming to
detect from a news corpus key events (e.g., "HK Airport Protest on Aug.
12-14"), each happening at a particular time/location and focusing on the same
topic. This task can bridge event understanding and structuring and is
inherently challenging because of the thematic and temporal closeness of key
events and the scarcity of labeled data due to the fast-evolving nature of news
articles. To address these challenges, we develop an unsupervised key event
detection framework, EvMine, that (1) extracts temporally frequent peak phrases
using a novel ttf-itf score, (2) merges peak phrases into event-indicative
feature sets by detecting communities from our designed peak phrase graph that
captures document co-occurrences, semantic similarities, and temporal closeness
signals, and (3) iteratively retrieves documents related to each key event by
training a classifier with automatically generated pseudo labels from the
event-indicative feature sets and refining the detected key events using the
retrieved documents. Extensive experiments and case studies show EvMine
outperforms all the baseline methods and its ablations on two real-world news
corpora.
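The abstract names the ttf-itf score but does not define it. As a rough, hypothetical illustration of steps (1) and (2), the sketch below assumes a TF-IDF analogue in which publication days play the role of documents: ttf(p, d) counts the articles on day d that mention phrase p, and itf(p) = log(N / n_p), where N is the number of days and n_p the number of days on which p appears. The function names (`ttf_itf_scores`, `peak_phrases`, `merge_by_cooccurrence`), the toy corpus, and the connected-components merge are all assumptions rather than EvMine's actual definitions. EvMine's peak phrase graph also uses semantic-similarity and temporal-closeness signals and detects communities, whereas this sketch only joins phrases that co-occur in some article.

```python
import math
from collections import Counter, defaultdict

def ttf_itf_scores(docs):
    """Score (phrase, day) pairs with an ASSUMED TF-IDF-style ttf-itf.

    docs: list of (day, phrases) pairs, one per news article.
    Assumption (not EvMine's published formula):
      ttf(p, d) = number of articles on day d that mention phrase p
      itf(p)    = log(N / n_p), N = number of days, n_p = days on which p appears
    """
    per_day = defaultdict(Counter)  # day -> phrase -> article count
    for day, phrases in docs:
        per_day[day].update(set(phrases))  # count a phrase once per article
    num_days = len(per_day)
    days_with = Counter()  # phrase -> number of days it appears on
    for counts in per_day.values():
        days_with.update(counts.keys())
    return {
        (phrase, day): ttf * math.log(num_days / days_with[phrase])
        for day, counts in per_day.items()
        for phrase, ttf in counts.items()
    }

def peak_phrases(docs, top_k=3):
    """Return the top_k (phrase, day) pairs that have a positive score."""
    ranked = sorted(ttf_itf_scores(docs).items(), key=lambda kv: (-kv[1], kv[0]))
    return [pair for pair, score in ranked[:top_k] if score > 0]

def merge_by_cooccurrence(docs, peaks):
    """Group peak phrases that co-occur in at least one article.

    Simplified stand-in for community detection: connected components
    (via union-find) of a graph whose only edges are document co-occurrences.
    """
    phrases = sorted({phrase for phrase, _ in peaks})
    parent = {p: p for p in phrases}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for _, doc_phrases in docs:
        present = [p for p in phrases if p in doc_phrases]
        for other in present[1:]:
            parent[find(other)] = find(present[0])  # union with first phrase
    groups = defaultdict(set)
    for p in phrases:
        groups[find(p)].add(p)
    return sorted(groups.values(), key=sorted)

# Toy corpus (invented for illustration), loosely echoing the abstract's example.
docs = [
    ("08-12", ["airport", "protest"]),
    ("08-12", ["airport", "protest", "flights cancelled"]),
    ("08-12", ["airport", "protest"]),
    ("08-13", ["airport", "protest"]),
    ("08-18", ["victoria park", "rally", "protest"]),
    ("08-18", ["victoria park", "protest"]),
]
peaks = peak_phrases(docs, top_k=3)
print(peaks)  # → [('victoria park', '08-18'), ('airport', '08-12'), ('flights cancelled', '08-12')]
print(merge_by_cooccurrence(docs, peaks))
```

Because "protest" appears on every day, its itf is log(3/3) = 0 and it never becomes a peak phrase, matching the intuition that a phrase spread evenly over the whole theme does not single out one key event. The co-occurrence merge then places "airport" and "flights cancelled" in one feature set and "victoria park" in another. Step (3), training a classifier on pseudo labels derived from such feature sets, is not sketched here.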
Related papers
- Double Mixture: Towards Continual Event Detection from Speech [60.33088725100812]
Speech event detection is crucial for multimedia retrieval, involving the tagging of both semantic and acoustic events.
This paper tackles two primary challenges in speech event detection: the continual integration of new events without forgetting previous ones, and the disentanglement of semantic from acoustic events.
We propose a novel method, 'Double Mixture,' which merges speech expertise with robust memory mechanisms to enhance adaptability and prevent forgetting.
Date: 2024-04-20T06:32:00Z
- Towards Event Extraction from Speech with Contextual Clues [61.164413398231254]
We introduce the Speech Event Extraction (SpeechEE) task and construct three synthetic training sets and one human-spoken test set.
Compared to event extraction from text, SpeechEE poses greater challenges mainly due to complex speech signals that are continuous and have no word boundaries.
Our method brings significant improvements on all datasets, achieving a maximum F1 gain of 10.7%.
Date: 2024-01-27T11:07:19Z
- Unifying Event Detection and Captioning as Sequence Generation via Pre-Training [53.613265415703815]
We propose a unified pre-training and fine-tuning framework to enhance the inter-task association between event detection and captioning.
Our model outperforms the state-of-the-art methods, and can be further boosted when pre-trained on extra large-scale video-text data.
Date: 2022-07-18T14:18:13Z
- PILED: An Identify-and-Localize Framework for Few-Shot Event Detection [79.66042333016478]
In our study, we employ cloze prompts to elicit event-related knowledge from pretrained language models.
We minimize the number of type-specific parameters, enabling our model to quickly adapt to event detection tasks for new types.
Date: 2022-02-15T18:01:39Z
- Learning Constraints and Descriptive Segmentation for Subevent Detection [74.48201657623218]
We propose an approach to learning and enforcing constraints that capture dependencies between subevent detection and EventSeg prediction.
We adopt Rectifier Networks for constraint learning and then convert the learned constraints to a regularization term in the loss function of the neural model.
Date: 2021-09-13T20:50:37Z
- Embed2Detect: Temporally Clustered Embedded Words for Event Detection in Social Media [1.7205106391379026]
The adoption of word embeddings gives Embed2Detect the capability to incorporate powerful semantic features into event detection.
The obtained results show that Embed2Detect is capable of effective and efficient event detection.
Date: 2020-06-10T15:52:52Z
- Complex networks for event detection in heterogeneous high volume news streams [0.0]
The volume and rate of online news increase the need for automated event detection methods that can operate in real time.
We develop a network-based approach that makes the working assumption that important news events always involve named entities that are linked in news articles.
Date: 2020-05-28T02:45:43Z
- Seeing the Forest and the Trees: Detection and Cross-Document Coreference Resolution of Militarized Interstate Disputes [3.8073142980733]
I provide a data set for evaluating methods to identify certain political events in text and to link related texts to one another based on shared events.
The data set, Headlines of War, is built on the Militarized Interstate Disputes data set and offers headlines classified by dispute status and headline pairs labeled with coreference indicators.
I introduce a model capable of accomplishing both tasks. The multi-task convolutional neural network is shown to be capable of recognizing events and event coreferences given the headlines' texts and publication dates.
Date: 2020-05-06T17:20:14Z