Massively Multi-Lingual Event Understanding: Extraction, Visualization, and Search
- URL: http://arxiv.org/abs/2305.10561v1
- Date: Wed, 17 May 2023 20:41:51 GMT
- Title: Massively Multi-Lingual Event Understanding: Extraction, Visualization, and Search
- Authors: Chris Jenkins, Shantanu Agarwal, Joel Barry, Steven Fincke, Elizabeth Boschee
- Abstract summary: ISI-Clear is a state-of-the-art, cross-lingual, zero-shot event extraction system.
It makes global events available on demand, processing user-supplied text in 100 languages.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we present ISI-Clear, a state-of-the-art, cross-lingual,
zero-shot event extraction system and accompanying user interface for event
visualization and search. Using only English training data, ISI-Clear makes
global events available on demand, processing user-supplied text in 100
languages ranging from Afrikaans to Yiddish. We provide multiple event-centric
views of extracted events, including both a graphical representation and a
document-level summary. We also integrate existing cross-lingual search
algorithms with event extraction capabilities to provide cross-lingual
event-centric search, allowing English-speaking users to search over events
automatically extracted from a corpus of non-English documents, using either
English natural-language queries (e.g., "cholera outbreaks in Iran") or structured
queries (e.g., find all events of type Disease-Outbreak with agent cholera and
location Iran).
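The structured-query example in the abstract (event type plus argument constraints) can be illustrated with a small filter over extracted events. This is a minimal sketch under stated assumptions, not ISI-Clear's implementation: the `Event` record, its field names, the document IDs, and the `structured_query` helper are all hypothetical, and the sketch assumes argument strings have already been normalized to English by the extraction step.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical extracted-event record (not ISI-Clear's actual schema)."""
    event_type: str
    agent: Optional[str]
    location: Optional[str]
    doc_id: str


def structured_query(events: List[Event], event_type: str,
                     agent: Optional[str] = None,
                     location: Optional[str] = None) -> List[Event]:
    """Return events of `event_type` whose agent/location match when given.

    Matching is exact and case-insensitive; a real cross-lingual system would
    likely need entity linking or normalization instead (assumption).
    """
    def same(value: Optional[str], wanted: Optional[str]) -> bool:
        # An omitted constraint (None) matches anything.
        return wanted is None or (value is not None and value.casefold() == wanted.casefold())

    return [
        e for e in events
        if e.event_type == event_type and same(e.agent, agent) and same(e.location, location)
    ]


if __name__ == "__main__":
    # Hypothetical events, as if extracted from non-English documents.
    corpus = [
        Event("Disease-Outbreak", "cholera", "Iran", "fa-001"),
        Event("Disease-Outbreak", "measles", "Iran", "fa-002"),
        Event("Disease-Outbreak", "Cholera", "Yemen", "ar-003"),
        Event("Protest", None, "Iran", "fa-004"),
    ]
    hits = structured_query(corpus, "Disease-Outbreak", agent="cholera", location="Iran")
    print([e.doc_id for e in hits])  # ['fa-001']
```

Leaving `agent` or `location` unset widens the query, e.g. all Disease-Outbreak events located in Iran regardless of agent.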
Related papers
- Grounding Partially-Defined Events in Multimodal Data
We introduce a multimodal formulation for partially-defined events and cast the extraction of these events as a three-stage span retrieval task.
We propose a benchmark for this task, MultiVENT-G, that consists of 14.5 hours of densely annotated current event videos and 1,168 text documents, containing 22.8K labeled event-centric entities.
Results illustrate the challenges that abstract event understanding poses and demonstrate promise in event-centric video-language systems.
(arXiv, 2024-10-07T17:59:48Z)
- A diverse Multilingual News Headlines Dataset from around the World
Babel Briefings is a novel dataset featuring 4.7 million news headlines from August 2020 to November 2021, across 30 languages and 54 locations worldwide.
It can serve as a high-quality dataset for training or evaluating language models, and it also offers a simple, accessible collection of articles.
(arXiv, 2024-03-28T12:08:39Z)
- MUSIED: A Benchmark for Event Detection from Multi-Source Heterogeneous Informal Texts
Event detection (ED) identifies and classifies event triggers from unstructured texts.
We propose a new large-scale Chinese event detection dataset based on user reviews, text conversations, and phone conversations.
(arXiv, 2022-11-25T05:05:29Z)
- MEE: A Novel Multilingual Event Extraction Dataset
Event extraction (EE) aims to recognize event mentions and their arguments in text.
The lack of high-quality multilingual EE datasets for model training and evaluation has been the main hindrance to progress in multilingual EE.
We propose a novel Multilingual Event Extraction dataset (MEE) that provides annotations for more than 50K event mentions in 8 typologically different languages.
(arXiv, 2022-11-11T02:01:41Z)
- Zero-Shot On-the-Fly Event Schema Induction
We present a new approach in which large language models are used to generate source documents that make it possible to predict, given a high-level event definition, the specific events, arguments, and relations between them.
Using our model, complete schemas on any topic can be generated on-the-fly without any manual data collection, i.e., in a zero-shot manner.
(arXiv, 2022-10-12T14:37:00Z)
- Event Extraction: A Survey
Extracting the reported events from text is one of the key research themes in natural language processing.
The applications of event extraction span a wide range of domains, such as newswire, the biomedical domain, history and the humanities, and cyber security.
This report presents a comprehensive survey of event detection from textual documents.
(arXiv, 2022-10-07T09:36:44Z)
- PILED: An Identify-and-Localize Framework for Few-Shot Event Detection
In our study, we employ cloze prompts to elicit event-related knowledge from pretrained language models.
We minimize the number of type-specific parameters, enabling our model to quickly adapt to event detection tasks for new types.
(arXiv, 2022-02-15T18:01:39Z)
- Topic-time Heatmaps for Human-in-the-loop Topic Detection and Tracking
Topic Detection and Tracking (TDT) aims to organize a collection of news media into clusters of stories that pertain to the same real-world event.
To apply TDT models to practical applications such as search engines and discovery tools, human guidance is needed to pin down the scope of an "event" for the corpus of interest.
We generate a visual overview of the entire corpus, allowing the user to select regions of interest from the overview, and then ask a series of questions to affirm (or reject) that the selected documents belong to the same event.
(arXiv, 2021-10-12T19:17:56Z)
- Integrating Deep Event-Level and Script-Level Information for Script Event Prediction
We propose a Transformer-based model, called MCPredictor, which integrates deep event-level and script-level information for script event prediction.
The experimental results on the widely-used New York Times corpus demonstrate the effectiveness and superiority of the proposed model.
(arXiv, 2021-09-24T07:37:32Z)