MUSIED: A Benchmark for Event Detection from Multi-Source Heterogeneous
Informal Texts
- URL: http://arxiv.org/abs/2211.13896v1
- Date: Fri, 25 Nov 2022 05:05:29 GMT
- Title: MUSIED: A Benchmark for Event Detection from Multi-Source Heterogeneous
Informal Texts
- Authors: Xiangyu Xi, Jianwei Lv, Shuaipeng Liu, Wei Ye, Fan Yang and Guanglu
Wan
- Abstract summary: Event detection (ED) identifies and classifies event triggers from unstructured texts.
We propose a new large-scale Chinese event detection dataset based on user reviews, text conversations, and phone conversations.
- Score: 7.43647091073357
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Event detection (ED) identifies and classifies event triggers from
unstructured texts, serving as a fundamental task for information extraction.
Despite the remarkable progress achieved in the past several years, most
research efforts focus on detecting events from formal texts (e.g., news
articles, Wikipedia documents, financial announcements). Moreover, the texts in
each dataset are either from a single source or multiple yet relatively
homogeneous sources. With massive amounts of user-generated text accumulating
on the Web and inside enterprises, identifying meaningful events in these
informal texts, usually from multiple heterogeneous sources, has become a
problem of significant practical value. As a pioneering exploration that
expands event detection to the scenarios involving informal and heterogeneous
texts, we propose a new large-scale Chinese event detection dataset based on
user reviews, text conversations, and phone conversations in a leading
e-commerce platform for food service. We carefully investigate the proposed
dataset's textual informality and multi-source heterogeneity characteristics by
inspecting data samples quantitatively and qualitatively. Extensive experiments
with state-of-the-art event detection methods verify the unique challenges
posed by these characteristics, indicating that multi-source informal event
detection remains an open problem and requires further efforts. Our benchmark
and code are released at https://github.com/myeclipse/MUSIED.
Related papers
- AMPLE: Emotion-Aware Multimodal Fusion Prompt Learning for Fake News Detection [0.1499944454332829]
This paper introduces the Emotion-Aware Multimodal Fusion Prompt Learning (AMPLE) framework to address the above issue.
This framework extracts emotional elements from texts by leveraging sentiment analysis tools.
It then employs Multi-Head Cross-Attention (MCA) mechanisms and similarity-aware fusion methods to integrate multimodal data.
arXiv Detail & Related papers (2024-10-21T02:19:24Z)
- Detecting Machine-Generated Long-Form Content with Latent-Space Variables [54.07946647012579]
Existing zero-shot detectors primarily focus on token-level distributions, which are vulnerable to real-world domain shifts.
We propose a more robust method that incorporates abstract elements, such as event transitions, as key deciding factors to detect machine versus human texts.
arXiv Detail & Related papers (2024-10-04T18:42:09Z)
- Towards Unified Multi-granularity Text Detection with Interactive Attention [56.79437272168507]
"Detect Any Text" is an advanced paradigm that unifies scene text detection, layout analysis, and document page detection into a cohesive, end-to-end model.
A pivotal innovation in DAT is the across-granularity interactive attention module, which significantly enhances the representation learning of text instances.
Tests demonstrate that DAT achieves state-of-the-art performance across a variety of text-related benchmarks.
arXiv Detail & Related papers (2024-05-30T07:25:23Z)
- MAGE: Machine-generated Text Detection in the Wild [82.70561073277801]
Large language models (LLMs) have achieved human-level text generation, emphasizing the need for effective AI-generated text detection.
We build a comprehensive testbed by gathering texts from diverse human writings and texts generated by different LLMs.
Despite challenges, the top-performing detector can identify 86.54% of out-of-domain texts generated by a new LLM, indicating the feasibility for application scenarios.
arXiv Detail & Related papers (2023-05-22T17:13:29Z)
- On the Possibilities of AI-Generated Text Detection [76.55825911221434]
We argue that as machine-generated text approximates human-like quality, the sample size needed for detection bounds increases.
We test various state-of-the-art text generators, including GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, and Llama-2-70B-Chat-HF, against detectors including RoBERTa-Large/Base-Detector and GPTZero.
arXiv Detail & Related papers (2023-04-10T17:47:39Z)
- Reliable Shot Identification for Complex Event Detection via Visual-Semantic Embedding [72.9370352430965]
We propose a visual-semantic guided loss method for event detection in videos.
Motivated by curriculum learning, we introduce a negative elastic regularization term to start training the classifier with instances of high reliability.
An alternative optimization algorithm is developed to solve the proposed challenging non-net regularization problem.
arXiv Detail & Related papers (2021-10-12T11:46:56Z)
- On Exploring and Improving Robustness of Scene Text Detection Models [20.15225372544634]
We evaluate scene text detection models on ICDAR2015-C (IC15-C) and CTW1500-C (CTW-C).
We perform a robustness analysis of six key components: pre-training data, backbone, feature fusion module, multi-scale predictions, representation of text instances and loss function.
We present a simple yet effective data-based method to destroy the smoothness of text regions by merging background and foreground.
arXiv Detail & Related papers (2021-10-12T02:36:48Z)
- COfEE: A Comprehensive Ontology for Event Extraction from text, with an online annotation tool [3.8995911009078816]
Event Extraction (EE) seeks to derive information about specific incidents and their actors from the text.
EE is useful in many domains such as building a knowledge base, information retrieval, summarization and online monitoring systems.
COfEE consists of two hierarchy levels (event types and event sub-types) that include new categories relating to environmental issues, cyberspace, criminal activity and natural disasters.
arXiv Detail & Related papers (2021-07-21T19:43:22Z)
- Detecting Ongoing Events Using Contextual Word and Sentence Embeddings [110.83289076967895]
This paper introduces the Ongoing Event Detection (OED) task.
The goal is to detect ongoing event mentions only, as opposed to historical, future, hypothetical, or other forms of events that are neither fresh nor current.
Any application that needs to extract structured information about ongoing events from unstructured texts can take advantage of an OED system.
arXiv Detail & Related papers (2020-07-02T20:44:05Z)
- Complex networks for event detection in heterogeneous high volume news streams [0.0]
The volume and rate of online news increase the need for automated event detection methods that can operate in real time.
We develop a network-based approach that makes the working assumption that important news events always involve named entities that are linked in news articles.
arXiv Detail & Related papers (2020-05-28T02:45:43Z)