EvEval: A Comprehensive Evaluation of Event Semantics for Large Language
Models
- URL: http://arxiv.org/abs/2305.15268v1
- Date: Wed, 24 May 2023 15:55:40 GMT
- Title: EvEval: A Comprehensive Evaluation of Event Semantics for Large Language
Models
- Authors: Zhengwei Tao, Zhi Jin, Xiaoying Bai, Haiyan Zhao, Yanlin Feng, Jia Li,
Wenpeng Hu
- Abstract summary: Events serve as fundamental units of occurrence within various contexts.
Recent studies have begun leveraging large language models (LLMs) to address event semantic processing.
We propose an overarching framework for event semantic processing, encompassing understanding, reasoning, and prediction.
- Score: 31.704144542866636
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Events serve as fundamental units of occurrence within various contexts. The
processing of event semantics in textual information forms the basis of
numerous natural language processing (NLP) applications. Recent studies have
begun leveraging large language models (LLMs) to address event semantic
processing. However, the extent to which LLMs can effectively tackle these
challenges remains uncertain. Furthermore, the lack of a comprehensive
evaluation framework for event semantic processing poses a significant
challenge in evaluating these capabilities. In this paper, we propose an
overarching framework for event semantic processing, encompassing
understanding, reasoning, and prediction, along with their fine-grained
aspects. To comprehensively evaluate the event semantic processing abilities of
models, we introduce a novel benchmark called EVEVAL. We collect 8 datasets
that cover all aspects of event semantic processing. Extensive experiments are
conducted on EVEVAL, leading to several noteworthy findings based on the
obtained results.
Related papers
- Context is Key: A Benchmark for Forecasting with Essential Textual Information [87.3175915185287]
"Context is Key" (CiK) is a time series forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context.
We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters.
Our experiments highlight the importance of incorporating contextual information, demonstrate surprising performance when using LLM-based forecasting models, and also reveal some of their critical shortcomings.
arXiv Detail & Related papers (2024-10-24T17:56:08Z)
- MAVEN-Fact: A Large-scale Event Factuality Detection Dataset [55.01875707021496]
We introduce MAVEN-Fact, a large-scale and high-quality EFD dataset based on the MAVEN dataset.
MAVEN-Fact includes factuality annotations of 112,276 events, making it the largest EFD dataset.
Experiments demonstrate that MAVEN-Fact is challenging for both conventional fine-tuned models and large language models (LLMs).
arXiv Detail & Related papers (2024-07-22T03:43:46Z)
- Synergetic Event Understanding: A Collaborative Approach to Cross-Document Event Coreference Resolution with Large Language Models [41.524192769406945]
Cross-document event coreference resolution (CDECR) involves clustering event mentions across multiple documents that refer to the same real-world events.
Existing approaches utilize fine-tuning of small language models (SLMs) to address the compatibility among the contexts of event mentions.
We propose a collaborative approach for CDECR, leveraging the capabilities of both a universally capable LLM and a task-specific SLM.
arXiv Detail & Related papers (2024-06-04T09:35:47Z)
- A Large-Scale Evaluation of Speech Foundation Models [110.95827399522204]
We establish the Speech processing Universal PERformance Benchmark (SUPERB) to study the effectiveness of the foundation model paradigm for speech.
We propose a unified multi-tasking framework to address speech processing tasks in SUPERB using a frozen foundation model followed by task-specialized, lightweight prediction heads.
arXiv Detail & Related papers (2024-04-15T00:03:16Z)
- Can Large Language Models Understand Context? [17.196362853457412]
This paper introduces a context understanding benchmark by adapting existing datasets to suit the evaluation of generative models.
Experimental results indicate that pre-trained dense models struggle with understanding more nuanced contextual features when compared to state-of-the-art fine-tuned models.
As LLM compression holds growing significance in both research and real-world applications, we assess the context understanding of quantized models under in-context-learning settings.
arXiv Detail & Related papers (2024-02-01T18:55:29Z)
- TextEE: Benchmark, Reevaluation, Reflections, and Future Challenges in Event Extraction [131.7684896032888]
We present TextEE, a standardized, fair, and reproducible benchmark for event extraction.
TextEE comprises standardized data preprocessing scripts and splits for 16 datasets spanning eight diverse domains.
We evaluate five varied large language models on our TextEE benchmark and demonstrate how they struggle to achieve satisfactory performance.
arXiv Detail & Related papers (2023-11-16T04:43:03Z)
- Semantic Pivoting Model for Effective Event Detection [19.205550116466604]
Event Detection aims to identify and classify mentions of event instances from unstructured articles.
Existing techniques for event detection only use homogeneous one-hot vectors to represent the event type classes, ignoring the fact that the semantic meaning of the types is important to the task.
We propose a Semantic Pivoting Model for Effective Event Detection (SPEED), which explicitly incorporates prior information during training and captures semantically meaningful correlations between input and events.
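The contrast drawn above, between homogeneous one-hot type vectors and label representations that carry the semantic meaning of the type names, can be illustrated with a minimal sketch. The type names and toy vectors below are hypothetical, chosen only for illustration, and are not taken from the SPEED paper.

```python
# Minimal sketch: one-hot event-type labels vs. (toy) semantic
# label vectors derived from the type names. All names and
# vectors here are illustrative assumptions.
import numpy as np

event_types = ["Attack", "Transport", "Elect"]

# Conventional approach: homogeneous one-hot vectors. Every pair
# of types is equally distant, so "Attack" is as unrelated to
# "Transport" as to any other type.
one_hot = np.eye(len(event_types))

# Semantic alternative: represent each type by a vector for its
# name, so semantically related types can end up closer together.
toy_word_vectors = {
    "Attack":    np.array([0.9, 0.1, 0.0]),
    "Transport": np.array([0.1, 0.8, 0.1]),
    "Elect":     np.array([0.0, 0.2, 0.9]),
}
semantic = np.stack([toy_word_vectors[t] for t in event_types])

# One-hot similarity matrix: identity, all off-diagonals are 0.
print(one_hot @ one_hot.T)
# Semantic similarity matrix: off-diagonals vary with meaning.
print(semantic @ semantic.T)
```

The point of the contrast is that a one-hot scheme discards any prior correlation between type labels, whereas name-derived representations let the model exploit it.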
arXiv Detail & Related papers (2022-11-01T19:20:34Z)
- Actuarial Applications of Natural Language Processing Using Transformers: Case Studies for Using Text Features in an Actuarial Context [0.0]
This tutorial demonstrates how to incorporate text data into actuarial classification and regression tasks.
The main focus is on methods employing transformer-based models.
The case studies tackle challenges related to a multi-lingual setting and long input sequences.
arXiv Detail & Related papers (2022-06-04T15:39:30Z)
- Did the Cat Drink the Coffee? Challenging Transformers with Generalized Event Knowledge [59.22170796793179]
Transformer Language Models (TLMs) were tested on a benchmark for the dynamic estimation of thematic fit.
Our results show that TLMs can reach performance comparable to that achieved by SDM.
However, additional analysis consistently suggests that TLMs do not capture important aspects of event knowledge.
arXiv Detail & Related papers (2021-07-22T20:52:26Z)
- Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little [74.49773960145681]
A possible explanation for the impressive performance of masked language model (MLM) pre-training is that such models have learned to represent the syntactic structures prevalent in NLP pipelines.
In this paper, we propose a different explanation: MLMs succeed on downstream tasks almost entirely due to their ability to model higher-order word co-occurrence statistics.
Our results show that purely distributional information largely explains the success of pre-training, and underscore the importance of curating challenging evaluation datasets that require deeper linguistic knowledge.
arXiv Detail & Related papers (2021-04-14T06:30:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.