SAGA: A Participant-specific Examination of Story Alternatives and Goal Applicability for a Deeper Understanding of Complex Events
- URL: http://arxiv.org/abs/2408.05793v1
- Date: Sun, 11 Aug 2024 14:52:40 GMT
- Title: SAGA: A Participant-specific Examination of Story Alternatives and Goal Applicability for a Deeper Understanding of Complex Events
- Authors: Sai Vallurupalli, Katrin Erk, Francis Ferraro
- Abstract summary: We argue that the knowledge needed to interpret goal-driven actions can be elicited through a participant achievement lens.
We analyze a complex event in a narrative according to the intended achievements of the participants.
We show that smaller models fine-tuned on our dataset can achieve performance surpassing larger models.
- Score: 13.894639630989563
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Interpreting and assessing goal-driven actions is vital to understanding and reasoning over complex events. It is important to be able to acquire the knowledge needed for this understanding, though doing so is challenging. We argue that such knowledge can be elicited through a participant achievement lens. We analyze a complex event in a narrative according to the intended achievements of the participants in that narrative, the likely future actions of the participants, and the likelihood of goal success. We collect 6.3K high-quality goal and action annotations reflecting our proposed participant achievement lens, with an average weighted Fleiss-Kappa IAA of 80%. Our collection contains annotated alternate versions of each narrative. These alternate versions vary minimally from the "original" story, but can license drastically different inferences. Our findings suggest that while modern large language models can reflect some of the goal-based knowledge we study, they find it challenging to fully capture the design and intent behind concerted actions, even when the model pretraining included the data from which we extracted the goal knowledge. We show that smaller models fine-tuned on our dataset can achieve performance surpassing larger models.
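The 80% agreement figure above refers to a weighted Fleiss-Kappa averaged over the annotation questions. As a rough, hypothetical illustration of how a plain (unweighted) Fleiss' kappa is computed over categorical annotations (this is not the authors' evaluation code, and the labels are made up):

```python
# Minimal sketch, not the authors' code: unweighted Fleiss' kappa over
# hypothetical annotations, to illustrate the IAA metric the abstract cites.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Hypothetical data: rows are annotated items, columns are annotators,
# values are categorical labels (e.g., 0 = goal fails, 1 = uncertain, 2 = succeeds).
labels = np.array([
    [2, 2, 2],
    [0, 0, 1],
    [1, 1, 1],
    [2, 2, 1],
    [0, 0, 0],
])

# aggregate_raters turns (items x raters) labels into (items x categories) counts;
# fleiss_kappa then measures agreement corrected for chance.
counts, _ = aggregate_raters(labels)
print(f"Fleiss' kappa: {fleiss_kappa(counts):.2f}")
```

A kappa around 0.8 is conventionally read as substantial agreement beyond chance.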
Related papers
- ActionCOMET: A Zero-shot Approach to Learn Image-specific Commonsense Concepts about Actions [66.20773952864802]
We develop a dataset consisting of 8.5k images and 59.3k inferences about actions grounded in those images.
We propose ActionCOMET, a framework to discern knowledge present in language models specific to the provided visual input.
arXiv Detail & Related papers (2024-10-17T15:22:57Z)
- A Comprehensive Review of Few-shot Action Recognition [64.47305887411275]
Few-shot action recognition aims to address the high cost and impracticality of manually labeling complex and variable video data.
It requires accurately classifying human actions in videos using only a few labeled examples per class.
arXiv Detail & Related papers (2024-07-20T03:53:32Z)
- GameEval: Evaluating LLMs on Conversational Games [93.40433639746331]
We propose GameEval, a novel approach to evaluating large language models (LLMs).
GameEval treats LLMs as game players and assigns them distinct roles with specific goals achieved by launching conversations of various forms.
We show that GameEval can effectively differentiate the capabilities of various LLMs, providing a comprehensive assessment of their integrated abilities to solve complex problems.
arXiv Detail & Related papers (2023-08-19T14:33:40Z)
- Discovering Novel Actions from Open World Egocentric Videos with Object-Grounded Visual Commonsense Reasoning [6.253919624802853]
We propose a two-step, neuro-symbolic framework called ALGO to infer activities in egocentric videos with limited supervision.
First, we propose a neuro-symbolic prompting approach that uses object-centric vision-language models as a noisy oracle to ground objects in the video.
Second, driven by prior commonsense knowledge, we discover plausible activities through an energy-based symbolic pattern theory framework.
arXiv Detail & Related papers (2023-05-26T03:21:30Z)
- POQue: Asking Participant-specific Outcome Questions for a Deeper Understanding of Complex Events [26.59626509200256]
We show that crowd workers are able to infer the collective impact of salient events that make up the situation.
By creating a multi-step interface, we collect a high-quality annotated dataset of 8K short newswire narratives and ROCStories.
Our dataset, POQue, enables the exploration and development of models that address multiple aspects of semantic understanding.
arXiv Detail & Related papers (2022-12-05T22:23:27Z)
- H-SAUR: Hypothesize, Simulate, Act, Update, and Repeat for Understanding Object Articulations from Interactions [62.510951695174604]
"Hypothesize, Simulate, Act, Update, and Repeat" (H-SAUR) is a probabilistic generative framework that generates hypotheses about how objects articulate given input observations.
We show that the proposed model significantly outperforms the current state-of-the-art articulated object manipulation framework.
We further improve the test-time efficiency of H-SAUR by integrating a learned prior from learning-based vision models.
arXiv Detail & Related papers (2022-10-22T18:39:33Z)
- Are All Steps Equally Important? Benchmarking Essentiality Detection of Events [92.92425231146433]
This paper examines the extent to which current models comprehend the essentiality of step events in relation to a goal event.
We contribute a high-quality corpus of (goal, step) pairs gathered from the community guideline website WikiHow.
The high inter-annotator agreement demonstrates that humans possess a consistent understanding of event essentiality.
arXiv Detail & Related papers (2022-10-08T18:00:22Z)
- WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models [91.92346150646007]
In this work, we introduce WinoGAViL: an online game to collect vision-and-language associations.
We use the game to collect 3.5K instances, finding that they are intuitive for humans but challenging for state-of-the-art AI models.
Our analysis as well as the feedback we collect from players indicate that the collected associations require diverse reasoning skills.
arXiv Detail & Related papers (2022-07-25T23:57:44Z)
- It's the Same Old Story! Enriching Event-Centric Knowledge Graphs by Narrative Aspects [0.3655021726150368]
We introduce a novel and lightweight structure for event-centric knowledge graphs, which for the first time allows for queries incorporating viewpoint-dependent and narrative aspects.
Our experiments prove the effective incorporation of subjective attributions for event participants and show the benefits of specifically tailored indexes for narrative query processing.
arXiv Detail & Related papers (2022-05-08T14:00:41Z)
- Visual Goal-Step Inference using wikiHow [29.901908251322684]
Inferring the sub-sequence of steps of a goal can help artificial intelligence systems reason about human activities.
We propose the Visual Goal-Step Inference (VGSI) task, where a model is given a textual goal and must choose a plausible step towards that goal from among four candidate images (a toy scoring sketch follows at the end of this list).
We show that the knowledge learned from our data can effectively transfer to other datasets like HowTo100M, increasing the multiple-choice accuracy by 15% to 20%.
arXiv Detail & Related papers (2021-04-12T22:20:09Z)
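As a rough sketch of the VGSI-style multiple-choice setup described above: an off-the-shelf CLIP model is used here as a stand-in scorer (this is not the paper's actual approach, and the goal text and image file names are made up):

```python
# Hypothetical sketch of a VGSI-style multiple-choice evaluation: score four
# candidate step images against a textual goal and pick the best match.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

goal = "make a paper airplane"                           # textual goal (made up)
candidate_paths = ["a.jpg", "b.jpg", "c.jpg", "d.jpg"]   # four candidate step images (made up)
images = [Image.open(p) for p in candidate_paths]

# Score the single goal text against all four images; logits_per_text has
# shape (1 text, 4 images), so the argmax is the model's chosen step.
inputs = processor(text=[goal], images=images, return_tensors="pt", padding=True)
logits = model(**inputs).logits_per_text
best = logits.argmax(dim=-1).item()
print(f"Predicted step image: {candidate_paths[best]}")
```

Under this setup, multiple-choice accuracy is simply the fraction of goals for which the correct step image is ranked first.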