Few-Shot Character Understanding in Movies as an Assessment to
Meta-Learning of Theory-of-Mind
- URL: http://arxiv.org/abs/2211.04684v2
- Date: Fri, 2 Feb 2024 22:45:05 GMT
- Title: Few-Shot Character Understanding in Movies as an Assessment to
Meta-Learning of Theory-of-Mind
- Authors: Mo Yu, Qiujing Wang, Shunchi Zhang, Yisi Sang, Kangsheng Pu, Zekai
Wei, Han Wang, Liyan Xu, Jing Li, Yue Yu, Jie Zhou
- Abstract summary: Humans can quickly understand new fictional characters with a few observations, mainly by drawing analogies to fictional and real people they already know.
This reflects the few-shot and meta-learning essence of humans' inference of characters' mental states, i.e., theory-of-mind (ToM).
We fill this gap with a novel NLP dataset, ToM-in-AMC, the first assessment of machines' meta-learning of ToM in a realistic narrative understanding scenario.
- Score: 47.13015852330866
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When reading a story, humans can quickly understand new fictional characters
with a few observations, mainly by drawing analogies to fictional and real
people they already know. This reflects the few-shot and meta-learning essence
of humans' inference of characters' mental states, i.e., theory-of-mind (ToM),
which is largely ignored in existing research. We fill this gap with a novel
NLP dataset, ToM-in-AMC, the first assessment of machines' meta-learning of ToM
in a realistic narrative understanding scenario. Our dataset consists of ~1,000
parsed movie scripts, each corresponding to a few-shot character understanding
task that requires models to mimic humans' ability to quickly digest characters
from a few starting scenes of a new movie.
We propose a novel ToM prompting approach designed to explicitly assess the
influence of multiple ToM dimensions. It surpasses existing baseline models,
underscoring the significance of modeling multiple ToM dimensions for our task.
Our extensive human study verifies that humans are capable of solving our
problem by inferring characters' mental states based on their previously seen
movies. In comparison, our systems based on either state-of-the-art large
language models (GPT-4) or meta-learning algorithms lag more than 20% behind,
highlighting a notable limitation in existing approaches' ToM capabilities.
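To make the task format above concrete, the following is a minimal sketch of how a few-shot character-guessing query over parsed scenes might be posed to a language model. It is an illustrative reconstruction under stated assumptions, not the authors' released pipeline: the prompt wording, the particular ToM dimensions listed, and the `query_llm` helper are hypothetical.

```python
# Minimal sketch of a few-shot character-guessing query (illustrative only,
# not the ToM-in-AMC release). Assumptions: scenes arrive as plain text with
# character names visible in the support scenes and masked as [CHAR] in the
# query scene; `query_llm` is a hypothetical callable wrapping the model
# under evaluation (e.g., a GPT-4 chat call or a meta-learned classifier).

from typing import Callable

# Assumed set of ToM dimensions to foreground in the prompt.
TOM_DIMENSIONS = ["beliefs", "intentions", "desires", "emotions"]


def build_prompt(support_scenes: list[str], query_scene: str, candidates: list[str]) -> str:
    """Compose a prompt from a few opening scenes plus one masked later scene."""
    parts = [
        "You will read a few opening scenes of a movie with named characters.",
        "\n\n".join(support_scenes),
        (
            "Now read a later scene in which one character is masked as [CHAR]. "
            f"Considering that character's {', '.join(TOM_DIMENSIONS)}, decide who "
            f"[CHAR] is among: {', '.join(candidates)}."
        ),
        query_scene,
        "Answer with a single name.",
    ]
    return "\n\n".join(parts)


def evaluate(tasks: list[dict], query_llm: Callable[[str], str]) -> float:
    """Accuracy over per-movie tasks; each task holds support scenes, a masked
    query scene, candidate names, and the gold answer."""
    correct = 0
    for task in tasks:
        prompt = build_prompt(task["support_scenes"], task["query_scene"], task["candidates"])
        prediction = query_llm(prompt).strip()
        correct += int(prediction == task["answer"])
    return correct / len(tasks)
```

Because each system (or human annotator) answers the same prompts, the accuracy returned by a loop like `evaluate` is directly comparable across models and to the human performance reported above.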
Related papers
- Measuring Psychological Depth in Language Models [50.48914935872879]
We introduce the Psychological Depth Scale (PDS), a novel framework rooted in literary theory that measures an LLM's ability to produce authentic and narratively complex stories.
We empirically validate our framework by showing that humans can consistently evaluate stories based on PDS (0.72 Krippendorff's alpha).
Surprisingly, GPT-4 stories either surpassed highly-rated human-written stories sourced from Reddit or were statistically indistinguishable from them.
arXiv Detail & Related papers (2024-06-18T14:51:54Z)
- OpenToM: A Comprehensive Benchmark for Evaluating Theory-of-Mind Reasoning Capabilities of Large Language Models [17.042114879350788]
Neural Theory-of-Mind (N-ToM), a machine's ability to understand and keep track of the mental states of others, is pivotal in developing socially intelligent agents.
OpenToM is a new benchmark for assessing N-ToM with longer and clearer narrative stories, explicit personality traits, and actions triggered by character intentions.
We reveal that state-of-the-art LLMs thrive at modeling certain aspects of mental states in the physical world but fall short when tracking characters' mental states in the psychological world.
arXiv Detail & Related papers (2024-02-08T20:35:06Z)
- MMToM-QA: Multimodal Theory of Mind Question Answering [80.87550820953236]
Theory of Mind (ToM) is an essential ingredient for developing machines with human-level social intelligence.
Recent machine learning models, particularly large language models, seem to show some aspects of ToM understanding.
Human ToM, on the other hand, is more than video or text understanding.
People can flexibly reason about another person's mind based on conceptual representations extracted from any available data.
arXiv Detail & Related papers (2024-01-16T18:59:24Z)
- Think Twice: Perspective-Taking Improves Large Language Models' Theory-of-Mind Capabilities [63.90227161974381]
SimToM is a novel prompting framework inspired by Simulation Theory's notion of perspective-taking.
Our approach, which requires no additional training and minimal prompt-tuning, shows substantial improvement over existing methods; a rough sketch of the two-stage prompting idea follows this entry.
arXiv Detail & Related papers (2023-11-16T22:49:27Z)
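A rough sketch of the two-stage perspective-taking idea, reconstructed only from the summary in the entry above: the prompt wording and the `query_llm` helper are assumptions, not the SimToM authors' released prompts.

```python
# Rough sketch of two-stage perspective-taking prompting in the spirit of SimToM,
# reconstructed from the summary above rather than the authors' exact prompts.
# `query_llm` is again a hypothetical callable wrapping any chat model.

from typing import Callable


def perspective_taking_answer(story: str, character: str, question: str,
                              query_llm: Callable[[str], str]) -> str:
    # Stage 1: filter the story down to the events the character could actually know.
    filter_prompt = (
        f"Story:\n{story}\n\n"
        f"Rewrite the story, keeping only the events that {character} directly "
        f"witnessed or could plausibly know about."
    )
    character_view = query_llm(filter_prompt)

    # Stage 2: answer the theory-of-mind question from that filtered perspective only.
    answer_prompt = (
        f"You are {character}. This is everything you know:\n{character_view}\n\n"
        f"Question: {question}\nAnswer from {character}'s point of view."
    )
    return query_llm(answer_prompt)
```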
- Towards A Holistic Landscape of Situated Theory of Mind in Large Language Models [14.491223187047378]
Large Language Models (LLMs) have generated considerable interest and debate regarding the potential emergence of Theory of Mind (ToM) in them.
Several recent inquiries reveal a lack of robust ToM in these models and pose a pressing demand to develop new benchmarks.
We taxonomize machine ToM into 7 mental state categories and delineate existing benchmarks to identify under-explored aspects of ToM.
arXiv Detail & Related papers (2023-10-30T15:12:09Z)
- TVShowGuess: Character Comprehension in Stories as Speaker Guessing [23.21452223968301]
We propose a new task for assessing machines' skills of understanding fictional characters in narrative stories.
The task, TVShowGuess, builds on the scripts of TV series and takes the form of guessing the anonymous main characters based on the backgrounds of the scenes and the dialogues.
Our human study supports that this form of task covers the comprehension of multiple types of character persona, including characters' personalities, facts, and memories of personal experience.
arXiv Detail & Related papers (2022-04-16T05:15:04Z)
- Affect2MM: Affective Analysis of Multimedia Content Using Emotion Causality [84.69595956853908]
We present Affect2MM, a learning method for time-series emotion prediction for multimedia content.
Our goal is to automatically capture the varying emotions depicted by characters in real-life human-centric situations and behaviors.
arXiv Detail & Related papers (2021-03-11T09:07:25Z)
- What Can You Learn from Your Muscles? Learning Visual Representation from Human Interactions [50.435861435121915]
We use human interaction and attention cues to investigate whether we can learn better representations compared to visual-only representations.
Our experiments show that our "muscly-supervised" representation outperforms a visual-only state-of-the-art method, MoCo.
arXiv Detail & Related papers (2020-10-16T17:46:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.