Related papers: Unveiling Narrative Reasoning Limits of Large Language Models with Trope in Movie Synopses

URL: http://arxiv.org/abs/2409.14324v1
Date: Sun, 22 Sep 2024 05:50:18 GMT
Title: Unveiling Narrative Reasoning Limits of Large Language Models with Trope in Movie Synopses
Authors: Hung-Ting Su, Ya-Ching Hsu, Xudong Lin, Xiang-Qian Shi, Yulei Niu, Han-Yuan Hsu, Hung-yi Lee, Winston H. Hsu
Abstract summary: Large language models equipped with chain-of-thoughts (CoT) prompting have shown significant multi-step reasoning capabilities. This study utilizes tropes in movie synopses to assess the abstract reasoning abilities of state-of-the-art LLMs. We introduce a trope-wise querying approach to address these challenges and boost the F1 score by 11.8 points.
Score: 66.7212332602784
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) equipped with chain-of-thoughts (CoT) prompting have shown significant multi-step reasoning capabilities in factual content like mathematics, commonsense, and logic. However, their performance in narrative reasoning, which demands greater abstraction capabilities, remains unexplored. This study utilizes tropes in movie synopses to assess the abstract reasoning abilities of state-of-the-art LLMs and uncovers their low performance. We introduce a trope-wise querying approach to address these challenges and boost the F1 score by 11.8 points. Moreover, while prior studies suggest that CoT enhances multi-step reasoning, this study shows CoT can cause hallucinations in narrative content, reducing GPT-4's performance. We also introduce an Adversarial Injection method to embed trope-related text tokens into movie synopses without explicit tropes, revealing CoT's heightened sensitivity to such injections. Our comprehensive analysis provides insights for future research directions.
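
The abstract does not spell out how trope-wise querying works, so the following is only a minimal sketch of one plausible reading: ask the model one yes/no question per candidate trope instead of requesting all tropes in a single prompt, then score the predictions with micro-averaged F1. The prompt wording, the `ask` callable, the `keyword_stub` stand-in for a real LLM call, the cue words, and the choice of micro-averaging are all assumptions made for illustration; none of them is taken from the paper.

```python
from typing import Callable, List, Set

# Hypothetical cue words used only by the stub below; a real system would call an LLM.
CUES = {"Big Bad": "villain", "Heroic Sacrifice": "sacrifice"}


def trope_wise_query(synopsis: str, tropes: List[str],
                     ask: Callable[[str], str]) -> Set[str]:
    """Ask one yes/no question per candidate trope and keep the 'yes' answers."""
    predicted = set()
    for trope in tropes:
        # Assumed prompt format: synopsis on line one, question on line two.
        prompt = (f"Synopsis: {synopsis}\n"
                  f"Does this story contain the trope '{trope}'? Answer yes or no.")
        if ask(prompt).strip().lower().startswith("yes"):
            predicted.add(trope)
    return predicted


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1 over all (synopsis, trope) decisions."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def keyword_stub(prompt: str) -> str:
    """Stand-in for an LLM: says 'yes' when the trope's cue word is in the synopsis."""
    synopsis_line, question_line = prompt.split("\n", 1)
    for trope, cue in CUES.items():
        if f"'{trope}'" in question_line:
            return "yes" if cue in synopsis_line.lower() else "no"
    return "no"


synopses = [
    "A knight makes a final sacrifice to save the town.",
    "A villain threatens the town.",
]
gold = [{"Heroic Sacrifice"}, {"Big Bad", "Heroic Sacrifice"}]
pred = [trope_wise_query(s, list(CUES), keyword_stub) for s in synopses]
score = micro_f1(gold, pred)
print(pred, round(score, 3))
```

Swapping `keyword_stub` for a function that calls an actual model leaves the rest of the loop unchanged, which makes it straightforward to compare per-trope prompting against a single multi-label prompt on the same gold labels.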

Related papers

Seeing Through the Chain: Mitigate Hallucination in Multimodal Reasoning Models via CoT Compression and Contrastive Preference Optimization [78.94590726578014]
Multimodal reasoning models (MLRMs) remain prone to hallucinations, and effective solutions are still underexplored. We propose C3PO, a training-based mitigation framework comprising Compression and Preference Optimization.
arXiv Detail & Related papers (2026-02-03T11:00:55Z)
Render-of-Thought: Rendering Textual Chain-of-Thought as Images for Visual Latent Reasoning [23.364264811510598]
Chain-of-Thought (CoT) prompting has achieved remarkable success in unlocking the reasoning capabilities of Large Language Models (LLMs). We introduce Render-of-Thought (RoT), the first framework to reify the reasoning chain by rendering textual steps into images. Our method achieves 3-4x token compression and substantial inference acceleration compared to explicit CoT.
arXiv Detail & Related papers (2026-01-21T08:09:25Z)
Rethinking Chain-of-Thought Reasoning for Videos [19.579424881079447]
Chain-of-thought (CoT) reasoning has been highly successful in solving complex tasks in natural language processing. Recent multimodal large language models (MLLMs) have extended this paradigm to video reasoning. Motivated by empirical observations, we hypothesize that concise reasoning combined with a reduced set of visual tokens can be sufficient for effective video reasoning.
arXiv Detail & Related papers (2025-12-10T13:05:55Z)
The Challenge of Teaching Reasoning to LLMs Without RL or Distillation [31.973226821366325]
Reasoning-capable language models achieve state-of-the-art performance in diverse complex tasks by generating long, explicit Chain-of-Thought traces. We ask whether long CoT can be induced in a base model using only prompting or minimal tuning. The resulting model outperforms the much larger Qwen2.5-Math-72B-Instruct, showing that a handful of high-quality examples can unlock strong reasoning capabilities.
arXiv Detail & Related papers (2025-07-14T01:14:50Z)
TurnaboutLLM: A Deductive Reasoning Benchmark from Detective Games [9.196636783247135]
This paper introduces TurnaboutLLM, a novel framework and dataset for evaluating the deductive reasoning abilities of Large Language Models (LLMs). The framework tasks LLMs with identifying contradictions between testimonies and evidences within long narrative contexts. We evaluate twelve state-of-the-art LLMs on the dataset, hinting at limitations of popular strategies for enhancing deductive reasoning.
arXiv Detail & Related papers (2025-05-21T16:22:32Z)
When Thinking Fails: The Pitfalls of Reasoning for Instruction-Following in LLMs [16.659986373052217]
Chain-of-thought reasoning can significantly degrade instruction-following accuracy. This is the first work to systematically expose reasoning-induced failures in instruction-following.
arXiv Detail & Related papers (2025-05-16T16:36:00Z)
The Curse of CoT: On the Limitations of Chain-of-Thought in In-Context Learning [39.613595533503144]
Chain-of-Thought (CoT) prompting has been widely recognized for its ability to enhance reasoning capabilities in large language models. We show that CoT consistently underperforms direct answering across varying model scales and benchmark complexities. Our analysis uncovers a fundamental explicit-implicit duality driving CoT's performance in pattern-based ICL.
arXiv Detail & Related papers (2025-04-07T13:51:06Z)
Attention Reveals More Than Tokens: Training-Free Long-Context Reasoning with Attention-guided Retrieval [33.84832445715185]
Large Language Models (LLMs) often exhibit substantially shorter effective context lengths than their claimed capacities. We propose a novel training-free algorithm, Attrieval, which leverages attention weights to retrieve relevant facts from the long context. Our results demonstrate that Attrieval enhances long-context reasoning capability notably on both synthetic and real-world QA datasets.
arXiv Detail & Related papers (2025-03-12T20:34:14Z)
STEP: Enhancing Video-LLMs' Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training [87.58996020705258]
Video Large Language Models (Video-LLMs) have recently shown strong performance in basic video understanding tasks. Video-LLMs struggle with compositional reasoning that requires multi-step explicit-temporal inference across object relations, interactions and events. We propose STEP, a novel graph-guided self-training method that enables Video-LLMs to generate reasoning-rich finetuning data from any raw videos to improve themselves.
arXiv Detail & Related papers (2024-11-29T11:54:55Z)
Investigating Video Reasoning Capability of Large Language Models with Tropes in Movies [69.28082193942991]
This paper introduces a novel dataset, Tropes in Movies (TiM), designed as a testbed for exploring two critical yet previously overlooked video reasoning skills. Utilizing tropes from movie storytelling, TiM evaluates the reasoning capabilities of state-of-the-art LLM-based approaches. To address these deficiencies, we propose Face-Enhanced Viper of Role Interactions (FEVoRI) and Context Query Reduction (ConQueR).
arXiv Detail & Related papers (2024-06-16T12:58:31Z)
Optimizing Language Model's Reasoning Abilities with Weak Supervision [48.60598455782159]
We present PuzzleBen, a weakly supervised benchmark that comprises 25,147 complex questions, answers, and human-generated rationales. A unique aspect of our dataset is the inclusion of 10,000 unannotated questions, enabling us to explore utilizing less supervised data to boost LLMs' inference capabilities.
arXiv Detail & Related papers (2024-05-07T07:39:15Z)
Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents [80.5213198675411]
Large language models (LLMs) have dramatically enhanced the field of language intelligence. LLMs leverage the intriguing chain-of-thought (CoT) reasoning techniques, obliging them to formulate intermediate steps en route to deriving an answer. Recent research endeavors have extended CoT reasoning methodologies to nurture the development of autonomous language agents.
arXiv Detail & Related papers (2023-11-20T14:30:55Z)
Assessing Step-by-Step Reasoning against Lexical Negation: A Case Study on Syllogism [19.590120229602103]
Large language models (LLMs) take advantage of step-by-step reasoning instructions, e.g., chain-of-thought (CoT) prompting. In this study, we inspect the step-by-step reasoning ability of LLMs with a focus on negation.
arXiv Detail & Related papers (2023-10-23T12:40:41Z)
Visual Chain of Thought: Bridging Logical Gaps with Multimodal Infillings [61.04460792203266]
We introduce VCoT, a novel method that leverages chain-of-thought prompting with vision-language grounding to bridge the logical gaps within sequential data. Our method uses visual guidance to generate synthetic multimodal infillings that add consistent and novel information to reduce the logical gaps for downstream tasks.
arXiv Detail & Related papers (2023-05-03T17:58:29Z)
Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters [82.84696222087396]
Chain-of-Thought (CoT) prompting can dramatically improve the multi-step reasoning abilities of large language models (LLMs). We show that CoT reasoning is possible even with invalid demonstrations.
arXiv Detail & Related papers (2022-12-20T05:20:54Z)
Text and Patterns: For Effective Chain of Thought, It Takes Two to Tango [11.344587937052697]
This work initiates the preliminary steps towards a deeper understanding of reasoning mechanisms in large language models. Our work centers around querying the model while controlling for all but one of the components in a prompt: symbols, patterns, and text. We posit that text imbues patterns with commonsense knowledge and meaning.
arXiv Detail & Related papers (2022-09-16T02:54:00Z)

This list is automatically generated from the titles and abstracts of the papers on this site.