Situation and Behavior Understanding by Trope Detection on Films
- URL: http://arxiv.org/abs/2101.07632v1
- Date: Tue, 19 Jan 2021 14:09:54 GMT
- Title: Situation and Behavior Understanding by Trope Detection on Films
- Authors: Chen-Hsi Chang, Hung-Ting Su, Juiheng Hsu, Yu-Siang Wang, Yu-Cheng
Chang, Zhe Yu Liu, Ya-Liang Chang, Wen-Feng Cheng, Ke-Jyun Wang and Winston
H. Hsu
- Abstract summary: We present a novel task, trope detection on films, in an effort to create a situation and behavior understanding for machines.
We introduce a new dataset, Tropes in Movie Synopses (TiMoS), with 5623 movie synopses and 95 different tropes collected from a Wikipedia-style database, TVTropes.
We present a multi-stream comprehension network (MulCom) leveraging multi-level attention over words, sentences, and role relations.
- Score: 26.40954537814751
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep cognitive skills are crucial for the development of various
real-world applications that process diverse and abundant user-generated
input. While recent progress in deep learning and natural language processing
has enabled learning systems to reach human performance on some benchmarks
requiring shallow semantics, such skills remain challenging for even modern
contextual embedding models, as pointed out by many recent studies. Existing
machine comprehension datasets assume sentence-level input, lack causal or
motivational inferences, or can be answered by exploiting question-answer
bias. Here, we present a challenging novel task, trope detection on films, in
an effort to create situation and behavior understanding for machines. Tropes
are storytelling devices that are frequently used as ingredients in recipes
for creative works. Compared to existing movie tag prediction tasks, tropes
are more sophisticated: they can vary widely, from a moral concept to a series
of circumstances, and embed motivations and cause-and-effect relations. We
introduce a new dataset, Tropes in Movie Synopses (TiMoS), with 5623 movie
synopses and 95 different tropes collected from a Wikipedia-style database,
TVTropes. We present a multi-stream comprehension network (MulCom) leveraging
multi-level attention over words, sentences, and role relations. Experimental
results demonstrate that modern models, including BERT contextual embeddings,
movie tag prediction systems, and relational networks, reach at most 37% of
human performance (23.97/64.87) in terms of F1 score. Our MulCom outperforms
all modern baselines by 1.5 to 5.0 F1 points and 1.5 to 3.0 mean average
precision (mAP) points. We also provide a detailed analysis and human
evaluation to pave the way for future research.
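The multi-level attention described in the abstract can be illustrated with a minimal sketch. Everything below is a hypothetical simplification for illustration only, not the paper's actual MulCom architecture: word embeddings within each sentence are pooled with scaled dot-product attention against a learned query, and the resulting sentence vectors are attended over again to produce a single synopsis representation that a classifier could score against trope labels.

```python
import numpy as np

def attention(query, keys, values):
    """Scaled dot-product attention: pool `values` by query-key similarity."""
    scores = keys @ query / np.sqrt(keys.shape[-1])  # similarity per item, shape (n,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                         # softmax over the n items
    return weights @ values                          # weighted sum, shape (d,)

rng = np.random.default_rng(0)
d = 8
# Toy synopsis: 3 sentences of 4 word embeddings each (random stand-ins for
# real contextual embeddings such as BERT outputs).
sentences = [rng.normal(size=(4, d)) for _ in range(3)]
query = rng.normal(size=d)  # in a real model this query would be learned

# Word-level attention: collapse each sentence to one vector.
sent_vecs = np.stack([attention(query, s, s) for s in sentences])
# Sentence-level attention: collapse the synopsis to one vector.
synopsis_vec = attention(query, sent_vecs, sent_vecs)
print(synopsis_vec.shape)  # (8,)
```

A third stream over role relations (who does what to whom) would add another attention stage over character-interaction features; this sketch stops at the word and sentence levels.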
Related papers
- Investigating Video Reasoning Capability of Large Language Models with Tropes in Movies [69.28082193942991]
This paper introduces a novel dataset, Tropes in Movies (TiM), designed as a testbed for exploring two critical yet previously overlooked video reasoning skills.
Utilizing tropes from movie storytelling, TiM evaluates the reasoning capabilities of state-of-the-art LLM-based approaches.
To address these deficiencies, we propose Face-Enhanced Viper of Role Interactions (FEVoRI) and Context Query Reduction (ConQueR).
arXiv Detail & Related papers (2024-06-16T12:58:31Z) - ACQUIRED: A Dataset for Answering Counterfactual Questions In Real-Life Videos [53.92440577914417]
ACQUIRED consists of 3.9K annotated videos, encompassing a wide range of event types and incorporating both first and third-person viewpoints.
Each video is annotated with questions that span three distinct dimensions of reasoning, including physical, social, and temporal.
We benchmark our dataset against several state-of-the-art language-only and multimodal models and experimental results demonstrate a significant performance gap.
arXiv Detail & Related papers (2023-11-02T22:17:03Z) - JECC: Commonsense Reasoning Tasks Derived from Interactive Fictions [75.42526766746515]
We propose a new commonsense reasoning dataset based on human Interactive Fiction (IF) gameplay walkthroughs.
Our dataset focuses on the assessment of functional commonsense knowledge rules rather than factual knowledge.
Experiments show that the introduced dataset is challenging for previous machine reading models as well as for new large language models.
arXiv Detail & Related papers (2022-10-18T19:20:53Z) - Explainable Verbal Deception Detection using Transformers [1.5104201344012347]
This paper proposes and evaluates six deep-learning models, including combinations of BERT (and RoBERTa), MultiHead Attention, co-attentions, and transformers.
The findings suggest that our transformer-based models can enhance automated deception detection performance (+2.11% in accuracy).
arXiv Detail & Related papers (2022-10-06T17:36:00Z) - AES Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses [66.49753193098356]
We investigate the reason behind the surprising adversarial brittleness of scoring models.
Our results indicate that autoscoring models, despite getting trained as "end-to-end" models, behave like bag-of-words models.
We propose detection-based protection models that detect oversensitivity- and overstability-causing samples with high accuracy.
arXiv Detail & Related papers (2021-09-24T03:49:38Z) - TrUMAn: Trope Understanding in Movies and Animations [19.80173687261055]
We present a Trope Understanding and Storytelling (TrUSt) dataset with a new Conceptual module.
TrUSt guides the video encoder by performing video storytelling on a latent space.
Experimental results demonstrate that state-of-the-art learning systems on existing tasks reach only 12.01% accuracy with raw input signals.
arXiv Detail & Related papers (2021-08-10T09:34:14Z) - MERLOT: Multimodal Neural Script Knowledge Models [74.05631672657452]
We introduce MERLOT, a model that learns multimodal script knowledge by watching millions of YouTube videos with transcribed speech.
MERLOT exhibits strong out-of-the-box representations of temporal commonsense, and achieves state-of-the-art performance on 12 different video QA datasets.
On Visual Commonsense Reasoning, MERLOT answers questions correctly with 80.6% accuracy, outperforming state-of-the-art models of similar size by over 3%.
arXiv Detail & Related papers (2021-06-04T17:57:39Z) - My Teacher Thinks The World Is Flat! Interpreting Automatic Essay Scoring Mechanism [71.34160809068996]
Recent work shows that automated scoring systems are prone to even common-sense adversarial samples.
We utilize recent advances in interpretability to find the extent to which features such as coherence, content and relevance are important for automated scoring mechanisms.
We also find that since the models are not semantically grounded with world-knowledge and common sense, adding false facts such as "the world is flat" actually increases the score instead of decreasing it.
arXiv Detail & Related papers (2020-12-27T06:19:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.