Analyzing Folktales of Different Regions Using Topic Modeling and
Clustering
- URL: http://arxiv.org/abs/2206.04221v1
- Date: Thu, 9 Jun 2022 02:04:18 GMT
- Title: Analyzing Folktales of Different Regions Using Topic Modeling and
Clustering
- Authors: Jacob Werzinsky, Zhiyan Zhong, Xuedan Zou
- Abstract summary: This paper employs two major natural language processing techniques, topic modeling and clustering, to find patterns in folktales.
We show that the common trends between folktales are family, food, traditional gender roles, mythological figures, and animals.
Our results demonstrate the prevalence of certain elements in cultures across the world.
- Score: 2.2559617939136505
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper employs two major natural language processing techniques, topic
modeling and clustering, to find patterns in folktales and reveal cultural
relationships between regions. In particular, we used Latent Dirichlet
Allocation and BERTopic to extract the recurring elements as well as K-means
clustering to group folktales. Our paper asks what the similarities and
differences between folktales are, and what they say about culture. Here we
show that the common trends between folktales are family,
food, traditional gender roles, mythological figures, and animals. Also,
folktale topics differ by geographical location, with folktales from
different regions featuring different animals and environments. We were not
surprised to find that religious figures and animals are some of the common
topics in all cultures. However, we were surprised that European and Asian
folktales were often paired together. Our results demonstrate the prevalence of
certain elements in cultures across the world. We anticipate our work to be a
resource for future research on folktales and an example of using natural
language processing to analyze documents in specific domains. Furthermore,
since we only analyzed the documents based on their topics, more work could be
done in analyzing the structure, sentiment, and the characters of these
folktales.
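The grouping step described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not say which features were fed to K-means, so the TF-IDF weighting, the toy corpus `TALES`, the fixed starting centroids, and the helper names (`tf_idf_vectors`, `k_means`, `top_terms`) are all assumptions made here for illustration. The LDA and BERTopic steps are not reproduced.

```python
import math
from collections import Counter

# Hypothetical toy corpus (not taken from the paper's dataset).
TALES = [
    "the fox tricked the crow and stole the cheese",
    "the clever fox tricked the wolf in the forest",
    "the king gave his daughter bread and a golden crown",
    "the queen and the king shared bread with their daughter",
]


def tf_idf_vectors(docs):
    """Return (vocab, vectors): one L2-normalised TF-IDF vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({word for doc in tokenized for word in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    doc_freq = Counter(word for doc in tokenized for word in set(doc))
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        # Smoothed IDF; a word that occurs in every document gets weight 0.
        vec = [counts[w] * math.log((1 + n_docs) / (1 + doc_freq[w])) for w in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(vectors, k, init_indices, max_iter=100):
    """Lloyd's algorithm with explicit starting points so the run is deterministic."""
    centroids = [list(vectors[i]) for i in init_indices]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each document goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def top_terms(centroid, vocab, n):
    """Words with the largest centroid weights, as a rough cluster description."""
    ranked = sorted(zip(centroid, vocab), reverse=True)
    return [word for _, word in ranked[:n]]


if __name__ == "__main__":
    vocab, vectors = tf_idf_vectors(TALES)
    labels, centroids = k_means(vectors, k=2, init_indices=(0, 2))
    for cluster_id, centroid in enumerate(centroids):
        print(cluster_id, top_terms(centroid, vocab, 3))
    print(labels)
```

On this toy corpus the two fox tales fall into one cluster and the two royal-family tales into the other. In practice the starting centroids would be chosen randomly or with a seeding scheme such as k-means++, and the number of clusters would need to be selected rather than fixed at two.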
Related papers
- Folklore in Software Engineering: A Definition and Conceptual Foundations [2.203528070421802]
We explore the concept of folklore within software engineering, drawing from folklore studies to define and characterize narratives, myths, rituals, humor, and informal knowledge that circulate within software development communities. We conducted semi-structured interviews with 12 industrial practitioners in Sweden to explore how such narratives are recognized or transmitted within their daily work and how they affect it. We argue that making the concept of software engineering folklore explicit provides a foundation for subsequent ethnography and folklore studies and for reflective practice that can preserve context-effective exemplars while challenging unhelpful folklore.
arXiv Detail & Related papers (2026-01-29T14:56:32Z)
- Talking to Extraordinary Objects: Folktales Offer Analogies for Interacting with Technology [0.0]
In the world of folktales, language is everywhere and talking to extraordinary objects is not unusual. This overview presents examples of the analogies that folktales offer.
arXiv Detail & Related papers (2026-01-10T01:04:24Z)
- Biased Tales: Cultural and Topic Bias in Generating Children's Stories [40.7784118893226]
Biased Tales is a dataset designed to analyze how biases influence protagonists' attributes and story elements. Our analysis uncovers striking disparities. When the protagonist is described as a girl (as compared to a boy), appearance-related attributes increase by 55.26%. Stories featuring non-Western children disproportionately emphasize cultural heritage, tradition, and family themes far more than those for Western children.
arXiv Detail & Related papers (2025-09-09T16:51:16Z)
- Cross-Lingual and Cross-Cultural Variation in Image Descriptions [2.8664758928324883]
We conduct the first large-scale empirical study of cross-lingual variation in image descriptions.
We use a multimodal dataset with 31 languages and images from diverse locations.
Our analysis reveals that pairs of languages that are geographically or genetically closer tend to mention the same entities more frequently.
arXiv Detail & Related papers (2024-09-25T05:57:09Z)
- Are Fairy Tales Fair? Analyzing Gender Bias in Temporal Narrative Event Chains of Children's Fairy Tales [46.65377334112404]
Social biases and stereotypes are embedded in our culture in part through their presence in our stories.
We propose a computational pipeline that automatically extracts a story's temporal narrative verb-based event chain for each of its characters.
We also present a verb-based event annotation scheme that can facilitate bias analysis by including categories such as those that align with traditional stereotypes.
arXiv Detail & Related papers (2023-05-26T05:29:37Z)
- Comparing Biases and the Impact of Multilingual Training across Multiple Languages [70.84047257764405]
We present a bias analysis across Italian, Chinese, English, Hebrew, and Spanish on the downstream sentiment analysis task.
We adapt existing sentiment bias templates in English to Italian, Chinese, Hebrew, and Spanish for four attributes: race, religion, nationality, and gender.
Our results reveal similarities in bias expression such as favoritism of groups that are dominant in each language's culture.
arXiv Detail & Related papers (2023-05-18T18:15:07Z)
- A Moral- and Event- Centric Inspection of Gender Bias in Fairy Tales at A Large Scale [50.92540580640479]
We computationally analyze gender bias in a fairy tale dataset containing 624 fairy tales from 7 different cultures.
We find that the number of male characters is two times that of female characters, showing a disproportionate gender representation.
Female characters turn out more associated with care-, loyalty- and sanctity- related moral words, while male characters are more associated with fairness- and authority- related moral words.
arXiv Detail & Related papers (2022-11-25T19:38:09Z)
- American cultural regions mapped through the lexical analysis of social media [1.8199326045904993]
This work takes a crucial step in this direction by introducing a method to infer cultural regions based on the automatic analysis of large datasets from microblogging posts.
Specifically, regional variations in written discourse are measured in American social media.
Through a hierarchical clustering of the data in this lower-dimensional space, this method yields clear cultural areas and the topics of discussion that define them.
arXiv Detail & Related papers (2022-08-16T10:18:47Z)
- Computational Lens on Cognition: Study Of Autobiographical Versus Imagined Stories With Large-Scale Language Models [95.88620740809004]
We study differences in the narrative flow of events in autobiographical versus imagined stories using GPT-3.
We found that imagined stories have higher sequentiality than autobiographical stories.
In comparison to imagined stories, autobiographical stories contain more concrete words and words related to the first person.
arXiv Detail & Related papers (2022-01-07T20:10:47Z)
- Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning [49.04866469947569]
We construct a Geo-Diverse Visual Commonsense Reasoning dataset (GD-VCR) to test vision-and-language models' ability to understand cultural and geo-location-specific commonsense.
We find that the performance of both models for non-Western regions, including East Asia, South Asia, and Africa, is significantly lower than that for the Western region.
arXiv Detail & Related papers (2021-09-14T17:52:55Z)
- It's not Rocket Science : Interpreting Figurative Language in Narratives [48.84507467131819]
We study the interpretation of two non-compositional figurative languages (idioms and similes).
Our experiments show that models based solely on pre-trained language models perform substantially worse than humans on these tasks.
We additionally propose knowledge-enhanced models, adopting human strategies for interpreting figurative language.
arXiv Detail & Related papers (2021-08-31T21:46:35Z)
- Legends: Folklore on Reddit [0.4924126492174801]
We introduce Reddit legends, a collection of venerated old posts that have become famous on Reddit.
We show that Reddit legends can indeed be considered folklore and that they are amenable to systematic text-based approaches.
arXiv Detail & Related papers (2020-07-01T20:55:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.