STONYBOOK: A System and Resource for Large-Scale Analysis of Novels
- URL: http://arxiv.org/abs/2311.03614v1
- Date: Mon, 6 Nov 2023 23:46:40 GMT
- Title: STONYBOOK: A System and Resource for Large-Scale Analysis of Novels
- Authors: Charuta Pethe, Allen Kim, Rajesh Prabhakar, Tanzir Pial, Steven Skiena
- Abstract summary: Books have historically been the primary mechanism through which narratives are transmitted.
We have developed a collection of resources for the large-scale analysis of novels.
- Score: 11.304581370821756
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Books have historically been the primary mechanism through which narratives
are transmitted. We have developed a collection of resources for the
large-scale analysis of novels, including: (1) an open source end-to-end NLP
analysis pipeline for the annotation of novels into a standard XML format, (2)
a collection of 49,207 distinct cleaned and annotated novels, and (3) a
database with an associated web interface for the large-scale aggregate
analysis of these literary works. We describe the major functionalities
provided in the annotation system along with their utilities. We present
samples of analysis artifacts from our website, such as visualizations of
character occurrences and interactions, similar books, representative
vocabulary, part of speech statistics, and readability metrics. We also
describe the use of the annotated format in qualitative and quantitative
analysis across large corpora of novels.
Related papers
- BookWorm: A Dataset for Character Description and Analysis [59.186325346763184]
We define two tasks: character description, which generates a brief factual profile, and character analysis, which offers an in-depth interpretation.
We introduce the BookWorm dataset, pairing books from the Gutenberg Project with human-written descriptions and analyses.
Our findings show that retrieval-based approaches outperform hierarchical ones in both tasks.
arXiv Detail & Related papers (2024-10-14T10:55:58Z)
- Bridging Research and Readers: A Multi-Modal Automated Academic Papers Interpretation System [47.13932723910289]
We introduce an open-source multi-modal automated academic paper interpretation system (MMAPIS) built around a three-stage process.
It employs the hybrid modality preprocessing and alignment module to extract plain text, and tables or figures from documents separately.
It then aligns this information based on the section names they belong to, ensuring that data with identical section names are categorized under the same section.
It utilizes the extracted section names to divide the article into shorter text segments, facilitating specific summarizations both within and between sections via LLMs.
arXiv Detail & Related papers (2024-01-17T11:50:53Z)
- Panel Transitions for Genre Analysis in Visual Narratives [1.320904960556043]
We present a novel approach to do a multi-modal analysis of genre based on comics and manga-style visual narratives.
We highlight some of the limitations and challenges of our existing computational approaches in modeling subjective labels.
arXiv Detail & Related papers (2023-12-14T08:05:09Z)
- One Graph to Rule them All: Using NLP and Graph Neural Networks to analyse Tolkien's Legendarium [3.0448872422956432]
We study character networks extracted from a text corpus of J.R.R. Tolkien's Legendarium.
We show that this perspective helps us to analyse and visualise the narrative style that characterises Tolkien's works.
arXiv Detail & Related papers (2022-10-14T14:47:56Z)
- BASS: Boosting Abstractive Summarization with Unified Semantic Graph [49.48925904426591]
BASS is a framework for Boosting Abstractive Summarization based on a unified Semantic graph.
A graph-based encoder-decoder model is proposed to improve both the document representation and summary generation process.
Empirical results show that the proposed architecture brings substantial improvements for both long-document and multi-document summarization tasks.
arXiv Detail & Related papers (2021-05-25T16:20:48Z)
- Modeling Social Readers: Novel Tools for Addressing Reception from Online Book Reviews [0.0]
We study the readers' distillation of the main storylines in a novel using a corpus of reviews of five popular novels.
We make three important contributions to the study of infinite vocabulary networks.
We present a new sequencing algorithm, REV2SEQ, that generates a consensus sequence of events based on partial trajectories aggregated from the reviews.
arXiv Detail & Related papers (2021-05-03T20:10:14Z)
- What's New? Summarizing Contributions in Scientific Literature [85.95906677964815]
We introduce a new task of disentangled paper summarization, which seeks to generate separate summaries for the paper contributions and the context of the work.
We extend the S2ORC corpus of academic articles by adding disentangled "contribution" and "context" reference labels.
We propose a comprehensive automatic evaluation protocol which reports the relevance, novelty, and disentanglement of generated outputs.
arXiv Detail & Related papers (2020-11-06T02:23:01Z)
- Quasi Error-free Text Classification and Authorship Recognition in a large Corpus of English Literature based on a Novel Feature Set [0.0]
We show that in the entire GLEC quasi error-free text classification and authorship recognition is possible with a method using the same set of five style and five content features.
Our data pave the way for many future computational and empirical studies of literature or experiments in reading psychology.
arXiv Detail & Related papers (2020-10-21T07:39:55Z)
- Abstractive Summarization of Spoken and Written Instructions with BERT [66.14755043607776]
We present the first application of the BERTSum model to conversational language.
We generate abstractive summaries of narrated instructional videos across a wide variety of topics.
We envision this integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
arXiv Detail & Related papers (2020-08-21T20:59:34Z)
- A Hierarchical Network for Abstractive Meeting Summarization with Cross-Domain Pretraining [52.11221075687124]
We propose a novel abstractive summary network that adapts to the meeting scenario.
We design a hierarchical structure to accommodate long meeting transcripts and a role vector to depict the difference among speakers.
Our model outperforms previous approaches in both automatic metrics and human evaluation.
arXiv Detail & Related papers (2020-04-04T21:00:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.