Multiple regression techniques for modeling dates of first performances
of Shakespeare-era plays
- URL: http://arxiv.org/abs/2104.05929v2
- Date: Wed, 14 Apr 2021 23:37:45 GMT
- Title: Multiple regression techniques for modeling dates of first performances
of Shakespeare-era plays
- Authors: Pablo Moscato, Hugh Craig, Gabriel Egan, Mohammad Nazmul Haque, Kevin
Huang, Julia Sloan, Jon Corrales de Oliveira
- Abstract summary: We took a set of Shakespeare-era plays (181 plays from the period 1585–1610) and added the best-guess dates for them from a standard reference work as metadata.
We applied 11 regression methods to predict the dates of the plays at an 80/20 training/test split.
An in-depth analysis of the most commonly occurring 20 words in the models in 100 independent runs helps explain the trends in linguistic and stylistic terms.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The date of the first performance of a play of Shakespeare's time must
usually be guessed with reference to multiple indirect external sources, or to
some aspect of the content or style of the play. Identifying these dates is
important to literary history and to accounts of developing authorial styles,
such as Shakespeare's. In this study, we took a set of Shakespeare-era plays
(181 plays from the period 1585–1610), added the best-guess dates for them
from a standard reference work as metadata, and calculated a set of
probabilities of individual words in these samples. We applied 11 regression
methods to predict the dates of the plays at an 80/20 training/test split. We
withdrew one play at a time, used the best-guess date metadata with the
probabilities and weightings to infer its date, and thus built a model of
date-probabilities interaction. We introduced a memetic algorithm-based
Continued Fraction Regression (CFR) which delivered models using a small number
of variables, leading to an interpretable model and reduced dimensionality. An
in-depth analysis of the most commonly occurring 20 words in the CFR models in
100 independent runs helps explain the trends in linguistic and stylistic
terms. The analysis with the subset of words revealed an interesting
correlation of signature words with the Shakespeare-era play's genre.
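
The pipeline described in the abstract (word probabilities per play, an 80/20 training/test split, and a continued-fraction model of the date) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the whitespace tokenizer, and the toy sentence are all hypothetical, and the model form g0(x) + h0(x) / (g1(x) + ...) with linear g and h is an assumption about how a continued fraction regression is typically written, since the abstract does not spell it out. The memetic algorithm that searches for the coefficients is not shown.

```python
import random
from collections import Counter


def word_probabilities(text, vocabulary):
    """Relative frequency of each vocabulary word in one play's text.

    Assumption: simple lowercase whitespace tokenization; the paper's
    actual preprocessing is not described in the abstract.
    """
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty text
    return [counts[w] / total for w in vocabulary]


def split_80_20(samples, seed=0):
    """Shuffle the samples and split them 80/20 into training and test sets."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def continued_fraction(x, terms):
    """Evaluate g0(x) + h0(x) / (g1(x) + h1(x) / (... + g_d(x))).

    `terms` is a list of (g_weights, g_bias, h_weights, h_bias) tuples,
    each defining two linear functions of the feature vector x. The h part
    of the last term is ignored. Assumed model form, for illustration only.
    A denominator that evaluates to zero raises ZeroDivisionError.
    """
    def linear(weights, bias):
        return bias + sum(w * xi for w, xi in zip(weights, x))

    # Start from the innermost term and work outward.
    g_w, g_b, _, _ = terms[-1]
    value = linear(g_w, g_b)
    for g_w, g_b, h_w, h_b in reversed(terms[:-1]):
        value = linear(g_w, g_b) + linear(h_w, h_b) / value
    return value


def mean_absolute_error(model, samples):
    """Average absolute gap, in years, between predicted and reference dates."""
    return sum(abs(model(x) - year) for x, year in samples) / len(samples)


# Toy usage: an invented sentence stands in for a play's text.
probs = word_probabilities("the king and thou the queen", ["the", "thou"])
train, test = split_80_20(range(10))
```

A fitted model would be passed to `mean_absolute_error` as a function of the feature vector, for example `lambda x: continued_fraction(x, terms)`, and scored on the held-out 20 percent. Keeping most weights at zero is what would make such a model use only a small number of words.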
Related papers
- CAST: Corpus-Aware Self-similarity Enhanced Topic modelling [16.562349140796115]
We introduce CAST: Corpus-Aware Self-similarity Enhanced Topic modelling, a novel topic modelling method.
We find self-similarity to be an effective metric to prevent functional words from acting as candidate topic words.
Our approach significantly enhances the coherence and diversity of generated topics, as well as the topic model's ability to handle noisy data.
arXiv Detail & Related papers (2024-10-19T15:27:11Z)
- Reverse-Engineering the Reader [43.26660964074272]
We introduce a novel alignment technique in which we fine-tune a language model to implicitly optimize the parameters of a linear regressor.
Using words as a test case, we evaluate our technique across multiple model sizes and datasets.
We find an inverse relationship between psychometric power and a model's performance on downstream NLP tasks as well as its perplexity on held-out test data.
arXiv Detail & Related papers (2024-10-16T23:05:01Z)
- Causal Micro-Narratives [62.47217054314046]
We present a novel approach to classify causal micro-narratives from text.
These narratives are sentence-level explanations of the cause(s) and/or effect(s) of a target subject.
arXiv Detail & Related papers (2024-10-07T17:55:10Z)
- Deep Time Series Models: A Comprehensive Survey and Benchmark [74.28364194333447]
Time series data is of great significance in real-world scenarios.
Recent years have witnessed remarkable breakthroughs in the time series community.
We release Time Series Library (TSLib) as a fair benchmark of deep time series models for diverse analysis tasks.
arXiv Detail & Related papers (2024-07-18T08:31:55Z)
- LFED: A Literary Fiction Evaluation Dataset for Large Language Models [58.85989777743013]
We collect 95 literary fictions that are either originally written in Chinese or translated into Chinese, covering a wide range of topics across several centuries.
We define a question taxonomy with 8 question categories to guide the creation of 1,304 questions.
We conduct an in-depth analysis to ascertain how specific attributes of literary fictions (e.g., novel types, character numbers, the year of publication) impact LLM performance in evaluations.
arXiv Detail & Related papers (2024-05-16T15:02:24Z)
- Contrastive Difference Predictive Coding [79.74052624853303]
We introduce a temporal difference version of contrastive predictive coding that stitches together pieces of different time series data to decrease the amount of data required to learn predictions of future events.
We apply this representation learning method to derive an off-policy algorithm for goal-conditioned RL.
arXiv Detail & Related papers (2023-10-31T03:16:32Z)
- Prediction Model For Wordle Game Results With High Robustness [0.0]
This study focuses on the dynamics of Wordle using data analysis and machine learning.
To predict word difficulty, we employed a Backpropagation Neural Network, overcoming overfitting via feature engineering.
Our findings indicate that on March 1st, 2023, around 12,884 results will be submitted and the word "eerie" averages 4.8 attempts, falling into the hardest difficulty cluster.
arXiv Detail & Related papers (2023-09-25T16:10:35Z)
- Preserving Knowledge Invariance: Rethinking Robustness Evaluation of Open Information Extraction [50.62245481416744]
We present the first benchmark that simulates the evaluation of open information extraction models in the real world.
We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique.
By further elaborating the robustness metric, a model is judged to be robust if its performance is consistently accurate on the overall cliques.
arXiv Detail & Related papers (2023-05-23T12:05:09Z)
- A data science and machine learning approach to continuous analysis of Shakespeare's plays [0.0]
We apply machine learning analysis to the work of William Shakespeare.
The analysis shows clear changes in the style of writing over time.
Applying machine learning to make a stylometric prediction of the year of the play shows a Pearson correlation of 0.71.
arXiv Detail & Related papers (2023-01-15T06:25:50Z)
- Query Expansion Using Contextual Clue Sampling with Language Models [69.51976926838232]
We propose a combination of an effective filtering strategy and fusion of the retrieved documents based on the generation probability of each context.
Our lexical matching based approach achieves a similar top-5/top-20 retrieval accuracy and higher top-100 accuracy compared with the well-established dense retrieval model DPR.
For end-to-end QA, the reader model also benefits from our method and achieves the highest Exact-Match score against several competitive baselines.
arXiv Detail & Related papers (2022-10-13T15:18:04Z)