A ripple in time: a discontinuity in American history
- URL: http://arxiv.org/abs/2312.01185v6
- Date: Wed, 09 Oct 2024 21:40:46 GMT
- Title: A ripple in time: a discontinuity in American history
- Authors: Alexander Kolpakov, Igor Rivin
- Abstract summary: We suggest a novel approach to discovering temporal (related and unrelated to language dilation) and personality (authorship attribution) aspects in historical datasets.
We exemplify our approach on the State of the Union addresses given by the past 42 US presidents.
- Abstract: In this technical note we suggest a novel approach to discovering temporal (related and unrelated to language dilation) and personality (authorship attribution) aspects in historical datasets. We exemplify our approach on the State of the Union addresses given by the past 42 US presidents: this dataset is known for its relatively small amount of data and the high variability of the size and style of its texts. Nevertheless, we achieve about 95% accuracy on the authorship attribution task and pin down the date of writing to a single presidential term.
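The authorship-attribution side of this task can be illustrated with a minimal sketch. This is not the note's actual pipeline, which the abstract does not fully specify; it is an assumed simplification using bag-of-words centroids and cosine similarity, with invented toy "presidents" and sentences standing in for the addresses.

```python
from collections import Counter
import math

def vectorize(text):
    """Bag-of-words term counts for a lowercased, whitespace-split text."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in set(u) & set(v))
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def attribute(train, query):
    """Attribute a query text to the author whose pooled texts are closest.

    train: dict mapping author -> list of texts (toy stand-in for addresses).
    """
    centroids = {a: vectorize(" ".join(texts)) for a, texts in train.items()}
    q = vectorize(query)
    return max(centroids, key=lambda a: cosine(centroids[a], q))

# Toy data: two hypothetical authors with distinctive vocabulary.
train = {
    "A": ["the union grows stronger with commerce and industry",
          "commerce and industry drive the union forward"],
    "B": ["we must secure liberty and justice for every citizen",
          "liberty and justice belong to the citizen"],
}
print(attribute(train, "industry and commerce strengthen the union"))  # "A"
```

A real setup would also need held-out evaluation and far richer features; the sketch only shows the shape of the attribution problem.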
Related papers
- Contrastive Entity Coreference and Disambiguation for Historical Texts [2.446672595462589]
Existing entity disambiguation methods often fall short in accuracy for historical documents, which are replete with individuals not remembered in contemporary knowledge bases.
This study makes three key contributions to improve cross-document coreference resolution and disambiguation in historical texts.
arXiv Detail & Related papers (2024-06-21T18:22:14Z) - Maximizing Data Efficiency for Cross-Lingual TTS Adaptation by Self-Supervised Representation Mixing and Embedding Initialization [57.38123229553157]
This paper presents an effective transfer learning framework for language adaptation in text-to-speech systems.
We focus on achieving language adaptation using minimal labeled and unlabeled data.
Experimental results show that our framework is able to synthesize intelligible speech in unseen languages with only 4 utterances of labeled data and 15 minutes of unlabeled data.
arXiv Detail & Related papers (2024-01-23T21:55:34Z) - FRACAS: A FRench Annotated Corpus of Attribution relations in newS [0.0]
We present a manually annotated corpus of 1676 newswire texts in French for quotation extraction and source attribution.
We first describe the composition of our corpus and the choices that were made in selecting the data.
We then detail the inter-annotator agreement among the 8 annotators who performed the manual labelling.
arXiv Detail & Related papers (2023-09-19T13:19:54Z) - Evaluation of Faithfulness Using the Longest Supported Subsequence [52.27522262537075]
We introduce a novel approach to evaluating the faithfulness of machine-generated text by computing the longest noncontiguous subsequence of the claim that is supported by the context.
Using a new human-annotated dataset, we fine-tune a model to generate the Longest Supported Subsequence (LSS).
Our proposed metric demonstrates an 18% enhancement over the prevailing state-of-the-art metric for faithfulness on our dataset.
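The quantity behind this metric can be approximated with a classic dynamic program: the longest subsequence of claim tokens that occurs, in order, in the context is the longest common subsequence (LCS) of the two token sequences. The sketch below is an assumed simplification (the paper fine-tunes a model rather than computing LCS directly), and the claim and context strings are invented examples.

```python
def longest_supported_subsequence(claim, context):
    """Longest subsequence of claim tokens occurring in order in the context,
    computed with the standard LCS dynamic program on word tokens."""
    a, b = claim.lower().split(), context.lower().split()
    # dp[i][j] = LCS length of a[:i] and b[:j]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if x == y else max(dp[i-1][j], dp[i][j-1])
    # Backtrack to recover one longest supported subsequence.
    out, i, j = [], len(a), len(b)
    while i and j:
        if a[i-1] == b[j-1]:
            out.append(a[i-1]); i -= 1; j -= 1
        elif dp[i-1][j] >= dp[i][j-1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]

claim = "the treaty was signed in 1848 by both nations"
context = "after long talks the treaty was finally signed by both nations"
print(longest_supported_subsequence(claim, context))
```

The ratio of the LSS length to the claim length then gives a crude faithfulness score: here 7 of 9 claim tokens are supported, with "in 1848" left unsupported.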
arXiv Detail & Related papers (2023-08-23T14:18:44Z) - Transfer Learning across Several Centuries: Machine and Historian Integrated Method to Decipher Royal Secretary's Diary [1.105375732595832]
NER in historical text faces challenges such as the scarcity of annotated corpora, multilingual variety, various kinds of noise, and conventions far removed from those assumed by contemporary language models.
This paper introduces a Korean historical corpus (the diary of the Royal Secretary, named SeungJeongWon) recorded over several centuries and recently augmented with named entity information as well as phrase markers that historians carefully annotated.
arXiv Detail & Related papers (2023-06-26T11:00:35Z) - The Effects of Character-Level Data Augmentation on Style-Based Dating
of Historical Manuscripts [5.285396202883411]
This article explores the influence of data augmentation on the dating of historical manuscripts.
Linear Support Vector Machines were trained with k-fold cross-validation on textural and grapheme-based features extracted from historical manuscripts.
Results show that training models with augmented data improves the performance of historical manuscript dating by 1%-3% in cumulative scores.
arXiv Detail & Related papers (2022-12-15T15:55:44Z) - PART: Pre-trained Authorship Representation Transformer [64.78260098263489]
Authors writing documents imprint identifying information within their texts: vocabulary, register, punctuation, misspellings, or even emoji usage.
Previous works use hand-crafted features or classification tasks to train their authorship models, leading to poor performance on out-of-domain authors.
We propose a contrastively trained model fit to learn authorship embeddings instead of semantics.
arXiv Detail & Related papers (2022-09-30T11:08:39Z) - Summarising Historical Text in Modern Languages [13.886432536330805]
We introduce the task of historical text summarisation, where documents in historical forms of a language are summarised in the corresponding modern language.
This is a fundamentally important routine for historians and digital humanities researchers, but it has never been automated.
We compile a high-quality gold-standard text summarisation dataset, which consists of historical German and Chinese news from hundreds of years ago summarised in modern German or Chinese.
arXiv Detail & Related papers (2021-01-26T13:00:07Z) - Temporal Common Sense Acquisition with Minimal Supervision [77.8308414884754]
This work proposes a novel sequence modeling approach that exploits explicit and implicit mentions of temporal common sense.
Our method is shown to give quality predictions of various dimensions of temporal common sense.
It also produces representations of events for relevant tasks such as duration comparison, parent-child relations, event coreference and temporal QA.
arXiv Detail & Related papers (2020-05-08T22:20:16Z) - Politeness Transfer: A Tag and Generate Approach [167.9924201435888]
This paper introduces a new task of politeness transfer.
It involves converting non-polite sentences to polite sentences while preserving the meaning.
We design a tag and generate pipeline that identifies stylistic attributes and subsequently generates a sentence in the target style.
arXiv Detail & Related papers (2020-04-29T15:08:53Z)
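The tag-and-generate pipeline in the abstract above can be illustrated with a toy, lexicon-based sketch. This is an assumed simplification: the paper trains neural taggers and generators, whereas here the stylistic attributes, the lexicon, and the example sentence are all invented.

```python
import re

# Toy lexicon mapping style-bearing (impolite) spans to polite counterparts.
POLITE = {
    "give me": "could you please give me",
    "send": "kindly send",
}

def tag(sentence):
    """Tag step: replace style-bearing spans with [TAG] placeholders,
    recording the matched spans in left-to-right order."""
    pattern = re.compile("|".join(rf"\b{re.escape(k)}\b" for k in POLITE), re.I)
    spans = [m.group(0).lower() for m in pattern.finditer(sentence)]
    return pattern.sub("[TAG]", sentence), spans

def generate(tagged, spans):
    """Generate step: fill each placeholder with the target-style phrase."""
    for span in spans:  # spans are already in left-to-right order
        tagged = tagged.replace("[TAG]", POLITE[span], 1)
    return tagged

tagged, spans = tag("Send the report and give me the numbers.")
print(tagged)   # placeholders mark where style attributes were found
print(generate(tagged, spans))
```

Splitting the task into tagging and generation, as the paper does, lets the content-preserving skeleton pass through untouched while only the style-bearing spans are rewritten.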
This list is automatically generated from the titles and abstracts of the papers on this site.