An Analysis of Letter Dynamics in the English Alphabet
- URL: http://arxiv.org/abs/2401.15560v1
- Date: Sun, 28 Jan 2024 03:54:41 GMT
- Title: An Analysis of Letter Dynamics in the English Alphabet
- Authors: Neil Zhao, Diana Zheng
- Abstract summary: We expanded on the statistical analysis of the English alphabet by examining the average frequency with which each letter appears in different categories of writings.
We developed a metric known as distance, d, that can be used to algorithmically recognize different categories of writings.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The frequency with which the letters of the English alphabet appear in
writings has been applied to the field of cryptography, the development of
keyboard mechanics, and the study of linguistics. We expanded on the
statistical analysis of the English alphabet by examining the average frequency
with which each letter appears in different categories of writings. We evaluated
news articles, novels, plays, and scientific publications, and calculated the
frequency of each letter of the alphabet, the information density of each
letter, and the overall letter distribution. Furthermore, we developed a metric
known as distance, d, that can be used to algorithmically recognize different
categories of writings. The results of our study can be applied to information
transmission, large data curation, and linguistics.
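
The quantities named in the abstract can be illustrated with a minimal Python sketch. The abstract does not define "information density" or the distance d, so the definitions here are assumptions made for illustration only: information density is taken as Shannon self-information, -log2(p), and d as the Euclidean distance between two 26-letter frequency profiles. The function names are hypothetical and are not taken from the paper.

```python
import math
import string
from collections import Counter

LETTERS = string.ascii_lowercase


def letter_frequencies(text):
    """Relative frequency of each letter a-z in text, ignoring case and non-letters."""
    counts = Counter(ch for ch in text.lower() if ch in LETTERS)
    total = sum(counts.values())
    if total == 0:
        # No letters at all: return an all-zero profile instead of dividing by zero.
        return {letter: 0.0 for letter in LETTERS}
    return {letter: counts[letter] / total for letter in LETTERS}


def information_content(freqs):
    """Assumed definition: Shannon self-information -log2(p) in bits.

    Letters with zero frequency are omitted because -log2(0) is undefined.
    """
    return {letter: -math.log2(p) for letter, p in freqs.items() if p > 0}


def distance(freqs_a, freqs_b):
    """Assumed stand-in for the paper's d: Euclidean distance between profiles."""
    return math.sqrt(sum((freqs_a[l] - freqs_b[l]) ** 2 for l in LETTERS))


if __name__ == "__main__":
    news = letter_frequencies("The committee approved the budget on Tuesday.")
    play = letter_frequencies("O Romeo, Romeo! Wherefore art thou Romeo?")
    print(round(distance(news, play), 4))
```

Under these assumptions, a text could be assigned to whichever category has the reference profile at the smallest distance from it; whether that matches the authors' procedure cannot be determined from the abstract alone.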
Related papers
- Structured Analysis and Comparison of Alphabets in Historical Handwritten Ciphers [3.423211639513232]
We propose the CSI metric, a novel way of comparing pairs of ciphered documents.
We assess their effectiveness in an unsupervised clustering scenario utilising visual features, including SIFT, pre-trained learnt embeddings, and OCR descriptors.
arXiv Detail & Related papers (2024-10-29T10:12:16Z)
- Locating Information Gaps and Narrative Inconsistencies Across Languages: A Case Study of LGBT People Portrayals on Wikipedia [49.80565462746646]
We introduce the InfoGap method -- an efficient and reliable approach to locating information gaps and inconsistencies in articles at the fact level.
We evaluate InfoGap by analyzing LGBT people's portrayals, across 2.7K biography pages on English, Russian, and French Wikipedias.
arXiv Detail & Related papers (2024-10-05T20:40:49Z)
- Understanding Cross-Lingual Alignment -- A Survey [52.572071017877704]
Cross-lingual alignment is the meaningful similarity of representations across languages in multilingual language models.
We survey the literature of techniques to improve cross-lingual alignment, providing a taxonomy of methods and summarising insights from throughout the field.
arXiv Detail & Related papers (2024-04-09T11:39:53Z)
- Information-Theoretic Characterization of Vowel Harmony: A Cross-Linguistic Study on Word Lists [18.138642719651994]
We define an information-theoretic measure of harmonicity based on predictability of vowels in a natural language lexicon.
We estimate this harmonicity using phoneme-level language models (PLMs).
Our work demonstrates that word lists are a valuable resource for typological research.
arXiv Detail & Related papers (2023-08-09T11:32:16Z)
- A Dataset of Inertial Measurement Units for Handwritten English Alphabets [16.74710649245842]
This paper presents an end-to-end methodology for collecting datasets to recognize handwritten English alphabets.
The IMUs are utilized to capture the dynamic movement patterns associated with handwriting, enabling more accurate recognition of alphabets.
arXiv Detail & Related papers (2023-07-05T17:54:36Z)
- Differentiable Allophone Graphs for Language-Universal Speech Recognition [77.2981317283029]
Building language-universal speech recognition systems entails producing phonological units of spoken sound that can be shared across languages.
We present a general framework to derive phone-level supervision from only phonemic transcriptions and phone-to-phoneme mappings.
We build a universal phone-based speech recognition model with interpretable probabilistic phone-to-phoneme mappings for each language.
arXiv Detail & Related papers (2021-07-24T15:09:32Z)
- A Survey of Embedding Space Alignment Methods for Language and Knowledge Graphs [77.34726150561087]
We survey the current research landscape on word, sentence and knowledge graph embedding algorithms.
We provide a classification of the relevant alignment techniques and discuss benchmark datasets used in this field of research.
arXiv Detail & Related papers (2020-10-26T16:08:13Z)
- A Large Multi-Target Dataset of Common Bengali Handwritten Graphemes [1.009810782568186]
We propose a labeling scheme that makes segmentation inside alpha-syllabary words linear.
The dataset contains 411k curated samples of 1295 unique commonly used Bengali graphemes.
The dataset is open-sourced as a part of a public Handwritten Grapheme Classification Challenge on Kaggle.
arXiv Detail & Related papers (2020-10-01T01:51:45Z)
- The 'Letter' Distribution in the Chinese Language [24.507787098011907]
Studies have found that letters in some alphabetic writing languages have strikingly similar statistical usage frequency distributions.
This study provides new evidence of the consistency of human languages.
arXiv Detail & Related papers (2020-05-26T05:18:56Z)
- Linguistic Typology Features from Text: Inferring the Sparse Features of World Atlas of Language Structures [73.06435180872293]
We construct a recurrent neural network predictor based on byte embeddings and convolutional layers.
We show that some features from various linguistic types can be predicted reliably.
arXiv Detail & Related papers (2020-04-30T21:00:53Z)
- Evaluating Transformer-Based Multilingual Text Classification [55.53547556060537]
We argue that NLP tools perform unequally across languages with different syntactic and morphological structures.
We calculate word order and morphological similarity indices to aid our empirical study.
arXiv Detail & Related papers (2020-04-29T03:34:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.