A scale of conceptual orality and literacy: Automatic text categorization in the tradition of "Nähe und Distanz"
- URL: http://arxiv.org/abs/2502.03252v1
- Date: Wed, 05 Feb 2025 15:08:37 GMT
- Authors: Volker Emmrich
- Abstract summary: It is stipulated that written texts can be rated on a scale of conceptual orality and literacy by linguistic features.
This article establishes such a scale based on PCA and combines it with automatic analysis.
The scale is also discussed with a view to its use in corpus compilation and as a guide for analyses in larger corpora.
- Abstract: Koch and Oesterreicher's model of "Nähe und Distanz" (Nähe = immediacy, conceptual orality; Distanz = distance, conceptual literacy) is constantly used in German linguistics. However, it lacks a statistical foundation for corpus-linguistic analyses, even as it increasingly moves into empirical corpus linguistics. Theoretically, it is stipulated, among other things, that written texts can be rated on a scale of conceptual orality and literacy by linguistic features. This article establishes such a scale based on PCA and combines it with automatic analysis. Two corpora of New High German serve as examples. When evaluating established features, a central finding is that features of conceptual orality and features of conceptual literacy must be distinguished in order to rank texts in a differentiated manner. The scale is also discussed with a view to its use in corpus compilation and as a guide for analyses in larger corpora. With a theory-driven starting point and as a "tailored" dimension, the approach is, compared to Biber's Dimension 1, particularly suitable for these supporting, controlling tasks.
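The idea of deriving a scale from linguistic features with PCA can be illustrated with a minimal sketch. It is not the article's procedure: the four feature columns, the text names, and all frequency values below are hypothetical, and the first principal component is computed with plain power iteration so that the example needs only the standard library. The sketch also treats orality and literacy features in one joint PCA, whereas the abstract reports that the two feature groups must be distinguished.

```python
import math

# Hypothetical feature rates per 1,000 tokens (toy values, not from the article).
# Columns: [orality feature A, orality feature B, literacy feature A, literacy feature B]
texts = {
    "chat_log":       [60.0, 25.0,  5.0, 10.0],
    "private_letter": [45.0, 15.0, 12.0, 20.0],
    "newspaper":      [15.0,  4.0, 30.0, 35.0],
    "legal_statute":  [ 2.0,  0.0, 55.0, 50.0],
}

def standardize(rows):
    """Z-score every feature column (population standard deviation)."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # A constant column has std 0; fall back to 1.0 to avoid division by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / n) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def first_component(z, iterations=200):
    """Leading eigenvector of the covariance matrix, found by power iteration."""
    n, d = len(z), len(z[0])
    cov = [[sum(r[i] * r[j] for r in z) / n for j in range(d)] for i in range(d)]
    # Asymmetric start vector, so it is not orthogonal to a (+, +, -, -) pattern.
    v = [1.0 / (i + 1) for i in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def orality_scale(rows, anchor=0):
    """Project each text onto PC1; higher score = closer to the orality pole."""
    z = standardize(rows)
    pc1 = first_component(z)
    # The sign of a principal component is arbitrary: orient it so that the
    # anchor feature (assumed here to mark orality) loads positively.
    if pc1[anchor] < 0:
        pc1 = [-x for x in pc1]
    return [sum(a * b for a, b in zip(row, pc1)) for row in z]

scores = orality_scale(list(texts.values()))
ranking = [name for name, _ in sorted(zip(texts, scores), key=lambda p: -p[1])]
for name, score in zip(texts, scores):
    print(f"{name:15s} {score:+.2f}")
print("orality -> literacy:", ranking)
```

Because the sign of a principal component is arbitrary, the sketch orients the axis by one anchor feature. On real corpus data the feature set and its extraction would have to come from the article itself.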
Related papers
- Entropy and type-token ratio in gigaword corpora [0.0]
We investigate entropy and type-token ratio, two metrics of lexical diversity, in six massive linguistic datasets in English, Spanish, and Turkish.
We find a functional relation between entropy and type-token ratio that holds across the corpora under consideration.
Our results contribute to the theoretical understanding of text structure and offer practical implications for fields like natural language processing.
arXiv Detail & Related papers (2024-11-15T14:40:59Z) - Tracing the Genealogies of Ideas with Large Language Model Embeddings [0.0]
I present a novel method to detect intellectual influence across a large corpus.
I apply this method to vectorize sentences from a corpus of roughly 400,000 nonfiction books and academic publications from the 19th century.
arXiv Detail & Related papers (2024-01-13T18:42:27Z) - SenteCon: Leveraging Lexicons to Learn Human-Interpretable Language Representations [51.08119762844217]
SenteCon is a method for introducing human interpretability in deep language representations.
We show that SenteCon provides high-level interpretability at little to no cost to predictive performance on downstream tasks.
arXiv Detail & Related papers (2023-05-24T05:06:28Z) - O-Dang! The Ontology of Dangerous Speech Messages [53.15616413153125]
We present O-Dang!: The Ontology of Dangerous Speech Messages, a systematic and interoperable Knowledge Graph (KG).
O-Dang! is designed to gather and organize Italian datasets into a structured KG, according to the principles shared within the Linguistic Linked Open Data community.
It provides a model for encoding both gold standard and single-annotator labels in the KG.
arXiv Detail & Related papers (2022-07-13T11:50:05Z) - An Informational Space Based Semantic Analysis for Scientific Texts [62.997667081978825]
This paper introduces computational methods for semantic analysis and for quantifying the meaning of short scientific texts.
The representation of scientific-specific meaning is standardised by replacing the situation representations, rather than psychological properties.
The research in this paper lays the basis for the geometric representation of the meaning of texts.
arXiv Detail & Related papers (2022-05-31T11:19:32Z) - Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlapping frequently occurs in paired texts in natural language processing tasks like text editing and semantic similarity evaluation.
This paper aims to address the issue with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions on their positions.
Experiments on Semantic Textual Similarity show NDD to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
arXiv Detail & Related papers (2021-10-04T03:59:15Z) - A Neural Network-Based Linguistic Similarity Measure for Entrainment in Conversations [12.052672647509732]
Linguistic entrainment is a phenomenon where people tend to mimic each other in conversation.
Most of the current similarity measures are based on bag-of-words approaches.
We propose to use a neural network model to perform the similarity measure for entrainment.
arXiv Detail & Related papers (2021-09-04T19:48:17Z) - Metrical Tagging in the Wild: Building and Annotating Poetry Corpora with Rhythmic Features [0.0]
We provide large poetry corpora for English and German, and annotate prosodic features in smaller corpora to train corpus-driven neural models.
We show that BiLSTM-CRF models with syllable embeddings outperform a CRF baseline and different BERT-based approaches.
arXiv Detail & Related papers (2021-02-17T16:38:57Z) - Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn <sentiment, aspect> joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z) - A frame semantics based approach to comparative study of digitized corpus [0.0]
The paper focuses on the morphological, syntactic, and semantic annotation process of an English-Arabic aligned corpus created from digitized novels.
The present study argues that differences in motion events conceptualization across languages can be described with frame structure and frame-to-frame relations.
arXiv Detail & Related papers (2020-05-29T22:56:25Z) - A computational model implementing subjectivity with the 'Room Theory'. The case of detecting Emotion from Text [68.8204255655161]
This work introduces a new method to consider subjectivity and general context dependency in text analysis.
By using a similarity measure between words, we are able to extract the relative relevance of the elements in the benchmark.
This method could be applied to all cases where evaluating subjectivity is relevant to understanding the relative value or meaning of a text.
arXiv Detail & Related papers (2020-05-12T21:26:04Z)