Author-Specific Linguistic Patterns Unveiled: A Deep Learning Study on Word Class Distributions
- URL: http://arxiv.org/abs/2501.10072v1
- Date: Fri, 17 Jan 2025 09:43:49 GMT
- Title: Author-Specific Linguistic Patterns Unveiled: A Deep Learning Study on Word Class Distributions
- Authors: Patrick Krauss, Achim Schilling,
- Abstract summary: This study investigates author-specific word class distributions using part-of-speech (POS) tagging and bigram analysis.
By leveraging deep neural networks, we classify literary authors based on POS tag vectors and bigram frequency matrices derived from their works.
- Score: 0.0
- License:
- Abstract: Deep learning methods have been increasingly applied to computational linguistics to uncover patterns in text data. This study investigates author-specific word class distributions using part-of-speech (POS) tagging and bigram analysis. By leveraging deep neural networks, we classify literary authors based on POS tag vectors and bigram frequency matrices derived from their works. We employ fully connected and convolutional neural network architectures to explore the efficacy of unigram and bigram-based representations. Our results demonstrate that while unigram features achieve moderate classification accuracy, bigram-based models significantly improve performance, suggesting that sequential word class patterns are more distinctive of authorial style. Multi-dimensional scaling (MDS) visualizations reveal meaningful clustering of authors' works, supporting the hypothesis that stylistic nuances can be captured through computational methods. These findings highlight the potential of deep learning and linguistic feature analysis for author profiling and literary studies.
Related papers
- Analysis and Visualization of Linguistic Structures in Large Language Models: Neural Representations of Verb-Particle Constructions in BERT [0.0]
This study investigates the internal representations of verb-particle combinations within large language models (LLMs)
We analyse the representational efficacy of its layers for various verb-particle constructions such as 'agree on', 'come back', and 'give up'
Results show that BERT's middle layers most effectively capture syntactic structures, with significant variability in representational accuracy across different verb categories.
arXiv Detail & Related papers (2024-12-19T09:21:39Z) - Improving Deep Representation Learning via Auxiliary Learnable Target Coding [69.79343510578877]
This paper introduces a novel learnable target coding as an auxiliary regularization of deep representation learning.
Specifically, a margin-based triplet loss and a correlation consistency loss on the proposed target codes are designed to encourage more discriminative representations.
arXiv Detail & Related papers (2023-05-30T01:38:54Z) - Deep networks for system identification: a Survey [56.34005280792013]
System identification learns mathematical descriptions of dynamic systems from input-output data.
Main aim of the identified model is to predict new data from previous observations.
We discuss architectures commonly adopted in the literature, like feedforward, convolutional, and recurrent networks.
arXiv Detail & Related papers (2023-01-30T12:38:31Z) - SLCNN: Sentence-Level Convolutional Neural Network for Text
Classification [0.0]
Convolutional neural network (CNN) has shown remarkable success in the task of text classification.
New baseline models have been studied for text classification using CNN.
Results have shown that the proposed models have better performance, particularly in the longer documents.
arXiv Detail & Related papers (2023-01-27T13:16:02Z) - A Unified Understanding of Deep NLP Models for Text Classification [88.35418976241057]
We have developed a visual analysis tool, DeepNLPVis, to enable a unified understanding of NLP models for text classification.
The key idea is a mutual information-based measure, which provides quantitative explanations on how each layer of a model maintains the information of input words in a sample.
A multi-level visualization, which consists of a corpus-level, a sample-level, and a word-level visualization, supports the analysis from the overall training set to individual samples.
arXiv Detail & Related papers (2022-06-19T08:55:07Z) - A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes.
We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z) - Text analysis and deep learning: A network approach [0.0]
We propose a novel method that combines transformer models with network analysis to form a self-referential representation of language use within a corpus of interest.
Our approach produces linguistic relations strongly consistent with the underlying model as well as mathematically well-defined operations on them.
It represents, to the best of our knowledge, the first unsupervised method to extract semantic networks directly from deep language models.
arXiv Detail & Related papers (2021-10-08T14:18:36Z) - Sentiment analysis in tweets: an assessment study from classical to
modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as the informal, and noisy linguistic style, remain challenging to many natural language processing (NLP) tasks.
This study fulfils an assessment of existing language models in distinguishing the sentiment expressed in tweets by using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z) - Be More with Less: Hypergraph Attention Networks for Inductive Text
Classification [56.98218530073927]
Graph neural networks (GNNs) have received increasing attention in the research community and demonstrated their promising results on this canonical task.
Despite the success, their performance could be largely jeopardized in practice since they are unable to capture high-order interaction between words.
We propose a principled model -- hypergraph attention networks (HyperGAT) which can obtain more expressive power with less computational consumption for text representation learning.
arXiv Detail & Related papers (2020-11-01T00:21:59Z) - Semantic Sentiment Analysis Based on Probabilistic Graphical Models and
Recurrent Neural Network [0.0]
The purpose of this study is to investigate the use of semantics to perform sentiment analysis based on probabilistic graphical models and recurrent neural networks.
The datasets used for the experiments were IMDB movie reviews, Amazon Consumer Product reviews, and Twitter Review datasets.
arXiv Detail & Related papers (2020-08-06T11:59:00Z) - From text saliency to linguistic objects: learning linguistic
interpretable markers with a multi-channels convolutional architecture [2.064612766965483]
We propose a novel approach to inspect the hidden layers of a fitted CNN in order to extract interpretable linguistic objects from texts exploiting classification process.
We empirically demonstrate the efficiency of our approach on corpora from two different languages: English and French.
arXiv Detail & Related papers (2020-04-07T10:46:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.