Idiosyncratic but not Arbitrary: Learning Idiolects in Online Registers Reveals Distinctive yet Consistent Individual Styles
- URL: http://arxiv.org/abs/2109.03158v2
- Date: Wed, 8 Sep 2021 22:10:06 GMT
- Title: Idiosyncratic but not Arbitrary: Learning Idiolects in Online Registers Reveals Distinctive yet Consistent Individual Styles
- Authors: Jian Zhu and David Jurgens
- Abstract summary: We introduce a new approach to studying idiolects through a massive cross-author comparison to identify and encode stylistic features.
A neural model achieves strong performance at authorship identification on short texts.
We quantify the relative contributions of different linguistic elements to idiolectal variation.
- Score: 7.4037154707453965
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: An individual's variation in writing style is often a function of both social
and personal attributes. While structured social variation has been extensively
studied, e.g., gender-based variation, far less is known about how to
characterize individual styles due to their idiosyncratic nature. We introduce
a new approach to studying idiolects through a massive cross-author comparison
to identify and encode stylistic features. The neural model achieves strong
performance at authorship identification on short texts; an analogy-based
probing task further shows that the learned representations exhibit
surprising regularities that encode qualitative and quantitative shifts of
idiolectal styles. Through text perturbation, we quantify the relative
contributions of different linguistic elements to idiolectal variation.
Furthermore, we provide a description of idiolects through measuring inter- and
intra-author variation, showing that variation in idiolects is often
distinctive yet consistent.
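The abstract names three concrete steps: learning stylistic representations through massive cross-author comparison, perturbing text to attribute idiolectal signal to specific linguistic elements, and comparing inter- with intra-author variation. The paper's neural architecture is not described in this summary, so the following is only a minimal sketch of that recipe: a character n-gram TF-IDF encoder stands in for the learned model, and the toy corpus, function names, and perturbation choice are all illustrative assumptions.

```python
# Minimal sketch of the recipe above (NOT the authors' code):
# 1) embed texts in a stylistic space,
# 2) compare intra- vs inter-author similarity,
# 3) perturb one linguistic element and measure the signal it carried.
import re
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus standing in for massive online-register data (e.g., Reddit).
corpus = {
    "author_a": ["well, i guess that's fine...", "well, maybe it's ok... i guess"],
    "author_b": ["This is excellent. Truly excellent.", "This is superb. Truly superb."],
}
texts = [t for ts in corpus.values() for t in ts]
labels = [a for a, ts in corpus.items() for _ in ts]

# Stand-in stylistic encoder: character n-grams capture punctuation and casing habits.
encoder = TfidfVectorizer(analyzer="char", ngram_range=(2, 4))
X = encoder.fit_transform(texts).toarray()

def mean_pair_sim(vectors, same_author):
    """Average cosine similarity over all pairs with matching authorship."""
    sim = cosine_similarity(vectors)
    pairs = [sim[i, j] for i in range(len(labels)) for j in range(i + 1, len(labels))
             if (labels[i] == labels[j]) == same_author]
    return float(np.mean(pairs))

print(f"intra-author: {mean_pair_sim(X, True):.2f}  "
      f"inter-author: {mean_pair_sim(X, False):.2f}")

# Text perturbation: remove one element (punctuation here) and see how much
# intra-author similarity drops -- a proxy for that element's contribution.
X_np = encoder.transform([re.sub(r"[^\w\s]", "", t) for t in texts]).toarray()
print(f"intra-author w/o punctuation: {mean_pair_sim(X_np, True):.2f}")
```

On real data, the "distinctive yet consistent" finding would show up as intra-author similarity sitting well above inter-author similarity, with the gap shrinking when an informative element such as punctuation or function words is removed.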
Related papers
- Personality Style Recognition via Machine Learning: Identifying Anaclitic and Introjective Personality Styles from Patients' Speech [6.3042597209752715]
We use natural language processing (NLP) and machine learning tools for classification.
We test this on a dataset of recorded clinical diagnostic interviews (CDI) from a sample of 79 patients diagnosed with major depressive disorder (MDD).
We find that automated classification with language-derived features (i.e., based on LIWC) significantly outperforms questionnaire-based classification models.
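This entry hinges on classification from language-derived (LIWC-based) features; LIWC itself is proprietary, so the sketch below uses invented category lexicons, texts, and labels as stand-ins for the study's actual pipeline.

```python
# Hedged sketch: logistic-regression classification from LIWC-style
# category rates. Lexicons, snippets, and labels are invented stand-ins.
from sklearn.linear_model import LogisticRegression

CATEGORIES = {  # toy stand-ins for LIWC categories
    "i_words": {"i", "me", "my", "mine"},
    "social": {"friend", "family", "talk", "they"},
    "negemo": {"sad", "hate", "hurt", "alone", "worthless"},
}

def liwc_style_features(text):
    """Rate of each category's words per token, as LIWC-style features."""
    tokens = text.lower().split()
    n = max(len(tokens), 1)
    return [sum(t in lex for t in tokens) / n for lex in CATEGORIES.values()]

# Toy interview snippets labeled anaclitic (0) vs introjective (1).
docs = ["i feel so alone without my friend",
        "i hate that i failed and i blame myself",
        "my family never talk to me anymore",
        "i must be perfect or i am worthless"]
y = [0, 1, 0, 1]

clf = LogisticRegression().fit([liwc_style_features(d) for d in docs], y)
print(clf.predict([liwc_style_features("i miss my family")]))
```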
arXiv Detail & Related papers (2023-11-07T15:56:19Z)
- Enhancing Representation Generalization in Authorship Identification [9.148691357200216]
Authorship identification ascertains the authorship of texts whose origins remain undisclosed.
Modern authorship identification methods have proven effective in distinguishing authorial styles.
The presented work addresses the challenge of enhancing the generalization of stylistic representations in authorship identification.
arXiv Detail & Related papers (2023-09-30T17:11:00Z)
- ALADIN-NST: Self-supervised disentangled representation learning of artistic style through Neural Style Transfer [60.6863849241972]
We learn a representation of visual artistic style more strongly disentangled from the semantic content depicted in an image.
We show that strongly addressing the disentanglement of style and content leads to large gains in style-specific metrics.
arXiv Detail & Related papers (2023-04-12T10:33:18Z)
- Auditing Gender Presentation Differences in Text-to-Image Models [54.16959473093973]
We study how gender is presented differently in text-to-image models.
By probing gender indicators in the input text, we quantify the frequency differences of presentation-centric attributes.
We propose an automatic method to estimate such differences.
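A hedged sketch of the frequency-difference measurement described here: for prompts that differ only in their gender indicator, compare how often each presentation-centric attribute is detected in the generated images. The detections below are hypothetical placeholders; the paper estimates them automatically from model outputs.

```python
# Hedged sketch: frequency differences of presentation-centric attributes
# between prompts whose only difference is the gender indicator.
# Attribute detections are hypothetical placeholders.
from collections import Counter

detections = {  # prompt -> per-image attribute sets (hypothetical)
    "a photo of a woman, a doctor": [{"skirt", "long_hair"}, {"long_hair"}],
    "a photo of a man, a doctor":   [{"suit"}, {"suit", "short_hair"}],
}

def attribute_freq(images):
    """Fraction of images in which each attribute was detected."""
    counts = Counter(attr for img in images for attr in img)
    return {attr: c / len(images) for attr, c in counts.items()}

f_w = attribute_freq(detections["a photo of a woman, a doctor"])
f_m = attribute_freq(detections["a photo of a man, a doctor"])
for attr in sorted(set(f_w) | set(f_m)):
    print(f"{attr:<12} diff = {f_w.get(attr, 0.0) - f_m.get(attr, 0.0):+.2f}")
```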
arXiv Detail & Related papers (2023-02-07T18:52:22Z)
- Corpus-Guided Contrast Sets for Morphosyntactic Feature Detection in Low-Resource English Varieties [3.3536302616846734]
We present a human-in-the-loop approach to generate and filter effective contrast sets via corpus-guided edits.
We show that our approach improves feature detection for both Indian English and African American English, demonstrate how it can assist linguistic research, and release our fine-tuned models for use by other researchers.
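A minimal sketch of the contrast-set idea, under the assumption that each positive example is paired with a minimally edited near-miss for the target feature; the single regex edit rule below is illustrative only, whereas the paper drives candidate edits with corpus statistics and filters them with human annotators.

```python
# Hedged sketch: build a contrast set by minimally toggling one
# morphosyntactic feature (overt vs. zero copula, as an illustration).
import re
from typing import Optional

def make_contrast(sentence: str) -> Optional[str]:
    """Drop the first 'is/are' to create a minimal-pair near-miss."""
    edited = re.sub(r"\b(is|are)\s+", "", sentence, count=1)
    return edited if edited != sentence else None

positives = ["She is working late.", "They are done already."]
contrast_set = []
for s in positives:
    c = make_contrast(s)
    if c is not None:  # in the paper, human annotators filter edits here
        contrast_set.append((s, c))

for original, contrast in contrast_set:
    print(f"{original!r} -> {contrast!r}")
```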
arXiv Detail & Related papers (2022-09-15T21:19:31Z)
- Textual Stylistic Variation: Choices, Genres and Individuals [0.8057441774248633]
This chapter argues for more informed target metrics for the statistical processing of stylistic variation in text collections.
This chapter discusses variation given by genre, and contrasts it to variation occasioned by individual choice.
arXiv Detail & Related papers (2022-05-01T16:39:49Z)
- A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes.
We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z)
- A Review of Text Style Transfer using Deep Learning [0.0]
Text style transfer is the task of adapting and/or changing the stylistic manner in which a sentence is written.
We point out the technological advances in deep neural networks that have been the driving force behind current successes in the fields of natural language understanding and generation.
The review is structured around two key stages in the text style transfer process, namely, representation learning and sentence generation in a new style.
arXiv Detail & Related papers (2021-09-30T14:06:36Z)
- Sentiment analysis in tweets: an assessment study from classical to modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as an informal and noisy linguistic style, remain challenging for many natural language processing (NLP) tasks.
This study assesses existing language models in distinguishing the sentiment expressed in tweets, using a rich collection of 22 datasets.
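A hedged sketch of the assessment protocol: hold the classifier fixed and cross-validate each text representation on each dataset. The two toy datasets below stand in for the study's 22 tweet benchmarks, and only classical representations are shown.

```python
# Hedged sketch: same classifier, multiple representations, multiple
# datasets; the datasets here are toy stand-ins for the 22 benchmarks.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

datasets = {
    "toy_airlines": (["love this flight", "worst delay ever",
                      "great crew today", "they lost my bag"], [1, 0, 1, 0]),
    "toy_movies":   (["what a gem", "total bore",
                      "loved every minute", "awful pacing"], [1, 0, 1, 0]),
}
representations = {"bow": CountVectorizer(), "tfidf": TfidfVectorizer()}

for ds_name, (texts, y) in datasets.items():
    for rep_name, rep in representations.items():
        model = make_pipeline(rep, LogisticRegression())
        acc = cross_val_score(model, texts, y, cv=2).mean()  # 2-fold CV on toy data
        print(f"{ds_name:>13} | {rep_name:<5} acc = {acc:.2f}")
```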
arXiv Detail & Related papers (2021-05-29T21:05:28Z)
- Mechanisms for Handling Nested Dependencies in Neural-Network Language Models and Humans [75.15855405318855]
We studied whether a modern artificial neural network trained with "deep learning" methods mimics a central aspect of human sentence processing.
Although the network was solely trained to predict the next word in a large corpus, analysis showed the emergence of specialized units that successfully handled local and long-distance syntactic agreement.
We tested the model's predictions in a behavioral experiment where humans detected violations in number agreement in sentences with systematic variations in the singular/plural status of multiple nouns.
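The behavioral half of this setup can be sketched as follows; the original study probed an LSTM language model's internal units, so GPT-2 (via Hugging Face transformers) is only a stand-in here, and the sketch tests just the behavioral prediction that the grammatical verb outscores the violation across an intervening distractor noun.

```python
# Hedged sketch: long-distance number agreement test, with GPT-2 as a
# stand-in for the LSTM language model analyzed in the paper.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def next_word_prob(prefix: str, word: str) -> float:
    """Probability the model assigns to `word` as the next token."""
    ids = tok(prefix, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]
    word_id = tok.encode(" " + word)[0]  # leading space: GPT-2 BPE convention
    return torch.softmax(logits, dim=-1)[word_id].item()

# Plural head noun ("keys") with an intervening singular distractor ("cabinet").
prefix = "The keys to the cabinet"
print("are:", next_word_prob(prefix, "are"))  # grammatical continuation
print("is: ", next_word_prob(prefix, "is"))   # violation, should score lower
```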
arXiv Detail & Related papers (2020-06-19T12:00:05Z)
- Improving Disentangled Text Representation Learning with Information-Theoretic Guidance [99.68851329919858]
The discrete nature of natural language makes disentangling textual representations more challenging.
Inspired by information theory, we propose a novel method that effectively manifests disentangled representations of text.
Experiments on both conditional text generation and text-style transfer demonstrate the high quality of our disentangled representation.
arXiv Detail & Related papers (2020-06-01T03:36:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.