Can Authorship Representation Learning Capture Stylistic Features?
- URL: http://arxiv.org/abs/2308.11490v2
- Date: Thu, 24 Aug 2023 20:52:01 GMT
- Title: Can Authorship Representation Learning Capture Stylistic Features?
- Authors: Andrew Wang, Cristina Aggazzotti, Rebecca Kotula, Rafael Rivera Soto,
Marcus Bishop, Nicholas Andrews
- Abstract summary: We show that representations learned for a surrogate authorship prediction task are indeed sensitive to writing style.
As a consequence, authorship representations may be expected to be robust to certain kinds of data shift, such as topic drift over time.
Our findings may open the door to downstream applications that require stylistic representations, such as style transfer.
- Score: 5.812943049068866
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatically disentangling an author's style from the content of their
writing is a longstanding and possibly insurmountable problem in computational
linguistics. At the same time, the availability of large text corpora furnished
with author labels has recently enabled learning authorship representations in
a purely data-driven manner for authorship attribution, a task that ostensibly
depends to a greater extent on encoding writing style than encoding content.
However, success on this surrogate task does not ensure that such
representations capture writing style since authorship could also be correlated
with other latent variables, such as topic. In an effort to better understand
the nature of the information these representations convey, and specifically to
validate the hypothesis that they chiefly encode writing style, we
systematically probe these representations through a series of targeted
experiments. The results of these experiments suggest that representations
learned for the surrogate authorship prediction task are indeed sensitive to
writing style. As a consequence, authorship representations may be expected to
be robust to certain kinds of data shift, such as topic drift over time.
Additionally, our findings may open the door to downstream applications that
require stylistic representations, such as style transfer.
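
As a rough illustration of the probing recipe described in the abstract (not the authors' actual experimental protocol), the sketch below trains a linear probe on frozen embeddings to check whether a hypothetical style attribute can be recovered above chance. The `embed` function, the style labels, and all data sizes are synthetic stand-ins for a real authorship encoder applied to style-annotated documents.

```python
# Minimal probing sketch: test whether frozen "authorship" embeddings carry a
# style signal that a linear probe can recover better than chance.
# The embeddings here are synthetic stand-ins; in practice, replace `embed`
# with a real authorship encoder applied to documents labeled for a style
# attribute (e.g., formal vs. informal) while controlling for topic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n_docs, dim = 2000, 128

# Hypothetical style labels (0 = informal, 1 = formal).
style = rng.integers(0, 2, size=n_docs)

def embed(style_labels):
    """Stand-in encoder: random vectors with a weak style-correlated direction."""
    base = rng.normal(size=(len(style_labels), dim))
    style_direction = rng.normal(size=dim)
    return base + 0.5 * np.outer(style_labels, style_direction)

X = embed(style)
X_train, X_test, y_train, y_test = train_test_split(
    X, style, test_size=0.3, random_state=0, stratify=style)

# Linear probe on frozen embeddings: above-chance held-out accuracy suggests
# the representation encodes the probed style attribute.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("probe accuracy:", accuracy_score(y_test, probe.predict(X_test)))
```

A probe accuracy substantially above chance on held-out documents is (weak) evidence that the representation encodes the probed attribute; controlling for topic, as the abstract emphasizes, is what lets such a result be attributed to style rather than content.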
Related papers
- Capturing Style in Author and Document Representation [4.323709559692927]
We propose a new architecture that learns embeddings for both authors and documents with a stylistic constraint.
We evaluate our method on three datasets: a literary corpus extracted from the Gutenberg Project, the Blog Authorship Corpus, and IMDb62.
arXiv Detail & Related papers (2024-07-18T10:01:09Z)
- Enhancing Representation Generalization in Authorship Identification [9.148691357200216]
Authorship identification ascertains the authorship of texts whose origins remain undisclosed.
Modern authorship identification methods have proven effective in distinguishing authorial styles.
The presented work addresses the challenge of enhancing the generalization of stylistic representations in authorship identification.
arXiv Detail & Related papers (2023-09-30T17:11:00Z)
- SenteCon: Leveraging Lexicons to Learn Human-Interpretable Language Representations [51.08119762844217]
SenteCon is a method for introducing human interpretability in deep language representations.
We show that SenteCon provides high-level interpretability at little to no cost to predictive performance on downstream tasks.
arXiv Detail & Related papers (2023-05-24T05:06:28Z)
- PART: Pre-trained Authorship Representation Transformer [64.78260098263489]
Authors writing documents imprint identifying information within their texts: vocabulary, registry, punctuation, misspellings, or even emoji usage.
Previous works use hand-crafted features or classification tasks to train their authorship models, leading to poor performance on out-of-domain authors.
We propose a contrastively trained model fit to learn authorship embeddings instead of semantics; a minimal, generic sketch of contrastive authorship training in this spirit appears after this list.
arXiv Detail & Related papers (2022-09-30T11:08:39Z)
- StoryTrans: Non-Parallel Story Author-Style Transfer with Discourse Representations and Content Enhancing [73.81778485157234]
Long texts usually involve more complicated author linguistic preferences such as discourse structures than sentences.
We formulate the task of non-parallel story author-style transfer, which requires transferring an input story into a specified author style.
We use an additional training objective to disentangle stylistic features from the learned discourse representation to prevent the model from degenerating to an auto-encoder.
arXiv Detail & Related papers (2022-08-29T08:47:49Z)
- Letter-level Online Writer Identification [86.13203975836556]
We focus on a novel problem, letter-level online writer-id, which requires only a few trajectories of written letters as identification cues.
A main challenge is that a person often writes a letter in different styles from time to time.
We refer to this problem as the variance of online writing styles (Var-O-Styles).
arXiv Detail & Related papers (2021-12-06T07:21:53Z)
- Sentiment analysis in tweets: an assessment study from classical to modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as an informal and noisy linguistic style, remain challenging for many natural language processing (NLP) tasks.
This study assesses existing language models on distinguishing the sentiment expressed in tweets, using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z)
- GTAE: Graph-Transformer based Auto-Encoders for Linguistic-Constrained Text Style Transfer [119.70961704127157]
Non-parallel text style transfer has attracted increasing research interests in recent years.
Current approaches still lack the ability to preserve the content and even logic of original sentences.
We propose Graph Transformer based Auto-Encoders (GTAE), which model a sentence as a linguistic graph and perform feature extraction and style transfer at the graph level.
arXiv Detail & Related papers (2021-02-01T11:08:45Z)
- Spectral Graph-based Features for Recognition of Handwritten Characters: A Case Study on Handwritten Devanagari Numerals [0.0]
We propose an approach that exploits the robust graph representation and spectral graph embedding concept to represent handwritten characters.
To corroborate the efficacy of the proposed method, extensive experiments were carried out on the standard handwritten numeral dataset of the Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata.
arXiv Detail & Related papers (2020-07-07T08:40:08Z)
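
The PART entry above mentions contrastive training of authorship embeddings. The following is a loose, generic sketch of that idea, not PART's architecture or training objective: a small encoder is trained with a triplet loss so that documents by the same author land closer together in embedding space than documents by different authors. The document features, encoder, and sampling scheme are all hypothetical stand-ins; a real setup would encode text from an author-labeled corpus.

```python
# Generic sketch of contrastive authorship-embedding training (not PART's
# actual model or loss): pull same-author documents together, push
# different-author documents apart. Inputs are random feature vectors
# standing in for encoded text.
import torch
import torch.nn as nn

torch.manual_seed(0)
n_authors, docs_per_author, feat_dim, emb_dim = 50, 4, 256, 64

# Stand-in "document features"; replace with a real text encoder's output.
features = torch.randn(n_authors * docs_per_author, feat_dim)
authors = torch.arange(n_authors).repeat_interleave(docs_per_author)

encoder = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, emb_dim))
loss_fn = nn.TripletMarginLoss(margin=1.0)
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)

for step in range(200):
    # Sample anchors, positives (same author), and negatives (different author).
    a_idx = torch.randint(0, len(features), (32,))
    pos_idx, neg_idx = [], []
    for i in a_idx:
        same = torch.nonzero((authors == authors[i]) & (torch.arange(len(authors)) != i)).flatten()
        diff = torch.nonzero(authors != authors[i]).flatten()
        pos_idx.append(same[torch.randint(len(same), (1,))])
        neg_idx.append(diff[torch.randint(len(diff), (1,))])
    pos_idx, neg_idx = torch.cat(pos_idx), torch.cat(neg_idx)

    anchor = encoder(features[a_idx])
    positive = encoder(features[pos_idx])
    negative = encoder(features[neg_idx])
    loss = loss_fn(anchor, positive, negative)

    opt.zero_grad()
    loss.backward()
    opt.step()

print("final triplet loss:", loss.item())
```

At inference time, attribution with such embeddings typically reduces to nearest-neighbor search in the learned space, which is one reason data-driven authorship representations are attractive for the attribution task discussed in the main abstract.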