DeepStyle: User Style Embedding for Authorship Attribution of Short Texts
- URL: http://arxiv.org/abs/2103.11798v1
- Date: Sun, 14 Mar 2021 15:56:37 GMT
- Title: DeepStyle: User Style Embedding for Authorship Attribution of Short Texts
- Authors: Zhiqiang Hu, Roy Ka-Wei Lee, Lei Wang, Ee-Peng Lim and Bo Dai
- Abstract summary: Authorship attribution (AA) is an important and widely studied research topic with many applications.
Recent works have shown that deep learning methods can achieve significant accuracy improvements on the AA task.
We propose DeepStyle, a novel embedding-based framework that learns the representations of users' salient writing styles.
- Score: 57.503904346336384
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Authorship attribution (AA), the task of identifying the author of a given text, is an important and widely studied research topic with many applications. Recent works have shown that deep learning methods can achieve significant accuracy improvements on the AA task. Nevertheless, most of these methods represent user posts using a single type of feature (e.g., word bi-grams) and adopt a text classification approach to the task. Furthermore, they offer very limited explainability of the AA results. In this paper, we address these limitations by proposing DeepStyle, a novel embedding-based framework that learns representations of users' salient writing styles. We conduct extensive experiments on two real-world datasets from Twitter and Weibo. Our experimental results show that DeepStyle outperforms state-of-the-art baselines on the AA task.
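The abstract does not spell out the training objective, but one common way to learn such user-level style embeddings is with a triplet loss over posts, pulling a user's posts together and pushing other users' posts away. The sketch below is a minimal illustration of that pattern; the encoder, margin, and training loop are assumptions made for illustration, not DeepStyle's exact architecture.

```python
# A minimal sketch (not the paper's exact architecture): learn post
# embeddings with a triplet objective so that posts by the same user
# lie closer together than posts by different users.
import torch
import torch.nn as nn

class PostEncoder(nn.Module):
    """Assumed encoder: a mean-pooled embedding bag plus a projection."""
    def __init__(self, vocab_size=30000, dim=128):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim, mode="mean")
        self.proj = nn.Linear(dim, dim)

    def forward(self, token_ids, offsets):
        return self.proj(self.embed(token_ids, offsets))

encoder = PostEncoder()
triplet = nn.TripletMarginLoss(margin=1.0)  # margin is an assumed hyperparameter
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)

def train_step(anchor, positive, negative):
    """anchor and positive are posts by the same user; negative is by another.
    Each argument is a (token_ids, offsets) pair for the encoder."""
    opt.zero_grad()
    loss = triplet(encoder(*anchor), encoder(*positive), encoder(*negative))
    loss.backward()
    opt.step()
    return loss.item()
```

At attribution time a user's style embedding could be, for instance, the mean of their post embeddings, with a new post assigned to the nearest user; such a nearest-neighbor read-out is also easier to inspect than a flat classifier, in line with the explainability motivation above.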
Related papers
- DeTeCtive: Detecting AI-generated Text via Multi-Level Contrastive Learning [24.99797253885887]
We argue that the key to accomplishing this task lies in distinguishing writing styles of different authors.
We propose DeTeCtive, a multi-task auxiliary, multi-level contrastive learning framework.
Our method is compatible with a range of text encoders.
arXiv Detail & Related papers (2024-10-28T12:34:49Z)
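The DeTeCtive summary above only names multi-level contrastive learning; the sketch below shows the generic contrastive ingredient such a framework builds on, an InfoNCE-style loss over text embeddings. The temperature and pairing scheme are illustrative assumptions, and DeTeCtive's multi-task, multi-level structure is not reproduced here.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """Generic InfoNCE loss over two batches of text embeddings, where
    z1[i] and z2[i] embed texts of the same writing style (a positive
    pair) and all other rows in the batch serve as negatives."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.T / temperature                      # cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)  # diagonal positives
    return F.cross_entropy(logits, targets)
```

Because the loss only consumes fixed-size embeddings, any text encoder can be plugged in, which is consistent with the summary's encoder-compatibility claim.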
- Capturing Style in Author and Document Representation [4.323709559692927]
We propose a new architecture that learns embeddings for both authors and documents under a stylistic constraint.
We evaluate our method on three datasets: a literary corpus extracted from Project Gutenberg, the Blog Authorship Corpus, and IMDb62.
arXiv Detail & Related papers (2024-07-18T10:01:09Z)
- A Survey on Deep Active Learning: Recent Advances and New Frontiers [27.07154361976248]
Deep learning-based active learning (DAL) has gained increasing popularity due to its broad applicability, yet survey papers on the topic remain scarce.
This work aims to serve as a useful and quick guide for researchers in overcoming difficulties in DAL.
arXiv Detail & Related papers (2024-05-01T05:54:33Z)
- Active Learning for Abstractive Text Summarization [50.79416783266641]
We propose the first effective query strategy for Active Learning in abstractive text summarization.
We show that using our strategy during AL annotation improves model performance in terms of ROUGE and consistency scores.
arXiv Detail & Related papers (2023-01-09T10:33:14Z)
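The paper's actual query strategy is not described in the summary above, so the sketch below shows only the generic shape of an active learning round for a summarizer: score unlabeled documents by model uncertainty, send the most uncertain ones for annotation, and fine-tune. The `score_summary` and `fine_tune` methods are hypothetical placeholders, not a real library API.

```python
# Generic uncertainty-based active learning round (an illustrative pattern,
# not the paper's specific query strategy).
import heapq

def uncertainty(model, doc):
    """Hypothetical scoring: a lower model likelihood for the generated
    summary is read as higher uncertainty."""
    return -model.score_summary(doc)  # `score_summary` is a placeholder

def active_learning_round(model, unlabeled_pool, budget, annotate):
    # Query the `budget` most uncertain documents for human annotation.
    queried = heapq.nlargest(budget, unlabeled_pool,
                             key=lambda doc: uncertainty(model, doc))
    labeled = [(doc, annotate(doc)) for doc in queried]  # human summaries
    for doc in queried:
        unlabeled_pool.remove(doc)
    model.fine_tune(labeled)  # `fine_tune` is a placeholder
    return model, unlabeled_pool
```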
- Entity Disambiguation with Entity Definitions [50.01142092276296]
Local models have recently attained astounding performance in Entity Disambiguation (ED).
Previous works limited their studies to using only a candidate's Wikipedia title as its textual representation.
In this paper, we address this limitation and investigate to what extent more expressive textual representations can mitigate it.
We report a new state of the art on 2 out of 6 benchmarks we consider and strongly improve generalization to unseen patterns.
arXiv Detail & Related papers (2022-10-11T17:46:28Z)
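One natural instantiation of "more expressive textual representations" is to pair the mention's context with each candidate's definition and score the pair with a cross-encoder. The sketch below illustrates that idea; the model choice and scoring head are assumptions, not the paper's exact setup.

```python
# Illustrative cross-encoder scoring of entity candidates by their
# definitions; model choice and scoring head are assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=1)  # one relevance score per pair

def rank_candidates(mention_context, candidate_definitions):
    """Pair the mention's context with each candidate's definition and
    rank candidates by the predicted relevance score. (In practice the
    scoring head would first be fine-tuned on ED data.)"""
    batch = tok([mention_context] * len(candidate_definitions),
                candidate_definitions,
                padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        scores = model(**batch).logits.squeeze(-1)
    return sorted(zip(candidate_definitions, scores.tolist()),
                  key=lambda pair: pair[1], reverse=True)
```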
- Fine-Grained Visual Entailment [51.66881737644983]
We propose an extension of the visual entailment task, where the goal is to predict the logical relationship of fine-grained knowledge elements within a piece of text to an image.
Unlike prior work, our method is inherently explainable and makes logical predictions at different levels of granularity.
We evaluate our method on a new dataset of manually annotated knowledge elements and show that it achieves 68.18% accuracy on this challenging task.
arXiv Detail & Related papers (2022-03-29T16:09:38Z)
- Assisted Text Annotation Using Active Learning to Achieve High Quality with Little Effort [9.379650501033465]
We propose a tool that enables researchers to create large, high-quality, annotated datasets with only a few manual annotations.
We combine an active learning (AL) approach with a pre-trained language model to semi-automatically identify annotation categories.
Our preliminary results show that employing AL strongly reduces the number of annotations for correct classification of even complex and subtle frames.
arXiv Detail & Related papers (2021-12-15T13:14:58Z)
- Letter-level Online Writer Identification [86.13203975836556]
We focus on a novel problem, letter-level online writer-id, which requires only a few trajectories of written letters as identification cues.
A main challenge is that a person often writes a letter in different styles from time to time.
We refer to this problem as the variance of online writing styles (Var-O-Styles).
arXiv Detail & Related papers (2021-12-06T07:21:53Z)
- MetaHTR: Towards Writer-Adaptive Handwritten Text Recognition [36.12001394921506]
We propose a new approach to handwritten text recognition.
We use a novel meta-learning framework which exploits additional new-writer data.
Our framework can be easily implemented on top of most state-of-the-art HTR models.
arXiv Detail & Related papers (2021-04-05T12:35:39Z)
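The MetaHTR summary names a meta-learning framework that sits on top of an existing HTR model and exploits a little new-writer data. A MAML-style inner/outer loop is one common shape for such writer adaptation, sketched below with assumed interfaces; it is not MetaHTR's exact algorithm.

```python
# MAML-style writer adaptation sketch (an assumed instantiation of
# "meta-learning on top of an HTR model", not MetaHTR itself).
import torch

def meta_train_step(model, writer_tasks, loss_fn, outer_opt, inner_lr=1e-2):
    """writer_tasks: iterable of ((xs, ys), (xq, yq)) pairs, one per writer:
    a support batch to adapt on and a query batch to evaluate with."""
    outer_opt.zero_grad()
    meta_loss = 0.0
    for (xs, ys), (xq, yq) in writer_tasks:
        # Inner loop: one gradient step on the writer's support set.
        params = dict(model.named_parameters())
        inner_loss = loss_fn(model(xs), ys)
        grads = torch.autograd.grad(inner_loss, list(params.values()),
                                    create_graph=True)
        adapted = {name: p - inner_lr * g
                   for (name, p), g in zip(params.items(), grads)}
        # Outer loop: loss of the adapted parameters on the query set.
        preds = torch.func.functional_call(model, adapted, (xq,))
        meta_loss = meta_loss + loss_fn(preds, yq)
    meta_loss.backward()
    outer_opt.step()
    return float(meta_loss)
```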
- Be More with Less: Hypergraph Attention Networks for Inductive Text Classification [56.98218530073927]
Graph neural networks (GNNs) have received increasing attention in the research community and demonstrated promising results on the canonical text classification task.
Despite this success, their performance can be largely jeopardized in practice, since they are unable to capture high-order interactions between words.
We propose a principled model, hypergraph attention networks (HyperGAT), which obtains more expressive power with less computational consumption for text representation learning.
arXiv Detail & Related papers (2020-11-01T00:21:59Z)
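Hypergraph attention can be summarized as a two-step aggregation: nodes attend into the hyperedges they belong to, and each node then attends over its incident hyperedges. The layer below is a deliberately simplified sketch of that pattern with single-head, score-based attention; HyperGAT's exact formulation may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleHypergraphAttention(nn.Module):
    """Simplified hypergraph attention layer: aggregate nodes into
    hyperedges, then hyperedges back into nodes, with attention at both
    steps. (An assumed simplification, not HyperGAT's exact formulation.)"""

    def __init__(self, dim):
        super().__init__()
        self.node_score = nn.Linear(dim, 1)  # attention logits for nodes
        self.edge_score = nn.Linear(dim, 1)  # attention logits for hyperedges
        self.w1 = nn.Linear(dim, dim)
        self.w2 = nn.Linear(dim, dim)

    def forward(self, x, incidence):
        # x: (N, dim) node features; incidence: (N, E) {0, 1} membership
        # matrix (every node is assumed to belong to at least one hyperedge).
        neg_inf = torch.finfo(x.dtype).min
        member = incidence.T.bool()                              # (E, N)
        # Step 1: node-level attention inside each hyperedge.
        a = torch.where(member,
                        self.node_score(x).squeeze(-1).unsqueeze(0), neg_inf)
        edge_feats = F.softmax(a, dim=1) @ self.w1(x)            # (E, dim)
        # Step 2: edge-level attention over each node's incident hyperedges.
        b = torch.where(incidence.bool(),
                        self.edge_score(edge_feats).squeeze(-1).unsqueeze(0),
                        neg_inf)
        return F.relu(F.softmax(b, dim=1) @ self.w2(edge_feats))  # (N, dim)
```

For inductive text classification, each document can be modeled as a hypergraph whose nodes are words and whose hyperedges group co-occurring words, which is one way such a layer yields high-order word interactions that pairwise GNN edges miss.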