A Self-supervised Representation Learning of Sentence Structure for
Authorship Attribution
- URL: http://arxiv.org/abs/2010.06786v2
- Date: Thu, 24 Feb 2022 15:39:48 GMT
- Title: A Self-supervised Representation Learning of Sentence Structure for
Authorship Attribution
- Authors: Fereshteh Jafariakinabad, Kien A. Hua
- Abstract summary: We propose a self-supervised framework for learning structural representations of sentences.
We evaluate the learned structural representations of sentences using different probing tasks, and subsequently utilize them in the authorship attribution task.
- Score: 3.5991811164452923
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The syntactic structure of sentences in a document is
substantially informative about its author's writing style. Sentence
representation learning has been widely explored in recent years and has
been shown to improve the generalization of downstream tasks across many
domains. Although probing studies suggest that these learned contextual
representations implicitly encode some amount of syntax, explicit
syntactic information further improves the performance of deep neural
models for authorship attribution. These observations motivate us to
investigate the explicit representation learning of the syntactic
structure of sentences. In this paper, we propose a self-supervised
framework for learning structural representations of sentences. The
self-supervised network contains two components: a lexical sub-network
and a syntactic sub-network, which take the sequence of words and their
corresponding structural labels as input, respectively. Because of the
n-to-1 mapping of words to their structural labels, each word is embedded
into a vector representation that mainly carries structural information.
We evaluate the learned structural representations of sentences using
different probing tasks, and subsequently use them in the authorship
attribution task. Our experimental results indicate that the structural
embeddings significantly improve classification performance when
concatenated with existing pre-trained word embeddings.
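A minimal sketch of the two-component network described above, written in
PyTorch. The GRU encoders, the embedding sizes, and the sentence-matching
training objective are assumptions for illustration; the abstract specifies
only that a lexical sub-network consumes word sequences and a syntactic
sub-network consumes the corresponding structural-label sequences.

```python
import torch
import torch.nn as nn

class SubNetwork(nn.Module):
    """Encodes a token sequence (words or structural labels) into one vector."""
    def __init__(self, vocab_size: int, embed_dim: int = 128, hidden_dim: int = 256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        embedded = self.embedding(ids)      # (batch, seq_len, embed_dim)
        _, state = self.encoder(embedded)   # state: (1, batch, hidden_dim)
        return state.squeeze(0)             # (batch, hidden_dim)

class StructuralSelfSupervised(nn.Module):
    """Lexical and syntactic sub-networks scored for sentence correspondence.

    The matching objective is a hypothetical self-supervised signal: the
    model predicts whether a word sequence and a structural-label sequence
    come from the same sentence.
    """
    def __init__(self, word_vocab_size: int, label_vocab_size: int, hidden_dim: int = 256):
        super().__init__()
        self.lexical = SubNetwork(word_vocab_size, hidden_dim=hidden_dim)
        self.syntactic = SubNetwork(label_vocab_size, hidden_dim=hidden_dim)
        self.scorer = nn.Linear(hidden_dim * 2, 1)

    def forward(self, word_ids: torch.Tensor, label_ids: torch.Tensor) -> torch.Tensor:
        lexical_vec = self.lexical(word_ids)
        syntactic_vec = self.syntactic(label_ids)
        return self.scorer(torch.cat([lexical_vec, syntactic_vec], dim=-1)).squeeze(-1)
```

Because many different words share one structural label (the n-to-1 mapping
the abstract mentions), a lexical embedding trained against such an
objective is pushed to retain mostly structural information.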
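The downstream use the abstract describes, concatenating the structural
embeddings with pre-trained word embeddings, reduces to a single per-token
tensor operation. The dimensions below are illustrative, not taken from the
paper.

```python
import torch

batch_size, seq_len = 32, 50
word_vecs = torch.randn(batch_size, seq_len, 300)    # pre-trained word embeddings
struct_vecs = torch.randn(batch_size, seq_len, 128)  # learned structural embeddings

# Per-token concatenation; the result feeds any downstream authorship classifier.
combined = torch.cat([word_vecs, struct_vecs], dim=-1)  # (32, 50, 428)
```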
Related papers
- Linguistic Structure Induction from Language Models [1.8130068086063336]
This thesis focuses on producing constituency and dependency structures from Language Models (LMs) in an unsupervised setting.
I present a detailed study on StructFormer (SF), which retrofits a transformer architecture with an encoder network to produce constituency and dependency structures.
I present six experiments to analyze and address this field's challenges.
arXiv Detail & Related papers (2024-03-11T16:54:49Z)
- Variational Cross-Graph Reasoning and Adaptive Structured Semantics Learning for Compositional Temporal Grounding [143.5927158318524]
Temporal grounding is the task of locating a specific segment from an untrimmed video according to a query sentence.
We introduce a new Compositional Temporal Grounding task and construct two new dataset splits.
We argue that the inherent structured semantics of video and language is the crucial factor for achieving compositional generalization.
arXiv Detail & Related papers (2023-01-22T08:02:23Z)
- Domain-Specific Word Embeddings with Structure Prediction [3.057136788672694]
We present an empirical evaluation on New York Times articles and two English Wikipedia datasets with articles on science and philosophy.
Our method, called Word2Vec with Structure Prediction (W2VPred), provides better performance than baselines on general analogy tests.
As a use case in the field of Digital Humanities, we demonstrate how to raise novel research questions for high literature from the German Text Archive.
arXiv Detail & Related papers (2022-10-06T12:45:48Z)
- TeKo: Text-Rich Graph Neural Networks with External Knowledge [75.91477450060808]
We propose a novel text-rich graph neural network with external knowledge (TeKo).
We first present a flexible heterogeneous semantic network that incorporates high-quality entities.
We then introduce two types of external knowledge, that is, structured triplets and unstructured entity descriptions.
arXiv Detail & Related papers (2022-06-15T02:33:10Z)
- Latent Topology Induction for Understanding Contextualized Representations [84.7918739062235]
We study the representation space of contextualized embeddings and gain insight into the hidden topology of large language models.
We show there exists a network of latent states that summarize linguistic properties of contextualized representations.
arXiv Detail & Related papers (2022-06-03T11:22:48Z)
- Dependency Induction Through the Lens of Visual Perception [81.91502968815746]
We propose an unsupervised grammar induction model that leverages word concreteness and a structural vision-based heuristic to jointly learn constituency-structure and dependency-structure grammars.
Our experiments show that the proposed extension outperforms the current state-of-the-art visually grounded models in constituency parsing even with a smaller grammar size.
arXiv Detail & Related papers (2021-09-20T18:40:37Z)
- Syntactic Perturbations Reveal Representational Correlates of Hierarchical Phrase Structure in Pretrained Language Models [22.43510769150502]
It is not entirely clear what aspects of sentence-level syntax are captured by vector-based language representations.
We show that Transformers build sensitivity to larger parts of the sentence along their layers, and that hierarchical phrase structure plays a role in this process.
arXiv Detail & Related papers (2021-04-15T16:30:31Z)
- Unsupervised Distillation of Syntactic Information from Contextualized Word Representations [62.230491683411536]
We tackle the task of unsupervised disentanglement between semantics and structure in neural language representations.
To this end, we automatically generate groups of sentences which are structurally similar but semantically different (a toy version of this generation step is sketched after this entry).
We demonstrate that our transformation clusters vectors in space by structural properties, rather than by lexical semantics.
arXiv Detail & Related papers (2020-10-11T15:13:18Z)
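A toy illustration of the sentence-group generation step mentioned in the
entry above: content words are replaced by random words carrying the same
part-of-speech tag, so structure is preserved while lexical semantics
change. The tag set and substitution pools are invented for this sketch and
are not the authors' actual procedure.

```python
import random

# Toy per-POS replacement pools (hypothetical).
SUBSTITUTES = {
    "NOUN": ["river", "engine", "melody", "garden"],
    "VERB": ["carries", "breaks", "paints", "follows"],
    "ADJ": ["quiet", "rapid", "ancient", "bright"],
}

def structural_variant(tagged_sentence):
    """Map a list of (word, pos) pairs to a same-structure sentence."""
    return [
        random.choice(SUBSTITUTES[pos]) if pos in SUBSTITUTES else word
        for word, pos in tagged_sentence
    ]

sentence = [("the", "DET"), ("dog", "NOUN"), ("chases", "VERB"),
            ("a", "DET"), ("ball", "NOUN")]
print(structural_variant(sentence))  # e.g. ['the', 'engine', 'paints', 'a', 'melody']
```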
- A Comparative Study on Structural and Semantic Properties of Sentence Embeddings [77.34726150561087]
We propose a set of experiments using a widely used large-scale dataset for relation extraction.
We show that different embedding spaces have different degrees of strength for the structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z)