Deep Bayes Factor Scoring for Authorship Verification
- URL: http://arxiv.org/abs/2008.10105v1
- Date: Sun, 23 Aug 2020 21:00:33 GMT
- Title: Deep Bayes Factor Scoring for Authorship Verification
- Authors: Benedikt Boenninghoff and Julian Rupp and Robert M. Nickel and
Dorothea Kolossa
- Abstract summary: We present a hierarchical fusion of two well-known approaches into a single end-to-end learning procedure.
A deep metric learning framework at the bottom aims to learn a pseudo-metric that maps a document of variable length onto a fixed-sized feature vector.
At the top, we incorporate a probabilistic layer to perform Bayes factor scoring in the learned metric space.
- Score: 10.405174977499497
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The PAN 2020 authorship verification (AV) challenge focuses on a
cross-topic/closed-set AV task over a collection of fanfiction texts.
Fanfiction is a fan-written extension of a storyline in which a so-called
fandom topic describes the principal subject of the document. The data provided
in the PAN 2020 AV task is quite challenging because it includes texts from
authors writing across multiple fandom topics. In this work, we present a
hierarchical fusion of two well-known approaches into a single end-to-end
learning procedure: A deep metric learning framework at the bottom aims to
learn a pseudo-metric that maps a document of variable length onto a
fixed-sized feature vector. At the top, we incorporate a probabilistic layer to
perform Bayes factor scoring in the learned metric space. We also provide text
preprocessing strategies to deal with the cross-topic issue.
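The two components described in the abstract can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in, not the paper's implementation: the deep metric encoder is replaced by a character-trigram hashing function that maps a variable-length document onto a fixed-size unit vector, and the probabilistic layer is replaced by a two-covariance (PLDA-style) log Bayes factor with assumed diagonal between-author (`b`) and within-author (`w`) variances on mean-centred embeddings.

```python
import zlib
import numpy as np

def encode(text, dim=64):
    """Stand-in for the deep metric encoder: hash character trigrams
    into a fixed-size vector, then L2-normalise so every document of
    any length maps onto the unit sphere."""
    vec = np.zeros(dim)
    for i in range(len(text) - 2):
        vec[zlib.crc32(text[i:i + 3].encode()) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def bayes_factor_llr(x1, x2, b=0.5, w=0.5):
    """Two-covariance log Bayes factor under zero-mean Gaussian
    assumptions. H_same: both embeddings share one latent author
    variable (per-dimension covariance b); H_diff: independent.
    Both hypotheses have total per-dimension variance t = b + w."""
    t = b + w
    det_same = t * t - b * b                 # det of the 2x2 H_same covariance
    q_same = (t * (x1 ** 2 + x2 ** 2) - 2.0 * b * x1 * x2) / det_same
    ll_same = -0.5 * (np.log(det_same) + q_same)
    ll_diff = -0.5 * (2.0 * np.log(t) + (x1 ** 2 + x2 ** 2) / t)
    return float(np.sum(ll_same - ll_diff))  # > 0 favours same author
```

In the paper both pieces are trained jointly end-to-end; here the score is simply higher for embedding pairs with higher cosine similarity, which captures the intended behaviour of Bayes factor scoring in a learned metric space.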
Related papers
- TegFormer: Topic-to-Essay Generation with Good Topic Coverage and High
Text Coherence [8.422108048684215]
We propose a novel approach to topic-to-essay generation called TegFormer.
A Topic-Extension layer captures the interaction between the given topics and their domain-specific contexts.
An Embedding-Fusion module combines the domain-specific word embeddings learnt from the given corpus and the general-purpose word embeddings provided by a GPT-2 model pre-trained on massive text data.
arXiv Detail & Related papers (2022-12-27T11:50:14Z) - Summarization with Graphical Elements [55.5913491389047]
We propose a new task: summarization with graphical elements.
We collect a high quality human labeled dataset to support research into the task.
arXiv Detail & Related papers (2022-04-15T17:16:41Z) - Author Clustering and Topic Estimation for Short Texts [69.54017251622211]
We propose a novel model that expands on Latent Dirichlet Allocation by modeling strong dependence among the words in the same document.
We also simultaneously cluster users, removing the need for post-hoc cluster estimation.
Our method performs as well as, or better than, traditional approaches to problems arising in short texts.
arXiv Detail & Related papers (2021-06-15T20:55:55Z) - The Topic Confusion Task: A Novel Scenario for Authorship Attribution [0.0]
Authorship attribution is the problem of identifying the most plausible author of an anonymous text from a set of candidate authors.
We propose the topic confusion task, where we switch the author-topic configuration between the training and testing sets.
By evaluating different features, we show that stylometric features with part-of-speech tags are less susceptible to topic variations and can increase the accuracy of the attribution process.
arXiv Detail & Related papers (2021-04-17T12:50:58Z) - Topic Scaling: A Joint Document Scaling -- Topic Model Approach To Learn
Time-Specific Topics [0.0]
This paper proposes a new methodology to study sequential corpora by implementing a two-stage algorithm that learns time-based topics with respect to a scale of document positions.
The first stage ranks documents using Wordfish to estimate document positions that serve as a dependent variable to learn relevant topics.
The second stage ranks the inferred topics on the document scale to match their occurrences within the corpus and track their evolution.
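The two-stage idea can be sketched as follows. This is not the paper's Wordfish implementation: stage one is replaced here by a correspondence-analysis-style SVD of standardised count residuals (a common approximation/initialiser for one-dimensional scaling), and stage two places each topic on the resulting document scale as a weighted average of document positions.

```python
import numpy as np

def document_positions(counts):
    """Stage 1 (stand-in for Wordfish): one-dimensional scaling of a
    document-term count matrix via the leading SVD direction of the
    standardised residuals (counts - expected) / sqrt(expected)."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    r = counts.sum(axis=1, keepdims=True) / total   # row (document) masses
    c = counts.sum(axis=0, keepdims=True) / total   # column (term) masses
    expected = r @ c * total
    resid = (counts - expected) / np.sqrt(expected)
    u, s, _ = np.linalg.svd(resid, full_matrices=False)
    return u[:, 0] * s[0]                           # document positions

def scale_topics(doc_topic, positions):
    """Stage 2: rank each topic on the document scale as the average
    document position, weighted by the topic's share in each document."""
    doc_topic = np.asarray(doc_topic, dtype=float)
    weights = doc_topic / doc_topic.sum(axis=0, keepdims=True)
    return weights.T @ positions                    # one position per topic
```

Documents with similar vocabularies receive nearby positions, and topics concentrated in documents at opposite ends of the scale receive opposite topic positions, which is the ordering behaviour the two-stage algorithm relies on.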
arXiv Detail & Related papers (2021-03-31T12:35:36Z) - DeepStyle: User Style Embedding for Authorship Attribution of Short
Texts [57.503904346336384]
Authorship attribution (AA) is an important and widely studied research topic with many applications.
Recent works have shown that deep learning methods could achieve significant accuracy improvement for the AA task.
We propose DeepStyle, a novel embedding-based framework that learns the representations of users' salient writing styles.
arXiv Detail & Related papers (2021-03-14T15:56:37Z) - Topical Change Detection in Documents via Embeddings of Long Sequences [4.13878392637062]
We formulate the task of text segmentation as an independent supervised prediction task.
By fine-tuning on paragraphs of similar sections, we are able to show that learned features encode topic information.
Unlike previous approaches, which mostly operate at the sentence level, we consistently use a broader context.
arXiv Detail & Related papers (2020-12-07T12:09:37Z) - Abstractive Summarization of Spoken and Written Instructions with BERT [66.14755043607776]
We present the first application of the BERTSum model to conversational language.
We generate abstractive summaries of narrated instructional videos across a wide variety of topics.
We envision this being integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content on request.
arXiv Detail & Related papers (2020-08-21T20:59:34Z) - Semantic Graphs for Generating Deep Questions [98.5161888878238]
We propose a novel framework which first constructs a semantic-level graph for the input document and then encodes the semantic graph with an attention-based GGNN (Att-GGNN).
On the HotpotQA deep-question centric dataset, our model greatly improves performance on questions requiring reasoning over multiple facts, achieving state-of-the-art results.
arXiv Detail & Related papers (2020-04-27T10:52:52Z) - Learning to Select Bi-Aspect Information for Document-Scale Text Content
Manipulation [50.01708049531156]
We focus on a new practical task, document-scale text content manipulation, which is the opposite of text style transfer.
In detail, the input is a set of structured records and a reference text for describing another recordset.
The output is a summary that accurately describes the partial content in the source recordset in the same writing style as the reference.
arXiv Detail & Related papers (2020-02-24T12:52:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.