LG4AV: Combining Language Models and Graph Neural Networks for Author Verification
- URL: http://arxiv.org/abs/2109.01479v1
- Date: Fri, 3 Sep 2021 12:45:28 GMT
- Title: LG4AV: Combining Language Models and Graph Neural Networks for Author Verification
- Authors: Maximilian Stubbemann, Gerd Stumme
- Abstract summary: We present our novel approach LG4AV which combines language models and graph neural networks for authorship verification.
By directly feeding the available texts into a pre-trained transformer architecture, our model does not need any hand-crafted stylometric features.
Our model can benefit from relations between authors that are meaningful with respect to the verification process.
- Score: 0.11421942894219898
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The automatic verification of document authorships is important in various
settings. For example, researchers are judged and compared by the number and
impact of their publications, and public figures are confronted with their posts
on social media platforms. Therefore, it is important that authorship
information in frequently used web services and platforms is correct. The
question of whether a given document was written by a given author is commonly
referred to as authorship verification (AV). While AV is a widely investigated
problem in general, only a few works consider settings where the documents are
short and written in a rather uniform style. This makes most approaches
impractical for online databases and knowledge graphs in the scholarly domain,
where authorships of scientific publications have to be verified, often with
just abstracts and titles available. To address this, we present our novel
approach LG4AV, which combines language models and graph neural networks for
authorship verification. By directly feeding the available texts into a
pre-trained transformer architecture, our model does not need any hand-crafted
stylometric features, which are not meaningful in scenarios where the writing
style is, at least to some extent, standardized. By incorporating a
graph neural network structure, our model can benefit from relations between
authors that are meaningful with respect to the verification process. For
example, scientific authors are more likely to write about topics that are
addressed by their co-authors, and Twitter users tend to post about the same
subjects as people they follow. We experimentally evaluate our model and study
to which extent the inclusion of co-authorships enhances verification decisions
in bibliometric environments.
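The abstract combines two ingredients: a representation of the document text and information propagated over a co-author graph. The Python sketch below illustrates only that general idea; it is not LG4AV's architecture. As simplifying assumptions, the pre-trained transformer is replaced by an L2-normalised bag-of-words vector, the graph neural network by a single parameter-free mean-aggregation step, and the verification decision by a cosine-similarity threshold. All function names, the toy corpus, and the threshold value are hypothetical.

```python
import math
from collections import Counter

# Sparse vector: token -> weight.
Vector = dict[str, float]


def embed_text(text: str) -> Vector:
    """Stand-in text encoder (assumption): L2-normalised bag of words.
    LG4AV feeds texts into a pre-trained transformer instead."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()}


def mean(vectors: list[Vector]) -> Vector:
    """Element-wise mean of sparse vectors."""
    out: Vector = {}
    for vec in vectors:
        for tok, w in vec.items():
            out[tok] = out.get(tok, 0.0) + w / len(vectors)
    return out


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def propagate(profiles: dict[str, Vector],
              coauthors: dict[str, list[str]]) -> dict[str, Vector]:
    """One parameter-free message-passing step over the co-author graph:
    each author's vector becomes the mean of their own profile and their
    co-authors' profiles (a GCN-like layer without learned weights)."""
    return {
        a: mean([vec] + [profiles[n] for n in coauthors.get(a, []) if n in profiles])
        for a, vec in profiles.items()
    }


def score(doc: str, author: str, corpus: dict[str, list[str]],
          coauthors: dict[str, list[str]]) -> float:
    """Similarity between a document and the graph-smoothed author profile,
    where a profile is the mean embedding of the author's known texts."""
    profiles = {a: mean([embed_text(t) for t in texts]) for a, texts in corpus.items()}
    return cosine(embed_text(doc), propagate(profiles, coauthors)[author])


def verify(doc: str, author: str, corpus: dict[str, list[str]],
           coauthors: dict[str, list[str]], threshold: float = 0.5) -> bool:
    """Accept the claimed authorship if the score reaches a hypothetical threshold."""
    return score(doc, author, corpus, coauthors) >= threshold


# Toy data (hypothetical): Alice and Bob are co-authors, Carol is isolated.
corpus = {
    "alice": ["graph neural networks"],
    "bob": ["language models"],
    "carol": ["protein folding"],
}
coauthors = {"alice": ["bob"], "bob": ["alice"]}

print(round(score("language models", "alice", corpus, coauthors), 3))  # 0.707
print(round(score("language models", "carol", corpus, coauthors), 3))  # 0.0
print(verify("language models", "alice", corpus, coauthors))  # True
```

In the toy run, "language models" shares no words with Alice's own text, so without the co-author edge her score would be 0.0; averaging in her co-author Bob's profile lifts it to about 0.707. This mirrors the abstract's intuition that authors tend to write about topics their co-authors address. A learned model would replace the plain mean with trainable layers and the fixed threshold with a trained decision function.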
Related papers
- Capturing Style in Author and Document Representation [4.323709559692927]
We propose a new architecture that learns embeddings for both authors and documents with a stylistic constraint.
We evaluate our method on three datasets: a literary corpus extracted from the Gutenberg Project, the Blog Authorship and IMDb62.
Date: 2024-07-18T10:01:09Z
- Forging the Forger: An Attempt to Improve Authorship Verification via Data Augmentation [52.72682366640554]
Authorship Verification (AV) is a text classification task concerned with inferring whether a candidate text has been written by one specific author or by someone else.
It has been shown that many AV systems are vulnerable to adversarial attacks, where a malicious author actively tries to fool the classifier by either concealing their writing style, or by imitating the style of another author.
Date: 2024-03-17T16:36:26Z
- Document AI: A Comparative Study of Transformer-Based, Graph-Based Models, and Convolutional Neural Networks For Document Layout Analysis [3.231170156689185]
Document AI aims to automatically analyze documents by leveraging natural language processing and computer vision techniques.
One of the major tasks of Document AI is document layout analysis, which structures document pages by interpreting the content and spatial relationships of layout, image, and text.
Date: 2023-08-29T16:58:03Z
- An Interactive UI to Support Sensemaking over Collections of Parallel Texts [15.401895433726558]
With a large corpus of papers, it's cognitively demanding to pairwise compare and contrast them all with each other.
We present AVTALER, which combines people's unique skills, contextual awareness, and knowledge with the strengths of automation.
Date: 2023-03-11T01:04:25Z
- Same or Different? Diff-Vectors for Authorship Analysis [78.83284164605473]
In "classic" authorship analysis, a feature vector represents a document, the value of a feature represents (an increasing function of) the relative frequency of the feature in the document, and the class label represents the author of the document.
Our experiments tackle same-author verification, authorship verification, and closed-set authorship attribution; while DVs are naturally geared for solving the 1st, we also provide two novel methods for solving the 2nd and 3rd.
Date: 2023-01-24T08:48:12Z
- Cracking Double-Blind Review: Authorship Attribution with Deep Learning [43.483063713471935]
We propose a transformer-based, neural-network architecture to attribute an anonymous manuscript to an author.
We leverage all research papers publicly available on arXiv amounting to over 2 million manuscripts.
Our method achieves an unprecedented authorship attribution accuracy, where up to 73% of papers are attributed correctly.
Date: 2022-11-14T15:50:24Z
- PART: Pre-trained Authorship Representation Transformer [64.78260098263489]
Authors writing documents imprint identifying information within their texts: vocabulary, registry, punctuation, misspellings, or even emoji usage.
Previous works use hand-crafted features or classification tasks to train their authorship models, leading to poor performance on out-of-domain authors.
We propose a contrastively trained model fit to learn authorship embeddings instead of semantics.
Date: 2022-09-30T11:08:39Z
- Revise and Resubmit: An Intertextual Model of Text-based Collaboration in Peer Review [52.359007622096684]
Peer review is a key component of the publishing process in most fields of science.
Existing NLP studies focus on the analysis of individual texts, whereas editorial assistance often requires modeling interactions between pairs of texts.
Date: 2022-04-22T16:39:38Z
- Sentiment analysis in tweets: an assessment study from classical to modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as their informal and noisy linguistic style, remain challenging for many natural language processing (NLP) tasks.
This study assesses how well existing language models distinguish the sentiment expressed in tweets, using a rich collection of 22 datasets.
Date: 2021-05-29T21:05:28Z
- Neural Deepfake Detection with Factual Structure of Text [78.30080218908849]
We propose a graph-based model for deepfake detection of text.
Our approach represents the factual structure of a given document as an entity graph.
Our model can distinguish the difference in the factual structure between machine-generated text and human-written text.
Date: 2020-10-15T02:35:31Z
This list is automatically generated from the titles and abstracts of the papers on this site.