Isolating authorship from content with semantic embeddings and contrastive learning
        - URL: http://arxiv.org/abs/2411.18472v1
 - Date: Wed, 27 Nov 2024 16:08:46 GMT
 - Title: Isolating authorship from content with semantic embeddings and contrastive learning
 - Authors: Javier Huertas-Tato, Adrián Girón-Jiménez, Alejandro Martín, David Camacho
 - Abstract summary: Authorship entangles style and content. We present a technique that uses contrastive learning with additional hard negatives synthetically created using a semantic similarity model. This disentanglement technique aims to distance the content embedding space from the style embedding space, leading to embeddings more informed by style.
 - License: http://creativecommons.org/licenses/by-nc-sa/4.0/
 - Abstract: Authorship entangles style and content. Authors frequently write about the same topics in the same style, so when different authors write about the exact same topic, the easiest way to distinguish them is by understanding the nuances of their style. Modern neural models for authorship can pick up these features using contrastive learning; however, some amount of content leakage is always present. Our aim is to reduce the inevitable impact of, and correlation between, content and authorship. We present a technique that uses contrastive learning (InfoNCE) with additional hard negatives synthetically created using a semantic similarity model. This disentanglement technique aims to distance the content embedding space from the style embedding space, leading to embeddings more informed by style. We demonstrate the performance with ablations on two different datasets and compare them on out-of-domain challenges. Improvements are clearly shown in challenging evaluations of prolific authors, with up to a 10% increase in accuracy when the settings are particularly hard. Trials on challenges also demonstrate that the method preserves its zero-shot capabilities when applied as fine-tuning.
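The objective described in the abstract can be sketched in a few lines. The following is a minimal, illustrative pure-Python version of an InfoNCE loss whose negative pool is extended with hard negatives; it is not the paper's implementation, and the function names, cosine scoring, and temperature value are assumptions for illustration only:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for a single anchor embedding.

    `positive` is an embedding of a text by the same author; `negatives`
    mixes ordinary in-batch negatives with synthetic hard negatives (in
    the paper's setting, semantically similar texts selected with a
    semantic similarity model, so the model cannot separate pairs by
    content alone and must rely on style).
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    denom = sum(math.exp(l - m) for l in logits)
    # negative log-probability of the positive among all candidates
    return -math.log(math.exp(logits[0] - m) / denom)
```

Because every negative adds a term to the softmax denominator, appending a semantically similar (hard) negative strictly increases the loss, which is what pushes the style embedding space away from the content embedding space during training.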
 
       
      
        Related papers
        - Boosting Semi-Supervised Scene Text Recognition via Viewing and Summarizing [71.29488677105127]
Existing scene text recognition (STR) methods struggle to recognize challenging texts, especially for artistic and severely distorted characters.
We propose a contrastive learning-based STR framework by leveraging synthetic and real unlabeled data without any human cost.
Our method achieves SOTA performance (94.7% and 70.9% average accuracy on common benchmarks and Union14M-Benchmark, respectively).
arXiv  Detail & Related papers  (2024-11-23T15:24:47Z) - StyleDistance: Stronger Content-Independent Style Embeddings with Synthetic Parallel Examples [48.44036251656947]
Style representations aim to embed texts with similar writing styles closely and texts with different styles far apart, regardless of content.
We introduce StyleDistance, a novel approach to training stronger content-independent style embeddings.
arXiv  Detail & Related papers  (2024-10-16T17:25:25Z) - CLAP: Isolating Content from Style through Contrastive Learning with Augmented Prompts [11.752632557524969]
We propose contrastive learning with data augmentation to disentangle content features from the original representations.
Our experiments across diverse datasets demonstrate significant improvements in zero-shot and few-shot classification tasks.
arXiv  Detail & Related papers  (2023-11-28T03:00:59Z) - Self-Supervised Disentanglement by Leveraging Structure in Data   Augmentations [63.73044203154743]
Self-supervised representation learning often uses data augmentations to induce "style" attributes of the data.
It is difficult to deduce a priori which attributes of the data are indeed "style" and can be safely discarded.
We introduce a more principled approach that seeks to disentangle style features rather than discard them.
arXiv  Detail & Related papers  (2023-11-15T09:34:08Z) - ALADIN-NST: Self-supervised disentangled representation learning of artistic style through Neural Style Transfer [60.6863849241972]
We learn a representation of visual artistic style more strongly disentangled from the semantic content depicted in an image.
We show that strongly addressing the disentanglement of style and content leads to large gains in style-specific metrics.
arXiv  Detail & Related papers  (2023-04-12T10:33:18Z) - PART: Pre-trained Authorship Representation Transformer [52.623051272843426]
Authors writing documents imprint identifying information within their texts. Previous works use hand-crafted features or classification tasks to train their authorship models. We propose a contrastively trained model fit to learn authorship embeddings instead of semantics.
arXiv  Detail & Related papers  (2022-09-30T11:08:39Z) - Whodunit? Learning to Contrast for Authorship Attribution [22.37948005237967]
Authorship attribution is the task of identifying the author of a given text.
We propose to fine-tune pre-trained language representations using a combination of contrastive learning and supervised learning.
We show that Contra-X advances the state-of-the-art on multiple human and machine authorship attribution benchmarks.
arXiv  Detail & Related papers  (2022-09-23T23:45:08Z) - CLLD: Contrastive Learning with Label Distance for Text Classification [0.6299766708197883]
We propose Contrastive Learning with Label Distance (CLLD) for learning contrastive classes.
CLLD ensures flexibility with respect to the subtle differences that lead to different label assignments.
Our experiments suggest that the learned label distance relieves the adversarial nature of inter-class relationships.
arXiv  Detail & Related papers  (2021-10-25T07:07:14Z) - Improving Disentangled Text Representation Learning with Information-Theoretic Guidance [99.68851329919858]
The discrete nature of natural language makes disentangling textual representations more challenging.
Inspired by information theory, we propose a novel method that effectively manifests disentangled representations of text.
 Experiments on both conditional text generation and text-style transfer demonstrate the high quality of our disentangled representation.
arXiv  Detail & Related papers  (2020-06-01T03:36:01Z) 
        This list is automatically generated from the titles and abstracts of the papers in this site.
       
     
           This site does not guarantee the quality of the information it provides (including all of the above) and is not responsible for any consequences arising from its use.