StyleDistance: Stronger Content-Independent Style Embeddings with Synthetic Parallel Examples
- URL: http://arxiv.org/abs/2410.12757v1
- Date: Wed, 16 Oct 2024 17:25:25 GMT
- Title: StyleDistance: Stronger Content-Independent Style Embeddings with Synthetic Parallel Examples
- Authors: Ajay Patel, Jiacheng Zhu, Justin Qiu, Zachary Horvitz, Marianna Apidianaki, Kathleen McKeown, Chris Callison-Burch
- Abstract summary: Style representations aim to embed texts with similar writing styles closely and texts with different styles far apart, regardless of content.
We introduce StyleDistance, a novel approach to training stronger content-independent style embeddings.
- Score: 48.44036251656947
- License:
- Abstract: Style representations aim to embed texts with similar writing styles closely and texts with different styles far apart, regardless of content. However, the contrastive triplets often used for training these representations may vary in both style and content, leading to potential content leakage in the representations. We introduce StyleDistance, a novel approach to training stronger content-independent style embeddings. We use a large language model to create a synthetic dataset of near-exact paraphrases with controlled style variations, and produce positive and negative examples across 40 distinct style features for precise contrastive learning. We assess the quality of our synthetic data and embeddings through human and automatic evaluations. StyleDistance enhances the content-independence of style embeddings, which generalize to real-world benchmarks and outperform leading style representations in downstream applications. Our model can be found at https://huggingface.co/StyleDistance/styledistance.
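The abstract describes the training signal (paraphrase-controlled contrastive triplets) but not the implementation. The sketch below shows one way such triplets could be used: the positive shares the anchor's style feature but not its content, while the negative is a near-exact paraphrase of the anchor in a different style, so content cancels out of the contrastive signal. The roberta-base encoder, mean pooling, margin, and example sentences are all illustrative assumptions, not the authors' released setup.

```python
# Minimal sketch of paraphrase-based contrastive training for style embeddings.
# Base model, pooling, loss, and the toy triplet are assumptions for illustration.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")  # assumed base encoder
encoder = AutoModel.from_pretrained("roberta-base")

def embed(texts):
    """Mean-pool token embeddings into one unit-normalized vector per text."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state          # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)         # (B, T, 1)
    pooled = (hidden * mask).sum(1) / mask.sum(1)        # masked mean pooling
    return F.normalize(pooled, dim=-1)

# One hypothetical triplet for a single style feature (here: formality).
anchor   = ["Kindly review the attached report at your earliest convenience."]
positive = ["Please examine the enclosed document when you have a moment."]  # same style, different content
negative = ["check out the report i attached when u get a sec lol"]          # same content as anchor, different style

a, p, n = embed(anchor), embed(positive), embed(negative)

# Triplet margin loss: pull same-style pairs together, push the
# same-content, different-style paraphrase away.
loss = F.triplet_margin_loss(a, p, n, margin=0.5)
loss.backward()  # real training would batch triplets across the 40 style features
```

In practice the released checkpoint (the Hugging Face URL above) would replace the placeholder roberta-base encoder; its exact loading interface should be taken from the model card rather than from this sketch.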
Related papers
- ParaGuide: Guided Diffusion Paraphrasers for Plug-and-Play Textual Style Transfer [57.6482608202409]
Textual style transfer is the task of transforming stylistic properties of text while preserving meaning.
We introduce a novel diffusion-based framework for general-purpose style transfer that can be flexibly adapted to arbitrary target styles.
We validate the method on the Enron Email Corpus, with both human and automatic evaluations, and find that it outperforms strong baselines on formality, sentiment, and even authorship style transfer.
arXiv Detail & Related papers (2023-08-29T17:36:02Z)
- Learning Interpretable Style Embeddings via Prompting LLMs [46.74488355350601]
Style representation learning builds content-independent representations of author style in text.
Current style representation learning uses neural methods to disentangle style from content to create style vectors.
We use prompting to perform stylometry on a large number of texts to create a synthetic dataset and train human-interpretable style representations.
arXiv Detail & Related papers (2023-05-22T04:07:54Z)
- ALADIN-NST: Self-supervised disentangled representation learning of artistic style through Neural Style Transfer [60.6863849241972]
We learn a representation of visual artistic style more strongly disentangled from the semantic content depicted in an image.
We show that strongly addressing the disentanglement of style and content leads to large gains in style-specific metrics.
arXiv Detail & Related papers (2023-04-12T10:33:18Z)
- Disentangling Writer and Character Styles for Handwriting Generation [8.33116145030684]
We present the style-disentangled Transformer (SDT), which employs two complementary contrastive objectives to extract the style commonalities of reference samples.
Our empirical findings reveal that the two learned style representations provide information at different frequency magnitudes.
arXiv Detail & Related papers (2023-03-26T14:32:02Z)
- Few-shot Font Generation by Learning Style Difference and Similarity [84.76381937516356]
We propose a novel font generation approach by learning the Difference between different styles and the Similarity of the same style (DS-Font).
Specifically, we propose a multi-layer style projector for style encoding and realize a distinctive style representation via our proposed Cluster-level Contrastive Style (CCS) loss.
arXiv Detail & Related papers (2023-01-24T13:57:25Z)
- Self-supervised Context-aware Style Representation for Expressive Speech Synthesis [23.460258571431414]
We propose a novel framework for learning style representation from plain text in a self-supervised manner.
It leverages an emotion lexicon and uses contrastive learning and deep clustering.
Our method achieves improved results according to subjective evaluations on both in-domain and out-of-domain test sets in audiobook speech.
arXiv Detail & Related papers (2022-06-25T05:29:48Z)
- Generating More Pertinent Captions by Leveraging Semantics and Style on Multi-Source Datasets [56.018551958004814]
This paper addresses the task of generating fluent descriptions by training on a non-uniform combination of data sources.
Large-scale datasets with noisy image-text pairs provide a sub-optimal source of supervision.
We propose to leverage and separate semantics and descriptive style through the incorporation of a style token and keywords extracted through a retrieval component.
arXiv Detail & Related papers (2021-11-24T19:00:05Z)
- Multi-Style Transfer with Discriminative Feedback on Disjoint Corpus [9.793194158416854]
Style transfer has been widely explored in natural language generation with non-parallel corpora.
A common shortcoming of existing approaches is the prerequisite of joint annotations across all the stylistic dimensions.
We show the ability of our model to control styles across multiple style dimensions while preserving content of the input text.
arXiv Detail & Related papers (2020-10-22T10:16:29Z)