StyleDistance: Stronger Content-Independent Style Embeddings with Synthetic Parallel Examples
- URL: http://arxiv.org/abs/2410.12757v2
- Date: Sat, 08 Feb 2025 21:45:04 GMT
- Title: StyleDistance: Stronger Content-Independent Style Embeddings with Synthetic Parallel Examples
- Authors: Ajay Patel, Jiacheng Zhu, Justin Qiu, Zachary Horvitz, Marianna Apidianaki, Kathleen McKeown, Chris Callison-Burch
- Abstract summary: Style representations aim to embed texts with similar writing styles closely and texts with different styles far apart, regardless of content.
We introduce StyleDistance, a novel approach to training stronger content-independent style embeddings.
- Score: 48.44036251656947
- Abstract: Style representations aim to embed texts with similar writing styles closely and texts with different styles far apart, regardless of content. However, the contrastive triplets often used for training these representations may vary in both style and content, leading to potential content leakage in the representations. We introduce StyleDistance, a novel approach to training stronger content-independent style embeddings. We use a large language model to create a synthetic dataset of near-exact paraphrases with controlled style variations, and produce positive and negative examples across 40 distinct style features for precise contrastive learning. We assess the quality of our synthetic data and embeddings through human and automatic evaluations. StyleDistance enhances the content-independence of style embeddings, which generalize to real-world benchmarks and outperform leading style representations in downstream applications. Our model can be found at https://huggingface.co/StyleDistance/styledistance.
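The abstract describes standard triplet-style contrastive training over synthetic paraphrase triplets. The sketch below illustrates that setup; it is not the authors' training code, and the base encoder (roberta-base), the mean pooling, the margin value, and the example sentences are all assumptions made for illustration.

```python
# Minimal sketch of the contrastive setup described in the abstract,
# not the authors' training code. Model choice (roberta-base), mean
# pooling, and the margin are assumptions for illustration.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
encoder = AutoModel.from_pretrained("roberta-base")

def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state          # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)         # (B, T, 1)
    return (hidden * mask).sum(1) / mask.sum(1)          # mean pooling -> (B, H)

# One synthetic triplet: the anchor and positive share a style feature
# (here: formality) but not content; the negative is a near-exact
# paraphrase of the anchor with the feature flipped, which is what
# pushes the embedding to ignore content. Texts are invented examples.
anchor   = "I regret to inform you that the meeting is cancelled."
positive = "We kindly request that you submit the report by Friday."
negative = "heads up, meeting's off lol"

a, p, n = embed([anchor]), embed([positive]), embed([negative])
loss = F.triplet_margin_loss(a, p, n, margin=1.0)
loss.backward()  # gradient of one contrastive step w.r.t. encoder weights
```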
Related papers
- ReStyle3D: Scene-Level Appearance Transfer with Semantic Correspondences [33.06053818091165]
ReStyle3D is a framework for scene-level appearance transfer from a single style image to a real-world scene represented by multiple views.
It combines explicit semantic correspondences with multi-view consistency to achieve precise and coherent stylization.
Our code, pretrained models, and dataset will be publicly released to support new applications in interior design, virtual staging, and 3D-consistent stylization.
arXiv Detail & Related papers (2025-02-14T18:54:21Z)
- StyleBlend: Enhancing Style-Specific Content Creation in Text-to-Image Diffusion Models [10.685779311280266]
StyleBlend is a method designed to learn and apply style representations from a limited set of reference images.
Our approach decomposes style into two components, composition and texture, each learned through different strategies.
arXiv Detail & Related papers (2025-02-13T08:26:54Z)
- Isolating authorship from content with semantic embeddings and contrastive learning [49.15148871877941]
Authorship representations entangle style and content.
We present a technique that applies contrastive learning with additional hard negatives synthetically created using a semantic similarity model (a minimal sketch of this mining step follows the list below).
This disentanglement technique aims to distance the content embedding space from the style embedding space, leading to embeddings more informed by style.
arXiv Detail & Related papers (2024-11-27T16:08:46Z)
- Learning Interpretable Style Embeddings via Prompting LLMs [46.74488355350601]
Style representation learning builds content-independent representations of author style in text.
Current style representation learning uses neural methods to disentangle style from content to create style vectors.
We use prompting to perform stylometry on a large number of texts to create a synthetic dataset and train human-interpretable style representations.
arXiv Detail & Related papers (2023-05-22T04:07:54Z)
- ALADIN-NST: Self-supervised disentangled representation learning of artistic style through Neural Style Transfer [60.6863849241972]
We learn a representation of visual artistic style more strongly disentangled from the semantic content depicted in an image.
We show that strongly addressing the disentanglement of style and content leads to large gains in style-specific metrics.
arXiv Detail & Related papers (2023-04-12T10:33:18Z)
- Few-shot Font Generation by Learning Style Difference and Similarity [84.76381937516356]
We propose a novel font generation approach by learning the Difference between different styles and the Similarity of the same style (DS-Font).
Specifically, we propose a multi-layer style projector for style encoding and realize a distinctive style representation via our proposed Cluster-level Contrastive Style (CCS) loss.
arXiv Detail & Related papers (2023-01-24T13:57:25Z)
- Generating More Pertinent Captions by Leveraging Semantics and Style on Multi-Source Datasets [56.018551958004814]
This paper addresses the task of generating fluent descriptions by training on a non-uniform combination of data sources.
Large-scale datasets with noisy image-text pairs provide a sub-optimal source of supervision.
We propose to leverage and separate semantics and descriptive style through the incorporation of a style token and keywords extracted through a retrieval component.
arXiv Detail & Related papers (2021-11-24T19:00:05Z)
- Multi-Style Transfer with Discriminative Feedback on Disjoint Corpus [9.793194158416854]
Style transfer has been widely explored in natural language generation with non-parallel corpora.
A common shortcoming of existing approaches is the prerequisite of joint annotations across all the stylistic dimensions.
We show the ability of our model to control styles across multiple style dimensions while preserving the content of the input text.
arXiv Detail & Related papers (2020-10-22T10:16:29Z)
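The authorship-disentanglement entry above builds its contrastive negatives by searching for semantically close texts. Below is a minimal sketch of that mining step under assumed tooling: the sentence-transformers library, the all-MiniLM-L6-v2 checkpoint, and the top-k of 3 are illustrative choices, not taken from that paper.

```python
# Hedged sketch of mining hard negatives with an off-the-shelf semantic
# similarity model; checkpoint and top_k are assumptions, not the paper's.
from sentence_transformers import SentenceTransformer, util

sim_model = SentenceTransformer("all-MiniLM-L6-v2")

def mine_hard_negatives(anchor, candidates, top_k=3):
    """Return the candidates most semantically similar to the anchor.

    Used as contrastive negatives, such texts share content with the
    anchor, forcing a style encoder to separate them on style alone.
    """
    emb_a = sim_model.encode(anchor, convert_to_tensor=True)
    emb_c = sim_model.encode(candidates, convert_to_tensor=True)
    scores = util.cos_sim(emb_a, emb_c)[0]               # (len(candidates),)
    top = scores.topk(min(top_k, len(candidates)))
    return [(candidates[int(i)], float(scores[i])) for i in top.indices]
```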