Learning Interpretable Style Embeddings via Prompting LLMs
- URL: http://arxiv.org/abs/2305.12696v2
- Date: Mon, 9 Oct 2023 19:20:32 GMT
- Title: Learning Interpretable Style Embeddings via Prompting LLMs
- Authors: Ajay Patel, Delip Rao, Ansh Kothary, Kathleen McKeown, Chris
Callison-Burch
- Abstract summary: Style representation learning builds content-independent representations of author style in text.
Current style representation learning uses neural methods to disentangle style from content to create style vectors.
We use prompting to perform stylometry on a large number of texts to create a synthetic dataset and train human-interpretable style representations.
- Score: 46.74488355350601
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Style representation learning builds content-independent representations of
author style in text. Stylometry, the analysis of style in text, is often
performed by expert forensic linguists and no large dataset of stylometric
annotations exists for training. Current style representation learning uses
neural methods to disentangle style from content to create style vectors.
However, these approaches yield uninterpretable representations,
complicating their use in downstream applications like authorship attribution,
where auditing and explainability are critical. In this work, we use prompting
to perform stylometry on a large number of texts to create a synthetic dataset
and train human-interpretable style representations we call LISA embeddings. We
release our synthetic stylometry dataset and our interpretable style models as
resources.
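The abstract describes prompting an LLM with stylometric questions and using the answers as dimensions of an interpretable style vector. A minimal sketch of that workflow follows; the question set, the `ask_llm` function (here a trivial heuristic stand-in for a real LLM call), and the vector format are all illustrative assumptions, not the paper's actual prompts or models.

```python
# Sketch of prompting-based stylometry: each dimension of the style
# vector answers one human-readable stylometric question, which is what
# makes the resulting embedding interpretable.

STYLE_QUESTIONS = [
    "Does the author use contractions?",
    "Does the author write long sentences?",
    "Does the author use exclamation marks?",
]

def ask_llm(question: str, text: str) -> bool:
    """Stand-in for a real LLM call; a toy heuristic 'annotator'."""
    if "contractions" in question:
        return "'" in text
    if "long sentences" in question:
        sentences = [s for s in text.split(".") if s.strip()]
        return max(len(s.split()) for s in sentences) > 20
    if "exclamation" in question:
        return "!" in text
    return False

def style_vector(text: str) -> list[int]:
    """One interpretable 0/1 dimension per stylometric question."""
    return [int(ask_llm(q, text)) for q in STYLE_QUESTIONS]

print(style_vector("Don't worry! It's fine."))  # [1, 0, 1]
```

Because every dimension corresponds to a named question, a downstream user can audit exactly which stylistic attributes drove an authorship-attribution decision.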
Related papers
- StyleDistance: Stronger Content-Independent Style Embeddings with Synthetic Parallel Examples [48.44036251656947]
Style representations aim to embed texts with similar writing styles closely and texts with different styles far apart, regardless of content.
We introduce StyleDistance, a novel approach to training stronger content-independent style embeddings.
arXiv Detail & Related papers (2024-10-16T17:25:25Z)
- Capturing Style in Author and Document Representation [4.323709559692927]
We propose a new architecture that learns embeddings for both authors and documents with a stylistic constraint.
We evaluate our method on three datasets: a literary corpus extracted from the Gutenberg Project, the Blog Authorship Corpus, and IMDb62.
arXiv Detail & Related papers (2024-07-18T10:01:09Z)
- Learning to Generate Text in Arbitrary Writing Styles [6.7308816341849695]
It is desirable for language models to produce text in an author-specific style on the basis of a potentially small writing sample.
We propose to guide a language model to generate text in a target style using contrastively-trained representations that capture stylometric features.
arXiv Detail & Related papers (2023-12-28T18:58:52Z)
- ALADIN-NST: Self-supervised disentangled representation learning of artistic style through Neural Style Transfer [60.6863849241972]
We learn a representation of visual artistic style more strongly disentangled from the semantic content depicted in an image.
We show that strongly addressing the disentanglement of style and content leads to large gains in style-specific metrics.
arXiv Detail & Related papers (2023-04-12T10:33:18Z)
- StylerDALLE: Language-Guided Style Transfer Using a Vector-Quantized Tokenizer of a Large-Scale Generative Model [64.26721402514957]
We propose StylerDALLE, a style transfer method that uses natural language to describe abstract art styles.
Specifically, we formulate the language-guided style transfer task as a non-autoregressive token sequence translation.
To incorporate style information, we propose a Reinforcement Learning strategy with CLIP-based language supervision.
arXiv Detail & Related papers (2023-03-16T12:44:44Z)
- Unsupervised Neural Stylistic Text Generation using Transfer Learning and Adapters [66.17039929803933]
We propose a novel transfer learning framework which updates only 0.3% of model parameters to learn style-specific attributes for response generation.
We learn style-specific attributes from the PERSONALITY-CAPTIONS dataset.
arXiv Detail & Related papers (2022-10-07T00:09:22Z)
- Generating More Pertinent Captions by Leveraging Semantics and Style on Multi-Source Datasets [56.018551958004814]
This paper addresses the task of generating fluent descriptions by training on a non-uniform combination of data sources.
Large-scale datasets with noisy image-text pairs provide a sub-optimal source of supervision.
We propose to leverage and separate semantics and descriptive style through the incorporation of a style token and keywords extracted through a retrieval component.
arXiv Detail & Related papers (2021-11-24T19:00:05Z)
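Several of the papers above (e.g., StyleDistance and "Learning to Generate Text in Arbitrary Writing Styles") rely on contrastively-trained style representations, where same-style texts embed close together and different-style texts embed far apart. A minimal sketch of the triplet objective commonly used for such training follows; it is illustrative only and not any specific paper's loss function.

```python
import math

def triplet_style_loss(anchor, positive, negative, margin=0.5):
    """Triplet objective over style embeddings: penalize cases where the
    anchor is not closer to a same-style text (positive) than to a
    different-style text (negative) by at least the margin."""
    d_pos = math.dist(anchor, positive)  # distance to same-style text
    d_neg = math.dist(anchor, negative)  # distance to different-style text
    return max(0.0, d_pos - d_neg + margin)

# Well-separated embeddings incur zero loss:
print(triplet_style_loss([0.0, 0.0], [0.0, 0.1], [1.0, 0.0]))  # 0.0
```

Training on synthetic parallel examples, as StyleDistance does, gives pairs that share content but differ in style, which helps the learned embeddings ignore content entirely.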
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.