Contextualizing Variation in Text Style Transfer Datasets
- URL: http://arxiv.org/abs/2108.07871v1
- Date: Tue, 17 Aug 2021 20:54:24 GMT
- Title: Contextualizing Variation in Text Style Transfer Datasets
- Authors: Stephanie Schoch, Wanyu Du, Yangfeng Ji
- Abstract summary: We conduct several empirical analyses of existing text style datasets.
We propose a categorization of stylistic and dataset properties to consider when utilizing or comparing text style datasets.
- Score: 8.978727939776329
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text style transfer involves rewriting the content of a source sentence in a
target style. Despite there being a number of style tasks with available data,
there has been limited systematic discussion of how text style datasets relate
to each other. This understanding, however, is likely to have implications for
selecting multiple data sources for model training. While it is prudent to
consider inherent stylistic properties when determining these relationships, we
also must consider how a style is realized in a particular dataset. In this
paper, we conduct several empirical analyses of existing text style datasets.
Based on our results, we propose a categorization of stylistic and dataset
properties to consider when utilizing or comparing text style datasets.
Related papers
- StyleDistance: Stronger Content-Independent Style Embeddings with Synthetic Parallel Examples [48.44036251656947]
Style representations aim to embed texts with similar writing styles closely and texts with different styles far apart, regardless of content.
We introduce StyleDistance, a novel approach to training stronger content-independent style embeddings.
Date: 2024-10-16T17:25:25Z
- Measuring Style Similarity in Diffusion Models [118.22433042873136]
We present a framework for understanding and extracting style descriptors from images.
Our framework comprises a new dataset curated using the insight that style is a subjective property of an image.
We also propose a method to extract style descriptors that can be used to attribute the style of a generated image to the images used in the training dataset of a text-to-image model.
Date: 2024-04-01T17:58:30Z
- Analyzing Font Style Usage and Contextual Factors in Real Images [12.387676601792899]
This paper analyzes the relationship between font styles and contextual factors that might affect font style selection with large-scale datasets.
We will analyze the relationship between font style and its surrounding object (such as "bus") by using about 800,000 words in the Open Images dataset.
Date: 2023-06-21T06:43:22Z
- Learning Interpretable Style Embeddings via Prompting LLMs [46.74488355350601]
Style representation learning builds content-independent representations of author style in text.
Current style representation learning uses neural methods to disentangle style from content to create style vectors.
We use prompting to perform stylometry on a large number of texts to create a synthetic dataset and train human-interpretable style representations.
Date: 2023-05-22T04:07:54Z
- Stylized Data-to-Text Generation: A Case Study in the E-Commerce Domain [53.22419717434372]
We propose a new task, namely stylized data-to-text generation, whose aim is to generate coherent text according to a specific style.
This task is non-trivial due to three challenges: the logic of the generated text, unstructured style references, and biased training samples.
We propose a novel stylized data-to-text generation model, named StyleD2T, comprising three components: logic planning-enhanced data embedding, mask-based style embedding, and unbiased stylized text generation.
Date: 2023-05-05T03:02:41Z
- Generating More Pertinent Captions by Leveraging Semantics and Style on Multi-Source Datasets [56.018551958004814]
This paper addresses the task of generating fluent descriptions by training on a non-uniform combination of data sources.
Large-scale datasets with noisy image-text pairs provide a sub-optimal source of supervision.
We propose to leverage and separate semantics and descriptive style through the incorporation of a style token and keywords extracted through a retrieval component.
Date: 2021-11-24T19:00:05Z
- Contextual Text Style Transfer [73.66285813595616]
Contextual Text Style Transfer aims to translate a sentence into a desired style with its surrounding context taken into account.
We propose a Context-Aware Style Transfer (CAST) model, which uses two separate encoders: one for the input sentence and one for its surrounding context.
Two new benchmarks, Enron-Context and Reddit-Context, are introduced for formality and offensiveness style transfer.
Date: 2020-04-30T23:01:12Z
- Learning to Select Bi-Aspect Information for Document-Scale Text Content Manipulation [50.01708049531156]
We focus on a new practical task, document-scale text content manipulation, which is the opposite of text style transfer.
In detail, the input is a set of structured records and a reference text for describing another recordset.
The output is a summary that accurately describes the partial content in the source recordset with the same writing style of the reference.
Date: 2020-02-24T12:52:10Z