Machine-in-the-Loop Rewriting for Creative Image Captioning
- URL: http://arxiv.org/abs/2111.04193v1
- Date: Sun, 7 Nov 2021 22:17:41 GMT
- Title: Machine-in-the-Loop Rewriting for Creative Image Captioning
- Authors: Vishakh Padmakumar, He He
- Abstract summary: We train a rewriting model that modifies specified spans of text within the user's original draft to introduce descriptive and figurative elements locally in the text.
We evaluate the model on its ability to collaborate with humans on the task of creative image captioning.
- Score: 5.544401446569243
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine-in-the-loop writing aims to enable humans to collaborate with models
to complete their writing tasks more effectively. Prior work has found that
providing humans a machine-written draft or sentence-level continuations has
limited success since the generated text tends to deviate from humans'
intention. To allow the user to retain control over the content, we train a
rewriting model that, when prompted, modifies specified spans of text within
the user's original draft to introduce descriptive and figurative elements
locally in the text. We evaluate the model on its ability to collaborate with
humans on the task of creative image captioning. In a user study on Amazon
Mechanical Turk, our model is rated as more helpful than a baseline
infilling language model. In addition, third-party evaluation shows that users
write more descriptive and figurative captions when collaborating with our
model compared to completing the task alone.
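To make the proposed interaction concrete, below is a minimal sketch of span-level rewriting with a sequence-to-sequence language model. The model name (t5-base), the <hl> span markers, the prompt wording, and the decoding settings are illustrative assumptions; the paper's actual rewriting model, training data, and interface may differ.

```python
# Minimal sketch of machine-in-the-loop span rewriting. Assumes a
# T5-style seq2seq model fine-tuned to rewrite a marked span; the
# model name, <hl> markers, prompt, and decoding settings are
# illustrative, not the paper's exact setup.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

def rewrite_span(draft: str, start: int, end: int) -> str:
    """Rewrite draft[start:end], leaving the rest of the draft intact."""
    # Mark the user-selected span so the model knows which text to rewrite.
    prompt = (
        "rewrite the highlighted span to be more descriptive and figurative: "
        f"{draft[:start]}<hl>{draft[start:end]}</hl>{draft[end:]}"
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=32, do_sample=True, top_p=0.9)
    new_span = tokenizer.decode(outputs[0], skip_special_tokens=True)
    # Splice the rewritten span back so the user keeps the rest of their draft.
    return draft[:start] + new_span + draft[end:]

caption = "A dog runs on the beach."
print(rewrite_span(caption, 2, 5))  # ask the model to rewrite "dog"
```

Because only the selected span is regenerated and spliced back, the user retains control over the rest of the draft, which is the design point the abstract emphasizes.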
Related papers
- GhostWriter: Augmenting Collaborative Human-AI Writing Experiences Through Personalization and Agency [1.6505331001136512]
GhostWriter is an AI-enhanced writing design probe that enables users to exercise enhanced agency and personalization.
We study 18 participants who use GhostWriter for editing and creative tasks, observing that it helps users craft personalized text.
arXiv Detail & Related papers (2024-02-13T23:48:59Z)
- Training-Free Consistent Text-to-Image Generation [80.4814768762066]
Text-to-image models can portray the same subject across diverse prompts.
Existing approaches fine-tune the model to teach it new words that describe specific user-provided subjects.
We present ConsiStory, a training-free approach that enables consistent subject generation by sharing the internal activations of the pretrained model.
arXiv Detail & Related papers (2024-02-05T18:42:34Z)
- Learning to Generate Text in Arbitrary Writing Styles [6.7308816341849695]
It is desirable for language models to produce text in an author-specific style on the basis of a potentially small writing sample.
We propose to guide a language model to generate text in a target style using contrastively-trained representations that capture stylometric features.
arXiv Detail & Related papers (2023-12-28T18:58:52Z)
- WordStylist: Styled Verbatim Handwritten Text Generation with Latent Diffusion Models [8.334487584550185]
We present a latent diffusion-based method for styled text-to-image generation at the word level, producing handwritten-word images with specified textual content.
Our proposed method is able to generate realistic word image samples in different writer styles.
We show that the proposed model produces samples that are aesthetically pleasing, help boost text recognition performance, and achieve writer retrieval scores similar to real data.
arXiv Detail & Related papers (2023-03-29T10:19:26Z)
- MOCHA: A Multi-Task Training Approach for Coherent Text Generation from Cognitive Perspective [22.69509556890676]
We propose a novel multi-task training strategy for coherent text generation grounded on the cognitive theory of writing.
We extensively evaluate our model on three open-ended generation tasks including story generation, news article writing and argument generation.
arXiv Detail & Related papers (2022-10-26T11:55:41Z)
- PART: Pre-trained Authorship Representation Transformer [64.78260098263489]
Authors writing documents imprint identifying information within their texts: vocabulary, register, punctuation, misspellings, or even emoji usage.
Previous works use hand-crafted features or classification tasks to train their authorship models, leading to poor performance on out-of-domain authors.
We propose a contrastively trained model that learns authorship embeddings instead of semantics.
arXiv Detail & Related papers (2022-09-30T11:08:39Z)
- On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization [89.94078728495423]
We show that recent advances in each modality, CLIP image representations and scaling of language models, do not consistently improve multimodal self-rationalization of tasks with multimodal inputs.
Our findings call for a backbone modelling approach that can be built on to advance text generation from images and text beyond image captioning.
arXiv Detail & Related papers (2022-05-24T00:52:40Z)
- TEMOS: Generating diverse human motions from textual descriptions [53.85978336198444]
We address the problem of generating diverse 3D human motions from textual descriptions.
We propose TEMOS, a text-conditioned generative model leveraging variational autoencoder (VAE) training with human motion data.
We show that the TEMOS framework can produce both skeleton-based animations, as in prior work, and more expressive SMPL body motions.
arXiv Detail & Related papers (2022-04-25T14:53:06Z)
- Generating More Pertinent Captions by Leveraging Semantics and Style on Multi-Source Datasets [56.018551958004814]
This paper addresses the task of generating fluent descriptions by training on a non-uniform combination of data sources.
Large-scale datasets with noisy image-text pairs provide a sub-optimal source of supervision.
We propose to leverage and separate semantics and descriptive style through the incorporation of a style token and keywords extracted through a retrieval component.
arXiv Detail & Related papers (2021-11-24T19:00:05Z)
- Improving Generation and Evaluation of Visual Stories via Semantic Consistency [72.00815192668193]
Given a series of natural language captions, an agent must generate a sequence of images that correspond to the captions.
Prior work has introduced recurrent generative models which outperform text-to-image synthesis models on this task.
We present a number of improvements to prior modeling approaches, including the addition of a dual learning framework.
arXiv Detail & Related papers (2021-05-20T20:42:42Z)
- Collaborative Storytelling with Large-scale Neural Language Models [6.0794985566317425]
We introduce the task of collaborative storytelling, where an artificial intelligence agent and a person collaborate to create a unique story by taking turns adding to it.
We present a collaborative storytelling system which works with a human storyteller to create a story by generating new utterances based on the story so far; a minimal turn-taking sketch follows this list.
arXiv Detail & Related papers (2020-11-20T04:36:54Z)
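To make the turn-taking setup of the collaborative storytelling entry concrete, here is a minimal sketch using an off-the-shelf causal language model. The model choice (gpt2), decoding settings, and turn protocol are illustrative assumptions, not the cited system's actual implementation.

```python
# Minimal sketch of turn-based collaborative storytelling with a causal
# language model. The model (gpt2), decoding settings, and turn protocol
# are illustrative assumptions, not the cited system's implementation.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def model_turn(story_so_far: str) -> str:
    """Generate the model's next utterance conditioned on the story so far."""
    inputs = tokenizer(story_so_far, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_new_tokens=40,
        do_sample=True,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Keep only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

story = "Once upon a time, a lighthouse keeper found a strange map."
for human_turn in ["She hid it from the sailors.", "At midnight she set sail."]:
    story += " " + model_turn(story)  # the model adds an utterance
    story += " " + human_turn         # then the human takes a turn
print(story)
```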
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.