Affective Feedback Synthesis Towards Multimodal Text and Image Data
- URL: http://arxiv.org/abs/2203.12692v1
- Date: Wed, 23 Mar 2022 19:28:20 GMT
- Title: Affective Feedback Synthesis Towards Multimodal Text and Image Data
- Authors: Puneet Kumar, Gaurav Bhat, Omkar Ingle, Daksh Goyal and
Balasubramanian Raman
- Abstract summary: We have defined a novel task of affective feedback synthesis that deals with generating feedback for input text & corresponding image.
A feedback synthesis system has been proposed and trained using ground-truth human comments along with image-text input.
The generated feedback has been analyzed using automatic and human evaluation.
- Score: 12.768277167508208
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we define a novel task of affective feedback synthesis,
which deals with generating feedback for input text and a corresponding image in a
way similar to how humans respond to multimodal data. A feedback synthesis
system has been proposed and trained using ground-truth human comments along
with image-text input. We have also constructed a large-scale dataset
consisting of images, text, Twitter user comments, and the number of likes for
the comments by crawling news articles through Twitter feeds. The proposed
system extracts textual features using a transformer-based textual encoder
while the visual features are extracted using a Faster region-based
convolutional neural network (Faster R-CNN) model. The textual and visual features are
concatenated to construct the multimodal features, from which the decoder
synthesizes the feedback. We have compared the results of the proposed system
with the baseline models using quantitative and qualitative measures. The
generated feedback has been analyzed using automatic and human evaluation.
It has been found to be semantically similar to the ground-truth comments
and relevant to the given text-image input.
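
The fusion step described in the abstract (textual and visual features concatenated into multimodal features) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how per-token and per-region features are combined before concatenation, so mean pooling is an assumption made here for illustration, and the names `mean_pool` and `fuse` are hypothetical. Plain Python lists stand in for the outputs of the transformer encoder and the Faster R-CNN model.

```python
from typing import List

Vector = List[float]


def mean_pool(features: List[Vector]) -> Vector:
    """Average equal-length feature vectors into one vector (assumed pooling)."""
    if not features:
        raise ValueError("need at least one feature vector")
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all feature vectors must have the same length")
    n = len(features)
    return [sum(f[i] for f in features) / n for i in range(dim)]


def fuse(text_features: List[Vector], region_features: List[Vector]) -> Vector:
    """Concatenate pooled textual and pooled visual features into one vector."""
    return mean_pool(text_features) + mean_pool(region_features)


# Toy stand-ins: 2 token vectors of size 2, and 2 region vectors of size 3.
text_features = [[1.0, 3.0], [3.0, 5.0]]
region_features = [[0.0, 2.0, 4.0], [2.0, 4.0, 6.0]]

multimodal = fuse(text_features, region_features)
print(multimodal)  # [2.0, 4.0, 1.0, 3.0, 5.0]
```

The fused vector has length equal to the sum of the two feature sizes; in the described system a decoder would consume such multimodal features to generate the feedback text.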
Related papers
- TypeScore: A Text Fidelity Metric for Text-to-Image Generative Models [39.06617653124486]
We introduce a new evaluation framework called TypeScore to assess a model's ability to generate images with high-fidelity embedded text.
Our proposed metric demonstrates greater resolution than CLIPScore to differentiate popular image generation models.
arXiv Detail & Related papers (2024-11-02T07:56:54Z)
- Synthesizing Sentiment-Controlled Feedback For Multimodal Text and Image Data [21.247650660908484]
We have constructed a large-scale Controllable Multimodal Feedback Synthesis dataset and propose a controllable feedback synthesis system.
The system features an encoder, decoder, and controllability block for textual and visual inputs.
The CMFeed dataset includes images, texts, reactions to the posts, human comments with relevance scores, and reactions to these comments.
These reactions train the model to produce feedback with specified sentiments, achieving a sentiment classification accuracy of 77.23%, which is 18.82% higher than the accuracy without controllability.
arXiv Detail & Related papers (2024-02-12T13:27:22Z)
- Seek for Incantations: Towards Accurate Text-to-Image Diffusion Synthesis through Prompt Engineering [118.53208190209517]
We propose a framework to learn the proper textual descriptions for diffusion models through prompt learning.
Our method can effectively learn the prompts to improve the matches between the input text and the generated images.
arXiv Detail & Related papers (2024-01-12T03:46:29Z)
- Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models [63.99110667987318]
We present DiffText, a pipeline that seamlessly blends foreground text with the background's intrinsic features.
With fewer text instances, our produced text images consistently surpass other synthetic data in aiding text detectors.
arXiv Detail & Related papers (2023-11-28T06:51:28Z)
- UltraFeedback: Boosting Language Models with Scaled AI Feedback [99.4633351133207]
We present UltraFeedback, a large-scale, high-quality, and diversified AI feedback dataset.
Our work validates the effectiveness of scaled AI feedback data in constructing strong open-source chat language models.
arXiv Detail & Related papers (2023-10-02T17:40:01Z)
- WordStylist: Styled Verbatim Handwritten Text Generation with Latent Diffusion Models [8.334487584550185]
We present a latent diffusion-based method for styled text-to-text-content-image generation at the word level.
Our proposed method is able to generate realistic word image samples from different writer styles.
We show that the proposed model produces samples that are aesthetically pleasing, help boost text recognition performance, and achieve writer retrieval scores similar to those of real data.
arXiv Detail & Related papers (2023-03-29T10:19:26Z)
- Aligning Text-to-Image Models using Human Feedback [104.76638092169604]
Current text-to-image models often generate images that are inadequately aligned with text prompts.
We propose a fine-tuning method for aligning such models using human feedback.
Our results demonstrate the potential for learning from human feedback to significantly improve text-to-image models.
arXiv Detail & Related papers (2023-02-23T17:34:53Z)
- Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization [81.26077816854449]
We first explore the use of constituency parse trees for encoding structured input.
Second, we augment the structured input with commonsense information and study the impact of this external knowledge on the generation of visual stories.
Third, we incorporate visual structure via bounding boxes and dense captioning to provide feedback about the characters/objects in generated images.
arXiv Detail & Related papers (2021-10-21T00:16:02Z)
- Improving Generation and Evaluation of Visual Stories via Semantic Consistency [72.00815192668193]
Given a series of natural language captions, an agent must generate a sequence of images that correspond to the captions.
Prior work has introduced recurrent generative models which outperform synthesis text-to-image models on this task.
We present a number of improvements to prior modeling approaches, including the addition of a dual learning framework.
arXiv Detail & Related papers (2021-05-20T20:42:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.