Style-Aware Contrastive Learning for Multi-Style Image Captioning
- URL: http://arxiv.org/abs/2301.11367v1
- Date: Thu, 26 Jan 2023 19:21:39 GMT
- Title: Style-Aware Contrastive Learning for Multi-Style Image Captioning
- Authors: Yucheng Zhou, Guodong Long
- Abstract summary: We present a style-aware visual encoder with contrastive learning to mine potential visual content relevant to style.
We also propose a style-aware triplet contrast objective to distinguish whether the image, style, and caption are matched.
Experimental results demonstrate that our approach achieves state-of-the-art performance.
- Score: 36.1319565907582
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing multi-style image captioning methods show promising results in
generating a caption with accurate visual content and desired linguistic style.
However, existing methods overlook the relationship between linguistic style
and visual content. To overcome this drawback, we propose style-aware
contrastive learning for multi-style image captioning. First, we present a
style-aware visual encoder with contrastive learning to mine potential visual
content relevant to style. Moreover, we propose a style-aware triplet contrast
objective to distinguish whether the image, style, and caption are matched. To
provide positive and negative samples for contrastive learning, we present
three retrieval schemes: object-based retrieval, RoI-based retrieval and
triplet-based retrieval, and design a dynamic trade-off function to calculate
retrieval scores. Experimental results demonstrate that our approach achieves
state-of-the-art performance. In addition, we conduct an extensive analysis to
verify the effectiveness of our method.
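As a rough illustration of the triplet contrast objective described in the abstract, the sketch below scores matched (image, style, caption) triplets against in-batch mismatches with an InfoNCE-style loss. The encoder interfaces, the additive image-style fusion, and the temperature value are assumptions made for illustration; the paper's exact formulation is not given here.

```python
# Hypothetical sketch of a style-aware triplet (image, style, caption) contrast.
# Fusion scheme, temperature, and encoder interfaces are assumptions; the
# paper's actual objective may differ.
import torch
import torch.nn.functional as F

def style_aware_triplet_contrast(img_emb, style_emb, cap_emb, temperature=0.07):
    """InfoNCE-style loss over matched (image, style, caption) triplets.

    img_emb, style_emb, cap_emb: (batch, dim) embeddings from separate encoders.
    Matched triplets share the same row index; every other row in the batch
    acts as a negative (a mismatched image/style/caption combination).
    """
    # Fuse image and style into a single query vector (simple sum here;
    # the real model may use attention or a learned projection).
    query = F.normalize(img_emb + style_emb, dim=-1)
    key = F.normalize(cap_emb, dim=-1)

    # Similarity of every (image, style) query to every caption in the batch.
    logits = query @ key.t() / temperature          # (batch, batch)
    targets = torch.arange(logits.size(0), device=logits.device)

    # Symmetric cross-entropy: pull matched triplets together, push others apart.
    loss_q2c = F.cross_entropy(logits, targets)
    loss_c2q = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_q2c + loss_c2q)
```

In use, (batch, dim) tensors from a style-aware visual encoder, a style embedding, and a caption encoder would be passed in; negatives could equally come from the retrieval schemes the abstract mentions rather than from the in-batch pairing assumed here.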
Related papers
- ALADIN-NST: Self-supervised disentangled representation learning of artistic style through Neural Style Transfer [60.6863849241972]
We learn a representation of visual artistic style more strongly disentangled from the semantic content depicted in an image.
We show that strongly addressing the disentanglement of style and content leads to large gains in style-specific metrics.
arXiv Detail & Related papers (2023-04-12T10:33:18Z)
- Learning Visual Representations via Language-Guided Sampling [25.117909306792324]
We use language similarity to sample semantically similar image pairs for contrastive learning.
Our approach diverges from image-based contrastive learning by sampling view pairs using language similarity.
We show that language-guided learning yields better features than image-based and image-text representation learning approaches.
arXiv Detail & Related papers (2023-02-23T18:59:05Z)
- Domain Enhanced Arbitrary Image Style Transfer via Contrastive Learning [84.8813842101747]
Contrastive Arbitrary Style Transfer (CAST) is a new style representation learning and style transfer method via contrastive learning.
Our framework consists of three key components, i.e., a multi-layer style projector for style code encoding, a domain enhancement module for effective learning of style distribution, and a generative network for image style transfer.
arXiv Detail & Related papers (2022-05-19T13:11:24Z)
- Deep Learning Approaches on Image Captioning: A Review [0.5852077003870417]
Image captioning aims to generate natural language descriptions for visual content in the form of still images.
Deep learning and vision-language pre-training techniques have revolutionized the field, leading to more sophisticated methods and improved performance.
We address the challenges faced in this field by emphasizing issues such as object hallucination, missing context, illumination conditions, contextual understanding, and referring expressions.
We identify several potential future directions for research in this area, which include tackling the information misalignment problem between image and text modalities, mitigating dataset bias, incorporating vision-language pre-training methods to enhance caption generation, and developing improved evaluation tools to accurately measure the quality of generated captions.
arXiv Detail & Related papers (2022-01-31T00:39:37Z)
- Generating More Pertinent Captions by Leveraging Semantics and Style on Multi-Source Datasets [56.018551958004814]
This paper addresses the task of generating fluent descriptions by training on a non-uniform combination of data sources.
Large-scale datasets with noisy image-text pairs provide a sub-optimal source of supervision.
We propose to leverage and separate semantics and descriptive style through the incorporation of a style token and keywords extracted through a retrieval component.
arXiv Detail & Related papers (2021-11-24T19:00:05Z)
- STALP: Style Transfer with Auxiliary Limited Pairing [36.23393954839379]
We present an approach to example-based stylization of images that uses a single pair of a source image and its stylized counterpart.
We demonstrate how to train an image translation network that can perform real-time semantically meaningful style transfer to a set of target images.
arXiv Detail & Related papers (2021-10-20T11:38:41Z)
- Language-Driven Image Style Transfer [72.36790598245096]
We introduce a new task -- language-driven image style transfer (LDIST) -- to manipulate the style of a content image guided by a text description.
The discriminator considers the correlation between language and patches of style images or transferred results to jointly embed style instructions.
Experiments show that our CLVA is effective and achieves superb transfer results on LDIST.
arXiv Detail & Related papers (2021-06-01T01:58:50Z)
- Matching Visual Features to Hierarchical Semantic Topics for Image Paragraph Captioning [50.08729005865331]
This paper develops a plug-and-play hierarchical-topic-guided image paragraph generation framework.
To capture the correlations between the image and text at multiple levels of abstraction, we design a variational inference network.
To guide the paragraph generation, the learned hierarchical topics and visual features are integrated into the language model.
arXiv Detail & Related papers (2021-05-10T06:55:39Z)