Crossing You in Style: Cross-modal Style Transfer from Music to Visual Arts
- URL: http://arxiv.org/abs/2009.08083v1
- Date: Thu, 17 Sep 2020 05:58:13 GMT
- Title: Crossing You in Style: Cross-modal Style Transfer from Music to Visual Arts
- Authors: Cheng-Che Lee, Wan-Yi Lin, Yen-Ting Shih, Pei-Yi Patricia Kuo, Li Su
- Abstract summary: Music-to-visual style transfer is a challenging yet important cross-modal learning problem in the practice of creativity.
We solve the music-to-visual style transfer problem in two steps: music visualization and style transfer.
Experiments are conducted on WikiArt-IMSLP, a dataset including Western music recordings and paintings listed by decades.
- Score: 11.96629917390208
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Music-to-visual style transfer is a challenging yet important cross-modal
learning problem in the practice of creativity. Its major difference from the
traditional image style transfer problem is that the style information is
provided by music rather than images. Assuming that musical features can be
properly mapped to visual contents through semantic links between the two
domains, we solve the music-to-visual style transfer problem in two steps:
music visualization and style transfer. The music visualization network
utilizes an encoder-generator architecture with a conditional generative
adversarial network to generate image-based music representations from music
data. This network is integrated with an image style transfer method to
accomplish the style transfer process. Experiments are conducted on
WikiArt-IMSLP, a newly compiled dataset including Western music recordings and
paintings listed by decades. By utilizing such a label to learn the semantic
connection between paintings and music, we demonstrate that the proposed
framework can generate diverse image style representations from a music piece,
and these representations can unveil certain art forms of the same era.
Subjective testing results also emphasize the role of the era label in
improving the perceptual quality on the compatibility between music and visual
content.
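To make the two-step pipeline above more concrete, the following PyTorch-style sketch shows one way a conditional encoder-generator could map a music clip and an era label to an image-based style representation, which is then handed to an off-the-shelf image style transfer routine. All module names, layer sizes, and the 64x64 output resolution are illustrative assumptions, not the authors' actual architecture.

```python
import torch
import torch.nn as nn

# Minimal sketch of the two-step pipeline (assumed shapes and module names).
# Step 1: a conditional generator maps music features + era label to an
#         image-like "music visualization" that carries style information.
# Step 2: any off-the-shelf image style transfer method treats that
#         visualization as the style reference for the content image.

class MusicEncoder(nn.Module):
    def __init__(self, n_mels=128, latent_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_mels, 256, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.Conv1d(256, latent_dim, kernel_size=5, stride=2, padding=2),
            nn.AdaptiveAvgPool1d(1),          # pool over time -> one vector per clip
        )

    def forward(self, mel):                   # mel: (B, n_mels, T)
        return self.net(mel).squeeze(-1)      # (B, latent_dim)


class ConditionalGenerator(nn.Module):
    """Generates a 64x64 image-based music representation, conditioned on era."""
    def __init__(self, latent_dim=256, n_eras=10):
        super().__init__()
        self.era_embed = nn.Embedding(n_eras, 64)
        self.fc = nn.Linear(latent_dim + 64, 128 * 8 * 8)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, z, era):                # z: (B, latent_dim), era: (B,) long
        h = torch.cat([z, self.era_embed(era)], dim=1)
        h = self.fc(h).view(-1, 128, 8, 8)
        return self.deconv(h)                 # (B, 3, 64, 64) style image


# Usage: visualize a music clip, then hand the result to a style-transfer routine.
encoder, generator = MusicEncoder(), ConditionalGenerator()
mel = torch.randn(1, 128, 400)                # placeholder mel-spectrogram
era = torch.tensor([3])                       # placeholder decade label
style_image = generator(encoder(mel), era)
# stylized = some_style_transfer(content_image, style_image)  # step 2 (not shown)
```

In the paper's setting, the generator would presumably be trained adversarially against a conditional discriminator on WikiArt-IMSLP pairs sharing a decade label; the adversarial training and the style transfer step are omitted here for brevity.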
Related papers
- MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization [52.498942604622165]
This paper presents MuVi, a framework to generate music that aligns with video content.
MuVi analyzes video content through a specially designed visual adaptor to extract contextually and temporally relevant features.
We show that MuVi demonstrates superior performance in both audio quality and temporal synchronization.
arXiv Detail & Related papers (2024-10-16T18:44:56Z)
- Bridging Paintings and Music -- Exploring Emotion based Music Generation through Paintings [10.302353984541497]
This research develops a model capable of generating music that resonates with the emotions depicted in visual arts.
Addressing the scarcity of aligned art and music data, we curated the Emotion Painting Music dataset.
Our dual-stage framework converts images to text descriptions of emotional content and then transforms these descriptions into music, facilitating efficient learning with minimal data.
arXiv Detail & Related papers (2024-09-12T08:19:25Z)
- DiffStyler: Controllable Dual Diffusion for Text-Driven Image Stylization [66.42741426640633]
DiffStyler is a dual diffusion processing architecture to control the balance between the content and style of diffused results.
We propose a content image-based learnable noise on which the reverse denoising process is based, enabling the stylization results to better preserve the structure information of the content image.
arXiv Detail & Related papers (2022-11-19T12:30:44Z)
- Vis2Mus: Exploring Multimodal Representation Mapping for Controllable Music Generation [11.140337453072311]
We explore the representation mapping from the domain of visual arts to the domain of music.
We adopt an analysis-by-interpretation approach that combines deep music representation learning with user studies.
We release the Vis2Mus system as a controllable interface for symbolic music generation.
arXiv Detail & Related papers (2022-11-10T13:01:26Z)
- Music Sentiment Transfer [77.99182201815763]
Music sentiment transfer attempts to apply the high-level objective of sentiment transfer to the domain of music.
In order to use the network, we use symbolic MIDI data as the music format.
Results and literature suggest that the task of music sentiment transfer is more difficult than image sentiment transfer because of the temporal characteristics of music.
arXiv Detail & Related papers (2021-10-12T06:51:38Z)
- Language-Driven Image Style Transfer [72.36790598245096]
We introduce a new task -- language-driven image style transfer (LDIST) -- to manipulate the style of a content image, guided by a text.
The discriminator considers the correlation between language and patches of style images or transferred results to jointly embed style instructions.
Experiments show that our CLVA is effective and achieves superb transferred results on LDIST.
arXiv Detail & Related papers (2021-06-01T01:58:50Z)
- StyTr^2: Unbiased Image Style Transfer with Transformers [59.34108877969477]
The goal of image style transfer is to render an image with artistic features guided by a style reference while maintaining the original content.
Traditional neural style transfer methods are usually biased, and content leakage can be observed by running the style transfer process several times with the same reference image.
We propose a transformer-based approach, namely StyTr2, to address this critical issue.
arXiv Detail & Related papers (2021-05-30T15:57:09Z)
- Self-Supervised VQ-VAE For One-Shot Music Style Transfer [2.6381163133447836]
We present a novel method for one-shot timbre transfer based on an extension of the vector-quantized variational autoencoder (VQ-VAE).
We evaluate the method using a set of objective metrics and show that it is able to outperform selected baselines.
arXiv Detail & Related papers (2021-02-10T21:42:49Z)
- TräumerAI: Dreaming Music with StyleGAN [2.578242050187029]
We propose a neural music visualizer that directly maps deep music embeddings to style embeddings of StyleGAN.
An annotator listened to 100 ten-second music clips and, for each, selected the StyleGAN-generated image that best suited the music.
The generated examples show that the mapping between audio and video achieves a certain level of intra-segment similarity and inter-segment dissimilarity (a minimal sketch of such a mapper appears after this list).
arXiv Detail & Related papers (2021-02-09T07:04:22Z)
- Music Gesture for Visual Sound Separation [121.36275456396075]
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music.
We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
arXiv Detail & Related papers (2020-04-20T17:53:46Z)
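The TräumerAI entry above describes mapping deep music embeddings directly to StyleGAN style embeddings, which is closely related to this paper's music visualization step. Below is a minimal sketch of such a mapper, assuming a 512-dimensional clip-level audio embedding and an 18-layer W+ style space; all names, dimensions, and training pairs are illustrative assumptions rather than that paper's implementation.

```python
import torch
import torch.nn as nn

# Minimal sketch of an audio-to-StyleGAN mapper (all names/shapes assumed):
# a small MLP regresses a StyleGAN W+ style code from a clip-level music
# embedding, fitted to (music embedding, annotator-chosen style code) pairs.

class MusicToStyleMapper(nn.Module):
    def __init__(self, audio_dim=512, n_layers=18, w_dim=512):
        super().__init__()
        self.n_layers, self.w_dim = n_layers, w_dim
        self.mlp = nn.Sequential(
            nn.Linear(audio_dim, 1024), nn.LeakyReLU(0.2),
            nn.Linear(1024, n_layers * w_dim),
        )

    def forward(self, audio_embedding):                 # (B, audio_dim)
        w = self.mlp(audio_embedding)
        return w.view(-1, self.n_layers, self.w_dim)    # (B, 18, 512) W+ code


# Training sketch: fit the mapper to annotator-selected (music, image) pairs.
mapper = MusicToStyleMapper()
optimizer = torch.optim.Adam(mapper.parameters(), lr=1e-4)
audio_emb = torch.randn(8, 512)        # placeholder music-clip embeddings
target_w = torch.randn(8, 18, 512)     # placeholder style codes of chosen images
for _ in range(10):
    loss = nn.functional.mse_loss(mapper(audio_emb), target_w)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
# At inference, the predicted code would be passed to a pretrained StyleGAN
# synthesis network (not shown) to render one image per music segment.
```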
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.