Crossing You in Style: Cross-modal Style Transfer from Music to Visual Arts
- URL: http://arxiv.org/abs/2009.08083v1
- Date: Thu, 17 Sep 2020 05:58:13 GMT
- Title: Crossing You in Style: Cross-modal Style Transfer from Music to Visual Arts
- Authors: Cheng-Che Lee, Wan-Yi Lin, Yen-Ting Shih, Pei-Yi Patricia Kuo, Li Su
- Abstract summary: Music-to-visual style transfer is a challenging yet important cross-modal learning problem in the practice of creativity.
We solve the music-to-visual style transfer problem in two steps: music visualization and style transfer.
Experiments are conducted on WikiArt-IMSLP, a dataset including Western music recordings and paintings listed by decades.
- Score: 11.96629917390208
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Music-to-visual style transfer is a challenging yet important cross-modal
learning problem in the practice of creativity. Its major difference from the
traditional image style transfer problem is that the style information is
provided by music rather than images. Assuming that musical features can be
properly mapped to visual contents through semantic links between the two
domains, we solve the music-to-visual style transfer problem in two steps:
music visualization and style transfer. The music visualization network
utilizes an encoder-generator architecture with a conditional generative
adversarial network to generate image-based music representations from music
data. This network is integrated with an image style transfer method to
accomplish the style transfer process. Experiments are conducted on
WikiArt-IMSLP, a newly compiled dataset including Western music recordings and
paintings listed by decades. By utilizing these decade labels to learn the semantic
connection between paintings and music, we demonstrate that the proposed
framework can generate diverse image style representations from a music piece,
and these representations can unveil certain art forms of the same era.
Subjective testing results also emphasize the role of the era label in
improving the perceived compatibility between music and visual content.
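The two-step pipeline described in the abstract can be sketched in miniature. The code below is an illustrative stand-in, not the authors' implementation: the trained encoder-generator cGAN is replaced by a random linear map conditioned on a one-hot era label, and the downstream style-transfer step is represented by a Gatys-style Gram-matrix loss. All function names and dimensions here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: music visualization. A toy stand-in for the paper's
# encoder-generator cGAN; the weights are random, not trained.
def music_to_style_image(music_feat, era_label, n_eras=10, img_hw=8):
    """Map a music feature vector plus a one-hot era label to a small
    'style image'. A real model would be a conditional GAN generator."""
    era = np.zeros(n_eras)
    era[era_label] = 1.0
    z = np.concatenate([music_feat, era])          # condition on the era
    W = rng.standard_normal((img_hw * img_hw * 3, z.size)) / np.sqrt(z.size)
    img = np.tanh(W @ z)                           # values in (-1, 1)
    return img.reshape(img_hw, img_hw, 3)

# Step 2: style transfer, represented by a Gram-matrix style loss
# (the paper plugs its visualization into an existing transfer method).
def gram(feats):
    """Gram matrix of an (H, W, C) feature map."""
    c = feats.reshape(-1, feats.shape[-1])
    return c.T @ c / c.shape[0]

def style_loss(content_feats, style_feats):
    return float(np.mean((gram(content_feats) - gram(style_feats)) ** 2))

music_feat = rng.standard_normal(128)              # e.g. pooled audio features
style_img = music_to_style_image(music_feat, era_label=3)
content_img = rng.standard_normal((8, 8, 3))
loss = style_loss(content_img, style_img)
```

In the full method, minimizing such a style loss over the content image's pixels (or features) would pull the painting's texture statistics toward the music-derived style image.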
Related papers
- MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models [57.47799823804519]
We are inspired by how musicians compose music not just from a movie script, but also through visualizations.
We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music.
Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
arXiv Detail & Related papers (2024-06-07T06:38:59Z)
- DiffStyler: Controllable Dual Diffusion for Text-Driven Image Stylization [66.42741426640633]
DiffStyler is a dual diffusion processing architecture to control the balance between the content and style of diffused results.
We propose a content image-based learnable noise on which the reverse denoising process is based, enabling the stylization results to better preserve the structure information of the content image.
arXiv Detail & Related papers (2022-11-19T12:30:44Z)
- Vis2Mus: Exploring Multimodal Representation Mapping for Controllable Music Generation [11.140337453072311]
We explore the representation mapping from the domain of visual arts to the domain of music.
We adopt an analysis-by-synthesis approach that combines deep music representation learning with user studies.
We release the Vis2Mus system as a controllable interface for symbolic music generation.
arXiv Detail & Related papers (2022-11-10T13:01:26Z)
- Music Sentiment Transfer [77.99182201815763]
Music sentiment transfer attempts to apply the high level objective of sentiment transfer to the domain of music.
In order to use the network, we choose symbolic MIDI data as the music format.
Results and literature suggest that the task of music sentiment transfer is more difficult than image sentiment transfer because of the temporal characteristics of music.
arXiv Detail & Related papers (2021-10-12T06:51:38Z)
- Language-Driven Image Style Transfer [72.36790598245096]
We introduce a new task -- language-driven image style transfer (LDIST) -- to manipulate the style of a content image, guided by a text.
The discriminator considers the correlation between language and patches of style images or transferred results to jointly embed style instructions.
Experiments show that our CLVA is effective and achieves superb transferred results on LDIST.
arXiv Detail & Related papers (2021-06-01T01:58:50Z)
- StyTr^2: Unbiased Image Style Transfer with Transformers [59.34108877969477]
The goal of image style transfer is to render an image with artistic features guided by a style reference while maintaining the original content.
Traditional neural style transfer methods are usually biased, and content leakage can be observed by running the style transfer process several times with the same reference image.
We propose a transformer-based approach, namely StyTr^2, to address this critical issue.
arXiv Detail & Related papers (2021-05-30T15:57:09Z)
- LiveStyle -- An Application to Transfer Artistic Styles [0.0]
Style transfer using neural networks refers to optimization techniques in which a content image and a style image are blended.
This paper implements style transfer using three different neural networks, in the form of an application accessible to the general population.
arXiv Detail & Related papers (2021-05-03T13:50:48Z)
- Self-Supervised VQ-VAE For One-Shot Music Style Transfer [2.6381163133447836]
We present a novel method for one-shot timbre transfer based on an extension of the vector-quantized variational autoencoder (VQ-VAE).
We evaluate the method using a set of objective metrics and show that it is able to outperform selected baselines.
arXiv Detail & Related papers (2021-02-10T21:42:49Z)
- TräumerAI: Dreaming Music with StyleGAN [2.578242050187029]
We propose a neural music visualizer directly mapping deep music embeddings to style embeddings of StyleGAN.
An annotator listened to 100 music clips of 10 seconds long and selected an image that suits the music among the StyleGAN-generated examples.
The generated examples show that the mapping between audio and video yields a certain level of intra-segment similarity and inter-segment dissimilarity.
arXiv Detail & Related papers (2021-02-09T07:04:22Z)
- Music Gesture for Visual Sound Separation [121.36275456396075]
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music.
We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
arXiv Detail & Related papers (2020-04-20T17:53:46Z)
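The audio-visual association idea behind Music Gesture can be illustrated with a toy sketch. This is not the paper's graph network or fusion model; `motion_energy`, `associate`, and the synthetic players are hypothetical. The sketch correlates each player's per-frame keypoint motion energy with the audio envelope and picks the best match.

```python
import numpy as np

rng = np.random.default_rng(1)

def motion_energy(keypoints):
    """Per-frame motion energy of a (T, K, 2) keypoint track:
    mean speed of the K body/finger keypoints between frames."""
    vel = np.diff(keypoints, axis=0)                   # (T-1, K, 2)
    return np.linalg.norm(vel, axis=-1).mean(axis=-1)  # (T-1,)

def associate(audio_env, players):
    """Pick the player whose motion correlates best with the audio
    envelope -- a toy stand-in for learned audio-visual fusion."""
    scores = [np.corrcoef(audio_env, motion_energy(p))[0, 1]
              for p in players]
    return int(np.argmax(scores))

T, K = 100, 15
active = rng.standard_normal((T, K, 2)).cumsum(axis=0)       # moving player
still = np.tile(rng.standard_normal((1, K, 2)), (T, 1, 1))   # static player
still = still + 0.01 * rng.standard_normal((T, K, 2))        # tiny jitter
audio_env = motion_energy(active) + 0.1 * rng.standard_normal(T - 1)
winner = associate(audio_env, [still, active])  # the moving player, index 1
```

A real system would replace the correlation score with a learned context-aware graph network over the keypoints and a trained audio-visual fusion model, as the abstract describes.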
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.