Musical Word Embedding: Bridging the Gap between Listening Contexts and Music
- URL: http://arxiv.org/abs/2008.01190v1
- Date: Thu, 23 Jul 2020 06:42:45 GMT
- Title: Musical Word Embedding: Bridging the Gap between Listening Contexts and Music
- Authors: Seungheon Doh, Jongpil Lee, Tae Hong Park, Juhan Nam
- Abstract summary: We train distributed representations of words using combinations of both general text data and music-specific data.
We evaluate the resulting embeddings in terms of how well they associate listening contexts with musical compositions.
- Score: 5.89179309980335
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Word embedding, pioneered by Mikolov et al., is a staple technique for word representation in natural language processing (NLP) research and has also found popularity in music information retrieval tasks. Depending on the type of text data used for word embedding, however, vocabulary size and the degree of musical pertinence can vary significantly. In this work, we (1) train distributed representations of words using combinations of both general text data and music-specific data, and (2) evaluate the resulting embeddings in terms of how well they associate listening contexts with musical compositions.
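A minimal sketch of this recipe, assuming gensim's skip-gram Word2Vec as the embedding method (the paper builds on Mikolov et al.'s approach) and toy stand-in corpora in place of the paper's general and music-specific data:

```python
# Minimal sketch: train skip-gram embeddings on general text plus
# music-specific text pooled into one corpus, then probe how a
# listening context associates with music terms. Both corpora below
# are toy stand-ins, not the paper's actual data.
from gensim.models import Word2Vec

general_sentences = [
    ["i", "went", "for", "a", "run", "this", "morning"],
    ["the", "gym", "was", "crowded", "after", "work"],
    ["we", "listened", "to", "upbeat", "music", "at", "the", "gym"],
]
music_sentences = [
    ["upbeat", "electronic", "tracks", "for", "workout", "playlists"],
    ["calm", "acoustic", "songs", "for", "studying", "and", "relaxing"],
]

model = Word2Vec(
    sentences=general_sentences + music_sentences,
    vector_size=100,  # embedding dimensionality
    window=5,         # context window size
    sg=1,             # skip-gram, as in Mikolov et al.
    min_count=1,      # keep every token in this toy corpus
)

# General context words and music-specific terms share one vector space,
# so listening contexts can be matched to music by similarity queries.
print(model.wv.similarity("workout", "upbeat"))
```

The point of mixing the corpora is that a general context word such as "workout" and music-specific terms land in one shared vector space, so a listening context can be matched to musically pertinent vocabulary by simple similarity queries.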
Related papers
- WikiMuTe: A web-sourced dataset of semantic descriptions for music audio [7.4327407361824935]
We present WikiMuTe, a new and open dataset containing rich semantic descriptions of music.
The data is sourced from Wikipedia's extensive catalogue of articles covering musical works.
We train a model that jointly learns text and audio representations and performs cross-modal retrieval.
arXiv Detail & Related papers (2023-12-14T18:38:02Z)
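The summary above does not specify WikiMuTe's training objective; a common recipe for jointly learning text and audio representations for cross-modal retrieval is a symmetric contrastive (InfoNCE-style) loss, sketched here as a generic illustration rather than the paper's exact formulation:

```python
# Generic contrastive (InfoNCE-style) loss for aligning text and audio
# embeddings in a shared space, a common recipe for cross-modal
# retrieval rather than WikiMuTe's confirmed formulation.
import torch
import torch.nn.functional as F

def contrastive_loss(text_emb, audio_emb, temperature=0.07):
    # Normalize so dot products are cosine similarities.
    text_emb = F.normalize(text_emb, dim=-1)
    audio_emb = F.normalize(audio_emb, dim=-1)
    logits = text_emb @ audio_emb.T / temperature  # (batch, batch)
    targets = torch.arange(len(text_emb))          # matched pairs lie on the diagonal
    # Symmetric loss covers both text-to-audio and audio-to-text retrieval.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2

# Random placeholder embeddings for a batch of 8 text-audio pairs.
loss = contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
```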
- Quantifying the redundancy between prosody and text [67.07817268372743]
We use large language models to estimate how much information is redundant between prosody and the words themselves.
We find a high degree of redundancy between the information carried by the words and prosodic information across several prosodic features.
Still, we observe that prosodic features cannot be fully predicted from text, suggesting that prosody carries information above and beyond the words.
arXiv Detail & Related papers (2023-11-28T21:15:24Z)
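As a deliberately crude illustration of the same question (the paper itself uses LLM-based information estimates, not this), one can regress a prosodic feature on text features and read the predictable share off R-squared; all data below is synthetic:

```python
# Rough illustration only: how much of a prosodic feature (e.g., mean
# pitch per word) is predictable from text features? Plain R^2 from a
# linear regression is a much cruder stand-in for the paper's LLM-based
# information estimates. All data here is synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
text_features = rng.normal(size=(1000, 32))  # stand-in text embeddings
# Synthetic prosody: partly driven by text, partly independent noise.
prosody = text_features @ rng.normal(size=32) + rng.normal(size=1000)

reg = LinearRegression().fit(text_features, prosody)
r2 = reg.score(text_features, prosody)
print(f"share of prosodic variance predictable from text: {r2:.2f}")
# R^2 < 1 mirrors the finding that prosody carries information
# above and beyond the words.
```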
- PWESuite: Phonetic Word Embeddings and Tasks They Facilitate [37.09948594297879]
We develop three methods that use articulatory features to build phonetically informed word embeddings.
We also contribute a task suite to fairly evaluate past, current, and future methods.
arXiv Detail & Related papers (2023-04-05T16:03:42Z)
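One simple way to build a phonetically informed embedding, offered here as a toy illustration rather than any of PWESuite's three methods, is to map each phoneme to an articulatory feature vector and pool over the word:

```python
# Toy phonetically informed word embedding: map each phoneme to a small
# articulatory feature vector and mean-pool over the word. PWESuite's
# actual methods are more sophisticated; this only illustrates the idea.
import numpy as np

# Hypothetical articulatory features: [voiced, nasal, open-vowel]
ARTICULATORY = {
    "k": [0, 0, 0],
    "ae": [1, 0, 1],
    "t": [0, 0, 0],
    "n": [1, 1, 0],
}

def phonetic_embedding(phonemes):
    # Average the articulatory feature vectors of the word's phonemes.
    return np.mean([ARTICULATORY[p] for p in phonemes], axis=0)

cat = phonetic_embedding(["k", "ae", "t"])
can = phonetic_embedding(["k", "ae", "n"])
# Phonetically similar words end up close in this space.
print(np.linalg.norm(cat - can))
```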
- Audio-text Retrieval in Context [24.38055340045366]
In this work, we investigate several audio features as well as sequence aggregation methods for better audio-text alignment.
We build our contextual audio-text retrieval system using pre-trained audio features and a descriptor-based aggregation method.
Our proposed system achieves significant improvements on bidirectional audio-text retrieval across all metrics, including recall and median and mean rank.
arXiv Detail & Related papers (2022-03-25T13:41:17Z)
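A minimal version of such a retrieval pipeline, with plain mean pooling standing in for the paper's descriptor-based aggregation and random vectors standing in for pre-trained audio and text features:

```python
# Minimal audio-text retrieval pipeline: aggregate frame-level audio
# features into one clip vector, then rank clips against a text query
# by cosine similarity. Mean pooling stands in for the paper's
# descriptor-based aggregation; the features here are random stand-ins.
import numpy as np

def aggregate(frames):              # frames: (n_frames, dim)
    return frames.mean(axis=0)      # simplest sequence aggregation

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(0)
clips = [rng.normal(size=(100, 128)) for _ in range(5)]  # 5 audio clips
text_query = rng.normal(size=128)                        # text embedding

scores = [cosine(text_query, aggregate(c)) for c in clips]
ranking = np.argsort(scores)[::-1]  # best-matching clips first
print(ranking)
```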
- MULTIMODAL ANALYSIS: Informed content estimation and audio source separation [0.0]
The singing voice directly connects the audio signal and the textual information in a unique way.
Our study focuses on the interaction between audio and lyrics, targeting source separation and informed content estimation.
arXiv Detail & Related papers (2021-04-27T15:45:21Z)
- Match-Ignition: Plugging PageRank into Transformer for Long-form Text Matching [66.71886789848472]
We propose a novel hierarchical noise-filtering model, Match-Ignition, to tackle both the effectiveness and the efficiency problems of long-form text matching.
The basic idea is to plug the well-known PageRank algorithm into the Transformer to identify and filter noisy information at both the sentence and the word level.
Because sentences are the basic units of long-form text, noisy sentences are relatively easy to detect, so we use PageRank directly to filter them.
arXiv Detail & Related papers (2021-01-16T10:34:03Z)
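A bare-bones version of the sentence-level filtering step described above (the full model also filters word-level noise inside the Transformer) is to run PageRank over a sentence similarity graph and keep the top-ranked sentences:

```python
# Bare-bones sentence-level noise filtering with PageRank: rank
# sentences on a similarity graph and keep the top-k most central ones.
# Match-Ignition additionally filters word-level noise inside the
# Transformer, which this sketch does not cover.
import numpy as np

def pagerank(adj, damping=0.85, iters=50):
    n = len(adj)
    # Column-normalize the adjacency matrix into a transition matrix.
    trans = adj / adj.sum(axis=0, keepdims=True)
    rank = np.full(n, 1.0 / n)
    for _ in range(iters):  # standard power iteration
        rank = (1 - damping) / n + damping * trans @ rank
    return rank

# Hypothetical pairwise sentence similarities (e.g., embedding cosines).
sim = np.array([[1.0, 0.8, 0.1],
                [0.8, 1.0, 0.2],
                [0.1, 0.2, 1.0]])
scores = pagerank(sim)
top_k = np.argsort(scores)[::-1][:2]  # keep the 2 most central sentences
print(top_k)
```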
- Improving Machine Reading Comprehension with Contextualized Commonsense Knowledge [62.46091695615262]
We aim to extract commonsense knowledge to improve machine reading comprehension.
We propose to represent relations implicitly by situating structured knowledge in a context.
We employ a teacher-student paradigm to inject multiple types of contextualized knowledge into a student machine reader.
arXiv Detail & Related papers (2020-09-12T17:20:01Z)
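The teacher-student paradigm here is a form of knowledge distillation; the sketch below shows the standard single-teacher distillation loss, not the paper's setup for injecting multiple types of contextualized knowledge:

```python
# Generic teacher-student distillation loss: the student matches the
# teacher's softened output distribution while also fitting the labels.
# This is the standard recipe, not the paper's exact multi-knowledge setup.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # KL divergence between softened student and teacher distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2              # standard temperature scaling
    # Ordinary supervised loss on the gold labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

loss = distillation_loss(torch.randn(4, 10), torch.randn(4, 10),
                         torch.randint(0, 10, (4,)))
```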
- Seeing wake words: Audio-visual Keyword Spotting [103.12655603634337]
KWS-Net is a novel convolutional architecture that uses a similarity map intermediate representation to separate the task into sequence matching and pattern detection.
We show that our method generalises to other languages, specifically French and German, and achieves performance comparable to English with less language-specific data.
arXiv Detail & Related papers (2020-09-02T17:57:38Z)
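A similarity map of the kind KWS-Net uses as an intermediate representation can be computed directly from two embedding sequences; the embeddings below are random stand-ins for learned audio-frame and keyword-unit features:

```python
# Sketch of a similarity-map intermediate representation: a matrix of
# cosine similarities between audio-frame embeddings and keyword
# (e.g., phoneme) embeddings, which a detector can then scan for the
# diagonal pattern left by a spoken keyword. Embeddings here are random.
import numpy as np

rng = np.random.default_rng(0)
audio_frames = rng.normal(size=(50, 64))   # (n_frames, dim)
keyword_units = rng.normal(size=(6, 64))   # (n_phonemes, dim)

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# (n_frames, n_phonemes) map; a match shows up as a high-similarity path.
similarity_map = normalize(audio_frames) @ normalize(keyword_units).T
print(similarity_map.shape)
```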
- On Vocabulary Reliance in Scene Text Recognition [79.21737876442253]
Existing methods perform well on images containing in-vocabulary words but generalize poorly to images with out-of-vocabulary words.
We call this phenomenon "vocabulary reliance".
We propose a simple yet effective mutual learning strategy to allow models of two families to learn collaboratively.
arXiv Detail & Related papers (2020-05-08T11:16:58Z)
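A minimal sketch of such a mutual learning step, with two placeholder models each fitting the labels and the other's softened predictions (the paper applies this to two families of scene-text recognizers):

```python
# Minimal deep mutual learning step: two models each fit the gold labels
# and each other's predictions via KL divergence. The two logit tensors
# below are placeholders for the paper's two model families.
import torch
import torch.nn.functional as F

def mutual_losses(logits_a, logits_b, labels):
    # Each model treats the other's (detached) distribution as a target.
    kl_ab = F.kl_div(F.log_softmax(logits_a, dim=-1),
                     F.softmax(logits_b.detach(), dim=-1),
                     reduction="batchmean")
    kl_ba = F.kl_div(F.log_softmax(logits_b, dim=-1),
                     F.softmax(logits_a.detach(), dim=-1),
                     reduction="batchmean")
    loss_a = F.cross_entropy(logits_a, labels) + kl_ab
    loss_b = F.cross_entropy(logits_b, labels) + kl_ba
    return loss_a, loss_b

loss_a, loss_b = mutual_losses(torch.randn(4, 26), torch.randn(4, 26),
                               torch.randint(0, 26, (4,)))
```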
- Comparative Analysis of Word Embeddings for Capturing Word Similarities [0.0]
Distributed representations have become the most widely used technique for representing language in various natural language processing tasks.
Most of the natural language processing models that are based on deep learning techniques use already pre-trained distributed word representations, commonly called word embeddings.
Selecting the appropriate word embeddings is, however, a perplexing task, since the projected embedding space is not intuitive to humans.
arXiv Detail & Related papers (2020-05-08T01:16:03Z)
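A standard intrinsic evaluation for choosing among embeddings is to rank-correlate model similarities with human word-similarity judgments; the word pairs, ratings, and vectors below are a tiny hypothetical sample:

```python
# Standard intrinsic evaluation for comparing word embeddings:
# rank-correlate model cosine similarities with human similarity
# judgments. The vectors, word pairs, and ratings are hypothetical.
import numpy as np
from scipy.stats import spearmanr

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(0)
vectors = {w: rng.normal(size=50) for w in
           ["car", "automobile", "music", "song", "stone"]}

pairs = [("car", "automobile"), ("music", "song"), ("car", "stone")]
human_ratings = [9.5, 8.7, 1.2]   # hypothetical human scores

model_scores = [cosine(vectors[a], vectors[b]) for a, b in pairs]
rho, _ = spearmanr(model_scores, human_ratings)
print(f"Spearman correlation with human judgments: {rho:.2f}")
```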
- A Survey on Contextual Embeddings [48.04732268018772]
Contextual embeddings assign each word a representation based on its context, capturing uses of words across varied contexts and encoding knowledge that transfers across languages.
We review existing contextual embedding models, cross-lingual polyglot pre-training, the application of contextual embeddings in downstream tasks, model compression, and model analyses.
arXiv Detail & Related papers (2020-03-16T15:22:22Z)
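The defining property, the same surface word receiving different vectors in different contexts, can be observed directly with any pretrained encoder; this sketch assumes the Hugging Face transformers library and the bert-base-uncased checkpoint:

```python
# The defining property of contextual embeddings: the same surface word
# gets different vectors depending on its context. Assumes the Hugging
# Face `transformers` library and the bert-base-uncased weights.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed(sentence, word):
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (seq_len, dim)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return hidden[tokens.index(word)]  # vector for this occurrence

a = embed("she sat by the river bank", "bank")
b = embed("he deposited cash at the bank", "bank")
# Well below 1.0: the two occurrences of "bank" get distinct vectors.
print(torch.cosine_similarity(a, b, dim=0))
```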
This list is automatically generated from the titles and abstracts of the papers on this site.