Musical Word Embedding: Bridging the Gap between Listening Contexts and Music
- URL: http://arxiv.org/abs/2008.01190v1
- Date: Thu, 23 Jul 2020 06:42:45 GMT
- Title: Musical Word Embedding: Bridging the Gap between Listening Contexts and Music
- Authors: Seungheon Doh, Jongpil Lee, Tae Hong Park, Juhan Nam
- Abstract summary: We train distributed representations of words using combinations of both general text data and music-specific data.
We evaluate the resulting embeddings in terms of how well they associate listening contexts with musical compositions.
- Score: 5.89179309980335
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Word embedding, pioneered by Mikolov et al., is a staple technique for word representation in natural language processing (NLP) research and has also found popularity in music information retrieval tasks. Depending on the type of text data used for word embedding, however, vocabulary size and the degree of musical pertinence can vary significantly. In this work, we (1) train distributed representations of words using combinations of both general text data and music-specific data, and (2) evaluate the resulting embeddings in terms of how well they associate listening contexts with musical compositions.
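A minimal sketch of this recipe, assuming gensim's skip-gram Word2Vec as the embedding method (the paper builds on Mikolov et al.'s approach) and toy stand-in corpora in place of the paper's general and music-specific data:

```python
# Minimal sketch: train skip-gram embeddings on general text plus
# music-specific text pooled into one corpus, then probe how a
# listening context associates with music terms. Both corpora below
# are toy stand-ins, not the paper's actual data.
from gensim.models import Word2Vec

general_sentences = [
    ["i", "went", "for", "a", "run", "this", "morning"],
    ["the", "gym", "was", "crowded", "after", "work"],
    ["we", "listened", "to", "upbeat", "music", "at", "the", "gym"],
]
music_sentences = [
    ["upbeat", "electronic", "tracks", "for", "workout", "playlists"],
    ["calm", "acoustic", "songs", "for", "studying", "and", "relaxing"],
]

model = Word2Vec(
    sentences=general_sentences + music_sentences,
    vector_size=100,  # embedding dimensionality
    window=5,         # context window size
    sg=1,             # skip-gram, as in Mikolov et al.
    min_count=1,      # keep every token in this toy corpus
)

# General context words and music-specific terms share one vector space,
# so listening contexts can be matched to music by similarity queries.
print(model.wv.similarity("workout", "upbeat"))
```

The point of mixing the corpora is that a general context word such as "workout" and music-specific terms land in one shared vector space, so a listening context can be matched to musically pertinent vocabulary by simple similarity queries.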
Related papers
- WikiMuTe: A web-sourced dataset of semantic descriptions for music audio [7.4327407361824935]
We present WikiMuTe, a new and open dataset containing rich semantic descriptions of music.
The data is sourced from Wikipedia's extensive catalogue of articles covering musical works.
We train a model that jointly learns text and audio representations and performs cross-modal retrieval.
arXiv Detail & Related papers (2023-12-14T18:38:02Z)
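The summary above does not specify WikiMuTe's training objective; a common recipe for jointly learning text and audio representations for cross-modal retrieval is a symmetric contrastive (InfoNCE-style) loss, sketched here as a generic illustration rather than the paper's exact formulation:

```python
# Generic contrastive (InfoNCE-style) loss for aligning text and audio
# embeddings in a shared space, a common recipe for cross-modal
# retrieval rather than WikiMuTe's confirmed formulation.
import torch
import torch.nn.functional as F

def contrastive_loss(text_emb, audio_emb, temperature=0.07):
    # Normalize so dot products are cosine similarities.
    text_emb = F.normalize(text_emb, dim=-1)
    audio_emb = F.normalize(audio_emb, dim=-1)
    logits = text_emb @ audio_emb.T / temperature  # (batch, batch)
    targets = torch.arange(len(text_emb))          # matched pairs lie on the diagonal
    # Symmetric loss covers both text-to-audio and audio-to-text retrieval.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2

# Random placeholder embeddings for a batch of 8 text-audio pairs.
loss = contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
```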
- Quantifying the redundancy between prosody and text [67.07817268372743]
We use large language models to estimate how much information is redundant between prosody and the words themselves.
We find a high degree of redundancy between the information carried by the words and prosodic information across several prosodic features.
Still, we observe that prosodic features cannot be fully predicted from text, suggesting that prosody carries information above and beyond the words.
arXiv Detail & Related papers (2023-11-28T21:15:24Z)
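As a deliberately crude illustration of the same question (the paper itself uses LLM-based information estimates, not this), one can regress a prosodic feature on text features and read the predictable share off R-squared; all data below is synthetic:

```python
# Rough illustration only: how much of a prosodic feature (e.g., mean
# pitch per word) is predictable from text features? Plain R^2 from a
# linear regression is a much cruder stand-in for the paper's LLM-based
# information estimates. All data here is synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
text_features = rng.normal(size=(1000, 32))  # stand-in text embeddings
# Synthetic prosody: partly driven by text, partly independent noise.
prosody = text_features @ rng.normal(size=32) + rng.normal(size=1000)

reg = LinearRegression().fit(text_features, prosody)
r2 = reg.score(text_features, prosody)
print(f"share of prosodic variance predictable from text: {r2:.2f}")
# R^2 < 1 mirrors the finding that prosody carries information
# above and beyond the words.
```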
- PWESuite: Phonetic Word Embeddings and Tasks They Facilitate [37.09948594297879]
We develop three methods that use articulatory features to build phonetically informed word embeddings.
We also contribute a task suite to fairly evaluate past, current, and future methods.
arXiv Detail & Related papers (2023-04-05T16:03:42Z)
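One simple way to build a phonetically informed embedding, offered here as a toy illustration rather than any of PWESuite's three methods, is to map each phoneme to an articulatory feature vector and pool over the word:

```python
# Toy phonetically informed word embedding: map each phoneme to a small
# articulatory feature vector and mean-pool over the word. PWESuite's
# actual methods are more sophisticated; this only illustrates the idea.
import numpy as np

# Hypothetical articulatory features: [voiced, nasal, open-vowel]
ARTICULATORY = {
    "k": [0, 0, 0],
    "ae": [1, 0, 1],
    "t": [0, 0, 0],
    "n": [1, 1, 0],
}

def phonetic_embedding(phonemes):
    # Average the articulatory feature vectors of the word's phonemes.
    return np.mean([ARTICULATORY[p] for p in phonemes], axis=0)

cat = phonetic_embedding(["k", "ae", "t"])
can = phonetic_embedding(["k", "ae", "n"])
# Phonetically similar words end up close in this space.
print(np.linalg.norm(cat - can))
```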
- Audio-text Retrieval in Context [24.38055340045366]
In this work, we investigate several audio features as well as sequence aggregation methods for better audio-text alignment.
We build our contextual audio-text retrieval system using pre-trained audio features and a descriptor-based aggregation method.
Our proposed system achieves significant improvements on bidirectional audio-text retrieval across all metrics, including recall and median and mean rank.
arXiv Detail & Related papers (2022-03-25T13:41:17Z)
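A minimal version of such a retrieval pipeline, with plain mean pooling standing in for the paper's descriptor-based aggregation and random vectors standing in for pre-trained audio and text features:

```python
# Minimal audio-text retrieval pipeline: aggregate frame-level audio
# features into one clip vector, then rank clips against a text query
# by cosine similarity. Mean pooling stands in for the paper's
# descriptor-based aggregation; the features here are random stand-ins.
import numpy as np

def aggregate(frames):              # frames: (n_frames, dim)
    return frames.mean(axis=0)      # simplest sequence aggregation

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(0)
clips = [rng.normal(size=(100, 128)) for _ in range(5)]  # 5 audio clips
text_query = rng.normal(size=128)                        # text embedding

scores = [cosine(text_query, aggregate(c)) for c in clips]
ranking = np.argsort(scores)[::-1]  # best-matching clips first
print(ranking)
```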
- MULTIMODAL ANALYSIS: Informed content estimation and audio source separation [0.0]
The singing voice directly connects the audio signal and the textual information in a unique way.
Our study focuses on the interaction between audio and lyrics, targeting source separation and informed content estimation.
arXiv Detail & Related papers (2021-04-27T15:45:21Z)
- Match-Ignition: Plugging PageRank into Transformer for Long-form Text Matching [66.71886789848472]
We propose a novel hierarchical noise-filtering model, Match-Ignition, to tackle both the effectiveness and the efficiency problems of long-form text matching.
The basic idea is to plug the well-known PageRank algorithm into the Transformer to identify and filter noisy information at both the sentence and the word level.
Because sentences are the basic units of long-form text, noisy sentences are relatively easy to detect, so we use PageRank directly to filter them.
arXiv Detail & Related papers (2021-01-16T10:34:03Z)
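A bare-bones version of the sentence-level filtering step described above (the full model also filters word-level noise inside the Transformer) is to run PageRank over a sentence similarity graph and keep the top-ranked sentences:

```python
# Bare-bones sentence-level noise filtering with PageRank: rank
# sentences on a similarity graph and keep the top-k most central ones.
# Match-Ignition additionally filters word-level noise inside the
# Transformer, which this sketch does not cover.
import numpy as np

def pagerank(adj, damping=0.85, iters=50):
    n = len(adj)
    # Column-normalize the adjacency matrix into a transition matrix.
    trans = adj / adj.sum(axis=0, keepdims=True)
    rank = np.full(n, 1.0 / n)
    for _ in range(iters):  # standard power iteration
        rank = (1 - damping) / n + damping * trans @ rank
    return rank

# Hypothetical pairwise sentence similarities (e.g., embedding cosines).
sim = np.array([[1.0, 0.8, 0.1],
                [0.8, 1.0, 0.2],
                [0.1, 0.2, 1.0]])
scores = pagerank(sim)
top_k = np.argsort(scores)[::-1][:2]  # keep the 2 most central sentences
print(top_k)
```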
- Improving Machine Reading Comprehension with Contextualized Commonsense Knowledge [62.46091695615262]
We aim to extract commonsense knowledge to improve machine reading comprehension.
We propose to represent relations implicitly by situating structured knowledge in a context.
We employ a teacher-student paradigm to inject multiple types of contextualized knowledge into a student machine reader.
arXiv Detail & Related papers (2020-09-12T17:20:01Z)
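The teacher-student paradigm here is a form of knowledge distillation; the sketch below shows the standard single-teacher distillation loss, not the paper's setup for injecting multiple types of contextualized knowledge:

```python
# Generic teacher-student distillation loss: the student matches the
# teacher's softened output distribution while also fitting the labels.
# This is the standard recipe, not the paper's exact multi-knowledge setup.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # KL divergence between softened student and teacher distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2              # standard temperature scaling
    # Ordinary supervised loss on the gold labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

loss = distillation_loss(torch.randn(4, 10), torch.randn(4, 10),
                         torch.randint(0, 10, (4,)))
```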
- Seeing wake words: Audio-visual Keyword Spotting [103.12655603634337]
KWS-Net is a novel convolutional architecture that uses a similarity map intermediate representation to separate the task into sequence matching and pattern detection.
We show that our method generalises to other languages, specifically French and German, and achieves performance comparable to English with less language-specific data.
arXiv Detail & Related papers (2020-09-02T17:57:38Z)
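A similarity map of the kind KWS-Net uses as an intermediate representation can be computed directly from two embedding sequences; the embeddings below are random stand-ins for learned audio-frame and keyword-unit features:

```python
# Sketch of a similarity-map intermediate representation: a matrix of
# cosine similarities between audio-frame embeddings and keyword
# (e.g., phoneme) embeddings, which a detector can then scan for the
# diagonal pattern left by a spoken keyword. Embeddings here are random.
import numpy as np

rng = np.random.default_rng(0)
audio_frames = rng.normal(size=(50, 64))   # (n_frames, dim)
keyword_units = rng.normal(size=(6, 64))   # (n_phonemes, dim)

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# (n_frames, n_phonemes) map; a match shows up as a high-similarity path.
similarity_map = normalize(audio_frames) @ normalize(keyword_units).T
print(similarity_map.shape)
```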
- On Vocabulary Reliance in Scene Text Recognition [79.21737876442253]
Existing methods perform well on images containing in-vocabulary words but generalize poorly to images with out-of-vocabulary words.
We call this phenomenon "vocabulary reliance".
We propose a simple yet effective mutual learning strategy to allow models of two families to learn collaboratively.
arXiv Detail & Related papers (2020-05-08T11:16:58Z)
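A minimal sketch of such a mutual learning step, with two placeholder models each fitting the labels and the other's softened predictions (the paper applies this to two families of scene-text recognizers):

```python
# Minimal deep mutual learning step: two models each fit the gold labels
# and each other's predictions via KL divergence. The two logit tensors
# below are placeholders for the paper's two model families.
import torch
import torch.nn.functional as F

def mutual_losses(logits_a, logits_b, labels):
    # Each model treats the other's (detached) distribution as a target.
    kl_ab = F.kl_div(F.log_softmax(logits_a, dim=-1),
                     F.softmax(logits_b.detach(), dim=-1),
                     reduction="batchmean")
    kl_ba = F.kl_div(F.log_softmax(logits_b, dim=-1),
                     F.softmax(logits_a.detach(), dim=-1),
                     reduction="batchmean")
    loss_a = F.cross_entropy(logits_a, labels) + kl_ab
    loss_b = F.cross_entropy(logits_b, labels) + kl_ba
    return loss_a, loss_b

loss_a, loss_b = mutual_losses(torch.randn(4, 26), torch.randn(4, 26),
                               torch.randint(0, 26, (4,)))
```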
- Comparative Analysis of Word Embeddings for Capturing Word Similarities [0.0]
Distributed representations have become the most widely used technique for representing language in various natural language processing tasks.
Most of the natural language processing models that are based on deep learning techniques use already pre-trained distributed word representations, commonly called word embeddings.
Selecting the appropriate word embeddings is, however, a perplexing task, since the projected embedding space is not intuitive to humans.
arXiv Detail & Related papers (2020-05-08T01:16:03Z)
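A standard intrinsic evaluation for choosing among embeddings is to rank-correlate model similarities with human word-similarity judgments; the word pairs, ratings, and vectors below are a tiny hypothetical sample:

```python
# Standard intrinsic evaluation for comparing word embeddings:
# rank-correlate model cosine similarities with human similarity
# judgments. The vectors, word pairs, and ratings are hypothetical.
import numpy as np
from scipy.stats import spearmanr

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(0)
vectors = {w: rng.normal(size=50) for w in
           ["car", "automobile", "music", "song", "stone"]}

pairs = [("car", "automobile"), ("music", "song"), ("car", "stone")]
human_ratings = [9.5, 8.7, 1.2]   # hypothetical human scores

model_scores = [cosine(vectors[a], vectors[b]) for a, b in pairs]
rho, _ = spearmanr(model_scores, human_ratings)
print(f"Spearman correlation with human judgments: {rho:.2f}")
```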
- A Survey on Contextual Embeddings [48.04732268018772]
Contextual embeddings assign each word a representation based on its context, capturing uses of words across varied contexts and encoding knowledge that transfers across languages.
We review existing contextual embedding models, cross-lingual polyglot pre-training, the application of contextual embeddings in downstream tasks, model compression, and model analyses.
arXiv Detail & Related papers (2020-03-16T15:22:22Z)
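The defining property, the same surface word receiving different vectors in different contexts, can be observed directly with any pretrained encoder; this sketch assumes the Hugging Face transformers library and the bert-base-uncased checkpoint:

```python
# The defining property of contextual embeddings: the same surface word
# gets different vectors depending on its context. Assumes the Hugging
# Face `transformers` library and the bert-base-uncased weights.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed(sentence, word):
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (seq_len, dim)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return hidden[tokens.index(word)]  # vector for this occurrence

a = embed("she sat by the river bank", "bank")
b = embed("he deposited cash at the bank", "bank")
# Well below 1.0: the two occurrences of "bank" get distinct vectors.
print(torch.cosine_similarity(a, b, dim=0))
```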
This list is automatically generated from the titles and abstracts of the papers on this site.