Deep Learning for MIR Tutorial
- URL: http://arxiv.org/abs/2001.05266v1
- Date: Wed, 15 Jan 2020 12:23:17 GMT
- Title: Deep Learning for MIR Tutorial
- Authors: Alexander Schindler, Thomas Lidy, Sebastian Böck
- Abstract summary: The tutorial covers a wide range of MIR-relevant deep learning approaches.
Convolutional Neural Networks are currently a de facto standard for deep learning-based audio retrieval.
Siamese Networks have been shown to be effective in learning audio representations and distance functions specific to music similarity retrieval.
- Score: 68.8204255655161
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Learning has become the state of the art in visual computing and
is continuously emerging in the Music Information Retrieval (MIR) and audio
retrieval domain. In order to bring attention to this topic, we propose an
introductory tutorial on deep learning for MIR. Besides a general introduction
to neural networks, the proposed tutorial covers a wide range of MIR-relevant
deep learning approaches. Convolutional Neural Networks are currently a
de facto standard for deep learning-based audio retrieval. Recurrent Neural
Networks have proven to be effective in onset detection tasks such as beat or
audio-event detection. Siamese Networks have been shown to be effective in
learning audio representations and distance functions specific to music
similarity retrieval. We will incorporate both academic and industrial points
of view into the tutorial. Accompanying the tutorial, we will create a GitHub
repository for the content presented at the tutorial as well as references to
state-of-the-art work and literature for further reading. This repository will
remain public after the conference.
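As a rough illustration of the Siamese approach mentioned in the abstract, the sketch below (not part of the tutorial material; layer sizes, names, and hyperparameters are illustrative assumptions) pairs a weight-shared CNN encoder over mel-spectrogram excerpts with a contrastive loss that learns a distance function for music similarity retrieval:

```python
# Hypothetical sketch of a Siamese network for music similarity retrieval.
# All shapes and values are illustrative; this is not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AudioEncoder(nn.Module):
    """Small CNN mapping a (1, n_mels, n_frames) spectrogram to an embedding."""
    def __init__(self, embed_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, embed_dim)

    def forward(self, x):
        h = self.conv(x).flatten(1)
        return F.normalize(self.fc(h), dim=1)

def contrastive_loss(z1, z2, same_label, margin=1.0):
    """Pull embeddings of similar tracks together, push dissimilar ones apart."""
    d = F.pairwise_distance(z1, z2)
    return (same_label * d.pow(2) +
            (1 - same_label) * F.relu(margin - d).pow(2)).mean()

# Both branches share the same encoder weights (the "Siamese" part).
encoder = AudioEncoder()
x1 = torch.randn(8, 1, 128, 256)   # batch of mel-spectrogram excerpts
x2 = torch.randn(8, 1, 128, 256)
same = torch.randint(0, 2, (8,)).float()
loss = contrastive_loss(encoder(x1), encoder(x2), same)
loss.backward()
```

At retrieval time only the shared encoder is needed: tracks are embedded once and ranked by the learned distance.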
Related papers
- Deep Learning with CNNs: A Compact Holistic Tutorial with Focus on Supervised Regression (Preprint) [0.0]
This tutorial focuses on Convolutional Neural Networks (CNNs) and supervised regression.
It not only summarizes the most relevant concepts but also provides an in-depth exploration of each, offering a complete yet agile set of ideas.
We aim for this tutorial to serve as an optimal resource for students, professors, and anyone interested in understanding the foundations of Deep Learning.
arXiv Detail & Related papers (2024-08-22T11:34:34Z) - Exploiting the Semantic Knowledge of Pre-trained Text-Encoders for Continual Learning [70.64617500380287]
Continual learning allows models to learn from new data while retaining previously learned knowledge.
The semantic knowledge available in the label information of the images offers important semantic information that can be related to previously acquired knowledge of semantic classes.
We propose integrating semantic guidance within and across tasks by capturing semantic similarity using text embeddings.
arXiv Detail & Related papers (2024-08-02T07:51:44Z) - Self-supervised Audiovisual Representation Learning for Remote Sensing Data [96.23611272637943]
We propose a self-supervised approach for pre-training deep neural networks in remote sensing.
By exploiting the correspondence between geo-tagged audio recordings and remote sensing imagery, this is done in a completely label-free manner.
We show that our approach outperforms existing pre-training strategies for remote sensing imagery.
arXiv Detail & Related papers (2021-08-02T07:50:50Z) - Reasoning-Modulated Representations [85.08205744191078]
We study a common setting where our task is not purely opaque.
Our approach paves the way for a new class of data-efficient representation learning.
arXiv Detail & Related papers (2021-07-19T13:57:13Z) - DeepSpectrumLite: A Power-Efficient Transfer Learning Framework for
Embedded Speech and Audio Processing from Decentralised Data [0.0]
We introduce DeepSpectrumLite, an open-source, lightweight transfer learning framework for on-device speech and audio recognition.
The framework creates and augments Mel-spectrogram plots on-the-fly from raw audio signals, which are then used to fine-tune specific pre-trained CNNs.
The whole pipeline can be run in real-time with a mean inference lag of 242.0 ms when a DenseNet121 model is used on a consumer-grade Motorola moto e7 plus smartphone.
arXiv Detail & Related papers (2021-04-23T14:32:33Z) - PyTorch-Hebbian: facilitating local learning in a deep learning
framework [67.67299394613426]
Hebbian local learning has shown potential as an alternative training mechanism to backpropagation.
We propose a framework for thorough and systematic evaluation of local learning rules in existing deep learning pipelines.
The framework is used to extend the Krotov-Hopfield learning rule to standard convolutional neural networks without sacrificing accuracy.
arXiv Detail & Related papers (2021-01-31T10:53:08Z) - Incorporating Domain Knowledge To Improve Topic Segmentation Of Long
MOOC Lecture Videos [4.189643331553923]
We propose an algorithm for automatically detecting different coherent topics present inside a long lecture video.
We use the language model on speech-to-text transcription to capture the implicit meaning of the whole video.
By also leveraging domain knowledge, we capture the way the instructor binds and connects different concepts while teaching.
arXiv Detail & Related papers (2020-12-08T13:37:40Z) - Applications of Deep Neural Networks with Keras [0.0]
Deep learning allows a neural network to learn hierarchies of information in a way similar to the function of the human brain.
This course will introduce the student to classic neural network structures, Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), Gated Recurrent Units (GRU), and Generative Adversarial Networks (GAN).
arXiv Detail & Related papers (2020-09-11T22:09:10Z) - Detecting Generic Music Features with Single Layer Feedforward Network
using Unsupervised Hebbian Computation [3.8707695363745223]
The authors extract information on generic music features from a popular open-source music corpus.
They apply unsupervised Hebbian learning techniques to their single-layer neural network using the same dataset.
The unsupervised training algorithm enables their proposed neural network to achieve an accuracy of 90.36% for successful music feature detection (a minimal Hebbian update sketch appears after this list).
arXiv Detail & Related papers (2020-08-31T13:57:31Z) - AVLnet: Learning Audio-Visual Language Representations from
Instructional Videos [69.56522471911396]
We introduce the Audio-Video Language Network (AVLnet), a self-supervised network that learns a shared audio-visual embedding space directly from raw video inputs.
We train AVLnet on HowTo100M, a large corpus of publicly available instructional videos, and evaluate on image retrieval and video retrieval tasks.
Our code, data, and trained models will be released at avlnet.csail.mit.edu.
arXiv Detail & Related papers (2020-06-16T14:38:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.