Deep Learning for MIR Tutorial
- URL: http://arxiv.org/abs/2001.05266v1
- Date: Wed, 15 Jan 2020 12:23:17 GMT
- Title: Deep Learning for MIR Tutorial
- Authors: Alexander Schindler, Thomas Lidy, Sebastian Böck
- Abstract summary: The tutorial covers a wide range of MIR relevant deep learning approaches.
Convolutional Neural Networks are currently the de facto standard for deep learning based audio retrieval.
Siamese Networks have been shown effective in learning audio representations and distance functions specific to music similarity retrieval.
- Score: 68.8204255655161
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Learning has become state of the art in visual computing and
continues to expand into the Music Information Retrieval (MIR) and audio
retrieval domain. In order to bring attention to this topic, we propose an
introductory tutorial on deep learning for MIR. Besides a general introduction
to neural networks, the proposed tutorial covers a wide range of MIR relevant
deep learning approaches. \textbf{Convolutional Neural Networks} are currently
the de facto standard for deep learning based audio retrieval. \textbf{Recurrent
Neural Networks} have proven to be effective in onset detection tasks such as
beat or audio-event detection. \textbf{Siamese Networks} have been shown
effective in learning audio representations and distance functions specific to
music similarity retrieval. We will incorporate both academic and industrial
points of view into the tutorial. Accompanying the tutorial, we will create a
Github repository for the content presented at the tutorial as well as
references to state of the art work and literature for further reading. This
repository will remain public after the conference.
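As a concrete illustration of the input representation behind most of these architectures, the following is a minimal NumPy-only sketch of computing a log-mel spectrogram, the standard CNN input for deep learning based audio retrieval. Parameter values such as `n_fft=1024` and `n_mels=40` are illustrative defaults, not taken from the tutorial.

```python
import numpy as np

def log_mel_spectrogram(signal, sr=22050, n_fft=1024, hop=512, n_mels=40):
    """Compute a log-scaled mel spectrogram, a typical CNN input in audio MIR."""
    # Frame the signal and apply a Hann window
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    # Power spectrum per frame
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Triangular mel filterbank (HTK-style mel scale)
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * mel_pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    mel = power @ fb.T
    return np.log(mel + 1e-10)   # shape: (n_frames, n_mels)

# One second of a 440 Hz sine as a toy input
t = np.linspace(0, 1, 22050, endpoint=False)
spec = log_mel_spectrogram(np.sin(2 * np.pi * 440 * t))
```

In practice, libraries such as librosa provide optimized versions of this transform; the resulting 2-D time-frequency matrix is what the CNN treats as an image.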
Related papers
- SpawnNet: Learning Generalizable Visuomotor Skills from Pre-trained Networks [52.766795949716986]
We present a study of the generalization capabilities of pre-trained visual representations at the categorical level.
We propose SpawnNet, a novel two-stream architecture that learns to fuse pre-trained multi-layer representations into a separate network to learn a robust policy.
arXiv Detail & Related papers (2023-07-07T13:01:29Z)
- ResAttUNet: Detecting Marine Debris using an Attention activated Residual UNet [0.0]
This paper introduces a novel attention based segmentation technique that outperforms the existing state-of-the-art results introduced with MARIDA.
The attained results are expected to pave the path for further research involving deep learning using remote sensing images.
arXiv Detail & Related papers (2022-10-16T10:59:32Z) - Self-supervised Audiovisual Representation Learning for Remote Sensing
Data [70.64030011999981]
We propose a self-supervised approach for pre-training deep neural networks in remote sensing.
This is done in a completely label-free manner by exploiting the correspondence between geo-tagged audio recordings and remote sensing imagery.
We show that our approach outperforms existing pre-training strategies for remote sensing imagery.
arXiv Detail & Related papers (2021-08-02T07:50:50Z)
- Reasoning-Modulated Representations [85.08205744191078]
We study a common setting where our task is not purely opaque.
Our approach paves the way for a new class of data-efficient representation learning.
arXiv Detail & Related papers (2021-07-19T13:57:13Z)
- DeepSpectrumLite: A Power-Efficient Transfer Learning Framework for Embedded Speech and Audio Processing from Decentralised Data [0.0]
We introduce DeepSpectrumLite, an open-source, lightweight transfer learning framework for on-device speech and audio recognition.
The framework creates and augments Mel-spectrogram plots on-the-fly from raw audio signals which are then used to finetune specific pre-trained CNNs.
The whole pipeline can be run in real-time with a mean inference lag of 242.0 ms when a DenseNet121 model is used on a consumer-grade Motorola moto e7 plus smartphone.
arXiv Detail & Related papers (2021-04-23T14:32:33Z)
- PyTorch-Hebbian: facilitating local learning in a deep learning framework [67.67299394613426]
Hebbian local learning has shown potential as an alternative training mechanism to backpropagation.
We propose a framework for thorough and systematic evaluation of local learning rules in existing deep learning pipelines.
The framework is used to expand the Krotov-Hopfield learning rule to standard convolutional neural networks without sacrificing accuracy.
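As a minimal illustration of what a local learning rule looks like, here is a generic NumPy sketch of Oja's variant of Hebbian learning. This is not the Krotov-Hopfield rule the paper extends; the data, learning rate, and epoch count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples in 5 dimensions, with most variance on axis 0
X = rng.normal(size=(200, 5))
X[:, 0] *= 3.0

w = rng.normal(size=5)
w /= np.linalg.norm(w)   # start from a unit-norm weight vector
eta = 0.01

for _ in range(20):              # a few passes over the data
    for x in X:
        y = w @ x                # neuron output
        # Oja's rule: Hebbian term (eta * y * x) plus a decay term
        # (-eta * y**2 * w) that keeps the weight norm bounded
        w += eta * y * (x - y * w)

# w converges toward the principal component of the data (axis 0),
# using only locally available quantities (x, y, w) -- no backpropagation.
```

The update uses only the pre-synaptic input, the post-synaptic output, and the current weight, which is what makes such rules "local" and attractive as an alternative to backpropagation.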
arXiv Detail & Related papers (2021-01-31T10:53:08Z)
- Incorporating Domain Knowledge To Improve Topic Segmentation Of Long MOOC Lecture Videos [4.189643331553923]
We propose an algorithm for automatically detecting different coherent topics present inside a long lecture video.
We use a language model on the speech-to-text transcription to capture the implicit meaning of the whole video.
We also leverage domain knowledge to capture the way the instructor binds and connects different concepts while teaching.
arXiv Detail & Related papers (2020-12-08T13:37:40Z)
- Applications of Deep Neural Networks with Keras [0.0]
Deep learning allows a neural network to learn hierarchies of information in a way that resembles the function of the human brain.
This course will introduce the student to classic neural network structures, Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), Gated Recurrent Units (GRU), and Generative Adversarial Networks (GAN).
arXiv Detail & Related papers (2020-09-11T22:09:10Z)
- Detecting Generic Music Features with Single Layer Feedforward Network using Unsupervised Hebbian Computation [3.8707695363745223]
The authors extract information on generic music features from a popular open-source music corpus.
They apply unsupervised Hebbian learning techniques on their single-layer neural network using the same dataset.
The unsupervised training algorithm enables the proposed neural network to achieve an accuracy of 90.36% in music feature detection.
arXiv Detail & Related papers (2020-08-31T13:57:31Z)
- AVLnet: Learning Audio-Visual Language Representations from Instructional Videos [69.56522471911396]
We introduce the Audio-Video Language Network (AVLnet), a self-supervised network that learns a shared audio-visual embedding space directly from raw video inputs.
We train AVLnet on HowTo100M, a large corpus of publicly available instructional videos, and evaluate on image retrieval and video retrieval tasks.
Our code, data, and trained models will be released at avlnet.csail.mit.edu.
arXiv Detail & Related papers (2020-06-16T14:38:03Z)
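A shared audio-visual embedding space like the one described above is typically trained with a contrastive objective. Below is a generic, NumPy-only sketch of a symmetric InfoNCE-style loss; AVLnet uses its own contrastive formulation, so the function name and temperature value here are illustrative assumptions, not the paper's exact objective.

```python
import numpy as np

def info_nce(audio_emb, video_emb, temperature=0.1):
    """Symmetric contrastive loss over a batch of paired audio/video embeddings.

    Matching pairs (row i of each matrix) are pulled together; all other
    pairings in the batch act as negatives.
    """
    # L2-normalise rows so the dot product is cosine similarity
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    logits = a @ v.T / temperature            # (batch, batch) similarity matrix

    def xent(l):
        # Cross-entropy with the diagonal (the true pairing) as the target
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))

    # Audio-to-video and video-to-audio directions, averaged
    return 0.5 * (xent(logits) + xent(logits.T))
```

With perfectly aligned embeddings the loss is near zero, while mismatched pairings are penalized, which is what drives the audio and video towers toward a shared space.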
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.