Abstract: This paper describes the joint effort of BUT and Telef\'onica Research
on the development of Automatic Speech Recognition systems for the Albayzin 2020
Challenge. We compare approaches based on either hybrid or end-to-end models.
In hybrid modelling, we explore the impact of a SpecAugment layer on performance.
For end-to-end modelling, we use a convolutional neural network with gated
linear units (GLUs). The performance of this model is also evaluated with an
additional n-gram language model to reduce the word error rate (WER). We further
investigate source separation methods to extract speech from noisy environments
(i.e. TV shows). Specifically, we assess the effect of using a neural-based
music separator named Demucs. A fusion of our best systems achieved 23.33% WER
in the official Albayzin 2020 evaluations. Aside from the techniques used in our
final submitted systems, we also describe our efforts in retrieving high-quality
transcripts for training.