Abstract: In this paper, we train Mozilla's DeepSpeech architecture on German and Swiss
German speech datasets and compare the results of different training methods.
We first train the models from scratch on both languages and then improve upon
the results by using an English pretrained version of DeepSpeech for weight
initialization and experiment with the effects of freezing different layers
during training. We see that even freezing only one layer already improves the