Cockpit: A Practical Debugging Tool for Training Deep Neural Networks
- URL: http://arxiv.org/abs/2102.06604v1
- Date: Fri, 12 Feb 2021 16:28:49 GMT
- Title: Cockpit: A Practical Debugging Tool for Training Deep Neural Networks
- Authors: Frank Schneider and Felix Dangel and Philipp Hennig
- Abstract summary: We present a collection of instruments that enable a closer look into the inner workings of a learning machine.
These instruments leverage novel higher-order information about the gradient distribution and curvature.
- Score: 27.96164890143314
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When engineers train deep learning models, they are very much "flying blind".
Commonly used approaches for real-time training diagnostics, such as monitoring
the train/test loss, are limited. Assessing a network's training process solely
through these performance indicators is akin to debugging software without
access to internal states through a debugger. To address this, we present
Cockpit, a collection of instruments that enable a closer look into the inner
workings of a learning machine, and a more informative and meaningful status
report for practitioners. It facilitates the identification of learning phases
and failure modes, like ill-chosen hyperparameters. These instruments leverage
novel higher-order information about the gradient distribution and curvature,
which has only recently become efficiently accessible. We believe that such a
debugging tool, which we open-source for PyTorch, represents an important step
to improve troubleshooting the training process, reveal new insights, and help
develop novel methods and heuristics.
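The abstract argues that loss curves alone are poor diagnostics and that richer per-step statistics, such as gradient size and curvature, give a more informative status report. As a minimal toy sketch of that idea (plain Python on a one-dimensional quadratic loss, not Cockpit's actual API), one can log the gradient norm and a finite-difference curvature estimate alongside the loss at every step:

```python
# Toy illustration (not Cockpit's API): during training, log not only the
# loss but also the gradient norm and a curvature estimate, the kind of
# higher-order signal the paper's instruments expose.

def loss(w, a=4.0):
    # Simple quadratic loss 0.5 * a * w^2; its true curvature is 'a'.
    return 0.5 * a * w * w

def grad(w, a=4.0, eps=1e-5):
    # Central finite-difference estimate of d loss / dw.
    return (loss(w + eps, a) - loss(w - eps, a)) / (2 * eps)

def curvature(w, a=4.0, eps=1e-3):
    # Second finite difference: an estimate of d^2 loss / dw^2.
    return (loss(w + eps, a) - 2 * loss(w, a) + loss(w - eps, a)) / (eps * eps)

def train(w0=1.0, lr=0.1, steps=5):
    # Gradient descent that records a per-step diagnostic "status report".
    w, log = w0, []
    for _ in range(steps):
        g, c = grad(w), curvature(w)
        log.append({"loss": loss(w), "grad_norm": abs(g), "curvature": c})
        w -= lr * g
    return w, log
```

Watching `grad_norm` and `curvature` together can reveal, for example, an ill-chosen learning rate (gradient norm oscillating or growing) long before the loss curve alone makes the failure obvious; the released Cockpit library computes analogous quantities efficiently for real PyTorch networks.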
Related papers
- Efficient Unsupervised Shortcut Learning Detection and Mitigation in Transformers [6.080064619880841]
Shortcut learning, i.e., a model's reliance on undesired features not directly relevant to the task, is a major challenge that severely limits the applications of machine learning algorithms.
We leverage recent advancements in machine learning to create an unsupervised framework that is capable of both detecting and mitigating shortcut learning in transformers.
arXiv Detail & Related papers (2025-01-01T19:52:19Z)
- Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models [79.28821338925947]
Domain-Class Incremental Learning is a realistic but challenging continual learning scenario.
To handle these diverse tasks, pre-trained Vision-Language Models (VLMs) are introduced for their strong generalizability.
This incurs a new problem: the knowledge encoded in the pre-trained VLMs may be disturbed when adapting to new tasks, compromising their inherent zero-shot ability.
Existing methods tackle it by tuning VLMs with knowledge distillation on extra datasets, which demands heavy overhead.
We propose the Distribution-aware Interference-free Knowledge Integration (DIKI) framework, retaining pre-trained knowledge of
arXiv Detail & Related papers (2024-07-07T12:19:37Z)
- Robust Machine Learning by Transforming and Augmenting Imperfect Training Data [6.928276018602774]
This thesis explores several data sensitivities of modern machine learning.
We first discuss how to prevent ML from codifying prior human discrimination measured in the training data.
We then discuss the problem of learning from data containing spurious features, which provide predictive fidelity during training but are unreliable upon deployment.
arXiv Detail & Related papers (2023-12-19T20:49:28Z)
- Improving Long-Horizon Imitation Through Instruction Prediction [93.47416552953075]
In this work, we explore the use of an often unused source of auxiliary supervision: language.
Inspired by recent advances in transformer-based models, we train agents with an instruction prediction loss that encourages learning temporally extended representations that operate at a high level of abstraction.
In further analysis we find that instruction modeling is most important for tasks that require complex reasoning, while understandably offering smaller gains in environments that require simple plans.
arXiv Detail & Related papers (2023-06-21T20:47:23Z)
- PIVOT: Prompting for Video Continual Learning [50.80141083993668]
We introduce PIVOT, a novel method that leverages extensive knowledge in pre-trained models from the image domain.
Our experiments show that PIVOT improves state-of-the-art methods by a significant 27% on the 20-task ActivityNet setup.
arXiv Detail & Related papers (2022-12-09T13:22:27Z)
- Self-supervised Transformer for Deepfake Detection [112.81127845409002]
Deepfake techniques encountered in real-world scenarios demand stronger generalization from face forgery detectors.
Inspired by transfer learning, neural networks pre-trained on other large-scale face-related tasks may provide useful features for deepfake detection.
In this paper, we propose a self-supervised transformer based audio-visual contrastive learning method.
arXiv Detail & Related papers (2022-03-02T17:44:40Z)
- Recursive Least-Squares Estimator-Aided Online Learning for Visual Tracking [58.14267480293575]
We propose a simple yet effective online learning approach for few-shot online adaptation without requiring offline training.
It allows an in-built memory retention mechanism for the model to remember the knowledge about the object seen before.
We evaluate our approach based on two networks in the online learning families for tracking, i.e., multi-layer perceptrons in RT-MDNet and convolutional neural networks in DiMP.
arXiv Detail & Related papers (2021-12-28T06:51:18Z)
- Active learning using weakly supervised signals for quality inspection [0.16683739531034203]
We develop a methodology for active learning from rapidly mined, weakly annotated data.
We tackle a major weakness of machine vision systems: false positives.
In that regard, we show domain-adversarial training to be an efficient way to address this issue.
arXiv Detail & Related papers (2021-04-07T07:49:07Z)
- Data-efficient Weakly-supervised Learning for On-line Object Detection under Domain Shift in Robotics [24.878465999976594]
Several object detection methods have been proposed in the literature, the vast majority based on Deep Convolutional Neural Networks (DCNNs).
These methods have important limitations for robotics: learning solely from off-line data may introduce biases and prevents adaptation to novel tasks.
In this work, we investigate how weakly-supervised learning can cope with these problems.
arXiv Detail & Related papers (2020-12-28T16:36:11Z)
- Auto-Rectify Network for Unsupervised Indoor Depth Estimation [119.82412041164372]
We establish that the complex ego-motions exhibited in handheld settings are a critical obstacle for learning depth.
We propose a data pre-processing method that rectifies training images by removing their relative rotations for effective learning.
Our results outperform the previous unsupervised SOTA method by a large margin on the challenging NYUv2 dataset.
arXiv Detail & Related papers (2020-06-04T08:59:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.