Self-Supervised Structure-from-Motion through Tightly-Coupled Depth and
Egomotion Networks
- URL: http://arxiv.org/abs/2106.04007v1
- Date: Mon, 7 Jun 2021 23:30:45 GMT
- Title: Self-Supervised Structure-from-Motion through Tightly-Coupled Depth and
Egomotion Networks
- Authors: Brandon Wagstaff and Valentin Peretroukhin and Jonathan Kelly
- Abstract summary: We introduce several notions of coupling, categorize existing approaches, and present a novel tightly-coupled approach.
We demonstrate that our approach promotes consistency between the depth and egomotion predictions at test time, improves generalization on new data, and leads to state-of-the-art accuracy on indoor and outdoor depth and egomotion evaluation benchmarks.
- Score: 11.888728516442905
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Much recent literature has formulated structure-from-motion (SfM) as a
self-supervised learning problem where the goal is to jointly learn neural
network models of depth and egomotion through view synthesis. Herein, we
address the open problem of how to optimally couple the depth and egomotion
network components. Toward this end, we introduce several notions of coupling,
categorize existing approaches, and present a novel tightly-coupled approach
that leverages the interdependence of depth and egomotion at training and at
inference time. Our approach uses iterative view synthesis to recursively
update the egomotion network input, permitting contextual information to be
passed between the components without explicit weight sharing. Through
substantial experiments, we demonstrate that our approach promotes consistency
between the depth and egomotion predictions at test time, improves
generalization on new data, and leads to state-of-the-art accuracy on indoor
and outdoor depth and egomotion evaluation benchmarks.
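The iterative view synthesis loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a 1-D "image", a fronto-parallel scene at a constant, known depth, and a pose restricted to an x-translation. The learned egomotion network is replaced by a hand-written stand-in that can only correct a few pixels per pass. All function names (`translation_pose`, `synthesize_view`, `make_toy_egomotion_net`, `refine_egomotion`) are hypothetical and chosen for illustration only.

```python
import numpy as np

def translation_pose(tx):
    """Return a 4x4 rigid transform with a pure x-translation (toy pose)."""
    pose = np.eye(4)
    pose[0, 3] = tx
    return pose

def synthesize_view(source, depth, pose, focal=1.0):
    """Toy view synthesis: for a plane at constant depth, an x-translation tx
    shifts every pixel by focal * tx / depth (rounded to whole pixels)."""
    shift = int(round(float(focal * pose[0, 3] / depth)))
    return np.roll(source, shift)

def make_toy_egomotion_net(depth, focal=1.0, max_step=2):
    """Build a stand-in for a learned egomotion network (assumption: it only
    sees the image pair and can correct at most max_step pixels per call)."""
    def egomotion_net(target, warped):
        candidates = np.arange(-max_step, max_step + 1)
        # Photometric error between the target and each candidate re-shift.
        errors = [np.sum((target - np.roll(warped, s)) ** 2) for s in candidates]
        best = int(candidates[int(np.argmin(errors))])
        # Convert the residual pixel shift back into a translation.
        return translation_pose(best * depth / focal)
    return egomotion_net

def refine_egomotion(target, source, depth, egomotion_net, num_iters=3):
    """Recursively update the egomotion input: warp the source with the
    current pose, predict a correction from (target, warped), and compose."""
    pose = np.eye(4)
    for _ in range(num_iters):
        warped = synthesize_view(source, depth, pose)
        delta = egomotion_net(target, warped)
        pose = delta @ pose
    return pose

# A smooth 1-D signal and a copy shifted by 5 pixels (the "true" motion).
x = np.arange(64)
source = np.exp(-((x - 20.0) ** 2) / (2 * 6.0 ** 2))
target = np.roll(source, 5)
depth = 2.0

net = make_toy_egomotion_net(depth)
one_pass = refine_egomotion(target, source, depth, net, num_iters=1)
refined = refine_egomotion(target, source, depth, net, num_iters=3)
print(one_pass[0, 3], refined[0, 3])
```

In this toy setup a single pass is limited by `max_step`, while repeated warping lets the same estimator converge on the full motion. Depth enters only through the warp, so the two components exchange information through the synthesized image rather than through shared weights.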
Related papers
- Learning by Steering the Neural Dynamics: A Statistical Mechanics Perspective [0.0]
We study how neural dynamics can support fully local, distributed learning.
We propose a biologically plausible algorithm for supervised learning with any binary recurrent network.
arXiv Detail & Related papers (2025-10-13T22:28:34Z) - Machine Learning and Control: Foundations, Advances, and Perspectives [0.0]
We show that concepts such as simultaneous and ensemble controllability offer new insights into the classification and representation properties of deep neural networks.
We also explore the relationship between dynamic and static neural networks, where depth is traded for width.
We describe how classical properties of diffusion processes, long established in the context of partial differential equations, contribute to explaining the success of modern generative artificial intelligence.
arXiv Detail & Related papers (2025-09-30T10:47:26Z) - Neural Network Reprogrammability: A Unified Theme on Model Reprogramming, Prompt Tuning, and Prompt Instruction [55.914891182214475]
We introduce neural network reprogrammability as a unifying framework for model adaptation.
We present a taxonomy that categorizes such information manipulation approaches across four key dimensions.
We also analyze remaining technical challenges and ethical considerations.
arXiv Detail & Related papers (2025-06-05T05:42:27Z) - Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond [61.18736646013446]
In pursuit of a deeper understanding of its surprising behaviors, we investigate the utility of a simple yet accurate model of a trained neural network.
Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena.
arXiv Detail & Related papers (2024-10-31T22:54:34Z) - Connectivity-Inspired Network for Context-Aware Recognition [1.049712834719005]
We focus on the effect of incorporating circuit motifs found in biological brains to address visual recognition.
Our convolutional architecture is inspired by the connectivity of human cortical and subcortical streams.
We present a new plug-and-play module to model context awareness.
arXiv Detail & Related papers (2024-09-06T15:42:10Z) - DSAM: A Deep Learning Framework for Analyzing Temporal and Spatial Dynamics in Brain Networks [4.041732967881764]
Most rs-fMRI studies compute a single static functional connectivity matrix across brain regions of interest.
These approaches are at risk of oversimplifying brain dynamics and lack proper consideration of the goal at hand.
We propose a novel interpretable deep learning framework that learns goal-specific functional connectivity matrix directly from time series.
arXiv Detail & Related papers (2024-05-19T23:35:06Z) - Critical Learning Periods for Multisensory Integration in Deep Networks [112.40005682521638]
We show that the ability of a neural network to integrate information from diverse sources hinges critically on being exposed to properly correlated signals during the early phases of training.
We show that critical periods arise from the complex and unstable early transient dynamics, which are decisive for the final performance of the trained system and its learned representations.
arXiv Detail & Related papers (2022-10-06T23:50:38Z) - Learning What and Where -- Unsupervised Disentangling Location and
Identity Tracking [0.44040106718326594]
We introduce an unsupervised LOCation and Identity tracking system (Loci).
Inspired by the dorsal-ventral pathways in the brain, Loci tackles the what-and-where binding problem by means of a self-supervised segregation mechanism.
Loci may set the stage for deeper, explanation-oriented video processing.
arXiv Detail & Related papers (2022-05-26T13:30:14Z) - MMLatch: Bottom-up Top-down Fusion for Multimodal Sentiment Analysis [84.7287684402508]
Current deep learning approaches for multimodal fusion rely on bottom-up fusion of high and mid-level latent modality representations.
Models of human perception highlight the importance of top-down fusion, where high-level representations affect the way sensory inputs are perceived.
We propose a neural architecture that captures top-down cross-modal interactions, using a feedback mechanism in the forward pass during network training.
arXiv Detail & Related papers (2022-01-24T17:48:04Z) - Being Friends Instead of Adversaries: Deep Networks Learn from Data
Simplified by Other Networks [23.886422706697882]
A different idea has been recently proposed, named Friendly Training, which consists in altering the input data by adding an automatically estimated perturbation.
We revisit and extend this idea inspired by the effectiveness of neural generators in the context of Adversarial Machine Learning.
We propose an auxiliary multi-layer network that is responsible for altering the input data to make them easier for the classifier to handle.
arXiv Detail & Related papers (2021-12-18T16:59:35Z) - X-Distill: Improving Self-Supervised Monocular Depth via Cross-Task
Distillation [69.9604394044652]
We propose a novel method to improve the self-supervised training of monocular depth via cross-task knowledge distillation.
During training, we utilize a pretrained semantic segmentation teacher network and transfer its semantic knowledge to the depth network.
We extensively evaluate the efficacy of our proposed approach on the KITTI benchmark and compare it with the latest state of the art.
arXiv Detail & Related papers (2021-10-24T19:47:14Z) - Self-Supervised Learning of Depth and Ego-Motion from Video by
Alternative Training and Geometric Constraints from 3D to 2D [5.481942307939029]
Self-supervised learning of depth and ego-motion from unlabeled monocular video has achieved promising results.
In this paper, we aim to improve the depth-pose learning performance without the auxiliary tasks.
We design a log-scale 3D structural consistency loss to put more emphasis on the smaller depth values during training.
arXiv Detail & Related papers (2021-08-04T11:40:53Z) - Deep Reinforced Attention Learning for Quality-Aware Visual Recognition [73.15276998621582]
We build upon the weakly-supervised generation mechanism of intermediate attention maps in any convolutional neural networks.
We introduce a meta critic network to evaluate the quality of attention maps in the main network.
arXiv Detail & Related papers (2020-07-13T02:44:38Z) - An Ode to an ODE [78.97367880223254]
We present a new paradigm for Neural ODE algorithms, called ODEtoODE, where time-dependent parameters of the main flow evolve according to a matrix flow on the group O(d).
This nested system of two flows provides stability and effectiveness of training and provably solves the gradient vanishing-explosion problem.
arXiv Detail & Related papers (2020-06-19T22:05:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.