DejaVu: Conditional Regenerative Learning to Enhance Dense Prediction
- URL: http://arxiv.org/abs/2303.01573v2
- Date: Wed, 29 Mar 2023 20:02:14 GMT
- Title: DejaVu: Conditional Regenerative Learning to Enhance Dense Prediction
- Authors: Shubhankar Borse, Debasmit Das, Hyojin Park, Hong Cai, Risheek
Garrepalli, Fatih Porikli
- Abstract summary: We use conditional image regeneration as additional supervision during training to improve deep networks for dense prediction tasks.
DejaVu can be extended to incorporate an attention-based regeneration module within the dense prediction network.
- Score: 45.89461725594674
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present DejaVu, a novel framework which leverages conditional image
regeneration as additional supervision during training to improve deep networks
for dense prediction tasks such as segmentation, depth estimation, and surface
normal prediction. First, we apply redaction to the input image, which removes
certain structural information by sparse sampling or selective frequency
removal. Next, we use a conditional regenerator, which takes the redacted image
and the dense predictions as inputs, and reconstructs the original image by
filling in the missing structural information. In the redacted image,
structural attributes like boundaries are broken while semantic context is
largely preserved. In order to make the regeneration feasible, the conditional
generator will then require the structure information from the other input
source, i.e., the dense predictions. As such, by including this conditional
regeneration objective during training, DejaVu encourages the base network to
learn to embed accurate scene structure in its dense prediction. This leads to
more accurate predictions with clearer boundaries and better spatial
consistency. When it is feasible to leverage additional computation, DejaVu can
be extended to incorporate an attention-based regeneration module within the
dense prediction network, which further improves accuracy. Through extensive
experiments on multiple dense prediction benchmarks such as Cityscapes, COCO,
ADE20K, NYUD-v2, and KITTI, we demonstrate the efficacy of employing DejaVu
during training, as it outperforms SOTA methods at no added computation cost.
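The redaction-plus-regeneration idea above can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: `redact_sparse` mimics the sparse-sampling redaction, and `regeneration_loss` stands in for the conditional regenerator's reconstruction objective (the actual regenerator is a learned network conditioned on both the redacted image and the dense predictions; all names here are hypothetical).

```python
import numpy as np

def redact_sparse(image: np.ndarray, keep_ratio: float = 0.25, seed: int = 0) -> np.ndarray:
    """Redact an image by sparse sampling: keep a random subset of pixels and
    zero out the rest, breaking boundaries while coarse context survives."""
    rng = np.random.default_rng(seed)
    mask = rng.random(image.shape[:2]) < keep_ratio  # H x W per-pixel keep-mask
    return image * mask[..., None]                   # broadcast over channels

def regeneration_loss(original: np.ndarray, regenerated: np.ndarray) -> float:
    """L2 reconstruction objective a conditional regenerator would minimize;
    training with it pushes the dense-prediction branch to encode structure."""
    return float(np.mean((original - regenerated) ** 2))

image = np.ones((8, 8, 3))
redacted = redact_sparse(image, keep_ratio=0.25)
```

In the actual framework the loss gradient flows back through the regenerator into the dense-prediction network, which is what encourages structurally accurate predictions.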
Related papers

- Implicit Occupancy Flow Fields for Perception and Prediction in Self-Driving [68.95178518732965]
A self-driving vehicle (SDV) must be able to perceive its surroundings and predict the future behavior of other traffic participants.
Existing works either perform object detection followed by trajectory prediction for the detected objects, or predict dense occupancy and flow grids for the whole scene.
This motivates our unified approach to perception and future prediction that implicitly represents occupancy and flow over time with a single neural network.
arXiv Detail & Related papers (2023-08-02T23:39:24Z)
- Predicting Temporal Aspects of Movement for Predictive Replication in Fog Environments [0.0]
Blind or reactive data falls short in harnessing the potential of fog computing.
We propose a novel model using Holt-Winter's Exponential Smoothing for temporal prediction.
In a fog network simulation with real user trajectories, our model achieves a 15% reduction in excess data with a marginal 1% decrease in data availability.
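The temporal predictor referenced above is classical Holt-Winters (triple) exponential smoothing. As a hedged illustration of that technique only (the paper's model, initialization, and parameters are not specified here, so everything below is a textbook sketch with assumed smoothing constants):

```python
def holt_winters_additive(series, season_len, alpha=0.5, beta=0.3, gamma=0.2, horizon=1):
    """Additive Holt-Winters: smooth level, trend, and seasonal components,
    then extrapolate them `horizon` steps ahead. Needs >= 2 full seasons."""
    m = season_len
    level = sum(series[:m]) / m
    trend = sum((series[m + i] - series[i]) / m for i in range(m)) / m
    season = [series[i] - level for i in range(m)]
    for t in range(m, len(series)):
        x, s_prev, last_level = series[t], season[t % m], level
        level = alpha * (x - s_prev) + (1 - alpha) * (level + trend)
        trend = beta * (level - last_level) + (1 - beta) * trend
        season[t % m] = gamma * (x - level) + (1 - gamma) * s_prev
    return [level + (h + 1) * trend + season[(len(series) + h) % m]
            for h in range(horizon)]
```

For a constant input series the level absorbs the signal and the forecast reproduces it exactly; on real trajectories the trend and seasonal terms capture daily movement patterns.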
arXiv Detail & Related papers (2023-06-01T11:45:13Z)
- Understanding Reconstruction Attacks with the Neural Tangent Kernel and Dataset Distillation [110.61853418925219]
We build a stronger version of the dataset reconstruction attack and show how it can provably recover the entire training set in the infinite-width regime.
We show, both theoretically and empirically, that reconstructed images tend to be "outliers" in the dataset.
These reconstruction attacks can be used for dataset distillation, that is, we can retrain on reconstructed images and obtain high predictive accuracy.
arXiv Detail & Related papers (2023-02-02T21:41:59Z)
- Revealing Disocclusions in Temporal View Synthesis through Infilling Vector Prediction [6.51882364384472]
We study the idea of an infilling vector, which infills a disoccluded pixel by pointing to a non-disoccluded region in the synthesized view.
To exploit the structure of disocclusions created by camera motion during their infilling, we rely on two important cues, temporal correlation of infilling directions and depth.
arXiv Detail & Related papers (2021-10-17T12:11:34Z)
- SLPC: a VRNN-based approach for stochastic lidar prediction and completion in autonomous driving [63.87272273293804]
We propose a new LiDAR prediction framework based on generative models, namely Variational Recurrent Neural Networks (VRNNs).
Our algorithm is able to address the limitations of previous video prediction frameworks when dealing with sparse data by spatially inpainting the depth maps in the upcoming frames.
We present a sparse version of VRNNs and an effective self-supervised training method that does not require any labels.
arXiv Detail & Related papers (2021-02-19T11:56:44Z)
- CodeVIO: Visual-Inertial Odometry with Learned Optimizable Dense Depth [83.77839773394106]
We present a lightweight, tightly-coupled deep depth network and visual-inertial odometry system.
We provide the network with previously marginalized sparse features from VIO to increase the accuracy of initial depth prediction.
We show that it can run in real-time with single-thread execution while utilizing GPU acceleration only for the network and code Jacobian.
arXiv Detail & Related papers (2020-12-18T09:42:54Z)
- Set Prediction without Imposing Structure as Conditional Density Estimation [40.86881969839325]
We propose an alternative to training via set losses by viewing learning as conditional density estimation.
Our framework fits deep energy-based models and approximates the intractable likelihood with gradient-guided sampling.
Our approach is competitive with previous set prediction models on standard benchmarks.
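The "gradient-guided sampling" mentioned above is commonly realized as Langevin dynamics on the learned energy. As a toy sketch only (the paper's energy is a deep network over sets; here a quadratic energy with a known analytic gradient stands in, and all names are illustrative):

```python
import numpy as np

def langevin_sample(grad_energy, x0, steps=500, step_size=0.05, seed=0):
    """Unadjusted Langevin dynamics: step down the energy gradient while
    injecting Gaussian noise scaled by sqrt(2 * step_size)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        noise = rng.normal(size=x.shape)
        x = x - step_size * grad_energy(x) + np.sqrt(2 * step_size) * noise
    return x

# Toy energy E(x) = ||x - mu||^2 / 2, whose gradient is x - mu and whose
# stationary distribution is approximately N(mu, I).
mu = np.array([2.0, -1.0])
sample = langevin_sample(lambda x: x - mu, x0=np.zeros(2))
```

Samples drawn this way concentrate around the energy's minima, which is why the procedure can approximate the intractable likelihood during training.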
arXiv Detail & Related papers (2020-10-08T16:49:16Z)
- Representation Learning for Sequence Data with Deep Autoencoding Predictive Components [96.42805872177067]
We propose a self-supervised representation learning method for sequence data, based on the intuition that useful representations of sequence data should exhibit a simple structure in the latent space.
We encourage this latent structure by maximizing an estimate of predictive information of latent feature sequences, which is the mutual information between past and future windows at each time step.
We demonstrate that our method recovers the latent space of noisy dynamical systems, extracts predictive features for forecasting tasks, and improves automatic speech recognition when used to pretrain the encoder on large amounts of unlabeled data.
arXiv Detail & Related papers (2020-10-07T03:34:01Z)
- MANTRA: Memory Augmented Networks for Multiple Trajectory Prediction [26.151761714896118]
We address the problem of multimodal trajectory prediction exploiting a Memory Augmented Neural Network.
Our method learns past and future trajectory embeddings using recurrent neural networks and exploits an associative external memory to store and retrieve such embeddings.
Trajectory prediction is then performed by decoding in-memory future encodings conditioned with the observed past.
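The store-and-retrieve step above can be sketched as a simple associative memory. This is a minimal nearest-neighbor illustration, not MANTRA's implementation (the real system uses learned controllers and recurrent encoders/decoders; class and method names here are hypothetical):

```python
import numpy as np

class TrajectoryMemory:
    """Associative memory: store (past_embedding, future_encoding) pairs and
    retrieve the top-k future encodings whose past keys best match a query."""

    def __init__(self):
        self.keys = []    # past-trajectory embeddings
        self.values = []  # paired future encodings

    def write(self, past_emb, future_enc):
        self.keys.append(np.asarray(past_emb, dtype=float))
        self.values.append(np.asarray(future_enc, dtype=float))

    def read(self, query, k=1):
        query = np.asarray(query, dtype=float)
        keys = np.stack(self.keys)
        # Cosine similarity between the query and every stored past key.
        sims = keys @ query / (np.linalg.norm(keys, axis=1) * np.linalg.norm(query) + 1e-9)
        top = np.argsort(-sims)[:k]  # most similar keys first
        return [self.values[i] for i in top]
```

Reading with k greater than one is what yields multiple candidate futures, hence the multimodal predictions.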
arXiv Detail & Related papers (2020-06-05T09:49:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.