Generalizing Interactive Backpropagating Refinement for Dense Prediction
- URL: http://arxiv.org/abs/2112.10969v2
- Date: Wed, 22 Dec 2021 11:07:46 GMT
- Title: Generalizing Interactive Backpropagating Refinement for Dense Prediction
- Authors: Fanqing Lin, Brian Price, Tony Martinez
- Abstract summary: We introduce a set of G-BRS layers that enable both global and localized refinement for a range of dense prediction tasks.
Our method can successfully generalize and significantly improve performance of existing pretrained state-of-the-art models with only a few clicks.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As deep neural networks become the state-of-the-art approach in the field of
computer vision for dense prediction tasks, many methods have been developed
for automatic estimation of the target outputs given the visual inputs.
Although the estimation accuracy of the proposed automatic methods continues to
improve, interactive refinement is oftentimes necessary for further correction.
Recently, feature backpropagating refinement scheme (f-BRS) has been proposed
for the task of interactive segmentation, which enables efficient optimization
of a small set of auxiliary variables inserted into the pretrained network to
produce object segmentation that better aligns with user inputs. However, the
proposed auxiliary variables only contain channel-wise scale and bias, limiting
the optimization to global refinement only. In this work, in order to
generalize backpropagating refinement for a wide range of dense prediction
tasks, we introduce a set of G-BRS (Generalized Backpropagating Refinement
Scheme) layers that enable both global and localized refinement for the
following tasks: interactive segmentation, semantic segmentation, image matting
and monocular depth estimation. Experiments on SBD, Cityscapes, Mapillary
Vista, Composition-1k and NYU-Depth-V2 show that our method can successfully
generalize and significantly improve performance of existing pretrained
state-of-the-art models with only a few clicks.
Related papers
- ContourFormer:Real-Time Contour-Based End-to-End Instance Segmentation Transformer [9.836892752093297]
This paper presents Contourformer, a real-time contour-based instance segmentation algorithm.
The method is fully based on the DETR paradigm and achieves end-to-end inference through iterative and progressive mechanisms to optimize contours.
We conduct comprehensive evaluations and comparisons with existing state-of-the-art methods, showing significant improvements in both accuracy and inference speed.
arXiv Detail & Related papers (2025-01-29T14:56:27Z) - Towards Generalizable Trajectory Prediction Using Dual-Level Representation Learning And Adaptive Prompting [107.4034346788744]
Existing vehicle trajectory prediction models struggle with generalizability, prediction uncertainties, and handling complex interactions.
We propose Perceiver with Register queries (PerReg+), a novel trajectory prediction framework that introduces: (1) Dual-Level Representation Learning via Self-Distillation (SD) and Masked Reconstruction (MR), capturing global context and fine-grained details; (2) Enhanced Multimodality using register-based queries and pretraining, eliminating the need for clustering and suppression; and (3) Adaptive Prompt Tuning during fine-tuning, freezing the main architecture and optimizing a small number of prompts for efficient adaptation.
arXiv Detail & Related papers (2025-01-08T20:11:09Z) - UNEM: UNrolled Generalized EM for Transductive Few-Shot Learning [35.62208317531141]
We advocate and introduce the unrolling paradigm, also referred to as "learning to optimize"
Our unrolling approach covers various statistical feature distributions and pre-training paradigms.
We report comprehensive experiments, which cover a breadth of fine-grained downstream image classification tasks.
arXiv Detail & Related papers (2024-12-21T19:01:57Z) - Denoising Pre-Training and Customized Prompt Learning for Efficient Multi-Behavior Sequential Recommendation [69.60321475454843]
We propose DPCPL, the first pre-training and prompt-tuning paradigm tailored for Multi-Behavior Sequential Recommendation.
In the pre-training stage, we propose a novel Efficient Behavior Miner (EBM) to filter out the noise at multiple time scales.
Subsequently, we propose to tune the pre-trained model in a highly efficient manner with the proposed Customized Prompt Learning (CPL) module.
arXiv Detail & Related papers (2024-08-21T06:48:38Z) - ConvPoseCNN2: Prediction and Refinement of Dense 6D Object Poses [23.348510362258402]
We propose a fully-convolutional extension of the PoseCNN method, which densely predicts object translations and orientations.
This has several advantages such as improving the spatial resolution of the orientation predictions.
We demonstrate that our method achieves the same accuracy as PoseCNN on the challenging YCB-Video dataset.
arXiv Detail & Related papers (2022-05-23T08:32:09Z) - Dynamic Iterative Refinement for Efficient 3D Hand Pose Estimation [87.54604263202941]
We propose a tiny deep neural network of which partial layers are iteratively exploited for refining its previous estimations.
We employ learned gating criteria to decide whether to exit from the weight-sharing loop, allowing per-sample adaptation in our model.
Our method consistently outperforms state-of-the-art 2D/3D hand pose estimation approaches in terms of both accuracy and efficiency for widely used benchmarks.
arXiv Detail & Related papers (2021-11-11T23:31:34Z) - Predicting Deep Neural Network Generalization with Perturbation Response
Curves [58.8755389068888]
We propose a new framework for evaluating the generalization capabilities of trained networks.
Specifically, we introduce two new measures for accurately predicting generalization gaps.
We attain better predictive scores than the current state-of-the-art measures on a majority of tasks in the Predicting Generalization in Deep Learning (PGDL) NeurIPS 2020 competition.
arXiv Detail & Related papers (2021-06-09T01:37:36Z) - Adversarial Feature Augmentation and Normalization for Visual
Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z) - Unsupervised learning of disentangled representations in deep restricted
kernel machines with orthogonality constraints [15.296955630621566]
Constr-DRKM is a deep kernel method for the unsupervised learning of disentangled data representations.
We quantitatively evaluate the proposed method's effectiveness in disentangled feature learning.
arXiv Detail & Related papers (2020-11-25T11:40:10Z) - Target-Embedding Autoencoders for Supervised Representation Learning [111.07204912245841]
This paper analyzes a framework for improving generalization in a purely supervised setting, where the target space is high-dimensional.
We motivate and formalize the general framework of target-embedding autoencoders (TEA) for supervised prediction, learning intermediate latent representations jointly optimized to be both predictable from features as well as predictive of targets.
arXiv Detail & Related papers (2020-01-23T02:37:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.