High Resolution Multi-Scale RAFT (Robust Vision Challenge 2022)
- URL: http://arxiv.org/abs/2210.16900v1
- Date: Sun, 30 Oct 2022 17:48:11 GMT
- Title: High Resolution Multi-Scale RAFT (Robust Vision Challenge 2022)
- Authors: Azin Jahedi, Maximilian Luz, Lukas Mehl, Marc Rivinius, Andrés Bruhn
- Abstract summary: We present our optical flow approach, MS-RAFT+, that won the Robust Vision Challenge 2022.
It is based on the MS-RAFT method, which successfully integrates several multi-scale concepts into single-scale RAFT.
Our approach extends this method by exploiting an additional finer scale for estimating the flow, which is made feasible by on-demand cost computation.
- Score: 0.6299766708197884
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this report, we present our optical flow approach, MS-RAFT+, that won the
Robust Vision Challenge 2022. It is based on the MS-RAFT method, which
successfully integrates several multi-scale concepts into single-scale RAFT.
Our approach extends this method by exploiting an additional finer scale for
estimating the flow, which is made feasible by on-demand cost computation. This
way, it can not only operate at half the original resolution, but also use
MS-RAFT's shared convex upsampler to obtain full resolution flow. Moreover, our
approach relies on an adjusted fine-tuning scheme during training. This in turn
aims at improving the generalization across benchmarks. Among all participating
methods in the Robust Vision Challenge, our approach ranks first on VIPER and
second on KITTI, Sintel, and Middlebury, resulting in the first place of the
overall ranking.
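To make the two ingredients named in the abstract more concrete, below is a minimal PyTorch-style sketch of on-demand cost computation and RAFT-style convex upsampling. It is an illustration under assumed tensor shapes, assumed helper names (`on_demand_cost`, `convex_upsample`) and an assumed upsampling factor, not the authors' MS-RAFT+ implementation.

```python
# Illustrative sketch only: on-demand cost computation and RAFT-style convex
# upsampling, written to clarify the ideas named in the abstract. Shapes,
# names, and the upsampling factor are assumptions, not the authors' code.
import torch
import torch.nn.functional as F


def on_demand_cost(fmap1, fmap2, coords, radius=4):
    """Compute correlation costs only at the looked-up locations.

    fmap1, fmap2: (N, C, H, W) feature maps of the two frames.
    coords: (N, 2, H, W) lookup centers in frame 2 (x, y order), i.e. the
        pixel grid shifted by the current flow estimate.
    Returns (N, (2*radius+1)**2, H, W) costs; no 4D cost volume is stored.
    """
    N, C, H, W = fmap1.shape
    offsets = torch.arange(-radius, radius + 1, device=fmap1.device, dtype=fmap1.dtype)
    dy, dx = torch.meshgrid(offsets, offsets, indexing="ij")
    costs = []
    for oy, ox in zip(dy.flatten(), dx.flatten()):
        # Bilinearly sample frame-2 features at the shifted lookup points.
        gx = 2.0 * (coords[:, 0] + ox) / (W - 1) - 1.0
        gy = 2.0 * (coords[:, 1] + oy) / (H - 1) - 1.0
        grid = torch.stack([gx, gy], dim=-1)                  # (N, H, W, 2)
        f2 = F.grid_sample(fmap2, grid, align_corners=True)   # (N, C, H, W)
        costs.append((fmap1 * f2).sum(dim=1) / C ** 0.5)      # dot-product cost
    return torch.stack(costs, dim=1)


def convex_upsample(flow, mask, factor=2):
    """RAFT-style convex upsampling of a flow field by `factor`.

    flow: (N, 2, H, W) coarse flow; mask: (N, 9*factor*factor, H, W) weights
    predicted by the network. Each fine-grid pixel is a convex combination of
    its 3x3 coarse neighborhood (weights are softmax-normalized).
    """
    N, _, H, W = flow.shape
    mask = mask.view(N, 1, 9, factor, factor, H, W)
    mask = torch.softmax(mask, dim=2)
    up = F.unfold(factor * flow, kernel_size=3, padding=1)    # (N, 2*9, H*W)
    up = up.view(N, 2, 9, 1, 1, H, W)
    up = torch.sum(mask * up, dim=2)                          # (N, 2, f, f, H, W)
    up = up.permute(0, 1, 4, 2, 5, 3)                         # (N, 2, H, f, W, f)
    return up.reshape(N, 2, factor * H, factor * W)
```

Computing correlations only at the sampled lookup positions avoids storing a full 4D cost volume, which is what makes adding a finer estimation scale affordable in memory; the convex upsampler then combines each 3x3 coarse neighborhood with learned weights to produce the higher-resolution flow.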
Related papers
- Rethinking the Upsampling Layer in Hyperspectral Image Super Resolution [51.98465973507002]
We propose a novel lightweight SHSR network, i.e., LKCA-Net, that incorporates channel attention to calibrate multi-scale channel features of hyperspectral images.
We demonstrate, for the first time, that the low-rank property of the learnable upsampling layer is a key bottleneck in lightweight SHSR methods.
arXiv Detail & Related papers (2025-01-30T15:43:34Z)
- First Place Solution to the ECCV 2024 BRAVO Challenge: Evaluating Robustness of Vision Foundation Models for Semantic Segmentation [1.8570591025615457]
We present the first place solution to the ECCV 2024 BRAVO Challenge.
A model is trained on Cityscapes and its robustness is evaluated on several out-of-distribution datasets.
This approach outperforms more complex existing approaches, and achieves first place in the challenge.
arXiv Detail & Related papers (2024-09-25T16:15:06Z)
- Towards Global Optimality for Practical Average Reward Reinforcement Learning without Mixing Time Oracles [83.85151306138007]
The Multi-level Actor-Critic (MAC) framework incorporates a Multi-level Monte-Carlo (MLMC) estimator.
We demonstrate that MAC outperforms the existing state-of-the-art policy gradient-based method for average reward settings.
arXiv Detail & Related papers (2024-03-18T16:23:47Z)
- Transforming Image Super-Resolution: A ConvFormer-based Efficient Approach [58.57026686186709]
We introduce the Convolutional Transformer layer (ConvFormer) and propose a ConvFormer-based Super-Resolution network (CFSR).
CFSR inherits the advantages of both convolution-based and transformer-based approaches.
Experiments demonstrate that CFSR strikes an optimal balance between computational cost and performance.
arXiv Detail & Related papers (2024-01-11T03:08:00Z)
- Diffusion for Natural Image Matting [93.86689168212241]
We present DiffMatte, a solution designed to overcome the challenges of image matting.
First, DiffMatte decouples the decoder from the intricately coupled matting network design, involving only one lightweight decoder in the iterations of the diffusion process.
Second, we employ a self-aligned training strategy with uniform time intervals, ensuring a consistent noise sampling between training and inference across the entire time domain.
arXiv Detail & Related papers (2023-12-10T15:28:56Z)
- CCMR: High Resolution Optical Flow Estimation via Coarse-to-Fine Context-Guided Motion Reasoning [1.0855602842179624]
We propose CCMR: a high-resolution coarse-to-fine approach that applies attention-based motion grouping concepts to multi-scale optical flow estimation.
CCMR relies on a hierarchical two-step attention-based context-motion grouping strategy.
Experiments and ablations demonstrate that our efforts of combining multi-scale and attention-based concepts pay off.
arXiv Detail & Related papers (2023-11-05T14:14:24Z)
- Improving Pixel-based MIM by Reducing Wasted Modeling Capability [77.99468514275185]
We propose a new method that explicitly utilizes low-level features from shallow layers to aid pixel reconstruction.
To the best of our knowledge, we are the first to systematically investigate multi-level feature fusion for isotropic architectures.
Our method yields significant performance gains, such as 1.2% on fine-tuning, 2.8% on linear probing, and 2.6% on semantic segmentation.
arXiv Detail & Related papers (2023-08-01T03:44:56Z)
- Blind Face Restoration: Benchmark Datasets and a Baseline Model [63.053331687284064]
Blind Face Restoration (BFR) aims to construct a high-quality (HQ) face image from its corresponding low-quality (LQ) input.
We first synthesize two blind face restoration benchmark datasets called EDFace-Celeb-1M (BFR128) and EDFace-Celeb-150K (BFR512).
State-of-the-art methods are benchmarked on them under five settings including blur, noise, low resolution, JPEG compression artifacts, and their combination (full degradation).
arXiv Detail & Related papers (2022-06-08T06:34:24Z)
- Deep Model-Based Super-Resolution with Non-uniform Blur [1.7188280334580197]
We propose a state-of-the-art method for super-resolution with non-uniform blur.
We first propose a fast deep plug-and-play algorithm, based on linearized ADMM splitting techniques.
We unfold our iterative algorithm into a single network and train it end-to-end.
arXiv Detail & Related papers (2022-04-21T13:57:21Z)
- Normalizing Flow as a Flexible Fidelity Objective for Photo-Realistic Super-resolution [161.39504409401354]
Super-resolution is an ill-posed problem, where a ground-truth high-resolution image represents only one possibility in the space of plausible solutions.
Yet, the dominant paradigm is to employ pixel-wise losses, such as L_1, which drive the prediction towards a blurry average.
We address this issue by revisiting the L_1 loss and show that it corresponds to a one-layer conditional flow (a short sketch of this relation is given after the list).
Inspired by this relation, we explore general flows as a fidelity-based alternative to the L_1 objective.
We demonstrate that the flexibility of deeper flows leads to better visual quality and consistency when combined with adversarial losses.
arXiv Detail & Related papers (2021-11-05T17:56:51Z)
- 1st Place Solution for ICDAR 2021 Competition on Mathematical Formula Detection [3.600275712225597]
We present our 1st place solution for the ICDAR 2021 competition on mathematical formula detection (MFD).
The MFD task has three key challenges: a large scale span, large variation of the height-to-width ratio, and a rich set of characters and mathematical expressions.
Considering these challenges, we used Generalized Focal Loss (GFL), an anchor-free method, instead of anchor-based methods.
arXiv Detail & Related papers (2021-07-12T16:03:16Z)
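As a side note on the normalizing-flow entry above, the stated correspondence between the L_1 loss and a one-layer conditional flow can be sketched briefly, assuming a unit Laplace base density and a fixed scale b; the paper's own derivation may differ in its details.

```latex
% Sketch: L_1 loss as a one-layer (affine) conditional flow, assumed Laplace base.
\[
  z = \frac{y - f(x)}{b}, \qquad p_z(z) = \tfrac{1}{2}\, e^{-\lvert z \rvert},
\]
\[
  -\log p(y \mid x)
    = -\log p_z\!\left(\frac{y - f(x)}{b}\right) - \log\left|\frac{dz}{dy}\right|
    = \frac{\lvert y - f(x) \rvert}{b} + \log 2 + \log b .
\]
```

For fixed b this is the L_1 loss up to an affine rescaling, so minimizing L_1 amounts to maximum likelihood under this single affine flow layer; deeper flows generalize the fidelity term beyond this fixed Laplace assumption.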