CCMR: High Resolution Optical Flow Estimation via Coarse-to-Fine
Context-Guided Motion Reasoning
- URL: http://arxiv.org/abs/2311.02661v1
- Date: Sun, 5 Nov 2023 14:14:24 GMT
- Title: CCMR: High Resolution Optical Flow Estimation via Coarse-to-Fine
Context-Guided Motion Reasoning
- Authors: Azin Jahedi, Maximilian Luz, Marc Rivinius, Andrés Bruhn
- Abstract summary: We propose CCMR: a high-resolution coarse-to-fine approach that leverages attention-based motion grouping concepts to multi-scale optical flow estimation.
CCMR relies on a hierarchical two-step attention-based context-motion grouping strategy.
Experiments and ablations demonstrate that our efforts of combining multi-scale and attention-based concepts pay off.
- Score: 1.0855602842179624
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Attention-based motion aggregation concepts have recently shown their
usefulness in optical flow estimation, in particular when it comes to handling
occluded regions. However, due to their complexity, such concepts have been
mainly restricted to coarse-resolution single-scale approaches that fail to
provide the detailed outcome of high-resolution multi-scale networks. In this
paper, we hence propose CCMR: a high-resolution coarse-to-fine approach that
leverages attention-based motion grouping concepts to multi-scale optical flow
estimation. CCMR relies on a hierarchical two-step attention-based
context-motion grouping strategy that first computes global multi-scale context
features and then uses them to guide the actual motion grouping. As we iterate
both steps over all coarse-to-fine scales, we adapt cross covariance image
transformers to allow for an efficient realization while maintaining
scale-dependent properties. Experiments and ablations demonstrate that our
efforts of combining multi-scale and attention-based concepts pay off. By
providing highly detailed flow fields with strong improvements in both occluded
and non-occluded regions, our CCMR approach not only outperforms both the
corresponding single-scale attention-based and multi-scale attention-free
baselines by up to 23.0% and 21.6%, respectively, it also achieves
state-of-the-art results, ranking first on KITTI 2015 and second on MPI Sintel
Clean and Final. Code and trained models are available at
https://github.com/cv-stuttgart/CCMR.
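The abstract mentions adapting cross-covariance image transformers so that context features can guide motion grouping efficiently. The following is a minimal NumPy sketch of that general idea, not the authors' implementation: it assumes that queries and keys are projected from context features and values from motion features, and the function name, the projection matrices `w_q`, `w_k`, `w_v`, and the `temperature` parameter are all hypothetical. Because the attention map is computed between channels (size d x d) rather than between pixels (size N x N), its cost grows linearly with the number of pixels.

```python
import numpy as np


def l2_normalize(x, axis, eps=1e-6):
    """Scale x to (approximately) unit L2 norm along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def cross_covariance_grouping(context, motion, w_q, w_k, w_v, temperature=1.0):
    """Channel-wise (cross-covariance) attention, sketched for one scale.

    context: (N, d) context features for N pixels (assumed input)
    motion:  (N, d) motion features for the same pixels (assumed input)
    w_q, w_k, w_v: (d, d) projection matrices (hypothetical parameters)
    Returns an (N, d) array of aggregated motion features.
    """
    q = context @ w_q  # queries from context, (N, d)
    k = context @ w_k  # keys from context, (N, d)
    v = motion @ w_v   # values from motion, (N, d)

    # Normalize every channel over the pixel dimension, as in
    # cross-covariance attention.
    q = l2_normalize(q, axis=0)
    k = l2_normalize(k, axis=0)

    # (d, d) channel-to-channel attention; each column sums to 1.
    attn = softmax((k.T @ q) / temperature, axis=0)

    # Every output channel is a convex combination of the value channels.
    return v @ attn


# Small usage example with random inputs.
rng = np.random.default_rng(0)
num_pixels, channels = 64, 8
context_feats = rng.standard_normal((num_pixels, channels))
motion_feats = rng.standard_normal((num_pixels, channels))
identity = np.eye(channels)
grouped = cross_covariance_grouping(
    context_feats, motion_feats, identity, identity, identity
)
print(grouped.shape)
```

In a coarse-to-fine setting, a function like this would be called once per scale on that scale's features; how the scales are coupled is not specified in the abstract and is left out of the sketch.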
Related papers
- Fast Disentangled Slim Tensor Learning for Multi-view Clustering [28.950845031752927]
We propose a new approach termed fast Disentangled Slim Tensor Learning (DSTL) for multi-view clustering.
To alleviate the negative influence of feature redundancy, inspired by robust PCA, DSTL disentangles the latent low-dimensional representation into a semantic-unrelated part and a semantic-related part for each view.
Our proposed model is computationally efficient and can be solved effectively.
arXiv Detail & Related papers (2024-11-12T09:57:53Z)
- Unleashing Network Potentials for Semantic Scene Completion [50.95486458217653]
This paper proposes a novel SSC framework, the Adversarial Modality Modulation Network (AMMNet).
AMMNet introduces two core modules: a cross-modal modulation enabling the interdependence of gradient flows between modalities, and a customized adversarial training scheme leveraging dynamic gradient competition.
Extensive experimental results demonstrate that AMMNet outperforms state-of-the-art SSC methods by a large margin.
arXiv Detail & Related papers (2024-03-12T11:48:49Z) - GA-HQS: MRI reconstruction via a generically accelerated unfolding
approach [14.988694941405575]
We propose a Generically Accelerated Half-Quadratic Splitting (GA-HQS) algorithm that incorporates second-order gradient information and pyramid attention modules for the delicate fusion of inputs at the pixel level.
Our method surpasses previous ones on single-coil MRI acceleration tasks.
arXiv Detail & Related papers (2023-04-06T06:21:18Z) - Federated Representation Learning via Maximal Coding Rate Reduction [109.26332878050374]
We propose a methodology to learn low-dimensional representations from a dataset that is distributed among several clients.
Our proposed method, which we refer to as FLOW, utilizes MCR2 as the objective of choice, hence resulting in representations that are both between-class discriminative and within-class compressible.
arXiv Detail & Related papers (2022-10-01T15:43:51Z) - EMC2A-Net: An Efficient Multibranch Cross-channel Attention Network for
SAR Target Classification [10.479559839534033]
This paper proposed two residual blocks, namely EMC2A blocks with multiscale receptive fields (RFs), based on a multibranch structure and then designed an efficient isotopic architecture deep CNN (DCNN), EMC2A-Net.
EMC2A blocks utilize parallel dilated convolution with different dilation rates, which can effectively capture multiscale context features without significantly increasing the computational burden.
This paper proposed a multiscale feature cross-channel attention module, namely the EMC2A module, adopting a local multiscale feature interaction strategy without dimensionality reduction.
arXiv Detail & Related papers (2022-08-03T04:31:52Z)
- Optical-Flow-Reuse-Based Bidirectional Recurrent Network for Space-Time Video Super-Resolution [52.899234731501075]
Space-time video super-resolution (ST-VSR) simultaneously increases the spatial resolution and frame rate for a given video.
Existing methods typically suffer from difficulties in how to efficiently leverage information from a large range of neighboring frames.
We propose a coarse-to-fine bidirectional recurrent neural network instead of using ConvLSTM to leverage knowledge between adjacent frames.
arXiv Detail & Related papers (2021-10-13T15:21:30Z)
- Cross-modal Consensus Network for Weakly Supervised Temporal Action Localization [74.34699679568818]
Weakly supervised temporal action localization (WS-TAL) is a challenging task that aims to localize action instances in the given video with video-level categorical supervision.
We propose a cross-modal consensus network (CO2-Net) to tackle this problem.
arXiv Detail & Related papers (2021-07-27T04:21:01Z)
- Learning to Estimate Hidden Motions with Global Motion Aggregation [71.12650817490318]
Occlusions pose a significant challenge to optical flow algorithms that rely on local evidence.
We introduce a global motion aggregation module to find long-range dependencies between pixels in the first image.
We demonstrate that the optical flow estimates in the occluded regions can be significantly improved without damaging the performance in non-occluded regions.
arXiv Detail & Related papers (2021-04-06T10:32:03Z)
- Fully Quantized Image Super-Resolution Networks [81.75002888152159]
We propose a Fully Quantized image Super-Resolution framework (FQSR) to jointly optimize efficiency and accuracy.
We apply our quantization scheme on multiple mainstream super-resolution architectures, including SRResNet, SRGAN and EDSR.
Our FQSR with low-bit quantization achieves performance on par with its full-precision counterparts on five benchmark datasets.
arXiv Detail & Related papers (2020-11-29T03:53:49Z)
- Hybrid Multiple Attention Network for Semantic Segmentation in Aerial Images [24.35779077001839]
We propose a novel attention-based framework named Hybrid Multiple Attention Network (HMANet) to adaptively capture global correlations.
We introduce a simple yet effective region shuffle attention (RSA) module to reduce feature redundancy and improve the efficiency of the self-attention mechanism.
arXiv Detail & Related papers (2020-01-09T07:47:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.