Joint Forecasting of Features and Feature Motion for Dense Semantic
Future Prediction
- URL: http://arxiv.org/abs/2101.10777v1
- Date: Tue, 26 Jan 2021 13:30:44 GMT
- Title: Joint Forecasting of Features and Feature Motion for Dense Semantic
Future Prediction
- Authors: Josip \v{S}ari\'c and Sacha Vra\v{z}i\'c and Sini\v{s}a \v{S}egvi\'c
- Abstract summary: The approach consists of two modules: Feature-to-motion (F2M) and Feature-to-feature (F2F)
The compound F2MF approach decouples effects of motion from the effects of novelty in a task-agnostic manner.
We perform experiments on three dense prediction tasks: semantic segmentation, instance-level segmentation, and panoptic segmentation.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel dense semantic forecasting approach which is applicable to
a variety of architectures and tasks. The approach consists of two modules.
Feature-to-motion (F2M) module forecasts a dense deformation field which warps
past features into their future positions. Feature-to-feature (F2F) module
regresses the future features directly and is therefore able to account for
emergent scenery. The compound F2MF approach decouples effects of motion from
the effects of novelty in a task-agnostic manner. We aim to apply F2MF
forecasting to the most subsampled and the most abstract representation of a
desired single-frame model. Our implementations take advantage of deformable
convolutions and pairwise correlation coefficients across neighbouring time
instants. We perform experiments on three dense prediction tasks: semantic
segmentation, instance-level segmentation, and panoptic segmentation. The
results reveal state-of-the-art forecasting accuracy across all three
modalities on the Cityscapes dataset.
Related papers
- PointOBB-v3: Expanding Performance Boundaries of Single Point-Supervised Oriented Object Detection [65.84604846389624]
We propose PointOBB-v3, a stronger single point-supervised OOD framework.
It generates pseudo rotated boxes without additional priors and incorporates support for the end-to-end paradigm.
Our method achieves an average improvement in accuracy of 3.56% in comparison to previous state-of-the-art methods.
arXiv Detail & Related papers (2025-01-23T18:18:15Z) - PCF-Lift: Panoptic Lifting by Probabilistic Contrastive Fusion [80.79938369319152]
We design a new pipeline coined PCF-Lift based on our Probabilis-tic Contrastive Fusion (PCF)
Our PCF-lift not only significantly outperforms the state-of-the-art methods on widely used benchmarks including the ScanNet dataset and the Messy Room dataset (4.4% improvement of scene-level PQ)
arXiv Detail & Related papers (2024-10-14T16:06:59Z) - SkelFormer: Markerless 3D Pose and Shape Estimation using Skeletal Transformers [57.46911575980854]
We introduce SkelFormer, a novel markerless motion capture pipeline for multi-view human pose and shape estimation.
Our method first uses off-the-shelf 2D keypoint estimators, pre-trained on large-scale in-the-wild data, to obtain 3D joint positions.
Next, we design a regression-based inverse-kinematic skeletal transformer that maps the joint positions to pose and shape representations from heavily noisy observations.
arXiv Detail & Related papers (2024-04-19T04:51:18Z) - FFINet: Future Feedback Interaction Network for Motion Forecasting [46.247396728154904]
We propose a novel Future Feedback Interaction Network (FFINet) to aggregate features the current observations and potential future interactions for trajectory prediction.
Our FFINet achieves the state-of-the-art performance on Argoverse 1 and Argoverse 2 motion forecasting benchmarks.
arXiv Detail & Related papers (2023-11-08T07:57:29Z) - Multi-body SE(3) Equivariance for Unsupervised Rigid Segmentation and
Motion Estimation [49.56131393810713]
We present an SE(3) equivariant architecture and a training strategy to tackle this task in an unsupervised manner.
Our method excels in both model performance and computational efficiency, with only 0.25M parameters and 0.92G FLOPs.
arXiv Detail & Related papers (2023-06-08T22:55:32Z) - Video Semantic Segmentation with Inter-Frame Feature Fusion and
Inner-Frame Feature Refinement [39.06589186472675]
We propose a spatial-temporal fusion (STF) module to model dense pairwise relationships among multi-frame features.
Besides, we propose a novel memory-augmented refinement (MAR) module to tackle difficult predictions among semantic boundaries.
arXiv Detail & Related papers (2023-01-10T07:57:05Z) - Part-guided Relational Transformers for Fine-grained Visual Recognition [59.20531172172135]
We propose a framework to learn the discriminative part features and explore correlations with a feature transformation module.
Our proposed approach does not rely on additional part branches and reaches state-the-of-art performance on 3-of-the-level object recognition.
arXiv Detail & Related papers (2022-12-28T03:45:56Z) - DeepMLE: A Robust Deep Maximum Likelihood Estimator for Two-view
Structure from Motion [9.294501649791016]
Two-view structure from motion (SfM) is the cornerstone of 3D reconstruction and visual SLAM (vSLAM)
We formulate the two-view SfM problem as a maximum likelihood estimation (MLE) and solve it with the proposed framework, denoted as DeepMLE.
Our method significantly outperforms the state-of-the-art end-to-end two-view SfM approaches in accuracy and generalization capability.
arXiv Detail & Related papers (2022-10-11T15:07:25Z) - Joint Forecasting of Panoptic Segmentations with Difference Attention [72.03470153917189]
We study a new panoptic segmentation forecasting model that jointly forecasts all object instances in a scene.
We evaluate the proposed model on the Cityscapes and AIODrive datasets.
arXiv Detail & Related papers (2022-04-14T17:59:32Z) - MUSE-VAE: Multi-Scale VAE for Environment-Aware Long Term Trajectory
Prediction [28.438787700968703]
Conditional MUSE offers diverse and simultaneously more accurate predictions compared to the current state-of-the-art.
We demonstrate these assertions through a comprehensive set of experiments on nuScenes and SDD benchmarks as well as PFSD, a new synthetic dataset.
arXiv Detail & Related papers (2022-01-18T18:40:03Z) - Panoptic Segmentation Forecasting [71.75275164959953]
Our goal is to forecast the near future given a set of recent observations.
We think this ability to forecast, i.e., to anticipate, is integral for the success of autonomous agents.
We develop a two-component model: one component learns the dynamics of the background stuff by anticipating odometry, the other one anticipates the dynamics of detected things.
arXiv Detail & Related papers (2021-04-08T17:59:16Z) - Multi-Person Pose Estimation with Enhanced Feature Aggregation and
Selection [33.15192824888279]
We propose a novel Enhanced Feature Aggregation and Selection network (EFASNet) for multi-person 2D human pose estimation.
Our method can well handle crowded, cluttered and occluded scenes.
Comprehensive experiments demonstrate that the proposed approach outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2020-03-20T08:33:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.