Deep Learning for Robust Motion Segmentation with Non-Static Cameras
- URL: http://arxiv.org/abs/2102.10929v1
- Date: Mon, 22 Feb 2021 11:58:41 GMT
- Title: Deep Learning for Robust Motion Segmentation with Non-Static Cameras
- Authors: Markus Bosch
- Abstract summary: This paper proposes a new end-to-end DCNN-based approach for motion segmentation, called MOSNET, designed especially for video sequences captured with non-static cameras.
While other approaches focus on spatial or temporal context, the proposed approach uses 3D convolutions as a key technology to factor in temporal features in video frames.
The network is able to perform well on scenes captured with non-static cameras where the image content changes significantly during the scene.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work proposes a new end-to-end DCNN-based approach for motion
segmentation, called MOSNET, designed especially for video sequences captured
with non-static cameras. While other approaches focus on spatial or temporal
context only, the proposed approach uses 3D convolutions as a key technology to
factor in spatio-temporal features in consecutive video frames. This is done by
capturing temporal information in features with a low and also with a high
level of abstraction. The lean network architecture with about 21k trainable
parameters is mainly based on a pre-trained VGG-16 network. The MOSNET uses a
new feature map fusion technique, which enables the network to focus on the
appropriate level of abstraction, resolution, and the appropriate size of the
receptive field with respect to the input. Furthermore, the end-to-end deep
learning-based approach can be extended with feature-based image alignment as a
pre-processing step, which brings a performance gain for some scenes.
Evaluating the end-to-end deep learning-based MOSNET in a scene-independent
manner leads to an overall F-measure of 0.803 on the CDNet2014
dataset. This result is obtained with a small temporal window of five input
frames and without the need for any initialization. Therefore, the network is able to
perform well on scenes captured with non-static cameras where the image content
changes significantly during the scene. In order to get robust results in
scenes captured with a moving camera, feature-based image alignment can be
implemented as a pre-processing step. MOSNET combined with this pre-processing
leads to an F-measure of 0.685 when cross-evaluating on a relabeled LASIESTA
dataset, which underpins the generalisation capability of MOSNET.
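This listing does not include code, so the following is only a rough sketch of the ingredients the abstract names: pre-trained 2D VGG-16 features, 3D convolutions over a five-frame temporal window, and a fusion of feature maps from different levels of abstraction. Every layer choice, channel size, and the class name ToyMotionSegNet below are illustrative assumptions, not the actual MOSNET architecture or its feature map fusion technique.

```python
# Illustrative sketch only -- NOT the published MOSNET architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models


class ToyMotionSegNet(nn.Module):
    def __init__(self, temporal_window: int = 5):
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features
        self.low = vgg[:4]    # conv1 block: 64 channels, full resolution
        self.high = vgg[4:9]  # pool + conv2 block: 128 channels, half resolution
        for p in [*self.low.parameters(), *self.high.parameters()]:
            p.requires_grad = False  # keep the pre-trained backbone fixed
        # 3D convolutions collapse the temporal axis of the feature stacks.
        self.temporal_low = nn.Conv3d(64, 16, kernel_size=(temporal_window, 3, 3),
                                      padding=(0, 1, 1))
        self.temporal_high = nn.Conv3d(128, 16, kernel_size=(temporal_window, 3, 3),
                                       padding=(0, 1, 1))
        self.head = nn.Conv2d(32, 1, kernel_size=1)  # fused maps -> foreground logit

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, time, 3, H, W) with time == temporal_window
        b, t, c, h, w = clip.shape
        frames = clip.reshape(b * t, c, h, w)
        f_low = self.low(frames)                        # (b*t, 64, H, W)
        f_high = self.high(f_low)                       # (b*t, 128, H/2, W/2)
        f_low = f_low.reshape(b, t, 64, h, w).permute(0, 2, 1, 3, 4)
        f_high = f_high.reshape(b, t, 128, h // 2, w // 2).permute(0, 2, 1, 3, 4)
        m_low = self.temporal_low(f_low).squeeze(2)     # (b, 16, H, W)
        m_high = self.temporal_high(f_high).squeeze(2)  # (b, 16, H/2, W/2)
        m_high = F.interpolate(m_high, size=(h, w), mode="bilinear",
                               align_corners=False)
        fused = torch.cat([m_low, m_high], dim=1)       # naive feature-map fusion
        return torch.sigmoid(self.head(fused))          # per-pixel foreground mask
```

A five-frame clip of shape (1, 5, 3, 240, 320) would yield a (1, 1, 240, 320) foreground probability map. Note that this toy version does not attempt to match the roughly 21k trainable parameters of the real MOSNET.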
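The feature-based image alignment pre-processing is likewise only described at a high level. A common way to realise it, sketched below under the assumption of ORB keypoints and a RANSAC homography (the paper does not state which detector or motion model it uses), is to match local features between each frame of the temporal window and a reference frame and warp the frames accordingly, so that residual motion mostly stems from foreground objects.

```python
# Hedged sketch of feature-based image alignment as a pre-processing step:
# warp every frame of the temporal window onto the last (reference) frame.
import cv2
import numpy as np


def align_to_reference(frames: list[np.ndarray]) -> list[np.ndarray]:
    """Warp all frames onto the coordinate frame of frames[-1]."""
    ref = frames[-1]
    ref_gray = cv2.cvtColor(ref, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create(nfeatures=2000)
    ref_kp, ref_desc = orb.detectAndCompute(ref_gray, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

    aligned = []
    for frame in frames[:-1]:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        kp, desc = orb.detectAndCompute(gray, None)
        matches = matcher.match(desc, ref_desc) if desc is not None else []
        if len(matches) < 4:        # a homography needs >= 4 correspondences
            aligned.append(frame)   # fall back to the unaligned frame
            continue
        src = np.float32([kp[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
        dst = np.float32([ref_kp[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
        # Robustly estimate the global (camera) motion and compensate it.
        H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
        aligned.append(cv2.warpPerspective(frame, H, (ref.shape[1], ref.shape[0])))
    aligned.append(ref)  # the reference frame already sits in its own coordinates
    return aligned
```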
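For reference, the F-measure quoted above (0.803 on CDNet2014, 0.685 on the relabeled LASIESTA) is the standard harmonic mean of per-pixel precision and recall on the foreground mask:

```latex
F = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}},
\qquad
\mathrm{precision} = \frac{TP}{TP + FP},
\qquad
\mathrm{recall} = \frac{TP}{TP + FN}
```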
Related papers
- Meta-Exploiting Frequency Prior for Cross-Domain Few-Shot Learning [86.99944014645322]
We introduce a novel framework, Meta-Exploiting Frequency Prior for Cross-Domain Few-Shot Learning.
We decompose each query image into its high-frequency and low-frequency components and incorporate them into the feature embedding network in parallel.
Our framework establishes new state-of-the-art results on multiple cross-domain few-shot learning benchmarks.
arXiv Detail & Related papers (2024-11-03T04:02:35Z)
- KRONC: Keypoint-based Robust Camera Optimization for 3D Car Reconstruction [58.04846444985808]
This paper introduces KRONC, a novel approach aimed at inferring view poses by leveraging prior knowledge about the object to be reconstructed and its representation through semantic keypoints.
With a focus on vehicle scenes, KRONC estimates the view positions as the solution to a lightweight optimization problem that drives the back-projections of the keypoints to converge to a single point.
arXiv Detail & Related papers (2024-09-09T08:08:05Z)
- Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks [53.67497327319569]
We introduce a novel neural rendering technique to solve image-to-3D from a single view.
Our approach employs the signed distance function as the surface representation and incorporates generalizable priors through geometry-encoding volumes and HyperNetworks.
Our experiments show the advantages of our proposed approach with consistent results and rapid generation.
arXiv Detail & Related papers (2023-12-24T08:42:37Z)
- Cross-Attention Transformer for Video Interpolation [3.5317804902980527]
TAIN (Transformers and Attention for video INterpolation) aims to interpolate an intermediate frame given two consecutive image frames around it.
We first present a novel visual transformer module, named Cross-Similarity (CS), to globally aggregate input image features whose appearance is similar to that of the predicted frame.
To account for occlusions in the CS features, we propose an Image Attention (IA) module to allow the network to focus on CS features from one frame over those of the other.
arXiv Detail & Related papers (2022-07-08T21:38:54Z)
- RADNet: A Deep Neural Network Model for Robust Perception in Moving Autonomous Systems [8.706086688708014]
We develop a novel method to rank videos based on the degree of global camera motion.
For videos ranked high in camera motion, we show that action detection accuracy decreases.
We propose an action detection pipeline that is robust to the camera motion effect and verify it empirically.
arXiv Detail & Related papers (2022-04-30T23:14:08Z)
- Semantic keypoint-based pose estimation from single RGB frames [64.80395521735463]
We present an approach to estimating the continuous 6-DoF pose of an object from a single RGB image.
The approach combines semantic keypoints predicted by a convolutional network (convnet) with a deformable shape model.
We show that our approach can accurately recover the 6-DoF object pose for both instance- and class-based scenarios.
arXiv Detail & Related papers (2022-04-12T15:03:51Z)
- Feature Flow: In-network Feature Flow Estimation for Video Object Detection [56.80974623192569]
Optical flow is widely used in computer vision tasks to provide pixel-level motion information.
A common approach is to forward optical flow to a neural network and fine-tune this network on the task dataset.
We propose a novel network (IFF-Net) with an In-network Feature Flow estimation module for video object detection.
arXiv Detail & Related papers (2020-09-21T07:55:50Z)
- An End-to-end Framework For Low-Resolution Remote Sensing Semantic Segmentation [0.5076419064097732]
We propose an end-to-end framework that unites a super-resolution and a semantic segmentation module.
It allows the semantic segmentation network to conduct the reconstruction process, modifying the input image with helpful textures.
The results show that the framework is capable of achieving a semantic segmentation performance close to native high-resolution data.
arXiv Detail & Related papers (2020-03-17T21:41:22Z)
- Real-Time High-Performance Semantic Image Segmentation of Urban Street Scenes [98.65457534223539]
We propose a real-time high-performance DCNN-based method for robust semantic segmentation of urban street scenes.
The proposed method achieves accuracies of 73.6% and 68.0% mean Intersection over Union (mIoU) at inference speeds of 51.0 fps and 39.3 fps, respectively.
arXiv Detail & Related papers (2020-03-11T08:45:53Z)
- FusionLane: Multi-Sensor Fusion for Lane Marking Semantic Segmentation Using Deep Neural Networks [1.0062127381149395]
This paper proposes a lane marking semantic segmentation method based on a LIDAR and camera fusion deep neural network.
Experiments on a dataset of more than 14,000 images have shown that the proposed method performs better on semantic segmentation of the point cloud bird's-eye view.
arXiv Detail & Related papers (2020-03-09T20:33:30Z)
- Medical Image Segmentation via Unsupervised Convolutional Neural Network [1.6396833577035679]
We present a novel learning-based segmentation model that can be trained in a semi- or unsupervised manner.
We parameterize the Active Contour Without Edges (ACWE) framework via a convolutional neural network (ConvNet).
We show that the method provides fast and high-quality bone segmentation in the context of single-photon emission computed tomography (SPECT) images.
arXiv Detail & Related papers (2020-01-28T03:56:42Z)