Issues in Object Detection in Videos using Common Single-Image CNNs
- URL: http://arxiv.org/abs/2105.12822v1
- Date: Wed, 26 May 2021 20:33:51 GMT
- Title: Issues in Object Detection in Videos using Common Single-Image CNNs
- Authors: Spencer Ploeger and Lucas Dasovic
- Abstract summary: Object detection is used in many applications such as industrial processes, medical imaging analysis, and autonomous vehicles.
For applications such as autonomous vehicles, it is crucial that the object detection system can identify objects through multiple frames in video.
Many neural networks have been used for object detection, and if there were a way of connecting objects between frames, frame-to-frame misidentifications could be eliminated.
A dataset must be created with images that represent consecutive video frames and have matching ground-truth layers.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A growing branch of computer vision is object detection. Object detection is
used in many applications such as industrial processes, medical imaging analysis,
and autonomous vehicles. The ability to detect objects in videos is crucial.
Object detection systems are trained on large image datasets. For applications
such as autonomous vehicles, it is crucial that the object detection system can
identify objects through multiple frames in video. There are many problems with
applying these systems to video. Shadows or changes in brightness can
cause the system to incorrectly identify objects from frame to frame and trigger an
unintended system response. Many neural networks have been used
for object detection, and if there were a way of connecting objects between
frames, these problems could be eliminated. For these neural networks to
get better at identifying objects in video, they need to be re-trained. A
dataset must be created with images that represent consecutive video frames and
have matching ground-truth layers. A method is proposed that can generate these
datasets. The ground-truth layer contains only moving objects. To generate this
layer, FlowNet2-Pytorch is used to create the flow mask using the novel
Magnitude Method. In addition, a segmentation mask is generated using networks
such as Mask R-CNN or RefineNet. These segmentation masks contain all
objects detected in a frame. By comparing this segmentation mask to the flow
mask ground-truth layer, a loss function is generated. This loss function can
be used to train a neural network to be better at making consistent predictions
on video. The system was tested on multiple video samples and a loss was
generated for each frame, demonstrating that the Magnitude Method can be used to
train object detection neural networks in future work.
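The comparison described in the abstract can be illustrated with a small sketch. The abstract does not spell out the Magnitude Method or the form of the loss, so the code below rests on assumptions: the flow mask is taken to be a simple threshold on per-pixel optical-flow magnitude, and the loss is taken to be 1 minus the intersection-over-union of the two masks. The function names `flow_magnitude_mask` and `mask_disagreement_loss`, the `threshold` parameter, and the toy arrays are all hypothetical; in practice the flow field would come from a network such as FlowNet2-Pytorch and the segmentation mask from Mask R-CNN or RefineNet.

```python
import numpy as np


def flow_magnitude_mask(flow, threshold=1.0):
    """Return a boolean mask of pixels whose flow magnitude exceeds `threshold`.

    flow: array of shape (H, W, 2) holding per-pixel (dx, dy) displacements.
    Assumption: thresholding the magnitude stands in for the paper's
    Magnitude Method, whose details are not given in the abstract.
    """
    magnitude = np.linalg.norm(flow, axis=-1)
    return magnitude > threshold


def mask_disagreement_loss(seg_mask, flow_mask):
    """Return 1 - IoU between two boolean masks (0.0 if both are empty).

    Assumption: an IoU-based disagreement is used here for illustration only;
    the abstract says a loss is generated but not how.
    """
    seg = np.asarray(seg_mask, dtype=bool)
    flw = np.asarray(flow_mask, dtype=bool)
    union = np.logical_or(seg, flw).sum()
    if union == 0:
        return 0.0
    intersection = np.logical_and(seg, flw).sum()
    return 1.0 - intersection / union


# Toy frame: a 2x2 block moves by (3, 4) pixels, everything else is static.
flow = np.zeros((4, 4, 2), dtype=np.float32)
flow[1:3, 1:3] = (3.0, 4.0)
flow_mask = flow_magnitude_mask(flow, threshold=1.0)

# Hypothetical segmentation output covering the moving block plus one extra column.
seg_mask = np.zeros((4, 4), dtype=bool)
seg_mask[1:3, 1:4] = True

loss = mask_disagreement_loss(seg_mask, flow_mask)
print(f"per-frame loss: {loss:.3f}")
```

Running this per frame over a video would yield one loss value per frame, matching the shape of the evaluation the abstract describes; a differentiable variant would be needed before using such a quantity for training.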
Related papers
- Accelerating Object Detection with YOLOv4 for Real-Time Applications [0.276240219662896]
Convolutional Neural Networks (CNNs) have emerged as a powerful tool for recognizing image content and as the approach of choice for most computer vision problems.
This paper gives a brief introduction to deep learning and to object detection frameworks such as Convolutional Neural Networks (CNNs).
arXiv Detail & Related papers (2024-10-17T17:44:57Z)
- LAC-Net: Linear-Fusion Attention-Guided Convolutional Network for Accurate Robotic Grasping Under the Occlusion [79.22197702626542]
This paper introduces a framework that explores amodal segmentation for robotic grasping in cluttered scenes.
We propose a Linear-fusion Attention-guided Convolutional Network (LAC-Net).
The results on different datasets show that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-08-06T14:50:48Z)
- Follow Anything: Open-set detection, tracking, and following in real-time [89.83421771766682]
We present a robotic system to detect, track, and follow any object in real-time.
Our approach, dubbed "follow anything" (FAn), is an open-vocabulary and multimodal model.
FAn can be deployed on a laptop with a lightweight (6-8 GB) graphics card, achieving a throughput of 6-20 frames per second.
arXiv Detail & Related papers (2023-08-10T17:57:06Z)
- Building Flyweight FLIM-based CNNs with Adaptive Decoding for Object Detection [40.97322222472642]
This work presents a method to build a Convolutional Neural Network (CNN) layer by layer for object detection from user-drawn markers.
We address the detection of Schistosomiasis mansoni eggs in microscopy images of fecal samples, and the detection of ships in satellite images.
Our CNN weighs thousands of times less than SOTA object detectors, being suitable for CPU execution and showing superior or equivalent performance to three methods in five measures.
arXiv Detail & Related papers (2023-06-26T16:48:20Z)
- Application Of ADNN For Background Subtraction In Smart Surveillance System [0.0]
We develop an intelligent video surveillance system that uses ADNN architecture for motion detection, trims the video with parts only containing motion, and performs anomaly detection on the trimmed video.
arXiv Detail & Related papers (2022-12-31T18:42:11Z)
- Object Propagation via Inter-Frame Attentions for Temporally Stable Video Instance Segmentation [51.68840525174265]
Video instance segmentation aims to detect, segment, and track objects in a video.
Current approaches extend image-level segmentation algorithms to the temporal domain.
We propose a video instance segmentation method that alleviates the problem due to missing detections.
arXiv Detail & Related papers (2021-11-15T04:15:57Z)
- RICE: Refining Instance Masks in Cluttered Environments with Graph Neural Networks [53.15260967235835]
We propose a novel framework that refines the output of such methods by utilizing a graph-based representation of instance masks.
We train deep networks capable of sampling smart perturbations to the segmentations, and a graph neural network, which can encode relations between objects, to evaluate the segmentations.
We demonstrate an application that uses uncertainty estimates generated by our method to guide a manipulator, leading to efficient understanding of cluttered scenes.
arXiv Detail & Related papers (2021-06-29T20:29:29Z)
- Few-Shot Learning for Video Object Detection in a Transfer-Learning Scheme [70.45901040613015]
We study the new problem of few-shot learning for video object detection.
We employ a transfer-learning framework to effectively train the video object detector on a large number of base-class objects and a few video clips of novel-class objects.
arXiv Detail & Related papers (2021-03-26T20:37:55Z)
- Recurrent Neural Networks for video object detection [0.0]
This work compares different methods, especially those that use Recurrent Neural Networks to detect objects in videos.
We distinguish between feature-based methods, which feed feature maps of different frames into the recurrent units; box-level methods, which feed bounding boxes with class probabilities into the recurrent units; and methods that use flow networks.
arXiv Detail & Related papers (2020-10-29T16:40:10Z)
- Understanding the Role of Individual Units in a Deep Neural Network [85.23117441162772]
We present an analytic framework to systematically identify hidden units within image classification and image generation networks.
First, we analyze a convolutional neural network (CNN) trained on scene classification and discover units that match a diverse set of object concepts.
Second, we use a similar analytic method to analyze a generative adversarial network (GAN) model trained to generate scenes.
arXiv Detail & Related papers (2020-09-10T17:59:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.