From Chaos Comes Order: Ordering Event Representations for Object
Recognition and Detection
- URL: http://arxiv.org/abs/2304.13455v4
- Date: Wed, 30 Aug 2023 19:44:41 GMT
- Title: From Chaos Comes Order: Ordering Event Representations for Object
Recognition and Detection
- Authors: Nikola Zubić, Daniel Gehrig, Mathias Gehrig, Davide Scaramuzza
- Abstract summary: We show how to select the appropriate representation for a task based on the Gromov-Wasserstein Discrepancy (GWD) between raw events and their representation.
It is about 200 times faster to compute than training a neural network and preserves the task performance ranking of event representations.
Our optimized representations outperform existing representations by 1.7 mAP on the 1 Mpx dataset and 0.3 mAP on the Gen1 dataset, two established object detection benchmarks, and reach a 3.8% higher classification score on the mini N-ImageNet benchmark.
- Score: 29.653946064645705
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Today, state-of-the-art deep neural networks that process events first
convert them into dense, grid-like input representations before using an
off-the-shelf network. However, selecting the appropriate representation for
the task traditionally requires training a neural network for each
representation and selecting the best one based on the validation score, which
is very time-consuming. This work eliminates this bottleneck by selecting
representations based on the Gromov-Wasserstein Discrepancy (GWD) between raw
events and their representation. It is about 200 times faster to compute than
training a neural network and preserves the task performance ranking of event
representations across multiple representations, network backbones, datasets,
and tasks. Thus finding representations with high task scores is equivalent to
finding representations with a low GWD. We use this insight to, for the first
time, perform a hyperparameter search on a large family of event
representations, revealing new and powerful representations that exceed the
state-of-the-art. Our optimized representations outperform existing
representations by 1.7 mAP on the 1 Mpx dataset and 0.3 mAP on the Gen1
dataset, two established object detection benchmarks, and reach a 3.8% higher
classification score on the mini N-ImageNet benchmark. Moreover, we outperform
the state of the art by 2.1 mAP on Gen1 and state-of-the-art feed-forward methods
by 6.0 mAP on the 1 Mpx dataset. This work opens a new unexplored field of
explicit representation optimization for event-based learning.
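
To make the selection procedure described above concrete, the sketch below (not the authors' released code) builds one candidate grid representation, a simple voxel grid, from raw events and scores it with a Gromov-Wasserstein Discrepancy computed by the POT library. The voxel-grid layout, the sensor size, the sampling of events and nonzero cells into point clouds, and the helper names `events_to_voxel_grid` and `gwd_score` are illustrative assumptions; the paper's exact GWD formulation and representation family may differ.

```python
# Minimal, illustrative sketch of GWD-based representation ranking (assumptions noted above).
import numpy as np
import ot  # Python Optimal Transport: pip install pot


def events_to_voxel_grid(events, H=180, W=240, B=5):
    """Accumulate events with columns (x, y, t, polarity) into a B-bin voxel grid."""
    x = np.clip(events[:, 0].astype(int), 0, W - 1)
    y = np.clip(events[:, 1].astype(int), 0, H - 1)
    t, p = events[:, 2], events[:, 3]
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9)   # normalize time to [0, 1]
    b = np.clip((t_norm * B).astype(int), 0, B - 1)         # temporal bin per event
    grid = np.zeros((B, H, W), dtype=np.float32)
    np.add.at(grid, (b, y, x), 2.0 * p - 1.0)               # signed event accumulation
    return grid


def gwd_score(events, grid, n_samples=500, seed=0):
    """Approximate GWD between a sample of raw events and the nonzero grid cells."""
    rng = np.random.default_rng(seed)

    # Raw events as points in (x, y, normalized t) space.
    ev = events[rng.choice(len(events), min(n_samples, len(events)), replace=False), :3].astype(float)
    ev[:, 2] = (ev[:, 2] - ev[:, 2].min()) / max(ev[:, 2].max() - ev[:, 2].min(), 1e-9)

    # The representation as points in (x, y, normalized bin) space, one per nonzero cell.
    b, y, x = np.nonzero(grid)
    cells = np.stack([x, y, b / max(grid.shape[0] - 1, 1)], axis=1).astype(float)
    cells = cells[rng.choice(len(cells), min(n_samples, len(cells)), replace=False)]

    # Intra-space distance matrices, uniform weights, then GWD via POT.
    C1, C2 = ot.dist(ev, ev), ot.dist(cells, cells)
    w1, w2 = ot.unif(len(ev)), ot.unif(len(cells))
    return ot.gromov.gromov_wasserstein2(C1, C2, w1, w2, loss_fun='square_loss')


# Usage sketch: rank candidate representations without training a network; per the paper,
# a lower GWD should correspond to a higher downstream task score.
# events = np.load("events.npy")   # hypothetical file with columns x, y, t, polarity in {0, 1}
# ranking = sorted((gwd_score(events, events_to_voxel_grid(events, B=B)), B) for B in (1, 5, 10))
```

Sampling keeps the quadratic-size distance matrices tractable; in practice one would score many candidates in a hyperparameter search and keep the representation with the lowest GWD.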
Related papers
- Neural Clustering based Visual Representation Learning [61.72646814537163]
Clustering is one of the most classic approaches in machine learning and data analysis.
We propose feature extraction with clustering (FEC), which views feature extraction as a process of selecting representatives from data.
FEC alternates between grouping pixels into individual clusters to abstract representatives and updating the deep features of pixels with current representatives.
arXiv Detail & Related papers (2024-03-26T06:04:50Z)
- Learning to Generate Parameters of ConvNets for Unseen Image Data [36.68392191824203]
ConvNets depend heavily on large amounts of image data and resort to an iterative optimization algorithm to learn network parameters.
We propose a new training paradigm and formulate the parameter learning of ConvNets as a prediction task.
We show that our proposed method achieves good efficacy on unseen image datasets under two kinds of settings.
arXiv Detail & Related papers (2023-10-18T10:26:18Z)
- Dynamic Focus-aware Positional Queries for Semantic Segmentation [94.6834904076914]
We propose a simple yet effective query design for semantic segmentation termed Dynamic Focus-aware Positional Queries.
Our framework achieves SOTA performance and outperforms Mask2former by clear margins of 1.1%, 1.9%, and 1.1% single-scale mIoU with ResNet-50, Swin-T, and Swin-B backbones.
arXiv Detail & Related papers (2022-04-04T05:16:41Z)
- ES-ImageNet: A Million Event-Stream Classification Dataset for Spiking Neural Networks [12.136368750042688]
We propose a fast and effective algorithm termed Omnidirectional Discrete Gradient (ODG) to convert the popular computer vision dataset ILSVRC2012 into its event-stream (ES) version.
In this way, we propose ES-ImageNet, which is currently dozens of times larger than other neuromorphic classification datasets and is generated entirely by software.
arXiv Detail & Related papers (2021-10-23T12:56:23Z)
- Single Object Tracking through a Fast and Effective Single-Multiple Model Convolutional Neural Network [0.0]
Recent state-of-the-art (SOTA) approaches rely on a matching network with a heavy structure to distinguish the target from other objects in the area.
In this article, an architecture is proposed that, in contrast to previous approaches, makes it possible to identify the object location in a single shot.
The presented tracker performs comparably with the SOTA in challenging situations while running far faster (up to 120 FPS on a 1080 Ti).
arXiv Detail & Related papers (2021-03-28T11:02:14Z)
- Pre-Trained Models for Heterogeneous Information Networks [57.78194356302626]
We propose a self-supervised pre-training and fine-tuning framework, PF-HIN, to capture the features of a heterogeneous information network.
PF-HIN consistently and significantly outperforms state-of-the-art alternatives on each of these tasks, on four datasets.
arXiv Detail & Related papers (2020-07-07T03:36:28Z)
- CRNet: Cross-Reference Networks for Few-Shot Segmentation [59.85183776573642]
Few-shot segmentation aims to learn a segmentation model that can be generalized to novel classes with only a few training images.
With a cross-reference mechanism, our network can better find the co-occurrent objects in the two images.
Experiments on the PASCAL VOC 2012 dataset show that our network achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-03-24T04:55:43Z)
- Selecting Relevant Features from a Multi-domain Representation for Few-shot Classification [91.67977602992657]
We propose a new strategy based on feature selection, which is both simpler and more effective than previous feature adaptation approaches.
We show that a simple non-parametric classifier built on top of such features produces high accuracy and generalizes to domains never seen during training.
arXiv Detail & Related papers (2020-03-20T15:44:17Z)
- DHOG: Deep Hierarchical Object Grouping [0.0]
We show that greedy or local methods of maximising mutual information (such as gradient optimisation) discover local optima of the mutual information criterion.
We introduce deep hierarchical object grouping (DHOG), which computes a number of distinct discrete representations of images in a hierarchical order.
We find that these representations align better with the downstream task of grouping into underlying object classes.
arXiv Detail & Related papers (2020-03-13T14:11:48Z)
- R-FCN: Object Detection via Region-based Fully Convolutional Networks [87.62557357527861]
We present region-based, fully convolutional networks for accurate and efficient object detection.
Our result is achieved at a test-time speed of 170ms per image, 2.5-20x faster than the Faster R-CNN counterpart.
arXiv Detail & Related papers (2016-05-20T15:50:11Z)