Two-level Data Augmentation for Calibrated Multi-view Detection
- URL: http://arxiv.org/abs/2210.10756v1
- Date: Wed, 19 Oct 2022 17:55:13 GMT
- Title: Two-level Data Augmentation for Calibrated Multi-view Detection
- Authors: Martin Engilberge, Haixin Shi, Zhiye Wang, Pascal Fua
- Abstract summary: We introduce a new multi-view data augmentation pipeline that preserves alignment among views.
We also propose a second level of augmentation applied directly at the scene level.
When combined with our simple multi-view detection model, our two-level augmentation pipeline outperforms all existing baselines.
- Score: 51.5746691103591
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Data augmentation has proven useful for improving model generalization
and performance. While it is commonly applied in computer vision applications, it
is rarely used in multi-view systems. Indeed, geometric data augmentation can
break the alignment among views. This is problematic since multi-view data tend
to be scarce and are expensive to annotate. In this work we propose to solve this
issue by introducing a new multi-view data augmentation pipeline that preserves
alignment among views. In addition to traditional augmentation of the input
image, we also propose a second level of augmentation applied directly at the
scene level. When combined with our simple multi-view detection model, our
two-level augmentation pipeline outperforms all existing baselines by a
significant margin on the two main multi-view multi-person detection datasets,
WILDTRACK and MultiviewX.
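The abstract describes the two levels only at a high level, so the following is a minimal, hypothetical sketch of how alignment-preserving augmentation can work for calibrated views, assuming each camera is modeled by a 3x3 image-to-ground homography. The translation-only transforms and function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
import cv2

def augment_view(image, H_img2ground, max_shift=40):
    # Level 1: per-view image augmentation. Apply a random translation to
    # the image and fold its inverse into the view's homography so that
    # ground-plane projections of all views stay aligned.
    tx, ty = np.random.uniform(-max_shift, max_shift, size=2)
    A = np.array([[1.0, 0.0, tx],
                  [0.0, 1.0, ty],
                  [0.0, 0.0, 1.0]])
    h, w = image.shape[:2]
    augmented = cv2.warpPerspective(image, A, (w, h))
    # A pixel p' in the augmented image corresponds to p = A^-1 @ p' in the
    # original, so the new image-to-ground mapping is H @ A^-1.
    return augmented, H_img2ground @ np.linalg.inv(A)

def augment_scene(homographies, annotations_xy, max_shift=1.0):
    # Level 2: scene-level augmentation. Shift the common ground-plane
    # frame itself; the same transform is applied to every view's
    # homography and to the ground-truth positions, so consistency holds.
    sx, sy = np.random.uniform(-max_shift, max_shift, size=2)
    S = np.array([[1.0, 0.0, sx],
                  [0.0, 1.0, sy],
                  [0.0, 0.0, 1.0]])
    new_H = [S @ H for H in homographies]
    new_xy = annotations_xy + np.array([sx, sy])  # (N, 2) ground positions
    return new_H, new_xy
```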
Related papers
- Multi-view Aggregation Network for Dichotomous Image Segmentation [76.75904424539543]
Dichotomous Image Segmentation (DIS) has recently emerged, targeting high-precision object segmentation from high-resolution natural images.
Existing methods rely on tedious multiple encoder-decoder streams and stages to gradually complete the global localization and local refinement.
Inspired by this, we model DIS as a multi-view object perception problem and propose a parsimonious multi-view aggregation network (MVANet).
Experiments on the popular DIS-5K dataset show that our MVANet significantly outperforms state-of-the-art methods in both accuracy and speed.
arXiv Detail & Related papers (2024-04-11T03:00:00Z)
- Hypergraph-based Multi-View Action Recognition using Event Cameras [20.965606424362726]
We introduce HyperMV, a multi-view event-based action recognition framework.
We present the largest multi-view event-based action dataset $\text{THU}^{\text{MV-EACT}}\text{-50}$, comprising 50 actions from 6 viewpoints.
Experimental results show that HyperMV significantly outperforms baselines in both cross-subject and cross-view scenarios.
arXiv Detail & Related papers (2024-03-28T11:17:00Z)
- Debunking Free Fusion Myth: Online Multi-view Anomaly Detection with Disentangled Product-of-Experts Modeling [25.02446577349165]
Multi-view or even multi-modal data is appealing yet challenging for real-world applications.
We propose dPoE, a novel multi-view variational autoencoder model that involves (1) a Product-of-Experts layer for tackling multi-view data, (2) a Total Correlation discriminator for disentangling view-common and view-specific representations, and (3) a joint loss function that ties all components together.
arXiv Detail & Related papers (2023-10-28T15:14:43Z)
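Since the entry above names a Product-of-Experts layer, here is a generic sketch of how Gaussian experts are typically fused in PoE-style multi-view models (precisions add). This is a standard construction under stated assumptions, not the dPoE code.

```python
import numpy as np

def product_of_experts(mus, logvars):
    # Fuse per-view Gaussian posteriors q_v(z | x_v) into one Gaussian.
    # For Gaussians the product is Gaussian with precision equal to the
    # sum of the experts' precisions (plus a standard-normal prior expert).
    precisions = [np.exp(-lv) for lv in logvars]        # 1 / sigma_v^2
    total_prec = np.ones_like(precisions[0]) + sum(precisions)
    mu = sum(p * m for p, m in zip(precisions, mus)) / total_prec
    return mu, 1.0 / total_prec                         # fused mean, variance
```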
- Hierarchical Mutual Information Analysis: Towards Multi-view Clustering in The Wild [9.380271109354474]
This work proposes a deep MVC framework where data recovery and alignment are fused in a hierarchically consistent way to maximize the mutual information among different views.
To the best of our knowledge, this could be the first successful attempt to handle the missing and unaligned data problem separately with different learning paradigms.
arXiv Detail & Related papers (2023-10-28T06:43:57Z)
- Multi-view Fuzzy Representation Learning with Rules based Model [25.997490574254172]
Unsupervised multi-view representation learning has been extensively studied for mining multi-view data.
This paper proposes a new multi-view fuzzy representation learning method based on the interpretable Takagi-Sugeno-Kang (TSK) fuzzy system (MVRL_FS).
arXiv Detail & Related papers (2023-09-20T17:13:15Z)
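As a reminder of the underlying machinery, the sketch below shows generic first-order Takagi-Sugeno-Kang inference (Gaussian antecedents, linear consequents). All names and shapes are illustrative; this is not the MVRL_FS implementation.

```python
import numpy as np

def tsk_infer(x, centers, sigmas, consequents):
    # x: (d,) input; centers, sigmas: (R, d) Gaussian antecedents for R
    # rules; consequents: (R, d + 1) coefficients [bias | weights] per rule.
    memberships = np.exp(-((x - centers) ** 2) / (2.0 * sigmas ** 2))
    firing = memberships.prod(axis=1)               # rule firing strengths
    weights = firing / (firing.sum() + 1e-12)       # normalized strengths
    rule_outputs = consequents[:, 0] + consequents[:, 1:] @ x
    return weights @ rule_outputs                   # weighted rule average
```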
- Credible Remote Sensing Scene Classification Using Evidential Fusion on Aerial-Ground Dual-view Images [6.817740582240199]
Multi-view (multi-source, multi-modal, multi-perspective, etc.) data are being used more frequently in remote sensing tasks.
The issue of data quality becomes more apparent, limiting the potential benefits of multi-view data.
Deep learning is introduced to the task of aerial-ground dual-view remote sensing scene classification.
arXiv Detail & Related papers (2023-01-02T12:27:55Z)
- Cross-view Graph Contrastive Representation Learning on Partially Aligned Multi-view Data [52.491074276133325]
Multi-view representation learning has developed rapidly over the past decades and has been applied in many fields.
We propose a new cross-view graph contrastive learning framework, which integrates multi-view information to align data and learn latent representations.
Experiments conducted on several real datasets demonstrate the effectiveness of the proposed method on the clustering and classification tasks.
arXiv Detail & Related papers (2022-11-08T09:19:32Z)
- Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers [115.90778814368703]
Our objective is language-based search of large-scale image and video datasets.
For this task, the approach of independently mapping text and vision into a joint embedding space, a.k.a. dual encoders, is attractive because retrieval scales efficiently.
An alternative approach of using vision-text transformers with cross-attention gives considerable improvements in accuracy over the joint embeddings.
arXiv Detail & Related papers (2021-03-30T17:57:08Z)
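The "fast" half of this fast/slow design can be sketched generically: with a dual encoder, gallery embeddings are precomputed once, and a query reduces to a nearest-neighbour search; a heavier cross-attention model then re-ranks only the shortlisted candidates. A hypothetical sketch, not this paper's model:

```python
import numpy as np

def retrieve(text_vec, gallery, k=5):
    # Dual-encoder retrieval: text and vision are embedded independently,
    # so the gallery matrix (n_items x d) is computed once, offline.
    q = text_vec / np.linalg.norm(text_vec)
    G = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    scores = G @ q                     # cosine similarity after L2 norm
    top_k = np.argsort(-scores)[:k]    # shortlist from the "fast" pass
    # A cross-attention ("slow") model would then re-rank only these k
    # candidates, recovering most of its accuracy at a fraction of the cost.
    return top_k
```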
- Multiview Detection with Feature Perspective Transformation [59.34619548026885]
We propose a novel multiview detection system, MVDet.
We take an anchor-free approach to aggregate multiview information by projecting feature maps onto the ground plane.
Our entire model is end-to-end learnable and achieves 88.2% MODA on the standard Wildtrack dataset.
arXiv Detail & Related papers (2020-07-14T17:58:30Z)
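The projection step named above (warping per-view feature maps into a common ground-plane grid via each camera's homography) can be sketched as follows. Here the homographies are assumed to map feature-map coordinates to ground-grid coordinates, and the plain channel concatenation is a simplified stand-in for MVDet's learned aggregation.

```python
import numpy as np
import cv2

def project_to_ground(feature_maps, homographies, grid_wh):
    # feature_maps: list of (H, W, C) arrays, one per view;
    # homographies: 3x3 feature-plane-to-ground-grid matrices;
    # grid_wh: (width, height) of the ground-plane grid (cv2 dsize order).
    warped = []
    for feat, H in zip(feature_maps, homographies):
        # Warp each channel separately to stay within OpenCV channel limits.
        chans = [cv2.warpPerspective(feat[..., c], H, grid_wh)
                 for c in range(feat.shape[-1])]
        warped.append(np.stack(chans, axis=-1))
    # Concatenate all views' warped features; ground-plane convolutions
    # and the anchor-free detection head would operate on this tensor.
    return np.concatenate(warped, axis=-1)
```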
- Generative Partial Multi-View Clustering [133.36721417531734]
We propose a generative partial multi-view clustering model, named GP-MVC, to address the incomplete multi-view problem.
First, multi-view encoder networks are trained to learn common low-dimensional representations, followed by a clustering layer to capture the consistent cluster structure across multiple views.
Second, view-specific generative adversarial networks are developed to generate the missing data of one view conditioning on the shared representation given by other views.
arXiv Detail & Related papers (2020-03-29T17:48:27Z)
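The second stage described above, generating a missing view conditioned on the shared representation, can be illustrated with a minimal, hypothetical generator; layer sizes and names are placeholders, not the GP-MVC architecture.

```python
import torch
import torch.nn as nn

class ViewGenerator(nn.Module):
    # One generator per view: maps the shared code z (learned from the
    # observed views) to a reconstruction of that view's features.
    def __init__(self, z_dim=32, view_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, 64),
            nn.ReLU(),
            nn.Linear(64, view_dim),
        )

    def forward(self, z_shared):
        return self.net(z_shared)

# Hypothetical usage: impute view v from the shared code, then let a
# view-specific discriminator score imputed vs. real samples adversarially.
gen_v = ViewGenerator()
x_v_hat = gen_v(torch.randn(8, 32))  # batch of 8 imputed views
```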
This list is automatically generated from the titles and abstracts of the papers on this site.