Towards Vision Mixture of Experts for Wildlife Monitoring on the Edge
- URL: http://arxiv.org/abs/2411.07834v1
- Date: Tue, 12 Nov 2024 14:36:06 GMT
- Title: Towards Vision Mixture of Experts for Wildlife Monitoring on the Edge
- Authors: Emmanuel Azuh Mensah, Anderson Lee, Haoran Zhang, Yitong Shan, Kurtis Heimerl,
- Abstract summary: TinyML' community is actively proposing methods to save communication bandwidth and excessive cloud storage costs.
We explore similar per patch conditional computation for the first time for mobile vision transformers.
We evaluate the model on Cornell Sap Sucker Woods 60, a fine grained bird species discrimination dataset.
- Score: 13.112893692624768
- License:
- Abstract: The explosion of IoT sensors in industrial, consumer and remote sensing use cases has come with unprecedented demand for computing infrastructure to transmit and to analyze petabytes of data. Concurrently, the world is slowly shifting its focus towards more sustainable computing. For these reasons, there has been a recent effort to reduce the footprint of related computing infrastructure, especially by deep learning algorithms, for advanced insight generation. The `TinyML' community is actively proposing methods to save communication bandwidth and excessive cloud storage costs while reducing algorithm inference latency and promoting data privacy. Such proposed approaches should ideally process multiple types of data, including time series, audio, satellite images, and video, near the network edge as multiple data streams has been shown to improve the discriminative ability of learning algorithms, especially for generating fine grained results. Incidentally, there has been recent work on data driven conditional computation of subnetworks that has shown real progress in using a single model to share parameters among very different types of inputs such as images and text, reducing the computation requirement of multi-tower multimodal networks. Inspired by such line of work, we explore similar per patch conditional computation for the first time for mobile vision transformers (vision only case), that will eventually be used for single-tower multimodal edge models. We evaluate the model on Cornell Sap Sucker Woods 60, a fine grained bird species discrimination dataset. Our initial experiments uses $4X$ fewer parameters compared to MobileViTV2-1.0 with a $1$% accuracy drop on the iNaturalist '21 birds test data provided as part of the SSW60 dataset.
Related papers
- An Efficient Privacy-aware Split Learning Framework for Satellite Communications [33.608696987158424]
We propose a novel framework for more efficient SL in satellite communications.
Our approach, Dynamic Topology Informed Pruning, combines differential privacy with graph and model pruning to optimize graph neural networks for distributed learning.
Our framework not only significantly improves the operational efficiency of satellite communications but also establishes a new benchmark in privacy-aware distributed learning.
arXiv Detail & Related papers (2024-09-13T04:59:35Z) - Conformal Trajectory Prediction with Multi-View Data Integration in Cooperative Driving [4.628774934971078]
Current research on trajectory prediction primarily relies on data collected by onboard sensors of an ego vehicle.
We introduce V2INet, a novel trajectory prediction framework designed to model multi-view data by extending existing single-view models.
Our results demonstrate superior performance in terms of Final Displacement Error (FDE) and Miss Rate (MR) using a single GPU.
arXiv Detail & Related papers (2024-08-01T08:32:03Z) - Quanv4EO: Empowering Earth Observation by means of Quanvolutional Neural Networks [62.12107686529827]
This article highlights a significant shift towards leveraging quantum computing techniques in processing large volumes of remote sensing data.
The proposed Quanv4EO model introduces a quanvolution method for preprocessing multi-dimensional EO data.
Key findings suggest that the proposed model not only maintains high precision in image classification but also shows improvements of around 5% in EO use cases.
arXiv Detail & Related papers (2024-07-24T09:11:34Z) - Share Your Secrets for Privacy! Confidential Forecasting with Vertical Federated Learning [5.584904689846748]
Key challenges to address in manufacturing include data privacy and over-fitting on small and noisy datasets.
We propose 'Secret-shared Time Series Forecasting with VFL', a novel framework that exhibits the following key features.
Our results demonstrate that STV's forecasting accuracy is comparable to those of centralized approaches.
arXiv Detail & Related papers (2024-05-31T12:27:38Z) - Rethinking Transformers Pre-training for Multi-Spectral Satellite
Imagery [78.43828998065071]
Recent advances in unsupervised learning have demonstrated the ability of large vision models to achieve promising results on downstream tasks.
Such pre-training techniques have also been explored recently in the remote sensing domain due to the availability of large amount of unlabelled data.
In this paper, we re-visit transformers pre-training and leverage multi-scale information that is effectively utilized with multiple modalities.
arXiv Detail & Related papers (2024-03-08T16:18:04Z) - SIRST-5K: Exploring Massive Negatives Synthesis with Self-supervised
Learning for Robust Infrared Small Target Detection [53.19618419772467]
Single-frame infrared small target (SIRST) detection aims to recognize small targets from clutter backgrounds.
With the development of Transformer, the scale of SIRST models is constantly increasing.
With a rich diversity of infrared small target data, our algorithm significantly improves the model performance and convergence speed.
arXiv Detail & Related papers (2024-03-08T16:14:54Z) - Radio Map Estimation -- An Open Dataset with Directive Transmitter
Antennas and Initial Experiments [49.61405888107356]
We release a dataset of simulated path loss radio maps together with realistic city maps from real-world locations and aerial images from open datasources.
Initial experiments regarding model architectures, input feature design and estimation of radio maps from aerial images are presented.
arXiv Detail & Related papers (2024-01-12T14:56:45Z) - WiFi-TCN: Temporal Convolution for Human Interaction Recognition based
on WiFi signal [4.0773490083614075]
Wi-Fi based human activity recognition has gained considerable interest in recent times.
A challenge associated with Wi-Fi-based HAR is the significant decline in performance when the scene or subject changes.
We propose a novel approach that leverages a temporal convolution network with augmentations and attention, referred to as TCN-AA.
arXiv Detail & Related papers (2023-05-21T08:37:32Z) - Towards Efficient Scheduling of Federated Mobile Devices under
Computational and Statistical Heterogeneity [16.069182241512266]
This paper studies the implementation of distributed learning on mobile devices.
We use data as a tuning knob and propose two efficient-time algorithms to schedule different workloads.
Compared with the common benchmarks, the proposed algorithms achieve 2-100x speedup-wise, 2-7% accuracy gain and convergence rate by more than 100% on CIFAR10.
arXiv Detail & Related papers (2020-05-25T18:21:51Z) - Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To tackle this, we propose to apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z) - Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy, under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.