Spatial Mixture-of-Experts
- URL: http://arxiv.org/abs/2211.13491v1
- Date: Thu, 24 Nov 2022 09:31:02 GMT
- Title: Spatial Mixture-of-Experts
- Authors: Nikoli Dryden and Torsten Hoefler
- Abstract summary: We introduce the Spatial Mixture-of-Experts layer, which learns spatial structure in the input domain and routes experts at a fine-grained level to utilize it.
We show strong results for SMoEs on numerous tasks, and set new state-of-the-art results for medium-range weather prediction and post-processing ensemble weather forecasts.
- Score: 16.71096722340687
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many data have an underlying dependence on spatial location; it may be
weather on the Earth, a simulation on a mesh, or a registered image. Yet this
feature is rarely taken advantage of, and violates common assumptions made by
many neural network layers, such as translation equivariance. Further, many
works that do incorporate locality fail to capture fine-grained structure. To
address this, we introduce the Spatial Mixture-of-Experts (SMoE) layer, a
sparsely-gated layer that learns spatial structure in the input domain and
routes experts at a fine-grained level to utilize it. We also develop new
techniques to train SMoEs, including a self-supervised routing loss and damping
expert errors. Finally, we show strong results for SMoEs on numerous tasks, and
set new state-of-the-art results for medium-range weather prediction and
post-processing ensemble weather forecasts.
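To make the idea of fine-grained, per-location routing concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the function name `spatial_topk_route`, the location-only gate tensor, the per-channel linear experts, and the softmax-weighted combination are all hypothetical choices made for this example, and the paper's actual gate, expert form, and training techniques (routing loss, error damping) are not reproduced.

```python
import numpy as np


def spatial_topk_route(x, gate_logits, expert_weights, k):
    """Route each spatial location to its top-k experts (illustrative only).

    x:              input of shape (C, H, W)
    gate_logits:    assumed learnable gate of shape (E, H, W), one score per
                    expert per location
    expert_weights: assumed experts, each a linear map over channels, (E, C)
    Returns the combined output (H, W) and the chosen expert indices (k, H, W).
    """
    # Indices of the k largest gate scores at every location: (k, H, W).
    topk = np.argsort(-gate_logits, axis=0)[:k]

    # Response of every expert at every location: (E, H, W).
    # A real sparse layer would compute only the selected experts; this
    # dense version is kept for clarity.
    expert_out = np.einsum("ec,chw->ehw", expert_weights, x)

    # Gather the outputs and scores of the selected experts: (k, H, W).
    selected_out = np.take_along_axis(expert_out, topk, axis=0)
    selected_logits = np.take_along_axis(gate_logits, topk, axis=0)

    # Softmax over the selected experts only, computed stably.
    w = np.exp(selected_logits - selected_logits.max(axis=0, keepdims=True))
    w /= w.sum(axis=0, keepdims=True)

    return (w * selected_out).sum(axis=0), topk


rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4, 5))            # 3 channels on a 4x5 grid
gate_logits = rng.normal(size=(6, 4, 5))  # 6 experts
expert_weights = rng.normal(size=(6, 3))
y, chosen = spatial_topk_route(x, gate_logits, expert_weights, k=2)
print(y.shape, chosen.shape)
```

Because the gate in this sketch depends only on position, different grid cells can use different experts for the same input values, which is the kind of location dependence that translation-equivariant layers cannot express.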
Related papers
- RPMixer: Shaking Up Time Series Forecasting with Random Projections for Large Spatial-Temporal Data [33.0546525587517]
We propose an all-Multi-Layer Perceptron (all-MLP) time series forecasting architecture called RPMixer.
Our method capitalizes on the ensemble-like behavior of deep neural networks, where each individual block behaves like a base learner in an ensemble model.
arXiv Detail & Related papers (2024-02-16T07:28:59Z) - CMG-Net: Robust Normal Estimation for Point Clouds via Chamfer Normal
Distance and Multi-scale Geometry [23.86650228464599]
This work presents an accurate and robust method for estimating normals from point clouds.
We first propose a new metric termed Chamfer Normal Distance to address this issue.
We devise an innovative architecture that encompasses Multi-scale Local Feature Aggregation and Hierarchical Geometric Information Fusion.
arXiv Detail & Related papers (2023-12-14T17:23:16Z) - Learning Robust Precipitation Forecaster by Temporal Frame Interpolation [65.5045412005064]
We develop a robust precipitation forecasting model that demonstrates resilience against spatial-temporal discrepancies.
Our approach has led to significant improvements in forecasting precision, culminating in our model securing 1st place in the transfer learning leaderboard of the Weather4cast'23 competition.
arXiv Detail & Related papers (2023-11-30T08:22:08Z) - Contextualizing MLP-Mixers Spatiotemporally for Urban Data Forecast at Scale [54.15522908057831]
We propose an adapted version of the computationally efficient MLP-Mixer for STTD forecast at scale.
Our results surprisingly show that this simple-yet-effective solution can rival SOTA baselines when tested on several traffic benchmarks.
Our findings contribute to the exploration of simple-yet-effective models for real-world STTD forecasting.
arXiv Detail & Related papers (2023-07-04T05:19:19Z) - Exploring the Application of Large-scale Pre-trained Models on Adverse
Weather Removal [97.53040662243768]
We propose a CLIP embedding module to make the network handle different weather conditions adaptively.
This module integrates the sample specific weather prior extracted by CLIP image encoder together with the distribution specific information learned by a set of parameters.
arXiv Detail & Related papers (2023-06-15T10:06:13Z) - SARN: Structurally-Aware Recurrent Network for Spatio-Temporal Disaggregation [8.636014676778682]
Open data is frequently released spatially aggregated, usually to comply with privacy policies. But coarse, heterogeneous aggregations complicate coherent learning and integration for downstream AI/ML systems.
We propose an overarching model named Structurally-Aware Recurrent Network (SARN), which integrates structurally-aware spatial attention layers into the Gated Recurrent Unit (GRU) model.
For scenarios with limited historical training data, we show that a model pre-trained on one city variable can be fine-tuned for another city variable using only a few hundred samples.
arXiv Detail & Related papers (2023-06-09T21:01:29Z) - Semi-signed neural fitting for surface reconstruction from unoriented
point clouds [53.379712818791894]
We propose SSN-Fitting to reconstruct a better signed distance field.
SSN-Fitting consists of a semi-signed supervision and a loss-based region sampling strategy.
We conduct experiments to demonstrate that SSN-Fitting achieves state-of-the-art performance under different settings.
arXiv Detail & Related papers (2022-06-14T09:40:17Z) - Sparsely-gated Mixture-of-Expert Layers for CNN Interpretability [3.021134753248103]
Sparsely-gated Mixture of Expert (MoE) layers have been successfully applied for scaling large transformers.
In this work, we apply sparse MoE layers to CNNs for computer vision tasks and analyze the resulting effect on model interpretability.
arXiv Detail & Related papers (2022-04-22T09:40:23Z) - Exploiting latent representation of sparse semantic layers for improved
short-term motion prediction with Capsule Networks [0.12183405753834559]
This paper explores the use of Capsule Networks (CapsNets) in the context of learning a hierarchical representation of sparse semantic layers corresponding to small regions of the High-Definition (HD) map.
By using an architecture based on CapsNets the model is able to retain hierarchical relationships between detected features within images whilst also preventing loss of spatial data often caused by the pooling operation.
We show that our model achieves significant improvement over recently published works on prediction, whilst drastically reducing the overall size of the network.
arXiv Detail & Related papers (2021-03-02T11:13:43Z) - Unsupervised Monocular Depth Learning with Integrated Intrinsics and
Spatio-Temporal Constraints [61.46323213702369]
This work presents an unsupervised learning framework that is able to predict at-scale depth maps and egomotion.
Our results demonstrate strong performance when compared to the current state-of-the-art on multiple sequences of the KITTI driving dataset.
arXiv Detail & Related papers (2020-11-02T22:26:58Z) - Image Fine-grained Inpainting [89.17316318927621]
We present a one-stage model that utilizes dense combinations of dilated convolutions to obtain larger and more effective receptive fields.
To better train this efficient generator, in addition to the frequently used VGG feature matching loss, we design a novel self-guided regression loss.
We also employ a discriminator with local and global branches to ensure local-global content consistency.
arXiv Detail & Related papers (2020-02-07T03:45:25Z)