Related papers: From MLP to NeoMLP: Leveraging Self-Attention for Neural Fields

URL: http://arxiv.org/abs/2412.08731v1
Date: Wed, 11 Dec 2024 19:01:38 GMT
Title: From MLP to NeoMLP: Leveraging Self-Attention for Neural Fields
Authors: Miltiadis Kofinas, Samuele Papa, Efstratios Gavves
Abstract summary: We develop a new type of connectionism based on hidden and scalable nodes, called NeoMLP. We demonstrate the effectiveness of our method by fitting high-resolution signals, including multi-modal audio-visual data.
Score: 26.659511924272962
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Neural fields (NeFs) have recently emerged as a state-of-the-art method for encoding spatio-temporal signals of various modalities. Despite the success of NeFs in reconstructing individual signals, their use as representations in downstream tasks, such as classification or segmentation, is hindered by the complexity of the parameter space and its underlying symmetries, in addition to the lack of powerful and scalable conditioning mechanisms. In this work, we draw inspiration from the principles of connectionism to design a new architecture based on MLPs, which we term NeoMLP. We start from an MLP, viewed as a graph, and transform it from a multi-partite graph to a complete graph of input, hidden, and output nodes, equipped with high-dimensional features. We perform message passing on this graph and employ weight-sharing via self-attention among all the nodes. NeoMLP has a built-in mechanism for conditioning through the hidden and output nodes, which function as a set of latent codes, and as such, NeoMLP can be used straightforwardly as a conditional neural field. We demonstrate the effectiveness of our method by fitting high-resolution signals, including multi-modal audio-visual data. Furthermore, we fit datasets of neural representations, by learning instance-specific sets of latent codes using a single backbone architecture, and then use them for downstream tasks, outperforming recent state-of-the-art methods. The source code is open-sourced at https://github.com/mkofinas/neomlp.
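To make the architecture described in the abstract more concrete, here is a minimal, untrained sketch in plain Python. It rests on simplifications that are assumptions, not details taken from the paper: each scalar input is embedded as x_i * a_i + b_i, each layer uses a single attention head, and layer norm and feed-forward blocks are omitted. The class name `TinyNeoMLP` and all parameter names are hypothetical; the authors' actual implementation is in the linked repository. What the sketch does illustrate is the core idea: input, hidden, and output nodes form one fully connected set of tokens, the same attention weights are shared by all nodes, and the hidden/output embeddings can be swapped for instance-specific latent codes.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng, scale):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class TinyNeoMLP:
    """Hypothetical sketch: input, hidden and output nodes are tokens in one complete graph."""

    def __init__(self, n_in, n_hidden, n_out, dim, n_layers=2, seed=0):
        rng = random.Random(seed)
        s = 1.0 / math.sqrt(dim)
        self.dim, self.n_out = dim, n_out
        # Assumed input embedding: token_i = x_i * a_i + b_i, one (a_i, b_i) pair per input node.
        self.in_scale = rand_matrix(n_in, dim, rng, 1.0)
        self.in_bias = rand_matrix(n_in, dim, rng, 1.0)
        # Hidden + output node embeddings; these play the role of (learnable) latent codes.
        self.latents = rand_matrix(n_hidden + n_out, dim, rng, 1.0)
        # One (Wq, Wk, Wv) triple per layer, shared by every node: weight sharing via attention.
        self.layers = [tuple(rand_matrix(dim, dim, rng, s) for _ in range(3))
                       for _ in range(n_layers)]
        self.readout = rand_matrix(1, dim, rng, s)[0]  # maps an output token to a scalar

    def _attend(self, tokens, Wq, Wk, Wv):
        q = [matvec(Wq, t) for t in tokens]
        k = [matvec(Wk, t) for t in tokens]
        v = [matvec(Wv, t) for t in tokens]
        new_tokens = []
        for qi, ti in zip(q, tokens):
            # Every node attends to every node (message passing on a complete graph).
            scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(self.dim) for kj in k]
            weights = softmax(scores)
            msg = [sum(w * vj[d] for w, vj in zip(weights, v)) for d in range(self.dim)]
            new_tokens.append([a + b for a, b in zip(ti, msg)])  # residual update
        return new_tokens

    def forward(self, x, latents=None):
        """Evaluate the field at coordinate x; pass `latents` to condition on another instance."""
        latents = self.latents if latents is None else latents
        inputs = [[xi * a + b for a, b in zip(scale, bias)]
                  for xi, scale, bias in zip(x, self.in_scale, self.in_bias)]
        tokens = inputs + [list(t) for t in latents]
        for Wq, Wk, Wv in self.layers:
            tokens = self._attend(tokens, Wq, Wk, Wv)
        # Read the prediction off the output nodes, which are the last n_out tokens.
        return [sum(r * t for r, t in zip(self.readout, tok)) for tok in tokens[-self.n_out:]]


model = TinyNeoMLP(n_in=2, n_hidden=4, n_out=3, dim=8)
y = model.forward([0.25, -0.5])  # one scalar per output node
print(len(y))  # 3
```

In this sketch, fitting a dataset of signals would mean optimizing one `latents` list per instance while sharing the backbone weights across instances; only the forward pass is shown, and no training loop is included.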

Related papers

Aggregation-aware MLP: An Unsupervised Approach for Graph Message-passing [10.93155007218297]
"AMLP" is an unsupervised framework that shifts the paradigm from directly crafting aggregation functions to making aggregation adaptive. Our approach consists of two key steps: first, we use graph reconstruction to facilitate high-order grouping effects; second, we employ a single-layer network to encode varying degrees of heterophily.
arXiv Detail & Related papers (2025-07-27T04:52:55Z)
Training MLPs on Graphs without Supervision [38.63554842214315]
We introduce SimMLP, a self-supervised framework for learning MLPs on graphs. SimMLP is the first MLP-learning method that can achieve equivalence to GNNs in the optimal case. We provide a comprehensive theoretical analysis demonstrating the equivalence between SimMLP and GNNs based on mutual information and inductive bias.
arXiv Detail & Related papers (2024-12-05T04:20:54Z)
SimMLP: Training MLPs on Graphs without Supervision [38.63554842214315]
We introduce SimMLP, a self-supervised framework for learning MLPs on graphs. SimMLP is the first MLP-learning method that can achieve equivalence to GNNs in the optimal case. We provide a comprehensive theoretical analysis demonstrating the equivalence between SimMLP and GNNs based on mutual information and inductive bias.
arXiv Detail & Related papers (2024-02-14T03:16:13Z)
SiT-MLP: A Simple MLP with Point-wise Topology Feature Learning for Skeleton-based Action Recognition [9.673505408890435]
Graph convolutional networks (GCNs) have achieved remarkable performance in skeleton-based action recognition. Previous GCN-based methods rely excessively on elaborate human priors and construct complex feature aggregation mechanisms. In this work, we propose a novel model, SiT-MLP, for skeleton-based action recognition.
arXiv Detail & Related papers (2023-08-30T13:20:54Z)
Versatile Neural Processes for Learning Implicit Neural Representations [57.090658265140384]
We propose Versatile Neural Processes (VNP), which greatly increases the capability of approximating functions. Specifically, we introduce a bottleneck encoder that produces fewer, more informative context tokens, relieving the high computational cost. We demonstrate the effectiveness of the proposed VNP on a variety of tasks involving 1D, 2D and 3D signals.
arXiv Detail & Related papers (2023-01-21T04:08:46Z)
Graph Neural Networks are Inherently Good Generalizers: Insights by Bridging GNNs and MLPs [71.93227401463199]
This paper attributes the major source of GNNs' performance gain to their intrinsic generalization capability by introducing an intermediate model class dubbed P(ropagational)MLP. We observe that PMLPs consistently perform on par with (or even exceed) their GNN counterparts, while being much more efficient to train.
arXiv Detail & Related papers (2022-12-18T08:17:32Z)
NOSMOG: Learning Noise-robust and Structure-aware MLPs on Graphs [41.85649409565574]
Graph Neural Networks (GNNs) have demonstrated their efficacy in dealing with non-Euclidean structural data. Existing methods attempt to address the scalability issues of GNNs by training multi-layer perceptrons (MLPs) exclusively on node content features. In this paper, we propose to learn NOise-robust Structure-aware MLPs On Graphs (NOSMOG) to overcome these challenges.
arXiv Detail & Related papers (2022-08-22T01:47:07Z)
GraphMLP: A Graph MLP-Like Architecture for 3D Human Pose Estimation [68.65764751482774]
GraphMLP is a global-local-graphical unified architecture for 3D human pose estimation. It incorporates the graph structure of human bodies into an MLP model to meet the domain-specific demands of 3D human pose estimation. It can be extended to model complex temporal dynamics in a simple way, with computational cost growing only negligibly with sequence length.
arXiv Detail & Related papers (2022-06-13T18:59:31Z)
MLP-3D: A MLP-like 3D Architecture with Grouped Time Mixing [123.43419144051703]
We present a novel MLP-like 3D architecture for video recognition. The results are comparable to state-of-the-art, widely used 3D CNNs and video transformers.
arXiv Detail & Related papers (2022-06-13T16:21:33Z)
UNeXt: MLP-based Rapid Medical Image Segmentation Network [80.16644725886968]
UNet and its latest extensions, such as TransUNet, have been the leading medical image segmentation methods in recent years. We propose UNeXt, a convolutional multilayer perceptron (MLP) based network for image segmentation. We show that UNeXt reduces the number of parameters by 72x, decreases the computational complexity by 68x, and improves the inference speed by 10x while also obtaining better segmentation performance.
arXiv Detail & Related papers (2022-03-09T18:58:22Z)
RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality [113.1414517605892]
We propose a methodology, Locality Injection, to incorporate local priors into an FC layer. RepMLPNet is the first MLP that seamlessly transfers to Cityscapes semantic segmentation.
arXiv Detail & Related papers (2021-12-21T10:28:17Z)
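Locality Injection relies on the fact that a convolution is a linear map and can therefore be folded into an FC weight matrix. The following is a simplified sketch of that equivalence, assuming a single-channel 1D signal with zero padding (RepMLPNet itself operates on multi-channel 2D feature maps, and the function names here are hypothetical). The FC matrix is obtained by feeding identity basis vectors through the convolution, and a parallel conv branch is then merged into an existing FC layer by adding the two matrices.

```python
def conv1d_same(x, kernel):
    """Zero-padded 1D cross-correlation with an odd kernel size; output length == input length."""
    k, n = len(kernel), len(x)
    pad = k // 2
    out = []
    for i in range(n):
        acc = 0.0
        for j in range(k):
            idx = i + j - pad
            if 0 <= idx < n:  # positions outside the signal act as zero padding
                acc += kernel[j] * x[idx]
        out.append(acc)
    return out


def conv_to_fc(kernel, n):
    """Build an n x n FC matrix W with fc(W, x) == conv1d_same(x, kernel).

    Column c of W is the conv response to the c-th identity basis vector.
    """
    cols = [conv1d_same([1.0 if i == c else 0.0 for i in range(n)], kernel) for c in range(n)]
    return [[cols[c][r] for c in range(n)] for r in range(n)]  # transpose columns into rows


def fc(W, x):
    """Apply a bias-free FC layer (matrix-vector product)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def merge(W_fc, W_conv):
    """Fold a parallel conv branch into the FC layer: fc(W_fc, x) + conv(x) == fc(merged, x)."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(W_fc, W_conv)]


x = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [0.5, 1.0, -0.5]
W_conv = conv_to_fc(kernel, len(x))
print(fc(W_conv, x), conv1d_same(x, kernel))
```

The design point the sketch is meant to convey: after training, the conv branch adds no inference cost, since the merged matrix is a single FC layer that already contains the local prior.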
Graph-MLP: Node Classification without Message Passing in Graph [28.604893350871777]
Graph Neural Networks (GNNs) have demonstrated their effectiveness in dealing with non-Euclidean structural data. Recent works have mainly focused on powerful message passing modules; in this paper, however, we show that none of the message passing modules is necessary. We propose a pure multilayer-perceptron-based framework, Graph-MLP, with a supervision signal that leverages graph structure.
arXiv Detail & Related papers (2021-06-08T02:07:21Z)
Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains [69.62456877209304]
We show that passing input points through a simple Fourier feature mapping enables a multilayer perceptron to learn high-frequency functions. These results shed light on recent advances in computer vision and graphics that achieve state-of-the-art results.
arXiv Detail & Related papers (2020-06-18T17:59:11Z)
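The Fourier feature mapping in the summary above can be sketched as gamma(v) = [cos(2*pi*B v), sin(2*pi*B v)], where the entries of B are drawn from a Gaussian N(0, sigma^2). This is a minimal plain-Python illustration that assumes unit amplitudes; `sample_B`, `fourier_features`, and the value of sigma are hypothetical choices, and sigma is a hyperparameter that roughly governs which frequencies the downstream MLP can fit easily.

```python
import math
import random


def sample_B(n_features, in_dim, sigma, seed=0):
    """Draw the random projection matrix B with entries from N(0, sigma^2)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, sigma) for _ in range(in_dim)] for _ in range(n_features)]


def fourier_features(v, B):
    """gamma(v) = [cos(2*pi*B v), sin(2*pi*B v)], with unit amplitudes (an assumption)."""
    proj = [2.0 * math.pi * sum(b * vi for b, vi in zip(row, v)) for row in B]
    return [math.cos(p) for p in proj] + [math.sin(p) for p in proj]


B = sample_B(n_features=16, in_dim=2, sigma=10.0)
features = fourier_features([0.1, 0.7], B)  # 2-D coordinate -> 32-D input for an MLP
print(len(features))  # 32
```

The MLP then takes `features` as its input instead of the raw coordinate; B is sampled once and kept fixed.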

This list is automatically generated from the titles and abstracts of the papers on this site.