Generalized Multimodal Fusion via Poisson-Nernst-Planck Equation
- URL: http://arxiv.org/abs/2410.15475v1
- Date: Sun, 20 Oct 2024 19:15:28 GMT
- Title: Generalized Multimodal Fusion via Poisson-Nernst-Planck Equation
- Authors: Jiayu Xiong, Jing Wang, Hengjing Xiang, Jun Xue, Chen Xu, Zhouqiang Jiang,
- Abstract summary: This paper proposes a generalized multimodal fusion method (GMF) via the Poisson-Nernst-Planck (PNP) equation.
We show that the proposed GMF achieves close to the state-of-the-art (SOTA) accuracy while utilizing fewer parameters and computational resources.
- Score: 5.022049774600693
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Previous studies have highlighted significant advancements in multimodal fusion. Nevertheless, such methods often encounter challenges regarding the efficacy of feature extraction, data integrity, consistency of feature dimensions, and adaptability across various downstream tasks. This paper proposes a generalized multimodal fusion method (GMF) via the Poisson-Nernst-Planck (PNP) equation, which adeptly addresses the aforementioned issues. Theoretically, the optimization objective for traditional multimodal tasks is formulated and redefined by integrating information entropy and the flow of gradient backward step. Leveraging these theoretical insights, the PNP equation is applied to feature fusion, rethinking multimodal features through the framework of charged particles in physics and controlling their movement through dissociation, concentration, and reconstruction. Building on these theoretical foundations, GMF disassociated features which extracted by the unimodal feature extractor into modality-specific and modality-invariant subspaces, thereby reducing mutual information and subsequently lowering the entropy of downstream tasks. The identifiability of the feature's origin enables our approach to function independently as a frontend, seamlessly integrated with a simple concatenation backend, or serve as a prerequisite for other modules. Experimental results on multiple downstream tasks show that the proposed GMF achieves performance close to the state-of-the-art (SOTA) accuracy while utilizing fewer parameters and computational resources. Furthermore, by integrating GMF with advanced fusion methods, we surpass the SOTA results.
Related papers
- Self-Refining Training for Amortized Density Functional Theory [5.5541132320126945]
We propose a novel method that reduces the dependency of amortized DFT solvers on large pre-collected datasets by introducing a self-refining training strategy.<n>We derive our method as a minimization of the variational upper bound on the KL-divergence measuring the discrepancy between the generated samples and the target Boltzmann distribution defined by the ground state energy.
arXiv Detail & Related papers (2025-06-02T00:32:32Z) - Reducing Unimodal Bias in Multi-Modal Semantic Segmentation with Multi-Scale Functional Entropy Regularization [66.10528870853324]
Fusing and balancing multi-modal inputs from novel sensors for dense prediction tasks is critically important.<n>One major limitation is the tendency of multi-modal frameworks to over-rely on easily learnable modalities.<n>We propose a plug-and-play regularization term based on functional entropy, which introduces no additional parameters.
arXiv Detail & Related papers (2025-05-10T12:58:15Z) - The Finite Element Neural Network Method: One Dimensional Study [0.0]
This research introduces the finite element neural network method (FENNM) within the framework of the Petrov-Galerkin method.
FENNM uses convolution operations to approximate the weighted residual of the differential equations.
This enables the integration of forcing terms and natural boundary conditions into the loss function similar to conventional finite element method (FEM) solvers.
arXiv Detail & Related papers (2025-01-21T21:39:56Z) - On-the-fly Modulation for Balanced Multimodal Learning [53.616094855778954]
Multimodal learning is expected to boost model performance by integrating information from different modalities.
The widely-used joint training strategy leads to imbalanced and under-optimized uni-modal representations.
We propose On-the-fly Prediction Modulation (OPM) and On-the-fly Gradient Modulation (OGM) strategies to modulate the optimization of each modality.
arXiv Detail & Related papers (2024-10-15T13:15:50Z) - FFHFlow: A Flow-based Variational Approach for Multi-fingered Grasp Synthesis in Real Time [19.308304984645684]
We propose exploiting a special kind of Deep Generative Model (DGM) based on Normalizing Flows (NFs)
We first observed an encouraging improvement in diversity by directly applying a single conditional NFs (cNFs) to learn a grasp distribution conditioned on the incomplete point cloud.
This motivated us to develop a novel flow-based d Deep Latent Variable Model (DLVM)
Unlike Variational Autoencoders (VAEs), the proposed DLVM counteracts typical pitfalls by leveraging two cNFs for the prior and likelihood distributions.
arXiv Detail & Related papers (2024-07-21T13:33:08Z) - Modality Prompts for Arbitrary Modality Salient Object Detection [57.610000247519196]
This paper delves into the task of arbitrary modality salient object detection (AM SOD)
It aims to detect salient objects from arbitrary modalities, eg RGB images, RGB-D images, and RGB-D-T images.
A novel modality-adaptive Transformer (MAT) will be proposed to investigate two fundamental challenges of AM SOD.
arXiv Detail & Related papers (2024-05-06T11:02:02Z) - TransFusionOdom: Interpretable Transformer-based LiDAR-Inertial Fusion
Odometry Estimation [7.778461949427663]
We propose an end-to-end supervised Transformer-based LiDAR-Inertial fusion framework (namely TransFusionOdom) for odometry estimation.
We show different fusion approaches for homogeneous and heterogeneous modalities to address the overfitting problem.
exhaustive ablation studies evaluate different multi-modal fusion strategies to verify the performance of the proposed fusion strategy.
arXiv Detail & Related papers (2023-04-16T08:54:36Z) - D4FT: A Deep Learning Approach to Kohn-Sham Density Functional Theory [79.50644650795012]
We propose a deep learning approach to solve Kohn-Sham Density Functional Theory (KS-DFT)
We prove that such an approach has the same expressivity as the SCF method, yet reduces the computational complexity.
In addition, we show that our approach enables us to explore more complex neural-based wave functions.
arXiv Detail & Related papers (2023-03-01T10:38:10Z) - Sampling with Mollified Interaction Energy Descent [57.00583139477843]
We present a new optimization-based method for sampling called mollified interaction energy descent (MIED)
MIED minimizes a new class of energies on probability measures called mollified interaction energies (MIEs)
We show experimentally that for unconstrained sampling problems our algorithm performs on par with existing particle-based algorithms like SVGD.
arXiv Detail & Related papers (2022-10-24T16:54:18Z) - Trustworthy Multimodal Regression with Mixture of Normal-inverse Gamma
Distributions [91.63716984911278]
We introduce a novel Mixture of Normal-Inverse Gamma distributions (MoNIG) algorithm, which efficiently estimates uncertainty in principle for adaptive integration of different modalities and produces a trustworthy regression result.
Experimental results on both synthetic and different real-world data demonstrate the effectiveness and trustworthiness of our method on various multimodal regression tasks.
arXiv Detail & Related papers (2021-11-11T14:28:12Z) - Permutation Invariant Policy Optimization for Mean-Field Multi-Agent
Reinforcement Learning: A Principled Approach [128.62787284435007]
We propose the mean-field proximal policy optimization (MF-PPO) algorithm, at the core of which is a permutation-invariant actor-critic neural architecture.
We prove that MF-PPO attains the globally optimal policy at a sublinear rate of convergence.
In particular, we show that the inductive bias introduced by the permutation-invariant neural architecture enables MF-PPO to outperform existing competitors.
arXiv Detail & Related papers (2021-05-18T04:35:41Z) - Information-theoretic Feature Selection via Tensor Decomposition and
Submodularity [38.05393186002834]
We introduce a low-rank tensor model of the joint PMF of all variables and indirect targeting as a way of mitigating complexity and maximizing the classification performance for a given number of features.
By indirectly aiming to predict the latent variable of the naive Bayes model instead of the original target variable, it is possible to formulate the feature selection problem as of a monotone submodular function subject to a cardinality constraint.
arXiv Detail & Related papers (2020-10-30T10:36:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.