Vision Mamba for Permeability Prediction of Porous Media
- URL: http://arxiv.org/abs/2510.14516v2
- Date: Mon, 20 Oct 2025 08:25:13 GMT
- Title: Vision Mamba for Permeability Prediction of Porous Media
- Authors: Ali Kashefi, Tapan Mukerji,
- Abstract summary: We introduce a neural network that uses Vision Mamba as its backbone for predicting the permeability of three-dimensional porous media.<n>We demonstrate in practice the aforementioned advantages of Vision Mamba over ViTs and CNNs in the permeability prediction of three-dimensional porous media.
- Score: 1.3063093054280948
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision Mamba has recently received attention as an alternative to Vision Transformers (ViTs) for image classification. The network size of Vision Mamba scales linearly with input image resolution, whereas ViTs scale quadratically, a feature that improves computational and memory efficiency. Moreover, Vision Mamba requires a significantly smaller number of trainable parameters than traditional convolutional neural networks (CNNs), and thus, they can be more memory efficient. Because of these features, we introduce, for the first time, a neural network that uses Vision Mamba as its backbone for predicting the permeability of three-dimensional porous media. We compare the performance of Vision Mamba with ViT and CNN models across multiple aspects of permeability prediction and perform an ablation study to assess the effects of its components on accuracy. We demonstrate in practice the aforementioned advantages of Vision Mamba over ViTs and CNNs in the permeability prediction of three-dimensional porous media. We make the source code publicly available to facilitate reproducibility and to enable other researchers to build on and extend this work. We believe the proposed framework has the potential to be integrated into large vision models in which Vision Mamba is used instead of ViTs.
Related papers
- VCMamba: Bridging Convolutions with Multi-Directional Mamba for Efficient Visual Representation [25.60289758013904]
Recent advances in Vision Transformers (ViTs) and State Space Models (SSMs) have challenged the dominance of Convolutional Neural Networks (CNNs) in computer vision.<n>We introduce textitVCMamba, a novel vision backbone that integrates the strengths of CNNs and multi-directional Mamba SSMs.<n>Our VCMamba-B achieves 82.6% top-1 accuracy on ImageNet-1K, surpassing PlainMamba-L3 by 0.3% with 37% fewer parameters, and outperforming Vision GNN-B by 0.3% with 64% fewer parameters.
arXiv Detail & Related papers (2025-09-04T21:32:27Z) - Training-free Token Reduction for Vision Mamba [21.451182941570394]
Vision Mamba has emerged as a strong competitor to Vision Transformers (ViTs)<n>Applying token reduction techniques for ViTs to Vision Mamba leads to significant performance degradation.<n>We propose MTR, a training-free textbfMamba textbfToken textbfReduction framework.
arXiv Detail & Related papers (2025-07-18T16:11:28Z) - MobileMamba: Lightweight Multi-Receptive Visual Mamba Network [51.33486891724516]
Previous research on lightweight models has primarily focused on CNNs and Transformer-based designs.
We propose the MobileMamba framework, which balances efficiency and performance.
MobileMamba achieves up to 83.6% on Top-1, surpassing existing state-of-the-art methods.
arXiv Detail & Related papers (2024-11-24T18:01:05Z) - MambaVision: A Hybrid Mamba-Transformer Vision Backbone [54.965143338206644]
We propose a novel hybrid Mamba-Transformer backbone, MambaVision, specifically tailored for vision applications.<n>We show that equipping the Mamba architecture with self-attention blocks in the final layers greatly improves its capacity to capture long-range spatial dependencies.<n>For classification on the ImageNet-1K dataset, MambaVision variants achieve state-of-the-art (SOTA) performance in terms of both Top-1 accuracy and throughput.
arXiv Detail & Related papers (2024-07-10T23:02:45Z) - Vision Mamba for Classification of Breast Ultrasound Images [9.90112908284836]
Mamba-based models, VMamba and Vim, are a recent family of vision encoders that offer promising performance improvements in many computer vision tasks.
This paper compares Mamba-based models with traditional Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) using the breast ultrasound BUSI dataset and Breast Ultrasound B dataset.
arXiv Detail & Related papers (2024-07-04T00:21:47Z) - Demystify Mamba in Vision: A Linear Attention Perspective [72.93213667713493]
Mamba is an effective state space model with linear computation complexity.<n>We show that Mamba shares surprising similarities with linear attention Transformer.<n>We propose a Mamba-Inspired Linear Attention (MILA) model by incorporating the merits of these two key designs into linear attention.
arXiv Detail & Related papers (2024-05-26T15:31:09Z) - Visual Mamba: A Survey and New Outlooks [33.90213491829634]
Mamba, a recent selective structured state space model, excels in long sequence modeling.
Since January 2024, Mamba has been actively applied to diverse computer vision tasks.
This paper reviews visual Mamba approaches, analyzing over 200 papers.
arXiv Detail & Related papers (2024-04-29T16:51:30Z) - Swin-UMamba: Mamba-based UNet with ImageNet-based pretraining [85.08169822181685]
This paper introduces a novel Mamba-based model, Swin-UMamba, designed specifically for medical image segmentation tasks.
Swin-UMamba demonstrates superior performance with a large margin compared to CNNs, ViTs, and latest Mamba-based models.
arXiv Detail & Related papers (2024-02-05T18:58:11Z) - VMamba: Visual State Space Model [98.0517369083152]
We adapt Mamba, a state-space language model, into VMamba, a vision backbone with linear time complexity.<n>At the core of VMamba is a stack of Visual State-Space (VSS) blocks with the 2D Selective Scan (SS2D) module.
arXiv Detail & Related papers (2024-01-18T17:55:39Z) - Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model [48.233300343211205]
We propose a new generic vision backbone with bidirectional Mamba blocks (Vim)
Vim marks the image sequences with position embeddings and compresses the visual representation with bidirectional state space models.
The results demonstrate that Vim is capable of overcoming the computation & memory constraints on performing Transformer-style understanding for high-resolution images.
arXiv Detail & Related papers (2024-01-17T18:56:18Z) - Vision Permutator: A Permutable MLP-Like Architecture for Visual
Recognition [185.80889967154963]
We present Vision Permutator, a conceptually simple and data efficient-like architecture for visual recognition.
By realizing the importance of the positional information carried by 2D feature representations, Vision Permutator encodes the feature representations along the height and width dimensions with linear projections.
We show that our Vision Permutators are formidable competitors to convolutional neural networks (CNNs) and vision transformers.
arXiv Detail & Related papers (2021-06-23T13:05:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.