FORTRESS: Function-composition Optimized Real-Time Resilient Structural Segmentation via Kolmogorov-Arnold Enhanced Spatial Attention Networks
- URL: http://arxiv.org/abs/2507.12675v1
- Date: Wed, 16 Jul 2025 23:17:58 GMT
- Title: FORTRESS: Function-composition Optimized Real-Time Resilient Structural Segmentation via Kolmogorov-Arnold Enhanced Spatial Attention Networks
- Authors: Christina Thrainer, Md Meftahul Ferdaus, Mahdi Abdelguerfi, Christian Guetl, Steven Sloan, Kendall N. Niles, Ken Pathak
- Abstract summary: FORTRESS (Function-composition Optimized Real-Time Resilient Structural Segmentation) is a new architecture that balances accuracy and speed by combining depthwise separable convolutions with adaptive Kolmogorov-Arnold Network integration. FORTRESS incorporates three key innovations: a systematic depthwise separable convolution framework, adaptive TiKAN integration, and multi-scale attention fusion. The architecture achieves remarkable efficiency gains with 91% parameter reduction (31M to 2.9M), 91% computational complexity reduction (13.7 to 1.17 GFLOPs), and a 3x inference speed improvement.
- Score: 1.663204995903499
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automated structural defect segmentation in civil infrastructure faces a critical challenge: achieving high accuracy while maintaining computational efficiency for real-time deployment. This paper presents FORTRESS (Function-composition Optimized Real-Time Resilient Structural Segmentation), a new architecture that balances accuracy and speed by combining depthwise separable convolutions with adaptive Kolmogorov-Arnold Network integration. FORTRESS incorporates three key innovations: a systematic depthwise separable convolution framework achieving a 3.6x parameter reduction per layer, adaptive TiKAN integration that selectively applies function composition transformations only when computationally beneficial, and multi-scale attention fusion combining spatial, channel, and KAN-enhanced features across decoder levels. The architecture achieves remarkable efficiency gains with 91% parameter reduction (31M to 2.9M), 91% computational complexity reduction (13.7 to 1.17 GFLOPs), and a 3x inference speed improvement while delivering superior segmentation performance. Evaluation on benchmark infrastructure datasets demonstrates state-of-the-art results with an F1-score of 0.771 and a mean IoU of 0.677, significantly outperforming existing methods including U-Net, SA-UNet, and U-KAN. The dual optimization strategy proves essential for optimal performance, establishing FORTRESS as a robust solution for practical structural defect segmentation in resource-constrained environments where both accuracy and computational efficiency are paramount. Comprehensive architectural specifications are provided in the Supplemental Material. Source code is available at https://github.com/faeyelab/fortress-paper-code.
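The per-layer saving from depthwise separable convolutions can be illustrated with a short parameter-count calculation. This is a generic sketch, not the FORTRESS implementation: the channel sizes below are hypothetical, and the paper's 3.6x per-layer figure depends on its specific layer configuration.

```python
def conv_params(c_in, c_out, k):
    # Standard convolution: one k x k filter per (input, output) channel pair.
    return k * k * c_in * c_out

def separable_params(c_in, c_out, k):
    # Depthwise separable: a k x k depthwise filter per input channel,
    # followed by a 1 x 1 pointwise convolution that mixes channels.
    return k * k * c_in + c_in * c_out

c_in, c_out, k = 64, 64, 3  # hypothetical layer sizes
std = conv_params(c_in, c_out, k)
sep = separable_params(c_in, c_out, k)
print(f"standard: {std}, separable: {sep}, reduction: {std / sep:.2f}x")
# For 64 -> 64 channels with 3x3 kernels, the reduction is roughly 7.9x;
# smaller channel counts give smaller ratios.
```

The reduction factor is k^2 * c_out / (k^2 + c_out), so it grows with the output channel count and saturates near k^2.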
Related papers
- EfficientVLA: Training-Free Acceleration and Compression for Vision-Language-Action Models [21.42353501209045]
Vision-Language-Action (VLA) models demonstrate transformative potential for embodied intelligence but are severely hampered by high computational and memory demands. We introduce EfficientVLA, a structured and training-free inference acceleration framework. Applied to the standard VLA model CogACT, it yields a 1.93x inference speedup and reduces FLOPs to 28.9%, with only a 0.6% success-rate drop on the SIMPLER benchmark.
arXiv Detail & Related papers (2025-06-11T18:34:57Z) - Is Architectural Complexity Overrated? Competitive and Interpretable Knowledge Graph Completion with RelatE [6.959701672059059]
RelatE is an interpretable and modular method that efficiently integrates dual representations for entities and relations. It achieves competitive or superior performance on standard benchmarks. Perturbation studies demonstrate improved robustness, with MRR degradation reduced by up to 61% relative to TransE and by up to 19% compared to RotatE.
arXiv Detail & Related papers (2025-05-25T04:36:52Z) - AdaptoVision: A Multi-Resolution Image Recognition Model for Robust and Scalable Classification [0.0]
AdaptoVision is a novel convolutional neural network (CNN) architecture designed to efficiently balance computational complexity and classification accuracy. By leveraging enhanced residual units, depth-wise separable convolutions, and hierarchical skip connections, AdaptoVision significantly reduces parameter count and computational requirements. It achieves state-of-the-art results on the BreakHis dataset and comparable accuracy elsewhere, notably 95.3% on CIFAR-10 and 85.77% on CIFAR-100, without relying on any pretrained weights.
arXiv Detail & Related papers (2025-04-17T05:23:07Z) - ZeroLM: Data-Free Transformer Architecture Search for Language Models [54.83882149157548]
Current automated proxy discovery approaches suffer from extended search times, susceptibility to data overfitting, and structural complexity. This paper introduces a novel zero-cost proxy methodology that quantifies model capacity through efficient weight statistics. Our evaluation demonstrates the superiority of this approach, achieving a Spearman's rho of 0.76 and Kendall's tau of 0.53 on the FlexiBERT benchmark.
arXiv Detail & Related papers (2025-03-24T13:11:22Z) - iFlame: Interleaving Full and Linear Attention for Efficient Mesh Generation [49.8026360054331]
iFlame is a novel transformer-based network architecture for mesh generation. We propose an interleaving autoregressive mesh generation framework that combines the efficiency of linear attention with the expressive power of full attention mechanisms. Our results indicate that the proposed interleaving framework effectively balances computational efficiency and generative performance.
arXiv Detail & Related papers (2025-03-20T19:10:37Z) - Transformer^-1: Input-Adaptive Computation for Resource-Constrained Deployment [3.6219999155937113]
This paper proposes a Transformer^-1 architecture to address the resource waste caused by fixed computation paradigms in deep learning models under dynamic scenarios. In a benchmark test, our method reduces FLOPs by 42.7% and peak memory usage by 3% compared to the standard Transformer. We also conducted experiments on several natural language processing tasks and achieved significant improvements in resource efficiency.
arXiv Detail & Related papers (2025-01-26T15:31:45Z) - Embedding-Free Transformer with Inference Spatial Reduction for Efficient Semantic Segmentation [15.377463849213033]
EFA is a novel global context modeling mechanism that focuses on modeling the global non-linearity rather than the specific roles of query, key, and value.
Our ISR method reduces the key-value resolution at the inference phase, which can mitigate the computation-performance trade-off gap.
EDAFormer shows the state-of-the-art performance with the efficient computation compared to the existing transformer-based semantic segmentation models.
arXiv Detail & Related papers (2024-07-24T13:24:25Z) - Transforming Image Super-Resolution: A ConvFormer-based Efficient Approach [58.57026686186709]
We introduce the Convolutional Transformer layer (ConvFormer) and propose a ConvFormer-based Super-Resolution network (CFSR).
CFSR inherits the advantages of both convolution-based and transformer-based approaches.
Experiments demonstrate that CFSR strikes an optimal balance between computational cost and performance.
arXiv Detail & Related papers (2024-01-11T03:08:00Z) - UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation [93.88170217725805]
We propose a 3D medical image segmentation approach, named UNETR++, that offers both high-quality segmentation masks as well as efficiency in terms of parameters, compute cost, and inference speed.
The core of our design is the introduction of a novel efficient paired attention (EPA) block that efficiently learns spatial and channel-wise discriminative features.
Our evaluations on five benchmarks, Synapse, BTCV, ACDC, BRaTs, and Decathlon-Lung, reveal the effectiveness of our contributions in terms of both efficiency and accuracy.
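Channel-wise feature reweighting of the kind attention blocks such as EPA rely on can be sketched in a few lines. This is a minimal squeeze-and-excitation-style illustration, not the paper's EPA block: the learned gating weights are replaced here by a parameter-free sigmoid over per-channel global averages.

```python
import numpy as np

def channel_attention(x):
    """Minimal channel-attention sketch: reweight each channel of a
    (channels, height, width) feature map by a gate derived from its
    global average activation."""
    pooled = x.mean(axis=(1, 2))            # squeeze: one scalar per channel
    gate = 1.0 / (1.0 + np.exp(-pooled))    # sigmoid gate in [0, 1]
    return x * gate[:, None, None]          # excite: rescale each channel

# Toy feature map: channel c holds the constant value c.
x = np.ones((4, 2, 2)) * np.arange(4)[:, None, None]
y = channel_attention(x)
# Channels with larger average activation receive gates closer to 1.
```

A learned block would replace the identity squeeze with a small MLP so the gating adapts during training; the broadcasting pattern stays the same.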
arXiv Detail & Related papers (2022-12-08T18:59:57Z) - Efficient Micro-Structured Weight Unification and Pruning for Neural Network Compression [56.83861738731913]
Deep Neural Network (DNN) models are essential for practical applications, especially on resource-limited devices.
Previous unstructured or structured weight pruning methods rarely translate into true inference acceleration.
We propose a generalized weight unification framework at a hardware-compatible micro-structured level to achieve a high degree of compression and acceleration.
arXiv Detail & Related papers (2021-06-15T17:22:59Z) - Structured Convolutions for Efficient Neural Network Design [65.36569572213027]
We tackle model efficiency by exploiting redundancy in the implicit structure of the building blocks of convolutional neural networks.
We show how this decomposition can be applied to 2D and 3D kernels as well as the fully-connected layers.
arXiv Detail & Related papers (2020-08-06T04:38:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.