CURA: Size Isn't All You Need - A Compact Universal Architecture for On-Device Intelligence
- URL: http://arxiv.org/abs/2509.24601v1
- Date: Mon, 29 Sep 2025 11:06:37 GMT
- Title: CURA: Size Isn't All You Need - A Compact Universal Architecture for On-Device Intelligence
- Authors: Jae-Bum Seo, Muhammad Salman, Lismer Andres Caceres-Najarro
- Abstract summary: We propose CURA, an architecture that provides a compact and lightweight solution for diverse machine learning tasks. For compactness, it achieved equivalent accuracy using up to 2,500 times fewer parameters compared to baseline models. For generalizability, it demonstrated consistent performance across four NLP benchmarks and one computer vision dataset.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing on-device AI architectures for resource-constrained environments face two critical limitations: they lack compactness, with parameter requirements scaling proportionally to task complexity, and they exhibit poor generalizability, performing effectively only on specific application domains (e.g., models designed for regression tasks cannot adapt to natural language processing (NLP) applications). In this paper, we propose CURA, an architecture inspired by analog audio signal processing circuits that provides a compact and lightweight solution for diverse machine learning tasks across multiple domains. Our architecture offers three key advantages over existing approaches: (1) Compactness: it requires significantly fewer parameters regardless of task complexity; (2) Generalizability: it adapts seamlessly across regression, classification, complex NLP, and computer vision tasks; and (3) Complex pattern recognition: it can capture intricate data patterns while maintaining extremely low model complexity. We evaluated CURA across diverse datasets and domains. For compactness, it achieved equivalent accuracy using up to 2,500 times fewer parameters compared to baseline models. For generalizability, it demonstrated consistent performance across four NLP benchmarks and one computer vision dataset, nearly matching specialized existing models (achieving F1-scores up to 90%). Lastly, it delivers superior forecasting accuracy for complex patterns, achieving 1.6 times lower mean absolute error and 2.1 times lower mean squared error than competing models.
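The compactness claim is easiest to picture with a toy sketch. The block below is an assumption-level illustration only: the abstract says CURA is inspired by analog audio signal processing circuits but does not disclose the architecture, so this cascade of waveshaping stages (each holding just a gain, bias, and saturation scalar) is invented to show how an analog-style signal chain can stay at a dozen parameters regardless of task complexity.

```python
import numpy as np

# Purely illustrative: a cascade of analog-style waveshaping stages with
# three scalar parameters each (gain, bias, saturation), loosely mimicking
# an analog audio signal chain. This is NOT CURA's published design.
class AnalogStyleBlock:
    def __init__(self, num_stages=4, seed=0):
        rng = np.random.default_rng(seed)
        # 3 parameters per stage -> 12 parameters total for 4 stages
        self.params = rng.normal(size=(num_stages, 3))

    def forward(self, x):
        for gain, bias, sat in self.params:
            # tanh acts like a soft-saturating amplifier stage
            x = sat * np.tanh(gain * x + bias)
        return x

block = AnalogStyleBlock()
y = block.forward(np.linspace(-1.0, 1.0, 8))
print(y.shape, "total parameters:", block.params.size)
```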
Related papers
- Channel-Adaptive Edge AI: Maximizing Inference Throughput by Adapting Computational Complexity to Channel States [31.472509140661796]
Integrated communication and computation (IC$^2$) has emerged as a new paradigm for enabling efficient edge inference in 6G networks. The end-to-end (E2E) inference accuracy metric is highly complicated, as it needs to account for both channel distortion and the artificial intelligence (AI) model's architecture and computational complexity. We develop a tractable analytical model for E2E inference accuracy and leverage it to design a channel-adaptive AI algorithm that maximizes inference throughput.
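As a rough illustration of adapting computational complexity to channel state, the sketch below picks the most accurate model variant the current channel can support. All variant names, SNR thresholds, and accuracy figures are invented; the paper derives this trade-off analytically rather than from a lookup table.

```python
# Invented-for-illustration sketch of channel-adaptive model selection:
# choose the most accurate variant whose minimum-SNR requirement the
# current channel satisfies, falling back to the cheapest variant.
def select_model(snr_db, variants):
    eligible = [v for v in variants if snr_db >= v["min_snr_db"]]
    fallback = min(variants, key=lambda v: v["min_snr_db"])
    return max(eligible, key=lambda v: v["accuracy"]) if eligible else fallback

variants = [
    {"name": "tiny",  "min_snr_db": 0.0,  "accuracy": 0.78},
    {"name": "small", "min_snr_db": 5.0,  "accuracy": 0.84},
    {"name": "large", "min_snr_db": 12.0, "accuracy": 0.90},
]
print(select_model(8.0, variants)["name"])  # -> small
```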
arXiv Detail & Related papers (2026-03-03T16:33:29Z)
- Dynamic Generation of Multi-LLM Agents Communication Topologies with Graph Diffusion Models [99.85131798240808]
We introduce a novel generative framework called Guided Topology Diffusion (GTD). Inspired by conditional discrete graph diffusion models, GTD formulates topology synthesis as an iterative construction process. At each step, the generation is steered by a lightweight proxy model that predicts multi-objective rewards. Experiments show that GTD can generate highly task-adaptive, sparse, and efficient communication topologies.
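GTD's core loop, grow a topology step by step while steering each step with a cheap proxy reward, can be sketched with a greedy stand-in for the diffusion model. The proxy below (connectivity minus a sparsity penalty) is invented for illustration and is not the paper's reward model.

```python
import itertools

# Assumption-level analogue of reward-guided topology construction:
# greedily add the edge whose proxy reward is highest at each step.
def proxy_reward(edges, n_agents):
    degree = [0] * n_agents
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    connectivity = sum(1 for d in degree if d > 0) / n_agents
    density = len(edges) / (n_agents * (n_agents - 1) / 2)
    return connectivity - 0.5 * density  # toy multi-objective trade-off

def build_topology(n_agents=4, steps=3):
    edges = set()
    candidates = list(itertools.combinations(range(n_agents), 2))
    for _ in range(steps):
        best = max(candidates, key=lambda e: proxy_reward(edges | {e}, n_agents))
        edges.add(best)
        candidates.remove(best)
    return edges

print(build_topology())  # a sparse 3-edge topology over 4 agents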
arXiv Detail & Related papers (2025-10-09T05:28:28Z)
- QuantVSR: Low-Bit Post-Training Quantization for Real-World Video Super-Resolution [53.13952833016505]
We propose a low-bit quantization model for real-world video super-resolution (VSR). We use a calibration dataset to measure both spatial and temporal complexity for each layer. We refine the full-precision (FP) and low-bit branches to achieve simultaneous optimization.
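A simplified sketch of calibration-driven bit allocation: measure a per-layer complexity statistic on calibration activations and give harder layers more bits. The statistic (activation standard deviation) and the two-level bit budget are assumptions; QuantVSR measures spatial and temporal complexity separately.

```python
import numpy as np

# Assumed simplification: harder (higher-variance) layers get more bits.
def allocate_bits(calib_acts, low=4, high=8):
    complexity = {name: float(np.std(a)) for name, a in calib_acts.items()}
    cutoff = float(np.median(list(complexity.values())))
    return {name: (high if c > cutoff else low) for name, c in complexity.items()}

rng = np.random.default_rng(0)
calib_acts = {
    "conv1": rng.normal(0.0, 1.0, 1000),   # high-variance layer
    "conv2": rng.normal(0.0, 0.1, 1000),   # low-variance layer
}
print(allocate_bits(calib_acts))  # conv1 -> 8 bits, conv2 -> 4 bits
```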
arXiv Detail & Related papers (2025-08-06T14:35:59Z)
- Efficient Multi-Instance Generation with Janus-Pro-Driven Prompt Parsing [53.295515505026096]
Janus-Pro-driven Prompt Parsing is a prompt-parsing module that bridges text understanding and layout generation. MIGLoRA is a parameter-efficient plug-in that integrates Low-Rank Adaptation into UNet (SD1.5) and DiT (SD3) backbones. The proposed method achieves state-of-the-art performance on the COCO and LVIS benchmarks while maintaining parameter efficiency.
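MIGLoRA's exact plug-in design is not spelled out in the abstract, so the sketch below shows the standard Low-Rank Adaptation pattern it builds on: a frozen base weight plus a trainable low-rank update, adding only r * (d_in + d_out) trainable parameters instead of d_in * d_out.

```python
import numpy as np

# Generic LoRA sketch (not MIGLoRA's exact module): y = x W^T + s * x A^T B^T.
class LoRALinear:
    def __init__(self, d_in, d_out, r=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.02, (d_out, d_in))  # frozen pretrained weight
        self.A = rng.normal(0.0, 0.02, (r, d_in))      # trainable down-projection
        self.B = np.zeros((d_out, r))                  # trainable up-projection, zero-init
        self.scale = alpha / r

    def forward(self, x):
        # B starts at zero, so the adapter initially leaves the base model intact
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

layer = LoRALinear(d_in=64, d_out=64)
print(layer.forward(np.ones((2, 64))).shape)  # (2, 64)
```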
arXiv Detail & Related papers (2025-03-27T00:59:14Z)
- ZeroLM: Data-Free Transformer Architecture Search for Language Models [54.83882149157548]
Current automated proxy discovery approaches suffer from extended search times, susceptibility to data overfitting, and structural complexity. This paper introduces a novel zero-cost proxy methodology that quantifies model capacity through efficient weight statistics. Our evaluation demonstrates the superiority of this approach, achieving a Spearman's rho of 0.76 and a Kendall's tau of 0.53 on the FlexiBERT benchmark.
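The abstract does not give ZeroLM's formula, so the sketch below uses summed per-tensor standard deviation as a stand-in weight statistic: score untrained candidates, then check the rank correlation with trained accuracies. The accuracies here are synthetic, not real benchmark results.

```python
import numpy as np
from scipy.stats import spearmanr

# Assumption-level zero-cost proxy: rank candidates by a cheap weight statistic.
def proxy_score(weight_tensors):
    return sum(float(np.std(w)) for w in weight_tensors)

rng = np.random.default_rng(0)
candidates = [[rng.normal(0.0, 0.01 * (i + 1), (32, 32)) for _ in range(3)]
              for i in range(8)]
scores = [proxy_score(w) for w in candidates]
accs = [0.60 + 0.03 * i + rng.normal(0.0, 0.01) for i in range(8)]  # synthetic
rho, _ = spearmanr(scores, accs)
print(f"Spearman's rho between proxy and accuracy: {rho:.2f}")
```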
arXiv Detail & Related papers (2025-03-24T13:11:22Z)
- ZO-DARTS++: An Efficient and Size-Variable Zeroth-Order Neural Architecture Search Algorithm [13.271262526855212]
Differentiable Neural Architecture Search (NAS) provides a promising avenue for automating the complex design of deep learning (DL) models. We introduce ZO-DARTS++, a novel NAS method that effectively balances performance and resource constraints. In extensive tests on medical imaging datasets, ZO-DARTS++ improves the average accuracy by up to 1.8% over standard DARTS-based methods.
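The zeroth-order idea behind ZO-style NAS can be shown with a two-point gradient estimator: approximate the gradient of a black-box objective from two function evaluations along a random direction, so architecture parameters can be updated without differentiating through the supernet. The toy quadratic objective stands in for the real validation loss; ZO-DARTS++'s full update is more involved.

```python
import numpy as np

# Generic two-point zeroth-order gradient estimate (not the paper's exact update).
def zo_grad(f, x, rng, mu=1e-3):
    u = rng.normal(size=x.shape)                      # random probe direction
    return (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u

f = lambda a: float(np.sum((a - 0.3) ** 2))           # toy "architecture loss"
alpha = np.zeros(5)                                   # architecture parameters
rng = np.random.default_rng(1)
for _ in range(300):
    alpha -= 0.05 * zo_grad(f, alpha, rng)
print(np.round(alpha, 2))                             # approaches 0.3 without true gradients
```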
arXiv Detail & Related papers (2025-03-08T06:43:33Z)
- A Low-Complexity Plug-and-Play Deep Learning Model for Massive MIMO Precoding Across Sites [5.896656636095934]
Massive MIMO (mMIMO) technology has transformed wireless communication by enhancing spectral efficiency and network capacity. This paper proposes a novel deep learning-based mMIMO precoder to tackle the complexity challenges of existing approaches.
arXiv Detail & Related papers (2025-02-12T20:02:36Z)
- Adaptable Embeddings Network (AEN) [49.1574468325115]
We introduce Adaptable Embeddings Networks (AEN), a novel dual-encoder architecture using Kernel Density Estimation (KDE).
AEN allows for runtime adaptation of classification criteria without retraining and is non-autoregressive.
The architecture's ability to preprocess and cache condition embeddings makes it ideal for edge computing applications and real-time monitoring systems.
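An assumed analogue of AEN's KDE-based classification is sketched below: fit one KDE per class over cached condition embeddings and label a query by the highest density. Swapping the cached embeddings changes the classification criteria at runtime, with no retraining. The embeddings here are synthetic 3-D Gaussians, not real encoder outputs.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
class_embeddings = {
    "on_topic":  rng.normal(0.0, 1.0, (200, 3)),
    "off_topic": rng.normal(3.0, 1.0, (200, 3)),
}
# gaussian_kde expects data with shape (dims, n_samples)
kdes = {name: gaussian_kde(emb.T) for name, emb in class_embeddings.items()}

def classify(query_embedding):
    # highest-density class wins; criteria change by swapping cached embeddings
    return max(kdes, key=lambda name: kdes[name](query_embedding.reshape(-1, 1))[0])

print(classify(np.array([2.8, 3.1, 2.9])))  # -> off_topic
```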
arXiv Detail & Related papers (2024-11-21T02:15:52Z)
- Less is KEN: a Universal and Simple Non-Parametric Pruning Algorithm for Large Language Models [1.5807079236265718]
KEN is a straightforward, universal, and unstructured pruning algorithm based on Kernel Density Estimation (KDE).
KEN aims to construct optimized transformers by selectively preserving the most significant parameters while restoring others to their pre-training state.
KEN-pruned models achieve equal or better performance than their original unpruned versions, with a minimum parameter reduction of 25%.
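One possible reading of KDE-guided pruning is sketched below as an assumption (the abstract does not give KEN's exact criterion): treat fine-tuning updates in low-density regions of the update distribution as the significant ones, keep those, and restore the rest to their pre-trained values.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Assumed criterion: keep the atypical (low-density) fine-tuning updates.
def ken_like_reset(pretrained, finetuned, keep_ratio=0.75):
    deltas = (finetuned - pretrained).ravel()
    density = gaussian_kde(deltas)(deltas)        # density at each update
    cutoff = np.quantile(density, keep_ratio)
    keep = density < cutoff                       # keep ~keep_ratio of values
    merged = pretrained.ravel().copy()
    merged[keep] = finetuned.ravel()[keep]
    return merged.reshape(pretrained.shape), float(keep.mean())

rng = np.random.default_rng(0)
w_pre = rng.normal(size=(16, 16))                          # synthetic weights
w_ft = w_pre + rng.normal(scale=0.1, size=(16, 16))
w_merged, kept = ken_like_reset(w_pre, w_ft)
print(f"kept {kept:.0%} of fine-tuned values, reset the rest")
```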
arXiv Detail & Related papers (2024-02-05T16:11:43Z)
- Compact: Approximating Complex Activation Functions for Secure Computation [15.801954240019176]
Compact produces piece-wise approximations of complex activation functions (AFs) to enable their efficient use with state-of-the-art secure multi-party computation (MPC) techniques.
We show that Compact incurs negligible accuracy loss while being 2x-5x more efficient than state-of-the-art approaches for DNN models with a large number of hidden layers.
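The general idea can be demonstrated by replacing a transcendental activation with a piece-wise linear surrogate, which MPC protocols evaluate far more cheaply than tanh or erf. Uniform knot placement on [-4, 4] is a simplification; Compact's knot selection is more sophisticated.

```python
import numpy as np

# Reference activation: the tanh approximation of GELU.
def gelu(x):
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

knots = np.linspace(-4.0, 4.0, 17)       # 16 linear pieces over [-4, 4]
values = gelu(knots)

def gelu_pwl(x):
    return np.interp(x, knots, values)   # piece-wise linear surrogate

x = np.linspace(-4.0, 4.0, 1001)
err = np.max(np.abs(gelu(x) - gelu_pwl(x)))
print(f"max abs error of 16-piece approximation on [-4, 4]: {err:.4f}")
```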
arXiv Detail & Related papers (2023-09-09T02:44:41Z)
- SqueezeLLM: Dense-and-Sparse Quantization [80.32162537942138]
The main bottleneck for generative inference with LLMs is memory bandwidth, rather than compute, for single-batch inference.
We introduce SqueezeLLM, a post-training quantization framework that enables lossless compression to ultra-low precision, down to 3 bits.
Our framework incorporates two novel ideas: (i) sensitivity-based non-uniform quantization, which searches for the optimal bit precision assignment based on second-order information; and (ii) the Dense-and-Sparse decomposition that stores outliers and sensitive weight values in an efficient sparse format.
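The Dense-and-Sparse decomposition can be sketched directly: pull the rare large-magnitude outlier weights into a sparse full-precision matrix and quantize the remaining dense part to few bits. Uniform quantization is used below for brevity; SqueezeLLM's quantizer is non-uniform and sensitivity-based.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Sketch of Dense-and-Sparse: W ~= quantized(dense part) + sparse FP outliers.
def dense_sparse_split(W, outlier_frac=0.005, bits=3):
    cutoff = np.quantile(np.abs(W), 1.0 - outlier_frac)
    outliers = np.where(np.abs(W) >= cutoff, W, 0.0)
    dense = W - outliers
    # uniform quantization of the dense part (the paper uses non-uniform)
    scale = (dense.max() - dense.min()) / (2**bits - 1)
    codes = np.round((dense - dense.min()) / scale)
    dense_hat = codes * scale + dense.min()
    return dense_hat, csr_matrix(outliers)

rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.02, (256, 256))
W[0, 0] = 1.0                                   # inject an outlier weight
dense_hat, sparse = dense_sparse_split(W)
print(sparse.nnz, "outlier weights kept in sparse FP format")
```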
arXiv Detail & Related papers (2023-06-13T08:57:54Z)
- OTOV2: Automatic, Generic, User-Friendly [39.828644638174225]
We propose the second generation of Only-Train-Once (OTOv2), which first automatically trains and compresses a general DNN only once from scratch.
OTOv2 is automatic and pluggable into various deep learning applications, and requires minimal engineering effort from users.
Numerically, we demonstrate the generality and autonomy of OTOv2 on a variety of model architectures such as VGG, ResNet, CARN, ConvNeXt, DenseNet and StackedUnets.
arXiv Detail & Related papers (2023-03-13T05:13:47Z)
- Squeezeformer: An Efficient Transformer for Automatic Speech Recognition [99.349598600887]
Conformer is the de facto backbone model for various downstream speech tasks, thanks to its hybrid attention-convolution architecture.
We propose the Squeezeformer model, which consistently outperforms the state-of-the-art ASR models under the same training schemes.
arXiv Detail & Related papers (2022-06-02T06:06:29Z)
- AlphaGAN: Fully Differentiable Architecture Search for Generative Adversarial Networks [15.740179244963116]
Generative Adversarial Networks (GANs) are formulated as minimax game problems, whereby generators attempt to approach real data distributions by virtue of adversarial learning against discriminators.
In this work, we aim to boost model learning from the perspective of network architectures, by incorporating recent progress on automated architecture search into GANs.
We propose a fully differentiable search framework for generative adversarial networks, dubbed alphaGAN.
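The differentiable relaxation such frameworks build on can be sketched with a DARTS-style mixed operation: candidate ops are blended with softmax-weighted architecture parameters, turning the discrete choice of operation into a continuous one searchable by gradient descent. alphaGAN's full minimax search over generator architectures is not reproduced here.

```python
import numpy as np

# Generic DARTS-style mixed operation (not alphaGAN's full framework).
def softmax(a):
    e = np.exp(a - np.max(a))
    return e / e.sum()

ops = [
    lambda x: np.maximum(x, 0.0),   # relu
    np.tanh,                        # tanh
    lambda x: x,                    # identity / skip connection
]
alpha = np.zeros(len(ops))          # architecture parameters (uniform at init)

def mixed_op(x, alpha):
    weights = softmax(alpha)        # continuous relaxation of the op choice
    return sum(w * op(x) for w, op in zip(weights, ops))

print(mixed_op(np.array([-1.0, 0.5]), alpha))
```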
arXiv Detail & Related papers (2020-06-16T13:27:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.