MEGA: xLSTM with Multihead Exponential Gated Fusion for Precise Aspect-based Sentiment Analysis
- URL: http://arxiv.org/abs/2507.01213v1
- Date: Tue, 01 Jul 2025 22:21:33 GMT
- Title: MEGA: xLSTM with Multihead Exponential Gated Fusion for Precise Aspect-based Sentiment Analysis
- Authors: Adamu Lawan, Juhua Pu, Haruna Yunusa, Jawad Muhammad, Muhammad Lawan
- Abstract summary: Aspect-based Sentiment Analysis (ABSA) is a critical Natural Language Processing (NLP) task that extracts aspects from text and determines their associated sentiments. Existing ABSA methods struggle to balance computational efficiency with high performance. We propose xLSTM with Multihead Exponential Gated Fusion (MEGA), a novel framework integrating a bi-directional mLSTM architecture with forward and partially flipped backward streams.
- Score: 2.9045498954705886
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Aspect-based Sentiment Analysis (ABSA) is a critical Natural Language Processing (NLP) task that extracts aspects from text and determines their associated sentiments, enabling fine-grained analysis of user opinions. Existing ABSA methods struggle to balance computational efficiency with high performance: deep learning models often lack global context, transformers demand significant computational resources, and Mamba-based approaches face CUDA dependency and diminished local correlations. Recent advancements in Extended Long Short-Term Memory (xLSTM) models, particularly their efficient modeling of long-range dependencies, have significantly advanced the NLP community. However, their potential in ABSA remains untapped. To this end, we propose xLSTM with Multihead Exponential Gated Fusion (MEGA), a novel framework integrating a bi-directional mLSTM architecture with forward and partially flipped backward (PF-mLSTM) streams. The PF-mLSTM enhances localized context modeling by processing the initial sequence segment in reverse with dedicated parameters, preserving critical short-range patterns. We further introduce an mLSTM-based multihead cross exponential gated fusion mechanism (MECGAF) that dynamically combines forward mLSTM outputs as query and key with PF-mLSTM outputs as value, optimizing short-range dependency capture while maintaining global context and efficiency. Experimental results on three benchmark datasets demonstrate that MEGA outperforms state-of-the-art baselines, achieving superior accuracy and efficiency in ABSA tasks.
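The partial flipping that feeds the PF-mLSTM stream can be illustrated in isolation. The sketch below is a minimal reading of the abstract, not the authors' code: the function name `partially_flipped_stream` and the parameter `flip_len` are our own, and the dedicated mLSTM parameters and MECGAF fusion are omitted; only the reordering of the initial sequence segment is shown.

```python
import numpy as np

def partially_flipped_stream(x: np.ndarray, flip_len: int) -> np.ndarray:
    """Reverse only the first `flip_len` timesteps of a (T, D) sequence.

    The abstract describes processing the initial sequence segment in
    reverse; this sketch performs just that reordering, leaving the
    tail of the sequence in its original order.
    """
    out = x.copy()
    out[:flip_len] = x[:flip_len][::-1]
    return out

# Toy sequence of 5 timesteps, 1 feature each.
seq = np.arange(5.0).reshape(5, 1)
pf = partially_flipped_stream(seq, flip_len=3)
# First 3 steps reversed, tail unchanged: [2, 1, 0, 3, 4]
```

In the framework described above, the forward mLSTM would read `seq` while the PF-mLSTM reads `pf`, so short-range patterns near the sequence start are seen in both orders.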
Related papers
- Refer-Agent: A Collaborative Multi-Agent System with Reasoning and Reflection for Referring Video Object Segmentation [50.22481337087162]
Referring Video Object Segmentation (RVOS) aims to segment objects in videos based on textual queries. Refer-Agent is a collaborative multi-agent system with alternating reasoning-reflection mechanisms.
arXiv Detail & Related papers (2026-02-03T14:48:12Z) - Co-Training Vision Language Models for Remote Sensing Multi-task Learning [68.15604397741753]
Vision language models (VLMs) have achieved promising results in RS image understanding, grounding, and ultra-high-resolution (UHR) image reasoning. We present RSCoVLM, a simple yet flexible VLM baseline for RS MTL. We propose a unified dynamic-resolution strategy to address the diverse image scales inherent in RS imagery.
arXiv Detail & Related papers (2025-11-26T10:55:07Z) - xLSTM Scaling Laws: Competitive Performance with Linear Time-Complexity [22.40851170527]
Scaling laws play a central role in the success of Large Language Models. Recent alternatives such as xLSTM offer linear complexity with respect to context length. xLSTM's advantage widens as training and inference contexts grow.
arXiv Detail & Related papers (2025-10-02T17:14:34Z) - Graft: Integrating the Domain Knowledge via Efficient Parameter Synergy for MLLMs [56.76586846269894]
Multimodal Large Language Models (MLLMs) have achieved success across various domains. Despite its importance, the study of knowledge sharing among domain-specific MLLMs remains largely underexplored. We propose a unified parameter integration framework that enables modular composition of expert capabilities.
arXiv Detail & Related papers (2025-06-30T15:07:41Z) - xLSTMAD: A Powerful xLSTM-based Method for Anomaly Detection [0.794682109939797]
We propose xLSTMAD, the first anomaly detection method that integrates a full encoder-decoder xLSTM architecture. We evaluate our method on the comprehensive TSB-AD-M benchmark, which spans 17 real-world datasets. In our results, xLSTMAD showcases state-of-the-art accuracy, outperforming 23 popular anomaly detection baselines.
arXiv Detail & Related papers (2025-06-28T10:39:09Z) - Routing Mamba: Scaling State Space Models with Mixture-of-Experts Projection [88.47928738482719]
Linear State Space Models (SSMs) offer remarkable performance gains in sequence modeling. Recent advances, such as Mamba, further enhance SSMs with input-dependent gating and hardware-aware implementations. We introduce Routing Mamba (RoM), a novel approach that scales SSM parameters using sparse mixtures of linear projection experts.
arXiv Detail & Related papers (2025-06-22T19:26:55Z) - Comprehensive Attribute Encoding and Dynamic LSTM HyperModels for Outcome Oriented Predictive Business Process Monitoring [5.634923879819779]
Predictive Business Process Monitoring aims to forecast future outcomes of ongoing business processes. Existing methods often lack flexibility to handle real-world challenges such as simultaneous events, class imbalance, and multi-level attributes. We propose a suite of dynamic LSTM HyperModels that integrate two-level hierarchical encoding for event and sequence attributes, along with specialized LSTM variants for simultaneous event modeling, leveraging multidimensional embeddings and time-difference flag augmentation.
arXiv Detail & Related papers (2025-06-04T08:27:58Z) - Distilling Transitional Pattern to Large Language Models for Multimodal Session-based Recommendation [67.84581846180458]
Session-based recommendation (SBR) predicts the next item based on anonymous sessions. Recent multimodal SBR methods utilize simplistic pre-trained models for modality learning but have limitations in semantic richness. We propose a multimodal LLM-enhanced framework, TPAD, which extends a distillation paradigm to decouple and align transitional patterns for promoting MSBR.
arXiv Detail & Related papers (2025-04-13T07:49:08Z) - Reinforced Model Merging [53.84354455400038]
We present an innovative framework termed Reinforced Model Merging (RMM), which encompasses an environment and agent tailored for merging tasks. By utilizing data subsets during the evaluation process, we addressed the bottleneck in the reward feedback phase, thereby accelerating RMM by up to 100 times.
arXiv Detail & Related papers (2025-03-27T08:52:41Z) - DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs [70.91804882618243]
This paper proposes DSMoE, a novel approach that achieves sparsification by partitioning pre-trained FFN layers into computational blocks. We implement adaptive expert routing using sigmoid activation and straight-through estimators, enabling tokens to flexibly access different aspects of model knowledge. Experiments on LLaMA models demonstrate that under equivalent computational constraints, DSMoE achieves superior performance compared to existing pruning and MoE approaches.
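The sigmoid routing over FFN sub-blocks can be sketched as follows. This is an illustrative assumption, not the paper's implementation: the names `dsmoe_forward`, `router_w`, and `threshold` are ours, and only the inference-time forward pass is shown (during training, a straight-through estimator would let the sigmoid's gradient bypass the hard threshold).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dsmoe_forward(x, expert_blocks, router_w, threshold=0.5):
    """Route a token through FFN sub-blocks via hard sigmoid gates.

    Each sub-block is a small ReLU FFN (w1, w2); a block contributes
    to the output iff its sigmoid routing score exceeds `threshold`.
    """
    scores = sigmoid(x @ router_w)               # one score per block
    gates = (scores > threshold).astype(x.dtype)  # hard 0/1 gates
    out = np.zeros_like(x)
    for g, (w1, w2) in zip(gates, expert_blocks):
        if g:
            out += np.maximum(x @ w1, 0.0) @ w2   # ReLU FFN sub-block
    return out, gates

# Toy setup: router strongly prefers block 0 over block 1.
x = np.ones(2)
router_w = np.array([[10.0, -10.0], [10.0, -10.0]])
blocks = [(np.eye(2), np.eye(2)), (2 * np.eye(2), 2 * np.eye(2))]
out, gates = dsmoe_forward(x, blocks, router_w)
# gates == [1., 0.]: only the first sub-block is computed.
```

Skipping unselected blocks entirely is what yields the computation savings the abstract describes; a dense FFN would evaluate every sub-block regardless.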
arXiv Detail & Related papers (2025-02-18T02:37:26Z) - MAL: Cluster-Masked and Multi-Task Pretraining for Enhanced xLSTM Vision Performance [2.45239928345171]
We introduce MAL (Cluster-Masked and Multi-Task Pretraining for Enhanced xLSTM Vision Performance), a novel framework that enhances xLSTM's capabilities through innovative pretraining strategies. We propose a cluster-masked masking method that significantly improves local feature capture and optimizes image scanning efficiency. Our universal encoder-decoder pretraining approach integrates multiple tasks, including image autoregression, depth estimation, and image segmentation, thereby enhancing the model's adaptability and robustness across diverse visual tasks.
arXiv Detail & Related papers (2024-12-14T07:58:24Z) - Unlocking the Power of LSTM for Long Term Time Series Forecasting [27.245021350821638]
We propose a simple yet efficient algorithm named P-sLSTM, built upon sLSTM by incorporating patching and channel independence. These modifications substantially enhance sLSTM's performance in TSF, achieving state-of-the-art results.
arXiv Detail & Related papers (2024-08-19T13:59:26Z) - xLSTM: Extended Long Short-Term Memory [26.607656211983155]
In the 1990s, the constant error carousel and gating were introduced as the central ideas of the Long Short-Term Memory (LSTM). We introduce exponential gating with appropriate normalization and stabilization techniques. We modify the LSTM memory structure, obtaining: (i) sLSTM with a scalar memory, a scalar update, and new memory mixing, and (ii) mLSTM, which is fully parallelizable with a matrix memory and a covariance update rule.
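The stabilized exponential gating can be sketched for the scalar sLSTM case. This is a minimal sketch of the recipe the abstract names, not the paper's reference implementation: a running log-scale stabilizer `m` is subtracted inside the exponentials so they stay bounded, while the normalized output is mathematically unchanged.

```python
import numpy as np

def slstm_step(c, n, m, i_pre, f_pre, z, o):
    """One stabilized sLSTM cell update with scalar memory.

    i_pre, f_pre are pre-activation gate values; c is the cell state,
    n the normalizer state, and m the log-scale stabilizer that keeps
    exp() from overflowing even for large pre-activations.
    """
    m_new = max(f_pre + m, i_pre)      # updated stabilizer state
    i = np.exp(i_pre - m_new)          # stabilized input gate
    f = np.exp(f_pre + m - m_new)      # stabilized forget gate
    c_new = f * c + i * z              # exponential-gated cell update
    n_new = f * n + i                  # normalizer update
    h = o * (c_new / n_new)            # normalized hidden output
    return c_new, n_new, m_new, h

# Large pre-activations: exp(60) would overflow, but the stabilizer
# cancels it, and from a zero state h reduces to o * z.
c1, n1, m1, h1 = slstm_step(0.0, 0.0, 0.0, i_pre=60.0, f_pre=50.0,
                            z=2.0, o=0.5)
```

Because `m_new` is subtracted from both gate exponents, it cancels out in the ratio `c_new / n_new`; the stabilizer changes only the internal representation, not the output.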
arXiv Detail & Related papers (2024-05-07T17:50:21Z) - CALF: Aligning LLMs for Time Series Forecasting via Cross-modal Fine-Tuning [59.88924847995279]
We propose a novel Cross-Modal LLM Fine-Tuning (CALF) framework for MTSF. To reduce the distribution discrepancy, we develop a cross-modal match module. CALF establishes state-of-the-art performance for both long-term and short-term forecasting tasks.
arXiv Detail & Related papers (2024-03-12T04:04:38Z) - DeLELSTM: Decomposition-based Linear Explainable LSTM to Capture Instantaneous and Long-term Effects in Time Series [26.378073712630467]
We propose a Decomposition-based Linear Explainable LSTM (DeLELSTM) to improve the interpretability of LSTM.
We demonstrate the effectiveness and interpretability of DeLELSTM on three empirical datasets.
arXiv Detail & Related papers (2023-08-26T07:45:41Z) - Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration [87.53543137162488]
We propose an easy-to-implement online reinforcement learning (online RL) framework called MEX.
MEX integrates estimation and planning components while balancing exploration and exploitation automatically.
It can outperform baselines by a stable margin in various MuJoCo environments with sparse rewards.
arXiv Detail & Related papers (2023-05-29T17:25:26Z) - Towards Energy-Efficient, Low-Latency and Accurate Spiking LSTMs [1.7969777786551424]
Spiking Neural Networks (SNNs) have emerged as an attractive spatio-temporal computing paradigm for complex vision tasks.
We propose an optimized spiking long short-term memory network (LSTM) training framework that involves a novel ANN-to-SNN conversion framework, followed by SNN training.
We evaluate our framework on sequential learning tasks, including the temporal MNIST, Google Speech Commands (GSC), and UCI Smartphone datasets, on different LSTM architectures.
arXiv Detail & Related papers (2022-10-23T04:10:27Z) - Bayesian Neural Network Language Modeling for Speech Recognition [59.681758762712754]
State-of-the-art neural network language models (NNLMs), represented by long short-term memory recurrent neural networks (LSTM-RNNs) and Transformers, are becoming highly complex.
In this paper, an overarching full Bayesian learning framework is proposed to account for the underlying uncertainty in LSTM-RNN and Transformer LMs.
arXiv Detail & Related papers (2022-08-28T17:50:19Z) - Long short-term memory networks and laglasso for bond yield forecasting:
Peeping inside the black box [10.412912723760172]
We conduct the first study of bond yield forecasting using long short-term memory (LSTM) networks.
We calculate the LSTM signals through time, at selected locations in the memory cell, using sequence-to-sequence architectures.
arXiv Detail & Related papers (2020-05-05T14:23:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.