Related papers: SG-XDEAT: Sparsity-Guided Cross-Dimensional and Cross-Encoding Attention with Target-Aware Conditioning in Tabular Learning

URL: http://arxiv.org/abs/2510.12659v1
Date: Tue, 14 Oct 2025 15:56:40 GMT
Title: SG-XDEAT: Sparsity-Guided Cross-Dimensional and Cross-Encoding Attention with Target-Aware Conditioning in Tabular Learning
Authors: Chih-Chuan Cheng, Yi-Ju Tseng
Abstract summary: We propose SG-XDEAT, a novel framework for supervised learning on tabular data. At its core, SG-XDEAT employs a dual-stream encoder that decomposes each input feature into two parallel representations. These dual representations are then propagated through a hierarchical stack of attention-based modules.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We propose SG-XDEAT (Sparsity-Guided Cross-Dimensional and Cross-Encoding Attention with Target-Aware Conditioning), a novel framework designed for supervised learning on tabular data. At its core, SG-XDEAT employs a dual-stream encoder that decomposes each input feature into two parallel representations: a raw value stream and a target-conditioned (label-aware) stream. These dual representations are then propagated through a hierarchical stack of attention-based modules. SG-XDEAT integrates three key components: (i) Cross-Dimensional self-attention, which captures intra-view dependencies among features within each stream; (ii) Cross-Encoding self-attention, which enables bidirectional interaction between raw and target-aware representations; and (iii) an Adaptive Sparse Self-Attention (ASSA) mechanism, which dynamically suppresses low-utility tokens by driving their attention weights toward zero, thereby mitigating the impact of noise. Empirical results on multiple public benchmarks show consistent gains over strong baselines, confirming that jointly modeling raw and target-aware views, while adaptively filtering noise, yields a more robust deep tabular learner.
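
The three attention stages named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the identity projections, the random stand-in for the target-conditioned stream, the mean fusion of the two streams, and the squared-ReLU sparsifier are all assumptions chosen only to show the data flow and how a sparse attention rule can assign exactly zero weight to some tokens.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(tokens, sparse=False):
    """Single-head self-attention over axis -2.

    Assumption: identity Q/K/V projections (no learned weights), to keep
    the sketch short. Returns (output, attention_weights).
    """
    d = tokens.shape[-1]
    scores = tokens @ np.swapaxes(tokens, -1, -2) / np.sqrt(d)
    if sparse:
        # Assumed sparsifier (hypothetical, not taken from the paper):
        # squared ReLU followed by row normalization, so tokens with
        # negative scores receive exactly zero weight.
        w = np.maximum(scores, 0.0) ** 2
        w = w / (w.sum(axis=-1, keepdims=True) + 1e-9)
    else:
        w = softmax(scores)
    return w @ tokens, w

rng = np.random.default_rng(0)
n_features, d = 5, 8
raw = rng.normal(size=(n_features, d))     # raw value stream
target = rng.normal(size=(n_features, d))  # stand-in for the target-conditioned stream

# (i) Cross-dimensional: features attend to each other within each stream.
raw_cd, _ = attention(raw)
tgt_cd, _ = attention(target)

# (ii) Cross-encoding: for each feature, its two encodings attend to each other.
pairs = np.stack([raw_cd, tgt_cd], axis=1)  # shape (n_features, 2, d)
pairs_ce, _ = attention(pairs)

# (iii) Sparse attention over features after fusing the two streams
# (mean fusion is an assumption).
fused = pairs_ce.mean(axis=1)               # shape (n_features, d)
out, w_sparse = attention(fused, sparse=True)
print(out.shape, np.round(w_sparse, 2))
```

With softmax every token keeps some positive weight, whereas the assumed squared-ReLU rule zeroes out tokens whose scores are negative, which is one way to read "driving their attention weights toward zero".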

Related papers

Pay Attention to CTC: Fast and Robust Pseudo-Labelling for Unified Speech Recognition [61.39209522608919]
Unified Speech Recognition has emerged as a semi-supervised framework for training a single model for audio, visual, and audiovisual speech recognition. We propose CTC-driven teacher forcing, where greedily decoded CTC pseudo-labels are fed into the decoder to generate attention targets. Because CTC and CTC-driven attention pseudo-labels have the same length, the decoder can predict both simultaneously.
arXiv Detail & Related papers (2026-02-22T19:38:21Z)
Entropy-Aware Structural Alignment for Zero-Shot Handwritten Chinese Character Recognition [7.632962062462334]
Zero-shot Handwritten Chinese Character Recognition aims to recognize unseen characters by leveraging radical-based semantic compositions. We propose an Entropy-Aware Structural Alignment Network that bridges the visual-semantic gap through information-theoretic modeling. Our method establishes new state-of-the-art performance, achieving an accuracy of 55.04% on the ICDAR 2013 dataset.
arXiv Detail & Related papers (2026-02-03T16:08:40Z)
Cross-Modal Attention Network with Dual Graph Learning in Multimodal Recommendation [12.802844514133255]
We propose the Cross-modal Recursive Attention Network with dual graph Embedding (CRANE). We design a core Recursive Cross-Modal Attention (RCA) mechanism that iteratively refines modality features based on cross-correlations in a joint latent space. For symmetric multimodal learning, we explicitly construct users' multimodal profiles by aggregating features of their interacted items.
arXiv Detail & Related papers (2026-01-16T10:09:39Z)
Mask & Match: Learning to Recognize Handwritten Math with Self-Supervised Attention [0.19116784879310025]
We present a self-supervised learning framework for recognizing handwritten mathematical expressions (HMER). Our approach begins by pretraining an image encoder using a combination of global and local contrastive loss. A key contribution of this work is a novel self-supervised attention network, which is trained using a progressive spatial masking strategy.
arXiv Detail & Related papers (2025-08-08T08:11:36Z)
CTR-Sink: Attention Sink for Language Models in Click-Through Rate Prediction [42.92011330807996]
CTR-Sink is a novel framework introducing behavior-level attention sinks tailored for recommendation scenarios. Inspired by attention sink theory, it constructs attention focus sinks and dynamically regulates attention aggregation via external information.
arXiv Detail & Related papers (2025-08-05T17:30:34Z)
Mitigating Attention Hacking in Preference-Based Reward Modeling via Interaction Distillation [62.14692332209628]
"Interaction Distillation" is a novel training framework for more adequate preference modeling through attention-level optimization. It provides more stable and generalizable reward signals compared to state-of-the-art RM optimization methods.
arXiv Detail & Related papers (2025-08-04T17:06:23Z)
Semantic-Aligned Learning with Collaborative Refinement for Unsupervised VI-ReID [82.12123628480371]
Unsupervised visible-infrared person re-identification (USL-VI-ReID) seeks to match pedestrian images of the same individual across different modalities without human annotations for model learning. Previous methods unify pseudo-labels of cross-modality images through label association algorithms and then design a contrastive learning framework for global feature learning. We propose a Semantic-Aligned Learning with Collaborative Refinement (SALCR) framework, which builds up an objective for specific fine-grained patterns emphasized by each modality.
arXiv Detail & Related papers (2025-04-27T13:58:12Z)
Cross-Modal Bidirectional Interaction Model for Referring Remote Sensing Image Segmentation [50.433911327489554]
The goal of referring remote sensing image segmentation (RRSIS) is to generate a pixel-level mask of the target object identified by the referring expression. To address the aforementioned challenges, a novel RRSIS framework is proposed, termed the cross-modal bidirectional interaction model (CroBIM). To further foster the research of RRSIS, we also construct RISBench, a new large-scale benchmark dataset comprising 52,472 image-language-label triplets.
arXiv Detail & Related papers (2024-10-11T08:28:04Z)
Auxiliary Tasks Enhanced Dual-affinity Learning for Weakly Supervised Semantic Segmentation [79.05949524349005]
We propose AuxSegNet+, a weakly supervised auxiliary learning framework to explore the rich information from saliency maps. We also propose a cross-task affinity learning mechanism to learn pixel-level affinities from the saliency and segmentation feature maps.
arXiv Detail & Related papers (2024-03-02T10:03:21Z)
Object Segmentation by Mining Cross-Modal Semantics [68.88086621181628]
We propose a novel approach by mining the Cross-Modal Semantics to guide the fusion and decoding of multimodal features. Specifically, we propose a novel network, termed XMSNet, consisting of (1) all-round attentive fusion (AF), (2) coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision.
arXiv Detail & Related papers (2023-05-17T14:30:11Z)
Towards Lightweight Cross-domain Sequential Recommendation via External Attention-enhanced Graph Convolution Network [7.1102362215550725]
Cross-domain Sequential Recommendation (CSR) depicts the evolution of behavior patterns for overlapped users by modeling their interactions from multiple domains. To solve the above challenges, we introduce LEA-GCN, a lightweight external attention-enhanced GCN-based framework. To further lighten the framework structure and aggregate the user-specific sequential pattern, we devise a novel dual-channel External Attention (EA) component.
arXiv Detail & Related papers (2023-02-07T03:06:29Z)
Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks [34.32609892928909]
We propose a novel attention mechanism which we call external attention, based on two external, small, learnable, and shared memories. Our method provides comparable or superior performance to the self-attention mechanism and some of its variants, with much lower computational and memory costs.
arXiv Detail & Related papers (2021-05-05T22:29:52Z)
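
The external attention idea in the last entry (two small, learnable, shared memories in place of token-to-token attention) can be sketched as follows. This is a rough illustration under stated assumptions, not the paper's code: the memory size `s`, the random initialization, and the plain softmax over memory slots are all choices made here for brevity, and the published method may normalize differently.

```python
import numpy as np

def external_attention(x, m_k, m_v):
    """x: (n, d) tokens; m_k, m_v: (s, d) external memories.

    Each memory acts like a bias-free linear layer. The cost is O(n * s * d),
    linear in the number of tokens n, versus O(n^2 * d) for self-attention.
    """
    scores = x @ m_k.T                               # (n, s): token-to-memory affinity
    scores = scores - scores.max(axis=1, keepdims=True)
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)    # assumed: softmax over memory slots
    return attn @ m_v                                # (n, d): read out from value memory

rng = np.random.default_rng(0)
n, d, s = 6, 4, 3                                    # s is a hypothetical memory size
x = rng.normal(size=(n, d))
m_k = rng.normal(size=(s, d))                        # key memory (learned in practice)
m_v = rng.normal(size=(s, d))                        # value memory (learned in practice)
y = external_attention(x, m_k, m_v)
print(y.shape)
```

Because the memories are shared across all inputs rather than computed from each input, the attention map has shape `(n, s)` instead of `(n, n)`, which is where the lower computational and memory cost comes from.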

This list is automatically generated from the titles and abstracts of the papers on this site.