S2Sent: Nested Selectivity Aware Sentence Representation Learning
- URL: http://arxiv.org/abs/2508.18164v1
- Date: Mon, 25 Aug 2025 16:13:42 GMT
- Title: S2Sent: Nested Selectivity Aware Sentence Representation Learning
- Authors: Jianxiang Zang, Nijia Mo, Yonda Wei, Meiling Ning, Hui Liu,
- Abstract summary: We propose a sentence representation selection mechanism S²Sent. The selector performs spatial selection (SS) and nested frequency selection (FS) from a modular perspective. Extensive experiments have demonstrated that S²Sent achieves significant improvements over baseline methods.
- Score: 5.284254208630281
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The combination of Transformer-based encoders with contrastive learning represents the current mainstream paradigm for sentence representation learning. This paradigm is typically based on the hidden states of the last Transformer block of the encoder. However, within Transformer-based encoders, different blocks exhibit varying degrees of semantic perception ability. From the perspective of interpretability, the semantic perception potential of knowledge neurons is modulated by stimuli, so rational cross-block representation fusion is a direction worth optimizing. To balance the semantic redundancy and loss across block fusion, we propose a sentence representation selection mechanism S²Sent, which integrates a parameterized nested selector downstream of the Transformer-based encoder. This selector performs spatial selection (SS) and nested frequency selection (FS) from a modular perspective. The SS innovatively employs a spatial-squeeze-based self-gating mechanism to obtain adaptive weights, which not only achieves fusion with low information redundancy but also captures the dependencies between embedding features. The nested FS replaces global average pooling (GAP) with different DCT basis functions to achieve spatial squeeze with low semantic loss. Extensive experiments have demonstrated that S²Sent achieves significant improvements over baseline methods with negligible additional parameters and inference latency, while highlighting high integrability and scalability.
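As a rough illustration of how such a nested selector could be wired up, the sketch below fuses per-block hidden states by squeezing the token axis with a few DCT basis functions (frequency 0 reduces to plain GAP) and feeding the squeezed descriptors to a squeeze-and-excitation style self-gate that produces adaptive per-block fusion weights. This is a minimal PyTorch sketch under assumptions about the design; the names (NestedSelector, freqs, reduction) are illustrative and not the authors' implementation.

```python
# Hypothetical sketch of a nested block-selection module in the spirit of S²Sent.
import math
import torch
import torch.nn as nn


def dct_basis(seq_len: int, freq: int) -> torch.Tensor:
    """DCT-II basis vector over the token axis; frequency 0 is flat, i.e. GAP up to scale."""
    t = torch.arange(seq_len, dtype=torch.float32)
    return torch.cos(math.pi * freq * (t + 0.5) / seq_len) / seq_len


class NestedSelector(nn.Module):
    """Fuses hidden states from several Transformer blocks into one sentence embedding.

    Frequency selection (FS): squeeze the token axis with a few DCT basis
    functions instead of plain average pooling.
    Spatial selection (SS): a squeeze-and-excitation style self-gate turns
    the squeezed descriptors into adaptive per-block fusion weights.
    """

    def __init__(self, hidden: int, num_blocks: int, freqs=(0, 1, 2), reduction: int = 4):
        super().__init__()
        self.freqs = freqs
        gate_in = hidden * num_blocks * len(freqs)
        self.gate = nn.Sequential(
            nn.Linear(gate_in, gate_in // reduction),
            nn.ReLU(),
            nn.Linear(gate_in // reduction, num_blocks),
        )

    def forward(self, block_states: list[torch.Tensor]) -> torch.Tensor:
        # block_states: one [batch, seq_len, hidden] tensor per selected encoder block
        seq_len = block_states[0].size(1)
        squeezed = []
        for h in block_states:
            for f in self.freqs:
                basis = dct_basis(seq_len, f).to(h.device)            # [seq_len]
                squeezed.append(torch.einsum("bsh,s->bh", h, basis))  # [batch, hidden]
        gate_input = torch.cat(squeezed, dim=-1)                      # [batch, hidden * blocks * freqs]
        weights = torch.softmax(self.gate(gate_input), dim=-1)        # [batch, num_blocks]
        stacked = torch.stack(block_states, dim=1)                    # [batch, blocks, seq, hidden]
        fused = (weights[:, :, None, None] * stacked).sum(dim=1)      # [batch, seq, hidden]
        return fused.mean(dim=1)                                      # mean-pooled sentence embedding
```

Setting freqs=(0,) collapses the frequency selection to ordinary GAP, which makes the contribution of the DCT-based squeeze easy to ablate against a plain spatial-selection baseline.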
Related papers
- Unified Multimodal Brain Decoding via Cross-Subject Soft-ROI Fusion [0.0]
Multimodal brain decoding aims to reconstruct semantic information consistent with visual stimuli from brain activity signals such as fMRI. We propose a BrainROI model and achieve leading results in brain-captioning evaluation on the NSD dataset. Under the cross-subject setting, compared with recent state-of-the-art methods and representative baselines, metrics such as BLEU-4 and CIDEr show clear improvements.
arXiv Detail & Related papers (2025-12-23T11:04:34Z) - Mechanistic Interpretability for Neural TSP Solvers [0.8092772265574576]
We train a pointer network with reinforcement learning on 100-node instances, then fit a sparse autoencoder (SAE) to the encoder's residual stream to discover an overcomplete dictionary of interpretable features. Our analysis reveals that the solver naturally develops features mirroring fundamental TSP concepts. These findings provide the first model-internal account of what neural solvers compute before node selection, demonstrate that geometric structure emerges without explicit supervision, and suggest pathways toward transparent hybrid systems.
arXiv Detail & Related papers (2025-10-24T17:54:19Z) - Semantic Fusion with Fuzzy-Membership Features for Controllable Language Modelling [0.0]
Semantic fusion is a lightweight scheme that augments a Transformer language model (LM) with a fuzzy-membership feature channel. Each token is represented by a vector of interpretable features whose values are graded degrees from differentiable membership functions. This approach adds only small overhead, remains fully compatible with tied input-output embeddings, and provides an interpretable pathway for conditioned natural language generation.
arXiv Detail & Related papers (2025-09-14T22:11:09Z) - Semi-supervised Semantic Segmentation with Multi-Constraint Consistency Learning [81.02648336552421]
We propose a Multi-Constraint Consistency Learning approach to facilitate the staged enhancement of the encoder and decoder. Self-adaptive feature masking and noise injection are designed in an instance-specific manner to perturb the features for robust learning of the decoder. Experimental results on the Pascal VOC2012 and Cityscapes datasets demonstrate that our proposed MCCL achieves new state-of-the-art performance.
arXiv Detail & Related papers (2025-03-23T03:21:33Z) - SAFR: Neuron Redistribution for Interpretability [7.756342860929851]
Superposition refers to encoding representations of multiple features within a single neuron. Despite promising performance, the model's interpretability has been diminished. This paper presents a novel approach to enhance model interpretability by regularizing feature superposition.
arXiv Detail & Related papers (2025-01-23T06:20:33Z) - Spatial Semantic Recurrent Mining for Referring Image Segmentation [63.34997546393106]
We propose S²RM to achieve high-quality cross-modality fusion.
It follows a working strategy of trilogy: distributing language feature, spatial semantic recurrent coparsing, and parsed-semantic balancing.
Our proposed method performs favorably against other state-of-the-art algorithms.
arXiv Detail & Related papers (2024-05-15T00:17:48Z) - Isomer: Isomerous Transformer for Zero-shot Video Object Segmentation [59.91357714415056]
We propose two Transformer variants: Context-Sharing Transformer (CST) and Semantic Gathering-Scattering Transformer (SGST).
CST learns the global-shared contextual information within image frames with a lightweight computation; SGST models the semantic correlation separately for the foreground and background.
Compared with the baseline that uses vanilla Transformers for multi-stage fusion, ours significantly increases the speed by 13 times and achieves new state-of-the-art ZVOS performance.
arXiv Detail & Related papers (2023-08-13T06:12:00Z) - Learning Disentangled Semantic Spaces of Explanations via Invertible Neural Networks [10.880057430629126]
Disentangled latent spaces usually have better semantic separability and geometrical properties, which leads to better interpretability and more controllable data generation.
In this work, we focus on a more general form of sentence disentanglement, targeting the localised modification and control of more general sentence semantic features.
We introduce a flow-based invertible neural network (INN) mechanism integrated with a transformer-based language Autoencoder (AE) in order to deliver latent spaces with better separability properties.
arXiv Detail & Related papers (2023-05-02T18:27:13Z) - Multi-scale and Cross-scale Contrastive Learning for Semantic Segmentation [5.281694565226513]
We apply contrastive learning to enhance the discriminative power of the multi-scale features extracted by semantic segmentation networks.
By first mapping the encoder's multi-scale representations to a common feature space, we instantiate a novel form of supervised local-global constraint.
arXiv Detail & Related papers (2022-03-25T01:24:24Z) - CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that integrates the detailed spatial information captured by CNNs with the global context provided by Transformers for enhanced representation learning.
The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery.
The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
arXiv Detail & Related papers (2021-12-31T04:37:11Z) - MetaSDF: Meta-learning Signed Distance Functions [85.81290552559817]
Generalizing across shapes with neural implicit representations amounts to learning priors over the respective function space.
We formalize learning of a shape space as a meta-learning problem and leverage gradient-based meta-learning algorithms to solve this task.
arXiv Detail & Related papers (2020-06-17T05:14:53Z) - Improve Variational Autoencoder for Text Generation with Discrete Latent Bottleneck [52.08901549360262]
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning.
VAEs with a strong auto-regressive decoder tend to ignore the latent variables.
We propose a principled approach to enforce an implicit latent feature matching in a more compact latent space.
arXiv Detail & Related papers (2020-04-22T14:41:37Z) - Spatial-Scale Aligned Network for Fine-Grained Recognition [42.71878867504503]
Existing approaches for fine-grained visual recognition focus on learning marginal region-based representations.
We propose the spatial-scale aligned network (SSANET) and implicitly address misalignments during the recognition process.
arXiv Detail & Related papers (2020-01-05T11:12:08Z)