Hyperbolic Additive Margin Softmax with Hierarchical Information for Speaker Verification
- URL: http://arxiv.org/abs/2601.19709v1
- Date: Tue, 27 Jan 2026 15:33:47 GMT
- Title: Hyperbolic Additive Margin Softmax with Hierarchical Information for Speaker Verification
- Authors: Zhihua Fang, Liang He
- Abstract summary: We propose Hyperbolic Softmax (H-Softmax) and Hyperbolic Additive Margin Softmax (HAM-Softmax) based on hyperbolic space. H-Softmax incorporates hierarchical information into speaker embeddings by projecting embeddings and speaker centers into hyperbolic space. HAM-Softmax further enhances inter-class separability by introducing a margin constraint.
- Score: 11.01429225070742
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Speaker embedding learning based on Euclidean space has achieved significant progress, but it is still insufficient for modeling hierarchical information within speaker features. Hyperbolic space, with its negative-curvature geometry, can efficiently represent hierarchical information within a finite volume, making it more suitable for the feature distribution of speaker embeddings. In this paper, we propose Hyperbolic Softmax (H-Softmax) and Hyperbolic Additive Margin Softmax (HAM-Softmax) based on hyperbolic space. H-Softmax incorporates hierarchical information into speaker embeddings by projecting embeddings and speaker centers into hyperbolic space and computing hyperbolic distances. HAM-Softmax further enhances inter-class separability by introducing a margin constraint on this basis. Experimental results show that H-Softmax and HAM-Softmax achieve average relative EER reductions of 27.84% and 14.23% compared with standard Softmax and AM-Softmax, respectively, demonstrating that the proposed methods effectively improve speaker verification performance while preserving the capability of hierarchical structure modeling. The code will be released at https://github.com/PunkMale/HAM-Softmax.
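As a rough illustration of the idea, the sketch below shows a minimal PyTorch formulation of an additive-margin softmax computed from Poincaré-ball distances: embeddings and speaker centers are projected into the ball via the exponential map at the origin, logits are negative hyperbolic distances, and a margin is added to the target-class distance. The curvature, scale, and margin values (and their placement) are illustrative assumptions, not necessarily the paper's exact formulation.

```python
# Hedged sketch of a hyperbolic additive-margin softmax loss on the Poincare
# ball. `curvature`, `scale`, and `margin` are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def exp_map_zero(v, c, eps=1e-6):
    """Project a Euclidean vector onto the Poincare ball (exp map at the origin)."""
    sqrt_c = c ** 0.5
    norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.tanh(sqrt_c * norm) * v / (sqrt_c * norm)


def poincare_distance(x, y, c, eps=1e-6):
    """Geodesic distance between x and y on the Poincare ball of curvature -c."""
    sqrt_c = c ** 0.5
    diff2 = (x - y).pow(2).sum(dim=-1)
    denom = (1 - c * x.pow(2).sum(dim=-1)) * (1 - c * y.pow(2).sum(dim=-1))
    arg = 1 + 2 * c * diff2 / denom.clamp_min(eps)
    return torch.acosh(arg.clamp_min(1 + eps)) / sqrt_c


class HAMSoftmax(nn.Module):
    def __init__(self, embed_dim, num_speakers, curvature=1.0, scale=10.0, margin=0.2):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_speakers, embed_dim) * 0.01)
        self.c, self.scale, self.margin = curvature, scale, margin

    def forward(self, embeddings, labels):
        x = exp_map_zero(embeddings, self.c).unsqueeze(1)    # (B, 1, D)
        w = exp_map_zero(self.centers, self.c).unsqueeze(0)  # (1, S, D)
        dist = poincare_distance(x, w, self.c)               # (B, S)
        # Additive margin: push the target class farther before the softmax,
        # enlarging inter-class separation (margin=0 recovers plain H-Softmax).
        dist = dist + self.margin * F.one_hot(labels, dist.size(1)).float()
        logits = -self.scale * dist
        return F.cross_entropy(logits, labels)


# Toy usage with hypothetical dimensions.
loss_fn = HAMSoftmax(embed_dim=192, num_speakers=600)
loss = loss_fn(torch.randn(32, 192), torch.randint(0, 600, (32,)))
```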
Related papers
- SoLA-Vision: Fine-grained Layer-wise Linear Softmax Hybrid Attention [50.99430451151184]
Linear attention reduces the cost to O(N), yet its compressed state representations can impair modeling capacity and accuracy. We present an analytical study that contrasts linear and softmax attention for visual representation learning. We propose SoLA-Vision, a flexible layer-wise hybrid attention backbone.
arXiv Detail & Related papers (2026-01-16T10:26:53Z) - $ε$-Softmax: Approximating One-Hot Vectors for Mitigating Label Noise [99.91399796174602]
Noisy labels pose a common challenge for training accurate deep neural networks. We propose $\epsilon$-softmax, which modifies the outputs of the softmax layer to approximate one-hot vectors with a controllable error. We prove theoretically that $\epsilon$-softmax can achieve noise-tolerant learning with a controllable excess risk bound for almost any loss function.
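The snippet only states that the softmax outputs are modified to approximate one-hot vectors with a controllable error; one plausible (hedged) reading is a convex mixture of the softmax distribution with the one-hot vector of its argmax, sketched below. The published $\epsilon$-softmax construction may differ.

```python
# Hedged sketch: mix the softmax output with the one-hot of its argmax,
# with eps controlling how far the result stays from a hard one-hot vector.
import torch
import torch.nn.functional as F


def epsilon_softmax(logits, eps=0.1):
    probs = F.softmax(logits, dim=-1)
    one_hot = F.one_hot(probs.argmax(dim=-1), probs.size(-1)).float()
    # eps -> 0 gives a hard one-hot prediction; eps -> 1 recovers plain softmax.
    return (1.0 - eps) * one_hot + eps * probs


logits = torch.tensor([[2.0, 0.5, -1.0]])
print(epsilon_softmax(logits, eps=0.2))  # nearly one-hot, with controlled residual mass
```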
arXiv Detail & Related papers (2025-08-04T13:10:48Z) - Unpacking Softmax: How Temperature Drives Representation Collapse, Compression, and Generalization [15.458541841436967]
We study the pivotal role of the softmax function in shaping the model's representation. We introduce the concept of rank deficit bias - a phenomenon in which softmax-based deep networks find solutions of rank much lower than the number of classes. We demonstrate how to exploit the softmax dynamics to learn compressed representations or to enhance their performance on out-of-distribution data.
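For reference, the temperature-scaled softmax studied in that work is simply softmax(x/T); the toy example below shows how low temperature sharpens the distribution toward one-hot while high temperature flattens it toward uniform, which is the knob the paper connects to representation collapse and compression.

```python
# Temperature-scaled softmax: illustrative only, unrelated to the paper's code.
import torch
import torch.nn.functional as F

logits = torch.tensor([2.0, 1.0, 0.1])
for T in (0.1, 1.0, 10.0):
    print(T, F.softmax(logits / T, dim=-1))  # sharp at T=0.1, near-uniform at T=10
```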
arXiv Detail & Related papers (2025-06-02T11:38:10Z) - Self-Adjust Softmax [62.267367768385434]
The softmax function is crucial in Transformer attention, normalizing each row of the attention scores to sum to one. We propose Self-Adjust Softmax (SA-Softmax) to address this issue by modifying $\mathrm{softmax}(x)$ to $x \cdot \mathrm{softmax}(x)$ and its normalized variant $\frac{x - \min(x_{\min}, 0)}{\max(0, x_{\max}) - \min(x_{\min}, 0)} \cdot \mathrm{softmax}(x)$.
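The two variants quoted above can be written down directly; the sketch below follows the stated formulas applied row-wise to attention scores, with a small epsilon in the denominator added as an assumption to avoid division by zero.

```python
# Sketch of the two Self-Adjust Softmax variants quoted in the summary:
# x * softmax(x), and ((x - min(x_min, 0)) / (max(0, x_max) - min(x_min, 0))) * softmax(x).
import torch
import torch.nn.functional as F


def sa_softmax(x, dim=-1):
    return x * F.softmax(x, dim=dim)


def sa_softmax_normalized(x, dim=-1, eps=1e-6):
    x_min = x.amin(dim=dim, keepdim=True).clamp_max(0.0)  # min(x_min, 0)
    x_max = x.amax(dim=dim, keepdim=True).clamp_min(0.0)  # max(0, x_max)
    x_norm = (x - x_min) / (x_max - x_min + eps)
    return x_norm * F.softmax(x, dim=dim)


scores = torch.randn(2, 5)
print(sa_softmax(scores))
print(sa_softmax_normalized(scores))
```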
arXiv Detail & Related papers (2025-02-25T15:07:40Z) - Scalable-Softmax Is Superior for Attention [0.0]
Transformer-based language models rely on Softmax to compute attention scores. SSMax replaces Softmax in scenarios where the input vector size varies. Models using SSMax not only achieve faster loss reduction during pretraining but also significantly improve performance in long contexts.
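As a hedged sketch, Scalable-Softmax is commonly described as scaling the logits by $s \log n$ (with $n$ the input length and $s$ a learnable scale) before the exponential, so the distribution does not flatten as $n$ grows; the exact parameterization in the paper (e.g., an optional bias term) may differ.

```python
# Hedged SSMax sketch: logits scaled by s * log(n), where n is the input length.
# `s` is assumed learnable in the original; here it is a plain argument.
import math
import torch
import torch.nn.functional as F


def ssmax(scores, s=1.0, dim=-1):
    n = scores.size(dim)
    return F.softmax(s * math.log(n) * scores, dim=dim)


short_ctx = torch.randn(8)
long_ctx = torch.randn(4096)  # plain softmax would spread mass thinly here
print(ssmax(short_ctx).max().item(), ssmax(long_ctx).max().item())
```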
arXiv Detail & Related papers (2025-01-31T18:55:35Z) - MultiMax: Sparse and Multi-Modal Attention Learning [60.49318008131978]
SoftMax is a ubiquitous ingredient of modern machine learning algorithms. We show that sparsity can be achieved by a family of SoftMax variants, but they often require an alternative loss function and do not preserve multi-modality. We propose MultiMax, which adaptively modulates the output distribution according to input entry range.
arXiv Detail & Related papers (2024-06-03T10:51:43Z) - ConSmax: Hardware-Friendly Alternative Softmax with Learnable Parameters [14.029865087214436]
The self-attention mechanism distinguishes transformer-based large language models (LLMs) from convolutional and recurrent neural networks.
However, achieving real-time LLM inference on silicon remains challenging due to the extensive use of Softmax in self-attention.
We propose Constant Softmax (ConSmax), a software-hardware co-design that serves as an efficient alternative to Softmax.
arXiv Detail & Related papers (2024-01-31T17:52:52Z) - Spectral Aware Softmax for Visible-Infrared Person Re-Identification [123.69049942659285]
Visible-infrared person re-identification (VI-ReID) aims to match specific pedestrian images from different modalities.
Existing methods still follow the softmax loss training paradigm, which is widely used in single-modality classification tasks.
We propose the spectral-aware softmax (SA-Softmax) loss, which can fully explore the embedding space with the modality information.
arXiv Detail & Related papers (2023-02-03T02:57:18Z) - Breaking the Softmax Bottleneck for Sequential Recommender Systems with Dropout and Decoupling [0.0]
We show that there are more aspects to the Softmax bottleneck in SBRSs.
We propose a simple yet effective method, Dropout and Decoupling (D&D), to alleviate these problems.
Our method significantly improves the accuracy of a variety of Softmax-based SBRS algorithms.
arXiv Detail & Related papers (2021-10-11T16:52:23Z) - Imbalance Robust Softmax for Deep Embeeding Learning [34.95520933299555]
In recent years, one research focus has been solving the open-set problem via discriminative deep embedding learning in the fields of face recognition (FR) and person re-identification (re-ID).
We find that imbalanced training data is another main factor degrading the performance of FR and re-ID models trained with softmax or its variants.
We propose a unified framework, Imbalance-Robust Softmax (IR-Softmax), which can simultaneously solve the open-set problem and reduce the influence of data imbalance.
arXiv Detail & Related papers (2020-11-23T00:43:07Z) - Optimal Approximation -- Smoothness Tradeoffs for Soft-Max Functions [73.33961743410876]
A soft-max function has two main efficiency measures: approximation and smoothness.
We identify the optimal approximation-smoothness tradeoffs for different measures of approximation and smoothness.
This leads to novel soft-max functions, each of which is optimal for a different application.
arXiv Detail & Related papers (2020-10-22T05:19:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.