Semantic Distance Measurement based on Multi-Kernel Gaussian Processes
- URL: http://arxiv.org/abs/2512.12238v1
- Date: Sat, 13 Dec 2025 08:34:00 GMT
- Title: Semantic Distance Measurement based on Multi-Kernel Gaussian Processes
- Authors: Yinzhu Cheng, Haihua Xie, Yaqing Wang, Miao He, Mingming Sun
- Abstract summary: A semantic distance is a metric defined on a space of texts or on a representation space derived from them. In this paper, a semantic distance measure based on multi-kernel Gaussian processes (MK-GP) was proposed. The experimental results demonstrated the effectiveness of the proposed measure.
- Score: 7.722282116495228
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Semantic distance measurement is a fundamental problem in computational linguistics, providing a quantitative characterization of similarity or relatedness between text segments, and underpinning tasks such as text retrieval and text classification. From a mathematical perspective, a semantic distance can be viewed as a metric defined on a space of texts or on a representation space derived from them. However, most classical semantic distance methods are essentially fixed, making them difficult to adapt to specific data distributions and task requirements. In this paper, a semantic distance measure based on multi-kernel Gaussian processes (MK-GP) was proposed. The latent semantic function associated with texts was modeled as a Gaussian process, with its covariance function given by a composite kernel with Matérn and polynomial components. The kernel parameters were learned automatically from data under supervision, rather than being hand-crafted. This semantic distance was instantiated and evaluated in the context of fine-grained sentiment classification with large language models under an in-context learning (ICL) setup. The experimental results demonstrated the effectiveness of the proposed measure.
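The abstract does not spell out how the Matérn and polynomial components are combined or how a distance is read off the kernel. As a rough illustration only, the sketch below assumes a weighted sum of a Matérn-3/2 kernel and a polynomial kernel, and uses the standard kernel-induced distance d(x, y) = sqrt(k(x, x) + k(y, y) - 2 k(x, y)). The function names, the fixed weights, the length scale, and the polynomial degree are all hypothetical; in the paper the kernel parameters are learned under supervision, which is not shown here.

```python
import math

def matern32(x, y, length_scale=1.0):
    """Matern kernel with smoothness nu = 3/2 on two equal-length vectors."""
    s = math.sqrt(3.0) * math.dist(x, y) / length_scale
    return (1.0 + s) * math.exp(-s)

def polynomial(x, y, degree=2, coef0=1.0):
    """Polynomial kernel (x . y + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree

def combined_kernel(x, y, w_matern=0.5, w_poly=0.5):
    """Assumed combination: non-negative weighted sum of the two kernels."""
    return w_matern * matern32(x, y) + w_poly * polynomial(x, y)

def kernel_distance(x, y, w_matern=0.5, w_poly=0.5):
    """Distance induced by the combined kernel in its feature space."""
    sq = (combined_kernel(x, x, w_matern, w_poly)
          + combined_kernel(y, y, w_matern, w_poly)
          - 2.0 * combined_kernel(x, y, w_matern, w_poly))
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(sq, 0.0))

# Toy embeddings standing in for text representations (hypothetical values).
a = [1.0, 0.0]
b = [0.0, 1.0]
print(kernel_distance(a, b))
```

Because both component kernels are positive semi-definite, any non-negative weighted sum is too, so the induced distance is symmetric, non-negative, and zero for identical inputs. In a supervised setting the weights and kernel hyperparameters would be fitted to labeled data rather than fixed as above.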
Related papers
- Evaluating the impact of word embeddings on similarity scoring in practical information retrieval [0.5872014229110214]
Vector Space Modelling (VSM) and neural word embeddings play a crucial role in modern machine learning and Natural Language Processing pipelines. This paper evaluates an alternative approach to measuring query statement similarity that moves away from the common similarity measure of centroids of neural word embeddings.
arXiv Detail & Related papers (2026-02-05T14:57:38Z) - Categorical Data Clustering via Value Order Estimated Distance Metric Learning [53.28598689867732]
This paper introduces a novel order distance metric learning approach to intuitively represent categorical attribute values. A new joint learning paradigm is developed to alternately perform clustering and order distance metric learning. The proposed method achieves superior clustering accuracy on categorical and mixed datasets.
arXiv Detail & Related papers (2024-11-19T08:23:25Z) - Symmetry Discovery for Different Data Types [52.2614860099811]
Equivariant neural networks incorporate symmetries into their architecture, achieving higher generalization performance.
We propose LieSD, a method for discovering symmetries via trained neural networks which approximate the input-output mappings of the tasks.
We validate the performance of LieSD on tasks with symmetries such as the two-body problem, the moment of inertia matrix prediction, and top quark tagging.
arXiv Detail & Related papers (2024-10-13T13:39:39Z) - Embedding Trajectory for Out-of-Distribution Detection in Mathematical Reasoning [50.84938730450622]
We propose a trajectory-based method TV score, which uses trajectory volatility for OOD detection in mathematical reasoning.
Our method outperforms all traditional algorithms on GLMs under mathematical reasoning scenarios.
Our method can be extended to more applications with high-density features in output spaces, such as multiple-choice questions.
arXiv Detail & Related papers (2024-05-22T22:22:25Z) - Efficient Large-scale Nonstationary Spatial Covariance Function Estimation Using Convolutional Neural Networks [3.5455896230714194]
We use ConvNets to derive subregions from the nonstationary data.
We employ a selection mechanism to identify subregions that exhibit similar behavior to stationary fields.
We assess the performance of the proposed method with synthetic and real datasets at a large scale.
arXiv Detail & Related papers (2023-06-20T12:17:46Z) - Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlapping frequently occurs in paired texts in natural language processing tasks like text editing and semantic similarity evaluation.
This paper aims to address the issue with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions on their positions.
Experiments on Semantic Textual Similarity show NDD to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
arXiv Detail & Related papers (2021-10-04T03:59:15Z) - SetMargin Loss applied to Deep Keystroke Biometrics with Circle Packing Interpretation [67.0845003374569]
This work presents a new deep learning approach for keystroke biometrics based on a novel Distance Metric Learning method (DML).
We prove experimentally the effectiveness of the proposed approach on a challenging task: keystroke biometric identification over a large set of 78,000 subjects.
arXiv Detail & Related papers (2021-09-02T13:26:57Z) - EDS-MEMBED: Multi-sense embeddings based on enhanced distributional semantic structures via a graph walk over word senses [0.0]
We leverage the rich semantic structures in WordNet to enhance the quality of multi-sense embeddings.
We derive new distributional semantic similarity measures for M-SE from prior ones.
We report evaluation results on 11 benchmark datasets involving WSD and Word Similarity tasks.
arXiv Detail & Related papers (2021-02-27T14:36:55Z) - Automated Discovery of Mathematical Definitions in Text with Deep Neural Networks [6.172021438837204]
This paper focuses on automatic detection of one-sentence definitions in mathematical texts.
We apply deep learning methods such as the Convolutional Neural Network (CNN) and the Long Short-Term Memory network (LSTM).
We also present a new dataset for definition extraction from mathematical texts.
arXiv Detail & Related papers (2020-11-09T15:57:53Z) - Spatial Pyramid Based Graph Reasoning for Semantic Segmentation [67.47159595239798]
We apply graph convolution to the semantic segmentation task and propose an improved Laplacian.
The graph reasoning is directly performed in the original feature space organized as a spatial pyramid.
We achieve comparable performance with advantages in computational and memory overhead.
arXiv Detail & Related papers (2020-03-23T12:28:07Z)