Related papers: Description-Based Text Similarity

Related papers

Evaluating the impact of word embeddings on similarity scoring in practical information retrieval [0.5872014229110214]
Vector Space Modelling (VSM) and neural word embeddings play a crucial role in modern machine learning and Natural Language Processing pipelines. This paper evaluates an alternative approach to measuring query statement similarity that moves away from the common similarity measure of centroids of neural word embeddings.
arXiv Detail & Related papers (2026-02-05T14:57:38Z)
RepreGuard: Detecting LLM-Generated Text by Revealing Hidden Representation Patterns [50.401907401444404]
Detecting text generated by large language models (LLMs) is crucial for preventing misuse and building trustworthy AI systems. We propose RepreGuard, an efficient statistics-based detection method. Experimental results show that RepreGuard outperforms all baselines, with an average AUROC of 94.92% in both in-distribution (ID) and out-of-distribution (OOD) scenarios.
arXiv Detail & Related papers (2025-08-18T17:59:15Z)
Explainable identification of similarities between entities for discovery in large text [0.0]
This study develops an n-gram analysis framework to compare documents automatically and uncover explainable similarities. A scoring formula assigns each n-gram a weight, which is higher when the n-gram is more frequent in both documents. Visualization tools like word clouds enhance the representation of these patterns, providing clearer insights.
arXiv Detail & Related papers (2025-03-22T01:20:43Z)
QAEA-DR: A Unified Text Augmentation Framework for Dense Retrieval [12.225881591629815]
In dense retrieval, embedding long texts into dense vectors can result in information loss, leading to inaccurate query-text matching. Recent studies mainly focus on improving the sentence embedding model or retrieval process. We introduce a novel text augmentation framework for dense retrieval, which transforms raw documents into information-dense text formats.
arXiv Detail & Related papers (2024-07-29T17:39:08Z)
Learning Robust Named Entity Recognizers From Noisy Data With Retrieval Augmentation [67.89838237013078]
Named entity recognition (NER) models often struggle with noisy inputs. We propose a more realistic setting in which only noisy text and its NER labels are available. We employ a multi-view training framework that improves robust NER without retrieving text during inference.
arXiv Detail & Related papers (2024-07-26T07:30:41Z)
Test-time Contrastive Concepts for Open-world Semantic Segmentation [14.899741072838994]
Recent CLIP-like Vision-Language Models (VLMs), pre-trained on large amounts of image-text pairs, have paved the way to open-vocabulary semantic segmentation. We propose two different approaches to automatically generate, at test time, textual contrastive concepts that are query-specific.
arXiv Detail & Related papers (2024-07-06T12:18:43Z)
A Text is Worth Several Tokens: Text Embedding from LLMs Secretly Aligns Well with The Key Tokens [20.37803751979975]
When a text is fed into a large language model-based embedder, the resulting text embedding can be aligned with the key tokens in the input text. We show that this phenomenon is universal and is not affected by model architecture, training strategy, or embedding method.
arXiv Detail & Related papers (2024-06-25T08:55:12Z)
Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval [31.79030663958162]
We propose a new text modeling method T-MASS to enrich text embedding with a flexible and resilient semantic range. To be specific, we introduce a similarity-aware radius module to adapt the scale of the text mass upon the given text-video pairs. T-MASS achieves state-of-the-art performance on five benchmark datasets.
arXiv Detail & Related papers (2024-03-26T17:59:52Z)
STAIR: Learning Sparse Text and Image Representation in Grounded Tokens [84.14528645941128]
We show that it is possible to build a sparse semantic representation that is as powerful as, or even better than, dense representations. We extend the CLIP model and build a sparse text and image representation (STAIR), where the image and text are mapped to a sparse token space. It significantly outperforms a CLIP model with +4.9% and +4.3% absolute Recall@1 improvement.
arXiv Detail & Related papers (2023-01-30T17:21:30Z)
Textual Entailment Recognition with Semantic Features from Empirical Text Representation [60.31047947815282]
A text entails a hypothesis if and only if the truth of the hypothesis follows from the text. In this paper, we propose a novel approach to identifying the textual entailment relationship between text and hypothesis. We employ an element-wise Manhattan distance vector-based feature that can identify the semantic entailment relationship between the text-hypothesis pair.
arXiv Detail & Related papers (2022-10-18T10:03:51Z)
Telling the What while Pointing the Where: Fine-grained Mouse Trace and Language Supervision for Improved Image Retrieval [60.24860627782486]
Fine-grained image retrieval often requires the ability to also express where in the image the content being sought is located. In this paper, we describe an image retrieval setup where the user simultaneously describes an image using both spoken natural language (the "what") and mouse traces over an empty canvas (the "where"). Our model is capable of taking this spatial guidance into account and provides more accurate retrieval results than text-only equivalent systems.
arXiv Detail & Related papers (2021-02-09T17:54:34Z)
An Intelligent CNN-VAE Text Representation Technology Based on Text Semantics for Comprehensive Big Data [15.680918844684454]
A text feature representation model based on convolutional neural network (CNN) and variational autoencoder (VAE) is proposed. The proposed model outperforms in k-nearest neighbor (KNN), random forest (RF) and support vector machine (SVM) classification algorithms.
arXiv Detail & Related papers (2020-08-28T07:39:45Z)
Extractive Summarization as Text Matching [123.09816729675838]
This paper creates a paradigm shift with regard to the way we build neural extractive summarization systems. We formulate the extractive summarization task as a semantic text matching problem. We have driven the state-of-the-art extractive result on CNN/DailyMail to a new level (44.41 in ROUGE-1).
arXiv Detail & Related papers (2020-04-19T08:27:57Z)
Comparative Analysis of N-gram Text Representation on Igbo Text Document Similarity [0.0]
The improvement in Information Technology has encouraged the use of Igbo in the creation of text such as resources and news articles online. The study adopts the Euclidean similarity measure to determine the similarities between Igbo text documents represented with two word-based n-gram text representation models (unigram and bigram).
arXiv Detail & Related papers (2020-04-01T12:24:47Z)
Learning to Select Bi-Aspect Information for Document-Scale Text Content Manipulation [50.01708049531156]
We focus on a new practical task, document-scale text content manipulation, which is the opposite of text style transfer. In detail, the input is a set of structured records and a reference text describing another recordset. The output is a summary that accurately describes the partial content in the source recordset in the same writing style as the reference.
arXiv Detail & Related papers (2020-02-24T12:52:10Z)
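
Several of the abstracts above name simple vector-based similarity measures: the centroid of word embeddings, an element-wise Manhattan distance feature, and Euclidean distance over unigram or bigram representations. The following is a minimal sketch of those three ideas, not the implementation of any listed paper. All function names are hypothetical, and whitespace tokenization and raw n-gram counts are assumptions made only for illustration.

```python
import math
from collections import Counter


def ngram_counts(text, n=1):
    """Count word n-grams in a text (assumes simple whitespace tokenization)."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def euclidean_distance(text_a, text_b, n=1):
    """Euclidean distance between the n-gram count vectors of two texts."""
    a, b = ngram_counts(text_a, n), ngram_counts(text_b, n)
    return math.sqrt(sum((a[g] - b[g]) ** 2 for g in set(a) | set(b)))


def manhattan_features(u, v):
    """Element-wise absolute difference of two equal-length vectors."""
    return [abs(x - y) for x, y in zip(u, v)]


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)
```

In this sketch a smaller Euclidean distance means more similar documents, while the Manhattan features would typically be passed to a downstream classifier rather than summed into a single score. Centroid-based similarity compares the mean embedding of each text with `cosine_similarity`.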

This list is automatically generated from the titles and abstracts of the papers in this site.