A Two-Sample Test of Text Generation Similarity
- URL: http://arxiv.org/abs/2505.05269v1
- Date: Thu, 08 May 2025 14:15:53 GMT
- Title: A Two-Sample Test of Text Generation Similarity
- Authors: Jingbin Xu, Chen Qian, Meimei Liu, Feng Guo,
- Abstract summary: This article proposes a novel two-sample text test for comparing similarity between two groups of documents.<n>The proposed test aims to assess text similarity by comparing the entropy of the documents.<n>Various simulation studies and a real data example demonstrated that the proposed two-sample text test maintains the nominal Type one error rate.
- Score: 11.686503374742495
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The surge in digitized text data requires reliable inferential methods on observed textual patterns. This article proposes a novel two-sample text test for comparing similarity between two groups of documents. The hypothesis is whether the probabilistic mapping generating the textual data is identical across two groups of documents. The proposed test aims to assess text similarity by comparing the entropy of the documents. Entropy is estimated using neural network-based language models. The test statistic is derived from an estimation-and-inference framework, where the entropy is first approximated using an estimation set, followed by inference on the remaining data set. We showed theoretically that under mild conditions, the test statistic asymptotically follows a normal distribution. A multiple data-splitting strategy is proposed to enhance test power, which combines p-values into a unified decision. Various simulation studies and a real data example demonstrated that the proposed two-sample text test maintains the nominal Type one error rate while offering greater power compared to existing methods. The proposed method provides a novel solution to assert differences in document classes, particularly in fields where large-scale textual information is crucial.
Related papers
- Sampling-Based Estimation of Jaccard Containment and Similarity [0.0]
The study introduces a binomial model for predicting the overlap between samples, demonstrating that it is both accurate and practical when sample sizes are small compared to the original sets.<n>The framework is extended to estimate set similarity, and the paper provides guidance for applying these methods in large scale data systems where only partial or sampled data is available.
arXiv Detail & Related papers (2025-07-14T07:56:29Z) - Prototype-based Aleatoric Uncertainty Quantification for Cross-modal
Retrieval [139.21955930418815]
Cross-modal Retrieval methods build similarity relations between vision and language modalities by jointly learning a common representation space.
However, the predictions are often unreliable due to the Aleatoric uncertainty, which is induced by low-quality data, e.g., corrupt images, fast-paced videos, and non-detailed texts.
We propose a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework to provide trustworthy predictions by quantifying the uncertainty arisen from the inherent data ambiguity.
arXiv Detail & Related papers (2023-09-29T09:41:19Z) - Active Sequential Two-Sample Testing [18.99517340397671]
We consider the two-sample testing problem in a new scenario where sample measurements are inexpensive to access.
We devise the first emphactiveNIST-sample testing framework that not only sequentially but also emphactively queries.
In practice, we introduce an instantiation of our framework and evaluate it using several experiments.
arXiv Detail & Related papers (2023-01-30T02:23:49Z) - Testing High-dimensional Multinomials with Applications to Text Analysis [9.952321247299336]
A test statistic is shown to have a standard normal distribution under the null.
The proposed test is shown to achieve this optimal detection boundary across the entire parameter space of interest.
arXiv Detail & Related papers (2023-01-03T22:29:44Z) - MAUVE Scores for Generative Models: Theory and Practice [95.86006777961182]
We present MAUVE, a family of comparison measures between pairs of distributions such as those encountered in the generative modeling of text or images.
We find that MAUVE can quantify the gaps between the distributions of human-written text and those of modern neural language models.
We demonstrate in the vision domain that MAUVE can identify known properties of generated images on par with or better than existing metrics.
arXiv Detail & Related papers (2022-12-30T07:37:40Z) - Textual Entailment Recognition with Semantic Features from Empirical
Text Representation [60.31047947815282]
A text entails a hypothesis if and only if the true value of the hypothesis follows the text.
In this paper, we propose a novel approach to identifying the textual entailment relationship between text and hypothesis.
We employ an element-wise Manhattan distance vector-based feature that can identify the semantic entailment relationship between the text-hypothesis pair.
arXiv Detail & Related papers (2022-10-18T10:03:51Z) - Toward the Understanding of Deep Text Matching Models for Information
Retrieval [72.72380690535766]
This paper aims at testing whether existing deep text matching methods satisfy some fundamental gradients in information retrieval.
Specifically, four attributions are used in our study, i.e., term frequency constraint, term discrimination constraint, length normalization constraints, and TF-length constraint.
Experimental results on LETOR 4.0 and MS Marco show that all the investigated deep text matching methods satisfy the above constraints with high probabilities in statistics.
arXiv Detail & Related papers (2021-08-16T13:33:15Z) - A Statistical Analysis of Summarization Evaluation Metrics using
Resampling Methods [60.04142561088524]
We find that the confidence intervals are rather wide, demonstrating high uncertainty in how reliable automatic metrics truly are.
Although many metrics fail to show statistical improvements over ROUGE, two recent works, QAEval and BERTScore, do in some evaluation settings.
arXiv Detail & Related papers (2021-03-31T18:28:14Z) - Significance tests of feature relevance for a blackbox learner [6.72450543613463]
We derive two consistent tests for the feature relevance of a blackbox learner.
The first evaluates a loss difference with perturbation on an inference sample.
The second splits the inference sample into two but does not require data perturbation.
arXiv Detail & Related papers (2021-03-02T00:59:19Z) - Neural Deepfake Detection with Factual Structure of Text [78.30080218908849]
We propose a graph-based model for deepfake detection of text.
Our approach represents the factual structure of a given document as an entity graph.
Our model can distinguish the difference in the factual structure between machine-generated text and human-written text.
arXiv Detail & Related papers (2020-10-15T02:35:31Z) - Two-Sample Testing on Ranked Preference Data and the Role of Modeling
Assumptions [57.77347280992548]
In this paper, we design two-sample tests for pairwise comparison data and ranking data.
Our test requires essentially no assumptions on the distributions.
By applying our two-sample test on real-world pairwise comparison data, we conclude that ratings and rankings provided by people are indeed distributed differently.
arXiv Detail & Related papers (2020-06-21T20:51:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.