Text Encoders Lack Knowledge: Leveraging Generative LLMs for
Domain-Specific Semantic Textual Similarity
- URL: http://arxiv.org/abs/2309.06541v1
- Date: Tue, 12 Sep 2023 19:32:45 GMT
- Title: Text Encoders Lack Knowledge: Leveraging Generative LLMs for
Domain-Specific Semantic Textual Similarity
- Authors: Joseph Gatto, Omar Sharif, Parker Seegmiller, Philip Bohlman, Sarah
Masud Preum
- Abstract summary: We show that semantic textual similarity (STS) can be cast as a text generation problem while maintaining strong performance on multiple benchmarks.
We show generative LLMs significantly outperform existing encoder-based STS models when characterizing the semantic similarity between two texts.
Our results suggest generative language models with STS-specific prompting strategies achieve state-of-the-art performance in complex, domain-specific STS tasks.
- Score: 2.861144046639872
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Amidst the sharp rise in the evaluation of large language models (LLMs) on
various tasks, we find that semantic textual similarity (STS) has been
under-explored. In this study, we show that STS can be cast as a text
generation problem while maintaining strong performance on multiple STS
benchmarks. Additionally, we show generative LLMs significantly outperform
existing encoder-based STS models when characterizing the semantic similarity
between two texts with complex semantic relationships dependent on world
knowledge. We validate this claim by evaluating both generative LLMs and
existing encoder-based STS models on three newly collected STS challenge sets
which require world knowledge in the domains of Health, Politics, and Sports.
All newly collected data is sourced from social media content posted after May
2023 to ensure the performance of closed-source models like ChatGPT cannot be
credited to memorization. Our results show that, on average, generative LLMs
outperform the best encoder-only baselines by an average of 22.3% on STS tasks
requiring world knowledge. Our results suggest generative language models with
STS-specific prompting strategies achieve state-of-the-art performance in
complex, domain-specific STS tasks.
Related papers
- Integrating Planning into Single-Turn Long-Form Text Generation [66.08871753377055]
We propose to use planning to generate long form content.
Our main novelty lies in a single auxiliary task that does not require multiple rounds of prompting or planning.
Our experiments demonstrate on two datasets from different domains, that LLMs fine-tuned with the auxiliary task generate higher quality documents.
arXiv Detail & Related papers (2024-10-08T17:02:40Z) - Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? [54.667202878390526]
Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases.
We introduce LOFT, a benchmark of real-world tasks requiring context up to millions of tokens designed to evaluate LCLMs' performance on in-context retrieval and reasoning.
Our findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks.
arXiv Detail & Related papers (2024-06-19T00:28:58Z) - Peering into the Mind of Language Models: An Approach for Attribution in Contextual Question Answering [9.86691461253151]
We introduce a novel method for attribution in contextual question answering, leveraging the hidden state representations of large language models (LLMs)
Our approach bypasses the need for extensive model retraining and retrieval model overhead, offering granular attributions and preserving the quality of generated answers.
We present Verifiability-granular, an attribution dataset which has token level annotations for LLM generations in the contextual question answering setup.
arXiv Detail & Related papers (2024-05-28T09:12:44Z) - Exploration of Masked and Causal Language Modelling for Text Generation [6.26998839917804]
This paper conducts an extensive comparison of Causal Language Modelling approaches for text generation tasks.
We first employ quantitative metrics and then perform a qualitative human evaluation to analyse coherence and grammatical correctness.
The results show that consistently outperforms CLM in text generation across all datasets.
arXiv Detail & Related papers (2024-05-21T09:33:31Z) - Enhancing LLM-based Test Generation for Hard-to-Cover Branches via Program Analysis [8.31978033489419]
We propose TELPA, a novel technique to generate tests that can reach hard-to-cover branches.
Our experimental results on 27 open-source Python projects demonstrate that TELPA significantly outperforms the state-of-the-art SBST and LLM-based techniques.
arXiv Detail & Related papers (2024-04-07T14:08:28Z) - SEED-Bench-2: Benchmarking Multimodal Large Language Models [67.28089415198338]
Multimodal large language models (MLLMs) have recently demonstrated exceptional capabilities in generating not only texts but also images given interleaved multimodal inputs.
SEED-Bench-2 comprises 24K multiple-choice questions with accurate human annotations, which spans 27 dimensions.
We evaluate the performance of 23 prominent open-source MLLMs and summarize valuable observations.
arXiv Detail & Related papers (2023-11-28T05:53:55Z) - DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain
Question Answering over Knowledge Base and Text [73.68051228972024]
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge.
Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
arXiv Detail & Related papers (2023-10-31T04:37:57Z) - MuSR: Testing the Limits of Chain-of-thought with Multistep Soft Reasoning [63.80739044622555]
We introduce MuSR, a dataset for evaluating language models on soft reasoning tasks specified in a natural language narrative.
This dataset has two crucial features. First, it is created through a novel neurosymbolic synthetic-to-natural generation algorithm.
Second, our dataset instances are free text narratives corresponding to real-world domains of reasoning.
arXiv Detail & Related papers (2023-10-24T17:59:20Z) - AnglE-optimized Text Embeddings [4.545354973721937]
This paper proposes a novel angle-optimized text embedding model called AnglE.
The core idea of AnglE is to introduce angle optimization in a complex space.
Extensive experiments were conducted on various tasks including short-text STS, long-text STS, and domain-specific STS tasks.
arXiv Detail & Related papers (2023-09-22T13:52:42Z) - Measuring Reliability of Large Language Models through Semantic
Consistency [3.4990427823966828]
We develop a measure of semantic consistency that allows the comparison of open-ended text outputs.
We implement several versions of this consistency metric to evaluate the performance of a number of PLMs on paraphrased versions of questions.
arXiv Detail & Related papers (2022-11-10T20:21:07Z) - KGPT: Knowledge-Grounded Pre-Training for Data-to-Text Generation [100.79870384880333]
We propose a knowledge-grounded pre-training (KGPT) to generate knowledge-enriched text.
We adopt three settings, namely fully-supervised, zero-shot, few-shot to evaluate its effectiveness.
Under zero-shot setting, our model achieves over 30 ROUGE-L on WebNLG while all other baselines fail.
arXiv Detail & Related papers (2020-10-05T19:59:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.