Clustering and Network Analysis for the Embedding Spaces of Sentences and Sub-Sentences
- URL: http://arxiv.org/abs/2110.00697v1
- Date: Sat, 2 Oct 2021 00:47:35 GMT
- Title: Clustering and Network Analysis for the Embedding Spaces of Sentences and Sub-Sentences
- Authors: Yuan An and Alexander Kalinowski and Jane Greenberg
- Abstract summary: This paper reports research on a set of comprehensive clustering and network analyses targeting sentence and sub-sentence embedding spaces.
Results show that one method generates the most clusterable embeddings.
In general, the embeddings of span sub-sentences have better clustering properties than the original sentences.
- Score: 69.3939291118954
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sentence embedding methods offer a powerful approach for working with short
textual constructs or sequences of words. By representing sentences as dense
numerical vectors, many natural language processing (NLP) applications have
improved their performance. However, relatively little is understood about the
latent structure of sentence embeddings. Specifically, research has not
addressed whether the length and structure of sentences impact the sentence
embedding space and topology. This paper reports research on a set of
comprehensive clustering and network analyses targeting sentence and
sub-sentence embedding spaces. Results show that one method generates the most
clusterable embeddings. In general, the embeddings of span sub-sentences have
better clustering properties than the original sentences. The results have
implications for future sentence embedding models and applications.
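
As a rough illustration of the kind of clusterability analysis the abstract describes, the sketch below embeds a handful of sentences and naively extracted sub-sentence spans, clusters each set with k-means, and compares silhouette scores. It assumes a sentence-transformers encoder; the model name (all-MiniLM-L6-v2), the comma-based span splitting, and silhouette score as the clusterability proxy are illustrative choices, not the paper's actual setup.

```python
# Minimal sketch: compare the clusterability of sentence vs. sub-sentence embeddings.
# The encoder, the span heuristic, and the metric are assumptions for illustration only.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

sentences = [
    "Sentence embeddings represent short texts as dense vectors, which helps many NLP tasks.",
    "Clustering analysis can reveal latent structure in an embedding space.",
    "Sub-sentence spans, such as clauses or phrases, may form tighter clusters than full sentences.",
    "Network analysis of nearest-neighbor graphs offers another view of embedding topology.",
]
# Naive sub-sentence spans: split on commas (a stand-in for proper span extraction).
spans = [chunk.strip() for s in sentences for chunk in s.rstrip(".").split(",") if chunk.strip()]

model = SentenceTransformer("all-MiniLM-L6-v2")  # hypothetical choice of encoder

def clusterability(texts, k=2):
    """Embed the texts, cluster with k-means, and return the silhouette score as a rough proxy."""
    embeddings = model.encode(texts)
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(embeddings)
    return silhouette_score(embeddings, labels)

print("full sentences     :", clusterability(sentences))
print("sub-sentence spans :", clusterability(spans))
```

Under this setup, a higher silhouette score stands in for a more clusterable space; the paper's finding that span sub-sentences cluster better than full sentences would appear as a higher score for the spans list.
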
Related papers
- Span-Aggregatable, Contextualized Word Embeddings for Effective Phrase Mining [0.22499166814992438]
We show that when target phrases reside inside noisy context, representing the full sentence with a single dense vector is not sufficient for effective phrase retrieval.
Aggregating contextualized word embeddings over spans is much more effective for phrase mining, yet obtaining useful span representations requires considerable compute (a generic span-pooling sketch follows this list).
arXiv Detail & Related papers (2024-05-12T12:08:05Z)
- Hierarchical Indexing for Retrieval-Augmented Opinion Summarization [60.5923941324953]
We propose a method for unsupervised abstractive opinion summarization that combines the attributability and scalability of extractive approaches with the coherence and fluency of Large Language Models (LLMs).
Our method, HIRO, learns an index structure that maps sentences to a path through a semantically organized discrete hierarchy.
At inference time, we populate the index and use it to identify and retrieve clusters of sentences containing popular opinions from input reviews.
arXiv Detail & Related papers (2024-03-01T10:38:07Z)
- Bridging Continuous and Discrete Spaces: Interpretable Sentence Representation Learning via Compositional Operations [80.45474362071236]
It is unclear whether the compositional semantics of sentences can be directly reflected as compositional operations in the embedding space.
We propose InterSent, an end-to-end framework for learning interpretable sentence embeddings.
arXiv Detail & Related papers (2023-05-24T00:44:49Z)
- Relational Sentence Embedding for Flexible Semantic Matching [86.21393054423355]
We present Relational Sentence Embedding (RSE), a new paradigm for further exploring the potential of sentence embeddings.
RSE is effective and flexible in modeling sentence relations and outperforms a series of state-of-the-art embedding methods.
arXiv Detail & Related papers (2022-12-17T05:25:17Z)
- Extending Multi-Sense Word Embedding to Phrases and Sentences for Unsupervised Semantic Applications [34.71597411512625]
We propose a novel embedding method for a text sequence (a phrase or a sentence) where each sequence is represented by a distinct set of codebook embeddings.
Our experiments show that the per-sentence codebook embeddings significantly improve the performances in unsupervised sentence similarity and extractive summarization benchmarks.
arXiv Detail & Related papers (2021-03-29T04:54:28Z)
- Narrative Incoherence Detection [76.43894977558811]
We propose the task of narrative incoherence detection as a new arena for inter-sentential semantic understanding: given a multi-sentence narrative, decide whether there are any semantic discrepancies in the narrative flow.
arXiv Detail & Related papers (2020-12-21T07:18:08Z)
- A Comparative Study on Structural and Semantic Properties of Sentence Embeddings [77.34726150561087]
We propose a set of experiments using a widely-used large-scale data set for relation extraction.
We show that different embedding spaces exhibit different strengths with respect to structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z)
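
For the "Span-Aggregatable, Contextualized Word Embeddings for Effective Phrase Mining" entry above, the following is a generic illustration (not that paper's specific method) of how a dense span representation can be built by mean-pooling contextualized token embeddings over the target phrase; the bert-base-uncased backbone and the helper name span_embedding are assumed for the sketch.

```python
# Generic span-pooling sketch: mean-pool contextualized token vectors over a phrase.
# Backbone and pooling choice are illustrative assumptions, not the cited paper's method.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def span_embedding(sentence: str, phrase: str) -> torch.Tensor:
    """Mean-pool the contextualized token vectors whose character offsets fall inside `phrase`."""
    start = sentence.index(phrase)
    end = start + len(phrase)
    enc = tokenizer(sentence, return_tensors="pt", return_offsets_mapping=True)
    offsets = enc.pop("offset_mapping")[0]          # (num_tokens, 2) character offsets
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # (num_tokens, hidden_dim)
    # Keep tokens fully inside the phrase; special tokens have (0, 0) offsets and are excluded.
    mask = (offsets[:, 0] >= start) & (offsets[:, 1] <= end) & (offsets[:, 1] > offsets[:, 0])
    return hidden[mask].mean(dim=0)

vec = span_embedding("The quick brown fox jumps over the lazy dog.", "quick brown fox")
print(vec.shape)  # torch.Size([768])
```
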
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.