Embedding Style Beyond Topics: Analyzing Dispersion Effects Across Different Language Models
- URL: http://arxiv.org/abs/2501.00828v1
- Date: Wed, 01 Jan 2025 13:17:16 GMT
- Title: Embedding Style Beyond Topics: Analyzing Dispersion Effects Across Different Language Models
- Authors: Benjamin Icard, Evangelia Zve, Lila Sainero, Alice Breton, Jean-Gabriel Ganascia
- Abstract summary: This study examines the role of writing style in shaping embedding spaces.
Using a literary corpus that alternates between topics and styles, we compare the sensitivity of language models across French and English.
- Score: 0.0699049312989311
- License:
- Abstract: This paper analyzes how writing style affects the dispersion of embedding vectors across multiple, state-of-the-art language models. While early transformer models primarily aligned with topic modeling, this study examines the role of writing style in shaping embedding spaces. Using a literary corpus that alternates between topics and styles, we compare the sensitivity of language models across French and English. By analyzing the particular impact of style on embedding dispersion, we aim to better understand how language models process stylistic information, contributing to their overall interpretability.
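The paper's central quantity, the dispersion of embedding vectors, can be made concrete with a small computation. The sketch below is not the authors' code: the sentence-transformers model name, the example sentences, and the choice of mean cosine distance to the group centroid as the dispersion measure are all illustrative assumptions. It embeds a topic-varied group and a style-varied group of sentences and compares their dispersion.

```python
# Minimal sketch: dispersion of sentence embeddings per group.
# Assumptions: sentence-transformers is installed; the model name, example
# sentences, and dispersion measure are illustrative, not the paper's setup.
import numpy as np
from sentence_transformers import SentenceTransformer

def dispersion(vectors: np.ndarray) -> float:
    """Mean cosine distance of each embedding to the group centroid."""
    centroid = vectors.mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    normed = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return float(np.mean(1.0 - normed @ centroid))

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # illustrative model

topic_varied = [  # same register, different topics
    "The storm rolled over the harbour at dusk.",
    "The violinist tuned her instrument backstage.",
    "The committee postponed its vote until spring.",
]
style_varied = [  # same topic, different styles
    "The storm rolled over the harbour at dusk.",
    "Yo, that storm straight-up slammed the docks at sundown.",
    "At eventide, a tempest did descend upon the haven.",
]

for name, sentences in [("topic-varied", topic_varied), ("style-varied", style_varied)]:
    emb = np.asarray(model.encode(sentences))  # shape: (n_sentences, dim)
    print(f"{name} dispersion: {dispersion(emb):.4f}")
```

Comparing such per-group dispersion values across different encoders, and across French and English versions of the same texts, is one way to probe how strongly style, as opposed to topic, spreads the vectors apart.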
Related papers
- Style Extraction on Text Embeddings Using VAE and Parallel Dataset [1.8067835669244101]
The study aims to detect and analyze stylistic variations between translations using a Variational Autoencoder (VAE) model.
The results demonstrate that each translation exhibits a unique stylistic distribution, which can be effectively identified using the VAE model.
The study highlights the model's potential for broader applications in AI-based text generation and stylistic analysis.
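As a rough illustration of that approach (an assumption-laden sketch, not the cited paper's model), a small variational autoencoder can be trained on pre-computed sentence embeddings so that its latent space can be inspected for per-translation stylistic structure:

```python
# Toy VAE over pre-computed sentence embeddings; dimensions, architecture, and
# the random stand-in data are illustrative assumptions, not the paper's model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingVAE(nn.Module):
    def __init__(self, dim: int = 384, latent: int = 16):
        super().__init__()
        self.enc = nn.Linear(dim, 128)
        self.mu = nn.Linear(128, latent)
        self.logvar = nn.Linear(128, latent)
        self.dec = nn.Sequential(nn.Linear(latent, 128), nn.ReLU(), nn.Linear(128, dim))

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterisation trick
        return self.dec(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    rec = F.mse_loss(recon, x)
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + 0.1 * kld  # KL weight chosen arbitrarily for the sketch

x = torch.randn(256, 384)  # stand-in for embeddings of sentences from several translations
model = EmbeddingVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(50):
    opt.zero_grad()
    recon, mu, logvar = model(x)
    vae_loss(recon, x, mu, logvar).backward()
    opt.step()
# The encoder means (mu) for each translation's sentences can then be compared
# to look for translation-specific stylistic distributions in the latent space.
```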
arXiv Detail & Related papers (2025-02-12T00:24:28Z) - Human-Object Interaction Detection Collaborated with Large Relation-driven Diffusion Models [65.82564074712836]
We introduce DIFfusionHOI, a new HOI detector shedding light on text-to-image diffusion models.
We first devise an inversion-based strategy to learn the expression of relation patterns between humans and objects in embedding space.
These learned relation embeddings then serve as textual prompts to steer diffusion models to generate images that depict specific interactions.
arXiv Detail & Related papers (2024-10-26T12:00:33Z) - Examining Language Modeling Assumptions Using an Annotated Literary Dialect Corpus [0.0]
We present a dataset of 19th century American literary orthovariant tokens with a novel layer of human-annotated dialect group tags.
We find indications that the "dialect effect" produced by intentional orthographic variation employs multiple linguistic channels.
arXiv Detail & Related papers (2024-10-03T16:58:21Z) - Understanding Cross-Lingual Alignment -- A Survey [52.572071017877704]
Cross-lingual alignment is the meaningful similarity of representations across languages in multilingual language models.
We survey the literature of techniques to improve cross-lingual alignment, providing a taxonomy of methods and summarising insights from throughout the field.
arXiv Detail & Related papers (2024-04-09T11:39:53Z) - Distinguishing Fictional Voices: a Study of Authorship Verification Models for Quotation Attribution [12.300285585201767]
We explore stylistic representations of characters built by encoding their quotes with off-the-shelf pretrained Authorship Verification models.
Results suggest that the combination of stylistic and topical information captured in some of these models accurately distinguishes characters from one another, but does not necessarily improve over semantic-only models when attributing quotes.
arXiv Detail & Related papers (2024-01-30T12:49:40Z) - The Scenario Refiner: Grounding subjects in images at the morphological level [2.401993998791928]
We ask whether Vision and Language (V&L) models capture such distinctions at the morphological level.
We compare the results from V&L models to human judgements and find that models' predictions differ from those of human participants.
arXiv Detail & Related papers (2023-09-20T12:23:06Z) - Interpreting Language Models with Contrastive Explanations [99.7035899290924]
Language models must consider various features to predict a token, such as its part of speech, number, tense, or semantics.
Existing explanation methods conflate evidence for all these features into a single explanation, which is less interpretable for human understanding.
We show that contrastive explanations are quantifiably better than non-contrastive explanations in verifying major grammatical phenomena.
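To make the contrastive idea concrete, here is a minimal gradient-times-input sketch that attributes the evidence for predicting one token rather than a specific foil back to the input tokens. The model, example sentence, and token pair are assumptions, and this shows the general contrastive-saliency recipe rather than the paper's exact method.

```python
# Contrastive saliency sketch: why does the model prefer " are" over " is" here?
# Model choice, sentence, and token pair are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

ids = tok("The keys to the cabinet", return_tensors="pt").input_ids
target = tok(" are", add_special_tokens=False).input_ids[0]
foil = tok(" is", add_special_tokens=False).input_ids[0]

embeds = model.get_input_embeddings()(ids).detach().requires_grad_(True)
logits = model(inputs_embeds=embeds).logits[0, -1]  # next-token logits

# Contrastive evidence: logit(target) - logit(foil), attributed via gradient x input.
(logits[target] - logits[foil]).backward()
saliency = (embeds.grad[0] * embeds[0]).sum(-1).detach()

for token, score in zip(tok.convert_ids_to_tokens(ids[0].tolist()), saliency.tolist()):
    print(f"{token:>12s}  {score:+.4f}")
```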
arXiv Detail & Related papers (2022-02-21T18:32:24Z) - Sentiment analysis in tweets: an assessment study from classical to modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as an informal and noisy linguistic style, remain challenging for many natural language processing (NLP) tasks.
This study assesses existing language models on distinguishing the sentiment expressed in tweets, using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z) - Probing Contextual Language Models for Common Ground with Visual Representations [76.05769268286038]
We design a probing model that evaluates how effective text-only representations are in distinguishing between matching and non-matching visual representations.
Our findings show that language representations alone provide a strong signal for retrieving image patches from the correct object categories.
Visually grounded language models slightly outperform text-only language models in instance retrieval, but greatly under-perform humans.
arXiv Detail & Related papers (2020-05-01T21:28:28Z) - Evaluating Transformer-Based Multilingual Text Classification [55.53547556060537]
We argue that NLP tools perform unequally across languages with different syntactic and morphological structures.
We calculate word order and morphological similarity indices to aid our empirical study.
arXiv Detail & Related papers (2020-04-29T03:34:53Z) - Sentiment Analysis with Contextual Embeddings and Self-Attention [3.0079490585515343]
In natural language the intended meaning of a word or phrase is often implicit and depends on the context.
We propose a simple yet effective method for sentiment analysis using contextual embeddings and a self-attention mechanism.
The experimental results for three languages, including morphologically rich Polish and German, show that our model is comparable to or even outperforms state-of-the-art models.
arXiv Detail & Related papers (2020-03-12T02:19:51Z)
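As a rough sketch of the kind of architecture described in the last entry above, contextual token embeddings can be pooled with a learned attention layer before a small sentiment classifier. The dimensions, class count, and stand-in inputs are assumptions for illustration, not the paper's exact model.

```python
# Illustrative attention pooling over contextual embeddings for sentiment classification.
# Dimensions, class count, and the random stand-in embeddings are assumptions.
import torch
import torch.nn as nn

class AttentionPoolingClassifier(nn.Module):
    def __init__(self, dim: int = 768, n_classes: int = 3):
        super().__init__()
        self.attn = nn.Linear(dim, 1)          # scores each token embedding
        self.out = nn.Linear(dim, n_classes)   # sentiment classes

    def forward(self, hidden, mask):
        # hidden: (batch, seq, dim) contextual embeddings; mask: (batch, seq), 1 for real tokens
        scores = self.attn(hidden).squeeze(-1)
        scores = scores.masked_fill(mask == 0, float("-inf"))
        weights = torch.softmax(scores, dim=-1)               # attention over tokens
        pooled = torch.einsum("bs,bsd->bd", weights, hidden)  # weighted sum of embeddings
        return self.out(pooled)

# Toy usage with random stand-ins for encoder outputs (e.g. from a multilingual transformer).
hidden = torch.randn(2, 10, 768)
mask = torch.ones(2, 10)
print(AttentionPoolingClassifier()(hidden, mask).shape)  # torch.Size([2, 3])
```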
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.