Text and author-level political inference using heterogeneous knowledge representations
- URL: http://arxiv.org/abs/2206.12293v1
- Date: Fri, 24 Jun 2022 13:45:36 GMT
- Title: Text and author-level political inference using heterogeneous knowledge representations
- Authors: Samuel Caetano da Silva and Ivandre Paraboni
- Abstract summary: The inference of politically-charged information from text data is a popular research topic in Natural Language Processing (NLP).
The present work describes a series of experiments to compare alternative model configurations for political inference from text in both English and Portuguese languages.
Results suggest certain text representations may outperform the alternatives across multiple experimental settings.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The inference of politically-charged information from text data is a popular
research topic in Natural Language Processing (NLP) at both text- and
author-level. In recent years, studies of this kind have been implemented with
the aid of representations from transformers such as BERT. Despite considerable
success, however, we may ask whether results may be improved even further by
combining transformer-based models with additional knowledge representations.
To shed light on this issue, the present work describes a series of experiments
to compare alternative model configurations for political inference from text
in both English and Portuguese languages. Results suggest that certain text
representations - in particular, the combined use of BERT pre-trained language
models with a syntactic dependency model - may outperform the alternatives
across multiple experimental settings, making a potentially strong case for
further research in the use of heterogeneous text representations in these and
possibly other NLP tasks.
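The abstract does not specify how the BERT and syntactic-dependency representations are fused. A minimal sketch of one common approach, assuming simple feature concatenation of a dense transformer embedding with a vector of dependency-relation counts (all names, dimensions, and relation indices below are illustrative, not taken from the paper):

```python
import numpy as np

def combine_representations(bert_vec: np.ndarray, dep_vec: np.ndarray) -> np.ndarray:
    """Concatenate a dense transformer embedding with a syntactic-dependency
    feature vector, producing a single heterogeneous input for a classifier."""
    return np.concatenate([bert_vec, dep_vec])

# Hypothetical shapes: a 768-d BERT sentence embedding and a 45-d vector of
# dependency-relation counts (one slot per relation label, e.g. nsubj, obj).
rng = np.random.default_rng(0)
bert_vec = rng.random(768)
dep_vec = np.zeros(45)
dep_vec[[3, 7, 12]] = [2, 1, 4]  # counts for three example relation labels

combined = combine_representations(bert_vec, dep_vec)
print(combined.shape)  # (813,)
```

The combined vector would then be fed to any downstream classifier; concatenation is only one of several fusion strategies (others include late fusion or joint fine-tuning), and the paper's experiments compare such configurations.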
Related papers
- Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI make it possible to mitigate the limitations of Transformer models by leveraging improved explanations.
We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models.
Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
arXiv Detail & Related papers (2024-05-10T17:11:31Z)
- Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval [109.62363167257664]
We propose a generative model for learning multilingual text embeddings.
Our model operates on parallel data in $N$ languages.
We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval.
arXiv Detail & Related papers (2022-12-21T02:41:40Z)
- An Inclusive Notion of Text [69.36678873492373]
We argue that clarity on the notion of text is crucial for reproducible and generalizable NLP.
We introduce a two-tier taxonomy of linguistic and non-linguistic elements that are available in textual sources and can be used in NLP modeling.
arXiv Detail & Related papers (2022-11-10T14:26:43Z)
- Optimizing text representations to capture (dis)similarity between political parties [1.2891210250935146]
We look at the problem of modeling pairwise similarities between political parties.
Our research question is what level of structural information is necessary to create robust text representations.
We evaluate our models on the manifestos of German parties for the 2021 federal election.
arXiv Detail & Related papers (2022-10-21T14:24:57Z)
- Syntax-informed Question Answering with Heterogeneous Graph Transformer [2.139714421848487]
We present a linguistics-informed question answering approach that extends and fine-tunes a pre-trained neural language model.
We illustrate the approach by the addition of syntactic information in the form of dependency and constituency graph structures connecting tokens and virtual tokens.
arXiv Detail & Related papers (2022-04-01T07:48:03Z)
- A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes.
We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z)
- Ground-Truth, Whose Truth? -- Examining the Challenges with Annotating Toxic Text Datasets [26.486492641924226]
This study examines selected toxic text datasets with the goal of shedding light on some of the inherent issues.
We re-annotate samples from three toxic text datasets and find that a multi-label approach to annotating toxic text samples can help to improve dataset quality.
arXiv Detail & Related papers (2021-12-07T06:58:22Z)
- Intrinsic Probing through Dimension Selection [69.52439198455438]
Most modern NLP systems make use of pre-trained contextual representations that attain astonishingly high performance on a variety of tasks.
Such high performance should not be possible unless some form of linguistic structure inheres in these representations, and a wealth of research has sprung up on probing for it.
In this paper, we draw a distinction between intrinsic probing, which examines how linguistic information is structured within a representation, and the extrinsic probing popular in prior work, which only argues for the presence of such information by showing that it can be successfully extracted.
arXiv Detail & Related papers (2020-10-06T15:21:08Z)
- Contextualized Spoken Word Representations from Convolutional Autoencoders [2.28438857884398]
This paper proposes a Convolutional Autoencoder based neural architecture to model syntactically and semantically adequate contextualized representations of varying length spoken words.
The proposed model demonstrated its robustness when compared to two other language-based models.
arXiv Detail & Related papers (2020-07-06T16:48:11Z)
- Coreferential Reasoning Learning for Language Representation [88.14248323659267]
We present CorefBERT, a novel language representation model that can capture the coreferential relations in context.
The experimental results show that, compared with existing baseline models, CorefBERT can achieve significant improvements consistently on various downstream NLP tasks.
arXiv Detail & Related papers (2020-04-15T03:57:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.