Characterizing Latent Perspectives of Media Houses Towards Public
Figures
- URL: http://arxiv.org/abs/2309.06112v1
- Date: Tue, 12 Sep 2023 10:27:39 GMT
- Authors: Sharath Srivatsa, Srinath Srinivasa
- Abstract summary: This work proposes a zero-shot approach for non-extractive or generative characterizations of person entities from a corpus using GPT-2.
We use well-articulated articles from several well-known news media houses as a corpus to build a sound argument for this approach.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Media houses reporting on public figures often come with their own biases
stemming from their respective worldviews. A characterization of these
underlying patterns helps us better understand and interpret news
stories. For this, we need diverse or subjective summarizations, which may not
be amenable to classification into predefined class labels. This work proposes a
zero-shot approach for non-extractive or generative characterizations of person
entities from a corpus using GPT-2. We use well-articulated articles from
several well-known news media houses as a corpus to build a sound argument for
this approach. First, we fine-tune a GPT-2 pre-trained language model with a
corpus where specific person entities are characterized. Second, we further
fine-tune this with demonstrations of person entity characterizations, created
from a corpus of programmatically constructed characterizations. This twice
fine-tuned model is primed with manual prompts consisting of entity names that
were not previously encountered in the second fine-tuning, to generate a simple
sentence about the entity. The results were encouraging when compared against
actual characterizations from the corpus.
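The second fine-tuning stage relies on demonstrations that pair an entity name with its characterization, and inference primes the model with only the name. The sketch below illustrates one plausible way such prompt/completion pairs could be constructed; the helper names, templates, and example entities are illustrative assumptions, not the authors' exact format.

```python
# Hypothetical sketch of demonstration construction for the second
# fine-tuning stage described in the abstract. Templates and names
# are assumptions for illustration only.

def build_demonstration(entity: str, characterization: str) -> str:
    """Pair an entity-name prompt with its target characterization,
    in a simple prompt->completion format for causal-LM fine-tuning."""
    return f"{entity} is {characterization}."

def build_inference_prompt(entity: str) -> str:
    """At inference time, the twice fine-tuned model is primed with
    only the entity name; generation completes the sentence."""
    return f"{entity} is"

# Programmatically constructed demonstrations (fictional entities).
demos = [
    build_demonstration("Alice Example", "a reform-minded legislator"),
    build_demonstration("Bob Example", "a polarizing business magnate"),
]
training_text = "\n".join(demos)
print(training_text)
# An entity unseen during the second fine-tuning, as in the paper's setup:
print(build_inference_prompt("Carol Example"))
```

In practice the demonstrations would be concatenated into a training corpus for a causal language model such as GPT-2, and generation from the bare-name prompt would be scored against characterizations from the held-out corpus.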
Related papers
- PART: Pre-trained Authorship Representation Transformer [64.78260098263489]
Authors writing documents imprint identifying information within their texts: vocabulary, registry, punctuation, misspellings, or even emoji usage.
Previous works use hand-crafted features or classification tasks to train their authorship models, leading to poor performance on out-of-domain authors.
We propose a contrastively trained model designed to learn authorship embeddings instead of semantics.
arXiv Detail & Related papers (2022-09-30T11:08:39Z) - Reweighting Strategy based on Synthetic Data Identification for Sentence
Similarity [30.647497555295974]
We train a classifier that identifies machine-written sentences, and observe that the linguistic features of the sentences identified as written by a machine are significantly different from those of human-written sentences.
The distilled information from the classifier is then used to train a reliable sentence embedding model.
Our model trained on synthetic data generalizes well and outperforms the existing baselines.
arXiv Detail & Related papers (2022-08-29T05:42:22Z) - Zero-shot Entity and Tweet Characterization with Designed Conditional
Prompts and Contexts [6.38674533060275]
We evaluate the zero-shot language model capabilities of Generative Pretrained Transformer 2 (GPT-2) to characterize entities and Tweets subjectively.
We fine-tune GPT-2 with a Tweets corpus from a few popular hashtags and evaluate characterizing tweets by priming the language model with prefixes, questions, and contextual synopsis prompts.
arXiv Detail & Related papers (2022-04-18T17:01:49Z) - Explaining Latent Representations with a Corpus of Examples [72.50996504722293]
We propose SimplEx: a user-centred method that provides example-based explanations with reference to a freely selected set of examples.
SimplEx uses the corpus to improve the user's understanding of the latent space with post-hoc explanations.
We show that SimplEx empowers the user by highlighting relevant patterns in the corpus that explain model representations.
arXiv Detail & Related papers (2021-10-28T17:59:06Z) - On The Ingredients of an Effective Zero-shot Semantic Parser [95.01623036661468]
We analyze zero-shot learning by paraphrasing training examples of canonical utterances and programs from a grammar.
We propose bridging these gaps using improved grammars, stronger paraphrasers, and efficient learning methods.
Our model achieves strong performance on two semantic parsing benchmarks (Scholar, Geo) with zero labeled data.
arXiv Detail & Related papers (2021-10-15T21:41:16Z) - Cross-linguistically Consistent Semantic and Syntactic Annotation of Child-directed Speech [27.657676278734534]
This paper proposes a methodology for constructing such corpora of child-directed speech paired with sentential logical forms.
The approach enforces a cross-linguistically consistent representation, building on recent advances in dependency representation and semantic parsing.
arXiv Detail & Related papers (2021-09-22T18:17:06Z) - Sentiment analysis in tweets: an assessment study from classical to
modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as their informal and noisy linguistic style, remain challenging for many natural language processing (NLP) tasks.
This study presents an assessment of existing language models in distinguishing the sentiment expressed in tweets, using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z) - Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z) - Weakly-Supervised Aspect-Based Sentiment Analysis via Joint
Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn <sentiment, aspect> joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z) - Zero-shot topic generation [10.609815608017065]
We present an approach to generating topics using a model trained only for document title generation.
We leverage features that capture the relevance of a candidate span in a document for the generation of a title for that document.
The output is a weighted collection of the phrases that are most relevant for describing the document and distinguishing it within a corpus.
arXiv Detail & Related papers (2020-04-29T04:39:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.