"You should probably read this": Hedge Detection in Text
- URL: http://arxiv.org/abs/2405.13319v1
- Date: Wed, 22 May 2024 03:25:35 GMT
- Title: "You should probably read this": Hedge Detection in Text
- Authors: Denys Katerenchuk, Rivka Levitan
- Abstract summary: Humans express ideas, beliefs, and statements through language.
In this work, we apply a joint model that leverages words and part-of-speech tags to improve hedge detection in text and achieve a new top score on the CoNLL-2010 Wikipedia corpus.
- Score: 8.890331069484203
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Humans express ideas, beliefs, and statements through language. The manner of expression can carry information indicating the author's degree of confidence in their statement. Understanding the certainty level of a claim is crucial in areas such as medicine, finance, engineering, and many others where errors can lead to disastrous results. In this work, we apply a joint model that leverages words and part-of-speech tags to improve hedge detection in text and achieve a new top score on the CoNLL-2010 Wikipedia corpus.
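The abstract describes a joint model over words and part-of-speech tags. As a rough illustration of the simpler lexicon-based baseline such systems are typically compared against, the sketch below flags hedge cues with a small hand-picked word list; the lexicon contents and function names are hypothetical examples, not the paper's model.

```python
# Illustrative rule-based hedge cue detector (a baseline sketch,
# NOT the joint word/POS model from the paper). The lexicon below
# is a small hand-picked example, not an exhaustive resource.

HEDGE_LEXICON = {
    "probably", "possibly", "perhaps", "might", "may", "could",
    "suggest", "suggests", "appear", "appears", "seem", "seems",
    "likely", "unlikely", "approximately", "somewhat",
}

def detect_hedges(sentence: str) -> list:
    """Return the hedge cue words found in the sentence (case-insensitive)."""
    cleaned = sentence.lower().replace(",", " ").replace(".", " ")
    return [tok for tok in cleaned.split() if tok in HEDGE_LEXICON]

def is_uncertain(sentence: str) -> bool:
    """Flag a sentence as uncertain if it contains at least one hedge cue."""
    return len(detect_hedges(sentence)) > 0
```

For example, `is_uncertain("You should probably read this")` returns `True` because "probably" is a hedge cue. A real system must also disambiguate cues in context (e.g. epistemic "may" vs. permission "may"), which is where sequence models over words and POS tags improve on pure lexicon matching.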
Related papers
- Can Large Language Models (or Humans) Disentangle Text? [6.858838842613459]
We investigate the potential of large language models (LLMs) to disentangle text variables.
We employ a range of various LLM approaches in an attempt to disentangle text by identifying and removing information about a target variable.
We show that in the strong test of removing sentiment, the statistical association between the processed text and sentiment is still detectable to machine learning classifiers.
arXiv Detail & Related papers (2024-03-25T09:51:54Z)
- Document Author Classification Using Parsed Language Structure [0.0]
We explore a new possibility for detecting authorship using grammatical structure extracted by a statistical natural language parser.
This paper provides a proof of concept, testing author classification based on grammatical structure on a set of "proof texts".
Several features extracted by the statistical natural language parser were explored: all subtrees of some depth from any level; rooted subtrees of some depth; part of speech; and part of speech by level in the parse tree.
arXiv Detail & Related papers (2024-03-20T02:32:24Z)
- Efficiently Leveraging Linguistic Priors for Scene Text Spotting [63.22351047545888]
This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models.
We generate text distributions that align well with scene text datasets, removing the need for in-domain fine-tuning.
Experimental results show that our method not only improves recognition accuracy but also enables more accurate localization of words.
arXiv Detail & Related papers (2024-02-27T01:57:09Z)
- Constructing Vec-tionaries to Extract Message Features from Texts: A Case Study of Moral Appeals [5.336592570916432]
We present an approach to construct vec-tionary measurement tools that boost validated dictionaries with word embeddings.
A vec-tionary can produce additional metrics to capture the ambivalence of a message feature beyond its strength in texts.
arXiv Detail & Related papers (2023-12-10T20:37:29Z)
- Towards Unsupervised Recognition of Token-level Semantic Differences in Related Documents [61.63208012250885]
We formulate recognizing semantic differences as a token-level regression task.
We study three unsupervised approaches that rely on a masked language model.
Our results show that an approach based on word alignment and sentence-level contrastive learning has a robust correlation to gold labels.
arXiv Detail & Related papers (2023-05-22T17:58:04Z)
- Contextual information integration for stance detection via cross-attention [59.662413798388485]
Stance detection deals with identifying an author's stance towards a target.
Most existing stance detection models are limited because they do not consider relevant contextual information.
We propose an approach to integrate contextual information as text.
arXiv Detail & Related papers (2022-11-03T15:04:29Z)
- Accuracy of the Uzbek stop words detection: a case study on "School corpus" [0.0]
We present a method to evaluate the quality of stop word lists produced by automatic creation techniques.
The method was tested on an automatically-generated list of stop words for the Uzbek language.
arXiv Detail & Related papers (2022-09-15T05:14:31Z)
- Bridging the Modality Gap for Speech-to-Text Translation [57.47099674461832]
End-to-end speech translation aims to translate speech in one language into text in another language in an end-to-end manner.
Most existing methods employ an encoder-decoder structure with a single encoder to learn acoustic representation and semantic information simultaneously.
We propose a Speech-to-Text Adaptation for Speech Translation model which aims to improve the end-to-end model performance by bridging the modality gap between speech and text.
arXiv Detail & Related papers (2020-10-28T12:33:04Z)
- On the Robustness of Language Encoders against Grammatical Errors [66.05648604987479]
We collect real grammatical errors from non-native speakers and conduct adversarial attacks to simulate these errors on clean text data.
Results confirm that the performance of all tested models is affected, but the degree of impact varies.
arXiv Detail & Related papers (2020-05-12T11:01:44Z)
- On the Importance of Word Order Information in Cross-lingual Sequence Labeling [80.65425412067464]
Cross-lingual models that fit the word order of the source language might fail to handle target languages.
We investigate whether making models insensitive to the word order of the source language can improve the adaptation performance in target languages.
arXiv Detail & Related papers (2020-01-30T03:35:44Z)
- Humpty Dumpty: Controlling Word Meanings via Corpus Poisoning [29.181547214915238]
We show that an attacker can control the "meaning" of new and existing words by changing their locations in the embedding space.
An attack on the embedding can affect diverse downstream tasks, demonstrating for the first time the power of data poisoning in transfer learning scenarios.
arXiv Detail & Related papers (2020-01-14T17:48:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.