Introducing Rhetorical Parallelism Detection: A New Task with Datasets,
Metrics, and Baselines
- URL: http://arxiv.org/abs/2312.00100v1
- Date: Thu, 30 Nov 2023 15:24:57 GMT
- Title: Introducing Rhetorical Parallelism Detection: A New Task with Datasets,
Metrics, and Baselines
- Authors: Stephen Bothwell, Justin DeBenedetto, Theresa Crnkovich, Hildegund Müller, David Chiang
- Abstract summary: Parallelism is the juxtaposition of phrases which have the same sequence of linguistic features.
Despite the ubiquity of parallelism, the field of natural language processing has seldom investigated it.
We construct a formal definition of it; we provide one new Latin dataset and one adapted Chinese dataset for it; we establish a family of metrics to evaluate performance on it.
- Score: 8.405938712823565
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Rhetoric, both spoken and written, involves not only content but also style.
One common stylistic tool is $\textit{parallelism}$: the juxtaposition of
phrases which have the same sequence of linguistic ($\textit{e.g.}$,
phonological, syntactic, semantic) features. Despite the ubiquity of
parallelism, the field of natural language processing has seldom investigated
it, missing a chance to better understand the nature of the structure, meaning,
and intent that humans convey. To address this, we introduce the task of
$\textit{rhetorical parallelism detection}$. We construct a formal definition
of it; we provide one new Latin dataset and one adapted Chinese dataset for it;
we establish a family of metrics to evaluate performance on it; and, lastly, we
create baseline systems and novel sequence labeling schemes to capture it. On
our strictest metric, we attain $F_{1}$ scores of $0.40$ and $0.43$ on our
Latin and Chinese datasets, respectively.
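To make the evaluation setup concrete, the following is a minimal sketch of how a strict, exact-boundary span-level $F_{1}$ could be computed for parallelism detection when branches are encoded with a BIO-style sequence labeling scheme. The label names (B-PAR/I-PAR/O) and the exact-match criterion are assumptions for illustration; this is not the authors' implementation, and the paper's actual labeling schemes and metric family may differ.

```python
# Minimal sketch (assumption-based, not the paper's code): strict span-level F1
# for parallelism detection with BIO-style token labels (B-PAR / I-PAR / O).
from typing import List, Set, Tuple


def spans_from_bio(labels: List[str]) -> Set[Tuple[int, int]]:
    """Extract (start, end) spans (end exclusive) from BIO labels."""
    spans, start = set(), None
    for i, lab in enumerate(labels):
        if lab.startswith("B"):
            if start is not None:
                spans.add((start, i))
            start = i
        elif lab.startswith("I"):
            if start is None:          # stray I without a preceding B: open a new span
                start = i
        else:                          # "O" closes any open span
            if start is not None:
                spans.add((start, i))
                start = None
    if start is not None:
        spans.add((start, len(labels)))
    return spans


def strict_span_f1(gold: List[str], pred: List[str]) -> float:
    """Exact-boundary F1: a predicted span counts only if it matches a gold span exactly."""
    g, p = spans_from_bio(gold), spans_from_bio(pred)
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)


# Example: two gold parallel branches; one predicted branch has a boundary error.
gold = ["B-PAR", "I-PAR", "O", "B-PAR", "I-PAR", "O"]
pred = ["B-PAR", "I-PAR", "O", "B-PAR", "O", "O"]
print(strict_span_f1(gold, pred))  # 0.5: one of the two branches matched exactly
```

A full metric for this task would also have to decide which branches belong to the same parallelism and how partial overlaps are credited; this sketch ignores both.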
Related papers
- A Novel Cartography-Based Curriculum Learning Method Applied on RoNLI: The First Romanian Natural Language Inference Corpus [71.77214818319054]
Natural language inference is a proxy for natural language understanding.
There is no publicly available NLI corpus for the Romanian language.
We introduce the first Romanian NLI corpus (RoNLI) comprising 58K training sentence pairs.
arXiv Detail & Related papers (2024-05-20T08:41:15Z)
- OYXOY: A Modern NLP Test Suite for Modern Greek [2.059776592203642]
This paper serves as a foundational step towards the development of a linguistically motivated evaluation suite for Greek NLP.
We introduce four expert-verified evaluation tasks, specifically targeted at natural language inference, word sense disambiguation and metaphor detection.
More than replicas of existing tasks in a new language, we contribute two innovations that will resonate with the broader resource and evaluation community.
arXiv Detail & Related papers (2023-09-13T15:00:56Z)
- Sinhala-English Parallel Word Dictionary Dataset [0.554780083433538]
We introduce three parallel English-Sinhala word dictionaries (En-Si-dict-large, En-Si-dict-filtered, En-Si-dict-FastText) which help in multilingual Natural Language Processing (NLP) tasks related to English and Sinhala languages.
arXiv Detail & Related papers (2023-08-04T10:21:35Z)
- Syntax and Semantics Meet in the "Middle": Probing the Syntax-Semantics Interface of LMs Through Agentivity [68.8204255655161]
We present the semantic notion of agentivity as a case study for probing such interactions.
This suggests LMs may potentially serve as more useful tools for linguistic annotation, theory testing, and discovery.
arXiv Detail & Related papers (2023-05-29T16:24:01Z)
- Don't Take This Out of Context! On the Need for Contextual Models and Evaluations for Stylistic Rewriting [29.983234538677543]
We introduce a new composite contextual evaluation metric $\texttt{CtxSimFit}$ that combines similarity to the original sentence with contextual cohesiveness.
Our experiments show that humans significantly prefer contextual rewrites as more fitting and natural over non-contextual ones.
arXiv Detail & Related papers (2023-05-24T05:58:17Z)
- More Than Words: Collocation Tokenization for Latent Dirichlet Allocation Models [71.42030830910227]
We propose a new metric for measuring the clustering quality in settings where the models differ.
We show that topics trained with merged tokens result in topic keys that are clearer, more coherent, and more effective at distinguishing topics than those of models trained on unmerged tokens.
arXiv Detail & Related papers (2021-08-24T14:08:19Z)
- A Case Study of Spanish Text Transformations for Twitter Sentiment Analysis [1.9694608733361543]
Sentiment analysis is a text mining task that determines the polarity of a given text, i.e., its positiveness or negativeness.
New forms of textual expression present new challenges for text analysis, given the prevalence of slang and of orthographic and grammatical errors.
arXiv Detail & Related papers (2021-06-03T17:24:31Z)
- An In-depth Study on Internal Structure of Chinese Words [34.864343591706984]
This work proposes to model the deep internal structures of Chinese words as dependency trees with 11 labels for distinguishing syntactic relationships.
We manually annotate a word-internal structure treebank (WIST) consisting of over 30K multi-char words from Chinese Penn Treebank.
We present detailed and interesting analysis on WIST to reveal insights on Chinese word formation.
arXiv Detail & Related papers (2021-06-01T09:09:51Z)
- Word Alignment by Fine-tuning Embeddings on Parallel Corpora [96.28608163701055]
Word alignment over parallel corpora has a wide variety of applications, including learning translation lexicons, cross-lingual transfer of language processing tools, and automatic evaluation or analysis of translation outputs.
Recently, other work has demonstrated that pre-trained contextualized word embeddings derived from multilingually trained language models (LMs) prove an attractive alternative, achieving competitive results on the word alignment task even in the absence of explicit training on parallel data.
In this paper, we examine methods to marry the two approaches: leveraging pre-trained LMs but fine-tuning them on parallel text with objectives designed to improve alignment quality, and proposing methods to extract alignments from these fine-tuned models.
arXiv Detail & Related papers (2021-01-20T17:54:47Z)
- Words aren't enough, their order matters: On the Robustness of Grounding Visual Referring Expressions [87.33156149634392]
We critically examine RefCOCOg, a standard benchmark for visual referring expression recognition.
We show that 83.7% of test instances do not require reasoning on linguistic structure.
We propose two methods, one based on contrastive learning and the other based on multi-task learning, to increase the robustness of ViLBERT.
arXiv Detail & Related papers (2020-05-04T17:09:15Z)
- On the Language Neutrality of Pre-trained Multilingual Representations [70.93503607755055]
We investigate the language-neutrality of multilingual contextual embeddings directly and with respect to lexical semantics.
Our results show that contextual embeddings are more language-neutral and, in general, more informative than aligned static word-type embeddings.
We show how to reach state-of-the-art accuracy on language identification and match the performance of statistical methods for word alignment of parallel sentences.
arXiv Detail & Related papers (2020-04-09T19:50:32Z)