Discourse Context Predictability Effects in Hindi Word Order
- URL: http://arxiv.org/abs/2210.13940v1
- Date: Tue, 25 Oct 2022 11:53:01 GMT
- Title: Discourse Context Predictability Effects in Hindi Word Order
- Authors: Sidharth Ranjan, Marten van Schijndel, Sumeet Agarwal, Rajakrishnan
Rajkumar
- Abstract summary: We investigate how the words and syntactic structures in a sentence influence the word order of the following sentences.
A classifier uses a number of discourse-based and cognitive features, including dependency length, surprisal, and information status, to make its predictions.
We find that information status and LSTM-based discourse predictability influence word order choices, especially for non-canonical object-fronted orders.
- Score: 14.88833412862455
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We test the hypothesis that discourse predictability influences Hindi
syntactic choice. While prior work has shown that a number of factors (e.g.,
information status, dependency length, and syntactic surprisal) influence Hindi
word order preferences, the role of discourse predictability is underexplored
in the literature. Inspired by prior work on syntactic priming, we investigate
how the words and syntactic structures in a sentence influence the word order
of the following sentences. Specifically, we extract sentences from the
Hindi-Urdu Treebank corpus (HUTB), permute the preverbal constituents of those
sentences, and build a classifier to distinguish the sentences that actually
occurred in the corpus from artificially generated distractors. The classifier uses a
number of discourse-based features and cognitive features to make its
predictions, including dependency length, surprisal, and information status. We
find that information status and LSTM-based discourse predictability influence
word order choices, especially for non-canonical object-fronted orders. We
conclude by situating our results within the broader syntactic priming
literature.
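The permute-and-classify setup described in the abstract can be illustrated with a minimal sketch. The Hindi words below are hypothetical placeholders and the code is not the authors' implementation; it only shows how permuting preverbal constituents yields one attested order plus a set of distractors for a classifier to discriminate.

```python
from itertools import permutations

def preverbal_variants(constituents, verb):
    """Enumerate alternative word orders by permuting the preverbal
    constituents while keeping the verb in final position
    (Hindi is a verb-final language)."""
    return [list(order) + [verb] for order in permutations(constituents)]

# Hypothetical ditransitive: subject, indirect object, direct object + verb
attested_preverbal = ["Ram-ne", "Sita-ko", "kitaab"]
variants = preverbal_variants(attested_preverbal, "di")

# 3! = 6 orderings; one is the attested corpus sentence, the other five
# serve as artificially generated distractors for the classifier.
distractors = [v for v in variants if v != attested_preverbal + ["di"]]
```

In the paper's setup, each variant would then be scored on features such as dependency length, surprisal, and information status, with a classifier trained to pick out the attested order.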
Related papers
- Surprise! Uniform Information Density Isn't the Whole Story: Predicting Surprisal Contours in Long-form Discourse [54.08750245737734]
We propose that speakers modulate information rate based on location within a hierarchically-structured model of discourse.
We find that hierarchical predictors are significant predictors of a discourse's information contour and that deeply nested hierarchical predictors are more predictive than shallow ones.
arXiv Detail & Related papers (2024-10-21T14:42:37Z)
- Does Dependency Locality Predict Non-canonical Word Order in Hindi? [5.540151072128081]
Dependency length minimization is a significant predictor of non-canonical (OSV) syntactic choices.
Discourse predictability emerges as the primary determinant of constituent-order preferences.
This work sheds light on the role of expectation adaptation in word-ordering decisions.
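Dependency length, as invoked in the entry above, is commonly computed as the sum of linear distances between each word and its syntactic head. The sketch below is a generic illustration of that quantity, not code from the paper; the example head assignments are hypothetical.

```python
def total_dependency_length(heads):
    """Total dependency length of a sentence: the sum of linear distances
    between each word and its syntactic head. `heads` maps word index to
    head index (-1 marks the root, which contributes nothing).
    Dependency length minimization favors orders with smaller totals."""
    return sum(abs(i - h) for i, h in enumerate(heads) if h != -1)

# Hypothetical 4-word verb-final sentence with the verb (index 3) as root.
# All three preverbal words attach directly to the verb:
flat_tree = [3, 3, 3, -1]      # |0-3| + |1-3| + |2-3| = 6
# A modifier (word 0) attaching to word 1 instead shortens the total:
nested_tree = [1, 3, 3, -1]    # |0-1| + |1-3| + |2-3| = 4
```

Comparing totals across permuted orders of the same sentence is one way a classifier can operationalize dependency locality as a feature.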
arXiv Detail & Related papers (2024-05-13T13:24:17Z)
- Quantifying the redundancy between prosody and text [67.07817268372743]
We use large language models to estimate how much information is redundant between prosody and the words themselves.
We find a high degree of redundancy between the information carried by the words and prosodic information across several prosodic features.
Still, we observe that prosodic features cannot be fully predicted from text, suggesting that prosody carries information above and beyond the words.
arXiv Detail & Related papers (2023-11-28T21:15:24Z)
- Investigating the Utility of Surprisal from Large Language Models for Speech Synthesis Prosody [4.081433571732691]
This paper investigates the use of word surprisal, a measure of the predictability of a word in a given context, as a feature to aid speech prosody synthesis.
We conduct experiments using a large corpus of English text and large language models (LLMs) of varying sizes.
We find that word surprisal and word prominence are moderately correlated, suggesting that they capture related but distinct aspects of language use.
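Word surprisal, used both in the entry above and in the main paper's feature set, is the negative log probability of a word given its context. The toy bigram model below is a hypothetical illustration of the definition, not the large language models used in these papers.

```python
from math import log2

def surprisal(prob):
    """Surprisal in bits: S(w) = -log2 P(w | context).
    Highly predictable words have low surprisal."""
    return -log2(prob)

# Toy bigram model with hypothetical counts (not from any paper's corpus):
bigram_counts = {("the", "cat"): 3, ("the", "dog"): 1}
context_total = sum(n for (c, _), n in bigram_counts.items() if c == "the")

p_cat = bigram_counts[("the", "cat")] / context_total  # 0.75
p_dog = bigram_counts[("the", "dog")] / context_total  # 0.25
# surprisal of "cat" after "the" is -log2(0.75), lower than that of "dog"
```

In practice the probabilities come from a trained language model (an LSTM in the main paper, LLMs in the entry above), but the surprisal computation itself is the same.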
arXiv Detail & Related papers (2023-06-16T12:49:44Z)
- Neighboring Words Affect Human Interpretation of Saliency Explanations [65.29015910991261]
Word-level saliency explanations are often used to communicate feature-attribution in text-based models.
Recent studies found that superficial factors such as word length can distort human interpretation of the communicated saliency scores.
We investigate how the marking of a word's neighboring words affects the explainee's perception of the word's importance in the context of a saliency explanation.
arXiv Detail & Related papers (2023-05-04T09:50:25Z)
- Dual Mechanism Priming Effects in Hindi Word Order [14.88833412862455]
We test the hypothesis that priming is driven by multiple different sources.
We permute the preverbal constituents of corpus sentences, and then use a logistic regression model to predict which sentences actually occurred in the corpus.
By showing that different priming influences are separable from one another, our results support the hypothesis that multiple different cognitive mechanisms underlie priming.
arXiv Detail & Related papers (2022-10-25T11:49:22Z)
- Textual Entailment Recognition with Semantic Features from Empirical Text Representation [60.31047947815282]
A text entails a hypothesis if and only if the truth of the hypothesis follows from the text.
In this paper, we propose a novel approach to identifying the textual entailment relationship between text and hypothesis.
We employ an element-wise Manhattan distance vector-based feature that can identify the semantic entailment relationship between the text-hypothesis pair.
arXiv Detail & Related papers (2022-10-18T10:03:51Z)
- The Causal Structure of Semantic Ambiguities [0.0]
We identify two features: (1) joint plausibility degrees of different possible interpretations, and (2) causal structures according to which certain words play a more substantial role in the processes.
We applied this theory to a dataset of ambiguous phrases extracted from the psycholinguistics literature, together with human plausibility judgments that we collected.
arXiv Detail & Related papers (2022-06-14T12:56:34Z)
- Clustering and Network Analysis for the Embedding Spaces of Sentences and Sub-Sentences [69.3939291118954]
This paper reports research on a set of comprehensive clustering and network analyses targeting sentence and sub-sentence embedding spaces.
Results show that one method generates the most clusterable embeddings.
In general, the embeddings of span sub-sentences have better clustering properties than the original sentences.
arXiv Detail & Related papers (2021-10-02T00:47:35Z)
- Exploring Discourse Structures for Argument Impact Classification [48.909640432326654]
This paper empirically shows that the discourse relations between two arguments along the context path are essential factors for identifying the persuasive power of an argument.
We propose DisCOC to inject and fuse the sentence-level structural information with contextualized features derived from large-scale language models.
arXiv Detail & Related papers (2021-06-02T06:49:19Z)
- Characterizing the Effect of Sentence Context on Word Meanings: Mapping Brain to Behavior [0.0]
This paper aims to answer whether subjects are aware of such context-driven meaning changes and agree with them.
Subjects were asked to judge how words change from their generic meanings when used in specific sentences.
Results support the hypothesis that word meanings change systematically depending on sentence context.
arXiv Detail & Related papers (2020-07-27T20:12:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.